ActiveFence Integration
NeMo Guardrails supports using the ActiveFence ActiveScore API as an input and output rail out of the box (you need to set the ACTIVEFENCE_API_KEY environment variable).
rails:
  input:
    flows:
      # The simplified version
      - activefence moderation on input
      # The detailed version with individual risk scores
      # - activefence moderation on input detailed
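Because the rail depends on the ACTIVEFENCE_API_KEY environment variable, it can help to fail fast when the key is missing rather than discover the problem on the first moderated message. The following is a minimal sketch using only the Python standard library; the helper name require_activefence_key is hypothetical and is not part of NeMo Guardrails.

```python
import os


def require_activefence_key(env=None):
    """Return the ActiveFence API key, or raise if it is missing or blank.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("ACTIVEFENCE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ACTIVEFENCE_API_KEY is not set")
    return key
```

Calling require_activefence_key() once at application start-up, before the rails configuration is loaded, is one place such a check could live.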
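The activefence moderation on input flow uses the maximum risk score with a threshold of 0.85 to decide whether the text should be allowed (that is, if the risk score is above the threshold, the text is considered a violation). The activefence moderation on input detailed flow has an individual score for each violation category.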
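The decision rule of the simplified flow can be restated as a short Python sketch. It only mirrors the comparison described here (a score strictly above 0.85 counts as a violation); the function name is_violation is hypothetical, and the sketch does not call the ActiveFence API.

```python
DEFAULT_THRESHOLD = 0.85  # threshold used by the simplified flow


def is_violation(max_risk_score, threshold=DEFAULT_THRESHOLD):
    """Return True when the score is strictly above the threshold.

    A score exactly equal to the threshold is still allowed, matching
    the "above the threshold" wording of the rule.
    """
    return max_risk_score > threshold
```

For example, is_violation(0.9) is True, while is_violation(0.85) is False because the comparison is strict.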
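To customize the scores, you have to override the default flows in your config. For example, to change the threshold for activefence moderation on input, you can add the following flow to your config: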
define subflow activefence moderation on input
  """Guardrail based on the maximum risk score."""
  $result = execute call activefence api

  # 0.85 is the default threshold; replace it with the value you need.
  if $result.max_risk_score > 0.85
    bot inform cannot answer
    stop
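ActiveFence's ActiveScore API lets you control each of the supported violations individually. To take advantage of this, you can use the violations dictionary (violations_dict), which is one of the outputs of the API and is read as $result.violations in the flow below, to set different thresholds for different violations. Below is an example of one such input moderation flow: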
define flow activefence input moderation detailed
  $result = execute call activefence api

  if $result.violations.get("abusive_or_harmful.hate_speech", 0) > 0.8
    bot inform cannot engage in abusive or harmful behavior
    stop

define bot inform cannot engage in abusive or harmful behavior
  "I will not engage in any abusive or harmful behavior."