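Llama-Guard Integration

NeMo Guardrails provides out-of-the-box support for content moderation using Meta's Llama Guard model.

In our testing, we observed significantly better input and output content moderation performance compared to the self-check method. See the additional documentation for details on the recommended deployment method and the performance evaluation numbers.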

Usage

To configure your bot to use Llama Guard for input/output checking, follow these steps:

Add a model of type llama_guard to the models section of the config.yml file (the example below uses a vLLM setup):

models:
  ...

  - type: llama_guard
    engine: vllm_openai
    parameters:
      openai_api_base: "https://:5123/v1"
      model_name: "meta-llama/LlamaGuard-7b"
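
Before wiring the model into the rails, it can help to confirm that the endpoint answers plain completion requests. The sketch below only builds such a request and does not send it. It assumes the server exposes the OpenAI-compatible /completions route that vLLM's API server provides. build_completion_request is a hypothetical helper, not part of NeMo Guardrails, and the base URL in the usage line is a placeholder, since the host in the configuration above is not given.

```python
import json
import urllib.request


def build_completion_request(api_base: str, model_name: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-compatible /completions route."""
    payload = {
        "model": model_name,
        "prompt": prompt,
        "max_tokens": 20,    # Llama Guard answers are short: a verdict plus category codes
        "temperature": 0.0,  # deterministic output is preferable for moderation
    }
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host (an assumption): substitute the address of your own vLLM server.
request = build_completion_request(
    "http://localhost:5123/v1", "meta-llama/LlamaGuard-7b", "<s>[INST] Task: ..."
)
print(request.full_url)
```

If the server is running, sending the request with urllib.request.urlopen(request) should return a JSON body whose choices[0].text field holds the model's verdict.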

Include the llama guard check input and llama guard check output flow names in the rails section of the config.yml file:

rails:
  input:
    flows:
      - llama guard check input
  output:
    flows:
      - llama guard check output

Define the llama_guard_check_input and llama_guard_check_output prompts in the prompts.yml file:

prompts:
  - task: llama_guard_check_input
    content: |
      <s>[INST] Task: ...
      <BEGIN UNSAFE CONTENT CATEGORIES>
      O1: ...
      O2: ...
  - task: llama_guard_check_output
    content: |
      <s>[INST] Task: ...
      <BEGIN UNSAFE CONTENT CATEGORIES>
      O1: ...
      O2: ...
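
The three steps above have to agree with each other: the model type, the flow names, and the prompt task names must all be present. As a rough illustration, the following sketch checks that on plain dictionaries shaped like the parsed config.yml and prompts.yml files. find_config_problems is a hypothetical helper written for this example (it is not a NeMo Guardrails API), and it assumes that both the input and the output check are wanted.

```python
# Flow names and prompt task names taken from the configuration steps above.
REQUIRED_FLOWS = {
    "input": "llama guard check input",
    "output": "llama guard check output",
}
REQUIRED_TASKS = {"llama_guard_check_input", "llama_guard_check_output"}


def find_config_problems(config: dict, prompts: dict) -> list:
    """Return human-readable problems; an empty list means the pieces line up."""
    problems = []

    # Step 1: a model of type llama_guard must be declared.
    model_types = {model.get("type") for model in config.get("models", [])}
    if "llama_guard" not in model_types:
        problems.append("no model of type llama_guard in models")

    # Step 2: each rail direction must list its Llama Guard flow.
    rails = config.get("rails", {})
    for direction, flow in REQUIRED_FLOWS.items():
        flows = rails.get(direction, {}).get("flows", [])
        if flow not in flows:
            problems.append(f"flow '{flow}' missing from rails.{direction}.flows")

    # Step 3: each Llama Guard task needs a prompt definition.
    defined = {prompt.get("task") for prompt in prompts.get("prompts", [])}
    for task in sorted(REQUIRED_TASKS - defined):
        problems.append(f"prompt for task '{task}' is not defined")

    return problems


# Example input: the output flow and the output prompt are deliberately missing.
config = {
    "models": [{"type": "llama_guard", "engine": "vllm_openai"}],
    "rails": {"input": {"flows": ["llama guard check input"]}, "output": {"flows": []}},
}
prompts = {"prompts": [{"task": "llama_guard_check_input", "content": "..."}]}

for problem in find_config_problems(config, prompts):
    print(problem)
```

In practice the two dictionaries would come from loading the YAML files with a YAML parser; they are written inline here to keep the sketch free of third-party dependencies.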

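NeMo Guardrails executes the llama_guard_check_* actions, which return True if the user input or the bot message is allowed and False otherwise, together with the list of violated unsafe content categories as defined in the Llama Guard prompt.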

define flow llama guard check input
  $llama_guard_response = execute llama_guard_check_input
  $allowed = $llama_guard_response["allowed"]
  $llama_guard_policy_violations = $llama_guard_response["policy_violations"]

  if not $allowed
    bot refuse to respond
    stop

# (similar flow for checking output)
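
To make the shape of the action result more concrete, here is a minimal sketch of how a raw Llama Guard completion could be turned into the allowed and policy_violations fields read by the flow above. It assumes the model replies with "safe", or with "unsafe" followed by a second line of comma-separated category codes such as O1,O3. parse_llama_guard_output is a hypothetical function for illustration and is not the library's own implementation.

```python
def parse_llama_guard_output(text: str) -> dict:
    """Map a raw Llama Guard reply to {"allowed": bool, "policy_violations": list}."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]

    # Empty reply: fail closed and treat the message as not allowed.
    if not lines:
        return {"allowed": False, "policy_violations": []}

    verdict = lines[0].lower()
    if verdict == "safe":
        return {"allowed": True, "policy_violations": []}

    # Assumed format: "unsafe" on the first line, category codes on the second.
    violations = []
    if verdict == "unsafe" and len(lines) > 1:
        violations = [code.strip() for code in lines[1].split(",") if code.strip()]

    # Anything other than an explicit "safe" is treated as not allowed.
    return {"allowed": False, "policy_violations": violations}


print(parse_llama_guard_output("safe"))
print(parse_llama_guard_output("unsafe\nO1,O3"))
```

Treating every reply other than an explicit "safe" as disallowed is a design choice made in this sketch, so that a malformed model reply blocks the message instead of letting it through.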

A complete example configuration that uses Llama Guard for input and output moderation is provided in this example folder.