配置指南#

Guardrails 配置包含以下内容

通用选项：使用哪些 LLM，通用指令（类似于系统 prompt），示例对话，哪些 rails 处于活动状态，特定 rails 配置选项等；这些选项通常放在 config.yml 文件中。
Rails：实现 rails 的 Colang flows；这些通常放在 rails 文件夹中。
Actions：在 Python 中实现的自定义 action；这些通常放在配置根目录的 actions.py 模块中或 actions 子包中。
知识库文档：在使用内置知识库支持的 RAG（检索增强生成）场景中可以使用的文档；这些文档通常放在 kb 文件夹中。
初始化代码：执行额外初始化的自定义 Python 代码，例如注册新类型的 LLM。

这些文件通常包含在一个 config 文件夹中，在初始化 RailsConfig 实例或启动 CLI Chat 或 Server 时会引用此文件夹。

.
├── config
│   ├── rails
│   │   ├── file_1.co
│   │   ├── file_2.co
│   │   └── ...
│   ├── actions.py
│   ├── config.py
│   └── config.yml

自定义 action 可以放在配置根目录的 actions.py 模块中或 actions 子包中

.
├── config
│   ├── rails
│   │   ├── file_1.co
│   │   ├── file_2.co
│   │   └── ...
│   ├── actions
│   │   ├── file_1.py
│   │   ├── file_2.py
│   │   └── ...
│   ├── config.py
│   └── config.yml

自定义初始化#

如果存在，config.py 模块会在初始化 LLMRails 实例之前加载。

如果 config.py 模块包含一个 init 函数，则在初始化 LLMRails 实例时会调用该函数。例如，您可以使用 init 函数初始化数据库连接，并使用 register_action_param(...) 函数将其注册为自定义 action 参数

from nemoguardrails import LLMRails

def init(app: LLMRails):
    # Initialize the database connection
    db = ...

    # Register the action parameter
    app.register_action_param("db", db)

调用自定义 action 时，会将自定义 action 参数传递给它们。

通用选项#

以下小节描述了可以在 config.yml 文件中使用的所有配置选项。

LLM 模型#

要配置 Guardrails 配置将使用的主要 LLM 模型，请设置 models 键，如下所示

models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

属性的含义如下

type: 设置为“main”，表示主要 LLM 模型。
engine: LLM 提供商，例如 openai、huggingface_endpoint、self_hosted 等。
model: 模型名称，例如 gpt-3.5-turbo-instruct。
parameters: 任何附加参数，例如 temperature、top_k 等。

支持的 LLM 提供商#

您可以使用 LangChain 支持的任何 LLM 提供商，例如 ai21、aleph_alpha、anthropic、anyscale、azure、cohere、huggingface_endpoint、huggingface_hub、openai、self_hosted、self_hosted_hugging_face。请查看 LangChain 官方文档以获取完整列表。

除了上述 LangChain 提供商外，还支持使用 nvidia_ai_endpoints 或同义的 nim 引擎连接到 Nvidia NIM，无论是 Nvidia 托管的 NIM（通过 Nvidia AI Enterprise 许可访问）还是本地下载和自托管的 NIM 容器。

注意

要使用任何提供商，必须安装额外的软件包；当您首次尝试使用新提供商的配置时，通常会收到来自 LangChain 的错误，该错误会指示您应该安装哪些软件包。

重要

虽然您可以实例化任何上述 LLM 提供商，但根据模型的功能，NeMo Guardrails 工具包与某些提供商的配合优于其他提供商。工具包中包含已针对某些类型的模型（例如 openai 或 llama3 模型）进行优化的 prompt。对于其他提供商，您可以按照 LLM Prompts 部分的信息自行优化 prompt。

探索可用提供商#

为了帮助您探索和选择适合您需求的 LLM 提供商，NeMo Guardrails 提供了 find-providers 命令。此命令提供一个交互式界面来发现可用提供商

nemoguardrails find-providers [--list]

该命令支持两种模式

交互模式（默认）：引导您选择提供商类型（文本补全或聊天补全），然后显示该类型的可用提供商
列表模式（--list）：简单地列出所有可用提供商，无需交互式选择

这在您设置配置并需要探索哪些提供商可用且受支持时特别有用。

有关该命令及其用法的更多详细信息，请参阅CLI 文档。

使用带有推理轨迹的 LLM#

默认情况下，推理模型，例如 DeepSeek-R1，会在模型响应中包含推理轨迹。DeepSeek 模型使用 <think> 和 </think> 作为 token 来标识轨迹。

推理轨迹和 token 通常会干扰 NeMo Guardrails，导致对安全响应错误触发输出 guardrails。要使用这些推理模型，可以通过以下配置示例从模型响应中删除轨迹和 token。

models:
  - type: main
    engine: deepseek
    model: deepseek-reasoner
    reasoning_config:
      remove_reasoning_traces: True
      start_token: "<think>"
      end_token: "</think>"

模型的 reasoning_config 字段指定了返回推理轨迹的推理模型所需的配置。通过删除轨迹，guardrails 运行时仅处理来自 LLM 的实际响应。

您可以为推理模型指定以下参数

remove_reasoning_traces: 是否应忽略推理轨迹（默认为 True）。
start_token: 推理过程的开始 token（默认为 <think>）。
end_token: 推理过程的结束 token（默认为 </think>）。

适用于 LLM 的 NIM#

NVIDIA NIM 是一组易于使用的微服务，旨在加速在云、数据中心和工作站部署生成式 AI 模型。适用于 LLM 的 NVIDIA NIM 将最先进的 LLM 的强大功能带到企业应用中，提供无与伦比的自然语言处理和理解能力。详细了解 NIM。

NIM 可以通过下载的容器进行自托管，也可以由 Nvidia 托管并通过 Nvidia AI Enterprise (NVAIE) 许可访问。

NeMo Guardrails 支持按如下方式连接到 NIM

自托管 NIM#

要连接到自托管 NIM，请将 engine 设置为 nim。同时确保模型名称与托管 NIM 支持的模型名称之一匹配（可以使用 GET 请求到 v1/models 端点获取支持的模型列表）。

models:
  - type: main
    engine: nim
    model: <MODEL_NAME>
    parameters:
      base_url: <NIM_ENDPOINT_URL>

例如，要连接到本地部署的 meta/llama3-8b-instruct 模型（端口 8000），请使用以下模型配置

models:
  - type: main
    engine: nim
    model: meta/llama3-8b-instruct
    parameters:
      base_url: https://:8000/v1

NVIDIA AI Endpoints#

NVIDIA AI Endpoints 让用户可以轻松访问 NVIDIA 托管的 API 端点，用于 NVIDIA AI Foundation Models，例如 Llama 3、Mixtral 8x7B 和 Stable Diffusion。这些模型托管在 NVIDIA API catalog 上，经过优化、测试，并托管在 NVIDIA AI 平台上，使得它们能够快速轻松地进行评估、进一步定制并在任何加速堆栈上以最佳性能无缝运行。

要通过 NVIDIA AI Endpoints 使用 LLM 模型，请使用以下模型配置

models:
  - type: main
    engine: nim
    model: <MODEL_NAME>

例如，要使用 llama3-8b-instruct 模型，请使用以下模型配置

models:
  - type: main
    engine: nim
    model: meta/llama3-8b-instruct

重要

要使用 nvidia_ai_endpoints 或 nim LLM 提供商，您必须使用命令 pip install langchain-nvidia-ai-endpoints 安装 langchain-nvidia-ai-endpoints 软件包，并配置有效的 NVIDIA_API_KEY。

有关更多信息，请参阅用户指南。

以下是使用 Ollama 的 llama3 模型的配置示例

models:
  - type: main
    engine: ollama
    model: llama3
    parameters:
      base_url: http://your_base_url

TRT-LLM#

NeMo Guardrails 还支持连接到 TRT-LLM 服务器。

models:
  - type: main
    engine: trt_llm
    model: <MODEL_NAME>

以下是支持的参数列表及其默认值。更多详细信息请参阅 TRT-LLM 文档。

models:
  - type: main
    engine: trt_llm
    model: <MODEL_NAME>
    parameters:
      server_url: <SERVER_URL>
      temperature: 1.0
      top_p: 0
      top_k: 1
      tokens: 100
      beam_width: 1
      repetition_penalty: 1.0
      length_penalty: 1.0

自定义 LLM 模型#

要注册自定义 LLM 提供商，您需要创建一个继承自 BaseLanguageModel 的类，并使用 register_llm_provider 进行注册。

实现以下方法非常重要

必需:

_call
_llm_type

可选:

_acall
_astream
_stream
_identifying_params

换句话说，要创建自定义 LLM 提供商，需要实现以下接口方法：_call、_llm_type，以及可选的 _acall、_astream、_stream 和 _identifying_params。以下是如何实现

from typing import Any, Iterator, List, Optional

from langchain.base_language import BaseLanguageModel
from langchain_core.callbacks.manager import (
    CallbackManagerForLLMRun,
    AsyncCallbackManagerForLLMRun,
)
from langchain_core.outputs import GenerationChunk

from nemoguardrails.llm.providers import register_llm_provider


class MyCustomLLM(BaseLanguageModel):

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs,
    ) -> str:
        pass

    async def _acall(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs,
    ) -> str:
        pass

    def _stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        pass

    # rest of the implementation
    ...

register_llm_provider("custom_llm", MyCustomLLM)

然后您可以在配置中使用自定义 LLM 提供商

models:
  - type: main
    engine: custom_llm

按任务配置 LLM#

与 LLM 的交互是面向任务的结构。每次调用 LLM 都与特定任务相关联。这些任务是 guardrail 过程中不可或缺的一部分，包括

generate_user_intent: 此任务将原始用户话语转换为规范形式。例如，“Hello there”可能被转换为 express greeting。
generate_next_steps: 此任务确定 bot 的响应或要执行的 action。示例包括 bot express greeting 或 bot respond to question。
generate_bot_message: 此任务确定要返回的确切 bot 消息。
general: 此任务根据用户和 bot 消息的历史记录生成下一个 bot 消息。当未定义对话 rails（即没有用户消息规范形式）时使用此任务。

有关任务的完整列表，请参阅 Task type。

您可以为特定任务使用不同的 LLM 模型。例如，您可以为 self_check_input 和 self_check_output 任务使用来自不同提供商的不同模型。以下是一个配置示例

models:
  - type: main
    model: meta/llama-3.1-8b-instruct
    engine: nim
  - type: self_check_input
    model: meta/llama3-8b-instruct
    engine: nim
  - type: self_check_output
    model: meta/llama-3.1-70b-instruct
    engine: nim

在前面的示例中，self_check_input 和 self_check_output 任务使用了不同的模型。甚至可以更精细地为 generate_user_intent 等任务使用不同的模型

models:
  - type: main
    model: meta/llama-3.1-8b-instruct
    engine: nim
  - type: self_check_input
    model: meta/llama3-8b-instruct
    engine: nim
  - type: self_check_output
    model: meta/llama-3.1-70b-instruct
    engine: nim
  - type: generate_user_intent
    model: meta/llama-3.1-8b-instruct
    engine: nim

提示

请记住，最适合您需求的模型取决于您的具体要求和限制。尝试使用不同的模型，看看哪个模型最适合您的特定用例，通常是个好主意。

Embedding 模型#

要配置用于 guardrails 过程中各个步骤（例如规范形式生成和下一步生成）的 embedding 模型，请在 models 键中添加模型配置，如下面的配置文件所示

models:
  - ...
  - type: embeddings
    engine: FastEmbed
    model: all-MiniLM-L6-v2

FastEmbed 引擎是默认引擎，使用 all-MiniLM-L6-v2 模型。NeMo Guardrails 还支持使用 OpenAI 模型计算 embedding，例如

models:
  - ...
  - type: embeddings
    engine: openai
    model: text-embedding-ada-002

支持的 Embedding 提供商#

下表列出了支持的 embedding 提供商

提供商名称	`engine_name`	`model`
FastEmbed（默认）	`FastEmbed`	`all-MiniLM-L6-v2`（默认）等
OpenAI	`openai`	`text-embedding-ada-002` 等
SentenceTransformers	`SentenceTransformers`	`all-MiniLM-L6-v2` 等
NVIDIA AI Endpoints	`nvidia_ai_endpoints`	`nv-embed-v1` 等

注意

您可以使用任何受支持的 embedding 提供商的任何受支持模型。上表包含一个可用模型的示例。

自定义 Embedding 提供商#

您还可以通过使用 LLMRails.register_embedding_provider 函数注册自定义 embedding 提供商。

要注册自定义 LLM 提供商，请创建一个继承自 EmbeddingModel 的类，并在 config.py 中注册它。

from typing import List
from nemoguardrails.embeddings.providers.base import EmbeddingModel
from nemoguardrails import LLMRails


class CustomEmbeddingModel(EmbeddingModel):
    """An implementation of a custom embedding provider."""
    engine_name = "CustomEmbeddingModel"

    def __init__(self, embedding_model: str):
        # Initialize the model
        ...

    async def encode_async(self, documents: List[str]) -> List[List[float]]:
        """Encode the provided documents into embeddings.

        Args:
            documents (List[str]): The list of documents for which embeddings should be created.

        Returns:
            List[List[float]]: The list of embeddings corresponding to the input documents.
        """
        ...

    def encode(self, documents: List[str]) -> List[List[float]]:
        """Encode the provided documents into embeddings.

        Args:
            documents (List[str]): The list of documents for which embeddings should be created.

        Returns:
            List[List[float]]: The list of embeddings corresponding to the input documents.
        """
        ...


def init(app: LLMRails):
    """Initialization function in your config.py."""
    app.register_embedding_provider(CustomEmbeddingModel, "CustomEmbeddingModel")

然后您可以在配置中使用自定义 embedding 提供商

models:
  # ...
  - type: embeddings
    engine: SomeCustomName
    model: SomeModelName      # supported by the provider.

Embedding 搜索提供商#

NeMo Guardrails 使用 embedding 搜索（也称为向量数据库）来实现 guardrails 过程和知识库功能。默认的 embedding 搜索使用 FastEmbed 计算 embedding（all-MiniLM-L6-v2 模型），并使用 Annoy 执行搜索。如上一节所示，embedding 模型支持 FastEmbed 和 OpenAI。SentenceTransformers 也受支持。

对于高级用例或与现有知识库的集成，您可以提供自定义 embedding 搜索提供商。

通用指令#

通用指令（类似于系统 prompt）会被添加到每个 prompt 的开头，您可以按如下方式进行配置

instructions:
  - type: general
    content: |
      Below is a conversation between the NeMo Guardrails bot and a user.
      The bot is talkative and provides lots of specific details from its context.
      If the bot does not know the answer to a question, it truthfully says it does not know.

未来将支持多种类型的指令，因此会使用 type 属性和数组结构。

示例对话#

示例对话为用户和 bot 之间的对话设定了基调。它将帮助 LLM 更好地学习对话的格式、语气以及响应的详细程度。此部分应至少包含两个回合。由于我们将此示例对话添加到每个 prompt 中，因此建议将其保持简短且相关。

sample_conversation: |
  user "Hello there!"
    express greeting
  bot express greeting
    "Hello! How can I assist you today?"
  user "What can you do for me?"
    ask about capabilities
  bot respond about capabilities
    "As an AI assistant, I can help provide more information on NeMo Guardrails toolkit. This includes question answering on how to set it up, use it, and customize it for your application."
  user "Tell me a bit about the what the toolkit can do?"
    ask general question
  bot response for general question
    "NeMo Guardrails provides a range of options for quickly and easily adding programmable guardrails to LLM-based conversational systems. The toolkit includes examples on how you can create custom guardrails and compose them together."
  user "what kind of rails can I include?"
    request more information
  bot provide more information
    "You can include guardrails for detecting and preventing offensive language, helping the bot stay on topic, do fact checking, perform output moderation. Basically, if you want to control the output of the bot, you can do it with guardrails."
  user "thanks"
    express appreciation
  bot express appreciation and offer additional help
    "You're welcome. If you have any more questions or if there's anything else I can help you with, please don't hesitate to ask."

Actions Server URL#

如果使用 actions server，则必须在 config.yml 中配置 URL

actions_server_url: ACTIONS_SERVER_URL

LLM Prompts#

您可以使用 prompts 键自定义用于各种 LLM 任务（例如，生成用户意图、生成下一步、生成 bot 消息）的 prompt。例如，要覆盖用于 openai/gpt-3.5-turbo 模型的 generate_user_intent 任务的 prompt

prompts:
  - task: generate_user_intent
    models:
      - openai/gpt-3.5-turbo
    max_length: 3000
    output_parser: user_intent
    content: |-
      <<This is a placeholder for a custom prompt for generating the user intent>>

对于每个任务，您还可以指定用于 LLM 调用的 prompt 的最大长度（以字符数计）。这对于限制 LLM 使用的 token 数量或确保 prompt 长度不超过最大上下文长度非常有用。当超过最大长度时，通过从对话历史记录中删除较旧的回合来截断 prompt，直到 prompt 长度小于或等于最大长度。默认最大长度为 16000 个字符。

NeMo Guardrails 工具包使用的任务完整列表如下

general: 当未使用规范形式时，生成下一个 bot 消息；
generate_user_intent: 生成规范的用户消息；
generate_next_steps: 生成 bot 接下来应该做/说的事情；
generate_bot_message: 生成下一个 bot 消息；
generate_value: 为上下文变量生成值（也称为提取用户提供的值）；
self_check_facts: 根据提供的证据检查 bot 响应中的事实；
self_check_input: 检查是否应允许用户输入；
self_check_output: 检查是否应允许 bot 响应；
self_check_hallucination: 检查 bot 响应是否是幻觉。

您可以在 prompts 文件夹中查看默认 prompt。

多步生成#

对于针对指令遵循进行了微调的大型语言模型 (LLM)，尤其是参数超过 1000 亿的模型，可以启用复杂的多步 flow 的生成。

实验性：此功能是实验性的，只能用于测试和评估目的。

enable_multi_step_generation: True

最低温度#

此温度将用于需要确定性行为的任务（例如，dolly-v2-3b 需要严格的正值）。

lowest_temperature: 0.1

事件源 ID#

此 ID 将用作 Colang 运行时发出的所有事件的 source_uid。如果您需要在系统中区分多个 Colang 运行时（例如在多 agent 场景中），将其设置为不同于默认值（默认值为 NeMoGuardrails-Colang-2.x）的值会很有用。

event_source_uid : colang-agent-1

自定义数据#

如果您需要将额外的配置数据传递给配置中的任何自定义组件，可以使用 custom_data 字段。

custom_data:
  custom_config_field: "some_value"

例如，您可以在 config.py 中的 init 函数内访问自定义配置（请参阅自定义初始化）。

def init(app: LLMRails):
    config = app.config

    # Do something with config.custom_data

Guardrails 定义#

Guardrails（简称 rails）通过 flows 实现。根据其作用，rails 可以分为几个主要类别

输入 rails：当接收到来自用户的新输入时触发。
输出 rails：当应向用户发送新输出时触发。
对话 rails：在用户消息被解释（即已识别出规范形式）后触发。
检索 rails：在执行检索步骤后触发（即 retrieve_relevant_chunks action 完成后触发）。
执行 rails：在 action 调用之前和之后触发。

活动 rails 使用 config.yml 中的 rails 键进行配置。下面是一个快速示例

rails:
  # Input rails are invoked when a new message from the user is received.
  input:
    flows:
      - check jailbreak
      - check input sensitive data
      - check toxicity
      - ... # Other input rails

  # Output rails are triggered after a bot message has been generated.
  output:
    flows:
      - self check facts
      - self check hallucination
      - check output sensitive data
      - ... # Other output rails

  # Retrieval rails are invoked once `$relevant_chunks` are computed.
  retrieval:
    flows:
      - check retrieval sensitive data

所有非输入、输出或检索 flow 的 flow 都被视为对话 rails 和执行 rails，即控制对话流程以及何时如何调用 action 的 flow。对话/执行 rail flows 不需要明确在配置中列举。但是，还有一些其他配置选项可以用于控制其行为。

rails:
  # Dialog rails are triggered after user message is interpreted, i.e., its canonical form
  # has been computed.
  dialog:
    # Whether to try to use a single LLM call for generating the user intent, next step and bot message.
    single_call:
      enabled: False

      # If a single call fails, whether to fall back to multiple LLM calls.
      fallback_to_multiple_calls: True

    user_messages:
      # Whether to use only the embeddings when interpreting the user's message
      embeddings_only: False

输入 Rails#

输入 rails 处理来自用户的消息。例如

define flow self check input
  $allowed = execute self_check_input

  if not $allowed
    bot refuse to respond
    stop

输入 rails 可以通过更改 $user_message 上下文变量来修改输入。

输出护栏#

输出护栏处理机器人消息。待处理的消息可在上下文变量 $bot_message 中获取。输出护栏可以修改 $bot_message 变量，例如用于屏蔽敏感信息。

通过将 $skip_output_rails 上下文变量设置为 True，您可以暂时停用下一个机器人消息的输出护栏。

流式输出配置#

默认情况下，输出护栏的响应是同步的。您可以启用流式传输，以便更快地接收来自输出护栏的响应。

您必须在 config.yml 文件中设置顶层字段 streaming: True。

对于每个输出护栏，添加 streaming 字段和配置参数。

rails:
  output:
    - rail name
  streaming:
    chunk_size: 200
    context_size: 50
    stream_first: True

streaming: True

启用流式传输后，工具包会将输出护栏应用于令牌块。如果某个护栏阻止了一个令牌块，工具包将返回以下格式的 JSON 错误对象

{
  "error": {
    "message": "Blocked by <rail-name> rails.",
    "type": "guardrails_violation",
    "param": "<rail-name>",
    "code": "content_blocked"
  }
}

在与 OpenAI Python 客户端集成时，此 JSON 错误旨在由服务器代码捕获，并按照 OpenAI 的 SSE 格式转换为 API 错误。

下表描述了 streaming 字段的子字段

字段	描述	默认值
streaming.chunk_size	指定每个块的令牌数量。工具包将输出护栏应用于每个令牌块。较大的值可以为护栏评估提供更有意义的信息，但在累积令牌以形成完整块时可能会增加延迟。如果您指定 `stream_first: False`，则延迟增加的风险尤为突出。	`200`
streaming.context_size	指定从前一个块中保留的令牌数量，以便在处理中提供上下文和连续性。较大的值可以在块之间提供连续性，同时对延迟影响最小。较小的值可能无法检测跨块违规。指定大约 25% 的 `chunk_size` 是一个不错的折衷方案。	`50`
streaming.stream_first	当设置为 `False` 时，工具包在将令牌块流式传输到客户端之前对其应用输出护栏。如果您将此字段设置为 `False`，则可以避免流式传输被阻止内容的令牌块。默认情况下，工具包会尽快流式传输令牌块，并在对其应用输出护栏之前进行。	`True`

字段

描述

默认值

streaming.chunk_size

指定每个块的令牌数量。工具包将输出护栏应用于每个令牌块。

较大的值可以为护栏评估提供更有意义的信息，但在累积令牌以形成完整块时可能会增加延迟。如果您指定 stream_first: False，则延迟增加的风险尤为突出。

200

streaming.context_size

指定从前一个块中保留的令牌数量，以便在处理中提供上下文和连续性。

较大的值可以在块之间提供连续性，同时对延迟影响最小。较小的值可能无法检测跨块违规。指定大约 25% 的 chunk_size 是一个不错的折衷方案。

50

streaming.stream_first

当设置为 False 时，工具包在将令牌块流式传输到客户端之前对其应用输出护栏。如果您将此字段设置为 False，则可以避免流式传输被阻止内容的令牌块。

默认情况下，工具包会尽快流式传输令牌块，并在对其应用输出护栏之前进行。

True

下表显示了令牌数量、块大小和上下文大小如何相互作用以触发护栏调用次数。

输入长度	块大小	上下文大小	护栏调用次数
512	256	64	3
600	256	64	3
256	256	64	1
1024	256	64	5
1024	256	32	5
1024	256	32	5
1024	128	32	11
512	128	32	5

有关代码示例，请参阅流式输出。

检索护栏#

检索护栏处理检索到的块，即 $relevant_chunks 变量。

对话护栏#

对话护栏强制执行特定的预定义对话路径。要使用对话护栏，您必须为各种用户消息定义标准形式，并使用它们触发对话流程。查看 Hello World 机器人以获取快速示例。有关稍微高级的示例，请查看 ABC 机器人，其中使用对话护栏来确保机器人不讨论特定主题。

使用对话护栏需要三个步骤

生成标准用户消息
决定下一步并执行
生成机器人话语

有关详细说明，请查看护栏流程。

上述每个步骤可能需要调用 LLM。

单次调用模式#

从版本 0.6.0 开始，NeMo Guardrails 还支持“单次调用”模式，其中所有三个步骤都使用一次 LLM 调用来执行。要启用它，您必须将 single_call.enabled 标志设置为 True，如下所示。

rails:
  dialog:
    # Whether to try to use a single LLM call for generating the user intent, next step and bot message.
    single_call:
      enabled: True

      # If a single call fails, whether to fall back to multiple LLM calls.
      fallback_to_multiple_calls: True

在典型的 RAG（检索增强生成）场景中，使用此选项可在延迟方面带来 3 倍的改进，并减少 37% 的令牌使用量。

重要提示：当前，单次调用模式 只能预测机器人消息作为下一步。这意味着如果您希望 LLM 泛化并决定对动态生成的用户标准形式消息执行操作，它将不起作用。

仅使用嵌入#

加快对话护栏的另一种选择是仅使用预定义用户消息的嵌入来决定用户输入的标准形式。要启用此选项，您必须设置 embeddings_only 标志，如下所示

rails:
  dialog:
    user_messages:
      # Whether to use only the embeddings when interpreting the user's message
      embeddings_only: True
      # Use only the embeddings when the similarity is above the specified threshold.
      embeddings_only_similarity_threshold: 0.75
      # When the fallback is set to None, if the similarity is below the threshold, the user intent is computed normally using the LLM.
      # When it is set to a string value, that string value will be used as the intent.
      embeddings_only_fallback_intent: None

重要提示：仅当提供了足够的示例时才建议这样做。此处使用的阈值为 0.75，如果相似度低于此值，则会触发 LLM 调用以生成用户意图。如果遇到误报，请考虑将阈值提高到 0.8。请注意，该阈值取决于模型。

异常#

NeMo Guardrails 支持在流程中引发异常。异常是指名称以 Exception 结尾的事件，例如 InputRailException。当异常被引发时，最终输出是一条消息，其角色设置为 exception，内容设置为有关异常的附加信息。例如

define flow input rail example
  # ...
  create event InputRailException(message="Input not allowed.")

{
  "role": "exception",
  "content": {
    "type": "InputRailException",
    "uid": "45a452fa-588e-49a5-af7a-0bab5234dcc3",
    "event_created_at": "9999-99-99999:24:30.093749+00:00",
    "source_uid": "NeMoGuardrails",
    "message": "Input not allowed."
  }
}

护栏库异常#

默认情况下，护栏库中包含的所有护栏在触发时都会返回预定义消息。您可以通过在 config.yml 文件中将 enable_rails_exceptions 键设置为 True 来更改此行为

enable_rails_exceptions: True

启用此设置后，护栏被触发时将返回异常消息。为了更好地理解其内部工作原理，以下是 self check input 护栏的实现方式

define flow self check input
  $allowed = execute self_check_input
  if not $allowed
    if $config.enable_rails_exceptions
      create event InputRailException(message="Input not allowed. The input was blocked by the 'self check input' flow.")
    else
      bot refuse to respond
      stop

注意

在 Colang 2.x 中，您必须将 $config.enable_rails_exceptions 更改为 $system.config.enable_rails_exceptions，并将 create event 更改为 send。

触发 self check input 护栏时，将返回以下异常。

{
  "role": "exception",
  "content": {
    "type": "InputRailException",
    "uid": "45a452fa-588e-49a5-af7a-0bab5234dcc3",
    "event_created_at": "9999-99-99999:24:30.093749+00:00",
    "source_uid": "NeMoGuardrails",
    "message": "Input not allowed. The input was blocked by the 'self check input' flow."
  }
}

跟踪#

NeMo Guardrails 包含跟踪功能，使您可以监控和记录交互，以实现更好的可观察性和调试。跟踪可以通过现有的 config.yml 文件轻松配置。以下是在项目中启用和配置跟踪的步骤。

启用跟踪#

要启用跟踪，请在 config.yml 的 tracing 部分下将 enabled 标志设置为 true

tracing:
  enabled: true

重要

您必须安装必要的依赖项才能使用跟踪适配器。

  pip install "opentelemetry-api opentelemetry-sdk aiofiles"

配置跟踪适配器#

跟踪支持多种适配器，用于确定交互日志的导出方式和位置。您可以通过在 adapters 列表中指定一个或多个适配器来配置它们。以下是配置内置 OpenTelemetry 和 FileSystem 适配器的示例

tracing:
  enabled: true
  adapters:
    - name: OpenTelemetry
      service_name: "nemo_guardrails_service"
      exporter: "console"  # Options: "console", "zipkin", etc.
      resource_attributes:
        env: "production"
    - name: FileSystem
      filepath: './traces/traces.jsonl'

警告

“console”仅用于调试和演示目的，不应在生产环境中使用。使用此导出器会将跟踪信息直接输出到控制台，这可能会干扰应用程序输出、扭曲用户界面、降低性能，并可能暴露敏感信息。在生产环境中，请配置合适的导出器，将跟踪数据发送到专用的后端或监控系统。

OpenTelemetry 适配器#

OpenTelemetry 适配器与 OpenTelemetry 框架集成，允许您将跟踪导出到各种后端。主要配置选项包括

• service_name: 您的服务名称。• exporter: 要使用的导出器类型（例如，console、zipkin）。• resource_attributes: 要包含在跟踪资源中的附加属性（例如，环境）。

FileSystem 适配器#

FileSystem 适配器将交互日志导出到本地 JSON Lines 文件。主要配置选项包括

• filepath: 存储跟踪文件的路径。如果未指定，则默认为 ./.traces/trace.jsonl。

配置示例#

以下是启用 OpenTelemetry 和 FileSystem 适配器的完整 config.yml 文件示例

tracing:
  enabled: true
  adapters:
    - name: OpenTelemetry
      service_name: "nemo_guardrails_service"
      exporter: "zipkin"
      resource_attributes:
        env: "production"
    - name: FileSystem
      filepath: './traces/traces.jsonl'

要使用此配置，您必须确保 Zipkin 在本地运行或可以通过网络访问。

使用 Zipkin 作为导出器#

要使用 Zipkin 作为导出器，请按照以下步骤操作

安装 OpenTelemetry 的 Zipkin 导出器

pip install opentelemetry-exporter-zipkin

使用 Docker 运行 Zipkin 服务器

docker run -d -p 9411:9411 openzipkin/zipkin

注册 OpenTelemetry 导出器#

您还可以通过在 config.py 文件中注册其他 OpenTelemetry 导出器来使用它们。为此，您需要使用 register_otel_exporter 并注册导出器类。以下是注册 Jaeger 导出器的示例

# This assumes that Jaeger exporter is installed
# pip install opentelemetry-exporter-jaeger

from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from nemoguardrails.tracing.adapters.opentelemetry import register_otel_exporter

register_otel_exporter(JaegerExporter, "jaeger")

然后您可以在 config.yml 文件中如下使用它

tracing:
  enabled: true
  adapters:
    - name: OpenTelemetry
      service_name: "nemo_guardrails_service"
      exporter: "jaeger"
      resource_attributes:
        env: "production"

自定义 InteractionLogAdapters#

NeMo Guardrails 允许您通过创建自定义 InteractionLogAdapter 类来扩展其跟踪功能。这种灵活性使您能够将交互日志转换为并导出到适合您需求的任何后端或格式。

实现自定义适配器#

要创建自定义适配器，您需要实现 InteractionLogAdapter 抽象基类。以下是您必须遵循的接口

from abc import ABC, abstractmethod
from nemoguardrails.tracing import InteractionLog

class InteractionLogAdapter(ABC):
    name: Optional[str] = None


    @abstractmethod
    async def transform_async(self, interaction_log: InteractionLog):
        """Transforms the InteractionLog into the backend-specific format asynchronously."""
        raise NotImplementedError

    async def close(self):
        """Placeholder for any cleanup actions if needed."""
        pass

    async def __aenter__(self):
        """Enter the runtime context related to this object."""
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        """Exit the runtime context related to this object."""
        await self.close()

注册您的自定义适配器#

实现自定义适配器后，您需要注册它，以便 NemoGuardrails 能够识别并利用它。这通过在您的 config.py: 中添加注册调用来完成：

from nemoguardrails.tracing.adapters.registry import register_log_adapter
from path.to.your.adapter import YourCustomAdapter

register_log_adapter(YourCustomAdapter, "CustomLogAdapter")

示例：创建自定义适配器#

这是一个将交互日志记录到自定义后端的自定义适配器的简单示例

from nemoguardrails.tracing.adapters.base import InteractionLogAdapter
from nemoguardrails.tracing import InteractionLog

class MyCustomLogAdapter(InteractionLogAdapter):
    name = "MyCustomLogAdapter"

    def __init__(self, custom_option1: str, custom_option2: str):
      self.custom_option1 = custom_option1
      self.custom_option2 = custom

    def transform(self, interaction_log: InteractionLog):
        # Implement your transformation logic here
        custom_format = convert_to_custom_format(interaction_log)
        send_to_custom_backend(custom_format)

    async def transform_async(self, interaction_log: InteractionLog):
        # Implement your asynchronous transformation logic here
        custom_format = convert_to_custom_format(interaction_log)
        await send_to_custom_backend_async(custom_format)

    async def close(self):
        # Implement any necessary cleanup here
        await cleanup_custom_resources()

使用您的 CustomLogAdapter 更新 config.yml

注册后，您可以像配置任何其他适配器一样在 config.yml 中配置您的自定义适配器

tracing:
  enabled: true
  adapters:
    - name: MyCustomLogAdapter
      custom_option1: "value1"
      custom_option2: "value2"

通过遵循这些步骤，您可以利用内置的跟踪适配器或创建和集成您自己的自定义适配器，以增强由 NeMo Guardrails 驱动的应用程序的可观察性。无论您选择将日志导出到文件系统、与 OpenTelemetry 集成，还是实现定制的日志解决方案，跟踪都提供了满足您需求的灵活性。

知识库文档#

默认情况下，LLMRails 实例支持使用一组文档作为生成机器人响应的上下文。要将文档作为知识库的一部分包含在内，您必须将它们放置在配置文件夹内的 kb 文件夹中

.
├── config
│   └── kb
│       ├── file_1.md
│       ├── file_2.md
│       └── ...

目前，仅支持 Markdown 格式。将来会尽快添加对其他格式的支持。