LLM Examples Introduction

This is a simple example that shows how to use the LLM with TinyLlama.

from tensorrt_llm import LLM, SamplingParams


def main():

    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    outputs = llm.generate(prompts, sampling_params)

    # Print the outputs.
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


# The entry point of the program needs to be protected for spawning processes.
if __name__ == '__main__':
    main()

The LLM API can be used for both offline and online usage. See more examples of the LLM API here.
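For online-style usage, results are typically consumed as they are generated rather than in one batch. The snippet below is a minimal sketch of such streaming use, assuming the LLM API exposes a generate_async method that accepts a streaming flag and yields partial results; the exact method name, signature, and output fields should be confirmed against the current API reference.

import asyncio

from tensorrt_llm import LLM, SamplingParams


async def stream_one_prompt():
    # Assumed streaming-style usage; verify generate_async and its
    # streaming flag against the current LLM API documentation.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    # Each iteration yields the partial generation produced so far.
    async for output in llm.generate_async(
            "The capital of France is",
            sampling_params,
            streaming=True):
        print(output.outputs[0].text)


if __name__ == '__main__':
    asyncio.run(stream_one_prompt())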

For more details on how to get the most out of this API, check out