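Code sample

Let's hand-write the most basic large-model inference example. To keep testing easy and save GPU memory, we start with a small model: distilgpt2 from Hugging Face. It is smaller than GPT-2 (6 transformer layers and roughly 82M parameters, versus 124M for the smallest GPT-2), which makes it suitable for running locally on a CPU or GPU. Below is a complete Python example that does simple text generation with nothing beyond torch and transformers.

1️⃣ Install dependencies

```bash
pip install torch transformers
```

2️⃣ Basic inference code

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Choose a small model
model_name = "distilgpt2"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Switch to CUDA if a GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Input prompt
prompt = "Hello, I am a small language model. I can"

# Encode the input and move the tensors to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Run generation
# max_length caps the total length in tokens (prompt plus generated tokens)
outputs = model.generate(
    **inputs,
    max_length=50,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    no_repeat_ngram_size=2,
)

# Decode the output
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated result:")
print(text)
```

3️⃣ Code notes

- Tokenizer: converts text into tokens the model can process.
- Model: AutoModelForCausalLM is used for autoregressive text generation.
- Small model: distilgpt2 needs very little memory; a few hundred MB is enough.
- Generation parameters:
  - max_length=50: upper bound on the sequence length, counted in tokens and including the prompt.
  - do_sample=True: sample randomly to produce varied text.
  - temperature=0.7: the higher the temperature, the more random the output.

| Parameter | Purpose | Effect |
| --- | --- | --- |
| max_length | Limits the generated length | Prevents unbounded generation |
| do_sample | Whether to sample randomly | True → varied output; False → greedy, deterministic output |
| temperature | Adjusts sampling randomness | < 1 is more conservative; > 1 is more random |
| top_k | Samples only from the K most probable tokens | Narrows the candidate set so low-probability tokens appear less often |
| top_p | Samples only from the smallest set of top tokens whose cumulative probability reaches p | Selects the high-probability set dynamically |
| no_repeat_ngram_size | Blocks repeated n-grams | Prevents repeated phrases |

- GPU/CPU switching: torch.cuda.is_available() detects automatically whether a GPU is present.

✅ This example runs locally on a CPU or a GPU and covers the most basic large-model inference flow.

Run output

```text
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 76/76 [00:00<00:00, 1691.30it/s, Materializing param=transformer.wte.weight]
GPT2LMHeadModel LOAD REPORT from: distilgpt2
Key                                        | Status     |  |
----------------------------------------------------------
transformer.h.{0, 1, 2, 3, 4, 5}.attn.bias | UNEXPECTED |  |

Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Setting pad_token_id to eos_token_id:50256 for open-end generation.
```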
Generated result:

```text
Hello, I am a small language model. I can write code with a few different things like: 1. Create an object that can be modified 2. Use a custom function 3. Write an anonymous function in the body 4. Add an optional parameter to the function (or an argument) 5. Make a class that accepts a value and a function that returns a string 6. Get a method that uses a single argument 7. Return a new value 8. Convert an empty function into a .hf file 9. Delete the empty method 10. Replace the .hsf with the following code:
```
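To make the sampling parameters (temperature, top_k, top_p) more concrete, here is a minimal pure-Python sketch of how they can narrow a next-token distribution before one token is drawn. It is an illustration under simplifying assumptions, not the code that transformers runs internally: the helper names `softmax`, `filter_top_k_top_p`, and `sample_next_token` are hypothetical, and `toy_logits` holds made-up scores for a five-token vocabulary.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=50, top_p=0.95):
    """Return {token_index: renormalized probability} for tokens that pass both filters."""
    # Rank token indices from most to least probable, then keep the top k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]
    mass = sum(probs[i] for i in ranked)

    # top-p: keep the smallest prefix whose cumulative (renormalized)
    # probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.95, rng=random):
    """Draw one token index from the filtered distribution."""
    candidates = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    indices = list(candidates)
    weights = [candidates[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


# Hypothetical scores for a five-token vocabulary (made-up numbers).
toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
print(filter_top_k_top_p(softmax(toy_logits, temperature=0.7), top_k=3, top_p=0.9))
print(sample_next_token(toy_logits, top_k=3, top_p=0.9))
```

With these toy numbers, only token indices 0 and 1 survive the two filters, so the sampled token is always one of those two. Raising top_p or the temperature lets more candidates through, which is the "more varied" behavior described in the parameter table.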