Cola DLMContinuousLatentDiffusionLanguageModel是一种分层连续潜空间扩散语言模型。它将文本自编码器Text VAE与基于块因果关系的扩散变换器DiT先验相结合自编码器将文本映射为连续的潜在序列并将潜在序列解码回令牌而扩散变换器则通过流匹配Flow Matching实现潜在先验的传递。本模型仓库包含论文《连续潜在扩散语言模型》的 HuggingFace 格式检查点。链接模型仓库:https://huggingface.co/ByteDance-Seed/Cola-DLMGitHub仓库:https://github.com/ByteDance-Seed/Cola-DLM论文:https://arxiv.org/abs/2605.06548HuggingFace每日论文:https://huggingface.co/papers/2605.06548项目主页:https://hongcanguo.github.io/Cola-DLM/博客文章:https://hongcanguo.github.io/posts/2026-cola-dlm.html知乎文章:https://zhuanlan.zhihu.com/p/2038324180920313704模型文件预期的仓库目录结构为:. ├── cola_dlm/ │ ├── cola_dit/ │ │ ├── config.json │ │ └── model.safetensors* │ └── cola_vae/ │ ├── config.json │ └── model.safetensors* ├── tokenizer.json ├── README.md └── README_zh.md检查点由两个协作模块组成ColaDiTModel一个块因果一维扩散变换器用于连续文本潜在空间的先验建模。ColaTextVAEModel一个文本变分自编码器包含编码器和条件解码器实现文本到潜在空间及潜在空间到文本的双向映射。快速开始从GitHub仓库安装Cola DLM代码包然后安装下载辅助工具gitclone https://github.com/ByteDance-Seed/Cola-DLM.gitcdCola-DLM pipinstall-e.pipinstallhuggingface_hub下载模型文件huggingface-cli download ByteDance-Seed/Cola-DLM --local-dir hf_models运行一个最小的Python示例importtorchfromtokenizersimportTokenizerfromcola_dlmimport(ColaDiTModel,ColaTextVAEModel,generate_task_repaint_inference,)devicetorch.device(cudaiftorch.cuda.is_available()elsecpu)ditColaDiTModel.from_pretrained(hf_models/cola_dlm/cola_dit).to(device)vaeColaTextVAEModel.from_pretrained(hf_models/cola_dlm/cola_vae).to(device)tokenizerTokenizer.from_file(hf_models/tokenizer.json)prompts[{question:Question: What is the capital of France? Answer:}]resultsgenerate_task_repaint_inference(ditdit,vaevae,tokenizertokenizer,promptsprompts,task_namelambada,devicedevice,max_new_tokens32,temperature0.0,guidance_scale7.0,timestep_num16,pad_token_id100277,)print(results[0][generate])OpenAI 兼容服务Cola DLM 代码版本中的配套openai_adapter/服务通过 OpenAI 兼容的 Chat Completions 端点公开此模型POST /v1/chat/completions从代码仓库根目录安装适配器依赖项pipinstall-e.pipinstall-ropenai_adapter/requirements.txt启动服务exportCOLA_DIT_PATHhf_models/cola_dlm/cola_ditexportCOLA_VAE_PATHhf_models/cola_dlm/cola_vaeexportCOLA_TOKENIZER_PATHhf_models/tokenizer.jsonexportCOLA_MODEL_NAMEcola-dlmexportCOLA_API_KEYchange-me uvicorn openai_adapter.server:app--host0.0.0.0--port8000然后发送一个请求curlhttp://127.0.0.1:8000/v1/chat/completions\-HContent-Type: application/json\-HAuthorization: Bearer change-me\-d{ model: cola-dlm, messages: [ { role: user, content: Question: What is the capital of France? Answer: } ], temperature: 0, max_tokens: 32, stream: false }该适配器目前支持非流式补全功能。模型详情架构:文本VAE 块因果DiT潜在先验训练目标:两阶段训练先进行文本VAE预训练再通过流匹配进行文本VAE与DiT联合训练训练算力节点:发布权重对应论文RQ4扩展曲线中2000 EFLOPs的检查点分词器:OLMo 2分词器词汇量100,278词条特殊标记ID:填充标记100277结束标记100257im_end标记100265框架:PyTorch 2.1 和 HuggingFace Transformers 4.40许可:Apache 2.0许可证评估结果开源推理实现的零样本基准测试结果任务准确率(%)LAMBADA50.80MMLU19.30OBQA23.00HellaSwag10.70RACE19.60SIQA28.90SQuAD30.90Story Cloze30.77任务平均26.75开源HuggingFace实现与论文内部实现可能存在细微差异各任务数值会有小幅波动但整体趋势与论文一致。使用范围Cola DLM主要用于以下研究领域分层潜变量语言模型文本连续潜在扩散流匹配先验基准式文本生成该检查点未经过指令微调且未进行RLHF处理不应视为生产级聊天机器人或用于安全关键决策。局限性主要基于英文文本训练其他语言评估不足输出可能包含事实错误、冒犯内容、偏见或幻觉生成质量对提示格式和长度敏感建议采用问题:...答案:式提示进行快速评估生成时使用可变KV缓存服务实现需在单进程内序列化生成除非显式隔离缓存处理引用如果您在工作中使用了Cola DLM请引用article{guo2026cola, title {Continuous Latent Diffusion Language Model}, author {Guo, Hongcan and Zhao, Qinyu and Zhao, Yian and Nie, Shen and Zhu, Rui and Guo, Qiushan and Wang, Feng and Yang, Tao and Zhao, Hengshuang and Wei, Guoqiang and Zeng, Yan}, journal {arXiv preprint arXiv:2605.06548}, year {2026}, url {https://arxiv.org/abs/2605.06548}, }