
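On September 29, 2025, Cambricon completed its adaptation of DeepSeek-V3.2-Exp, the latest model from DeepSeek, on the day of the model's release, and open-sourced the code of its large-model inference engine, vLLM-MLU. The code repository and test steps are given below, so developers can try out the highlights of DeepSeek-V3.2-Exp on Cambricon's hardware and software platform right away.

Cambricon has long placed a high priority on building the software ecosystem for large models and supports all mainstream open-source large models, with DeepSeek as a leading example. Drawing on this sustained ecosystem work and accumulated expertise, Cambricon was able to deliver day-0 adaptation and optimization for DeepSeek-V3.2-Exp, a new experimental model architecture.

Cambricon has also consistently emphasized joint innovation across chips and algorithms, using hardware-software co-design to improve large-model deployment performance and lower deployment cost. We previously carried out in-depth hardware-software co-optimization for the DeepSeek model family and reached an industry-leading level of compute utilization. For the new DeepSeek-V3.2-Exp architecture, Cambricon used Triton operators for rapid adaptation, fused operators written in BangC for performance optimization, and a strategy that overlaps computation with communication, again reaching an industry-leading level of compute efficiency. The new DeepSeek Sparse Attention mechanism introduced by DeepSeek-V3.2-Exp, combined with Cambricon's compute efficiency, can substantially reduce training and inference costs in long-sequence scenarios, giving customers a highly competitive combined hardware and software solution.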
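
To see why a sparse attention mechanism matters for long sequences, a back-of-the-envelope count helps. The sketch below is illustrative arithmetic only, not Cambricon's or DeepSeek's implementation: it assumes causal attention, assumes each query attends to at most a fixed budget of selected tokens (`selected_tokens`, a hypothetical parameter whose value here is made up), and ignores the cost of choosing those tokens.

```python
def dense_causal_pairs(seq_len: int) -> int:
    """Query-key pairs scored by dense causal attention: 1 + 2 + ... + seq_len."""
    return seq_len * (seq_len + 1) // 2


def sparse_causal_pairs(seq_len: int, selected_tokens: int) -> int:
    """Pairs scored when each query attends to at most `selected_tokens` tokens."""
    if seq_len <= selected_tokens:
        return dense_causal_pairs(seq_len)
    # Early queries see fewer tokens than the budget allows;
    # every later query is capped at the budget.
    capped_queries = seq_len - selected_tokens
    return dense_causal_pairs(selected_tokens) + capped_queries * selected_tokens


if __name__ == "__main__":
    for seq_len in (4_096, 32_768, 131_072):
        dense = dense_causal_pairs(seq_len)
        # The budget of 2,048 tokens is an assumed value for illustration.
        sparse = sparse_causal_pairs(seq_len, selected_tokens=2_048)
        print(f"{seq_len:>7} tokens: dense/sparse pair ratio = {dense / sparse:.1f}")
```

Under these assumptions the dense count grows quadratically with sequence length while the capped count grows linearly, which is why the gap widens as sequences get longer.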

↓ Source code for the vLLM-MLU adaptation of DeepSeek-V3.2-Exp ↓

https://github.com/Cambricon/vllm-mlu

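Guide to running DeepSeek-V3.2-Exp on vLLM-MLU

1. Environment setup

Software: deploy with the Cambricon Pytorch Container, Cambricon's unified training-and-inference image, which comes with the dependencies needed to run vLLM-MLU preinstalled.

Hardware: four MLU servers with eight cards each.

To obtain the complete hardware and software environment, please contact Cambricon through official channels.

2. Steps and results

Step 1: Download the model

Download the model files yourself from the Hugging Face website. Below, ${MODEL_PATH} denotes the path to the downloaded model.

Step 2: Start the container

Load the image and start the container with the following commands: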

# Load the image
docker load -i cambricon_pytorch_container-torch2.7.1-torchmlu1.28.0-ubuntu22.04-py310.tar.gz

# Start the container
docker run -it --net=host \
  --shm-size '64gb' --privileged \
  --ulimit memlock=-1 ${IMAGE_NAME} \
  /bin/bash

# Install community vLLM 0.9.1
pushd ${VLLM_SRC_PATH}/vllm
  VLLM_TARGET_DEVICE=empty pip install .
popd
# Install Cambricon vLLM-MLU
pushd ${VLLM_SRC_PATH}/vllm-mlu
  pip install .
popd
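
After the two installs, it can be worth confirming that both packages are visible to the Python interpreter inside the container. The following is a minimal sketch using only the standard library. The distribution name `vllm_mlu` is an assumption (the article does not state the name the package registers under), so adjust it to whatever `pip list` reports.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def report(distributions):
    """Map each distribution name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in distributions}


if __name__ == "__main__":
    # "vllm" is the community package; "vllm_mlu" is an assumed distribution name.
    for name, version in report(["vllm", "vllm_mlu"]).items():
        print(f"{name}: {version or 'not installed'}")
```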

Step 3: Start the Ray service

Before running the model, start the Ray service with the following commands:

# Set environment variables
export GLOO_SOCKET_IFNAME=${INFERENCE_NAME}
export NOSET_MLU_VISIBLE_DEVICES_ENV_VAR=1

# Head node
ray start --head --port=${port}
# Worker nodes
ray start --address="${master_ip}:${port}"
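
Before launching the model, it may help to confirm that all nodes have joined the Ray cluster. The sketch below is one way to do that from Python on the head node, under stated assumptions: the resource name `"MLU"` is a guess at how the devices are registered with Ray (check the output of `ray status` for the actual name), and the expected counts come from the 4-server, 8-card setup described above. `ray.init`, `ray.nodes`, and `ray.shutdown` are standard Ray calls.

```python
def summarize_nodes(nodes, resource_name="MLU"):
    """Count alive nodes and total devices from a list shaped like ray.nodes()."""
    alive = [node for node in nodes if node.get("Alive")]
    devices = sum(node.get("Resources", {}).get(resource_name, 0) for node in alive)
    return len(alive), int(devices)


def check_cluster(expected_nodes=4, expected_devices=32, resource_name="MLU"):
    """Connect to the running Ray cluster and compare it with the expected size."""
    import ray  # imported lazily so summarize_nodes works without Ray installed

    ray.init(address="auto")
    try:
        alive, devices = summarize_nodes(ray.nodes(), resource_name)
    finally:
        ray.shutdown()
    return alive == expected_nodes and devices == expected_devices
```

Calling `check_cluster()` on the head node returns `True` only when four alive nodes with 32 devices in total are registered, given the assumed resource name.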

Step 4: Run offline inference

A simple offline inference script, `offline_inference.py`, is provided here:

import argparse

from vllm import LLM, SamplingParams


def main(model_path):
  # Sample prompts.
  prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
  ]
  sampling_params = SamplingParams(
    temperature=0.6, top_p=0.95, top_k=20, max_tokens=10)

  # Create an LLM that spans all 32 cards through Ray.
  engine_args_dict = {
    "model": model_path,
    "tensor_parallel_size": 32,
    "distributed_executor_backend": "ray",
    "enable_expert_parallel": True,
    "enable_prefix_caching": False,
    "enforce_eager": True,
    "trust_remote_code": True,
  }
  llm = LLM(**engine_args_dict)
  # Generate texts from the prompts.
  outputs = llm.generate(prompts, sampling_params)

  # Print the outputs.
  for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == '__main__':
  # Read the model path from the --model flag used in the run command.
  parser = argparse.ArgumentParser()
  parser.add_argument("--model", required=True)
  main(parser.parse_args().model)

Run the following command to perform offline inference:

# Run inference
python offline_inference.py --model ${MODEL_PATH}
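
The `tensor_parallel_size` of 32 in the script lines up with the hardware listed earlier: 4 servers with 8 cards each. As a small sanity check, the sketch below computes the number of devices a parallel configuration needs and compares it with what a cluster provides. It relies on the fact that vLLM's world size is the product of the tensor-parallel and pipeline-parallel sizes; the function names are hypothetical helpers, not part of vLLM.

```python
def required_devices(tensor_parallel_size: int, pipeline_parallel_size: int = 1) -> int:
    """Devices needed: the product of tensor-parallel and pipeline-parallel sizes."""
    return tensor_parallel_size * pipeline_parallel_size


def fits_cluster(servers: int, cards_per_server: int,
                 tensor_parallel_size: int, pipeline_parallel_size: int = 1) -> bool:
    """True when the cluster has at least as many cards as the configuration needs."""
    available = servers * cards_per_server
    return required_devices(tensor_parallel_size, pipeline_parallel_size) <= available


if __name__ == "__main__":
    # 4 servers x 8 cards, tensor_parallel_size=32, as in this guide.
    print(fits_cluster(servers=4, cards_per_server=8, tensor_parallel_size=32))
```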

The output is as expected.

Step 5: Run online inference

Start the server and the client separately to serve inference requests, as in this example:

# server
vllm serve ${MODEL_PATH} \
  --port 8100 \
  --max-model-len 40000 \
  --distributed-executor-backend ray \
  --trust-remote-code \
  --tensor-parallel-size 32 \
  --enable-expert-parallel \
  --no-enable-prefix-caching \
  --disable-log-requests \
  --enforce-eager

# client, we post a single request here.
curl -X POST http://localhost:8100/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "'"${MODEL_PATH}"'",
     "prompt": "The future of AI is",
     "max_tokens": 50, "temperature": 0.7
    }'
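
The same request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library. It assumes the server above is listening on `localhost:8100` and returns the usual OpenAI-compatible completions response, in which the generated text is found at `choices[0].text`. The helper names are hypothetical.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=50, temperature=0.7):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, **kwargs):
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

For example, `complete("http://localhost:8100", model_path, "The future of AI is")` sends the same prompt as the curl command, where `model_path` holds the value of ${MODEL_PATH}.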

The input and output extracted from the response are shown below and match expectations.

Prompt: The future of AI is
Output: being shaped by a number of key trends. These include the rise of large language models, the increasing use of AI in enterprise, the development of more powerful and efficient AI hardware, and the growing focus on AI ethics and safety.

Large language models are

Step 6: Run an interactive chat

Use the vLLM-MLU framework to run the interactive chat demo.
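
The demo itself is not included in this article. As a rough illustration of what an interactive loop could look like, the sketch below talks to the server from Step 5 through the OpenAI-compatible `/v1/chat/completions` endpoint that vLLM serves. This is an assumption about how such a demo might be built, not Cambricon's demo code, and the helper names are hypothetical.

```python
import json
import urllib.request


def append_turn(history, role, content):
    """Return a new message list with one more chat turn appended."""
    return history + [{"role": role, "content": content}]


def chat_once(base_url, model, history):
    """Post the conversation to /v1/chat/completions and return the reply text."""
    payload = json.dumps({"model": model, "messages": history}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["choices"][0]["message"]["content"]


def chat_loop(base_url, model):
    """Read user input until an empty line, keeping the full history between turns."""
    history = []
    while True:
        user_text = input("You: ").strip()
        if not user_text:
            break
        history = append_turn(history, "user", user_text)
        reply = chat_once(base_url, model, history)
        history = append_turn(history, "assistant", reply)
        print(f"Model: {reply}")
```

Sending the whole history on every turn keeps the server stateless, at the cost of a longer prompt as the conversation grows.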

Original title: Cambricon Adapts DeepSeek-V3.2-Exp on Day 0 and Open-Sources the vLLM-MLU Inference Engine

Source: WeChat official account Cambricon Developer (WeChat ID: Cambricon_Developer). Please credit the source when republishing.
