Fine-tune and Pretrain LLMs on Ascend NPU
Workbench-based recipes for running full-parameter SFT and pretraining on arm64 nodes with Huawei Ascend NPUs. These notebooks run training directly inside a single workbench container with multiple NPUs attached (no VolcanoJob).
All three notebooks are validation-first: the defaults use short sequences and few iterations so you can confirm the runtime, model loading, preprocessing, and distributed launch path before scaling up. The PyTorch CANN image bundles Python 3.12, CANN 8.5.0, PyTorch 2.9.0, and torch_npu 2.9.0.
Before you begin
- The Ascend driver, CANN runtime, and Kubernetes device plugin are installed; your workbench can be scheduled to an arm64 Ascend node.
- Plan for ≥ 4 NPUs for the PyTorch examples; the MindSpore notebook is tuned for 2× Ascend 910B 32G with TP=1, PP=1, MBS=2.
- The workspace uses persistent storage with room for both the HuggingFace model and the converted Megatron / MCore weights.
- Network: the PyTorch notebooks clone MindSpeed-LLM from https://gitcode.com/ascend/MindSpeed-LLM.git at runtime. If the workbench can't reach it, drop a local copy in the workspace and update the path in the first cell. The MindSpore notebook uses the bundled tree under /opt/app-root/share/MindSpeed-Core-MS and doesn't clone anything.
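The storage requirement above can be checked before weight conversion starts. The following is a minimal sketch, not part of the notebooks: it sums the size of the HuggingFace model directory and compares it with the free space in the workspace. The function names are hypothetical, and the default ratio of 1.0 (converted weights take roughly as much space as the source model) is an assumption, not a documented figure.

```python
import os
import shutil


def dir_size_bytes(path: str) -> int:
    """Return the total size of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def has_room_for_conversion(model_dir: str, workspace: str, ratio: float = 1.0) -> bool:
    """Check whether workspace has free space for the converted weights.

    ratio is an assumed multiplier: converted size ~= ratio * source size.
    """
    needed = int(dir_size_bytes(model_dir) * ratio)
    free = shutil.disk_usage(workspace).free
    return free >= needed
```

Calling has_room_for_conversion(HF_MODEL_DIR, workspace_path) before the conversion cell gives an early warning instead of a failure partway through writing weights.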
Create the workbench (PyTorch CANN or MindSpore CANN) per Creating a Workbench and upload the notebook through the JupyterLab file browser.
Prepare the base model
Each notebook expects a HuggingFace-format model at the path set in HF_MODEL_DIR.
Either drop the model files at the default path or change HF_MODEL_DIR in the first parameter cell. For versioned, reusable models, push to the platform model repository and clone from there per Upload Models Using Notebook.
All three notebooks convert HF → Megatron / MCore before training; expect a second large weight directory on disk.
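Because conversion reads the HuggingFace directory directly, a quick layout check can catch an incomplete upload early. This is a sketch under assumptions: it looks only for config.json and for weight files with the common .safetensors or .bin extensions, and check_hf_model_dir is a hypothetical helper, not something the notebooks provide.

```python
from pathlib import Path


def check_hf_model_dir(model_dir: str) -> list[str]:
    """Return a list of problems found; an empty list means the layout looks usable."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{model_dir} is not a directory"]
    problems = []
    if not (root / "config.json").is_file():
        problems.append("missing config.json")
    # Assumption: weights are stored as .safetensors or .bin files at the top level.
    has_weights = any(root.glob("*.safetensors")) or any(root.glob("*.bin"))
    if not has_weights:
        problems.append("no *.safetensors or *.bin weight files")
    return problems
```

An empty result does not prove the model is loadable; it only confirms that the files conversion typically needs are present.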
Prepare the dataset
Fine-tuning data (Alpaca-style JSONL)
Both fine-tuning notebooks expect:
- ALPACA_PARQUET = /opt/app-root/src/datasets/alpaca/train-00000-of-00001-a09b74b3ef9c3b56.parquet
- RAW_DATA_FILE = .../finetune_dataset/alpaca_sample.jsonl (under the notebook's work dir)
If the parquet file exists, the notebook converts it to JSONL automatically. If you already have JSONL, place it at RAW_DATA_FILE or update the variable. Each line is expected to hold one Alpaca-style record.
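For reference, the widely used Alpaca schema has three string fields: instruction, input (which may be empty), and output. The sketch below assumes that schema; the exact fields the notebook's handler requires are not stated here, so treat the field names as an assumption and validate_alpaca_jsonl as a hypothetical helper. The sample record is purely illustrative.

```python
import json

# Assumed Alpaca-style schema; "input" may be an empty string, so it is not required.
REQUIRED_FIELDS = ("instruction", "output")

SAMPLE_RECORD = {
    "instruction": "Translate the sentence to French.",
    "input": "Good morning.",
    "output": "Bonjour.",
}


def validate_alpaca_jsonl(path: str) -> int:
    """Validate a JSONL file and return the number of valid records.

    Raises ValueError on the first malformed line.
    """
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_no}: invalid JSON") from exc
            if not isinstance(record, dict):
                raise ValueError(f"line {line_no}: expected a JSON object")
            missing = [key for key in REQUIRED_FIELDS if not record.get(key)]
            if missing:
                raise ValueError(f"line {line_no}: missing or empty {missing}")
            count += 1
    return count
```

Running the validator on RAW_DATA_FILE before preprocessing turns a malformed line into a clear error with a line number.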
ShareGPT and Pairwise formats can be swapped in by changing the handler in the parameter cell.
Pretraining data (raw text)
MindSpeed-LLM preprocessing accepts .parquet, .json, .jsonl, and .txt. Structured formats need a text field; plain text needs one segment per line. The default is the same ALPACA_PARQUET path; if it's missing the notebook falls back to a built-in sample so you can still verify the pipeline.
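The two simplest input rules above (a text field for structured records, one segment per line for plain text) can be checked with a few lines of Python. This sketch covers only .jsonl and .txt; .parquet and .json would need their own readers. count_pretrain_segments is a hypothetical helper, and the field name text is taken from the description above.

```python
import json
from pathlib import Path


def count_pretrain_segments(path: str, text_key: str = "text") -> int:
    """Count usable text segments in a .jsonl or .txt file."""
    suffix = Path(path).suffix.lower()
    if suffix not in (".jsonl", ".txt"):
        raise ValueError(f"unsupported suffix for this sketch: {suffix}")
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if suffix == ".jsonl":
                record = json.loads(line)
                # Structured records must carry a string under text_key.
                if not isinstance(record, dict) or not isinstance(record.get(text_key), str):
                    raise ValueError(f"line {line_no}: no string '{text_key}' field")
            count += 1
    return count
```

A count of zero is a useful signal that the file is empty or that the wrong path was configured.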
For larger datasets, mount a PVC or pull from the platform dataset repository — see Fine-tuning LLMs using Workbench.
Run a notebook
Each notebook follows the same shape:
- Environment check: confirms torch_npu (or mindspore + msadapter), MindSpeed, and MindSpeed-LLM are importable, and that the available NPU count matches TP × PP.
- Dataset prep: converts parquet → JSONL (or falls back to a built-in sample for pretraining).
- HF → MCore weight conversion: writes weights into a TP/PP-specific output directory.
- Data preprocessing: generates the .bin / .idx files MindSpeed needs.
- Training: posttrain_gpt.py for SFT, pretrain_gpt.py for pretraining. The MindSpore notebook uses msrun with --ai-framework mindspore --ckpt-format msadapter and writes logs to .../logs.
- Validation: checks latest_checkpointed_iteration.txt, lists iter_* directories, and (for PyTorch SFT) runs a quick inference smoke test.
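The validation step can be reproduced outside the notebook with a short script. This is a sketch, not the notebook's own code: summarize_checkpoints is a hypothetical helper, and it assumes the Megatron-style convention in which the tracker file holds the latest iteration number and checkpoint directories are named iter_ followed by a 7-digit zero-padded iteration.

```python
from pathlib import Path


def summarize_checkpoints(ckpt_dir: str) -> dict:
    """Report the tracker value, the iter_* directories, and whether they agree."""
    root = Path(ckpt_dir)
    tracker = root / "latest_checkpointed_iteration.txt"
    latest = tracker.read_text(encoding="utf-8").strip() if tracker.is_file() else None
    iter_dirs = sorted(p.name for p in root.glob("iter_*") if p.is_dir())
    # Assumed naming: iter_0000100 for iteration 100.
    expected = f"iter_{int(latest):07d}" if latest and latest.isdigit() else None
    return {
        "latest": latest,
        "iter_dirs": iter_dirs,
        "latest_dir_present": expected in iter_dirs if expected else False,
    }
```

If latest_dir_present is False after a run that reported success, the checkpoint directory is the first place to look: the tracker may point at an iteration whose directory was never fully written.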
The MindSpore SFT notebook mirrors the upstream Qwen3 path:
- examples/mindspore/qwen3/ckpt_convert_qwen3_hf2mcore.sh
- examples/mindspore/qwen3/data_convert_qwen3_instruction.sh
- examples/mindspore/qwen3/tune_qwen3_0point6b_4K_full_ms.sh
Default MindSpore parameters: TP=1, PP=1, MBS=2, SEQ_LENGTH=2048, TRAIN_ITERS=100, ENABLE_THINKING=true.
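With these defaults on the 2-NPU setup, TP × PP is 1, so both NPUs are used for data parallelism. The sketch below makes that arithmetic explicit. It interprets the NPU-count check as the usual Megatron-style constraint that the device count must be divisible by TP × PP, which is an assumption about what the notebook enforces. Both function names are hypothetical; detect_npu_count relies on torch.npu.device_count() from torch_npu and returns 0 when the packages are not installed.

```python
def data_parallel_size(npu_count: int, tp: int, pp: int) -> int:
    """Return the data-parallel size implied by npu_count, tp, and pp."""
    if min(npu_count, tp, pp) < 1:
        raise ValueError("npu_count, tp, and pp must be positive")
    model_parallel = tp * pp
    if npu_count % model_parallel != 0:
        raise ValueError(
            f"{npu_count} NPUs cannot be split evenly into TP={tp} x PP={pp} groups"
        )
    return npu_count // model_parallel


def detect_npu_count() -> int:
    """Best-effort NPU count; returns 0 when torch or torch_npu is not installed."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing it registers the torch.npu namespace)
    except ImportError:
        return 0
    return torch.npu.device_count()
```

For example, data_parallel_size(2, 1, 1) returns 2, while data_parallel_size(4, 2, 2) returns 1 and data_parallel_size(4, 3, 1) raises ValueError.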
Parameters to review
Common across notebooks:
- HF_MODEL_DIR, OUTPUT_DIR
- ALPACA_PARQUET or RAW_DATA_FILE
- TP, PP (and MBS for MindSpore)
- SEQ_LENGTH, TRAIN_ITERS
- ENABLE_THINKING (Qwen3)
- MASTER_ADDR, MASTER_PORT, NNODES, NODE_RANK (multi-node MindSpore)
For real runs, raise SEQ_LENGTH to your model's context window, raise TRAIN_ITERS to a production value, and adjust parallelism / batch size to fit available NPUs and dataset size. If you change TP or PP, rerun weight conversion so the checkpoint layout matches.
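One way to avoid training against a stale layout is to key the converted-weight directory on TP and PP and to rerun conversion whenever that directory is missing or empty. The sketch below illustrates the idea; the mcore_tp{tp}_pp{pp} naming scheme and both helper names are hypothetical and are not taken from the notebooks.

```python
from pathlib import Path


def mcore_dir_for(base_dir: str, tp: int, pp: int) -> Path:
    """Hypothetical naming scheme: one weight directory per (TP, PP) pair."""
    return Path(base_dir) / f"mcore_tp{tp}_pp{pp}"


def needs_conversion(base_dir: str, tp: int, pp: int) -> bool:
    """Return True when no converted weights exist for this TP/PP layout."""
    target = mcore_dir_for(base_dir, tp, pp)
    # A missing or empty directory means conversion has not run for this layout.
    return not target.is_dir() or not any(target.iterdir())
```

With this pattern, changing TP from 1 to 2 points training at a different directory, so weights converted for the old layout cannot be picked up by accident.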
Output paths
Keep the notebook outputs (converted weights, preprocessed data, checkpoints, and logs) on persistent storage. To push the result to the model repository, use the Git LFS workflow in Upload Models Using Notebook.
Notes
- The notebooks run full-parameter SFT — not LoRA. Treat the defaults as smoke tests and tune before serious runs.
- The MindSpore notebook validates checkpoint generation only; it doesn't include a stable inference step.
- For offline / restricted clusters: pre-stage the MindSpeed-LLM repo and model / dataset files in the workspace (PyTorch notebooks). The MindSpore notebook uses only the bundled source tree.