Here's a hardware summary of pre-training Baichuan-7B:
| Hardware | Value |
| :-- | :-- |
| NPU | 8 x Ascend NPUs |
Here's a software summary of pre-training Baichuan-7B:
| Software | Version | Link |
| :-- | :-- | :-- |
| Python | 3.7.16 | - |
| driver | 23.0.RC3.B050 | link |
| firmware | 7.0.t8.0.b214 | link |
| CANN | Ascend-cann-toolkit-7.0.T8-linux | link |
| binary arithmetic package | Ascend-cann-kernels-XXX_7.0.T8_linux | link |
| torch | 1.11.0 | link |
| torch_npu | 1.11.0.post4-20230915 | link |
```shell
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
mkdir logs
mkdir ckpt
```

```shell
# python3.7
conda create -n test python=3.7
conda activate test

# install torch and torch_npu
pip install torch-1.11.0-cp37-cp37m-manylinux2014_aarch64.whl
pip install torch_npu-1.11.0.post4_XXXXXX-cp37-cp37m-linux_aarch64.whl
pip install apex-0.1_ascend_XXXXXX-cp37-cp37m-linux_aarch64.whl

# install megatron-core
pip3 install -e git+https://github.com/NVIDIA/Megatron-LM.git@23.05#egg=megatron-core

# install deepspeed and deepspeed_npu
pip install deepspeed==0.9.2
git clone https://gitee.com/ascend/DeepSpeed.git -b v0.9.2 deepspeed_npu
cd deepspeed_npu
pip3 install -e ./
cd ..

# install other packages
pip install -r requirements.txt
```
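After the installation steps above, a quick sanity check can confirm that torch_npu is importable and sees the expected devices. This check is a suggestion, not part of the official setup:

```shell
# Optional sanity check: importing torch_npu patches a torch.npu namespace onto torch.
# On the 8-NPU machine from the hardware table, expect "True 8".
python -c "import torch, torch_npu; print(torch.npu.is_available(), torch.npu.device_count())"
```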
```shell
#!/bin/bash
mkdir tokenizer
cd ./tokenizer
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer.model
wget https://huggingface.co/baichuan-inc/Baichuan-7B/resolve/main/tokenizer_config.json
cd ..
```
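Before preprocessing, it can be worth verifying that the downloaded tokenizer loads cleanly. A minimal check, assuming transformers was installed via requirements.txt (the Baichuan tokenizer is custom code, hence `trust_remote_code=True`):

```shell
python -c "
from transformers import AutoTokenizer
# loads ./tokenizer/tokenization_baichuan.py downloaded above
tok = AutoTokenizer.from_pretrained('./tokenizer', trust_remote_code=True)
print(type(tok).__name__, tok.vocab_size)
"
```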
Download the Baichuan-7B dataset from here
```shell
# download datasets
mkdir dataset_baichuan
cd ./dataset_baichuan
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

# process datasets
python ./tools/preprocess_data.py \
    --input ./dataset_baichuan/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./tokenizer \
    --output-prefix ./dataset_baichuan/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
```
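If preprocessing succeeds, the output prefix above yields an indexed dataset pair following Megatron's `<prefix>_<json-key>_document` naming convention; a quick way to confirm:

```shell
# expect a .bin (token data) and a .idx (index) file
ls -lh ./dataset_baichuan/alpaca_text_document.bin ./dataset_baichuan/alpaca_text_document.idx
```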
```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# modify the script's original dataset path according to your own dataset path
TOKENIZER_PATH=./tokenizer/                        # tokenizer path
DATA_PATH=./dataset_baichuan/alpaca_text_document  # processed dataset
```

```shell
bash examples/baichuan/pretrain_baichuan_zero_7B.sh
```
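To keep a long run alive after the terminal closes and capture its output, one option is to launch in the background and follow the log (the log file name here is illustrative; the `logs` directory was created during setup):

```shell
nohup bash examples/baichuan/pretrain_baichuan_zero_7B.sh > logs/pretrain_7b.log 2>&1 &
tail -f logs/pretrain_7b.log
```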
The performance of Baichuan-7B on Ascend NPUs compared with the reference:
| Device | Model | Total iterations | Throughput (samples/s/p) | Throughput (tokens/s/p) | Single-step time (s/step) | Floating-point compute (TFLOPs/s) |
| :-- | :-- | :-- | :-- | :-- | :-- | :-- |
| NPUs | Baichuan-7B | 1024 | 3.250 | 1914 | 2.14 | 102.69 |
| Reference | Baichuan-7B | 1024 | 3.978 | 2068 | 1.98 | 125.66 |
NPU vs Reference loss.
The NPU training runs smoothly: resource usage is stable, no errors are reported during the run, the loss shows a decreasing trend, and the convergence speed is as expected. The relative error of the average loss is 0.01093 (less than 2%), the maximum relative error is 0.1243, and the maximum absolute error is 0.4859. The precision meets the requirements.
NPU vs Reference loss relative error.
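For reference, a minimal sketch of how these error statistics could be computed, assuming the per-step loss values from the NPU and reference runs have been extracted into two text files (file names are hypothetical):

```shell
python - <<'EOF'
import numpy as np

# hypothetical inputs: one loss value per training step
npu = np.loadtxt("npu_loss.txt")
ref = np.loadtxt("reference_loss.txt")

diff = np.abs(npu - ref)
print("relative error of the average loss:", abs(npu.mean() - ref.mean()) / abs(ref.mean()))
print("maximum relative error:", (diff / np.abs(ref)).max())
print("maximum absolute error:", diff.max())
EOF
```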
Here's a hardware summary of pre-training Baichuan-13B:
| Hardware | Value |
| :-- | :-- |
| NPU | 8 x Ascend NPUs |
Here's a software summary of pre-training Baichuan-13B:
| Software | Version | Link |
| :-- | :-- | :-- |
| Python | 3.7.16 | - |
| driver | 23.0.RC3.B050 | link |
| firmware | 7.0.t8.0.b214 | link |
| CANN | Ascend-cann-toolkit-7.0.T8-linux | link |
| binary arithmetic package | Ascend-cann-kernels-XXX_7.0.T8_linux | link |
| torch | 1.11.0 | link |
| torch_npu | 1.11.0.post4-20230915 | link |
```shell
git clone https://gitee.com/ascend/AscendSpeed.git
cd AscendSpeed
mkdir logs
mkdir ckpt
```

```shell
# python3.7
conda create -n test python=3.7
conda activate test

# install torch and torch_npu
pip install torch-1.11.0-cp37-cp37m-manylinux2014_aarch64.whl
pip install torch_npu-1.11.0.post4_XXXXXX-cp37-cp37m-linux_aarch64.whl
pip install apex-0.1_ascend_XXXXXX-cp37-cp37m-linux_aarch64.whl

# install megatron
git clone https://github.com/NVIDIA/Megatron-LM.git -b 23.05
cd Megatron-LM
pip3 install -e ./
cd ..

# install deepspeed and deepspeed_npu
pip install deepspeed==0.9.2
git clone https://gitee.com/ascend/DeepSpeed.git -b v0.9.2 deepspeed_npu
cd deepspeed_npu
pip3 install -e ./
cd ..

# install other packages
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```
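As with the 7B setup, a quick import check can catch a broken installation early; this one also covers the DeepSpeed NPU adapter. This is a suggestion, not an official step:

```shell
# deepspeed_npu must be importable alongside deepspeed for NPU training
python -c "import torch, torch_npu, deepspeed, deepspeed_npu; print('deepspeed', deepspeed.__version__)"
```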
Download the Baichuan-13B checkpoint from here
```shell
mkdir tokenizer
cd ./tokenizer
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/resolve/main/config.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/resolve/main/configuration_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/resolve/main/generation_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/resolve/main/modeling_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/resolve/main/pytorch_model-00001-of-00003.bin
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/resolve/main/pytorch_model-00002-of-00003.bin
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/resolve/main/pytorch_model-00003-of-00003.bin
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/resolve/main/pytorch_model.bin.index.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/resolve/main/quantizer.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/resolve/main/special_tokens_map.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/resolve/main/tokenization_baichuan.py
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/resolve/main/tokenizer_config.json
wget https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/resolve/main/tokenizer.model
cd ..
```
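The three .bin shards are large downloads; before converting, one can confirm the set is complete by cross-checking the weight index that ships with them. This minimal check relies only on the standard structure of Hugging Face index files:

```shell
python -c "
import json
# the index maps every tensor name to the shard file that stores it
idx = json.load(open('./tokenizer/pytorch_model.bin.index.json'))
shards = sorted(set(idx['weight_map'].values()))
print(len(idx['weight_map']), 'tensors across shards:', shards)
"
```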
To adapt the weights to the Baichuan-13B model, the following script converts the pre-trained Hugging Face weights before training.
```shell
mkdir model_weights

SCRIPT_PATH=./tools/ckpt_convert/llama/convert_weights_from_huggingface.py
python $SCRIPT_PATH \
    --input-model-dir ./tokenizer \
    --output-model-dir ./model_weights \
    --tensor-model-parallel-size 8 \
    --pipeline-model-parallel-size 1 \
    --make-vocab-size-divisible-by 1 \
    --type 13B \
    --pse True
```
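Note that `--tensor-model-parallel-size 8` matches the 8-NPU setup, so the converter splits the weights into one partition per tensor-parallel rank. After it finishes, `./model_weights` should contain per-rank checkpoint files; a quick look (the exact directory layout depends on the converter version):

```shell
# list the converted checkpoint tree to confirm the conversion produced output
ls -R ./model_weights | head
```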
```shell
# download datasets
mkdir dataset_baichuan
mkdir model_save
cd ./dataset_baichuan
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..
```

```shell
#!/bin/bash
# process datasets
python ./tools/preprocess_data.py \
    --input ./dataset_baichuan/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./tokenizer \
    --output-prefix ./dataset_baichuan/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
```
```shell
# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# modify the script's original dataset path according to your own dataset path
TOKENIZER_PATH=./tokenizer/
DATA_PATH=./dataset_baichuan/alpaca_text_document
LOAD_PATH=./model_weights
CHECKPOINT_PATH=./ckpt
```

```shell
bash examples/baichuan/pretrain_baichuan_ptd_13B.sh
```
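If the machine hosts more NPUs than the job needs, the visible devices can be restricted before launching; `ASCEND_RT_VISIBLE_DEVICES` is CANN's analogue of `CUDA_VISIBLE_DEVICES` (adjust the IDs to your machine):

```shell
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
bash examples/baichuan/pretrain_baichuan_ptd_13B.sh
```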
The performance of Baichuan-13B on Ascend NPUs compared with the reference:
| Device | Model | Total iterations | Throughput (samples/s/p) | Throughput (tokens/s/p) | Single-step time (s/step) | Floating-point compute (TFLOPs/s) |
| :-- | :-- | :-- | :-- | :-- | :-- | :-- |
| NPUs | Baichuan-13B | 1000 | 1.928 | 1024 | 16.067 | 89.37 |
| Reference | Baichuan-13B | 1000 | 1.535 | 862 | 19.852 | 72.39 |
NPU vs Reference loss.
The NPU training runs smoothly: resource usage is stable, no errors are reported during the run, the loss shows a decreasing trend, and the convergence speed is as expected. The relative error of the average loss is 0.00725 (less than 2%), the maximum relative error is 0.01978, and the maximum absolute error is 0.10811. The precision meets the requirements.
NPU vs Reference loss relative error.
The relative error between NPU and Reference Loss is less than 0.02 throughout, as expected.