[ST][MS][NET][transformer][910 32p] Accuracy [27] cannot reach 27.5

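Describe the current behavior (Mandatory)

Transformer network script: https://gitee.com/mindspore/models/tree/master/official/nlp/Transformer
When the Transformer network is trained in a 910 32p environment, the inference accuracy (the BLEU score reported by multi-bleu.perl) is 27, short of the 27.5 target.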

Environment (Mandatory)

Hardware Environment (Ascend/GPU/CPU):

Please delete the backend not involved:
/device ascend, 4 machines, 32p

Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx) :
-- Python version (e.g., Python 3.7.5) :
-- OS platform and distribution (e.g., Linux Ubuntu 16.04):
-- GCC/Compiler version (if compiled from source):
Run package: HISI_C29/20230413
MindSpore version: r2.0_20230424161532_93c1b983
Execute Mode (Mandatory) (PyNative/Graph):

Please delete the mode not involved:
/mode graph

Related testcase (Mandatory)

Test case directory: solution_test/remaining/test_scripts/mindspore/net/transformer/network
Test case: test_ms_transformer_wmt_english_german_train_infer_910_32p_0001.py

Steps to reproduce the issue (Mandatory)

Get the code from the models repository.
cd models/official/nlp/Transformer
sh scripts/run_distribute_train_ascend_multi_machines.sh 32 0 52 ./transformer_mti/ende-1128-mindrecord ./hccl_32p.json ./default_config_large.yaml
Verify that the network trains successfully.
python eval.py --config_path=./default_config_large.yaml
Verify that inference succeeds.
sh scripts/process_output.sh ./WMT-English-German/data/newstest2014.tok.de ./eval_output ./WMT-English-German/data/vocab.bpe.32000
perl multi-bleu.perl ./WMT-English-German/data/newstest2014.tok.de.forbleu < ./eval_output.forbleu
Verify that the accuracy reaches 27.5.
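
The last check can be scripted rather than read off by eye. The following is a minimal sketch, assuming multi-bleu.perl prints its usual summary line starting with `BLEU = <score>,`. The helper names `parse_bleu` and `meets_target` and the sample line are hypothetical and are not part of the Transformer scripts.

```python
import re

# Target score taken from this issue.
TARGET_BLEU = 27.5


def parse_bleu(output: str) -> float:
    """Extract the corpus-level BLEU score from multi-bleu.perl output."""
    match = re.search(r"BLEU\s*=\s*([0-9]+(?:\.[0-9]+)?)", output)
    if match is None:
        raise ValueError("no 'BLEU = ...' line found in the output")
    return float(match.group(1))


def meets_target(output: str, target: float = TARGET_BLEU) -> bool:
    """Return True when the parsed BLEU score is at least the target."""
    return parse_bleu(output) >= target


if __name__ == "__main__":
    # Made-up summary line, for illustration only.
    sample = "BLEU = 27.00, 58.0/33.0/21.0/14.0 (BP=1.000, ratio=1.000, hyp_len=100, ref_len=100)"
    print(parse_bleu(sample), meets_target(sample))
```

In practice the output of the perl command would be captured (for example by redirecting it to a file) and passed to `meets_target`.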

Describe the expected behavior (Mandatory)

The network trains successfully, inference succeeds, and the inference accuracy reaches 27.5.

Related log / screenshot (Mandatory)

Not applicable.

Special notes for this issue (Optional)

Route to 何茂华.

Please assign a maintainer to check this issue.
@zhongjicheng

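The 32p run cannot be measured for accuracy with the 8p hyperparameters. To train on 32p, the hyperparameters need to be adjusted.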

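The Ascend team's test reaches 29. A comparison shows that the two setups differ in data sharding and in how the job is launched. They shard the data 8 ways, so each machine still receives the same amount of data as in the earlier single-machine run. Their run time confirms this: their total run time is about the same as our 8p run. We shard the data 32 ways, so our accuracy only reaches about 27.3. The 32p run already uses the 8p hyperparameters, so with the same amount of data as 8p the accuracy would exceed 28.7, which is what our 8p run also reaches.
After the Ascend team changed the sharding to 32, they still reach above 27.7, while we only reach about 27.3. When we later used the Ascend team's launch method, we reached 27.5. For now the gap is attributed to the difference in launch method, but this has not been extensively verified. The Ascend team starts the 8p test case on 4 machines at almost the same time, whereas we run the multi-machine test case.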
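
The effect of sharding 8 ways versus 32 ways can be made concrete with a little arithmetic. The sketch below is illustrative only: the dataset size and per-device batch size are hypothetical placeholders, not the values used by the Transformer scripts, and it assumes the dataset is split evenly across shards (in MindSpore this is typically controlled through the `num_shards` and `shard_id` arguments of the dataset, though how the scripts here set them is not confirmed by this thread).

```python
def samples_per_device(total_samples: int, num_shards: int) -> int:
    """Samples each device sees per epoch when the data is split evenly."""
    return total_samples // num_shards


def steps_per_epoch(total_samples: int, num_shards: int, batch_size: int) -> int:
    """Optimizer steps per epoch for one device, dropping the last partial batch."""
    return samples_per_device(total_samples, num_shards) // batch_size


if __name__ == "__main__":
    TOTAL_SAMPLES = 4_000_000  # hypothetical dataset size
    BATCH_SIZE = 96            # hypothetical per-device batch size
    for shards in (8, 32):
        print(
            f"num_shards={shards}: "
            f"{samples_per_device(TOTAL_SAMPLES, shards)} samples/device, "
            f"{steps_per_epoch(TOTAL_SAMPLES, shards, BATCH_SIZE)} steps/epoch"
        )
```

With the same number of epochs, sharding 32 ways gives each device a quarter of the samples and a quarter of the optimizer steps compared with sharding 8 ways, which is consistent with the observation that the 8-way run takes about as long as an 8p run.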

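2023-5-25 CCB conclusion: 32p has not been covered by regression monitoring before, and it currently uses the 8p hyperparameters. Tune the hyperparameters for 32p, and add 32p to monitoring once the accuracy exceeds 28.7.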
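
One possible reason the 8p hyperparameters do not transfer is the learning rate schedule. The sketch below uses the inverse square root warmup schedule from the original Transformer paper and assumes, without confirmation from this thread, that the training script uses a schedule of this shape. The step counts, `d_model` and `warmup_steps` values are hypothetical. It shows that when the total number of optimizer steps shrinks by 4x and the warmup length stays fixed, warmup takes up a 4x larger share of training.

```python
def noam_lr(step: int, d_model: int = 1024, warmup_steps: int = 8000) -> float:
    """Inverse square root schedule with linear warmup (Vaswani et al., 2017)."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def warmup_fraction(total_steps: int, warmup_steps: int) -> float:
    """Share of the training run spent in warmup, capped at 1.0."""
    return min(warmup_steps / total_steps, 1.0)


if __name__ == "__main__":
    WARMUP = 8000  # hypothetical warmup length, kept identical for both runs
    for name, total in (("8p", 100_000), ("32p", 25_000)):  # hypothetical step counts
        print(
            f"{name}: warmup covers {warmup_fraction(total, WARMUP):.0%} of training, "
            f"final lr = {noam_lr(total, warmup_steps=WARMUP):.2e}"
        )
```

Under these assumptions, candidates for tuning on 32p would include the warmup length, the peak learning rate and the number of epochs. Which of these actually closes the gap is not established here.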
