
MindSpore / mindspore

Pangu 135B loss does not converge

device/ascend
gitee
master

[CT][MS][parallel] Track clearing the static-check warnings for configurable high-precision communication in the rc2 release

kind/bug
v2.3.0
sig/parallel
clouddragon
master
#I9JPTE zhouyaqiang0 3
Assignee: zhouyaqiang0

[CT][MS][reinforcement] test_reinforcement_mcts_001 core dumps when run in GPU graph mode

kind/bug
attr/function
sig/parallel
v2.3.0
foruda
rct/bugfix
rca/others
ctl/componenttest
#I923OM 杨凯璐 4
Assignee: 杨凯璐

[ST][MS][2.2][910B][wide_deep & ps model] Network training fails; the EmbeddingLookup operator reports an error

kind/bug
v2.2.0
attr/function
stage/func-debug
sig/modelzoo
rct/cann
v2.2.10
foruda
#I842I5 zhangjie18 9
Assignee: zhongjicheng

The result of Pangu3.0 in MindSpore 2.1 B060 forward process with model paral...

kind/bug
v2.2.0
ctl/componenttest
rca/algorithm
rct/newfeature
foruda
#I7LIWZ 刘崇鸣 4
Assignee: 刘崇鸣 (liuchongming74)

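[ST][MS][Pangu] With batch_size set to 8, the Pangu network automatically enables replicas, which degrades auto-parallel performance; please document this in the API reference on the official website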

attr/function
kind/maintenance
usability
v2.1.0
sig/parallel
rct/refactor
rca/algorithm
ctl/doctest
gitee
foruda
#I7H58X 陶青 4
Assignee: 陶青

[ST][MS][master][Bert_large][ascend][multi-node] Network training fails

attr/function
kind/bug
stage/func-debug
v2.1.0
sig/parallel
rca/others
ctl/solutiontest
rct/bugfix
foruda
#I7BFIP sunjiawei999 4
Assignee: sunjiawei999

Modelzoo model lpcnet fails GPU inference

sig/ops
v2.0.0.rc1
#I694HP zhangyongxian 5
Assignee: zhangyongxian

Resnet50 auto-parallel execution fails in the GE flow; validation of the AllReduce operator's fusion attribute fails

kind/bug
rct/cann
ctl/componenttest
rct/bugfix
rca/others
#I57AOG wuweikang 3
Assignee: huangxinjing

[MS][Parallel][support_8n] 24p pangu ascend: semi-auto-parallel training intermittently hits memory corruption in the gather operator

attr/function
stage/func-debug
kind/bug
sig/parallel
ccb/bug
v2.0.0.rc1
#I5310Q liyanjun 6
Assignee: wangjun (alouhahahahaha)

HCCL ERROR

kind/bug
mindspore-assistant
#I4E3C8 harasuki 5
Assignee: lichen
https://gitee.com/mindspore/mindspore.git
git@gitee.com:mindspore/mindspore.git