MindSpore / mindspore

[ST][MS][dyn][ascend] Dynamic-shape scenario, graph mode, dataset sink mode: an exception occurs when set_inputs is not called and sink_size > 1, and the error message is not clear enough

Status: TODO
Type: RFC
Created: 2023-05-09 15:42
Template: Bug Report (Use this template for reporting a bug), label kind/bug

Describe the current behavior (Mandatory)

In a dynamic-shape scenario, in graph mode with dataset sink mode enabled, an exception occurs when set_inputs is not called and sink_size > 1. The error message is not clear enough.
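
Because the complaint is that the failure surfaces only as a generic RuntimeError, one way to picture a clearer message is a check that runs before training starts. The sketch below is plain Python and is not part of MindSpore; `check_sink_config`, its parameters, and the message wording are hypothetical names chosen for illustration.

```python
def check_sink_config(batch_shapes, sink_size, inputs_declared):
    """Raise a descriptive error for the combination reported in this issue.

    batch_shapes: shapes the dataset is expected to produce, e.g. [(3, 4), (5, 4)]
    sink_size: value that would be passed to Model.train(..., sink_size=...)
    inputs_declared: True if set_inputs was called on the network (hypothetical flag)
    """
    # The dataset counts as dynamic when at least two batches differ in shape.
    is_dynamic = len({tuple(shape) for shape in batch_shapes}) > 1
    if is_dynamic and sink_size > 1 and not inputs_declared:
        raise ValueError(
            "Dataset produces varying shapes, but set_inputs was not called "
            f"and sink_size={sink_size} > 1. Declare the dynamic input with "
            "set_inputs, or set sink_size to 1."
        )
```

The condition mirrors the wording of the issue (sink_size > 1, no set_inputs); whether other sink_size values are affected is not stated here.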

Environment (Mandatory)

  • Hardware Environment (Ascend/GPU/CPU):

Please delete the backend not involved:
/device ascend

  • Software Environment (Mandatory):
    -- MindSpore version (e.g., 1.7.0.Bxxx) :
    -- Python version (e.g., Python 3.7.5) :
    -- OS platform and distribution (e.g., Linux Ubuntu 16.04):
    -- GCC/Compiler version (if compiled from source):

  • Execute Mode (Mandatory) (PyNative/Graph):

Please delete the mode not involved:

/mode graph

Related testcase (Mandatory)

test_ms_dynamic_shape_h_rank_dy_not_set_inputs_0001

Steps to reproduce the issue (Mandatory)

  1. Build a custom network with dynamic-shape inputs
  2. Enable dataset sink mode and set sink_size to a value > 1
  3. Run training in graph mode
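
The steps above can be sketched roughly as follows. This is not the actual test script (`train_custom_single_input_net.py` is not shown in this issue); the network, loss, optimizer, shapes, and the helper names `make_shapes` and `run_repro` are all assumptions made for illustration. `run_repro` needs MindSpore and NumPy on an Ascend host, so it is defined but not called here.

```python
def make_shapes(num_batches, feature_dim=4):
    """Return per-sample shapes whose first dimension varies (2, 3, 4, 2, ...)."""
    return [(2 + (i % 3), feature_dim) for i in range(num_batches)]


def run_repro(sink_size=2, num_batches=8):
    """Hypothetical reproduction sketch; requires MindSpore, NumPy and Ascend."""
    import numpy as np
    import mindspore as ms
    import mindspore.dataset as ds
    from mindspore import nn
    from mindspore.train import Model

    # Step 3: graph mode on Ascend.
    ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")

    shapes = make_shapes(num_batches)

    def source():
        # Step 1: every sample has a different leading dimension.
        for rows, cols in shapes:
            data = np.ones((rows, cols), dtype=np.float32)
            label = np.zeros((rows, 1), dtype=np.float32)
            yield data, label

    dataset = ds.GeneratorDataset(source, column_names=["data", "label"], shuffle=False)

    net = nn.Dense(4, 1)  # stand-in for the custom network of the test case
    loss = nn.MSELoss()
    opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)
    model = Model(net, loss_fn=loss, optimizer=opt)

    # Step 2: sink mode with sink_size > 1, and deliberately no set_inputs call.
    model.train(1, dataset, dataset_sink_mode=True, sink_size=sink_size)
```

On a suitable host, calling `run_repro()` would exercise the same combination of settings; whether it fails with exactly the log shown in this issue is not verified.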

Describe the expected behavior (Mandatory)

The test case runs successfully.

Related log / screenshot (Mandatory)

[WARNING] DEVICE(130913,fffd758250f0,python):2023-05-06-14:51:28.952.297 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:776] DumpTaskExceptionInfo] GetNext error may be caused by slow data processing (bigger than 20s / batch) or transfer data to device error.
[WARNING] DEVICE(130913,fffd758250f0,python):2023-05-06-14:51:28.952.315 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:778] DumpTaskExceptionInfo] Suggestion: 
[WARNING] DEVICE(130913,fffd758250f0,python):2023-05-06-14:51:28.952.331 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:779] DumpTaskExceptionInfo]     1) Set the parameter dataset_sink_mode=False of model.train(...) or model.eval(...) and try again.
[WARNING] DEVICE(130913,fffd758250f0,python):2023-05-06-14:51:28.952.347 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:781] DumpTaskExceptionInfo]     2) Reduce the batch_size in data processing and try again.
[WARNING] DEVICE(130913,fffd758250f0,python):2023-05-06-14:51:28.952.363 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:782] DumpTaskExceptionInfo]     3) You can create iterator by interface create_dict_iterator() of dataset class to independently verify the performance of data processing without training. Refer to the link for data processing optimization suggestions: https://mindspore.cn/tutorials/experts/zh-CN/master/dataset/optimize.html
[WARNING] DEVICE(130913,fffd758250f0,python):2023-05-06-14:51:28.952.379 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:786] DumpTaskExceptionInfo]     4) If it is a dynamic dataset, please set the input to dynamic through `set_inputs`, or set sink_size to 1. It is recommended to use the former, because the latter has poor performance.
[CRITICAL] DEVICE(130913,fffd758250f0,python):2023-05-06-14:51:28.977.556 [mindspore/ccsrc/plugin/device/ascend/hal/hardware/ascend_graph_executor.cc:256] RunGraph] Run task for graph:kernel_graph_1 error! The details refer to 'Ascend Error Message'.
[WARNING] MD(130913,fffca6ffd0f0,python):2023-05-06-14:51:28.978.377 [mindspore/ccsrc/minddata/dataset/engine/datasetops/data_queue_op.cc:280] SendDataToAscend] Thread has already been terminated.
[ERROR] DEBUG(130913,ffff8b50a440,python):2023-05-06-14:51:29.269.453 [mindspore/ccsrc/debug/rdr/graph_recorder.cc:41] DumpIRProto] Open file '/home/jenkins0/workspace/TDT_deployment/solution_test/cases/03subject_test/02usability/model_develop/dynamic_shape/test_ms_dynamic_shape_hw_dy_not_set_inputs_0001_GRAPH_MODE/rank_0/rdr/SESSION.graph_build.0.20230506145102.pb' failed!
[ERROR] DEBUG(130913,ffff8b50a440,python):2023-05-06-14:51:29.569.766 [mindspore/ccsrc/debug/rdr/graph_recorder.cc:41] DumpIRProto] Open file '/home/jenkins0/workspace/TDT_deployment/solution_test/cases/03subject_test/02usability/model_develop/dynamic_shape/test_ms_dynamic_shape_hw_dy_not_set_inputs_0001_GRAPH_MODE/rank_0/rdr/SESSION.graph_build.1.20230506145127.pb' failed!
Traceback (most recent call last):
  File "../test_ms_dynamic_shape_hw_dy_not_set_inputs_0001_GRAPH_MODE/train_custom_single_input_net.py", line 76, in <module>
    train_net_with_model()
  File "../test_ms_dynamic_shape_hw_dy_not_set_inputs_0001_GRAPH_MODE/train_custom_single_input_net.py", line 63, in train_net_with_model
    sink_size=config.sink_size)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 1066, in train
    initial_epoch=initial_epoch)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 100, in wrapper
    func(self, *args, **kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 617, in _train
    cb_params, sink_size, initial_epoch, valid_infos)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 700, in _train_dataset_sink_process
    outputs = train_network(*inputs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 620, in __call__
    out = self.compile_and_run(*args, **kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 942, in compile_and_run
    return _cell_graph_executor(self, *new_args, phase=self.phase)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 1439, in __call__
    return self.run(obj, *args, phase=phase)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 1478, in run
    return self._exec_pip(obj, *args, phase=phase_real)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 102, in wrapper
    results = fn(*arg, **kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 1458, in _exec_pip
    return self._graph_executor(args, phase)
RuntimeError: Run task for graph:kernel_graph_1 error! The details refer to 'Ascend Error Message'.

----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/plugin/device/ascend/hal/hardware/ascend_graph_executor.cc:256 RunGraph
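
In the log above, the only line that points at the real cause is the fourth WARNING suggestion, while the raised RuntimeError says nothing about set_inputs. As an illustration of how that hint could be pulled out of such a log, here is a small stdlib-only sketch; the regular expression is inferred from the lines shown in this issue, and `find_hints` is a hypothetical helper, not a MindSpore tool.

```python
import re

# Pattern inferred from the log lines in this issue:
# [LEVEL] MODULE(pid,tid,proc):timestamp [file:line] Function] message
LOG_LINE = re.compile(
    r"^\[(?P<level>\w+)\] (?P<module>\w+)\([^)]*\):(?P<time>\S+) "
    r"\[(?P<file>[^\]]+):(?P<line>\d+)\] (?P<func>\w+)\]\s*(?P<message>.*)$"
)


def find_hints(log_text, keyword="set_inputs"):
    """Return the messages of WARNING lines that mention the keyword."""
    hints = []
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw)
        if match and match["level"] == "WARNING" and keyword in match["message"]:
            hints.append(match["message"])
    return hints
```

Applied to the log in this issue, the helper would keep only the suggestion about `set_inputs` and `sink_size`.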

Special notes for this issue (Optional)

Comments (7)

田桐 created the Bug-Report
田桐 added the labels v2.1.0, usability, attr/function, stage/func-debug, kind/maintenance
田桐 added leiwei2 as a collaborator

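Please assign maintainer to check this issue.
@田桐

Please add labels (comp or sig), also you can visit https://gitee.com/mindspore/community/blob/master/sigs/dx/docs/labels.md to find more.
So that your code is reviewed as soon as possible, please add a component (comp) or special interest group (sig) label to your Pull Request. A labeled PR can be pushed directly to the responsible person for review.
Taking a component-related submission as an example: if you are submitting code for the data component, you can comment like this:
//comp/data
You can also invite the data SIG to review the code by writing:
//sig/data
In addition, you can mark the type of this PR, for example bugfix or feature request:
//kind/bug or //kind/feature
Congratulations, you have learned how to add labels with commands. Go ahead and add labels in the comments below!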

田桐 added the label sig/ds
田桐 changed the title
田桐 changed the title
田桐 changed the description

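test_dynamic_shape_train_dataset_change

set_inputs is not called, and the output shape of the GetNext operator changes; this scenario needs to be adapted.
This is a graph-mode dynamic-shape issue; moved to 2.2.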
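
For context, the workaround recommended by the runtime warning is to declare the changing input through `set_inputs`. A minimal sketch of deriving such a declaration from observed shapes is given below. `dynamic_signature` and `declare_dynamic_input` are hypothetical helper names; the second function assumes the `Tensor(shape=[None, ...], dtype=...)` placeholder form accepted by recent MindSpore releases and is only defined here, not called.

```python
def dynamic_signature(shapes):
    """Merge observed shapes: fixed dims keep their size, varying dims become None."""
    ranks = {len(shape) for shape in shapes}
    if len(ranks) != 1:
        raise ValueError("need at least one shape, and all shapes must share one rank")
    # zip(*shapes) groups the sizes seen for each axis across all batches.
    return [sizes[0] if len(set(sizes)) == 1 else None for sizes in zip(*shapes)]


def declare_dynamic_input(net, signature):
    """Hypothetical wrapper: tell the network which axes may change (needs MindSpore)."""
    import mindspore as ms

    # A Tensor built from a shape containing None acts as a dynamic-shape placeholder.
    net.set_inputs(ms.Tensor(shape=signature, dtype=ms.float32))
```

For example, batches of shape (3, 4) and (7, 4) give the signature `[None, 4]`, meaning only the first axis is declared dynamic.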

leiwei2 added the label v2.2.0
leiwei2 removed the label v2.1.0
leiwei2 added the label v2.1.0

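Decision from the TDT regular meeting: in version 2.1 the supported dynamic-shape scenarios stay the same as in 2.0, mainly dynamic shape in dynamic-graph mode. Issues for static-graph dynamic shape and similar scenarios are tracked under version 2.2.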

chenfei_mindspore removed the label v2.1.0

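In the current GPU environment, the graph-mode dynamic-shape scenario "set_inputs not called" + "sink_size greater than 1" does not yet support dynamic shape. The following issue is related:
[MS][ST][DYN] GPU environment, graph mode, dynamic-shape scenario: constructing dynamic n/h/w dimensions without calling set_Inputs causes an illegal memory exception
https://e.gitee.com/mind_spore/dashboard?issue=I7BEWD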

chenfei_mindspore added the label 待CCB (pending CCB)
chenfei_mindspore removed the label 待CCB (pending CCB)
chenfei_mindspore added the label ccb/rfc
chenfei_mindspore changed the task type from Bug-Report to RFC
MissYuanZi added the label v2.3.0.rc2
田桐 added the label device/ascend
fangwenyi removed the label v2.3.0.rc2
fangwenyi added the label master
