name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
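efficientnet-b3 with the imagenet2012 dataset on a GPU V100: after changing batch_size to 256 (to align with the competing product), both graph and pynative modes report insufficient memory
Model URL: https://gitee.com/mindspore/models/tree/master/official/cv/Efficientnet/efficientnet-b3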
Hardware Environment (Ascend/GPU/CPU):
Please delete the backend not involved:
/device /GPU/
Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx) :
-- Python version (e.g., Python 3.7.5) :
-- OS platform and distribution (e.g., Linux Ubuntu 16.04):
-- GCC/Compiler version (if compiled from source):
mindspore:2.0.0.20221220
commit_id:470b760e
Execute Mode (Mandatory) (PyNative/Graph):
Please delete the mode not involved:
/mode pynative
/mode graph
Test case repository path: solution_test/cases/02network/00cv/efficientnetB3/train
Test cases: test_ms_efficientnetb3_imagenet2012_gpu_check_fps_1p_0001.py
test_ms_efficientnetb3_imagenet2012_gpu_check_loss_8p_0002.py
Expected: after batch_size is changed, training runs normally
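For context on why a larger batch_size can exhaust device memory: the size of each activation tensor grows linearly with the batch dimension. The following is a minimal sketch, not a measurement of this network. The feature-map shape (40 x 150 x 150) and the batch sizes other than 256 are hypothetical values chosen only for illustration, and `activation_bytes` is an illustrative helper, not a MindSpore API.

```python
def activation_bytes(batch_size, channels, height, width, bytes_per_element=4):
    """Bytes needed to hold one NCHW activation tensor (float32 by default)."""
    return batch_size * channels * height * width * bytes_per_element


# Hypothetical feature-map shape, NOT taken from the EfficientNet-B3 code.
CHANNELS, HEIGHT, WIDTH = 40, 150, 150

for batch_size in (64, 128, 256):
    mib = activation_bytes(batch_size, CHANNELS, HEIGHT, WIDTH) / 2**20
    print(f"batch_size={batch_size:>3}: {mib:.1f} MiB for one float32 feature map")
```

Doubling the batch size doubles the memory for every such tensor kept for the backward pass, so a configuration that fits at a smaller batch size may not fit at 256.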
[ERROR] DEVICE(92536,7f7485fff700,python):2023-01-10-16:40:36.975.732 [mindspore/ccsrc/runtime/pynative/op_executor.cc:174] WorkerLoop] Run lazy task failed, error message:Malloc for kernel input failed, Memory isn't enough, node:Default/Conv2D-op4
----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/runtime/pynative/run_op_helper.cc:464 LaunchKernels
[WARNING] MD(92536,7f76bb7d8740,python):2023-01-10-16:40:37.029.710 [mindspore/ccsrc/minddata/dataset/engine/datasetops/data_queue_op.cc:93] ~DataQueueOp] preprocess_batch: 4; batch_queue: 0, 0, 0, 0, 0, 0, 0, 0, 16; push_start_time: 2023-01-10-16:40:31.580.877, 2023-01-10-16:40:32.067.214, 2023-01-10-16:40:32.185.923, 2023-01-10-16:40:32.354.475; push_end_time: 2023-01-10-16:40:31.581.664, 2023-01-10-16:40:32.067.924, 2023-01-10-16:40:32.185.948, 2023-01-10-16:40:36.992.537.
Traceback (most recent call last):
File "./train.py", line 155, in <module>
model.train(config.epoch_size, dataset, callbacks=cb, dataset_sink_mode=True, sink_size=100)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 1051, in train
initial_epoch=initial_epoch)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 98, in wrapper
func(self, *args, **kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 625, in _train
cb_params, sink_size, initial_epoch, valid_infos)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 703, in _train_dataset_sink_process
outputs = train_network(*inputs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 644, in __call__
raise err
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 640, in __call__
output = self._run_construct(cast_inputs, kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 425, in _run_construct
output = self.construct(*cast_inputs, **kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/dataset_helper.py", line 107, in construct
return self.network(*outputs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 644, in __call__
raise err
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 640, in __call__
output = self._run_construct(cast_inputs, kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 425, in _run_construct
output = self.construct(*cast_inputs, **kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/wrap/loss_scale.py", line 336, in construct
loss = self.network(*inputs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 644, in __call__
raise err
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 640, in __call__
output = self._run_construct(cast_inputs, kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 425, in _run_construct
output = self.construct(*cast_inputs, **kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/amp.py", line 244, in construct
out = self._backbone(data)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 644, in __call__
raise err
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 640, in __call__
output = self._run_construct(cast_inputs, kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 425, in _run_construct
output = self.construct(*cast_inputs, **kwargs)
File "/home/jenkins0/solution_test/cases/02network/00cv/efficientnetB3/train/test_ms_efficientnetb3_imagenet2012_gpu_check_fps_1p_0001/scripts/train_standalone/src/models/effnet.py", line 125, in construct
x = self.blocks(stem)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 644, in __call__
raise err
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 640, in __call__
output = self._run_construct(cast_inputs, kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 425, in _run_construct
output = self.construct(*cast_inputs, **kwargs)
File "/home/jenkins0/solution_test/cases/02network/00cv/efficientnetB3/train/test_ms_efficientnetb3_imagenet2012_gpu_check_fps_1p_0001/scripts/train_standalone/src/models/effnet.py", line 77, in construct
return self.layers(x)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 644, in __call__
raise err
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 640, in __call__
output = self._run_construct(cast_inputs, kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 425, in _run_construct
output = self.construct(*cast_inputs, **kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/layer/container.py", line 279, in construct
input_data = cell(input_data)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 644, in __call__
raise err
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 640, in __call__
output = self._run_construct(cast_inputs, kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 425, in _run_construct
output = self.construct(*cast_inputs, **kwargs)
File "/home/jenkins0/solution_test/cases/02network/00cv/efficientnetB3/train/test_ms_efficientnetb3_imagenet2012_gpu_check_fps_1p_0001/scripts/train_standalone/src/models/effnet.py", line 61, in construct
x = self.project_conv(x)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 644, in __call__
raise err
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 640, in __call__
output = self._run_construct(cast_inputs, kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 425, in _run_construct
output = self.construct(*cast_inputs, **kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/layer/container.py", line 279, in construct
input_data = cell(input_data)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 644, in __call__
raise err
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 640, in __call__
output = self._run_construct(cast_inputs, kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 425, in _run_construct
output = self.construct(*cast_inputs, **kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/layer/normalization.py", line 191, in construct
self.moving_variance)[0]
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/ops/primitive.py", line 296, in __call__
return _run_op(self, self.name, args)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 98, in wrapper
results = fn(*arg, **kwargs)
File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/ops/primitive.py", line 733, in _run_op
output = real_run_op(obj, op_name, args)
RuntimeError: Malloc for kernel input failed, Memory isn't enough, node:Default/Conv2D-op4
----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/runtime/pynative/run_op_helper.cc:464 LaunchKernels
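When triaging logs like the one above, it can help to pull out the node named in the allocation failure. The following is a rough sketch that assumes the message format shown in this log; `find_oom_nodes` is a hypothetical helper, not part of MindSpore, and other MindSpore versions may word the error differently.

```python
import re

# Matches the allocation-failure message as it appears in the log above.
# The wording is copied from this issue; other versions may differ.
OOM_PATTERN = re.compile(
    r"Malloc for kernel input failed, Memory isn't enough, node:(?P<node>\S+)"
)


def find_oom_nodes(log_text):
    """Return the unique node names reported in allocation failures, in order of appearance."""
    nodes = []
    for match in OOM_PATTERN.finditer(log_text):
        node = match.group("node")
        if node not in nodes:
            nodes.append(node)
    return nodes


sample = (
    "RuntimeError: Malloc for kernel input failed, "
    "Memory isn't enough, node:Default/Conv2D-op4"
)
print(find_oom_nodes(sample))
```

Because the same message appears both in the [ERROR] line and in the final RuntimeError, the helper deduplicates node names.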
Routing this to 安正气
Please assign a maintainer to check this issue.
@zhangjie18
Please add labels (comp or sig); see https://gitee.com/mindspore/community/blob/master/sigs/dx/docs/labels.md for the full list.
To get your code reviewed sooner, add a component (comp) or special interest group (sig) label to your Pull Request; labeled PRs are sent directly to the responsible reviewer.
For example, if you are submitting code for the data component, you can comment:
//comp/data
You can also invite the data SIG to review the code by writing:
//sig/data
You can also mark the type of the PR, for example as a bug fix or a feature request:
//kind/bug or //kind/feature
Congratulations, you have now learned how to add labels with commands. Go ahead and add a label in the comments below!
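This network is owned by 张杰; routing to 张杰
Network maintenance and the algorithm suite now have separate owners, and issues involving competing products are handled by the algorithm team (agreed among the suite group, the maintenance group, and the version PM)
CCB conclusion: this is a competitor-comparison test issue; add the rfc label first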