| name | about | labels |
| --- | --- | --- |
| Bug Report | Use this template for reporting a bug | kind/bug |
Hardware Environment (Ascend/GPU/CPU):

/device ascend
Software Environment:
- MindSpore version (source or binary):
- Python version (e.g., Python 3.7.5):
- OS platform and distribution (e.g., Linux Ubuntu 16.04):
- GCC/Compiler version (if compiled from source):
```python
import mindspore.ops.composite as C
from mindspore import context
from mindspore import Tensor
import mindspore as ms
from mindspore.common.api import ms_function

context.set_context(mode=context.GRAPH_MODE, save_graphs=True, save_graphs_path='./tir')
grad_by_all = C.GradOperation(get_all=True)
ONE = Tensor(1, ms.int32)
ZERO = Tensor(0, ms.int32)

@ms_function
def fibonacci(n):
    if n < 1:
        return ZERO
    elif n == 1:
        return ONE
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

x = Tensor(5, ms.int32)
print(x)
y = fibonacci(x)
print(y)
```
## Steps to reproduce the issue
1. Run `python fibonacci.py`.
2. Observe the runtime error:
   `[ERROR] GE(108109,python):2021-05-08-15:57:50.620.928 [mindspore/ccsrc/runtime/device/ascend/ge_runtime/runtime_model.cc:231] Run] Call rt api rtStreamSynchronize failed, ret: 7bc83`
## Describe the current behavior
## Describe the expected behavior
## Related log / screenshot
## Special notes for this issue
Hey lanzhineng, welcome to the MindSpore Community.
All of the projects in the MindSpore Community are maintained by @mindspore-ci-bot.
That means developers can comment below every pull request or issue to trigger bot commands.
Please follow the instructions at https://gitee.com/mindspore/community/blob/master/command.md for details.
Please add labels (comp or sig) so this issue gets a faster response. For example, if you found an issue in the data component, you can type "//comp/data" in a comment and the issue will be labeled "comp/data" and assigned to the responsible owner. See https://gitee.com/mindspore/community/blob/master/sigs/dx/docs/labels.md for more labels.
```python
import mindspore.ops.composite as C
from mindspore import context
from mindspore import Tensor
import mindspore as ms
from mindspore.common.api import ms_function

context.set_context(mode=context.GRAPH_MODE, save_graphs=True, save_graphs_path='./tir')
grad_by_all = C.GradOperation(get_all=True)
ONE = Tensor(1, ms.int32)
ZERO = Tensor(0, ms.int32)

@ms_function
def fibonacci(n):
    if n < 1:
        return ZERO
    elif n == 1:
        return ONE
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

x = Tensor(5, ms.int32)
print(x)
y = fibonacci(x)
print(y)
```
After changing the constants to Tensor, the frontend generates the graph correctly, but execution hangs at graph run time.
```
(gdb) bt
#0  0x0000ffffbf697c38 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1  0x0000ffffa3e74048 in __gthread_cond_wait (__mutex=, __cond=)
    at /home/isuru/miniforge3/conda-bld/ctng-compilers_1589429670044/work/.build/aarch64-conda_cos7-linux-gnu/build/build-cc-gcc-final/aarch64-conda_cos7-linux-gnu/libstdc++-v3/include/aarch64-conda_cos7-linux-gnu/bits/gthr-default.h:877
#2  std::condition_variable::wait (this=, __lock=...)
    at /home/isuru/miniforge3/conda-bld/ctng-compilers_1589429670044/work/.build/aarch64-conda_cos7-linux-gnu/src/gcc/libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x0000ffffaaaa8dd8 in std::condition_variable::wait<mindspore::tensor::WaitEvent::Wait() const::{lambda()#1}>(std::unique_lock<std::mutex>&, mindspore::tensor::WaitEvent::Wait() const::{lambda()#1}) (
    this=0xaaaaad5898f0, __lock=..., __p=...) at /usr/include/c++/7.3.0/condition_variable:99
#4  0x0000ffffaaaa5770 in mindspore::tensor::WaitEvent::Wait (this=0xaaaaad5898b0)
    at /ssd1/lzn/mindspore/mindspore/core/ir/tensor.h:92
#5  0x0000ffffaaaa5b88 in mindspore::tensor::Tensor::Wait (this=0xaaaaad5fc780)
    at /ssd1/lzn/mindspore/mindspore/core/ir/tensor.h:327
#6  0x0000ffffac308478 in mindspore::TensorToPyData (tensor=...)
    at /ssd1/lzn/mindspore/mindspore/ccsrc/utils/convert_utils_py.cc:46
#7  0x0000ffffac309924 in mindspore::ValuePtrToPyData (value=...)
    at /ssd1/lzn/mindspore/mindspore/ccsrc/utils/convert_utils_py.cc:122
#8  0x0000ffffac30b4bc in mindspore::BaseRefToPyData (value=...)
    at /ssd1/lzn/mindspore/mindspore/ccsrc/utils/convert_utils_py.cc:227
#9  0x0000ffffabd29e60 in mindspore::pipeline::ExecutorPy::Run (this=0xaaaaaaef13b0, args=...,
    phase=...) at /ssd1/lzn/mindspore/mindspore/ccsrc/pipeline/jit/pipeline.cc:934
```
```
subgraph @19_5_✗✗fibonacci.67(%para3_n) {
  %0([CNode]37) = Sub(%para3_n, Tensor(shape=[], dtype=Int32, value= 1)) primitive_attrs: {output_names: [output], input_names: [x, y]}
      : (<Tensor[Int32]x[const vector][]>, <Tensor[Int32]x[const vector][]>) -> (<Tensor[Int32]x[const vector][]>)
      # In file /ssd1/lzn/mindspore/mindspore/ops/composite/multitype_ops/sub_impl.py(50)/ return F.tensor_sub(x, y)/
  %1([CNode]11) = call @15_1_fibonacci.65(%0)
      : (<Tensor[Int32]x[const vector][]>) -> (<Tensor[Int32]x[const vector][]>)
      # In file fib.py(18)/ return fibonacci(n-1) + fibonacci(n-2)/
  %2([CNode]37) = Sub(%para3_n, Tensor(shape=[], dtype=Int32, value= 2)) primitive_attrs: {output_names: [output], input_names: [x, y]}
      : (<Tensor[Int32]x[const vector][]>, <Tensor[Int32]x[const vector][]>) -> (<Tensor[Int32]x[const vector][]>)
      # In file /ssd1/lzn/mindspore/mindspore/ops/composite/multitype_ops/sub_impl.py(50)/ return F.tensor_sub(x, y)/
  %3([CNode]8) = call @15_1_fibonacci.65(%2)
      : (<Tensor[Int32]x[const vector][]>) -> (<Tensor[Int32]x[const vector][]>)
      # In file fib.py(18)/ return fibonacci(n-1) + fibonacci(n-2)/
  %4([CNode]39) = Add(%1, %3) primitive_attrs: {output_names: [output], input_names: [x, y]}
      : (<Tensor[Int32]x[const vector][]>, <Tensor[Int32]x[const vector][]>) -> (<Tensor[Int32]x[const vector][]>)
      # In file /ssd1/lzn/mindspore/mindspore/ops/composite/multitype_ops/add_impl.py(129)/ return F.add(x, y)/
  Return(%4)
      : (<Tensor[Int32]x[const vector][]>)
      # In file fib.py(18)/ return fibonacci(n-1) + fibonacci(n-2)/
}
```
The graph in 11_validate_0073.ir is correct.
```python
import mindspore.ops.composite as C
from mindspore import context
from mindspore import Tensor
import mindspore as ms
from mindspore.common.api import ms_function

context.set_context(mode=context.GRAPH_MODE, save_graphs=True, save_graphs_path='./tir')
grad_by_all = C.GradOperation(get_all=True)
ONE = Tensor(1, ms.int32)
ZERO = Tensor(0, ms.int32)

@ms_function
def fibonacci(n):
    if n < 1:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

x = Tensor(5, ms.int32)
print(x)
y = fibonacci(x)
print(y)
```
```
(base) lzn@dggphispre18279:~/tests$ python fib.py
5
[WARNING] DEBUG(340,python):2021-05-08-14:44:47.702.218 [mindspore/ccsrc/debug/debugger/debugger.cc:80] Debugger] Not enabling debugger. Debugger does not support CPU.
[WARNING] CORE(340,python):2021-05-08-14:44:47.884.654 [mindspore/core/ir/anf_extends.cc:62] fullname_with_scope] Input 0 of cnode is not a value node, its type is CNode.
2
```
The result is wrong. The problem is that when the computed result is a scalar, it is not generalized correctly during specialization.
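For reference, the same recursion in plain Python (outside graph mode) gives 5 for input 5, so the graph-mode output of 2 above is clearly wrong. This is just a sanity-check sketch, not MindSpore code:

```python
# Plain-Python version of the recursion from the report, as a reference result.
def fibonacci(n):
    if n < 1:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(5))  # → 5
```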
```
subgraph @16_5_✗✗fibonacci.58() {
  Return(2)
      : ()
      # In file /home/lzn/tests/fib.py(18)/ return fibonacci(n-1) + fibonacci(n-2)/
}
```
Here it should return a call to the subgraph, not the constant.
Changed this way, it still does not run, so this is not closely related to scalar generalization.
Changed this way, it passes.
The error is reported from the backend. It looks like the backend may not support a scalar output from control flow; backend colleagues need to take a look together.
```
(ci3.7) [root@bms-aiserver-pod12-170-21 test]# python fib.py
5
[ERROR] GE(114356,python):2021-05-08-16:01:32.880.257 [mindspore/ccsrc/runtime/device/ascend/ge_runtime/runtime_model.cc:231] Run] Call rt api rtStreamSynchronize failed, ret: 7bc83
[ERROR] DEVICE(114356,python):2021-05-08-16:01:32.880.602 [mindspore/ccsrc/runtime/device/ascend/ascend_kernel_runtime.cc:623] DumpTaskExceptionInfo] Task fail infos task_id: 4, stream_id: 3, tid: 114468, device_id: 4, retcode: 507011
[ERROR] DEVICE(114356,python):2021-05-08-16:01:32.880.633 [mindspore/ccsrc/runtime/device/ascend/ascend_kernel_runtime.cc:632] DumpTaskExceptionInfo] Dump node (Default/StackPush-op41) task error input/output data to: ./task_error_dump/4 trace:
[ERROR] SESSION(114356,python):2021-05-08-16:01:32.894.691 [mindspore/ccsrc/backend/session/ascend_session.cc:1199] Execute] run task error!
```
The scalar problem will be fixed on the frontend. Please have the backend team fix the Tensor problem.
After enabling `export ASCEND_SLOG_PRINT_TO_STDOUT=1`, we found that the `Run task error` is caused by the `StackPush` operator failing.
Asking the AICPU operator colleague @yanzhenxiang2020 to help analyze the cause of the operator failure further.
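The environment variable mentioned above can be set before re-running the reproducer; a minimal sketch (the reproducer invocation is commented out because it needs an Ascend machine):

```shell
# Print Ascend device-side logs (slog) to stdout so operator errors are visible.
export ASCEND_SLOG_PRINT_TO_STDOUT=1
echo "ASCEND_SLOG_PRINT_TO_STDOUT=$ASCEND_SLOG_PRINT_TO_STDOUT"
# python fib.py   # re-run the reproducer on the Ascend machine
```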
Progress as of 05.11:
The `StackPush` error says the `StackPush` operator cannot find the stack with index 0. From the execution order we found this is because the `StackInit` and `StackDestroy` operators are in the wrong positions. Without `start_label_` and `end_goto_`, the order of the inserted `StackInit` and `StackDestroy` is not modified when the execution order is generated; but if the root graph has `start_label_` and `end_goto_`, the order of `StackInit` and related operators is adjusted when the execution order is generated. After making that fix, the error is as follows:
After analyzing the execution order and the backend graph construction, this test case turns out to be rather special: `LabelSwitch`
Why add a comp/akg label?
@anyrenwei Clicked it by accident.
@liangzelang OK.. I was looking at akg-related issues and thought this case was also related to akg.
```python
import mindspore.ops.composite as C
from mindspore import context
from mindspore import Tensor
import mindspore as ms
from mindspore.common.api import ms_function

context.set_context(mode=context.GRAPH_MODE, save_graphs=True, save_graphs_path='./tir')
grad_by_all = C.GradOperation(get_all=True)
ONE = Tensor(1, ms.int32)
ZERO = Tensor(0, ms.int32)

@ms_function
def f(x):
    def fibonacci(n):
        if n < 1:
            return 0
        elif n == 1:
            return 1
        else:
            return fibonacci(n - 1) + fibonacci(n - 2)
    return fibonacci(x)  # call the inner function (this line is missing from the pasted snippet)

x = Tensor(5, ms.int32)
print(x)
y = f(x)
print(y)
```
Verified OK.