
[CI] ST probabilistic core dump in gate<test_topk_op.py, test_math_ops.py>

DONE
Bug-Report
Created on 2021-10-21 19:32

Environment

  • Hardware Environment(Ascend/GPU/CPU):

Uncomment only one /device <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/device cpu

  • Software Environment:
    -- MindSpore version (source or binary):
    -- Python version (e.g., Python 3.7.5):
    -- OS platform and distribution (e.g., Linux Ubuntu 16.04):
    -- GCC/Compiler version (if compiled from source):

Related testcase

test_math_ops.py::test_logaddexp
test_topk_op.py::test_topk

Steps to reproduce the issue

  1. Compile the code.
  2. Install mindspore*.whl.
  3. Run the test cases.
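
Because the crash is intermittent, a single run of step 3 may well pass. A minimal sketch of a repeat-runner follows; it assumes pytest is installed and that the test file is reachable from the current directory. The helper names, the run count, and the path are illustrative assumptions, not taken from the gate configuration.

```python
import subprocess
import sys


def pytest_command(test_id):
    """Build a pytest invocation for one test ID (the path is an assumption)."""
    return [sys.executable, "-m", "pytest", "-q", test_id]


def count_failures(cmd, runs):
    """Run `cmd` `runs` times and return (failed_runs, signal_kills).

    On POSIX, subprocess reports a negative return code when the child
    was terminated by a signal, which is what a core dump looks like.
    """
    failed = 0
    killed_by_signal = 0
    for _ in range(runs):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failed += 1
        if result.returncode < 0:
            killed_by_signal += 1
    return failed, killed_by_signal
```

For example, `count_failures(pytest_command("test_math_ops.py::test_logaddexp"), runs=50)` would report how many of 50 runs failed and how many of those were killed by a signal.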

Describe the current behavior

The test cases core dump intermittently.

Describe the expected behavior

These test cases run successfully.

Related log / screenshot

URL: https://build.mindspore.cn/blue/organizations/jenkins/MindSpore_Gitee_Gate/detail/MindSpore_Gitee_Gate/106167/pipeline
https://build.mindspore.cn/blue/organizations/jenkins/MindSpore_Gitee_Gate/detail/MindSpore_Gitee_Gate/106150/pipeline/613

Special notes for this issue

CPU test cases fail intermittently, which blocks the CI gate.

Comments (8)

wmzheng2020 set the priority to Critical
wmzheng2020 created the Bug-Report
wmzheng2020 set the assignee to 范吉斌
wmzheng2020 set the associated repository to MindSpore/mindspore

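Please add labels (comp or sig). You can also visit https://gitee.com/mindspore/community/blob/master/sigs/dx/docs/labels.md to find more.
To get a faster response, please add a component (comp) or special interest group (sig) label to this issue. Labeled issues are routed directly to the responsible owner.
For example, if you find that the problem is caused by the data component, you can comment:
//comp/data
You can also ask the data SIG for help by writing:
//comp/data
//sig/data
If it is a simple problem that you would like to leave for newcomers to the community, you can write:
//good-first-issue
Congratulations, you have learned how to add labels with commands. Go ahead and add a label in the comments below!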

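i-robot added the kind/bug label
wmzheng2020 added collaborator 杨林枫
wmzheng2020 set the planned start date to 2021-10-21
wmzheng2020 set the planned due date to 2021-10-22
wmzheng2020 set the associated branch to master
wmzheng2020 edited the title
wmzheng2020 edited the description
wmzheng2020 added the kind/occasionally label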

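The crash lands in a different test case each time, so it is not a problem inside the CPU operators. It is most likely a problem in the framework's execution flow. Handing over to 黎明奇 for further handling.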

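范吉斌 added collaborator 范吉斌
范吉斌 changed the assignee from 范吉斌 to limingqi107
wmzheng2020 set the milestone to B-SIG-Executor-GPU
limingqi107 changed the task status from TODO to VALIDATION
limingqi107 added collaborator limingqi107
limingqi107 changed the assignee from limingqi107 to wmzheng2020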

Appearance & Root Cause

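In the PyNative single-operator scenario where a cast is inserted, an intermittent ordering problem occurs: after execution finishes, the tensor destructs its device_address first, and memoryManageActor then accesses that device_address while handling the FreeMemory message, which causes the core dump.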

Fix Solution

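Ensure that memoryManageActor finishes FreeMemory before the execution flow exits and triggers tensor destruction. Control through loopcountActor is added to enforce this ordering.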
!25472:fix the coredump probability of pyNative free memory
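
The ordering problem and the fix described above can be illustrated with a small Python model. This is only an analogy under stated assumptions, not MindSpore code and not the content of !25472: `DeviceAddress`, `MemoryManagerActor`, and `run_step` are hypothetical names, the `gate` event exists only to force each interleaving deterministically (in the real failure the interleaving depends on timing), and the `done` event stands in for whatever completion signal the loopcountActor control provides.

```python
import queue
import threading


class DeviceAddress:
    """Stand-in for a device memory handle (hypothetical, not MindSpore's class)."""

    def __init__(self):
        self.destroyed = False


class MemoryManagerActor:
    """Toy actor: a worker thread that handles FreeMemory messages in order."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.errors = []
        self.worker = threading.Thread(target=self._loop, daemon=True)
        self.worker.start()

    def _loop(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:  # shutdown sentinel
                return
            address, gate, done = msg
            gate.wait()  # lets the demo choose the interleaving
            if address.destroyed:
                # In C++ this access would be a use-after-free.
                self.errors.append("FreeMemory touched a destroyed address")
            done.set()

    def free_memory(self, address, gate):
        """Queue a FreeMemory message and return its completion event."""
        done = threading.Event()
        self.mailbox.put((address, gate, done))
        return done

    def stop(self):
        self.mailbox.put(None)
        self.worker.join()


def run_step(wait_for_free):
    """Run one step and return the errors the actor recorded."""
    actor = MemoryManagerActor()
    address = DeviceAddress()
    gate = threading.Event()
    done = actor.free_memory(address, gate)
    if wait_for_free:
        # Fixed ordering: FreeMemory completes before the flow exits.
        gate.set()
        done.wait()
        address.destroyed = True  # tensor destruction happens last
    else:
        # Buggy ordering: the tensor is destroyed while FreeMemory is pending.
        address.destroyed = True
        gate.set()
        done.wait()
    actor.stop()
    return actor.errors
```

In this model, `run_step(wait_for_free=False)` records one error because the actor sees an already destroyed address, while `run_step(wait_for_free=True)` records none because the step waits for the actor before letting the address go away.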

wmzheng2020 changed the task status from VALIDATION to DONE
