
[CT][MS][parallel]pynative, Create multi group error

Status: ACCEPTED
Type: RFC
Created: 2021-08-23 19:05

Environment

  • Hardware Environment(Ascend/GPU/CPU):


/device ascend

  • Software Environment:
    -- MindSpore version (source or binary):master
    -- Python version (e.g., Python 3.7.5):
    -- OS platform and distribution (e.g., Linux Ubuntu 16.04):
    -- GCC/Compiler version (if compiled from source):

Related testcase

test_get_group_rank_from_world_rank

Steps to reproduce the issue

  1. cd /home/wys/code/MindSporeTest/parallel/hcom_operations
  2. ../../share/parallel/tool/pytest_parallel.sh -r /root/mindspore/hccl/hccl_8p.json -s 8 -b 0 -e 7 -f test_get_group_rank_from_world_rank.py -t test_get_group_rank_from_world_rank
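from mindspore.communication.management import create_group, get_group_rank_from_world_rank

'''
TEST_SUMMARY: normal-scenario test of the communication operator get_group_rank_from_world_rank with a custom group
'''

# MetaFactory is a helper from the MindSporeTest suite; its import is not shown in this report.
def test_get_group_rank_from_world_rank():
    with MetaFactory():
        create_group("my_group", [0, 1, 2, 3, 4, 5, 6, 7])
        group = "my_group"
        group_rank_id = get_group_rank_from_world_rank(0, group)
        assert group_rank_id == 0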
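
For reference, the assertion in this test reflects the expected mapping from a world rank to its rank inside a custom group. The sketch below is a pure-Python model of that mapping, not MindSpore code: `model_group_rank` is a hypothetical name, and it assumes the group rank is the position of the world rank in the `rank_ids` list passed to `create_group`.

```python
def model_group_rank(world_rank, rank_ids):
    """Model of the expected result: position of world_rank in rank_ids.

    Hypothetical helper for illustration only; it does not call HCCL.
    """
    if world_rank not in rank_ids:
        raise ValueError(f"world rank {world_rank} is not in the group")
    return rank_ids.index(world_rank)


# The group in this report covers all eight devices, so each rank maps to itself.
my_group = [0, 1, 2, 3, 4, 5, 6, 7]
print(model_group_rank(0, my_group))
```

Under this assumption, a smaller group such as `[0, 4]` would map world rank 4 to group rank 1.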

Describe the current behavior

In PyNative mode, create_group fails for a custom group with "RuntimeError: Create group error, the error code is 2".

Describe the expected behavior

The test case passes.

Related log / screenshot

def test_get_group_rank_from_world_rank():
    with MetaFactory():
        create_group("my_group", [0,1,2,3,4,5,6,7])

../test_get_group_rank_from_world_rank.py:16:


/root/archiconda3/envs/vm3.7/lib/python3.7/site-packages/mindspore/communication/management.py:321: in create_group
_create_group_helper(group, rank_ids, backend=GlobalComm.BACKEND)
/root/archiconda3/envs/vm3.7/lib/python3.7/site-packages/mindspore/communication/_comm_helper.py:149: in wrapper
return func(*args, **kargs)
/root/archiconda3/envs/vm3.7/lib/python3.7/site-packages/mindspore/communication/_comm_helper.py:355: in _create_group_helper
hccl.create_group(group, rank_size, rank_ids)


group = 'my_group', rank_num = 8, rank_ids = [0, 1, 2, 3, 4, 5, ...]

def create_group(group, rank_num, rank_ids):
    """
    Create group.

    A function that creates a collection communication group which includes 'rank_num'
    device and 'rank_ids' is the list of these ranks of devices.

    Note:
        The world group can not be created.

    Returns:
        None
    """
    check_group(group)
    check_rank_num(rank_num)
    if isinstance(rank_ids, (list)):
        if rank_num != len(rank_ids):
            raise ValueError('Rank number is not equal to the length of rank_ids.')
        for rank_id in rank_ids:
            if not isinstance(rank_id, (int)) or rank_id < 0:
                raise ValueError('Rank id must be unsigned integer!')
        c_array_rank_ids = c_array(ctypes.c_uint, rank_ids)
        c_rank_num = ctypes.c_uint(rank_num)
        c_group = c_str(group)
        ret = HCCL_LIB_CTYPES.HcomCreateGroup(c_group, c_rank_num, c_array_rank_ids)
        if ret != 0:
            raise RuntimeError('Create group error, the error code is ' + str(ret))

E RuntimeError: Create group error, the error code is 2

/root/archiconda3/envs/vm3.7/lib/python3.7/site-packages/mindspore/communication/_hccl_management.py:122: RuntimeError
=============================== warnings summary ===============================
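
The checks shown in the traceback run before the HCCL call, so it can help to confirm that the arguments from this report pass them. The sketch below re-implements only those visible checks in a standalone function. `validate_rank_args` is a hypothetical name, `check_group` and `check_rank_num` are not shown in the log and are omitted, and the `c_uint` array is built with plain `ctypes` on the assumption that `c_array` does the equivalent.

```python
import ctypes


def validate_rank_args(rank_num, rank_ids):
    """Mirror the rank_ids checks visible in the traceback (hypothetical helper)."""
    # The code in the traceback silently skips non-list input;
    # raising here is a choice made for this sketch only.
    if not isinstance(rank_ids, list):
        raise TypeError('rank_ids must be a list.')
    if rank_num != len(rank_ids):
        raise ValueError('Rank number is not equal to the length of rank_ids.')
    for rank_id in rank_ids:
        if not isinstance(rank_id, int) or rank_id < 0:
            raise ValueError('Rank id must be unsigned integer!')
    # Standard ctypes idiom for building a C array of unsigned ints.
    return (ctypes.c_uint * len(rank_ids))(*rank_ids)


# Arguments taken from this report: eight ranks, ids 0 through 7.
c_ids = validate_rank_args(8, [0, 1, 2, 3, 4, 5, 6, 7])
print(list(c_ids))
```

Because these arguments pass the Python-side checks, error code 2 is returned by `HcomCreateGroup` itself rather than raised by argument validation.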

Special notes for this issue

Comments (4)

mu_rong_meng created the Bug-Report
mu_rong_meng set the associated branch to master
mu_rong_meng set the planned due date to 2021-12-31
mu_rong_meng set the associated repository to MindSpore/mindspore
mu_rong_meng set the assignee to caifubi
mu_rong_meng set the milestone to B-VM-PYNATIVE
mu_rong_meng set the planned start date to 2021-08-23
mu_rong_meng set the priority to Major
mu_rong_meng added the device/ascend (deleted) label
mu_rong_meng added the kind/ci label
mu_rong_meng added the sig/pynative label
mu_rong_meng added the kind/bug label

Hello, @mu_rong_meng, we suggest you assign this issue to: assignee zhongjicheng; collaborators pandoublefeng, liu_xiao_93

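CCB, 2021-08-26: PyNative mode uses HCCL single operators, which do not support group communication. The documentation needs to be updated to state that group communication is not supported in PyNative mode, and a requirement should be added to unify the group communication scheme across PyNative, heterogeneous, graph-mode, and other scenarios.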
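
Until group communication is supported in PyNative mode, a test or script could fail fast with a clearer message than error code 2. The sketch below is one possible guard, not MindSpore code: `ensure_group_comm_supported` is a hypothetical name, and the constants assume that MindSpore's `context.GRAPH_MODE` is 0 and `context.PYNATIVE_MODE` is 1. In a real script the mode would presumably come from `context.get_context("mode")`.

```python
GRAPH_MODE = 0      # assumed to mirror mindspore.context.GRAPH_MODE
PYNATIVE_MODE = 1   # assumed to mirror mindspore.context.PYNATIVE_MODE


def ensure_group_comm_supported(mode):
    """Raise early if a custom group is requested in PyNative mode (hypothetical guard)."""
    if mode == PYNATIVE_MODE:
        raise RuntimeError(
            "Custom communication groups are not supported in PyNative mode."
        )


# Graph mode passes the guard; PyNative mode would raise before any HCCL call.
ensure_group_comm_supported(GRAPH_MODE)
```

Calling the guard before `create_group` would turn the opaque HCCL error into an explicit statement of the limitation recorded by the CCB.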

caifubi changed the task status from TODO to ACCEPTED
caifubi changed the task type from Bug-Report to RFC
