MindSpore Reinforcement


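Overview

MindSpore Reinforcement is an open-source reinforcement learning framework that supports distributed training of agents with reinforcement learning algorithms. It provides clean API abstractions for writing reinforcement learning algorithms and decouples the algorithm from deployment and execution concerns, including accelerator usage, the degree of parallelism, and the distribution of computation across a cluster of workers. MindSpore Reinforcement translates a reinforcement learning algorithm into a series of compiled computational graphs, which the MindSpore framework then runs efficiently on CPUs, GPUs, or Ascend AI processors. The architecture of MindSpore Reinforcement is shown below:

[Figure: MindSpore_RL_Architecture]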

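Installation

MindSpore Reinforcement depends on the MindSpore training and inference framework. Install MindSpore first, then install MindSpore Reinforcement, either with pip or by building from source.

MindSpore Version Dependency

Because MindSpore Reinforcement depends on MindSpore, download and install the matching MindSpore whl package from the MindSpore download page, following the version correspondence in the table below.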

pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/{MindSpore-Version}/MindSpore/cpu/ubuntu_x86/mindspore-{MindSpore-Version}-cp37-cp37m-linux_x86_64.whl
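| MindSpore Reinforcement | Branch | MindSpore |
| :---------------------: | :----: | :-------: |
| 0.7.0 | r0.7 | 2.1.0 |
| 0.6.0 | r0.6 | 2.0.0 |
| 0.5.0 | r0.5 | 1.8.0 |
| 0.3.0 | r0.3 | 1.7.0 |
| 0.2.0 | r0.2 | 1.6.0 |
| 0.1.0 | r0.1 | 1.5.0 |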
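
The version pairing above can also be looked up programmatically, for example in an install script. The following is a minimal sketch: the dictionary copies the rows of the table, and `required_mindspore` is a hypothetical helper written for illustration, not part of `mindspore_rl`.

```python
# Version pairs copied from the compatibility table.
# Keys are MindSpore Reinforcement versions; values are (branch, MindSpore version).
COMPATIBILITY = {
    "0.7.0": ("r0.7", "2.1.0"),
    "0.6.0": ("r0.6", "2.0.0"),
    "0.5.0": ("r0.5", "1.8.0"),
    "0.3.0": ("r0.3", "1.7.0"),
    "0.2.0": ("r0.2", "1.6.0"),
    "0.1.0": ("r0.1", "1.5.0"),
}


def required_mindspore(reinforcement_version: str) -> str:
    """Return the MindSpore version paired with a Reinforcement version.

    Hypothetical helper for illustration; it is not part of mindspore_rl.
    """
    try:
        _branch, mindspore_version = COMPATIBILITY[reinforcement_version]
    except KeyError:
        raise ValueError(
            f"No table entry for MindSpore Reinforcement {reinforcement_version}"
        ) from None
    return mindspore_version


if __name__ == "__main__":
    print(required_mindspore("0.7.0"))
```

Versions that are not listed in the table raise `ValueError` rather than guessing a nearby release.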

Installing with pip

To install with pip, download and install the whl package from the MindSpore Reinforcement download page.

pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/{MindSpore_version}/Reinforcement/any/mindspore_rl-{Reinforcement_version}-py3-none-any.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
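  • When the machine is connected to the network, installing the whl package automatically downloads the dependencies of MindSpore Reinforcement (listed in requirements.txt). Otherwise, install them manually.
  • {MindSpore_version} is the MindSpore version number. For the correspondence between MindSpore and Reinforcement versions, see the version dependency table above.
  • {Reinforcement_version} is the Reinforcement version number. For example, to download Reinforcement 0.1.0, set {MindSpore_version} to 1.5.0 and {Reinforcement_version} to 0.1.0.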
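
To make the placeholder substitution concrete, the sketch below fills the URL template from the pip command with the two version numbers. The function name `reinforcement_wheel_url` is an assumption made for this example, and the code only formats a string; it does not check that the resulting file exists on the server.

```python
# URL template taken from the pip command in this section.
WHEEL_URL_TEMPLATE = (
    "https://ms-release.obs.cn-north-4.myhuaweicloud.com/"
    "{MindSpore_version}/Reinforcement/any/"
    "mindspore_rl-{Reinforcement_version}-py3-none-any.whl"
)


def reinforcement_wheel_url(mindspore_version: str, reinforcement_version: str) -> str:
    """Fill both placeholders of the template (hypothetical helper)."""
    return WHEEL_URL_TEMPLATE.format(
        MindSpore_version=mindspore_version,
        Reinforcement_version=reinforcement_version,
    )


if __name__ == "__main__":
    # The example from the text: Reinforcement 0.1.0 pairs with MindSpore 1.5.0.
    print(reinforcement_wheel_url("1.5.0", "0.1.0"))
```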

Building from Source

Download the source code, then enter the mindrl directory.

git clone https://github.com/mindspore-lab/mindrl.git
cd mindrl/
bash build.sh
pip install output/mindspore_rl-{Reinforcement_version}-py3-none_{ARCH}.whl

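Here, build.sh is the build script in the mindrl directory, {Reinforcement_version} is the MindSpore Reinforcement version number, and {ARCH} is the system architecture, either x86_64 or aarch64.

Installing Dependencies

cd mindrl && pip install -r requirements.txt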

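Verifying the Installation

Run the following command to verify the installation. If the Python module imports without error, the installation succeeded: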

import mindspore_rl
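
The same check can be scripted so that a missing package is reported rather than raising. This is a small sketch using only the standard library; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the named top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("mindspore_rl"):
        import mindspore_rl  # the check described above: importing must not raise

        print("mindspore_rl imported successfully")
    else:
        print("mindspore_rl is not installed in this environment")
```

Note that `find_spec` only confirms the package can be located; the actual import is still what verifies that MindSpore and the other dependencies load correctly.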

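Quick Start

The algorithm examples of MindSpore Reinforcement are located under mindrl/example/. A simple algorithm, Deep Q-Learning (DQN), is used here to demonstrate how to use MindSpore Reinforcement.

The first, out-of-the-box option is to run the script file directly: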

cd mindrl/example/dqn/scripts
bash run_standalone_train.sh

The second option is to use config.py and train.py directly, which makes it easier to modify the configuration:

cd mindrl/example/dqn
python train.py --episode 1000 --device_target GPU

The first option generates a dqn_train_log.txt log file in the current directory; the second prints the log to the screen:

Episode 0: loss is 0.396, rewards is 42.0
Episode 1: loss is 0.226, rewards is 15.0
Episode 2: loss is 0.202, rewards is 9.0
Episode 3: loss is 0.122, rewards is 15.0
Episode 4: loss is 0.107, rewards is 12.0
Episode 5: loss is 0.078, rewards is 10.0
Episode 6: loss is 0.075, rewards is 8.0
Episode 7: loss is 0.084, rewards is 12.0
Episode 8: loss is 0.069, rewards is 10.0
Episode 9: loss is 0.067, rewards is 10.0
Episode 10: loss is 0.056, rewards is 8.0
-----------------------------------------
Evaluate for episode 10 total rewards is 9.600
-----------------------------------------
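
To track training progress, the per-episode lines of this log can be parsed into numbers. The sketch below assumes the line format is exactly the one shown in the sample output; the names `EpisodeRecord` and `parse_training_log` are made up for this example, and separator or evaluation lines are skipped.

```python
import re
from typing import List, NamedTuple


class EpisodeRecord(NamedTuple):
    episode: int
    loss: float
    rewards: float


# Matches lines such as "Episode 3: loss is 0.122, rewards is 15.0".
_NUMBER = r"[-+]?\d+(?:\.\d+)?"
EPISODE_LINE = re.compile(
    rf"^Episode (\d+): loss is ({_NUMBER}), rewards is ({_NUMBER})$"
)


def parse_training_log(text: str) -> List[EpisodeRecord]:
    """Extract one record per training-episode line; ignore all other lines."""
    records = []
    for line in text.splitlines():
        match = EPISODE_LINE.match(line.strip())
        if match:
            records.append(
                EpisodeRecord(
                    int(match.group(1)), float(match.group(2)), float(match.group(3))
                )
            )
    return records


# A few lines copied from the sample output.
SAMPLE = """\
Episode 0: loss is 0.396, rewards is 42.0
Episode 1: loss is 0.226, rewards is 15.0
-----------------------------------------
Evaluate for episode 10 total rewards is 9.600
"""

if __name__ == "__main__":
    for record in parse_training_log(SAMPLE):
        print(record)
```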

For more details on installation, tutorials, and the API, see the user documentation.

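Features

Algorithms

| Algorithm | RL Version | Discrete Actions | Continuous Actions | CPU | GPU | Ascend | Example Environment |
| :-------- | :--------: | :--------------: | :----------------: | :-: | :-: | :----: | :------------------ |
| DQN | >= 0.1 | ✔️ | / | ✔️ | ✔️ | ✔️ | CartPole-v0 |
| PPO | >= 0.1 | / | ✔️ | ✔️ | ✔️ | ✔️ | HalfCheetah-v2 |
| AC | >= 0.1 | ✔️ | / | ✔️ | ✔️ | ✔️ | CartPole-v0 |
| A2C | >= 0.2 | ✔️ | / | ✔️ | ✔️ | ✔️ | CartPole-v0 |
| DDPG | >= 0.3 | / | ✔️ | ✔️ | ✔️ | ✔️ | HalfCheetah-v2 |
| QMIX | >= 0.5 | ✔️ | / | ✔️ | ✔️ | ✔️ | SMAC, Simple Spread |
| SAC | >= 0.5 | / | ✔️ | ✔️ | ✔️ | ✔️ | HalfCheetah-v2 |
| TD3 | >= 0.6 | / | ✔️ | ✔️ | ✔️ | ✔️ | HalfCheetah-v2 |
| C51 | >= 0.6 | ✔️ | / | ✔️ | ✔️ | ✔️ | CartPole-v0 |
| A3C | >= 0.6 | ✔️ | / | / | ✔️ | ✔️ | CartPole-v0 |
| CQL | >= 0.6 | / | ✔️ | ✔️ | ✔️ | ✔️ | Hopper-v0 |
| MAPPO | >= 0.6 | ✔️ | / | ✔️ | ✔️ | ✔️ | Simple Spread |
| GAIL | >= 0.6 | / | ✔️ | ✔️ | ✔️ | ✔️ | HalfCheetah-v2 |
| MCTS | >= 0.6 | ✔️ | / | ✔️ | ✔️ | / | Tic-Tac-Toe |
| AWAC | >= 0.6 | / | ✔️ | ✔️ | ✔️ | ✔️ | Ant-v2 |
| Dreamer | >= 0.6 | / | ✔️ | / | ✔️ | ✔️ | Walker-walk |
| IQL | >= 0.6 | / | ✔️ | ✔️ | ✔️ | ✔️ | Walker2d-v2 |
| MADDPG | >= 0.6 | ✔️ | / | ✔️ | ✔️ | ✔️ | simple_spread |
| Double DQN | >= 0.6 | ✔️ | / | ✔️ | ✔️ | ✔️ | CartPole-v0 |
| Policy Gradient | >= 0.6 | ✔️ | / | ✔️ | ✔️ | ✔️ | CartPole-v0 |
| Dueling DQN | >= 0.6 | ✔️ | / | ✔️ | ✔️ | ✔️ | CartPole-v0 |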

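Environments

In reinforcement learning, an agent learns a policy that maximizes a numerical reward signal by interacting with an environment. The environment, as the problem to be solved, is a key element of reinforcement learning. The currently supported environments are listed in the table below:

| Environment | Version |
| :---------- | :-----: |
| Gym | >= v0.1 |
| MuJoCo | >= v0.1 |
| MPE | >= v0.6 |
| SMAC | >= v0.5 |
| DMC | >= v0.6 |
| PettingZoo-mpe | >= v0.6 |
| D4RL | >= v0.6 |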
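
The agent-environment interaction described in this section can be illustrated with a toy loop. Everything in the sketch below is hypothetical and independent of the environments in the table: `CorridorEnv` is a made-up five-cell corridor, and the `reset`/`step` interface is a simplified convention chosen for the example, not the `mindspore_rl` environment API.

```python
import random
from typing import Callable, Tuple


class CorridorEnv:
    """Toy environment (hypothetical): walk from cell 0 to the last cell of a corridor."""

    def __init__(self, length: int = 5, max_steps: int = 50) -> None:
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self) -> int:
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action: int) -> Tuple[int, float, bool]:
        """Apply an action (0 = left, 1 = right); return (state, reward, done)."""
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else -0.1  # small cost per step, bonus at the goal
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done


Policy = Callable[[int, random.Random], int]


def run_episode(env: CorridorEnv, policy: Policy, rng: random.Random) -> float:
    """Roll out one episode and return the sum of rewards the agent collected."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state, rng)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward


def random_policy(state: int, rng: random.Random) -> int:
    return rng.randint(0, 1)


def always_right(state: int, rng: random.Random) -> int:
    return 1


if __name__ == "__main__":
    env = CorridorEnv()
    print("random policy return:", run_episode(env, random_policy, random.Random(0)))
    print("always-right return:", run_episode(env, always_right, random.Random(0)))
```

In this toy problem the always-right policy takes the shortest path, so no other policy can collect a higher return; a learning algorithm would have to discover that behaviour from the reward signal alone.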

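Experience Replay

In reinforcement learning, a ReplayBuffer is a commonly used basic data store that holds the data produced by the agent's interaction with the environment. Using a ReplayBuffer addresses the following problems:

  1. Stored historical experience can be drawn by sampling, either uniformly or according to priority, which breaks the correlation in the training data and makes the sampled data approximately independent and identically distributed.

  2. It provides temporary storage for data and improves data utilization.

Typically, algorithm developers build a ReplayBuffer from native Python or NumPy data structures, or use the standard API wrappers that general reinforcement learning frameworks provide. In contrast, MindSpore implements the ReplayBuffer on the device. This reduces frequent data copies between Host and Device when GPU/Ascend hardware is used. In addition, because the ReplayBuffer is expressed as MindSpore operators, a complete IR graph can be built, which enables the graph optimizations of MindSpore GRAPH_MODE and improves overall performance.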
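
For contrast with the device-side implementation, here is a minimal sketch of the host-side approach mentioned above, built from native Python data structures. The class name `SimpleUniformReplayBuffer` and its methods are assumptions for this example; this is not the `mindspore_rl` ReplayBuffer API, and the transition layout is illustrative.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (state, action, reward, next_state, done); the layout is an assumption for this sketch.
Transition = Tuple[int, int, float, int, bool]


class SimpleUniformReplayBuffer:
    """Host-side FIFO buffer with uniform sampling (illustrative only)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._storage: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def insert(self, transition: Transition) -> None:
        # When the deque is full, appending drops the oldest item (FIFO).
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement over the stored transitions.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)


if __name__ == "__main__":
    buffer = SimpleUniformReplayBuffer(capacity=3)
    for state in range(5):
        buffer.insert((state, 0, 0.0, state + 1, False))
    print(len(buffer), buffer.sample(2))
```

A buffer like this lives in host memory, so every sampled batch has to be copied to the accelerator before training, which is the overhead the device-side design is meant to avoid.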

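| Type | Features | CPU | GPU | Ascend |
| :--- | :------- | :-: | :-: | :----: |
| UniformReplayBuffer | 1. FIFO (first in, first out); 2. Supports batch input | ✔️ | ✔️ | / |
| PriorityReplayBuffer | 1. Proportional-based priority strategy; 2. Sum Tree to improve sampling efficiency | ✔️ | ✔️ | ✔️ |
| ReservoirReplayBuffer | Unbiased sampling | ✔️ | ✔️ | ✔️ |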
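
Two of the sampling schemes named in the table can be sketched in plain Python. The `SumTree` class below shows how a sum tree supports proportional sampling in logarithmic time, and `reservoir_sample` shows the classic reservoir algorithm for drawing an unbiased fixed-size sample from a stream. Both are generic textbook versions with hypothetical names, written under the assumption that they match the ideas behind the buffers above; they are not the MindSpore operators.

```python
import random
from typing import Iterable, List


class SumTree:
    """Binary tree over priorities; each internal node stores the sum of its children."""

    def __init__(self, capacity: int) -> None:
        # Round capacity up to a power of two so that all leaves sit on one level.
        self.capacity = 1
        while self.capacity < capacity:
            self.capacity *= 2
        self.nodes = [0.0] * (2 * self.capacity)  # nodes[1] is the root

    def update(self, index: int, priority: float) -> None:
        """Set the priority of leaf `index` and refresh the sums above it."""
        node = index + self.capacity
        self.nodes[node] = priority
        node //= 2
        while node >= 1:
            self.nodes[node] = self.nodes[2 * node] + self.nodes[2 * node + 1]
            node //= 2

    def total(self) -> float:
        return self.nodes[1]

    def find(self, mass: float) -> int:
        """Return the leaf whose cumulative-priority interval contains `mass`."""
        node = 1
        while node < self.capacity:
            left = 2 * node
            if mass < self.nodes[left]:
                node = left
            else:
                mass -= self.nodes[left]
                node = left + 1
        return node - self.capacity

    def sample(self, rng: random.Random) -> int:
        """Draw a leaf index with probability proportional to its priority."""
        return self.find(rng.random() * self.total())


def reservoir_sample(stream: Iterable, k: int, rng: random.Random) -> List:
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir: List = []
    for count, item in enumerate(stream, start=1):
        if len(reservoir) < k:
            reservoir.append(item)
        else:
            slot = rng.randrange(count)  # uniform over [0, count)
            if slot < k:
                reservoir[slot] = item
    return reservoir


if __name__ == "__main__":
    tree = SumTree(4)
    for i, priority in enumerate([1.0, 2.0, 3.0, 4.0]):
        tree.update(i, priority)
    rng = random.Random(0)
    print([tree.sample(rng) for _ in range(10)])
    print(reservoir_sample(range(100), 5, rng))
```

With the sum tree, both updating a priority and drawing a sample touch one node per tree level, which is where the efficiency gain over a linear scan of all priorities comes from.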

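Distributed

MindSpore Reinforcement decouples the definition of a reinforcement learning algorithm from how that algorithm is executed in parallel or distributed across hardware. This is achieved through a new abstraction, Fragmented Dataflow Graphs: each part of an algorithm becomes a dataflow fragment, which MSRL flexibly distributes and parallelizes. See the linked reference for more information.

The following distribution policies are currently supported:

| Policy Type | Policy | Example |
| :---------- | :----- | :-----: |
| MultiActorSingleLearnerDP | Synchronous distribution policy with a single learner and multiple actors | ppo |
| AsyncMultiActorSingleLearnerDP | Asynchronous distribution policy with a single learner and multiple actors | a3c |
| SingleActorLearnerWithMultEnvDP | Distribution policy with a single actor-learner and multiple remote environments | ppo |
| SingleActorLearnerWithMultEnvHeterDP | Heterogeneous distribution policy with a single actor-learner and multiple remote environments | ppo |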

[Figure: MultiActorSingleLearnerDP]

[Figure: AsyncMultiActorSingleLearnerDP]

[Figure: SingleActorLearnerWithMultEnvDP, SingleActorLearnerWithMultEnvHeterDP]
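
As a rough intuition for the synchronous single-learner, multi-actor structure, the sketch below runs several actor functions in a thread pool and has one learner wait for all of them before each update. It is a generic illustration under stated assumptions: the function names are hypothetical, threads stand in for distributed workers, the "task" is fitting a single scalar rather than a real RL objective, and none of this is the MSRL fragment API.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List


def actor_collect(actor_id: int, weight: float, samples: int) -> float:
    """Actor fragment: use the current weight to produce a local gradient estimate."""
    rng = random.Random(actor_id)  # per-actor seed keeps the sketch reproducible
    targets = [3.0 + rng.uniform(-0.5, 0.5) for _ in range(samples)]
    # Gradient of the mean squared error (weight - target)^2 over the local samples.
    return sum(2.0 * (weight - t) for t in targets) / samples


def learner_update(weight: float, gradients: List[float], lr: float) -> float:
    """Learner fragment: average the actors' gradients and take one step."""
    return weight - lr * sum(gradients) / len(gradients)


def train(num_actors: int = 4, iterations: int = 50, lr: float = 0.1) -> float:
    weight = 0.0
    with ThreadPoolExecutor(max_workers=num_actors) as pool:
        for _ in range(iterations):
            # Synchronous step: the learner waits for every actor before updating.
            futures = [
                pool.submit(actor_collect, actor_id, weight, 32)
                for actor_id in range(num_actors)
            ]
            gradients = [future.result() for future in futures]
            weight = learner_update(weight, gradients, lr)
    return weight


if __name__ == "__main__":
    print("learned weight:", train())
```

An asynchronous variant would let the learner apply each actor's result as soon as it arrives instead of waiting for the full set, trading update consistency for throughput.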

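Roadmap

The initial version of MindSpore Reinforcement includes a stable API for implementing reinforcement learning algorithms and executing computation with MindSpore's computational graphs. It now supports algorithm parallelism and automatic distributed execution, and covers scenarios such as multi-agent learning, offline RL, and Monte Carlo tree search. Later versions will continue to improve automatic distribution and add the ability to integrate with large models. Stay tuned.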

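Community

Governance

See how MindSpore practices open governance.

Communication

Contributing

Contributions are welcome. MindSpore Reinforcement is updated on a three-month cycle. If you encounter a problem, please let us know promptly. We appreciate all contributions; you can submit questions or changes as issues or pull requests.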

License

Apache License 2.0

