4 Star 13 Fork 2

TensorLayer / RLzoo

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
克隆/下载
贡献代码
同步代码
取消
提示: 由于 Git 不支持空文件夾,创建文件夹后会生成空的 .keep 文件
Loading...
README
Apache-2.0

Reinforcement Learning Zoo

Documentation Status Supported TF Version Downloads



RLzoo is a collection of the most practical reinforcement learning algorithms, frameworks and applications. It is implemented with Tensorflow 2.0 and API of neural network layers in TensorLayer 2, to provide a hands-on fast-developing approach for reinforcement learning practices and benchmarks. It supports basic toy-tests like OpenAI Gym and DeepMind Control Suite with very simple configurations. Moreover, RLzoo supports robot learning benchmark environment RLBench based on Vrep/Pyrep simulator. Other large-scale distributed training framework for more realistic scenarios with Unity 3D, Mujoco, Bullet Physics, etc, will be supported in the future. A Springer textbook is also provided, you can get the free PDF if your institute has Springer license.

Different from RLzoo for simple usage with high-level APIs, we also have a RL tutorial that aims to make the reinforcement learning tutorial simple, transparent and straight-forward with low-level APIs, as this would not only benefits new learners of reinforcement learning, but also provide convenience for senior researchers to testify their new ideas quickly.

Please check our Online Documentation. We suggest users to report bugs using Github issues. Users can also discuss how to use RLzoo in the following slack channel.



Table of contents:

Status: Release

Current status [click to expand]
We are currently open to any suggestions or pull requests from the community to make RLzoo a better repository. Given the scope of this project, we expect there could be some issues over the coming months after initial release. We will keep improving the potential problems and commit when significant changes are made in the future. Current default hyperparameters for each algorithm and each environment may not be optimal, so you can play around with those hyperparameters to achieve best performances. We will release a version with optimal hyperparameters and benchmark results for all algorithms in the future.
Version History [click to expand]
  • 1.0.3 (Current version)

    Changes:

    • Fix bugs in SAC algorithm
  • 1.0.1

    Changes:

    • Add interactive training configuration;
    • Better support RLBench environment, with multi-head network architectures to support dictionary as observation type;
    • Make the code cleaner.
  • 0.0.1

Installation

Ensure that you have Python >=3.5 (Python 3.6 is needed if using DeepMind Control Suite).

Direct installation:

pip3 install rlzoo --upgrade

Install RLzoo from Git:

git clone https://github.com/tensorlayer/RLzoo.git
cd RLzoo
pip3 install .

Prerequisites

pip3 install -r requirements.txt

List of prerequisites. [click to expand]
  • tensorflow >= 2.0.0 or tensorflow-gpu >= 2.0.0a0
  • tensorlayer >= 2.0.1
  • tensorflow-probability
  • tf-nightly-2.0-preview
  • Mujoco 2.0, dm_control, dm2gym (if using DeepMind Control Suite environments)
  • Vrep, PyRep, RLBench (if using RLBench environments, follows here, here and here)

Usage

For detailed usage, please check our online documentation.

Quick Start

Choose whatever environments with whatever RL algorithms supported in RLzoo, and enjoy the game by running following example in the root file of installed package:

# in the root folder of rlzoo package
cd RLzoo
python run_rlzoo.py

What's in run_rlzoo.py?

from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params
from rlzoo.algorithms import TD3  # import the algorithm to use
# choose an algorithm
AlgName = 'TD3'
# chose an environment
EnvName = 'Pendulum-v0'  
# select a corresponding environment type
EnvType = 'classic_control'
# build an environment with wrappers
env = build_env(EnvName, EnvType)  
# call default parameters for the algorithm and learning process
alg_params, learn_params = call_default_params(env, EnvType, AlgName)  
# instantiate the algorithm
alg = eval(AlgName+'(**alg_params)')
# start the training
alg.learn(env=env, mode='train', render=False, **learn_params)  
# test after training 
alg.learn(env=env, mode='test', render=True, **learn_params)  

The main script run_rlzoo.py follows (almost) the same structure for all algorithms on all environments, see the full list of examples.

General Descriptions: RLzoo provides at least two types of interfaces for running the learning algorithms, with (1) implicit configurations or (2) explicit configurations. Both of them start learning program through running a python script, instead of running a long command line with all configurations shortened to be arguments of it (e.g. in Openai Baseline). Our approaches are found to be more interpretable, flexible and convenient to apply in practice. According to the level of explicitness of learning configurations, we provided two different ways of setting learning configurations in python scripts: the first one with implicit configurations uses a default.py script to record all configurations for each algorithm, while the second one with explicit configurations exposes all configurations to the running scripts. Both of them can run any RL algorithms on any environments supported in our repository with a simple command line.

1. Implicit Configurations [click to expand]

RLzoo with implicit configurations means the configurations for learning are not explicitly contained in the main script for running (i.e. run_rlzoo.py), but in the default.py file in each algorithm folder (for example, rlzoo/algorithms/sac/default.py is the default parameters configuration for SAC algorithm). All configurations include (1) parameter values for the algorithm and learning process, (2) the network structures, (3) the optimizers, etc, are divided into configurations for the algorithm (stored in alg_params) and configurations for the learning process (stored in learn_params). Whenever you want to change the configurations for the algorithm or learning process, you can either go to the folder of each algorithm and modify parameters in default.py, or change the values in alg_params (a dictionary of configurations for the algorithm) and learn_params (a dictionary of configurations for the learning process) in run_rlzoo.py according to the keys.

Common Interface:

from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params
from rlzoo.algorithms import *
# choose an algorithm
AlgName = 'TD3'
# chose an environment
EnvName = 'Pendulum-v0'  
# select a corresponding environment type
EnvType = ['classic_control', 'atari', 'box2d', 'mujoco', 'robotics', 'dm_control', 'rlbench'][0] 
# build an environment with wrappers
env = build_env(EnvName, EnvType)  
# call default parameters for the algorithm and learning process
alg_params, learn_params = call_default_params(env, EnvType, AlgName)  
# instantiate the algorithm
alg = eval(AlgName+'(**alg_params)')
# start the training
alg.learn(env=env, mode='train', render=False, **learn_params)  
# test after training 
alg.learn(env=env, mode='test', render=True, **learn_params)  
# in the root folder of rlzoo package
cd rlzoo
python run_rlzoo.py
2. Explicit Configurations [click to expand]

RLzoo with explicit configurations means the configurations for learning, including parameter values for the algorithm and the learning process, the network structures used in the algorithms and the optimizers etc, are explicitly displayed in the main script for running. And the main scripts for demonstration are under the folder of each algorithm, for example, ./rlzoo/algorithms/sac/run_sac.py can be called with python algorithms/sac/run_sac.py from the file ./rlzoo to run the learning process same as in above implicit configurations.

A Quick Example

import gym
from rlzoo.common.utils import make_env, set_seed
from rlzoo.algorithms import AC
from rlzoo.common.value_networks import ValueNetwork
from rlzoo.common.policy_networks import StochasticPolicyNetwork

''' load environment '''
env = gym.make('CartPole-v0').unwrapped
obs_space = env.observation_space
act_space = env.action_space
# reproducible
seed = 2
set_seed(seed, env)

''' build networks for the algorithm '''
num_hidden_layer = 4 #number of hidden layers for the networks
hidden_dim = 64 # dimension of hidden layers for the networks
with tf.name_scope('AC'):
        with tf.name_scope('Critic'):
            	# choose the critic network, can be replaced with customized network
                critic = ValueNetwork(obs_space, hidden_dim_list=num_hidden_layer * [hidden_dim])
        with tf.name_scope('Actor'):
            	# choose the actor network, can be replaced with customized network
                actor = StochasticPolicyNetwork(obs_space, act_space, hidden_dim_list=num_hidden_layer * [hidden_dim], output_activation=tf.nn.tanh)
net_list = [actor, critic] # list of the networks

''' choose optimizers '''
a_lr, c_lr = 1e-4, 1e-2  # a_lr: learning rate of the actor; c_lr: learning rate of the critic
a_optimizer = tf.optimizers.Adam(a_lr)
c_optimizer = tf.optimizers.Adam(c_lr)
optimizers_list=[a_optimizer, c_optimizer]  # list of optimizers

# intialize the algorithm model, with algorithm parameters passed in
model = AC(net_list, optimizers_list)
''' 
full list of arguments for the algorithm
----------------------------------------
net_list: a list of networks (value and policy) used in the algorithm, from common functions or customization
optimizers_list: a list of optimizers for all networks and differentiable variables
gamma: discounted factor of reward
action_range: scale of action values
'''

# start the training process, with learning parameters passed in
model.learn(env, train_episodes=500,  max_steps=200,
            save_interval=50, mode='train', render=False)
''' 
full list of parameters for training
---------------------------------------
env: learning environment
train_episodes:  total number of episodes for training
test_episodes:  total number of episodes for testing
max_steps:  maximum number of steps for one episode
save_interval: time steps for saving the weights and plotting the results
mode: 'train' or 'test'
render:  if true, visualize the environment
'''

# test after training
model.learn(env, test_episodes=100, max_steps=200,  mode='test', render=True)

In the package folder, we provides examples with explicit configurations for each algorithm.

# in the root folder of rlzoo package
cd rlzoo
python algorithms/<ALGORITHM_NAME>/run_<ALGORITHM_NAME>.py 
# for example: run actor-critic
python algorithms/ac/run_ac.py

Interactive Configurations

We also provide an interactive learning configuration with Jupyter Notebook and ipywidgets, where you can select the algorithm, environment, and general learning settings with simple clicking on dropdown lists and sliders! A video demonstrating the usage is as following. The interactive mode can be used with rlzoo/interactive/main.ipynb by running $ jupyter notebook to open it.

Interactive Video

Contents

Algorithms

Choices for AlgName: 'DQN', 'AC', 'A3C', 'DDPG', 'TD3', 'SAC', 'PG', 'TRPO', 'PPO', 'DPPO'

Algorithms Papers
Value-based
Q-learning Technical note: Q-learning. Watkins et al. 1992
Deep Q-Network (DQN) Human-level control through deep reinforcement learning, Mnih et al. 2015.
Prioritized Experience Replay Schaul et al. Prioritized experience replay. Schaul et al. 2015.
Dueling DQN Dueling network architectures for deep reinforcement learning. Wang et al. 2015.
Double DQN Deep reinforcement learning with double q-learning. Van et al. 2016.
Retrace Safe and efficient off-policy reinforcement learning. Munos et al. 2016:
Noisy DQN Noisy networks for exploration. Fortunato et al. 2017.
Distributed DQN (C51) A distributional perspective on reinforcement learning. Bellemare et al. 2017.
Policy-based
REINFORCE(PG) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Ronald J. Williams 1992.
Trust Region Policy Optimization (TRPO) Abbeel et al. Trust region policy optimization. Schulman et al.2015.
Proximal Policy Optimization (PPO) Proximal policy optimization algorithms. Schulman et al. 2017.
Distributed Proximal Policy Optimization (DPPO) Emergence of locomotion behaviours in rich environments. Heess et al. 2017.
Actor-Critic
Actor-Critic (AC) Actor-critic algorithms. Konda er al. 2000.
Asynchronous Advantage Actor-Critic (A3C) Asynchronous methods for deep reinforcement learning. Mnih et al. 2016.
Deep Deterministic Policy Gradient (DDPG) Continuous Control With Deep Reinforcement Learning, Lillicrap et al. 2016
Twin Delayed DDPG (TD3) Addressing function approximation error in actor-critic methods. Fujimoto et al. 2018.
Soft Actor-Critic (SAC) Soft actor-critic algorithms and applications. Haarnoja et al. 2018.

Environments

Choices for EnvType: 'atari', 'box2d', 'classic_control', 'mujoco', 'robotics', 'dm_control', 'rlbench'

Some notes on environment usage. [click to expand]
  • Make sure the name of environment matches the type of environment in the main script. The types of environments include: 'atari', 'box2d', 'classic_control', 'mujoco', 'robotics', 'dm_control', 'rlbench'.

  • When using the DeepMind Control Suite, install the dm2gym package with: pip install dm2gym

  • When using the RLBench environments, please add the path of your local rlbench repository to python: export PYTHONPATH=PATH_TO_YOUR_LOCAL_RLBENCH_REPO

  • A dictionary of all different environments is stored in ./rlzoo/common/env_list.py

  • Full list of environments in RLBench is here.

  • Installation of Vrep->PyRep->RLBench follows here->here->here.

Configurations:

The supported configurations for RL algorithms with corresponding environments in RLzoo are listed in the following table.

Algorithms Action Space Policy Update Envs
DQN (double, dueling, PER) Discrete Only -- Off-policy Atari, Classic Control
AC Discrete/Continuous Stochastic On-policy All
PG Discrete/Continuous Stochastic On-policy All
DDPG Continuous Deterministic Off-policy Classic Control, Box2D, Mujoco, Robotics, DeepMind Control, RLBench
TD3 Continuous Deterministic Off-policy Classic Control, Box2D, Mujoco, Robotics, DeepMind Control, RLBench
SAC Continuous Stochastic Off-policy Classic Control, Box2D, Mujoco, Robotics, DeepMind Control, RLBench
A3C Discrete/Continuous Stochastic On-policy Atari, Classic Control, Box2D, Mujoco, Robotics, DeepMind Control
PPO Discrete/Continuous Stochastic On-policy All
DPPO Discrete/Continuous Stochastic On-policy Atari, Classic Control, Box2D, Mujoco, Robotics, DeepMind Control
TRPO Discrete/Continuous Stochastic On-policy All

Properties

1. Automatic model construction [click to expand]
We aim to make it easy to configure for all components within RL, including replacing the networks, optimizers, etc. We also provide automatically adaptive policies and value functions in the common functions: for the observation space, the vector state or the raw-pixel (image) state are supported automatically according to the shape of the space; for the action space, the discrete action or continuous action are supported automatically according to the shape of the space as well. The deterministic or stochastic property of policy needs to be chosen according to each algorithm. Some environments with raw-pixel based observation (e.g. Atari, RLBench) may be hard to train, be patient and play around with the hyperparameters!
3. Simple and flexible API [click to expand]
As described in the Section of Usage, we provide at least two ways of deploying RLzoo: implicit configuration and explicit configuration process. We ensure the maximum flexiblity for different use cases with this design.
3. Sufficient support for DRL algorithms and environments [click to expand]
As shown in above algorithms and environments tables.
4. Interactive reinforcement learning configuration. [click to expand]

As shown in the interactive use case in Section of Usage, a jupyter notebook is provided for more intuitively configuring the whole process of deploying the learning process (rlzoo/interactive/main.ipynb)

Troubleshooting

  • If you meet the error 'AttributeError: module 'tensorflow' has no attribute 'contrib'' when running the code after installing tensorflow-probability, try: pip install --upgrade tf-nightly-2.0-preview tfp-nightly
  • When trying to use RLBench environments, 'No module named rlbench' can be caused by no RLBench package installed at your local or a mistake in the python path. You should add export PYTHONPATH=/home/quantumiracle/research/vrep/PyRep/RLBench every time you try to run the learning script with RLBench environment or add it to you ~/.bashrc file once for all.
  • If you meet the error that the Qt platform is not loaded correctly when using DeepMind Control Suite environments, it's probably caused by your Ubuntu system not being version 14.04 or 16.04. Check here.

Credits

Our core contributors include:

Zihan Ding, Tianyang Yu, Yanhua Huang, Hongming Zhang, Hao Dong

Citing

@misc{RLzoo,
  author = {Zihan Ding, Tianyang Yu, Yanhua Huang, Hongming Zhang, Hao Dong},
  title = {Reinforcement Learning Algorithms Zoo},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tensorlayer/RLzoo}},
}

Other Resources





Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

简介

RL Zoo 开源中国官方镜像 展开 收起
Python 等 2 种语言
Apache-2.0
取消

发行版

暂无发行版

贡献者

全部

近期动态

加载更多
不能加载更多了
Python
1
https://gitee.com/TensorLayer/RLzoo.git
git@gitee.com:TensorLayer/RLzoo.git
TensorLayer
RLzoo
RLzoo
master

搜索帮助