Author: Fu Qingxu, CASIA
As Gitee has gone cold for us, we have transferred our repo to https://github.com/binary-husky/hmp2g; this repo is no longer maintained.
Hybrid Multi-agent Playground (HMP) is an experimental framework designed for RL researchers. Unlike other frameworks, which only isolate the TASKs from the framework, HMP also separates the ALGORITHMs from the framework to achieve excellent compatibility.
Any algorithm, from the most straightforward scripted AI to a sophisticated RL learner, is abstracted into a module inside ./ALGORITHM/*.
We also put effort into interfacing all kinds of multi-agent environments, including gym, SMAC, air combat, etc. Other frameworks such as pymarl2 can interface with HMP as well: the entire HMP can disguise itself as an RL environment in pymarl2. We make this happen by building a particular ALGORITHM module which runs pymarl2 in a subprocess. This work is ongoing; currently, HMP can link to a modified version of pymarl2.
Please star the root GitHub project; your encouragement is extremely important to us as researchers: https://github.com/binary-husky/hmp2g
By the way, we also have a Gitee repo which is a mirror of this GitHub repo: https://gitee.com/hh505030475/hmp-2g
Archived code used in our AAAI papers: https://github.com/binary-husky/hmp2g/tree/aaai-conc
http://cloud.fuqingxu.top:11601/
git pull && python main.py -c ZHECKPOINT/50RL-55opp/test-50RL-55opp.jsonc
(Trained in https://www.bilibili.com/video/BV1vF411M7N9/)
git pull && python main.py -c ZHECKPOINT/test-aii515/test-aii515.jsonc --skip
git pull && python main.py -c ZHECKPOINT/test-cargo50/test-cargo50.jsonc --skip
git pull && python main.py -c ZHECKPOINT/test-50+50/test-50+50.jsonc --skip
git pull && python main.py -c ZHECKPOINT/test-100+100/test-100+100.jsonc --skip
We use Docker to resolve dependencies: SetupDocker.
As a framework for researchers, HMP aims to optimize the experience of controlling parameters.
We discard the method of using the command line to control parameters; instead, commented JSON (JSONC) is used for experiment configuration. To run an experiment, just type:
python main.py --cfg Json_Experiment_Config_File.jsonc
Parameters assigned and overridden in the JSON file are NOT passed via init functions layer by layer as other frameworks usually do; instead, at the start of main.py, a special program defined in UTILS/config_args.py directly INJECTs the overridden parameters into the desired locations.
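The injection step can be sketched as follows. This is only an illustrative sketch, not the actual code in UTILS/config_args.py; the "module->Class" field convention mirrors the example JSON fields shown in this document, and the naive comment stripping would also break string values containing `//`.

```python
import importlib
import json
import re

def load_jsonc(path):
    """Load a commented-JSON (JSONC) file by stripping // comments first.
    (Naive sketch: would also strip '//' inside string values, e.g. URLs.)"""
    with open(path) as f:
        text = re.sub(r'//[^\n]*', '', f.read())
    return json.loads(text)

def inject_overrides(cfg_dict):
    """For every "module->Class" field, set the overridden values directly
    on the target class, instead of passing them layer by layer."""
    for field, overrides in cfg_dict.items():
        module_path, cls_name = field.split('->')
        if module_path.endswith('.py'):  # fields may name the file, e.g. 'xxx.py->ScenarioConfig'
            module_path = module_path[:-3]
        target_cls = getattr(importlib.import_module(module_path), cls_name)
        for key, value in overrides.items():
            setattr(target_cls, key, value)  # direct injection into the class attribute
```

Because the parameters live as class attributes, every later `ScenarioConfig.HP_MAX` read anywhere in the codebase sees the overridden value without any plumbing.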
We give an example to demonstrate how simple it is to add new parameters.
Suppose we want to introduce HP into DCA; then an initial HP, let's say HP_MAX, needs to be defined as a parameter. Then:
1. Open MISSIONS/collective_assult/collective_assult_parallel_run.py. (You can create a new file if you wish.)
2. In the ScenarioConfig class, add a new line writing HP_MAX=100. (You can create another class if you wish.)
3. To use HP_MAX, first write from xxx.collective_assult_parallel_run import ScenarioConfig, then access the parameter with init_hp_of_some_agent = ScenarioConfig.HP_MAX.
4. To override HP_MAX=100 in JSON (e.g., in ./example_dca.jsonc), just add a line in the field "MISSIONS.collective_assult_debug.collective_assult_parallel_run.py->ScenarioConfig", for example:
{
...... (other field)
"MISSIONS.collective_assult_debug.collective_assult_parallel_run.py->ScenarioConfig": {
"HP_MAX": 222, # <------ add this!
"random_jam_prob": 0.05, # (other config override in ScenarioConfig)
......
},
...... (other field)
}
Our framework fully supports complicated parameter dependency.
Some parameters are chained together: changing one of them leads to a change in another.
E.g., let the number of parallel envs (num_threads) be 32, and suppose we test performance every test_interval episodes. We wish to relate them by test_interval = 8*num_threads, meaning that a test run is triggered every 8 rounds of parallel env execution.
Such a need can be satisfied simply by defining a ChainVar structure:
num_threads = 32 # run N parallel envs,
# define test interval
test_interval = 8*num_threads
# define the Chains of test interval
test_interval_cv = ChainVar(lambda num_threads:8*num_threads, chained_with=['num_threads'])
# all done! you need to do nothing else!
After this, you can expect the following JSON config override behaviors:
- If neither parameter is overridden in JSON, the defaults apply (num_threads = 32, test_interval = 8*32).
- If num_threads is overridden in JSON, then test_interval is forced to change along with it, according to test_interval = 8*num_threads.
- If test_interval itself is overridden in JSON, the chain will not act; the JSON override is obeyed. Nothing has higher priority than an explicit JSON override.
For details, please refer to config.py and UTILS/config_args.py; it is very easy to understand once you read an example.
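The resolution logic can be sketched roughly as follows. This is an illustrative sketch only, not the real implementation in config.py and UTILS/config_args.py; it assumes the `<name>_cv` naming convention shown in the example above.

```python
class ChainVar:
    """Holds a recompute function and the names of the variables it depends on."""
    def __init__(self, chain_func, chained_with):
        self.chain_func = chain_func
        self.chained_with = chained_with

def resolve_chains(cfg_cls, overridden_keys):
    """After JSON overrides are applied, recompute every chained variable
    whose dependency changed -- unless the variable itself was explicitly
    overridden (an explicit JSON override always has top priority)."""
    for name in dir(cfg_cls):
        if not name.endswith('_cv'):
            continue
        target = name[:-3]  # e.g. 'test_interval_cv' -> 'test_interval'
        if target in overridden_keys:
            continue        # explicit override wins; skip the chain
        cv = getattr(cfg_cls, name)
        if any(dep in overridden_keys for dep in cv.chained_with):
            deps = [getattr(cfg_cls, dep) for dep in cv.chained_with]
            setattr(cfg_cls, target, cv.chain_func(*deps))
```

With this, overriding num_threads to 64 in JSON would automatically push test_interval to 512, while an explicit test_interval override would be left untouched.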
When the experiment starts, the JSON config override will be stored in ZHECKPOINT/the-experiment-note-you-defined/experiment.json.
If the experiment later produces surprising results, you can always reproduce them using this config backup.
The Task Runner (task_runner.py) has only three lines of important code:
# line 1
actions_list, self.info_runner = self.platform_controller.act(self.info_runner)
# line 2:
obs, reward, done, info = self.envs.step(actions_list)
# line 3:
self.info_runner = self.update_runner(done, obs, reward, info)
- self.platform_controller.act: get actions, block information access between teams (LINK to ALGORITHM), and handle algorithm internal state loopback.
- self.envs.step: multi-threaded environment step (LINK to MISSIONS).
- self.update_runner: prepare obs (for decision making) and reward (for driving RL algorithms) for the next step.
In general, the HMP task runner can operate in two ways:
Please refer to MISSIONS README.
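Stripped of details, the loop around those three lines looks roughly like this. It is a simplified sketch with stub-friendly signatures, not the real task_runner.py:

```python
class TaskRunner:
    """Skeleton of the interaction loop built around the three important lines."""
    def __init__(self, platform_controller, envs):
        self.platform_controller = platform_controller
        self.envs = envs
        self.info_runner = {'obs': None, 'reward': None, 'done': None}

    def run(self, n_steps):
        for _ in range(n_steps):
            # line 1: each team's algorithm decides, without seeing other teams' info
            actions_list, self.info_runner = self.platform_controller.act(self.info_runner)
            # line 2: step all parallel envs at once
            obs, reward, done, info = self.envs.step(actions_list)
            # line 3: pack obs/reward so they drive the next decision step
            self.info_runner = self.update_runner(done, obs, reward, info)

    def update_runner(self, done, obs, reward, info):
        return {'obs': obs, 'reward': reward, 'done': done}
```

Any object exposing act() and step() with these shapes can be plugged in, which is what lets HMP swap teams, algorithms, and environments independently.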
Unfinished doc
VHMAP is a visualization component of HMP.
It is unfortunate that all existing RL environments fail to provide a visual interface satisfying the following useful features:
VHMAP is just the answer. Features:
Interface functions, operation introduction.
We use Docker to resolve dependencies: setup_docker. This project uses techniques such as shared memory for extreme training efficiency; as a cost, Windows OS + GPU training is not yet supported.
Please read setup_docker.md, then set up the container using:
$ docker run -itd --name hmp-$USER \
--net host \
--gpus all \
--shm-size=16G \
fuqingxu/hmp:latest
git pull && python main.py -c ZHECKPOINT/test-50+50/test-50+50.jsonc --skip
git pull && python main.py -c ZHECKPOINT/test-100+100/test-100+100.jsonc --skip
When testing starts, open the revealed URL for monitoring. The front end is built with JavaScript and Three.js.
--------------------------------
JS visualizer online: http://172.18.116.150:aRandomPort
JS visualizer online (localhost): http://localhost:aRandomPort
--------------------------------
git pull && python main.py -c example.jsonc
git pull && python main.py -c example_dca.jsonc
launch with:
python main.py --cfg xx.json
git pull && python main.py -c ZHECKPOINT/test-aii515/test-aii515.jsonc --skip
git pull && python main.py -c ZHECKPOINT/test-cargo50/test-cargo50.jsonc --skip
git pull && python main.py --cfg ZHECKPOINT/adca-demo/test.json
git pull && python main.py --cfg ZHECKPOINT/basic-ma-40-demo/test.json
If you are interested in something, you may continue to read:
Handling parallel environments --> task_runner.py & shm_env.py
Link between teams and diverse algorithms --> multi_team.py
Adding new env --> MISSIONS.env_router.py
Adding algorithm --> ALGORITHM.example_foundation.py
Configuring by writing py files --> config.py
Configuring by json --> xx.json
colorful printing --> colorful.py
auto pip deployer --> pip_find_missing.py
efficient parallel executing --> shm_pool.pyx
auto gpu selection --> auto_gpu.py
matlab logging/plotting bridge --> mcom.py & mcom_rec.py
experiment batch executor --> mprofile.py
1. In MISSIONS/env_router.py, add an entry for the environment in env_init_function_ref, e.g.:
env_init_function_ref = {
    "bvr": ("MISSIONS.bvr_sim.init_env", "ScenarioConfig"),
}
# bvr is the final name that HMP recognizes,
# MISSIONS.bvr_sim.init_env is a py file,
# ScenarioConfig is a class
2. In MISSIONS/env_router.py, add an entry for the environment in import_path_ref:
import_path_ref = {
    "bvr": ("MISSIONS.bvr_sim.init_env", 'make_bvr_env'),
}
# bvr will be the final name that HMP recognizes,
# MISSIONS.bvr_sim.init_env is a py file,
# make_bvr_env is a function
3. Write the environment's ScenarioConfig class (using MISSIONS.bvr_sim.init_env.ScenarioConfig as a template).
4. Write the environment's init function (using MISSIONS.bvr_sim.init_env.make_bvr_env as a template).
, as a template).<1> Qingxu, F.; Tenghai, Q.; Jianqiang, Y.; Zhiqiang, Q.; and Shiguang, W. 2022. Concentration Network for Reinforcement Learning of Large-Scale Multi-Agent Systems. In Proceedings of the AAAI Conference on Artificial Intelligence
<2> Qingxu, F. A Cooperation Graph Approach for Multiagent Sparse Reward Reinforcement Learning. IJCNN
rm -rf ~/ATempDir
mkdir ~/ATempDir
cp -r ../hmp-2g ~/ATempDir
cd ~/ATempDir/hmp-2g
git remote add gitee git@gitee.com:hh505030475/hmp-2g.git
git push gitee master