Bandits

This package provides tools for simulating multi-armed bandit problems.

Installation

```julia
using Pkg
Pkg.add("Bandits")
```

Documentation

Three types must be constructed before running a simulation: a bandit, a policy, and an agent.

The first is a Bandit type, which specifies the true reward distribution of each arm. Currently only StaticBandit is implemented; it is constructed from an array of distributions from Distributions.jl, one per arm (e.g. staticbandit([Normal(0, 1), Uniform(0, 1)])).
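To make the idea of a static bandit concrete, here is a minimal plain-Julia sketch that does not use Bandits.jl. The type `ToyStaticBandit` and the helpers `pull` and `best_mean` are hypothetical names chosen for illustration, and the arms are assumed to be Bernoulli; it only shows what "static" means (each arm's distribution never changes) and is not the package's StaticBandit implementation.

```julia
using Random

# Hypothetical stand-in for a static bandit (not the package's StaticBandit):
# the reward distribution of each arm is fixed for the whole simulation.
# Here every arm is Bernoulli(p), stored as its success probability p.
struct ToyStaticBandit
    probs::Vector{Float64}
end

# Draw a reward from the chosen arm: 1.0 with probability p, else 0.0.
pull(b::ToyStaticBandit, arm::Int, rng::AbstractRNG) =
    rand(rng) < b.probs[arm] ? 1.0 : 0.0

# Expected reward of the best arm; regret is measured against this value.
best_mean(b::ToyStaticBandit) = maximum(b.probs)

rng = MersenneTwister(1)
b = ToyStaticBandit([0.5, 0.6])
r = pull(b, 2, rng)   # either 0.0 or 1.0
```

Because the arm distributions are fixed, the best arm and its mean are known up front, which is what makes regret well defined.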

The second is a Policy type, which specifies the policy the agent follows: given the agent's current beliefs about the arms, it determines which arm the agent pulls next.

Currently, the implemented policies are:

  • Greedy
  • Epsilon-Greedy
  • ThompsonSampling
  • UCB1
  • ExploreThenExploit
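The sketch below shows, in plain Julia, how four of these selection rules could map a vector of estimated arm means to an arm index. The function names and signatures (`greedy`, `epsilon_greedy`, `ucb1`, `explore_then_exploit`) are hypothetical and are not the Bandits.jl API. Thompson sampling is left out because it needs posterior sampling (for example from Distributions.jl).

```julia
using Random

# Hypothetical policy functions (not the Bandits.jl API). Each one maps the
# agent's beliefs, here a vector of estimated mean rewards, to an arm index.

# Greedy: always pull the arm with the highest estimated mean.
greedy(means::AbstractVector) = argmax(means)

# Epsilon-greedy: with probability ε pull a uniformly random arm,
# otherwise act greedily.
function epsilon_greedy(means::AbstractVector, ε::Real, rng::AbstractRNG)
    return rand(rng) < ε ? rand(rng, 1:length(means)) : argmax(means)
end

# UCB1: pull every arm once, then pick the arm maximizing
# mean_i + sqrt(2 * ln(t) / n_i), where t is the total number of pulls
# and n_i is the number of pulls of arm i.
function ucb1(means::AbstractVector, counts::AbstractVector{<:Integer})
    untried = findfirst(==(0), counts)
    untried === nothing || return untried
    t = sum(counts)
    return argmax(means .+ sqrt.(2 .* log(t) ./ counts))
end

# Explore-then-exploit: cycle through the arms for the first m pulls
# (t counts pulls starting from 1), then act greedily.
explore_then_exploit(means::AbstractVector, t::Integer, m::Integer) =
    t <= m ? mod1(t, length(means)) : argmax(means)
```

Greedy never explores, so it can lock onto a suboptimal arm; the other three rules spend some pulls on exploration in different ways (random, optimism-based, or a fixed up-front phase).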

The third is an Agent type, which is constructed from the agent's prior, the underlying bandit, and the policy the agent follows.

Currently, the following Agents are implemented:

  • BasicAgent - this agent forms beliefs over the arms based only on observed rewards (via the empirical mean) and an initial belief about the means of the arms.
  • BetaBernoulliAgent - this agent has Beta priors and should be used with Bernoulli-distributed arms. Posteriors are updated with the standard conjugate Bayesian update for the Beta distribution.
  • NormalAgent - this agent has Gaussian priors and should be used with Gaussian arms.
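As a rough illustration of the belief updates these agents perform, the sketch below defines three hypothetical belief containers in plain Julia; none of them are part of Bandits.jl. `EmpiricalBeliefs` keeps a running mean per arm (as BasicAgent is described), `BetaBeliefs` applies the conjugate update in which a reward r in {0, 1} turns Beta(α, β) into Beta(α + r, β + 1 - r), and `NormalBeliefs` applies the Normal-Normal update under the added assumption that the reward noise variance is known.

```julia
# Hypothetical belief containers (not the Bandits.jl agent types).

# BasicAgent-style: running empirical mean per arm, starting from an initial
# guess; here the first observed reward replaces the guess.
mutable struct EmpiricalBeliefs
    means::Vector{Float64}
    counts::Vector{Int}
end
EmpiricalBeliefs(init::Vector{Float64}) = EmpiricalBeliefs(copy(init), zeros(Int, length(init)))

function update!(b::EmpiricalBeliefs, arm::Int, reward::Real)
    b.counts[arm] += 1
    # incremental mean: m_n = m_{n-1} + (r - m_{n-1}) / n
    b.means[arm] += (reward - b.means[arm]) / b.counts[arm]
    return b
end

# Beta-Bernoulli: a success increments α, a failure increments β.
mutable struct BetaBeliefs
    α::Vector{Float64}
    β::Vector{Float64}
end

function update!(b::BetaBeliefs, arm::Int, reward::Real)
    b.α[arm] += reward
    b.β[arm] += 1 - reward
    return b
end
posterior_mean(b::BetaBeliefs) = b.α ./ (b.α .+ b.β)

# Normal-Normal with known noise variance: precisions add, and the new mean
# is the precision-weighted average of the prior mean and the reward.
mutable struct NormalBeliefs
    μ::Vector{Float64}    # posterior mean per arm
    τ::Vector{Float64}    # posterior precision (1 / variance) per arm
    noise_var::Float64    # assumed known reward variance
end

function update!(b::NormalBeliefs, arm::Int, reward::Real)
    τ_new = b.τ[arm] + 1 / b.noise_var
    b.μ[arm] = (b.τ[arm] * b.μ[arm] + reward / b.noise_var) / τ_new
    b.τ[arm] = τ_new
    return b
end
```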

Calling simulate then returns a BanditStats object containing the regret and the number of times each arm was pulled.
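A simulation loop of this kind can be sketched as follows. `toy_simulate` and `ToyStats` are hypothetical stand-ins for simulate and BanditStats. The Bernoulli arms, the epsilon-greedy policy over empirical means, and the definition of regret as cumulative pseudo-regret (the sum over rounds of the best arm's mean minus the pulled arm's mean) are all assumptions, since the README does not say exactly how BanditStats computes regret.

```julia
using Random

# Hypothetical stand-in for BanditStats: cumulative pseudo-regret and pull counts.
struct ToyStats
    regret::Float64
    pulls::Vector{Int}
end

# Simulate T rounds of epsilon-greedy on Bernoulli arms with success
# probabilities `probs`, tracking empirical means, pull counts and regret.
function toy_simulate(probs::Vector{Float64}, T::Int; ε = 0.1, rng = MersenneTwister(42))
    k = length(probs)
    means = zeros(k)          # empirical mean reward per arm
    pulls = zeros(Int, k)     # number of times each arm was pulled
    best = maximum(probs)
    regret = 0.0
    for _ in 1:T
        arm = rand(rng) < ε ? rand(rng, 1:k) : argmax(means)
        reward = rand(rng) < probs[arm] ? 1.0 : 0.0
        pulls[arm] += 1
        means[arm] += (reward - means[arm]) / pulls[arm]
        regret += best - probs[arm]   # expected loss of this pull
    end
    return ToyStats(regret, pulls)
end

stats = toy_simulate([0.5, 0.6], 1000)
println(stats.regret, " ", stats.pulls)
```

Pseudo-regret uses the arms' true means rather than the sampled rewards, so it is zero whenever the best arm is pulled and does not fluctuate with reward noise.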

The package also provides an aggregate_simulate function, which runs N simulations in parallel and returns the average of their results.
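A minimal sketch of parallel aggregation, assuming thread-based parallelism via `Threads.@threads` (the package may use a different mechanism) and a hypothetical single-run function `run_once` that returns the pseudo-regret of epsilon-greedy on Bernoulli arms. Each run gets its own seeded RNG and writes to its own slot of a preallocated vector, so the average does not depend on thread scheduling.

```julia
using Random, Statistics

# Hypothetical single run: pseudo-regret of epsilon-greedy on Bernoulli arms.
function run_once(probs::Vector{Float64}, T::Int, rng::AbstractRNG; ε = 0.1)
    k = length(probs)
    means, pulls = zeros(k), zeros(Int, k)
    best = maximum(probs)
    regret = 0.0
    for _ in 1:T
        arm = rand(rng) < ε ? rand(rng, 1:k) : argmax(means)
        reward = rand(rng) < probs[arm] ? 1.0 : 0.0
        pulls[arm] += 1
        means[arm] += (reward - means[arm]) / pulls[arm]
        regret += best - probs[arm]
    end
    return regret
end

# Hypothetical aggregate: run N independent simulations on threads and
# return the average regret. Run i is seeded with i for reproducibility.
function toy_aggregate(probs::Vector{Float64}, T::Int, N::Int)
    regrets = zeros(N)
    Threads.@threads for i in 1:N
        regrets[i] = run_once(probs, T, MersenneTwister(i))
    end
    return mean(regrets)
end

avg = toy_aggregate([0.5, 0.6], 500, 20)
```

Start Julia with `--threads=auto` (or set `JULIA_NUM_THREADS`) to run the simulations concurrently; with a single thread the loop runs serially and returns the same average.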

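Example usage:

```julia
using Bandits
using Distributions  # Bernoulli comes from Distributions.jl

thompson_sampling = ThompsonSampling()
# Two Bernoulli arms with success probabilities 0.5 and 0.6
sb = staticbandit([Bernoulli(0.5), Bernoulli(0.6)])
# Agent with Beta priors that follows Thompson sampling on sb
beta_agent = BetaBernoulliAgent([0.6, 0.5], thompson_sampling, sb)
stats = simulate(sb, beta_agent, 100)
println(stats.regret)
```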

License: MIT