This package provides tools for simulating multi-armed bandit problems.
Install it via the Julia package manager:

```julia
Pkg.add("Bandits")
```
There are several underlying types that need to be constructed before simulation.
The first is a `Bandit` type, which specifies the true distribution of each of the arms. Currently only `StaticBandit` is implemented, which takes an array of `Distribution` types from Distributions.jl (e.g. `staticbandit([Normal(0, 1), Uniform(0, 1)])`).
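To make that inline example runnable, here is a minimal sketch that constructs a two-armed bandit from those two distributions, using only the `staticbandit` constructor and the Distributions.jl types shown above:

```julia
using Bandits
using Distributions  # Normal, Uniform

# Each element of the array is the true reward distribution of one arm.
sb = staticbandit([Normal(0, 1), Uniform(0, 1)])
```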
The second is a `Policy` type, which specifies the policy the agent is going to follow: given the agent's current beliefs over the arms, the policy selects which arm to pull next. Currently, the implemented policies include `ThompsonSampling` (used in the example below).
The third is an `Agent` type, which requires the agent's prior, the policy the agent should follow, and the underlying bandit. Currently, the implemented agents include `BetaBernoulliAgent` (used in the example below).
Now, we can call `simulate` and get back a `BanditStats` object, which records the regret and the number of times each arm was pulled.
This package also provides an `aggregate_simulate` function, which aggregates the results of N simulations run in parallel and returns the average; a hedged sketch follows the example below.
Example usage:

```julia
using Bandits
using Distributions  # provides Bernoulli

thompson_sampling = ThompsonSampling()                              # policy
sb = staticbandit([Bernoulli(0.5), Bernoulli(0.6)])                 # two Bernoulli arms
beta_agent = BetaBernoulliAgent([0.6, 0.5], thompson_sampling, sb)  # prior, policy, bandit
stats = simulate(sb, beta_agent, 100)                               # run 100 rounds
print(stats.regret)
```
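To average out the run-to-run randomness, the same setup can be handed to `aggregate_simulate`. The call below is a sketch, not the definitive API: the argument order and the number-of-runs parameter are assumptions, so check the package source for the actual signature.

```julia
# aggregate_simulate runs the simulations in parallel, so make worker
# processes available first (e.g. start Julia with `julia -p 4`).
# ASSUMPTION: the argument order (bandit, agent, rounds, runs) is illustrative only.
avg_stats = aggregate_simulate(sb, beta_agent, 100, 1000)
print(avg_stats.regret)  # regret averaged over the 1000 runs (assumed field)
```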
License: MIT