
Bandit Algorithms - Booknotes

intro (c1)
language
applications
probability (c2)
probability spaces
random elements
algebras & knowledge
conditional prob.
independence
integration, expectation
conditional expectation
notes
multi-armed bandits (Sutton & Barto, c2)
k armed bandits
action-value methods
10 armed testbed
implementation (incremental)
tracking a non-stationary problem
optimistic initial values
UCB action selection
gradient bandit algos
associative search (contextual bandits)
summary
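The action-value methods and incremental implementation listed above fit in a few lines; here is a minimal sketch of epsilon-greedy with incremental sample-average updates on a k-armed Gaussian testbed (all parameter values are illustrative, not from the notes):

```python
import random

def run_bandit(k=10, steps=1000, eps=0.1, seed=0):
    """Epsilon-greedy with incremental sample-average updates
    on a k-armed Gaussian testbed (hypothetical parameters)."""
    rng = random.Random(seed)
    true_means = [rng.gauss(0, 1) for _ in range(k)]  # unknown arm means
    Q = [0.0] * k          # action-value estimates
    N = [0] * k            # pull counts
    total = 0.0
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(k)                   # explore
        else:
            a = max(range(k), key=lambda i: Q[i])  # exploit (greedy)
        r = rng.gauss(true_means[a], 1)            # noisy reward
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]                  # incremental mean update
        total += r
    return Q, N, total
```

Swapping the `1 / N[a]` step size for a constant gives the tracking variant for non-stationary problems.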
stochastic processes, markov chains (c3)
stochastic processes
markov chains
martingales
stopping times
notes
finite-armed stochastic bandits (c4)
learning objective
regret
decomposing regret
canonical bandit model
notes
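The "decomposing regret" entry above refers to the standard regret-decomposition lemma; writing mu* for the best mean, Delta_i for the suboptimality gap of arm i, and T_i(n) for its play count, it reads (symbols mine, following the usual statement):

```latex
R_n = n\mu^* - \mathbb{E}\Big[\sum_{t=1}^{n} X_t\Big]
    = \sum_{i=1}^{k} \Delta_i\, \mathbb{E}[T_i(n)],
\qquad \Delta_i = \mu^* - \mu_i .
```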
concentration of measure (c5)
markov, chebyshev inequalities
cramer-chernoff method, subgaussian random variables
notes
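As a one-line reminder of how the entries above fit together: for a sigma-subgaussian X, the Cramer-Chernoff method optimizes an exponential Markov bound to give the standard tail inequality (notation mine):

```latex
\mathbb{P}(X \ge \varepsilon)
  \le \inf_{\lambda \ge 0} e^{-\lambda\varepsilon}\,\mathbb{E}[e^{\lambda X}]
  \le \exp\!\Big(-\frac{\varepsilon^2}{2\sigma^2}\Big).
```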
upper conf bound (UCB) algo (c7)
optimism principle
notes
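The optimism principle above can be sketched directly: play each arm once, then always pick the arm with the highest optimistic index. A minimal version, assuming 1-subgaussian (here Bernoulli) rewards and illustrative parameters:

```python
import math
import random

def ucb(k=5, horizon=500, seed=1):
    """UCB sketch: initialization round, then choose the arm with
    the highest index = empirical mean + exploration bonus."""
    rng = random.Random(seed)
    means = [rng.random() for _ in range(k)]   # hypothetical Bernoulli means
    Q = [0.0] * k
    N = [0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1                          # play each arm once first
        else:
            a = max(range(k),
                    key=lambda i: Q[i] + math.sqrt(2 * math.log(t) / N[i]))
        r = 1.0 if rng.random() < means[a] else 0.0
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]              # incremental mean update
    return N
```

The exact constant in the bonus varies across the book's UCB variants; 2 log t is one common choice.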
UCB algo: bernoulli noise (c10)
concentration of sums
KL-UCB algo
notes
exp3 algorithm (c11)
importance-weighted estimators
definition
regret analysis
notes
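The importance-weighted estimator and exponential-weights sampling listed above combine into a short loop. A sketch with rewards in [0, 1] (the adversary is faked as a random stream here; eta and horizon are illustrative):

```python
import math
import random

def exp3(k=3, horizon=200, eta=0.1, seed=2):
    """Exp3 sketch: sample from exponential weights, then update with
    the loss-based importance-weighted estimator 1 - (1 - x)/p."""
    rng = random.Random(seed)
    S = [0.0] * k                      # cumulative reward estimates
    for _ in range(horizon):
        # sampling distribution proportional to exp(eta * S), stabilized
        m = max(S)
        w = [math.exp(eta * (s - m)) for s in S]
        tot = sum(w)
        p = [x / tot for x in w]
        # sample an arm from p
        u, a, acc = rng.random(), 0, 0.0
        for i, pi in enumerate(p):
            acc += pi
            if u <= acc:
                a = i
                break
        x = rng.random()               # reward of the played arm only
        # importance-weighted update: unplayed arms get estimate 1
        S[a] += 1 - (1 - x) / p[a]
        for i in range(k):
            if i != a:
                S[i] += 1
    return S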
exp3-IX (c12)
regret analysis
notes
info theory (c14)
relative entropy
notes
minimax LB (c15)
relative entropy btwn bandits
minimax LB
notes
LB instance dependent (c16)
asymptotic bounds
finite-time bounds
notes
LB high probability (c17)
stochastic bandits
adversarial bandits
notes
contextual (c18)
one bandit per context
expert advice
exp4
regret analysis
notes
stochastic linear bandits (c19)
stochastic contextual bandits
stochastic linear bandits
regret analysis
notes
LSEs: confidence bounds (c20)
martingale noise
laplace's method
notes
LSEs: optimal design (c21)
kiefer-wolfowitz proof
min volume ellipsoids
john's theorem
notes
SLBs with sparsity (c23)
sparse SLBs
elimination
proof
UCB with sparsity
online-to-confidence-set conversion
sparse online linear prediction
notes
SLBs: minimax lower bounds (c24)
hypercube
sphere
sparse param vectors
unrealizable case
notes
convex analysis (c26)
sets & functions
jensen's inequality
bregman divergence
legendre functions
optimization
projections
notes
exp3: adversarial/linear (c27)
exponential weights
regret analysis
continuous exponential weights
notes
follow-the-regularized-leader | mirror descent (c28)
online linear optimization
regret analysis
online learning
the unit ball
notes
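The "unit ball" case above is where mirror descent is easiest to see: with the Euclidean regularizer it reduces to a gradient step followed by projection onto the ball. A sketch, where `grads` stands for any sequence of observed loss gradients (my notation, not the book's):

```python
import math

def omd_unit_ball(grads, eta=0.1):
    """Online mirror descent with the Euclidean regularizer on the
    unit ball: gradient step, then projection back onto the ball."""
    d = len(grads[0])
    x = [0.0] * d
    iterates = []
    for g in grads:
        iterates.append(list(x))                      # play current point
        y = [xi - eta * gi for xi, gi in zip(x, g)]   # gradient step
        norm = math.sqrt(sum(yi * yi for yi in y))
        if norm > 1:                                  # project onto unit ball
            y = [yi / norm for yi in y]
        x = y
    return iterates
```

Changing the regularizer (e.g. to negative entropy on the simplex) changes the projection and recovers exponential weights.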
adversarial vs stochastic linear bandits (c29)
reducing SLBs to ALBs
SLB with noise
notes
combinatorial (c30)
apps
bandits
semibandits
follow the perturbed leader
notes
non-stationary (c31)
adversarial bandits
stochastic bandits
notes
ranking (c32)
click models
policy
regret analysis
notes
pure exploration (c33)
simple regret
best-arm id
best-arm id with budget
notes
bayesian (c34)
regret & optimality
optimal regret, finite-armed bandits
learning, posterior dists
conjugate priors
posterior distributions
one-armed bandits
gittins index
computing the GI
notes
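The conjugate-prior entry above is concrete in the Beta-Bernoulli case, the workhorse example: a Beta prior on a Bernoulli mean stays Beta after conditioning on 0/1 rewards. A minimal sketch (function name and defaults are mine):

```python
def beta_bernoulli_posterior(rewards, a0=1.0, b0=1.0):
    """Conjugate update: with a Beta(a0, b0) prior on a Bernoulli
    mean, the posterior after observing 0/1 rewards is
    Beta(a0 + #successes, b0 + #failures)."""
    s = sum(rewards)           # number of successes
    f = len(rewards) - s       # number of failures
    return a0 + s, b0 + f
```

Sampling from this posterior per arm and playing the argmax is exactly the Thompson-sampling style of play the Bayesian chapters build on.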
partial monitoring (c36)
finite adversarial partial monitoring (FAPM)
structure
FAPM classification
lower bounds
policy for easy games
upper bound for easy games
theorem proof
notes
markov decision processes (MDPs) (c37)
the problem
optimal policies
bellman equation
finding opt policy
MDP learning
UCBs for reinforcement learning
proof of UB, LB
notes