probability (c2)

probability spaces

random elements

σ-algebras & knowledge

conditional prob.

independence

integration, expectation

conditional expectation

notes

multi-armed (c2)

k-armed bandits

action-value methods

10-armed testbed

implementation (incremental)

non-stationary problem tracking

optimistic initial values

UCB action selection

gradient bandit algos

assoc. search (contextual bandits)

summary

stochastic processes, markov chains (c3)

stochastic processes

markov chains

martingales

stopping times

notes

finite-armed stochastic bandits (c4)

learning objective

regret

decomposing regret

canonical bandit model

notes

concentration of measure (c5)

markov, chebyshev inequalities

cramér-chernoff method, subgaussian random vars

notes

SLBs with sparsity (c23)

sparse SLBs

elimination

proof

UCB with sparsity

online-to-confidence-set conversion

sparse online linear prediction

notes

convex analysis (c26)

sets & functions

jensen's inequality

bregman divergence

legendre functions

optimization

projections

notes

exp3: adversarial/linear (c27)

exponential weights

regret analysis

continuous exponential weights

notes

follow-the-regularised-leader | mirror descent (c28)

online linear optimization

regret analysis

online learning

the unit ball

notes

bayesian (c34)

regret & optimality

optimal regret, finite-armed bandits

learning, posterior dists

conjugate priors

posterior distributions

one-armed bandits

gittins index

computing the GI

notes

partial monitoring (c36)

finite adversarial partial monitoring (FAPM)

structure

FAPM classification

lower bounds

policy for easy games

upper bound for easy games

theorem proof

notes

markov decision processes (MDP) (c37)

the problem

optimal policies

bellman equation

finding opt policy

MDP learning

UCBs for reinf. learning

proof of UB, LB

notes

