Fall 2015 (AugDec)
Online Prediction and Learning
Course No.: E1 245
Instructor: Aditya Gopalan, ECE 2.09, Dept. of ECE, Email: firstname AT ece.iisc.ernet.in
Time: TTh 17:3019:00
Place: ECE 1.07
Course Description: The ability to make continual and accurate forecasts is key in many of today’s datadriven intelligent systems (e.g. Internet recommendation engines, automated trading, etc.). This elective course is aimed to expose students to techniques for online or sequential learning/decision making under uncertainty. We will explore several frameworks and algorithms for online prediction along with a rigorous understanding of their performance. We will also look at interesting applications of these techniques, such as portfolio optimization (finance), data compression (information theory), etc.
Contents: Online classification; Regret Minimization; Learning with experts; Online convex optimization; Bandits; Applications sequential investment/portfolio selection, universal lossless data compression, Stochastic games Blackwell approachability, Learning systems with state online reinforcement learning
Prerequisites: Probability/stochastic processes, linear algebra, some exposure to convexity/convex optimization. Contact the instructor for clarifications.
Text/References: We will not follow any standard textbook(s). However, a rough basis will be the excellent and encyclopaedic "Prediction, Learning and Games" (PLG). Nicolo CesaBianchi and Gabor Lugosi, Cambridge University Press, 2006 [local PDF copy].
Other useful resources:
Homework:

Homework 1 posted Aug 17, due Sep 1

Homework 2 posted Sep 15, due Oct 1

Homework 3 posted Nov 3, due Dec 9, EXP3IX paper by Neu, 2015, dataset for Problem 5
Miniproject:
The miniproject carries 15% of course credit, and can comprise either (a) a literature survey of a subtopic/area of online learning featuring recent results/stateoftheart developments (about 2 papers' worth of material, see list of topics below), (b) implementation and benchmarking of existing online learning algorithms applied to interesting and new problem domains (with realworld data, for instance), (c) work on a novel research idea that you may have developed, or (d) any other idea for about a monthlong venture that you may have. The student will have to make a small presentation to the class about his/her project at the end of the term (schedule TBD), of about 3040 min. Please discuss your proposed miniproject with the instructor in advance.
List of potential papers for surveys (will be continuously updated) — feel free to pick among these, or to study other topics that excite you. Generally, good venues to look for highquality work are the proceedings of the ICML, NIPS, COLT and AISTATS machine learning conferences (and their associated workshops) in the past few years.
Scribing: Each student is expected to scribe 23 lectures in LaTeX. The goal is to post highquality scribed notes (below) within a week of the lecture. Please send me a draft of the scribe notes (with LaTeX source) within 4 days of each lecture so that we can meet this goal by iterating if necessary. Please use this LaTeX template for scribe notes. Cite clearly all references that you use.
Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
Lectures:

Lecture 1 (Aug 4). Introduction to online learning, Motivation, Examples, Teaser: Batch estimation vs online learning: (a) estimating the bias of a coin by observing (a batch of) iid samples, (b) predicting the next outcome using (a batch of) iid samples, (c) predicting the next outcome of an arbitrary 01 sequence

Lecture 2 (Aug 6). Math fundamentals review. Probability spaces, events, random variables, expectation, conditional expectation, variance, independendence, the Strong Law of Large Numbers, the Central Limit theorem, Probability inequalities  Markov’s inequality, Chebyshev’s inequality, ChernoffHoeffding bound

Lecture 3 (Aug 11). Math fundamentals review (contd.)  Bernstein’s inequality, Convexity, Convex functions, Strongly convex functions. 1bit sequence prediction with expert advice, the Halving/Majority algorithm and mistake bound, the Weighted Majority algorithm for 1bit prediction with 01 loss.

Lecture 4 (Aug 13). Mistake bound for the Weighted Majority algorithm, General Prediction with Experts problem, Regret (worst case) for an online learning algorithm

Lecture 5 (Aug 18). Prediction with expert advice and convex loss functions, the Exponential Weights/Multiplicative Weights/Hedge algorithm and regret bound. Reference(s): PLG Chapter 2

Lecture 6 (Aug 20). Inevitability of suffering linear regret in the prediction game with general (nonconvex) losses, overcoming it by randomization, the Randomized Exponential Weights algorithm, expected and tail bounds on regret. Reference(s): GB Chapter 4, PLG Chapter 4

Lecture 7 (Aug 25). Regret bound holding with high probability (tail bound on regret) for the Randomized Exponential Weights algorithm. Minimax regret for prediction with experts and convex experts. Reference(s): GB Chapter 5, PLG Thm. 3.7

Lecture 8 (Aug 27). Tracking or Competing with Shifting Experts. Tracking regret, the Exponential Weights forecaster for classes of switching experts. Reference(s): GB Chapter 6, PLG Chapter 5

Lecture 9 (Sept 1). Tracking regret in the actions game (continued)  Regret of the Exponentially Weighted forecaster against static switching experts that switch a bounded no. of times, Towards an efficient algorithm for low regret against switching experts, ExpWts regret with arbitrary (not necessarily) uniform initial weights, the Fixed Share Forecaster. Reference(s): GB Chapter 6, PLG Chapter 5

Lecture 10 (Sept 3). Tracking regret in the actions game (continued)  Regret analysis for the Fixed Share Forecaster. Reference(s): GB Chapter 6, PLG Chapter 5

Lecture 11 (Sept 8). Improved regret in the Experts game with expconcave losses for the Exponentially Weighted forecaster, Intro. to the Sequential investment problem in an arbitrary market, BuyandHold portfolios, Constantly Rebalancing Portfolios (CRPs). Reference(s): PLG Chapter 10, JA Lectures 1113

Lecture 12 (Sept 10). Cover’s Universal Portfolio algorithm and its regret with respect to CRPs. Reference(s): PLG Chapter 10, JA Lectures 1113

Lecture 13 (Sept 15). Intro. to Online Convex Optimization, the FollowTheLeader (FTL) strategy and its regret. Reference(s): SSS Chapter 2, GB Chapter 8

Lecture 14 (Sept 22). Online Convex Optimization, the FollowTheRegularizedLeader (FTRL) strategy. Reference(s): SSS Chapter 2, GB Chapter 8

Lecture 15 (Sept 24). Online Convex Optimization  FTRL regret analysis, FTRL regret with strongly convex regularizers & Lipschitzcontinuous losses, why the entropic regularizer (ExpWts) is better than the Euclidean regularizer for the experts game. Reference(s): SSS Chapter 2, GB Chapter 8

Lecture 16 (Sept 29). Online Convex Optimization  FTRL and the Online Mirror Descent (OMD) framework, regret bounds via duality interpretation of FTRL, Fenchel dual. Reference(s): SSS Chapter 2, GB Chapter 8, Fenchel duality ref. 1, Fenchel duality ref. 2

Lecture 17 (Oct 1). Online Mirror Descent  Bregman divergences and Fenchel duals of Legendre functions. Reference(s): SSS Chapter 2, GB Chapter 8, AR Chapter 2

Lecture 18 (Oct 6). Online Mirror Descent algorithm  Lazy and Active versions. OMD regret bound using Bregman divergences. Intro. to (nonstochastic) Multiarmed Bandits. Reference(s): SSS Chapter 2, GB Chapters 8, 9, 12 & 15, AR Chapter 2

Lecture 19 (Oct 8). Nonstochastic bandits: the EXP3 algorithm and its regret. Reference(s): GB Chapter 15, JA lecture 20

Lecture 20 (Oct 13). Nonstochastic bandits: the EXP3IX (EXP3 with Implicit eXploration) algorithm and a bound on the tail of the regret (i.e., regret with high probability). Reference(s): Neu, 2015

Lecture 21 (Oct 15). Nonstochastic bandits: Fundamental lower bound on regret for any bandit algorithm. Reference(s): BCB Sec. 3.3

Lecture 22 (Oct 20). Nonstochastic bandits: Fundamental lower bound on regret for any bandit algorithm (continued). Reference(s): BCB Sec. 3.3

Lecture 23 (Oct 27). Stochastic bandits: the Upper Confidence Bound (UCB) algorithm. Reference(s): BCB Chap. 2, Auer et al 2002 UCB paper

Lecture 24 (Oct 29). Stochastic bandits: UCB regret, Thompson sampling algorithm. Reference(s): BCB Chap. 2, Auer et al 2002 UCB paper

Lecture 25 (Nov 3). Stochastic bandits: Thompson sampling regret bound (2 arms). Reference(s): AgrawalGoyal 2012

Lecture 26 (Nov 5). Stochastic bandits: Thompson sampling regret bound (2 arms, continued). Reference(s): AgrawalGoyal 2012

Lecture 27 (Nov 10). The bestarm identification/Pure exploration problem in bandits, PAC sample complexity, the Median Elimination algorithm. Reference(s): EvenDarMannorMansour '06 paper on Median Elimination

Lecture 28 (Nov 12). PAC sample complexity of the Median Elimination algorithm. Reference(s): EvenDarMannorMansour '06 paper on Median Elimination, MannorTsitsiklis '04 — lower bound on sample complexity of PAC bandit arm identification, paper by Bubeck et al, 2009 showing that regretoptimal strategies cannot induce PACoptimal algorithms
Final Exam:
December 2, 2015, 2pm5pm, ECE 1.07
Last updated: 10Oct2018, 15:21:29 IST