Bayesian models and inference for reinforcement learning: the multi-armed bandit case.
The most celebrated successes of machine learning over the past decades have been in prediction, e.g., spam classification, medical diagnosis, or recognizing cat faces. However, a wide variety of applied problems are prescriptive rather than predictive: decisions must be made in order to maximize a reward. Such problems are common in health, commerce, and engineering. One particular setting for optimizing interactions with an unknown world is the multi-armed bandit, a sequential decision process and a particular instance of reinforcement learning.
In this talk, I will show how Bayesian models and inference methods from the statistics and machine learning communities, particularly variational and Monte Carlo methods, can be used to extend multi-armed bandit models, improve learning in complex scenarios, and make informed decisions.
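
For readers new to the setting, a minimal sketch of one standard Bayesian bandit strategy may help. The abstract does not name a specific algorithm, so the choice of Thompson sampling with a Beta-Bernoulli model is an assumption made purely for illustration, as are the function name `thompson_bernoulli`, the simulated reward probabilities, and the uniform Beta(1, 1) prior. It is not the variational or Monte Carlo machinery of the talk, only the conjugate baseline such methods extend.

```python
import random


def thompson_bernoulli(true_probs, n_rounds, seed=0):
    """Illustrative Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    Returns the number of pulls per arm and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Assumed prior: Beta(1, 1), i.e. uniform, on each arm's success probability.
    alpha = [1] * n_arms  # 1 + observed successes
    beta = [1] * n_arms   # 1 + observed failures
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior and play the largest draw.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate a Bernoulli reward from the environment.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Conjugate posterior update for the arm that was played.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


# Three hypothetical arms; the agent should concentrate on the last one.
pulls, total_reward = thompson_bernoulli([0.2, 0.5, 0.8], n_rounds=2000)
print(pulls, total_reward)
```

Sampling from the posterior, rather than always playing the arm with the highest estimated mean, is what balances exploration and exploitation here: arms with uncertain posteriors still produce occasional large draws and get tried. When the reward model is not conjugate, the posterior draw has no closed form, which is where approximate inference comes in.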