In probability theory, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a gambler at a row of slot machines (sometimes known as "one-armed bandits") has to decide which machines to play, how many times to play each machine and in which order to play them. When played, each machine provides a random reward from a distribution specific to that machine. The objective of the gambler is to maximize the sum of rewards earned through a sequence of lever pulls.
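
The setup described above can be made concrete with a small simulation. The following is a minimal sketch under stated assumptions, not a definitive formulation: it assumes each machine pays a reward of 0 or 1 (a Bernoulli distribution), and it uses an epsilon-greedy rule as one simple illustrative strategy for the gambler. The names `BernoulliBandit`, `epsilon_greedy`, `probs`, and `epsilon` are hypothetical and chosen only for this example.

```python
import random


class BernoulliBandit:
    """A row of slot machines; arm i pays 1 with probability probs[i], else 0.

    Bernoulli rewards are an assumption made for this sketch.
    """

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def pull(self, arm):
        # Draw one random reward from the distribution specific to this arm.
        return 1 if self.rng.random() < self.probs[arm] else 0


def epsilon_greedy(bandit, n_pulls, epsilon=0.1, seed=1):
    """Play n_pulls times; return (total reward, number of pulls per arm).

    With probability epsilon a random arm is tried; otherwise the arm with
    the highest observed average reward so far is played.
    """
    rng = random.Random(seed)
    k = len(bandit.probs)
    counts = [0] * k    # how many times each arm has been played
    means = [0.0] * k   # running average reward observed for each arm
    total = 0
    for _ in range(n_pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda i: means[i])
        reward = bandit.pull(arm)
        counts[arm] += 1
        # Incremental update of the running average for the chosen arm.
        means[arm] += (reward - means[arm]) / counts[arm]
        total += reward
    return total, counts


if __name__ == "__main__":
    bandit = BernoulliBandit([0.2, 0.5, 0.8])
    total, counts = epsilon_greedy(bandit, n_pulls=1000)
    print("total reward:", total)
    print("pulls per arm:", counts)
```

The sum returned as `total` is the quantity the gambler is trying to maximize, and `counts` records how many times each machine was played. The sketch does not return the order of play, although that could be logged inside the loop.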