ShinyBandit is an implementation of a bandit task. Bandit tasks are prototypical examples of reinforcement learning tasks in which a person (or animal or organization) tries to maximize their rewards by interacting with an uncertain environment (Sutton and Barto 1998).

In the task, players are presented with options (in this case, three) represented as boxes. Each box contains many tickets, and each ticket has a point value written on it, ranging from -100 to +100. Tickets with positive values add points, while tickets with negative values remove points and are best avoided.

Each box has its own distribution of tickets. However, players do not know for sure what the distribution of ticket values is in each box. Some boxes may have higher ticket values on average than others, and some may have more variable ticket values than others.

Players have a fixed number of trials (e.g., 50). On each trial they select a box and draw a random ticket from it. When a ticket is drawn, its point value is displayed and added to (or, if negative, subtracted from) the player’s cumulative point total. The ticket is then returned to the box. When the final trial is completed, the game is over.
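The mechanics of a single game can be sketched in a few lines of Python. This is a minimal illustration and not the ShinyBandit source code: the function names `draw_ticket` and `play_game`, the use of normal distributions, the example means and standard deviations, and the random choice policy are all assumptions made for the example.

```python
import random

def draw_ticket(mean, sd, rng):
    """Draw one ticket value, rounded and clipped to the -100..+100 range."""
    value = round(rng.gauss(mean, sd))
    return max(-100, min(100, value))

def play_game(boxes, n_trials=50, seed=None):
    """Play one game with a player who picks boxes uniformly at random.

    boxes: list of (mean, sd) pairs, one per box (hypothetical values).
    Returns one (chosen_box, ticket_value, cumulative_total) tuple per trial.
    """
    rng = random.Random(seed)
    total = 0
    history = []
    for _ in range(n_trials):
        choice = rng.randrange(len(boxes))   # placeholder policy: random choice
        mean, sd = boxes[choice]
        ticket = draw_ticket(mean, sd, rng)  # the box itself is never depleted
        total += ticket                      # negative tickets lower the total
        history.append((choice, ticket, total))
    return history

# Hypothetical environment: three boxes differing in mean and variability.
example_boxes = [(10, 20), (0, 40), (-10, 5)]
history = play_game(example_boxes, n_trials=50, seed=1)
print("Final score:", history[-1][2])
```

Because each ticket is sampled from a fixed distribution, drawing it does not change the box, which mirrors returning the ticket after every trial. A real player would replace the random policy with one that uses the observed ticket values.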

Players play the game several times. The distributions (boxes) are the same in each game, but their locations are randomly determined at the start of each game. Therefore, while players can learn about the overall decision environment from one game to the next, they always have to relearn which option is which within each game.
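The reshuffling of box locations across games could look roughly like the following sketch. The labels "A", "B", and "C" and the function name `assign_positions` are hypothetical stand-ins for the actual distributions and are not taken from the app.

```python
import random

def assign_positions(distributions, rng):
    """Return the same distributions in a freshly shuffled on-screen order."""
    positions = list(distributions)  # copy so the original order is untouched
    rng.shuffle(positions)
    return positions

rng = random.Random(42)
distributions = ["A", "B", "C"]  # hypothetical labels for the three boxes

for game in range(1, 4):
    layout = assign_positions(distributions, rng)
    print(f"Game {game}: boxes from left to right = {layout}")
```

The set of distributions never changes between games; only the mapping from screen position to distribution does, so knowledge about the environment carries over while knowledge about positions does not.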

Play ShinyBandit

You can play ShinyBandit at or in the window below (Note that the app runs smoother in a separate window):

Source code

See for additional details of the code


Sutton, Richard S., and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.