# d214: Q-learning

Q-learning is a model-free reinforcement learning technique. It can be used to find an optimal action-selection policy for any given finite Markov decision process (MDP).

Algorithm:

1. Agent senses its environment, using this information to determine its current state
2. Agent takes an action and obtains a penalty or reward
3. Agent senses its environment again to see what effect its chosen action had
4. Agent learns from its experience (and so makes ‘better’ decisions next time)

Source: How does Q-learning work?
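
Step 4, the learning step, is commonly written as the tabular update rule below. The symbols are the usual textbook notation rather than anything defined in the source: `Q(s, a)` is the estimated value of taking action `a` in state `s`, `\alpha` is the learning rate, `\gamma` is the discount factor, and `r_{t+1}` is the reward or penalty obtained in step 2.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

The bracketed term is the difference between the new estimate (reward plus discounted best value of the next state) and the old one, so the agent moves its estimate a fraction `\alpha` of the way toward what it just observed.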

Implementation:
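
A minimal sketch of the four steps above in Python, assuming a tabular agent with epsilon-greedy action selection. The corridor environment (`step`), the function `train`, and all parameter values (`alpha`, `gamma`, `epsilon`, the reward sizes) are hypothetical choices made for illustration, not taken from the source.

```python
import random


def step(state, action, n_states=5):
    """Hypothetical toy environment: a corridor of n_states cells.

    Action 0 moves left, action 1 moves right. Reaching the last cell
    gives a reward of +1 and ends the episode; any other move costs 0.01.
    """
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(n_states - 1, state + 1)
    done = next_state == n_states - 1
    reward = 1.0 if done else -0.01
    return next_state, reward, done


def train(episodes=500, n_states=5, n_actions=2,
          alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, all zeros at first.
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = 0  # Step 1: the agent determines its current state.
        done = False
        while not done:
            # Step 2: take an action (epsilon-greedy, ties broken at random).
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])
            # Step 3: observe the reward and the resulting state.
            next_state, reward, done = step(state, action, n_states)
            # Step 4: learn by moving Q(s, a) toward the observed target.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal states: the best-valued action in each.
policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
```

In this toy corridor the greedy policy read from `q_table` should end up choosing action 1 (move right) in every non-terminal state, since that is the shortest route to the reward.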

Links: