d214: Q-learning

Q-learning is a model-free reinforcement learning technique. It can be used to find an optimal action-selection policy for any given finite Markov decision process (MDP).



  1. Agent senses its environment, using this information to determine its current state
  2. Agent takes an action and obtains a penalty or reward
  3. Agent senses its environment again – to see what effect its chosen action had
  4. Agent learns from its experience (and so makes ‘better’ decisions next time)

Source: How does Q-learning work?
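
The four steps above can be sketched as a small tabular Q-learning loop. This is only an illustration under stated assumptions, not the code from the linked tutorial: the corridor environment, the function names `q_learning` and `greedy_policy`, and the parameter values (`alpha`, `gamma`, `epsilon`) are all made up for the example. The learning step uses the standard update Q(s,a) ← Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (assumed setup).

    States are 0..n_states-1. Action 0 moves left (staying put at state 0),
    action 1 moves right. Entering the rightmost state gives reward 1 and
    ends the episode; every other step gives reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0  # step 1: the agent determines its current state
        while state != goal:
            # Step 2: take an action (epsilon-greedy, random on ties).
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Step 3: observe the effect of the action and the reward.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Step 4: learn from the experience (Q-learning update).
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (
                reward + gamma * best_next - q[state][action]
            )
            state = next_state
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action for each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]
```

Calling `greedy_policy(q_learning())` should pick action 1 (move right) in every non-goal state of this toy corridor, since moving right is the only way to reach the reward. The goal state itself is never updated, so its entry in the policy carries no meaning.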


Python: http://mnemstudio.org/path-finding-q-learning-tutorial.htm