Q-learning Wikipedia. Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. For any finite Markov. See more

Step 1: Create an initial Q-Table with all values initialized to 0. When we initially start, the values of all states and rewards will be 0. Consider the Q-Table shown below which.