Q-Learning Algorithms

Q-Learning algorithms are a family of Reinforcement Learning algorithms.

Unlike policy gradient methods, which attempt to learn functions that map an observation directly to an action, Q-Learning attempts to learn the value of being in a given state and taking a specific action there.
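
As a rough illustration of what "learning the value" of a state and action can look like, here is a minimal sketch of the standard tabular Q-learning update, which these notes do not spell out. The function name q_update, the learning rate alpha, the discount factor gamma, and the "left"/"right" action set are all assumptions made for the example.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update to q in place and return the new value."""
    # Best value currently estimated for the next state (0.0 for unseen pairs).
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: immediate reward plus the discounted best future value.
    target = reward + gamma * best_next
    # Move the old estimate a fraction alpha of the way toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)       # Q table: (state, action) -> estimated value
actions = ["left", "right"]  # hypothetical action set
q_update(q, state=0, action="right", reward=1.0, next_state=1, actions=actions)
```

Each call nudges one table entry toward the observed reward plus the discounted value of the best next action, so repeated experience gradually fills in the table.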

Policy Gradient Method - Attempts to learn functions which directly map an observation to an action

Q-Learning - Attempts to learn the value of being in a given state, and taking a specific action there

Q - Quality

Q(s, a) - Long-term discounted reward we expect from taking action a in state s

The policy for state s is to choose the action with the highest Q value.

Policy - Policy is a simple lookup table: state -> best action
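
Reading the policy as picking, in each state, the action with the highest Q value, the lookup table can be derived from a Q table in a few lines. This is only a sketch: the nested-dictionary layout, the function name greedy_policy, and the state and action names are hypothetical.

```python
def greedy_policy(q_table):
    """Build a state -> best action lookup table from a nested Q table."""
    return {
        state: max(action_values, key=action_values.get)
        for state, action_values in q_table.items()
    }


# Hypothetical Q values for a two-state problem.
q_table = {
    "s0": {"left": 0.2, "right": 0.7},
    "s1": {"left": 0.5, "right": 0.1},
}

policy = greedy_policy(q_table)
print(policy)  # {'s0': 'right', 's1': 'left'}
```

If two actions tie, max returns the first one it encounters, so a real implementation may want an explicit tie-breaking rule.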

Return - the reward from our immediate action, plus all discounted future rewards from applying the current policy (denoted by capital G)
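
The quantity G described above can be computed for a finite list of rewards as follows. This is a small sketch, assuming a constant discount factor gamma between 0 and 1; the function name discounted_return and the sample rewards are made up for the example.

```python
def discounted_return(rewards, gamma=0.9):
    """Return G = r_0 + gamma * r_1 + gamma**2 * r_2 + ... for a finite reward list."""
    g = 0.0
    # Work backwards so each step is G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        g = reward + gamma * g
    return g


print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1.5
```

Here the third reward of 2.0 is discounted twice (2.0 * 0.5 * 0.5 = 0.5) before being added to the immediate reward of 1.0.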
