Markov Decision Processes – Engineering AI Agents

\(\epsilon\)-greedy Monte-Carlo (MC) Control

In this section we outline methods that can result in optimal policies when the MDP is unknown and we need to learn its underlying functions / models - also known as the mode…

Generalized Policy Iteration

As we saw in the dynamic programming (DP) solution to the MDP problem, policy iteration is an algorithm that consists of two simultaneous, interacting processes: one making the…

Model-free control

Monte-Carlo Prediction

In this chapter we find optimal policy solutions when the MDP is unknown and we need to learn its underlying value functions - also known as the model-free prediction…

Reinforcement Learning

Different Approaches to solve known and unknown MDPs

Temporal Difference (TD) Prediction

If one had to identify one idea as central and novel to reinforcement learning, it would undoubtedly be temporal-difference (TD) learning. TD learning is a combination of…

The SARSA Algorithm

SARSA implements a \(Q(s,a)\) value-based GPI and follows naturally as an enhancement of the \(\epsilon\)-greedy policy improvement step of MC control.
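
To make the SARSA idea concrete, here is a minimal tabular sketch. It is an illustration under stated assumptions, not the implementation from the linked section: the five-state chain environment, the `step` and `epsilon_greedy` helpers, and all hyperparameter values are hypothetical choices made for this example. It shows the on-policy update \(Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma Q(s',a') - Q(s,a)]\), where both \(a\) and \(a'\) are drawn from the same \(\epsilon\)-greedy policy.

```python
import random

# Hypothetical toy environment: a chain of states 0..4, state 4 is terminal.
N_STATES = 5
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic toy dynamics: reward -1 per move until the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    return next_state, -1.0, next_state == GOAL


def epsilon_greedy(Q, state, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[state][a])


def sarsa(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1,
          max_steps=10_000, seed=0):
    """Tabular SARSA on the toy chain; returns the learned Q table."""
    rng = random.Random(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _episode in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, epsilon, rng)
        for _t in range(max_steps):  # safety cap on episode length
            s_next, r, done = step(s, a)
            # On-policy: the next action comes from the same epsilon-greedy policy.
            a_next = epsilon_greedy(Q, s_next, epsilon, rng)
            # Terminal transitions bootstrap from zero.
            target = r if done else r + gamma * Q[s_next][a_next]
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s_next, a_next
            if done:
                break
    return Q


Q = sarsa()
```

In this sketch, evaluation and improvement are interleaved at every step rather than once per episode, which is the GPI pattern applied at the granularity of a single transition. Reading off `max(ACTIONS, key=lambda a: Q[s][a])` for each non-terminal state `s` gives the greedy policy implied by the learned table.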