Online Course Support | Practical Reinforcement Learning

Which of the following may complicate optimization in RL?


See the lecture for an illustration of the problem in which a negative feedback loop becomes a positive one after mean subtraction.
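
As a minimal numeric sketch (all reward values here are hypothetical, not from the lecture), the snippet below shows how subtracting the batch mean from the rewards can make a loop of mildly negative rewards look profitable:

```python
# Rewards collected along one pass of a cycle: each step costs -1,
# so the loop is correctly discouraged before any transformation.
loop_rewards = [-1.0, -1.0, -1.0]

# Harsher penalties elsewhere in the batch drag the mean down.
other_rewards = [-10.0, -10.0]
batch = loop_rewards + other_rewards

mean = sum(batch) / len(batch)                   # (-3 - 20) / 5 = -4.6
shifted_loop = [r - mean for r in loop_rewards]  # each -1 becomes +3.6

print(sum(loop_rewards))   # -3.0 -> looping is discouraged
print(sum(shifted_loop))   # ~10.8 -> after mean subtraction, looping looks profitable
```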


A sparse reward signal is hard to find, so extensive exploration may be required; as a result, the sample efficiency of the learning method degrades.
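
To make this concrete, here is a minimal sketch (the environment and constants are hypothetical) of a sparse-reward chain in which a randomly exploring agent almost never receives any learning signal:

```python
import random

CHAIN_LENGTH = 10   # hypothetical: the only reward sits 10 steps to the right
EPISODE_STEPS = 20  # hypothetical per-episode step budget

def random_episode():
    """Random walk from state 0; reward 1.0 only if the far end is reached."""
    state = 0
    for _ in range(EPISODE_STEPS):
        state = max(state + random.choice([-1, +1]), 0)
        if state == CHAIN_LENGTH:
            return 1.0
    return 0.0  # the vast majority of episodes end with zero signal

episodes = 10_000
hits = sum(random_episode() for _ in range(episodes))
print(f"reward observed in {hits / episodes:.2%} of episodes")  # typically a few percent
```

Almost every sample carries no learning signal at all, which is exactly the sample-efficiency degradation described above.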


Such positive feedback loops are reinforced by the agent, and it can get stuck in them forever, with no incentive to seek the correct behavior, which appears inferior in return because of the reward design errors.

Additionally, even if the reward design is correct, any positive feedback loop captures the agent's attention, slowing the discovery of the best possible return.
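
A small worked example (with hypothetical rewards and discount factor) of how a mis-designed positive loop can dominate the intended goal in discounted return:

```python
GAMMA = 0.99  # hypothetical discount factor

# Reward design error: +1 every step while circling a short loop.
loop_return = sum(GAMMA**t * 1.0 for t in range(10_000))  # ~ 1 / (1 - GAMMA) = 100

# Intended behavior: a single +50 reward after reaching the goal in 20 steps.
goal_return = GAMMA**20 * 50.0  # ~ 40.9

print(loop_return > goal_return)  # True: the loop "wins", so the agent stays in it
```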


Therefore, the exact value of the state-value function V(s) is the sum of an infinite series.
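
For reference, this is the standard discounted definition (assuming a discount factor γ ∈ [0, 1)); with a constant per-step reward it reduces to a geometric series:

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \;\middle|\; s_0 = s \right],
\qquad
\sum_{t=0}^{\infty} \gamma^{t} r = \frac{r}{1-\gamma} \quad \text{for constant reward } r.
```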

