Online Course Support | Practical Reinforcement Learning

Which of these are correct ways to alter the reward function?


Multiplying every reward by a positive constant rescales V(s) by the same factor, so the optimal actions do not change.
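A quick numerical check can make this concrete. The sketch below assumes the alteration in question is multiplying all rewards by a positive constant `c`. The small random MDP, the helper `value_iteration`, and the values of `gamma` and `c` are illustrative choices, not part of the course material.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-12, max_iters=100_000):
    """Hypothetical helper: tabular value iteration.

    P has shape (S, A, S) with transition probabilities,
    R has shape (S, A) with expected immediate rewards.
    Returns the converged state values and the greedy policy.
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iters):
        Q = R + gamma * (P @ V)          # action values, shape (S, A)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (P @ V)
    return V, Q.argmax(axis=1)

# A small random MDP (assumed for illustration): 4 states, 3 actions.
rng = np.random.default_rng(0)
P = rng.random((4, 3, 4))
P /= P.sum(axis=2, keepdims=True)        # normalize rows into probabilities
R = rng.normal(size=(4, 3))

c = 5.0                                   # positive scaling factor (assumed)
V, pi = value_iteration(P, R)
V_scaled, pi_scaled = value_iteration(P, c * R)

# Values scale by c, and the greedy (optimal) actions stay the same.
print(np.allclose(V_scaled, c * V, atol=1e-6))
print(np.array_equal(pi, pi_scaled))
```

The reason is that the Bellman optimality equation is linear in the rewards once the maximizing action is fixed, and multiplying every action value by the same positive number preserves their ordering.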

 
