RCPPO

In this project, I implemented Reward Constrained Policy Optimization (RCPO), from the paper by Tessler et al., on top of the stable-baselines3 implementation of PPO. I also reproduced the original results, tracking my experiments with Weights & Biases. The code for this project can be found here. In addition, I wrote an article elaborating on the theory of RCPO and my results and submitted it to the ICLR Blogposts Track! You can find the article here.


Hayden Kwok
