Preferential proximal policy optimization in reinforcement learning

Date

2023-12-01

Abstract

Proximal Policy Optimization (PPO), a policy gradient method, excels in reinforcement learning with its "surrogate" objective function and stochastic gradient ascent. However, PPO does not account for the significance of frequently encountered states when updating the policy and value function. To address this, this thesis introduces Preferential Proximal Policy Optimization (P3O), which integrates the importance of these states into parameter updates. We determine state importance by multiplying the variance of the action probabilities by the value function, then normalizing and smoothing the result with an Exponentially Weighted Moving Average (EWMA). This importance is incorporated into the surrogate objective function, redefining value and advantage estimation in PPO. Our method selects state importance automatically and can be applied to any on-policy reinforcement learning algorithm that uses a value function. Empirical evaluations on six Atari environments demonstrate that our approach outperforms the vanilla PPO baseline, highlighting the value of the proposed method for learning complex environments.
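A minimal sketch of the state-importance computation described above, assuming a batch of states, min-max normalization, and a standard EWMA recurrence; the function name, the smoothing factor, and the normalization scheme are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np

def preferential_weights(action_probs, values, prev_ewma=None, alpha=0.9, eps=1e-8):
    """Sketch of per-state importance weights.

    action_probs : (batch, n_actions) policy probabilities for each state.
    values       : (batch,) value-function estimates V(s).
    prev_ewma    : previously smoothed weights, or None on the first update.
    alpha        : EWMA smoothing factor (illustrative choice).
    """
    # Raw importance: spread of the action distribution times the state's value.
    raw = np.var(action_probs, axis=-1) * values
    # Normalize to a comparable scale across the batch (one plausible choice).
    norm = (raw - raw.min()) / (raw.max() - raw.min() + eps)
    # Smooth across updates with an exponentially weighted moving average.
    ewma = norm if prev_ewma is None else alpha * prev_ewma + (1.0 - alpha) * norm
    return ewma
```

The resulting weights would then scale the per-state terms of the surrogate objective and the value/advantage estimates, so that frequently encountered, high-variance states contribute more to the parameter update.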

Keywords

Reinforcement learning, Policy gradient methods, Deep learning, Policy optimization
