Publications

PEBL: Pessimistic Ensembles for Offline Deep Reinforcement Learning

Jordi Smit, Canmanie Ponnambalam, Matthijs T. J. Spaan, and Frans A. Oliehoek. PEBL: Pessimistic Ensembles for Offline Deep Reinforcement Learning. In Robust and Reliable Autonomy in the Wild, 2021. Workshop at IJCAI-21

Abstract

Offline reinforcement learning (RL), or learning from a fixed data set, is an attractive alternative to online RL. Offline RL promises to address the cost and safety implications of taking numerous random or bad actions online, a crucial aspect of traditional RL that makes it difficult to apply in real-world problems. However, when RL is naïvely applied to a fixed data set, the resulting policy may exhibit poor performance in the real environment. This happens due to over-estimation of the value of state-action pairs not sufficiently covered by the data set. A promising way to avoid this is by applying pessimism and acting according to a lower bound estimate on the value. It has been shown that penalizing the learned value according to a pessimistic bound on the uncertainty can drastically improve offline RL. In deep reinforcement learning, however, uncertainty estimation is highly non-trivial and development of effective uncertainty-based pessimistic algorithms remains an open question. This paper introduces two novel offline deep RL methods built on Double Deep Q-Learning and Soft Actor-Critic. We show how a multi-headed bootstrap approach to uncertainty estimation is used to calculate an effective pessimistic value penalty. Our approach is applied to benchmark offline deep RL domains, where we demonstrate that our methods can often beat the current state-of-the-art.
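
The idea of penalizing a value estimate by the disagreement among bootstrap heads can be illustrated with a small sketch. This is not the paper's algorithm: the penalty form (mean minus a multiple of the standard deviation across heads), the function name `pessimistic_q`, the coefficient `beta`, and the toy numbers are all assumptions chosen only to show how pessimism can change which action looks best.

```python
import numpy as np


def pessimistic_q(q_heads: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Return a lower-bound-style value estimate per action.

    q_heads has shape (n_heads, n_actions): one row of Q-value estimates
    per bootstrap head. The penalty used here (beta times the standard
    deviation across heads) is an illustrative assumption, not the
    penalty defined in the paper.
    """
    mean = q_heads.mean(axis=0)  # average estimate across heads
    std = q_heads.std(axis=0)    # disagreement across heads as an uncertainty proxy
    return mean - beta * std


# Hypothetical estimates from three heads for two actions.
# Action 0: all heads agree. Action 1: higher mean, but heads disagree strongly.
q_heads = np.array([
    [1.0, 2.0],
    [1.0, 4.0],
    [1.0, 0.0],
])

greedy_action = int(np.argmax(q_heads.mean(axis=0)))
pessimistic_action = int(np.argmax(pessimistic_q(q_heads, beta=1.0)))
```

In this toy case the plain mean prefers the action with the higher but uncertain estimate, while the penalized estimate prefers the action on which the heads agree. Setting `beta` to zero recovers the unpenalized mean.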

BibTeX Entry

@inproceedings{Smit21r2aw,
  title =        {{PEBL}: Pessimistic Ensembles for Offline Deep
                  Reinforcement Learning},
  author =       {Jordi Smit and Canmanie Ponnambalam and Matthijs
                 T. J. Spaan and Frans A. Oliehoek},
  year =         2021,
  booktitle =    {Robust and Reliable Autonomy in the Wild},
  note =         {Workshop at IJCAI-21}
}

Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
