Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL

Miguel Suau, Matthijs T. J. Spaan, and Frans A. Oliehoek. Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL. In European Workshop on Reinforcement Learning, 2023.

Abstract

Reinforcement learning agents may sometimes develop habits that are effective only when specific policies are followed. After an initial exploration phase in which agents try out different actions, they eventually converge toward a particular policy. When this occurs, the distribution of state-action trajectories becomes narrower, and agents start experiencing the same transitions again and again. At this point, spurious correlations may arise. Agents may then pick up on these correlations and learn state representations that do not generalize beyond the agent's trajectory distribution. In this paper, we provide a mathematical characterization of this phenomenon, which we refer to as policy confounding, and show, through a series of examples, when and how it occurs in practice.

BibTeX Entry

@InProceedings{Suau23ewrl,
  author =    {Miguel Suau and Matthijs T. J. Spaan and Frans A. Oliehoek},
  title =     {Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in {RL}},
  year =      2023,
  booktitle = {European Workshop on Reinforcement Learning},
}

Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.