Publications

Safe Policy Improvement with Baseline Bootstrapping in Factored Environments

Thiago D. Simão and Matthijs T. J. Spaan. Safe Policy Improvement with Baseline Bootstrapping in Factored Environments. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, pp. 4967–4974, 2019.

Abstract

We present a novel safe reinforcement learning algorithm that exploits the factored dynamics of the environment to become less conservative. We focus on problem settings in which a policy is already running and the interaction with the environment is limited. In order to safely deploy an updated policy, it is necessary to provide a confidence level regarding its expected performance. However, algorithms for safe policy improvement might require a large number of past experiences to become confident enough to change the agent's behavior. Factored reinforcement learning, on the other hand, is known to make good use of the data provided. It can achieve a better sample complexity by exploiting independence between features of the environment, but it lacks a confidence level. We study how to improve the sample efficiency of the safe policy improvement with baseline bootstrapping algorithm by exploiting the factored structure of the environment. Our main result is a theoretical bound that is linear in the number of parameters of the factored representation instead of the number of states. The empirical analysis shows that our method can improve the policy using a number of samples potentially one order of magnitude smaller than the flat algorithm.

BibTeX Entry

@InProceedings{Simao19aaai,
  author =    {Thiago D. Sim{\~a}o and Matthijs T. J. Spaan},
  title =     {Safe Policy Improvement with Baseline Bootstrapping in Factored Environments},
  booktitle = {Proceedings of the 32nd AAAI Conference on Artificial Intelligence},
  pages =     {4967--4974},
  year =      2019
}

Note: This material is presented to ensure timely dissemination of scholarly and technical work.
Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.