Interval Q-Learning: Balancing Deep and Wide Exploration

Grigory Neustroev, Canmanie T. Ponnambalam, Mathijs M. de Weerdt, and Matthijs T. J. Spaan. Interval Q-Learning: Balancing Deep and Wide Exploration. In Adaptive and Learning Agents, 2020. Workshop at AAMAS20.

Abstract

Reinforcement learning requires exploration, leading to repeated execution of sub-optimal actions. Naive exploration techniques address this problem by changing gradually from exploration to exploitation. This approach employs a wide search resulting in exhaustive exploration and low sample-efficiency. More advanced search methods explore optimistically based on an upper bound estimate of expected rewards. These methods employ deep search, aiming to reach states not previously visited. Another deep search strategy is found in action-elimination methods, which aim to discover and eliminate sub-optimal actions. Despite the effectiveness of advanced deep search strategies, some problems are better suited to naive exploration. We devise a new method, called Interval Q-Learning, that finds a balance between wide and deep search. It assigns a small probability to taking sub-optimal actions and combines both greedy and optimistic exploration. This allows for fast convergence to a near-optimal policy, and then exploration around it. We demonstrate the performance of tabular and deep Q-network versions of Interval Q-Learning, showing that it offers convergence speed-up both in problems that favor wide exploration methods and those that favor deep search strategies.

BibTeX Entry

@InProceedings{Neustroev20,
  author = {Grigory Neustroev and Canmanie T. Ponnambalam and Mathijs M. de Weerdt and Matthijs T. J. Spaan},
  title = {Interval {Q}-Learning: Balancing Deep and Wide Exploration},
  booktitle = {Adaptive and Learning Agents},
  year = 2020,
  note = {Workshop at AAMAS20}
}

Note: This material is presented to ensure timely dissemination of scholarly and technical work.
Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.