Bayesian Q-learning from Imperfect Expert Demonstrations
Author: Fengdi Che
Publisher:
Total Pages:
Release: 2021
ISBN-10: OCLC:1291127672
ISBN-13:
Rating: 4/5 (72 Downloads)
Download or read book Bayesian Q-learning from Imperfect Expert Demonstrations, written by Fengdi Che. This book was released in 2021. Available in PDF, EPUB and Kindle.

Book excerpt: "Deep exploration plays a vital role in sequential decision-making tasks, but it is often difficult to achieve. One solution is to guide exploration with expert demonstrations, but current algorithms often overuse expert information and undermine performance. This thesis proposes a novel algorithm that speeds up Q-learning with the help of a limited amount of imperfect expert demonstrations while avoiding excessive reliance on expert data. Based on a Bayesian framework, our algorithm relaxes the assumption that expert data are optimal and alleviates the performance limitation caused by expert data. It gradually overlooks misleading expert information and reduces the use of uninformative data. Furthermore, our Bayesian Q-learning model, based on a generalized extended Kalman filter, requires fewer restrictions on value functions and decreases Q-value approximation errors. Additionally, our algorithm is simple to implement and amenable to statistical analysis. We prove a tight Bayes regret bound for our algorithm, demonstrating the advantage of expert data. Experimentally, we evaluate our approach on a sparse-reward chain environment and six more complicated Atari games with delayed rewards. Our method performs well on tasks with challenging exploration issues and improves the convergence of Q-values in the presence of sub-optimal expert demonstrations. We further test our approach on an environment of multiple randomly generated mazes and conclude that our current algorithm needs improvement in its generalization to unseen states. In general, our novel approach to utilizing expert demonstrations shows better results than a state-of-the-art method, Deep Q-learning from Demonstrations (Hester et al., 2017), in most environments."--
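The excerpt's core ideas can be illustrated with a minimal sketch: Q-values maintained as Gaussians (mean and variance), TD targets treated as noisy observations updated with a Kalman-style gain, and expert demonstrations folded in as a second observation stream whose assumed noise grows over time, so imperfect demonstrations are gradually discounted. All names, noise values, and the tabular setting below are illustrative assumptions; the thesis itself uses a generalized extended Kalman filter over function approximators, not this simplified tabular form.

```python
import numpy as np

# Illustrative tabular sketch (not the thesis's actual algorithm):
# Q-values are Gaussian beliefs; TD targets and expert hints act as
# noisy observations combined via a Kalman gain.
n_states, n_actions = 5, 2
gamma = 0.9

q_mean = np.zeros((n_states, n_actions))   # posterior mean of Q(s, a)
q_var = np.ones((n_states, n_actions))     # posterior variance of Q(s, a)
obs_noise = 0.5                            # assumed TD-target observation noise
expert_noise0 = 0.1                        # assumed initial expert-observation noise

def kalman_q_update(s, a, r, s_next, done):
    """Treat the TD target as a noisy observation of Q(s, a)."""
    target = r + (0.0 if done else gamma * q_mean[s_next].max())
    gain = q_var[s, a] / (q_var[s, a] + obs_noise)   # Kalman gain
    q_mean[s, a] += gain * (target - q_mean[s, a])
    q_var[s, a] *= (1.0 - gain)

def expert_update(s, a_expert, t):
    """Expert action hints Q(s, a_expert) should match the best value;
    the assumed noise inflates with t, so expert data is gradually ignored."""
    noise = expert_noise0 * (1.0 + t)
    target = q_mean[s].max()
    gain = q_var[s, a_expert] / (q_var[s, a_expert] + noise)
    q_mean[s, a_expert] += gain * max(0.0, target - q_mean[s, a_expert])
    q_var[s, a_expert] *= (1.0 - gain)

# one illustrative transition, plus an expert hint at the same state
kalman_q_update(s=0, a=1, r=1.0, s_next=1, done=False)
expert_update(s=0, a_expert=1, t=0)
```

The shrinking variance plays the role of the Bayesian posterior: actions observed often (from either stream) become more certain, while the time-inflated expert noise caps how much influence sub-optimal demonstrations retain.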