Open Access
Lecture Notes in Computer Science, pages 30-44
Exploration Versus Exploitation in Model-Based Reinforcement Learning: An Empirical Study
Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil
Publication type: Book Chapter
Publication date: 2022-11-18
Journal: Lecture Notes in Computer Science
Q2
SJR: 0.606
CiteScore: 2.6
Impact factor: —
ISSN: 0302-9743, 1611-3349, 1861-2075, 1861-2083
Abstract
Model-based Reinforcement Learning (MBRL) agents use data collected by exploration of the environment to produce a model of the dynamics, which is then used to select a policy that maximizes the objective function. Stochastic Value Gradient (SVG) methods perform the latter step by optimizing some estimate of the value function gradient. Despite showing promising empirical results, many implementations of SVG methods lack rigorous theoretical or empirical justification; this casts doubt on whether their good performance is in large part due to benchmark overfitting. To better understand the advantages and shortcomings of existing SVG methods, in this work we carry out a fine-grained empirical analysis of three core components of SVG-based agents: (i) the gradient estimator formula, (ii) model learning, and (iii) value function approximation. To this end, we extend previous work that proposes using Linear Quadratic Gaussian (LQG) regulator problems to benchmark SVG methods. LQG problems are heavily studied in the optimal control literature and deliver challenging learning settings while still allowing comparison with ground-truth values. We use such problems to investigate the contribution of each core component of SVG methods to the overall performance. We focus our analysis on the model learning component, which was neglected in previous work, and we show that overfitting to on-policy data can lead to accurate state predictions but inaccurate gradients, highlighting the importance of exploration in model-based methods as well.
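The paper itself is not reproduced here, but the core idea the abstract describes can be illustrated with a minimal, hypothetical sketch: on an LQG problem, an SVG-style estimator differentiates a Monte Carlo return directly through the (reparameterized) stochastic dynamics with respect to the policy parameters. Everything below (the 1-D dynamics constants a, b, the cost weights q, r, and the function rollout_return) is an assumption for illustration, not code or notation from the paper.

```python
# Hypothetical sketch of a Stochastic Value Gradient estimate on a 1-D LQG
# problem: linear dynamics s' = a*s + b*u + sigma*eps, quadratic cost,
# linear policy u = k*s. The pathwise gradient is taken w.r.t. the gain k
# by fixing the noise sample (reparameterization) and using autodiff.
import jax
import jax.numpy as jnp

a, b = 0.9, 0.5    # assumed linear dynamics coefficients
q, r = 1.0, 0.1    # assumed quadratic state/action cost weights
sigma = 0.1        # process-noise standard deviation
horizon = 20

def rollout_return(k, s0, noise):
    """Cumulative cost of one rollout under the linear policy u = k * s."""
    def step(s, eps):
        u = k * s
        cost = q * s**2 + r * u**2
        s_next = a * s + b * u + sigma * eps  # reparameterized transition
        return s_next, cost
    _, costs = jax.lax.scan(step, s0, noise)
    return jnp.sum(costs)

# SVG-style estimator: gradient of the sampled return w.r.t. the gain k.
key = jax.random.PRNGKey(0)
noise = jax.random.normal(key, (horizon,))
svg = jax.grad(rollout_return)(jnp.array(-0.5), jnp.array(1.0), noise)
print(svg)  # single-trajectory pathwise gradient estimate
```

Because LQG problems admit closed-form optimal policies and value functions, a gradient estimate like this one can be checked against the ground-truth gradient, which is what makes LQG a useful benchmark for the components the paper analyzes.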
Cite this
GOST
Lovatto Â. G., de Barros L. N., Mauá D. D. Exploration Versus Exploitation in Model-Based Reinforcement Learning: An Empirical Study // Lecture Notes in Computer Science. 2022. pp. 30-44.
RIS
TY - CHAP
DO - 10.1007/978-3-031-21689-3_3
UR - https://doi.org/10.1007/978-3-031-21689-3_3
TI - Exploration Versus Exploitation in Model-Based Reinforcement Learning: An Empirical Study
T2 - Lecture Notes in Computer Science
AU - Lovatto, Ângelo Gregório
AU - de Barros, Leliane Nunes
AU - Mauá, Denis Deratani
PY - 2022
DA - 2022/11/18
PB - Springer Nature
SP - 30-44
SN - 0302-9743
SN - 1611-3349
SN - 1861-2075
SN - 1861-2083
ER -
BibTeX
@incollection{2022_Lovatto,
author = {Ângelo Gregório Lovatto and Leliane Nunes de Barros and Denis Deratani Mauá},
title = {Exploration Versus Exploitation in Model-Based Reinforcement Learning: An Empirical Study},
booktitle = {Lecture Notes in Computer Science},
publisher = {Springer Nature},
year = {2022},
month = {nov},
pages = {30--44},
doi = {10.1007/978-3-031-21689-3_3}
}