Open Access
Lecture Notes in Computer Science, pages 30-44
Exploration Versus Exploitation in Model-Based Reinforcement Learning: An Empirical Study
Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil
Publication type: Book Chapter
Publication date: 2022-11-18
Journal: Lecture Notes in Computer Science
Q2
SJR: 0.606
CiteScore: 2.6
Impact factor: —
ISSN: 0302-9743, 1611-3349, 1861-2075, 1861-2083
Abstract
Model-based Reinforcement Learning (MBRL) agents use data collected by exploration of the environment to produce a model of the dynamics, which is then used to select a policy that maximizes the objective function. Stochastic Value Gradient (SVG) methods perform the latter step by optimizing some estimate of the value function gradient. Despite showing promising empirical results, many implementations of SVG methods lack rigorous theoretical or empirical justification; this casts doubt on whether their good performance is in large part due to benchmark overfitting. To better understand the advantages and shortcomings of existing SVG methods, in this work we carry out a fine-grained empirical analysis of three core components of SVG-based agents: (i) the gradient estimator formula, (ii) model learning, and (iii) value function approximation. To this end, we extend previous work that proposes using Linear Quadratic Gaussian (LQG) regulator problems to benchmark SVG methods. LQG problems are heavily studied in the optimal control literature and deliver challenging learning settings while still allowing comparison with ground-truth values. We use such problems to investigate the contribution of each core component of SVG methods to the overall performance. We focus our analysis on the model learning component, which was neglected in previous work, and we show that overfitting to on-policy data can lead to accurate state predictions but inaccurate gradients, highlighting the importance of exploration in model-based methods as well.
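The paper itself is not reproduced here, but the core idea the abstract describes can be illustrated with a minimal, hypothetical sketch: on an LQG problem, an SVG-style estimator differentiates a Monte Carlo return directly through the (reparameterized) stochastic dynamics with respect to the policy parameters. Everything below (the 1-D dynamics constants a, b, the cost weights q, r, and the function rollout_return) is an assumption for illustration, not code or notation from the paper.

```python
# Hypothetical sketch of a Stochastic Value Gradient estimate on a 1-D LQG
# problem: linear dynamics s' = a*s + b*u + sigma*eps, quadratic cost,
# linear policy u = k*s. The pathwise gradient is taken w.r.t. the gain k
# by fixing the noise sample (reparameterization) and using autodiff.
import jax
import jax.numpy as jnp

a, b = 0.9, 0.5    # assumed linear dynamics coefficients
q, r = 1.0, 0.1    # assumed quadratic state/action cost weights
sigma = 0.1        # process-noise standard deviation
horizon = 20

def rollout_return(k, s0, noise):
    """Cumulative cost of one rollout under the linear policy u = k * s."""
    def step(s, eps):
        u = k * s
        cost = q * s**2 + r * u**2
        s_next = a * s + b * u + sigma * eps  # reparameterized transition
        return s_next, cost
    _, costs = jax.lax.scan(step, s0, noise)
    return jnp.sum(costs)

# SVG-style estimator: gradient of the sampled return w.r.t. the gain k.
key = jax.random.PRNGKey(0)
noise = jax.random.normal(key, (horizon,))
svg = jax.grad(rollout_return)(jnp.array(-0.5), jnp.array(1.0), noise)
print(svg)  # single-trajectory pathwise gradient estimate
```

Because LQG problems admit closed-form optimal policies and value functions, a gradient estimate like this one can be checked against the ground-truth gradient, which is what makes LQG a useful benchmark for the components the paper analyzes.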
Cite this
GOST
Lovatto Â. G., de Barros L. N., Mauá D. D. Exploration Versus Exploitation in Model-Based Reinforcement Learning: An Empirical Study // Lecture Notes in Computer Science. 2022. pp. 30-44.
RIS
TY - CHAP
DO - 10.1007/978-3-031-21689-3_3
UR - https://doi.org/10.1007/978-3-031-21689-3_3
TI - Exploration Versus Exploitation in Model-Based Reinforcement Learning: An Empirical Study
T2 - Lecture Notes in Computer Science
AU - Lovatto, Ângelo Gregório
AU - de Barros, Leliane Nunes
AU - Mauá, Denis Deratani
PY - 2022
DA - 2022/11/18
PB - Springer Nature
SP - 30-44
SN - 0302-9743
SN - 1611-3349
SN - 1861-2075
SN - 1861-2083
ER -
BibTeX
@incollection{2022_Lovatto,
author = {Ângelo Gregório Lovatto and Leliane Nunes de Barros and Denis Deratani Mauá},
title = {Exploration Versus Exploitation in Model-Based Reinforcement Learning: An Empirical Study},
booktitle = {Lecture Notes in Computer Science},
publisher = {Springer Nature},
year = {2022},
month = {nov},
pages = {30--44},
doi = {10.1007/978-3-031-21689-3_3}
}