Open Access

Lecture Notes in Computer Science, volume 13067 LNAI, pages 108-120

Long-Term Exploration in Persistent MDPs

Ugadiarov Leonid ^{1}

Skrynnik Alexey ^{1, 2}

Panov Aleksandr ^{1, 2}

Moscow Institute of Physics and Technology, Moscow, Russia

Artificial Intelligence Research Institute FRC CSC RAS, Moscow, Russia

Federal Research Center Computer Science and Control of the Russian Academy of Sciences

Publication type: Book Chapter

Publication date: 2021-10-20

Publisher: Springer Nature

Journal: Lecture Notes in Computer Science

ISSN: 0302-9743, 1611-3349, 1861-2075, 1861-2083

DOI: 10.1007/978-3-030-89817-5_8

Abstract

Exploration is an essential part of reinforcement learning, and it limits the quality of the learned policy. Hard-exploration environments are characterized by huge state spaces and sparse rewards. Under such conditions, exhaustive exploration of the environment is often impossible, and successfully training an agent requires many interaction steps. In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which uses the concept of a persistent Markov decision process, in which the agent can roll back to previously visited states during training. We test our algorithm in the hard-exploration Prince of Persia game, without rewards or domain knowledge. On all levels of the game used, our agent outperforms or performs comparably to state-of-the-art curiosity methods with knowledge-based intrinsic motivation: ICM and RND. An implementation of RbExplore can be found at https://github.com/cds-mipt/RbExplore.
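
The idea of rolling back to visited states can be illustrated with a minimal sketch. This is not the RbExplore algorithm and is not taken from the authors' repository; it only shows, under stated assumptions, what a persistent environment interface might look like. The class `ChainEnv`, its methods `save_state` and `load_state`, and the function `explore` are hypothetical names introduced for illustration, and the uniform random choice of a rollback state is a placeholder for whatever selection rule a real method would use.

```python
import random


class ChainEnv:
    """Toy deterministic 1-D chain (hypothetical stand-in for a persistent environment)."""

    def __init__(self, length=50):
        self.length = length
        self.pos = 0

    def step(self, action):
        # action is -1 or +1; the position is clamped to [0, length - 1]
        self.pos = max(0, min(self.length - 1, self.pos + action))
        return self.pos

    def save_state(self):
        # Persistence assumption: the full environment state can be captured...
        return self.pos

    def load_state(self, state):
        # ...and restored later, so the agent can roll back to a visited state.
        self.pos = state


def explore(env, iterations=200, rollout_len=5, seed=0):
    """Reward-free exploration: roll back to an archived state, then act randomly."""
    rng = random.Random(seed)
    archive = {env.save_state()}  # set of states visited so far
    for _ in range(iterations):
        # Placeholder rule: pick a rollback state uniformly from the archive.
        env.load_state(rng.choice(sorted(archive)))
        for _ in range(rollout_len):
            env.step(rng.choice((-1, 1)))
            archive.add(env.save_state())
    return archive


visited = explore(ChainEnv())
print(f"distinct states visited: {len(visited)}")
```

Because each rollout can start from any archived state rather than from the initial one, the agent does not have to re-traverse the path to a distant state before exploring beyond it, which is the property the abstract attributes to persistent MDPs.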

Citations by journals: Neural Computing and Applications, 1 publication (100%)

Citations by publishers: Springer Nature, 1 publication (100%)

Publications without a DOI are not taken into account.
Statistics are recalculated only for publications connected to researchers, organizations, and labs registered on the platform.
Statistics are recalculated weekly.

Citations per year: 2022, 1 citation

Cite this

GOST

Ugadiarov L. et al. Long-Term Exploration in Persistent MDPs // Lecture Notes in Computer Science. 2021. Vol. 13067 LNAI. pp. 108-120.

GOST, all authors

Ugadiarov L., Skrynnik A., Panov A. Long-Term Exploration in Persistent MDPs // Lecture Notes in Computer Science. 2021. Vol. 13067 LNAI. pp. 108-120.

RIS

TY  - CHAP
DO  - 10.1007/978-3-030-89817-5_8
UR  - https://doi.org/10.1007%2F978-3-030-89817-5_8
TI  - Long-Term Exploration in Persistent MDPs
T2  - Lecture Notes in Computer Science
AU  - Ugadiarov, Leonid
AU  - Skrynnik, Alexey
AU  - Panov, Aleksandr
PY  - 2021
DA  - 2021/10/20
PB  - Springer Nature
SP  - 108
EP  - 120
VL  - 13067 LNAI
SN  - 0302-9743
SN  - 1611-3349
SN  - 1861-2075
SN  - 1861-2083
ER  -

BibTeX

@incollection{2021_Ugadiarov,
  author = {Leonid Ugadiarov and Alexey Skrynnik and Aleksandr Panov},
  title = {Long-Term Exploration in Persistent MDPs},
  booktitle = {Lecture Notes in Computer Science},
  publisher = {Springer Nature},
  year = {2021},
  volume = {13067 LNAI},
  pages = {108--120},
  month = {oct},
  doi = {10.1007/978-3-030-89817-5_8}
}

ISSN

0302-9743 (Print)
1611-3349 (Electronic)
1861-2075 (Print)
1861-2083 (Electronic)

Labs

MIPT Center for Cognitive Modeling

Profiles

Panov, Aleksandr I