Open Access
IEEE Access, volume 9, pages 126034-126047
Hybrid Policy Learning for Multi-Agent Pathfinding
Publication type: Journal Article
Publication date: 2021-09-09
General Materials Science
General Engineering
General Computer Science
Abstract
In this work, we study the behavior of groups of autonomous vehicles that are part of Internet of Vehicles systems. One of the challenging modes of operation of such systems is the case when the observability of each vehicle is limited and global/local communication is unstable, e.g., in crowded parking lots. In such scenarios, the vehicles have to rely on local observations and exhibit cooperative behavior to ensure safe and efficient trips. This type of problem can be abstracted to so-called multi-agent pathfinding, in which a group of agents confined to a graph have to find collision-free paths to their goals (ideally, minimizing an objective function, e.g., travel time). Widely used algorithms for solving this problem rely on the assumption that a central controller exists that knows the full state of the environment (i.e., the agents' current positions, their targets, the configuration of static obstacles, etc.), and they cannot be straightforwardly adapted to partially observable setups. To this end, we suggest a novel approach based on decomposing the problem into two sub-tasks: reaching the goal and avoiding collisions. To accomplish each of these tasks, we utilize reinforcement learning methods such as Deep Monte Carlo Tree Search, Q-mixing networks, and policy gradient methods to design policies that map the agents' observations to actions. Next, we introduce a policy-mixing mechanism to end up with a single hybrid policy that allows each agent to exhibit both types of behavior: the individual one (reaching the goal) and the cooperative one (avoiding collisions with other agents). We conduct an extensive empirical evaluation showing that the suggested hybrid policy outperforms standalone state-of-the-art reinforcement learning methods for this kind of problem by a notable margin.
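The abstract's policy-mixing idea can be illustrated with a minimal sketch on a grid. The two sub-policies here are hand-written stand-ins (in the paper they are learned, e.g., via Deep Monte Carlo Tree Search or policy gradients), and the function names, observation format, and distance-based switching rule are assumptions made for this example, not the authors' exact method.

```python
def _sign(x):
    return (x > 0) - (x < 0)

def goal_policy(obs):
    """Individual behavior: greedy step toward the goal cell."""
    dx = obs["goal"][0] - obs["pos"][0]
    dy = obs["goal"][1] - obs["pos"][1]
    return (_sign(dx), 0) if abs(dx) >= abs(dy) else (0, _sign(dy))

def avoid_policy(obs):
    """Cooperative behavior: step away from the nearest visible agent."""
    px, py = obs["pos"]
    nearest = min(obs["agents"], key=lambda a: abs(a[0] - px) + abs(a[1] - py))
    dx, dy = px - nearest[0], py - nearest[1]
    return (_sign(dx), 0) if abs(dx) >= abs(dy) else (0, _sign(dy))

def hybrid_policy(obs, radius=2):
    """Mixer: act cooperatively when another agent is within the observation
    radius (Manhattan distance), otherwise pursue the goal individually."""
    px, py = obs["pos"]
    close = [a for a in obs["agents"] if abs(a[0] - px) + abs(a[1] - py) <= radius]
    if close:
        return avoid_policy({**obs, "agents": close})
    return goal_policy(obs)
```

In the paper the mixing is learned rather than a hard distance threshold, but the structure is the same: one component drives the agent to its goal, the other handles interactions with nearby agents, and the mixer decides which behavior dominates at each step.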
Citations by journals
- Lecture Notes in Computer Science: 2 publications, 20%
- PeerJ Computer Science: 1 publication, 10%
- Doklady Mathematics: 1 publication, 10%
- IEEE Access: 1 publication, 10%
- Studies in Computational Intelligence: 1 publication, 10%
- Journal of Marine Science and Engineering: 1 publication, 10%
- Applied Intelligence: 1 publication, 10%
- Knowledge-Based Systems: 1 publication, 10%
Citations by publishers
- Springer Nature: 4 publications, 40%
- IEEE: 2 publications, 20%
- PeerJ: 1 publication, 10%
- Pleiades Publishing: 1 publication, 10%
- Multidisciplinary Digital Publishing Institute (MDPI): 1 publication, 10%
- Elsevier: 1 publication, 10%
- Publications without a DOI are not taken into account.
- Statistics are recalculated only for publications connected to researchers, organizations, and labs registered on the platform.
- Statistics are recalculated weekly.
Citations per year
- 2022: 4 citations (40%)
- 2023: 5 citations (50%)
- 2024: 1 citation (10%)
Cite this
GOST
Skrynnik A. et al. Hybrid Policy Learning for Multi-Agent Pathfinding // IEEE Access. 2021. Vol. 9. pp. 126034-126047.
GOST all authors (up to 50)
Skrynnik A., Yakovleva A., Davydov V., Yakovlev K., Panov A. Hybrid Policy Learning for Multi-Agent Pathfinding // IEEE Access. 2021. Vol. 9. pp. 126034-126047.
RIS
TY - JOUR
DO - 10.1109/ACCESS.2021.3111321
UR - https://doi.org/10.1109%2FACCESS.2021.3111321
TI - Hybrid Policy Learning for Multi-Agent Pathfinding
T2 - IEEE Access
AU - Skrynnik, Alexey
AU - Yakovleva, Alexandra
AU - Davydov, Vasilii
AU - Yakovlev, Konstantin
AU - Panov, Aleksandr
PY - 2021
DA - 2021/09/09 00:00:00
PB - IEEE
SP - 126034
EP - 126047
VL - 9
SN - 2169-3536
ER -
BibTex
@article{2021_Skrynnik,
author = {Alexey Skrynnik and Alexandra Yakovleva and Vasilii Davydov and Konstantin Yakovlev and Aleksandr Panov},
title = {Hybrid Policy Learning for Multi-Agent Pathfinding},
journal = {IEEE Access},
year = {2021},
volume = {9},
publisher = {IEEE},
month = {sep},
url = {https://doi.org/10.1109%2FACCESS.2021.3111321},
pages = {126034--126047},
doi = {10.1109/ACCESS.2021.3111321}
}