License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CONCUR.2020.21
URN: urn:nbn:de:0030-drops-128332
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/12833/


Hahn, Ernst Moritz ; Perez, Mateo ; Schewe, Sven ; Somenzi, Fabio ; Trivedi, Ashutosh ; Wojtczak, Dominik

Model-Free Reinforcement Learning for Stochastic Parity Games

pdf-format:
LIPIcs-CONCUR-2020-21.pdf (0.6 MB)


Abstract

This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena (a stochastic game graph with unknown but fixed probability distributions) to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games parameterized by ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as ε tends to 0. Since this reduction does not require knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1½-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on an experimental evaluation of both reductions.
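The abstract's guarantee is a limit statement: writing val(G) for the value of the stochastic parity game and val(G_ε) for the value of the ε-parameterized reachability game, val(G) = lim_{ε→0} val(G_ε). As a purely illustrative sketch (not the authors' implementation): on a turn-based arena, the minimax Q-learning backup degenerates to a max over actions at Max states and a min at Min states, so the value of the reachability game can be estimated without a model of the transition probabilities. The game interface below (reset, owner, actions, step) is a hypothetical placeholder, and the reward is assumed to be 1 exactly on entering the target sink, so the undiscounted return equals the reachability probability.

import random
from collections import defaultdict

def minimax_q(game, episodes=100_000, alpha=0.1, explore=0.1):
    """Tabular Q-learning sketch for a turn-based stochastic reachability game.

    Hypothetical interface: game.reset() -> state; game.owner(s) -> 'MAX'/'MIN';
    game.actions(s) -> non-empty action list; game.step(s, a) -> (s', r, done),
    with r = 1 only on reaching the target sink.
    """
    Q = defaultdict(float)  # Q[(state, action)], initialized to 0
    for _ in range(episodes):
        s, done = game.reset(), False
        while not done:
            acts = game.actions(s)
            if random.random() < explore:        # epsilon-greedy exploration
                a = random.choice(acts)
            else:                                # greedy for the acting player
                pick = max if game.owner(s) == 'MAX' else min
                a = pick(acts, key=lambda b: Q[(s, b)])
            s2, r, done = game.step(s, a)
            if done:
                cont = 0.0
            else:                                # owner's backup at the successor
                back = max if game.owner(s2) == 'MAX' else min
                cont = back(Q[(s2, b)] for b in game.actions(s2))
            Q[(s, a)] += alpha * (r + cont - Q[(s, a)])  # undiscounted TD update
            s = s2
    return Q

Since the paper's result concerns the limit ε → 0, one would in practice run such a learner on the reduced game for a decreasing sequence of ε values and read off the trend of the estimated values.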

BibTeX Entry

@InProceedings{hahn_et_al:LIPIcs:2020:12833,
  author =	{Ernst Moritz Hahn and Mateo Perez and Sven Schewe and Fabio Somenzi and Ashutosh Trivedi and Dominik Wojtczak},
  title =	{{Model-Free Reinforcement Learning for Stochastic Parity Games}},
  booktitle =	{31st International Conference on Concurrency Theory (CONCUR 2020)},
  pages =	{21:1--21:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-160-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{171},
  editor =	{Igor Konnov and Laura Kov{\'a}cs},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2020/12833},
  URN =		{urn:nbn:de:0030-drops-128332},
  doi =		{10.4230/LIPIcs.CONCUR.2020.21},
  annote =	{Keywords: Reinforcement learning, Stochastic games, Omega-regular objectives}
}

Keywords: Reinforcement learning, Stochastic games, Omega-regular objectives
Collection: 31st International Conference on Concurrency Theory (CONCUR 2020)
Issue Date: 2020
Date of publication: 26.08.2020

