License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CONCUR.2020.3
URN: urn:nbn:de:0030-drops-128155
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/12815/


Jansen, Nils; Könighofer, Bettina; Junges, Sebastian; Serban, Alex; Bloem, Roderick

Safe Reinforcement Learning Using Probabilistic Shields (Invited Paper)

PDF: LIPIcs-CONCUR-2020-3.pdf (3 MB)


Abstract

This paper concerns the efficient construction of a safety shield for reinforcement learning. We specifically target scenarios that incorporate uncertainty and use Markov decision processes (MDPs) as the underlying model to capture such problems. Reinforcement learning (RL) is a machine learning technique that can determine near-optimal policies in MDPs that may be unknown before exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables RL decision-making to adhere to safety constraints with high probability. We employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. These results help to realize a shield that, when applied to an RL algorithm, prevents the agent from taking unsafe actions while still optimizing the performance objective. We discuss the tradeoff between making sufficient progress in exploring the environment and ensuring safety. In our experiments, on the arcade game PAC-MAN and on a case study involving service robots, we demonstrate that shielding increases learning efficiency: the shielded agent needs orders of magnitude fewer episodes.
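
To make the shield construction concrete, below is a minimal Python sketch of delta-shielding on a toy MDP. All names here (mdp, UNSAFE, shield, delta, horizon) are illustrative assumptions rather than code from the paper; in the paper, the per-action safety values are computed by a probabilistic model checker on a safety-relevant fragment of the MDP, whereas this sketch uses plain finite-horizon value iteration.

# Minimal sketch of a probabilistic shield on a toy MDP (illustrative only).
# MDP encoding: state -> action -> list of (successor, probability) pairs.
mdp = {
    "s0":  {"a": [("s1", 0.9), ("bad", 0.1)],
            "b": [("s1", 0.5), ("bad", 0.5)]},
    "s1":  {"a": [("s1", 1.0)]},
    "bad": {"a": [("bad", 1.0)]},
}
UNSAFE = {"bad"}  # hypothetical set of unsafe states

def state_safety(state, horizon):
    # Best achievable probability of avoiding UNSAFE for `horizon` steps.
    if state in UNSAFE:
        return 0.0
    if horizon == 0:
        return 1.0
    return max(action_safety(state, a, horizon) for a in mdp[state])

def action_safety(state, action, horizon):
    # Probability of staying safe for `horizon` steps when taking `action`
    # now and acting maximally safely afterwards (finite-horizon recursion;
    # the paper obtains these values via probabilistic model checking).
    return sum(p * state_safety(nxt, horizon - 1)
               for nxt, p in mdp[state][action])

def shield(state, delta=0.9, horizon=10):
    # Delta-shielding: allow an action iff its safety value is at least
    # `delta` times the best safety value achievable in this state.
    values = {a: action_safety(state, a, horizon) for a in mdp[state]}
    best = max(values.values())
    return [a for a, v in values.items() if v >= delta * best]

print(shield("s0"))  # ['a']: action 'b' (safety 0.5 < 0.9 * 0.9) is blocked

The delta parameter captures the tradeoff discussed in the abstract: delta = 1 permits only maximally safe actions, while smaller values leave the learner more freedom to explore at a bounded increase in risk.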

BibTeX Entry

@InProceedings{jansen_et_al:LIPIcs:2020:12815,
  author =	{Nils Jansen and Bettina K{\"o}nighofer and Sebastian Junges and Alex Serban and Roderick Bloem},
  title =	{{Safe Reinforcement Learning Using Probabilistic Shields (Invited Paper)}},
  booktitle =	{31st International Conference on Concurrency Theory (CONCUR 2020)},
  pages =	{3:1--3:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-160-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{171},
  editor =	{Igor Konnov and Laura Kov{\'a}cs},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2020/12815},
  URN =		{urn:nbn:de:0030-drops-128155},
  doi =		{10.4230/LIPIcs.CONCUR.2020.3},
  annote =	{Keywords: Safe Reinforcement Learning, Formal Verification, Safe Exploration, Model Checking, Markov Decision Process}
}

Keywords: Safe Reinforcement Learning, Formal Verification, Safe Exploration, Model Checking, Markov Decision Process
Collection: 31st International Conference on Concurrency Theory (CONCUR 2020)
Issue Date: 2020
Date of publication: 26.08.2020
Supplementary Material: http://shieldrl.nilsjansen.org

