License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.DISC.2017.36
URN: urn:nbn:de:0030-drops-80055
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2017/8005/
Go to the corresponding LIPIcs Volume Portal


Michael, Ellis ; Ports, Dan R. K. ; Sharma, Naveen Kr. ; Szekeres, Adriana

Recovering Shared Objects Without Stable Storage

pdf-format:
LIPIcs-DISC-2017-36.pdf (0.5 MB)


Abstract

This paper considers the problem of building fault-tolerant shared objects when processes can crash and recover but lose their persistent state on recovery. This Diskless Crash-Recovery (DCR) model matches the way many long-lived systems are built. We show that it presents new challenges, as operations that are recorded at a quorum may not persist after some of the processes in that quorum crash and then recover.

To address this problem, we introduce the notion of crash-consistent quorums, where no recoveries happen during the quorum responses. We show that relying on crash-consistent quorums enables a recovery procedure that can recover all operations that successfully finished. Crash-consistent quorums can be easily identified using a mechanism we term the crash vector, which tracks the causal relationship between crashes, recoveries, and other operations.

We apply crash-consistent quorums and crash vectors to build two storage primitives. We give a new algorithm for multi-writer, multi-reader atomic registers in the DCR model that guarantees safety under all conditions and termination under a natural condition. It improves on the best prior protocol for this problem by requiring fewer rounds, fewer nodes to participate in the quorum, and a less restrictive liveness condition. We also present a more efficient single-writer, single-reader atomic set - a virtual stable storage abstraction. It can be used to lift any existing algorithm from the traditional Crash-Recovery model to the DCR model. We examine a specific application, state machine replication, and show that existing diskless protocols can violate their correctness guarantees, while ours offers a general and correct solution.

BibTeX - Entry

@InProceedings{michael_et_al:LIPIcs:2017:8005,
  author =	{Ellis Michael and Dan R. K. Ports and Naveen Kr. Sharma and Adriana Szekeres},
  title =	{{Recovering Shared Objects Without Stable Storage}},
  booktitle =	{31st International Symposium on Distributed Computing (DISC 2017)},
  pages =	{36:1--36:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-053-8},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{91},
  editor =	{Andr{\'e}a W. Richa},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2017/8005},
  URN =		{urn:nbn:de:0030-drops-80055},
  doi =		{10.4230/LIPIcs.DISC.2017.36},
  annote =	{Keywords: asynchronous system, fault-tolerance, crash-recovery, R/W register, state machine replication}
}

Keywords: asynchronous system, fault-tolerance, crash-recovery, R/W register, state machine replication
Collection: 31st International Symposium on Distributed Computing (DISC 2017)
Issue Date: 2017
Date of publication: 12.10.2017


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI