License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.DISC.2023.17
URN: urn:nbn:de:0030-drops-191431
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/19143/


Dhoked, Sahil ; Golab, Wojciech ; Mittal, Neeraj

Modular Recoverable Mutual Exclusion Under System-Wide Failures


Abstract

Recoverable mutual exclusion (RME) is a fault-tolerant variation of Dijkstra’s classic mutual exclusion (ME) problem that allows processes to fail by crashing as long as they recover eventually. A growing body of literature on this topic, starting with the problem formulation by Golab and Ramaraju (PODC'16), examines the cost of solving the RME problem, which is quantified by counting the expensive shared memory operations called remote memory references (RMRs), under a variety of conditions. Published results show that the RMR complexity of RME algorithms, among other factors, depends crucially on the failure model used: individual process versus system-wide. Recent work by Golab and Hendler (PODC'18) also suggests that explicit failure detection can be helpful in attaining constant RMR solutions to the RME problem in the system-wide failure model. Follow-up work by Jayanti, Jayanti, and Joshi (SPAA'23) shows that such a solution exists even without employing a failure detector, although that solution uses a more complex algorithmic approach.
In this work, we dive deeper into the study of RMR-optimal RME algorithms for the system-wide failure model, and present contributions along multiple directions. First, we introduce the notion of withdrawing from a lock acquisition rather than resetting the lock. We use this notion to design a withdrawable RME algorithm with optimal O(1) RMR complexity for both cache-coherent (CC) and distributed shared memory (DSM) models in a modular way without using an explicit failure detector. In some sense, our technique marries the simplicity of Golab and Hendler’s algorithm with Jayanti, Jayanti, and Joshi’s weaker system model. Second, we present a variation of our algorithm that supports fully dynamic process participation (i.e., both joining and leaving) in the CC model, while maintaining its constant RMR complexity. We show experimentally that our algorithm is substantially faster than Jayanti, Jayanti, and Joshi’s algorithm despite having stronger correctness properties. Finally, we establish an impossibility result for fully dynamic RME algorithms with bounded RMR complexity in the DSM model that are adaptive with respect to space and provide a wait-free withdraw section.

BibTeX - Entry

@InProceedings{dhoked_et_al:LIPIcs.DISC.2023.17,
  author =	{Dhoked, Sahil and Golab, Wojciech and Mittal, Neeraj},
  title =	{{Modular Recoverable Mutual Exclusion Under System-Wide Failures}},
  booktitle =	{37th International Symposium on Distributed Computing (DISC 2023)},
  pages =	{17:1--17:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-301-0},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{281},
  editor =	{Oshman, Rotem},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/19143},
  URN =		{urn:nbn:de:0030-drops-191431},
  doi =		{10.4230/LIPIcs.DISC.2023.17},
  annote =	{Keywords: mutual exclusion, shared memory, persistent memory, fault tolerance, system-wide failure, RMR complexity, dynamic joining, dynamic leaving}
}

Keywords: mutual exclusion, shared memory, persistent memory, fault tolerance, system-wide failure, RMR complexity, dynamic joining, dynamic leaving
Collection: 37th International Symposium on Distributed Computing (DISC 2023)
Issue Date: 2023
Date of publication: 05.10.2023

