License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ESA.2023.78
URN: urn:nbn:de:0030-drops-187313
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/18731/
Go to the corresponding LIPIcs Volume Portal


Li, Zelin ; Peng, Pan ; Zhu, Xianbin

Massively Parallel Algorithms for the Stochastic Block Model

pdf-format:
LIPIcs-ESA-2023-78.pdf (1.0 MB)


Abstract

Learning the community structure of a large-scale graph is a fundamental problem in machine learning, computer science and statistics. Among others, the Stochastic Block Model (SBM) serves a canonical model for community detection and clustering, and the Massively Parallel Computation (MPC) model is a mathematical abstraction of real-world parallel computing systems, which provides a powerful computational framework for handling large-scale datasets. We study the problem of exactly recovering the communities in a graph generated from the SBM in the MPC model. Specifically, given kn vertices that are partitioned into k equal-sized clusters (i.e., each has size n), a graph on these kn vertices is randomly generated such that each pair of vertices is connected with probability p if they are in the same cluster and with probability q if not, where p > q > 0.
We give MPC algorithms for the SBM in the (very general) s-space MPC model, where each machine is guaranteed to have memory s = Ω(log n). Under the condition that (p-q)/√p ≥ Ω̃(k^{1/2} n^{-1/2+1/(2(r-1))}) for any integer r ∈ [3,O(log n)], our first algorithm exactly recovers all the k clusters in O(kr log_s n) rounds using Õ(m) total space, or in O(rlog_s n) rounds using Õ(km) total space. If (p-q)/√p ≥ Ω̃(k^{3/4} n^{-1/4}), our second algorithm achieves O(log_s n) rounds and Õ(m) total space complexity. Both algorithms significantly improve upon a recent result of Cohen-Addad et al. [PODC'22], who gave algorithms that only work in the sublinear space MPC model, where each machine has local memory s = O(n^δ) for some constant δ > 0, with a much stronger condition on p,q,k. Our algorithms are based on collecting the r-step neighborhood of each vertex and comparing the difference of some statistical information generated from the local neighborhoods for each pair of vertices. To implement the clustering algorithms in parallel, we present efficient approaches for implementing some basic graph operations in the s-space MPC model.

BibTeX - Entry

@InProceedings{li_et_al:LIPIcs.ESA.2023.78,
  author =	{Li, Zelin and Peng, Pan and Zhu, Xianbin},
  title =	{{Massively Parallel Algorithms for the Stochastic Block Model}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{78:1--78:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/18731},
  URN =		{urn:nbn:de:0030-drops-187313},
  doi =		{10.4230/LIPIcs.ESA.2023.78},
  annote =	{Keywords: Massively Parallel Computation, Stochastic Block Model, Graph Algorithms}
}

Keywords: Massively Parallel Computation, Stochastic Block Model, Graph Algorithms
Collection: 31st Annual European Symposium on Algorithms (ESA 2023)
Issue Date: 2023
Date of publication: 30.08.2023


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI