License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2021.24
URN: urn:nbn:de:0030-drops-139759
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2021/13975/
Sobel, Joshua ;
Bertram, Noah ;
Ding, Chen ;
Nargesian, Fatemeh ;
Gildea, Daniel
AWLCO: All-Window Length Co-Occurrence
Abstract
Analyzing patterns in a sequence of events has applications in text analysis, computer programming, and genomics research. In this paper, we consider the all-window-length analysis model which analyzes a sequence of events with respect to windows of all lengths. We study the exact co-occurrence counting problem for the all-window-length analysis model. Our first algorithm is an offline algorithm that counts all-window-length co-occurrences by performing multiple passes over a sequence and computing single-window-length co-occurrences. This algorithm has the time complexity O(n) for each window length and thus a total complexity of O(n²) and the space complexity O(|I|) for a sequence of size n and an itemset of size |I|. We propose AWLCO, an online algorithm that computes all-window-length co-occurrences in a single pass with the time complexity of O(n) and space complexity of O(√{n|I|}), assuming perfect hashing. Following this, we generalize our use case to patterns in which we propose an algorithm that computes all-window-length co-occurrence with time complexity O(n|I|), assuming perfect hashing, with an additional pre-processing step and space complexity O(√{n|I|}+|I|), plus the overhead of the Aho-Corasick algorithm [Aho and Corasick, 1975].
BibTeX - Entry
@InProceedings{sobel_et_al:LIPIcs.CPM.2021.24,
author = {Sobel, Joshua and Bertram, Noah and Ding, Chen and Nargesian, Fatemeh and Gildea, Daniel},
title = {{AWLCO: All-Window Length Co-Occurrence}},
booktitle = {32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
pages = {24:1--24:21},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-186-3},
ISSN = {1868-8969},
year = {2021},
volume = {191},
editor = {Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2021/13975},
URN = {urn:nbn:de:0030-drops-139759},
doi = {10.4230/LIPIcs.CPM.2021.24},
annote = {Keywords: Itemsets, Data Sequences, Co-occurrence}
}
Keywords: |
|
Itemsets, Data Sequences, Co-occurrence |
Collection: |
|
32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021) |
Issue Date: |
|
2021 |
Date of publication: |
|
30.06.2021 |