License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/DagSemProc.09061.14
URN: urn:nbn:de:0030-drops-21294
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2009/2129/
Bozdag, Doruk ;
Barbacioru, Catalin C. ;
Catalyurek, Umit V.
Parallelization of Mapping Algorithms for Next Generation Sequencing Applications
Abstract
With the advent of next-generation high-throughput sequencing
instruments, large volumes of short sequence data are generated at an
unprecedented rate. Processing and analyzing these massive data
requires overcoming several challenges. A particular challenge
addressed in this abstract is the mapping of short sequences (reads)
to a reference genome while allowing mismatches. This is a significantly
time-consuming combinatorial problem in many applications, including
whole-genome resequencing, targeted sequencing, transcriptome/small
RNA, DNA methylation, and ChIP sequencing, and takes time on the order
of days using existing sequential techniques on large-scale
datasets. In this work, we introduce six parallelization methods, each
having different scalability characteristics, to speed up short sequence
mapping. We also address an associated load balancing problem that
involves grouping nodes of a tree from different levels. This problem
arises due to a trade-off between computational cost and granularity
while partitioning the workload. We comparatively present the
proposed parallelization methods and give theoretical cost models for
each of them. Experimental results on real datasets demonstrate the
effectiveness of the methods and indicate that they are successful at
reducing the execution time from the order of days to under a few
hours for large datasets.
To the best of our knowledge, this is the first study on the
parallelization of the short sequence mapping problem.
BibTeX - Entry
@InProceedings{bozdag_et_al:DagSemProc.09061.14,
author = {Bozdag, Doruk and Barbacioru, Catalin C. and Catalyurek, Umit V.},
title = {{Parallelization of Mapping Algorithms for Next Generation Sequencing Applications}},
booktitle = {Combinatorial Scientific Computing},
pages = {1--1},
series = {Dagstuhl Seminar Proceedings (DagSemProc)},
ISSN = {1862-4405},
year = {2009},
volume = {9061},
editor = {Uwe Naumann and Olaf Schenk and Horst D. Simon and Sivan Toledo},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2009/2129},
URN = {urn:nbn:de:0030-drops-21294},
doi = {10.4230/DagSemProc.09061.14},
annote = {Keywords: Genome sequencing, sequence mapping}
}
Keywords: Genome sequencing, sequence mapping
Collection: 09061 - Combinatorial Scientific Computing
Issue Date: 2009
Date of publication: 16.09.2009