License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.GCB.2013.125
URN: urn:nbn:de:0030-drops-42379
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2013/4237/


Martin, Marcel ; Rahmann, Sven

Aligning Flowgrams to DNA Sequences

pdf-format:
p125-martin.pdf (0.5 MB)


Abstract

A read from 454 or Ion Torrent sequencers is natively represented as a flowgram, which is a sequence of pairs of a nucleotide and its (fractional) intensity. Recent work has focused on improving the accuracy of base calling (conversion of flowgrams to DNA sequences) in order to facilitate read mapping and downstream analysis of sequence variants. However, base calling always incurs a loss of information by discarding fractional intensity information.
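To make the information loss concrete, here is a minimal sketch of a flowgram and of naive base calling by rounding. The names `Flowgram` and `call_bases` and the example intensities are illustrative assumptions, not taken from the paper; real base callers are more elaborate.

```python
from typing import List, Tuple

# A flowgram: one (nucleotide, fractional intensity) pair per flow.
Flowgram = List[Tuple[str, float]]


def call_bases(flowgram: Flowgram) -> str:
    """Naive base calling (illustrative assumption): round each intensity
    to the nearest integer and emit that many copies of the nucleotide."""
    # int(x + 0.5) rounds non-negative values half up; Python's round()
    # would round halves to the nearest even integer instead.
    return "".join(nuc * int(intensity + 0.5) for nuc, intensity in flowgram)


# Hypothetical flowgram: the C flow is ambiguous between lengths 2 and 3.
example = [("T", 1.1), ("A", 0.1), ("C", 2.45), ("G", 0.9)]
print(call_bases(example))  # TCCG
```

After rounding, the C flow with intensity 2.45 is indistinguishable from one with intensity 2.0, so the evidence that the homopolymer might have length 3 is no longer available to later analysis steps.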
We argue that base calling can be avoided entirely by aligning flowgrams directly to DNA sequences. We introduce an algorithm for flowgram-string alignment that is based on dynamic programming but covers more cases than standard local or global sequence alignment. We also propose a scoring scheme that accounts separately for sequence variations (substitutions, insertions, deletions) and sequencing errors (flow intensities that contradict the homopolymer length). This makes it possible to resolve fractional intensities, ambiguous homopolymer lengths and editing events at alignment time by choosing the most likely read sequence given both the nucleotide intensities and the reference sequence. We provide a proof-of-concept implementation and demonstrate the advantages of flowgram-string alignment over alignments of base-called reads.
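The idea of resolving homopolymer lengths at alignment time can be illustrated with a toy dynamic program. This is a simplified sketch and not the algorithm or scoring scheme of the paper: it assumes a global alignment, lets each flow consume a run of zero or more matching reference characters, charges the absolute difference between intensity and run length, and does not model substitutions, insertions or deletions at all. The function name `align_flowgram` and the cost function are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def align_flowgram(flowgram: List[Tuple[str, float]], reference: str) -> float:
    """Toy global flowgram-string alignment (illustrative assumption).

    cost[i][j] is the minimum cost of explaining the first j reference
    characters with the first i flows. Flow i may consume a run of k >= 0
    copies of its nucleotide at cost |intensity - k|.
    Returns math.inf if the reference cannot be explained without edits.
    """
    n, m = len(flowgram), len(reference)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i, (nuc, intensity) in enumerate(flowgram, start=1):
        for j in range(m + 1):
            k = 0  # length of the run ending at position j consumed by flow i
            while True:
                prev = cost[i - 1][j - k]
                if prev < math.inf:
                    cost[i][j] = min(cost[i][j], prev + abs(intensity - k))
                # Extend the run only while the preceding character matches.
                if k == j or reference[j - k - 1] != nuc:
                    break
                k += 1
    return cost[n][m]


# Hypothetical flowgram with an ambiguous C flow (intensity 2.45).
example = [("T", 1.1), ("A", 0.1), ("C", 2.45), ("G", 0.9)]
print(align_flowgram(example, "TCCG"))   # cost when the reference has CC
print(align_flowgram(example, "TCCCG"))  # cost when the reference has CCC
```

Under these assumptions the two references score about 0.75 and 0.85, so the ambiguous flow fits either homopolymer length almost equally well and the reference decides which one is used. A fuller scoring scheme would add separate terms for sequence variation, as the abstract describes.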

BibTeX - Entry

@InProceedings{martin_et_al:OASIcs:2013:4237,
  author =	{Marcel Martin and Sven Rahmann},
  title =	{{Aligning Flowgrams to DNA Sequences}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{125--135},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Tim Bei{\ss}barth and Martin Kollmar and Andreas Leha and Burkhard Morgenstern and Anne-Kathrin Schultz and Stephan Waack and Edgar Wingender},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2013/4237},
  URN =		{urn:nbn:de:0030-drops-42379},
  doi =		{10.4230/OASIcs.GCB.2013.125},
  annote =	{Keywords: flowgram, sequencing, alignment algorithm, scoring scheme}
}

Keywords: flowgram, sequencing, alignment algorithm, scoring scheme
Collection: German Conference on Bioinformatics 2013
Issue Date: 2013
Date of publication: 09.09.2013

