License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2017.16
URN: urn:nbn:de:0030-drops-76459
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2017/7645/


Nikumbh, Sarvesh ; Ebert, Peter ; Pfeifer, Nico

All Fingers Are Not the Same: Handling Variable-Length Sequences in a Discriminative Setting Using Conformal Multi-Instance Kernels



Abstract

Most string kernels for comparing genomic sequences rely on the (absolute) positions of features within the individual sequences. This limits their use for comparing variable-length sequences. For example, profiling chromatin interactions with 3C-based experiments yields variable-length genomic sequences (restriction fragments). Here, the exact position-wise occurrence of signals may matter less than in the analysis of promoter sequences, which typically have a transcription start site as a reference point. Existing position-aware string kernels have been shown to be useful in the latter scenario.

In this work, we propose a novel approach for sequence comparison that allows greater positional freedom than most existing approaches, can identify a possibly dispersed set of features when comparing variable-length sequences, and can handle both of the aforementioned scenarios. Our approach, CoMIK, identifies not just the features useful for classification but also their locations in the variable-length sequences, as evidenced by the results of three binary classification experiments, aided by recently introduced visualization techniques. Furthermore, we show that we can efficiently retrieve and interpret the weight vector for the complex setting of multiple multi-instance kernels.
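
To make the multi-instance idea concrete, the following is a minimal sketch of a generic normalized set kernel over sequence segments. It is not the CoMIK implementation and omits the conformal transformation. The k-mer count features, the linear instance kernel, and the parameter names `k`, `seg_len`, and `shift` are all assumptions chosen for illustration. It only shows how two sequences of different lengths can be compared without aligning absolute positions: each sequence becomes a bag of segments, and the kernel sums over all segment pairs.

```python
from collections import Counter
import math


def segments(seq, seg_len, shift):
    """Split a sequence into fixed-length windows (the 'instances' of its bag).

    Residues after the last full window are dropped in this sketch.
    """
    if len(seq) <= seg_len:
        return [seq]
    return [seq[i:i + seg_len] for i in range(0, len(seq) - seg_len + 1, shift)]


def kmer_counts(segment, k):
    """Count every k-mer in a segment, ignoring where it occurs."""
    return Counter(segment[i:i + k] for i in range(len(segment) - k + 1))


def instance_kernel(a, b):
    """Linear kernel between two k-mer count vectors (an illustrative choice)."""
    return sum(count * b[kmer] for kmer, count in a.items())


def set_kernel(bag_x, bag_y):
    """Sum the instance kernel over all pairs of segments from the two bags."""
    return sum(instance_kernel(a, b) for a in bag_x for b in bag_y)


def normalized_set_kernel(seq_x, seq_y, k=2, seg_len=4, shift=2):
    """Compare two sequences of possibly different lengths as bags of segments."""
    bag_x = [kmer_counts(s, k) for s in segments(seq_x, seg_len, shift)]
    bag_y = [kmer_counts(s, k) for s in segments(seq_y, seg_len, shift)]
    k_xx = set_kernel(bag_x, bag_x)
    k_yy = set_kernel(bag_y, bag_y)
    if k_xx == 0 or k_yy == 0:
        return 0.0
    # Normalizing keeps longer sequences (larger bags) from dominating.
    return set_kernel(bag_x, bag_y) / math.sqrt(k_xx * k_yy)
```

Because the sum runs over all segment pairs, a shared signal contributes to the kernel value regardless of which segment it falls in. That is the kind of positional freedom the abstract describes, here in its simplest form.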

BibTeX - Entry

@InProceedings{nikumbh_et_al:LIPIcs:2017:7645,
  author =	{Sarvesh Nikumbh and Peter Ebert and Nico Pfeifer},
  title =	{{All Fingers Are Not the Same: Handling Variable-Length Sequences in a Discriminative Setting Using Conformal Multi-Instance Kernels}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{16:1--16:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Russell Schwartz and Knut Reinert},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2017/7645},
  URN =		{urn:nbn:de:0030-drops-76459},
  doi =		{10.4230/LIPIcs.WABI.2017.16},
  annote =	{Keywords: Multiple instance learning, conformal MI kernels, 5C, Hi-C}
}

Keywords: Multiple instance learning, conformal MI kernels, 5C, Hi-C
Collection: 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)
Issue Date: 2017
Date of publication: 11.08.2017

