License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.SCOR.2016.5
URN: urn:nbn:de:0030-drops-65174
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2016/6517/
Go to the corresponding OASIcs Volume Portal


Dal Sasso, Veronica ; De Giovanni, Luigi ; Labbé, Martine

A Column Generation Approach for Pure Parsimony Haplotyping

pdf-format:
OASIcs-SCOR-2016-5.pdf (0.4 MB)


Abstract

The knowledge of nucleotides chains that compose the double DNA chain of an individual has a relevant role in detecting diseases and studying populations. However, determining experimentally the single nucleotides chains that, paired, form a certain portion of the DNA is expensive and time-consuming. Mathematical programming approaches have been proposed instead, e.g. formulating the Haplotype Inference by Pure Parsimony problem (HIPP). Abstractly, we are given a set of genotypes (strings over a ternary alphabet {0,1,2}) and we want to determine the smallest set of haplotypes (binary strings over the set {0,1}) so that each genotype can be "generated" by some pair of haplotypes, meaning that they are compatible with the genotype and can fully explain its structure.

A polynomial-sized Integer Programming model was proposed by Catanzaro, Godi and Labbé (2010), which is highly efficient but hardly scalable to instances with a large number of genotypes. In order to deal with larger instances, we propose a new model involving an exponential number of variables to be solved via column generation, where variables are dynamically introduced into the model by iteratively solving a pricing problem. We compared different ways of solving the pricing problem, based on integer programming, smart enumeration and local search heuristic. The efficiency of the approach is improved by stabilization and by a heuristic to provide a good initial solution. Results show that, with respect to the linear relaxations of both the polynomial and exponential-size models, our approach yields a tighter formulation and outperforms in both efficiency and effectiveness the previous model for instances with a large number of genotypes.

BibTeX - Entry

@InProceedings{dalsasso_et_al:OASIcs:2016:6517,
  author =	{Veronica Dal Sasso and Luigi De Giovanni and Martine Labb{\'e}},
  title =	{{A Column Generation Approach for Pure Parsimony Haplotyping}},
  booktitle =	{5th Student Conference on Operational Research (SCOR 2016)},
  pages =	{5:1--5:11},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-004-0},
  ISSN =	{2190-6807},
  year =	{2016},
  volume =	{50},
  editor =	{Bradley Hardy and Abroon Qazi and Stefan Ravizza},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2016/6517},
  URN =		{urn:nbn:de:0030-drops-65174},
  doi =		{10.4230/OASIcs.SCOR.2016.5},
  annote =	{Keywords: computational biology, haplotyping, column generation, integer programming, combinatorial optimization}
}

Keywords: computational biology, haplotyping, column generation, integer programming, combinatorial optimization
Collection: 5th Student Conference on Operational Research (SCOR 2016)
Issue Date: 2016
Date of publication: 23.08.2016


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI