Abstract
The knowledge of nucleotides chains that compose the double DNA chain of an individual has a relevant role in detecting diseases and studying populations. However, determining experimentally the single nucleotides chains that, paired, form a certain portion of the DNA is expensive and timeconsuming. Mathematical programming approaches have been proposed instead, e.g. formulating the Haplotype Inference by Pure Parsimony problem (HIPP). Abstractly, we are given a set of genotypes (strings over a ternary alphabet {0,1,2}) and we want to determine the smallest set of haplotypes (binary strings over the set {0,1}) so that each genotype can be "generated" by some pair of haplotypes, meaning that they are compatible with the genotype and can fully explain its structure.
A polynomialsized Integer Programming model was proposed by Catanzaro, Godi and LabbĂ© (2010), which is highly efficient but hardly scalable to instances with a large number of genotypes. In order to deal with larger instances, we propose a new model involving an exponential number of variables to be solved via column generation, where variables are dynamically introduced into the model by iteratively solving a pricing problem. We compared different ways of solving the pricing problem, based on integer programming, smart enumeration and local search heuristic. The efficiency of the approach is improved by stabilization and by a heuristic to provide a good initial solution. Results show that, with respect to the linear relaxations of both the polynomial and exponentialsize models, our approach yields a tighter formulation and outperforms in both efficiency and effectiveness the previous model for instances with a large number of genotypes.
BibTeX  Entry
@InProceedings{dalsasso_et_al:OASIcs:2016:6517,
author = {Veronica Dal Sasso and Luigi De Giovanni and Martine Labb{\'e}},
title = {{A Column Generation Approach for Pure Parsimony Haplotyping}},
booktitle = {5th Student Conference on Operational Research (SCOR 2016)},
pages = {5:15:11},
series = {OpenAccess Series in Informatics (OASIcs)},
ISBN = {9783959770040},
ISSN = {21906807},
year = {2016},
volume = {50},
editor = {Bradley Hardy and Abroon Qazi and Stefan Ravizza},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2016/6517},
URN = {urn:nbn:de:0030drops65174},
doi = {10.4230/OASIcs.SCOR.2016.5},
annote = {Keywords: computational biology, haplotyping, column generation, integer programming, combinatorial optimization}
}
Keywords: 

computational biology, haplotyping, column generation, integer programming, combinatorial optimization 
Collection: 

5th Student Conference on Operational Research (SCOR 2016) 
Issue Date: 

2016 
Date of publication: 

23.08.2016 