The minimum-entropy set cover problem
Abstract: We consider the minimum entropy principle for learning data generated by a random source and observed with random noise. In our setting we have a sequence of observations of objects drawn uniformly at random from a population. Each object in the population belongs to one class. We perform an observation for each object which determines that it belongs to one of a given set of classes. Given these observations, we are interested in assigning the most likely class to each of the objects. This scenario is a very natural one that appears in many real-life situations. We show that under reasonable assumptions finding the most likely assignment is equivalent to the following variant of the set cover problem. Given a universe U and a collection S=(S1,…,St) of subsets of U, we wish to find an assignment f:U→S such that u∈f(u) and the entropy of the distribution defined by the values |f⁻¹(Si)| is minimized. We show that this problem is NP-hard and that the greedy algorithm for set cover approximates the optimal cover to within an additive constant error. This sheds new light on the behavior of the greedy set cover algorithm. We further enhance the greedy algorithm and show that the problem admits a polynomial-time approximation scheme (PTAS). Finally, we demonstrate how this model and the greedy algorithm can be useful in real-life scenarios, and in particular in problems arising naturally in computational biology.
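The greedy step the abstract refers to can be sketched as follows: repeatedly pick the set covering the most uncovered elements, assign those elements to it, and finally measure the entropy of the resulting class sizes. This is a minimal illustrative sketch, not the paper's enhanced algorithm; all names and the toy instance are invented for the example.

```python
import math

def greedy_min_entropy_cover(universe, sets):
    """Greedy cover: repeatedly choose the set covering the most
    uncovered elements and assign those elements to it."""
    uncovered = set(universe)
    assignment = {}
    while uncovered:
        name, best = max(sets.items(), key=lambda kv: len(kv[1] & uncovered))
        for u in best & uncovered:
            assignment[u] = name
        uncovered -= best
    return assignment

def assignment_entropy(assignment):
    """Entropy (in bits) of the distribution given by the sizes |f^-1(Si)|."""
    n = len(assignment)
    counts = {}
    for cls in assignment.values():
        counts[cls] = counts.get(cls, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy instance (hypothetical): the cover {A: 1-4, C: 5-6} has entropy ~0.918 bits.
U = {1, 2, 3, 4, 5, 6}
S = {"A": {1, 2, 3, 4}, "B": {4, 5}, "C": {5, 6}}
f = greedy_min_entropy_cover(U, S)
```

The paper's result says this greedy assignment is within an additive constant of the minimum-entropy assignment; the PTAS refines it further.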
The Binary Perfect Phylogeny with Persistent Characters
The binary perfect phylogeny model is too restrictive to model biological events such as back mutations. In this paper we consider a natural generalization of the model that allows a special type of back mutation. We investigate the problem of reconstructing a near-perfect phylogeny over a binary set of characters where characters are persistent: characters can be gained and lost at most once. Based on this notion, we define the problem of the Persistent Perfect Phylogeny (referred to as P-PP). We restate the P-PP problem as a special case of the Incomplete Directed Perfect Phylogeny, called Incomplete Perfect Phylogeny with Persistent Completion (referred to as IP-PP), where the instance is an incomplete binary matrix M having some missing entries, denoted by the symbol ?, that must be determined (or completed) as 0 or 1 so that M admits a binary perfect phylogeny. We show that the IP-PP problem can be reduced to a problem over an edge-colored graph, since the completion of each column of the input matrix can be represented by a graph operation. Based on this graph formulation, we develop an exact algorithm for solving the P-PP problem that is exponential in the number of characters and polynomial in the number of species.
Comment: 13 pages, 3 figures
Haplotyping a Quantitative Trait with a High-Density Map in Experimental Crosses
BACKGROUND: The ultimate goal of genetic mapping of quantitative trait loci (QTL) is the positional cloning of genes involved in any agriculturally or medically important phenotype. However, only a small portion (≤ 1%) of the QTL detected have been characterized at the molecular level, despite reports of hundreds of thousands of QTL for different traits and populations. METHODS/RESULTS: We develop a statistical model for detecting and characterizing the nucleotide structure and organization of haplotypes that underlie QTL responsible for a quantitative trait in an F2 pedigree. The discovery of such haplotypes by the new model will facilitate the molecular cloning of a QTL. Our model is founded on population genetic properties of genes that are segregating in a pedigree, constructed within the mixture-based maximum likelihood context, and implemented with the EM algorithm. Closed forms have been derived to estimate the linkage and linkage disequilibria among different molecular markers, such as single nucleotide polymorphisms, and the quantitative genetic effects of haplotypes constructed by non-alleles of these markers. Results from the analysis of a real example in mouse have validated the usefulness of the proposed model. CONCLUSION: The model can be flexibly extended to model a complex network of genetic regulation that includes the interactions between different haplotypes and between haplotypes and environments.
Efficient analysis and storage of large-scale genomic data
The impending advent of population-scale sequencing cohorts involving tens of millions of individuals with matched phenotypic measurements will produce unprecedented volumes of genetic data. Storing and analysing such gargantuan datasets places computational performance at a pivotal position in medical genomics. In this thesis, I explore the potential for accelerating and parallelizing standard genetics workflows, file formats, and algorithms using hardware-accelerated vectorization, parallel and distributed algorithms, and heterogeneous computing.
First, I describe a novel bit-counting operation termed the positional population count, which can be used together with succinct representations and standard efficient operations to accelerate many genetic calculations. To enable the use of this new operator and the canonical population count on any target machine, I developed a unified low-level library using CPU dispatching to select the optimal method contingent on the available instruction set architecture and the given input size at run-time. As a proof-of-principle application, I apply the positional population-count operator to computing quality control-related summary statistics for terabyte-scale sequencing readsets with >3,800-fold speed improvements. As another application, I describe a framework for efficiently computing the cardinality of set intersections using these operators, and I applied this framework to compute genome-wide linkage disequilibrium in datasets with up to 67 million samples, resulting in up to >60-fold improvements in speed for dense genotypic vectors, and up to >250,000-fold savings in memory and >100,000-fold improvements in speed for sparse genotypic vectors. I next describe a framework for handling the terabytes of compressed output data, along with graphical routines for visualizing long-range linkage-disequilibrium blocks as seen over many human centromeres. Finally, I describe efficient algorithms for storing and querying very large genetic datasets, and specialized algorithms for the genotype component of such datasets with >10,000-fold savings in memory compared to the current interchange format.
Wellcome Trust
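The positional population count differs from the canonical popcount in that it returns one counter per bit position rather than a single total: for each bit position it counts how many input words have that bit set. The following is a scalar reference sketch of the operation (the thesis describes SIMD-accelerated variants selected by CPU dispatching; this plain-Python version is only for illustration).

```python
def positional_popcount(words, width=16):
    """For each bit position 0..width-1, count how many of the input
    words have that bit set. Scalar reference implementation."""
    counts = [0] * width
    for w in words:
        for i in range(width):
            counts[i] += (w >> i) & 1
    return counts

# Example: three 4-bit words; bit 1 is set in two of them.
result = positional_popcount([0b0011, 0b0010, 0b1000], width=4)
```

Applied to bit-transposed quality or genotype data, one pass over the words yields a full per-position histogram, which is what makes the quality-control summary statistics described above cheap to compute.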
Shape-IT: new rapid and accurate algorithm for haplotype inference
BACKGROUND: We have developed a new computational algorithm, Shape-IT, to infer haplotypes under the genetic model of coalescence with recombination developed by Stephens et al. in Phase v2.1. It runs much faster than Phase v2.1 while exhibiting the same accuracy. The major algorithmic improvements rely on the use of binary trees to represent the sets of candidate haplotypes for each individual. These binary tree representations: (1) speed up the computation of posterior probabilities of the haplotypes by avoiding the redundant operations made in Phase v2.1, and (2) overcome the exponential nature of the haplotype inference problem by smart exploration of the most plausible pathways (i.e., haplotypes) in the binary trees. RESULTS: Our results show that Shape-IT is several orders of magnitude faster than Phase v2.1 while being as accurate. For instance, Shape-IT runs 50 times faster than Phase v2.1 to compute the haplotypes of 200 subjects on 6,000 segments of 50 SNPs extracted from a standard Illumina 300K chip (13 days instead of 630 days). We also compared Shape-IT with other widely used software, Gerbil, PL-EM, Fastphase, 2SNP, and Ishape, in various tests: Shape-IT and Phase v2.1 were the most accurate in all cases, followed by Ishape and Fastphase. In terms of speed, Shape-IT was faster than Ishape and Fastphase for datasets smaller than 100 SNPs, but Fastphase became faster, though still less accurate, at inferring haplotypes on larger SNP datasets. CONCLUSION: Shape-IT deserves to be extensively used for regular haplotype inference, but also in the context of the new high-throughput genotyping chips, since it makes it possible to fit the genetic model of Phase v2.1 on large datasets. This new algorithm based on tree representations could be used in other HMM-based haplotype inference software and may apply more broadly to other fields using HMMs.
Polygenic Adaptation to an Environmental Shift: Temporal Dynamics of Variation Under Gaussian Stabilizing Selection and Additive Effects on a Single Trait.
Predictions about the effect of natural selection on patterns of linked neutral variation are largely based on models involving the rapid fixation of unconditionally beneficial mutations. However, when phenotypes adapt to a new optimum trait value, the strength of selection on individual mutations decreases as the population adapts. Here, I use explicit forward simulations of a single trait with additive-effect mutations adapting to an "optimum shift." Detectable "hitchhiking" patterns are only apparent if (i) the optimum shifts are large with respect to equilibrium variation for the trait, (ii) mutation rates to large-effect mutations are low, and (iii) large-effect mutations rapidly increase in frequency and eventually reach fixation, which typically occurs after the population reaches the new optimum. For the parameters simulated here, partial sweeps do not appreciably affect patterns of linked variation, even when the mutations are strongly selected. The contribution of new mutations vs. standing variation to fixation depends on the mutation rate affecting trait values. Given the fixation of a strongly selected variant, patterns of hitchhiking are similar on average for the two classes of sweeps, because sweeps from standing variation involving large-effect mutations are rare when the optimum shifts. The distribution of effect sizes of new mutations has little effect on the time to reach the new optimum, but reducing the mutational variance increases the magnitude of hitchhiking patterns. In general, populations reach the new optimum prior to the completion of any sweeps, and the times to fixation are longer for this model than for standard models of directional selection. The long fixation times are due to a combination of declining selection pressures during adaptation and the possibility of interference among weakly selected sites for traits with high mutation rates.
Haplotype-based quantitative trait mapping using a clustering algorithm
BACKGROUND: With the availability of large-scale, high-density single-nucleotide polymorphism (SNP) markers, substantial effort has been made to identify disease-causing genes using linkage disequilibrium (LD) mapping by haplotype analysis of unrelated individuals. In addition to complex diseases, many continuously distributed quantitative traits are of primary clinical and health significance. However, the development of association mapping methods using unrelated individuals for quantitative traits has received relatively less attention. RESULTS: We recently developed an association mapping method for complex diseases by mining the sharing of haplotype segments (i.e., phased genotype pairs) in affected individuals that are rarely present in normal individuals. In this paper, we extend our previous work to address the problem of quantitative trait mapping from unrelated individuals. The method is non-parametric in nature, and statistical significance can be obtained by a permutation test. It can also be incorporated into the one-way ANCOVA (analysis of covariance) framework so that other factors and covariates can be easily accounted for. The effectiveness of the approach is demonstrated by extensive experimental studies using both simulated and real data sets. The results show that our haplotype-based approach is more robust than two statistical methods based on single markers: a single SNP association test (SSA) and the Mann-Whitney U-test (MWU). The algorithm has been incorporated into our existing software package, HapMiner, which is available from our website. CONCLUSION: For QTL (quantitative trait loci) fine mapping, to identify QTNs (quantitative trait nucleotides) with realistic effects (the contribution of each QTN less than 10% of the total variance of the trait), large sample sizes (≥ 500) are needed for all the methods. The overall performance of HapMiner is better than that of the other two methods. Its effectiveness further depends on other factors such as recombination rates and the density of typed SNPs. Haplotype-based methods might provide higher power than methods based on a single SNP when using tag SNPs selected from a small number of samples or other sources (such as HapMap data). Rank-based statistics usually have much lower power, as shown in our study.
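The permutation-test idea the abstract relies on can be sketched generically: compare the trait values of two groups (e.g. carriers vs. non-carriers of a shared haplotype segment), then re-randomize group labels many times to see how often a difference at least as large arises by chance. The statistic below is a plain difference in means for illustration only; HapMiner's actual clustering-based statistic is different, and all names here are invented.

```python
import random

def permutation_pvalue(values_a, values_b, n_perm=10000, seed=0):
    """Two-group permutation test on the absolute difference in means.
    Returns an estimated p-value with the standard +1 correction."""
    rng = random.Random(seed)
    observed = abs(sum(values_a) / len(values_a) - sum(values_b) / len(values_b))
    pooled = list(values_a) + list(values_b)
    k = len(values_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # re-randomize group labels
        diff = abs(sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical trait values for haplotype carriers vs. non-carriers.
p = permutation_pvalue([10.0, 11.0, 12.0, 13.0], [1.0, 2.0, 3.0, 4.0], n_perm=2000)
```

Because no distributional assumption is made about the trait, this kind of test stays valid where the parametric single-marker tests it is compared against may not.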