2,127 research outputs found

A Mutagenetic Tree Hidden Markov Model for Longitudinal Clonal HIV Sequence Data

Author: Bacheler
Beerenwinkel
Clavel
Crandall
Desper
Drummond
Gonzales
Heydebreck
M. Drton
Michalakis
Molla
N. Beerenwinkel
Seo
Siepel
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2006

RNA viruses provide prominent examples of measurably evolving populations. In HIV infection, the development of drug resistance is of particular interest, because precise predictions of the outcome of this evolutionary process are a prerequisite for the rational design of antiretroviral treatment protocols. We present a mutagenetic tree hidden Markov model for the analysis of longitudinal clonal sequence data. Using HIV mutation data from clinical trials, we estimate the order and rate of occurrence of seven amino acid changes that are associated with resistance to the reverse transcriptase inhibitor efavirenz. Comment: 20 pages, 6 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Inferring a DNA sequence from erroneous copies

Author: Kececioglu J.
Li M. (Ming)
Tromp J.T. (John)
Publication venue: Elsevier B.V.
Publication date: 01/01/1997

We suggest a novel approach for efficiently reconstructing an original DNA sequence from erroneous copies.

Elsevier - Publisher Connector

CWI's Institutional Repository

InPhaDel: integrative shotgun and proximity-ligation sequencing to phase deletions with single nucleotide polymorphisms.

Author: Bafna Vineet
Bansal Vikas
Edge Peter
Patel Anand
Selvaraj Siddarth
Publication venue: eScholarship, University of California
Publication date: 21/04/2016

Phasing of single nucleotide variants (SNVs) and structural variations into chromosome-wide haplotypes in humans has been challenging, and has required either trio sequencing or restricting phasing to population-based haplotypes. Selvaraj et al. demonstrated that single-individual SNV phasing is possible with proximity-ligated (HiC) sequencing. Here, we demonstrate that HiC can phase structural variants into phased scaffolds of SNVs. Since HiC data is noisy, and SV calling is challenging, we applied a range of supervised classification techniques, including Support Vector Machines and Random Forest, to phase deletions. Our approach was demonstrated on deletion calls and phasings on the NA12878 human genome. We used three NA12878 chromosomes and simulated chromosomes to train model parameters. The remaining NA12878 chromosomes withheld from training were used to evaluate phasing accuracy. Random Forest had the highest accuracy and correctly phased 86% of the deletions with allele-specific read evidence. Allele-specific read evidence was found for 76% of the deletions. HiC provides significant read evidence for accurately phasing 33% of the deletions. Also, eight of eight top-ranked deletions phased by only HiC were validated using long-range polymerase chain reaction and Sanger sequencing. Thus, deletions from a single individual can be accurately phased using a combination of shotgun and proximity ligation sequencing. InPhaDel software is available at: http://l337x911.github.io/inphadel/

PubMed Central

eScholarship - University of California

The inference of gene trees with species trees

Author: Bastien Boussau
Eric Tannier
Gergely J. Szöllősi
Vincent Daubin
Publication date: 04/11/2013

Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice-versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. Comment: Review article in relation to the "Mathematical and Computational Evolutionary Biology" conference, Montpellier, 201

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

PubMed Central

HAL

Repository of the Academy's Library

ELTE Digital Institutional Repository (EDIT)

Hal-Diderot

Inferring introduction routes of invasive species using approximate Bayesian computation on microsatellite data

Author: A Estoup
AV Suarez
B Rannala
BS Weir
DB Goldstein
GR Terrel
J Roman
J-M Cornuet
JC Garza
JFC Kingman
JJ Kolbe
JM Cornuet
K Saltonstall
KM Dlugosch
KS Kim
M A Beaumont
M Beaumont
M Ciosi
M Nei
M Nordborg
M Pascual
M Voisin
MA Beaumont
N Miller
NJR Fagundes
OE Gaggiotti
RR Hudson
S Toepfer
T Fawcett
T Guillemaud
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Determining the routes of introduction provides not only information about the history of an invasion process, but also information about the origin and construction of the genetic composition of the invading population. It remains difficult, however, to infer introduction routes from molecular data because of a lack of appropriate methods. We evaluate here the use of an approximate Bayesian computation (ABC) method for estimating the probabilities of introduction routes of invasive populations based on microsatellite data. We considered the crucial case of a single source population from which two invasive populations originated either serially from a single introduction event or from two independent introduction events. Using simulated datasets, we found that the method gave correct inferences and was robust to many erroneous beliefs. The method was also more efficient than traditional methods based on raw values of statistics such as assignment likelihood or pairwise F(ST). We illustrate some of the features of our ABC method, using real microsatellite datasets obtained for invasive populations of the western corn rootworm, Diabrotica virgifera virgifera. Most computations were performed with the DIYABC program (http://www1.montpellier.inra.fr/CBGP/diyabc/)

Explore Bristol Research

Using GWAS Data to Identify Copy Number Variants Contributing to Common Complex Diseases

Author: Teslovich Tanya M.
Zöllner Sebastian
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 25/10/2010

Copy number variants (CNVs) account for more polymorphic base pairs in the human genome than do single nucleotide polymorphisms (SNPs). CNVs encompass genes as well as noncoding DNA, making these polymorphisms good candidates for functional variation. Consequently, most modern genome-wide association studies test CNVs along with SNPs, after inferring copy number status from the data generated by high-throughput genotyping platforms. Here we give an overview of CNV genomics in humans, highlighting patterns that inform methods for identifying CNVs. We describe how genotyping signals are used to identify CNVs and provide an overview of existing statistical models and methods used to infer location and carrier status from such data, especially the most commonly used methods exploring hybridization intensity. We compare the power of such methods with the alternative method of using tag SNPs to identify CNV carriers. As such methods are only powerful when applied to common CNVs, we describe two alternative approaches that can be informative for identifying rare CNVs contributing to disease risk. We focus particularly on methods identifying de novo CNVs and show that such methods can be more powerful than case-control designs. Finally, we present some recommendations for identifying CNVs contributing to common complex disorders. Comment: Published at http://dx.doi.org/10.1214/09-STS304 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref

The Confounding Effect of Population Structure on Bayesian Skyline Plot Inferences of Demographic History

Author: Chikhi Lounes
Heller Rasmus
Siegismund Hans Redlef
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Many coalescent-based methods aiming to infer the demographic history of populations assume a single, isolated and panmictic population (i.e. a Wright-Fisher model). While this assumption may be reasonable under many conditions, several recent studies have shown that the results can be misleading when it is violated. Among the most widely applied demographic inference methods are Bayesian skyline plots (BSPs), which are used across a range of biological fields. Violations of the panmixia assumption are to be expected in many biological systems, but the consequences for skyline plot inferences have so far not been addressed and quantified. We simulated DNA sequence data under a variety of scenarios involving structured populations with variable levels of gene flow and analysed them using BSPs as implemented in the software package BEAST. Results revealed that BSPs can show false signals of population decline under biologically plausible combinations of population structure and sampling strategy, suggesting that the interpretation of several previous studies may need to be re-evaluated. We found that a balanced sampling strategy whereby samples are distributed on several populations provides the best scheme for inferring demographic change over a typical time scale. Analyses of data from a structured African buffalo population demonstrate how BSP results can be strengthened by simulations. We recommend that sample selection should be carefully considered in relation to population structure prior to BSP analyses, and that alternative scenarios should be evaluated when interpreting signals of population size change. Danish Council for Independent Research, Laboratoire d’Excellence (LABEX) grant: (ANR-10-LABX-41)

Access to Research and Communications Annals

Directory of Open Access Journals

Copenhagen University Research Information System

PubMed Central

FigShare