
296 research outputs found

Pseudoalignment for metagenomic read assignment

Author: Bray N.
Melsted P.
Pachter L.
Pimentel H.
Schaeffer L.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 15/07/2017

Motivation: Read assignment is an important first step in many metagenomic analysis workflows, providing the basis for identification and quantification of species. However, ambiguity among the sequences of many strains makes it difficult to assign reads at the lowest level of taxonomy, and reads are typically assigned to taxonomic levels where they are unambiguous. We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data in order to develop novel methods for rapid and accurate quantification of metagenomic strains. Results: We find that the recent idea of pseudoalignment introduced in the RNA-Seq context is highly applicable in the metagenomics setting. When coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state-of-the-art software, making it possible and practical for the first time to analyze abundances of individual genomes in metagenomics projects.
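
The coupling of pseudoalignment with EM described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `em_abundances`, the input layout (a dictionary from a tuple of compatible genome indices to a read count), and the choice to ignore genome lengths are all assumptions made for illustration only.

```python
def em_abundances(eq_classes, n_targets, n_iter=200, tol=1e-10):
    """Estimate relative abundances from read equivalence classes.

    eq_classes: dict mapping a tuple of target indices (the genomes a group
    of reads is compatible with) to the number of reads in that group.
    Hypothetical layout; target lengths are deliberately ignored.
    """
    total = sum(eq_classes.values())
    theta = [1.0 / n_targets] * n_targets  # start from uniform abundances
    for _ in range(n_iter):
        counts = [0.0] * n_targets
        # E-step: split each class's reads among its compatible targets
        # in proportion to the current abundance estimates.
        for targets, n_reads in eq_classes.items():
            denom = sum(theta[t] for t in targets)
            for t in targets:
                counts[t] += n_reads * theta[t] / denom
        # M-step: renormalize expected counts into abundances.
        new_theta = [c / total for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


# Two genomes: 60 reads unique to genome 0, 20 unique to genome 1,
# and 20 ambiguous reads compatible with both.
example = {(0,): 60, (1,): 20, (0, 1): 20}
abundances = em_abundances(example, n_targets=2)
```

In this toy input the ambiguous reads end up split in proportion to the evidence from the unique reads, which is the behavior the abstract relies on to resolve strain-level ambiguity.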

Caltech Authors

Markov basis and Groebner basis of Segre-Veronese configuration for testing independence in group-wise selections

Author: A. Takemura
Akimichi Takemura
B.D. Ripley
E. Negri De
E.L. Lehmann
F. Rapallo
H. Ohsugi
Hidefumi Ohsugi
J.E. Crow
L. Pachter
M. Huber
N. Beerenwinkel
P. Diaconis
S. Aoki
S. Guo
Satoshi Aoki
T. Hibi
T. Oguma
Takayuki Hibi
W.K. Hastings
Y.M.M. Bishop
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

We consider testing independence in group-wise selections with some restrictions on combinations of choices. We present models for frequency data of selections for which it is easy to perform conditional tests by Markov chain Monte Carlo (MCMC) methods. When the restrictions on the combinations can be described in terms of a Segre-Veronese configuration, an explicit form of a Gröbner basis consisting of moves of degree two is readily available for running a Markov chain. We illustrate our setting with the National Center Test for university entrance examinations in Japan. We also apply our method to testing independence hypotheses involving genotypes at more than one locus or haplotypes of alleles on the same chromosome.

Comment: 25 pages, 5 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Lassoing and corraling rooted phylogenetic trees

Author: A. Dress
A. L. Harper
A. W. M. Dress
Andrei-Alin Popescu
C. Semple
F. Felsenstein
G. Soete De
H. Philippe
Katharina T. Huber
L. Pachter
M. J. Sanderson
M. J. Warrens
M. M. Deza
M. Steel
R. Diestel
S. Herrmann
W. M. Muir
Z. Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/02/2013

The construction of a dendrogram on a set of individuals is a key component of a genome-wide association study. However, even with modern sequencing technologies, the distances on the individuals required for the construction of such a structure may not always be reliable, making it tempting to exclude them from an analysis. This, in turn, results in an input set for dendrogram construction that consists of only partial distance information, which raises the following fundamental question: for what subset of its leaf set can we uniquely reconstruct the dendrogram from the distances that it induces on that subset? By formalizing a dendrogram in terms of an edge-weighted, rooted phylogenetic tree on a pre-given finite set X with |X|>2 whose edge-weighting is equidistant and a set of partial distances on X in terms of a set L of 2-subsets of X, we investigate this problem in terms of when such a tree is lassoed, that is, uniquely determined by the elements in L. For this we consider four different formalizations of the idea of "uniquely determining", giving rise to four distinct types of lassos. We present characterizations for all of them in terms of the child-edge graphs of the interior vertices of such a tree. Our characterizations imply in particular that if the tree in question is binary, then all four types of lasso must coincide.

arXiv.org e-Print Archive

Crossref

University of East Anglia digital repository

Recognizing Treelike k-Dissimilarities

Author: A Schrijver
AD Gordon
Andreas Spillner
AWM Dress
C Bocci
C Hayashi
D Levy
DP Faith
E Rubei
G Soete de
H-J Bandelt
J Culberson
J Felsenstein
K Zaretsky
Katharina T. Huber
L Pachter
M Steel
M-M Deza
MJ Warrens
N Grishin
S Joly
Sven Herrmann
Vincent Moulton
WJ Heiser
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

A k-dissimilarity D on a finite set X, |X| >= k, is a map from the set of size k subsets of X to the real numbers. Such maps naturally arise from edge-weighted trees T with leaf-set X: Given a subset Y of X of size k, D(Y) is defined to be the total length of the smallest subtree of T with leaf-set Y. In case k = 2, it is well-known that 2-dissimilarities arising in this way can be characterized by the so-called "4-point condition". However, in case k > 2, Pachter and Speyer recently posed the following question: Given an arbitrary k-dissimilarity, how do we test whether this map comes from a tree? In this paper, we provide an answer to this question, showing that for k >= 3 a k-dissimilarity on a set X arises from a tree if and only if its restriction to every 2k-element subset of X arises from some tree, and that 2k is the least possible subset size to ensure that this is the case. As a corollary, we show that there exists a polynomial-time algorithm to determine when a k-dissimilarity arises from a tree. We also give a 6-point condition for determining when a 3-dissimilarity arises from a tree that is similar to the aforementioned 4-point condition.

Comment: 18 pages, 4 figures
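
For the k = 2 case mentioned in this abstract, the classical 4-point condition says that, among the three pairwise sums d(i,j)+d(k,l), d(i,k)+d(j,l), d(i,l)+d(j,k) for any four elements, the two largest are equal. The sketch below checks only that condition on all 4-subsets of a symmetric matrix; the function name `satisfies_four_point` and the tolerance parameter are illustrative assumptions, and the code does not check the other metric axioms or the paper's 6-point condition for k = 3.

```python
from itertools import combinations


def satisfies_four_point(d, tol=1e-9):
    """Return True if the symmetric matrix d passes the 4-point condition.

    d is a list of lists with d[i][j] the dissimilarity between i and j.
    For every 4-subset, the two largest of the three pairing sums must
    agree up to tol (a hypothetical numerical tolerance).
    """
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([
            d[i][j] + d[k][l],
            d[i][k] + d[j][l],
            d[i][l] + d[j][k],
        ])
        if sums[2] - sums[1] > tol:
            return False
    return True


# Quartet tree ((0,1),(2,3)) with all edge lengths equal to 1.
tree_like = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]

# Not tree-like: the largest pairing sum (4) is strictly above the other two (2).
not_tree_like = [
    [0, 2, 1, 1],
    [2, 0, 1, 1],
    [1, 1, 0, 2],
    [1, 1, 2, 0],
]
```

The check runs over all 4-subsets, so it takes time proportional to n^4 for n elements, which is adequate for small examples.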

arXiv.org e-Print Archive

Crossref

University of East Anglia digital repository

Likelihood Geometry

Author: A. Hovanskiĭ
A. Varchenko
C. Concini De
C. Raicu
C. Uhler
D. Cohen
D. Mond
F. Catanese
F. Rapallo
G. Denham
H. Terao
J. Franecki
J. Huh
J. Tevelev
J.M. Landsberg
L. Pachter
M. Kapranov
O. Gabber
P. Aluffi
P. Orlik
R. Sanyal
S. Boyd
S. Hoşten
S. Lauritzen
Y. Bishop
Publication date: 17/09/2013

We study the critical points of monomial functions over an algebraic subset of the probability simplex. The number of critical points on the Zariski closure is a topological invariant of that embedded projective variety, known as its maximum likelihood degree. We present an introduction to this theory and its statistical motivations. Many favorite objects from combinatorial algebraic geometry are featured: toric varieties, A-discriminants, hyperplane arrangements, Grassmannians, and determinantal varieties. Several new results are included, especially on the likelihood correspondence and its bidegree. These notes were written for the second author's lectures at the CIME-CIRM summer course on Combinatorial Algebraic Geometry at Levico Terme in June 2013.

Comment: 45 pages; minor changes and additions

arXiv.org e-Print Archive

CiteSeerX

Crossref

Parametric Analysis of RNA Branching Configurations

Author: A. E. Walter
B. A. Shapiro
B. Grünbaum
C. N. Dewey
Christine E. Heitsch
D. Gusfield
D. H. Mathews
E. Deutsch
G. Chen
G. M. Ziegler
H. H. Gan
H. Iseri
H.-P. Lenhof
J. M. Diamond
J. SantaLucia
K. J. Doshi
L. Pachter
L. Wang
M. Andronescu
M. E. Burkardm
M. S. Waterman
M. Zuker
N. Dershowitz
R. Dowell
R. P. Stanley
S. Smit
S.-Y. Le
Valerie Hower
W. R. Schmitt
Y. Bakhtin
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Determinants of response to a parent questionnaire about development and behaviour in 3 year olds: European multicentre study of congenital toxoplasmosis.

Author: A Heiser
A Prusa
A Salt
AW Gottfried
B Oliver
CM Stott
D Schmidt
D Zelterman
DC Eisert
FL Goodenough
FP Glascoe
G Malm
H Klasen
H Smedje
HK Tan
JP Mackenbach
K Freeman
K Saudino
K Sonnander
KB Doig
KE Beery
L Dinnebeil
LM Pachter
N Ferret
Organisation for Economic Co-operation and Development (OECD).
P Edwards
PS Dale
R Gilbert
R Goodman
R Griffiths
RE Gilbert
RT Anderson
S Johnson
SA Petrill
TM Marteau
TV Perneger
W Buffolano
W Tin
Publication date: 01/01/2005

Background: We aimed to determine how response to a parent-completed postal questionnaire measuring development, behaviour, impairment, and parental concerns and anxiety varies in different European centres. Methods: Prospective cohort study of 3-year-old children, with and without congenital toxoplasmosis, who were identified by prenatal or neonatal screening for toxoplasmosis in 11 centres in 7 countries. Parents were mailed a questionnaire that comprised all or part of existing validated tools. We determined the effect of characteristics of the centre and child on response, age at questionnaire completion, and response to child drawing tasks. Results: The questionnaire took 21 minutes to complete on average. 67% (714/1058) of parents responded. Few parents (60/1058) refused to participate. The strongest determinants of response were the score for organisational attributes of the study centre (such as direct involvement in follow up and access to an address register), and infection with congenital toxoplasmosis. Age at completion was associated with study centre, presence of neurological abnormalities in early infancy, and duration of prenatal treatment. Completion rates for individual questions exceeded 92% except for child-completed drawings of a man (70%), which were completed more by girls, older children, and in certain centres. Conclusion: Differences in response across European centres were predominantly related to the organisation of follow up and access to correct addresses. The questionnaire was acceptable in all six countries and offers a low-cost tool for assessing development, behaviour, and parental concerns and anxiety in multinational studies.

Crossref

AIR Universita degli studi di Milano

Springer - Publisher Connector

UCL Discovery

PubMed Central

Optimality regions and fluctuations for Bernoulli last passage models

Author: A-L Basdevant
AS Malaspinas
C Houdré
C Vinzant
CA Tracy
CN Dewey
D Fernández-Baca
D Gusfield
D Maier
DS Hirschberg
EW Myers
H Cramèr
J Komlós
J Lember
Janosch Ortmann
JB Martin
L Bergroth
L Pachter
M Kiwi
M Vingron
N Georgiou
N O’Connell
Nicos Georgiou
PC Ng
PW Glynn
S Aluru
S Amsalu
S Henikoff
SB Needleman
T Bodineau
T Seppäläinen
TF Smith
V Chvátal
V Hower
VB Priezzev
WJ Masek
X Xia
Y Baryshnikov
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/03/2018

We study the sequence alignment problem and its independent version, the discrete Hammersley process with an exploration penalty. We obtain rigorous upper bounds for the number of optimality regions in both models near the soft edge. At zero penalty the independent model becomes an exactly solvable model and we identify cases for which the law of the last passage time converges to a Tracy-Widom law.

arXiv.org e-Print Archive

Crossref

Sussex Research Online

Viral population estimation using pyrosequencing

Author: A Dempster
A Rambaut
AMN Tsibris
B Gaschen
Baback Gharizadeh
C Wang
Chunlin Wang
D O'Meara
DC Douek
E Domingo
E Halperin
EH Simpson
ES Lander
Glenn Tesler
GS Gottlieb
GW Tyson
H Fakhrai-Rad
I Malet
IM Rouzine
J Kececioglu
JE Hopcroft
JF Simons
K Chen
KJ Metzner
L Bacheler
L Doukhan
L Excoffier
Lior Pachter
LR Ford
M Breitbart
M Eigen
M Margulies
M Stephens
MA Nowak
MJ Gonzales
ML Collins
ML Sogin
Mostafa Ronaghi
MT Tammi
N Beerenwinkel
Nicholas Eriksson
Niko Beerenwinkel
P Jenkins
PA Pevzner
R Schmid
R Shankarappa
Robert W. Shafer
RP Dilworth
S Huse
S-Y Rhee
Soo-Yon Rhee
VA Johnson
Yumi Mitsuya
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2008

The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used for quantifying this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug-resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an EM algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies.

Comment: 23 pages, 13 figures

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Repository for Publications and Research Data

Crossref

Directory of Open Access Journals

PubMed Central

Caltech Authors