Search CORE

13 research outputs found

PICS-Ord: unlimited coding of ambiguous regions by pairwise identity and cost scores ordination

Author: A Loytynoja
A Loytynoja
A Mangold
A Phillips
A Stamatakis
A Stamatakis
AF Zuur
AL Hipp
Alexandros Stamatakis
B Dwivedi
B McCune
B McCune
B Staiger
BD Redelings
BD Redelings
BG Hall
BG Hall
Brendan P Hodkinson
C Moritz
CW Cunningham
D González
DF Robinson
DL Aylor
DL Swofford
DM Hillis
DT Jones
EW Price
F Lutzoni
F Ronquist
G Didier
G Didier
G Landan
G Landan
G Lunter
G Talavera
GJ Olsen
GJ Olsen
J Gatesy
J Miadlikowska
JD Lawrey
JD Thompson
K Katoh
K Katoh
K Katoh
K Kjer
K Liu
M Kimura
MA Larkin
MJ Anderson
MSY Lee
O Penn
O Penn
P Legendre
P Legendre
PD Hebert
PR Minchin
R Development Core Team
R Fleissner
R Meier
RA Cartwright
RA Cartwright
RA Cartwright
RA Cartwright
RC Edgar
Reed A Cartwright
Robert Lücking
S Karlin
S Lehtonen
S Roch
SA Berger
SA Smith
TH Ogden
TH Ogden
W Fletcher
WC Wheeler
WC Wheeler
WC Wheeler
WC Wheeler
WC Wheeler
WP Maddison
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: We present a novel method, 'Pairwise Identity and Cost Scores Ordination' (PICS-Ord), to encode ambiguously aligned regions in fixed multiple sequence alignments. The method works by ordinating matrices of sequence identity or cost scores by means of Principal Coordinates Analysis (PCoA). After ambiguous regions are identified, the method computes pairwise distances as sequence identities or cost scores, ordinates the resulting distance matrix by PCoA, and encodes the principal coordinates as ordered integers. Three biological datasets and 100 simulated datasets were used to assess the performance of the new method.

Results: Including ambiguous regions coded with PICS-Ord increased topological accuracy, resolution, and bootstrap support in both real biological and simulated datasets, compared with the alternative of excluding such regions from the analysis a priori. In terms of accuracy, PICS-Ord performs as well as or better than previously available methods of ambiguous-region coding (e.g., INAASE), with the added advantages of a practically unlimited alignment size, increased analytical speed, and the ability to analyze PICS-Ord scores together with DNA data in a partitioned maximum likelihood model.

Conclusions: Advantages of PICS-Ord over step-matrix-based ambiguous-region coding with INAASE include a practically unlimited number of OTUs, seamless integration of PICS-Ord codes into phylogenetic datasets, and faster phylogenetic analysis. Unlike word- and frequency-based methods, PICS-Ord retains the advantage of using pairwise sequence alignment to derive distances, and the method is flexible with respect to how distance scores are calculated. In addition to distance and maximum parsimony methods, PICS-Ord codes can be analyzed in a Bayesian or maximum likelihood framework. RAxML version 7.2.6 or higher (developed for this study) allows up to 32-state ordered or unordered characters. A GTR, MK, or ORDERED model can be applied to the PICS-Ord codes partition, with GTR performing slightly better than MK and ORDERED.

Availability: An implementation of the PICS-Ord algorithm is available from http://scit.us/projects/ngila/wiki/PICS-Ord. It requires both the statistical software R (http://www.r-project.org) and the alignment software Ngila (http://scit.us/projects/ngila).
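The distance, PCoA, and ordered-integer coding steps described above can be sketched in Python with NumPy. This is a minimal illustration, not the PICS-Ord implementation (which runs in R with Ngila): `identity_distance` assumes the sequences are already pairwise aligned, and the equal-width binning in `encode_ordered` is an assumption about how coordinates map to integer states; only the 32-state ceiling comes from the RAxML note above. All function names are hypothetical.

```python
import numpy as np


def identity_distance(a, b):
    """Distance = 1 - fraction of identical residues over columns where neither is a gap.

    Assumption: `a` and `b` are already pairwise aligned (equal length, '-' for gaps).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no comparable sites: treat as maximally distant
    same = sum(x == y for x, y in pairs)
    return 1.0 - same / len(pairs)


def pcoa(dist, n_axes=2):
    """Classical principal coordinates analysis of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n
    # Gower's double-centred matrix B = -1/2 * J D^2 J.
    b = -0.5 * centre @ (d ** 2) @ centre
    vals, vecs = np.linalg.eigh(b)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_axes]  # keep the n_axes largest
    vals, vecs = vals[order], vecs[:, order]
    # Clip negative eigenvalues, which arise when distances are non-Euclidean.
    return vecs * np.sqrt(np.clip(vals, 0.0, None))


def encode_ordered(coords, n_states=32):
    """Map each principal coordinate onto ordered integer states 0..n_states-1.

    Assumption: equal-width binning between the axis minimum and maximum.
    """
    coords = np.asarray(coords, dtype=float)
    codes = np.zeros(coords.shape, dtype=int)
    for k in range(coords.shape[1]):
        col = coords[:, k]
        lo, hi = col.min(), col.max()
        if hi > lo:  # a constant axis carries no information: leave it at state 0
            codes[:, k] = np.rint((col - lo) / (hi - lo) * (n_states - 1)).astype(int)
    return codes


if __name__ == "__main__":
    # Hypothetical ambiguous region for four OTUs, already aligned.
    region = ["ACGTTA", "ACGTTG", "A-GCCA", "TTGCCA"]
    d = [[identity_distance(x, y) for y in region] for x in region]
    print(encode_ordered(pcoa(d, n_axes=2)))
```

Each column of the printed matrix would become one ordered multistate character in the codes partition.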

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs

Author: A Dress
A Godzik
A Löytynoja
A Löytynoja
A Novák
A Novák
A Sali
A Siepel
A Tramontano
Adrienn Szabó
AS Schwartz
AS Schwartz
B Dwivedi
B Knudsen
B Larget
B Misof
B Schwikowski
BD Redelings
BD Redelings
BJM Webb
BP Blackburne
C Dessimoz
C Notredame
C Notredame
CB Do
CJ Challis
D Altschuh
D Chivian
D DeBlasio
D Lupyan
D Metzler
D Metzler
D Robinson
DA Morrison
DF Feng
E Levy Karin
G Jordan
G Landan
G Lunter
G Lunter
G Lunter
G Raghava
G Talavera
GA Churchill
GA Lunter
BG Hall
HT Mevissen
I Holmes
I Miklós
I Miklós
IL Dryden
IM Wallace
István Miklós
J Castresana
J Felsenstein
J Gatesy
J Hein
J Kim
J Zhu
JA Lake
JD Thompson
JD Thompson
JL Thorne
JL Thorne
JL Thorne
JL Thorne
Joseph L Herman
Jotun Hein
K Bucka-Lassen
K Liu
K Liu
KM Wong
L Wang
L Yu
LE Carvalho
LS Wang
M Hamada
M Hamada
M Hamada
M Höhl
M Vingron
M Vingron
M Wu
M Zuker
MA Suchard
MJ Wise
MO Dayhoff
MP Simmons
MS Waterman
MSY Lee
O Gotoh
O Penn
O Penn
O Penn
P Ajawatanawong
P Arunapuram
P Collingridge
PJ Green
PJ Green
PP Gardner
R Durbin
R Satija
R Satija
R Schwarzenbacher
RA Cartwright
RC Edgar
RJ Dickson
RJ Dickson
RK Bradley
Rune Lyngsø
S Capella-Gutiérrez
S Karlin
S Miyazawa
S Needleman
S Sinha
S Capella-Gutiérrez
SME Sahraeian
TA Hopf
TH Ogden
TL Blundell
U Roshan
V Ahola
W Fletcher
WC Wheeler
Y Liu
Y Ruffieux
Ádám Novák
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Background: A standard procedure in many areas of bioinformatics is to use a single multiple sequence alignment (MSA) as the basis for various types of analysis. However, downstream results may be highly sensitive to the alignment used, and neglecting the uncertainty in the alignment can lead to significant bias in the resulting inference. In recent years, a number of approaches have been developed for probabilistic sampling of alignments, rather than simply generating a single optimum. However, this type of probabilistic information is currently not widely used in downstream inference, since most existing algorithms are set up to use a single alignment.

Results: In this work, we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities. Moreover, owing to conditional independencies between columns, the graph encodes a much larger set of alignments than the original set of sampled MSAs, so that the effective sample size is greatly increased.

Conclusions: The alignment DAG provides a natural way to represent a distribution over the space of MSAs, and allows existing algorithms to be scaled up efficiently to operate on large sets of alignments. As an example, we show how it can be used to compute marginal probabilities for tree topologies, averaging over a very large number of MSAs. The framework can also be used to generate a statistically meaningful summary alignment; example applications show that this summary alignment is consistently more accurate than the majority of the alignment samples, leading to improvements in downstream tree inference. Implementations of the methods described in this article are available at http://statalign.github.io/WeaveAlign
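To make the column-DAG idea concrete, here is a small Python sketch. Several assumptions are not taken from WeaveAlign. A column is keyed, per sequence, by how many residues were consumed before it plus whether it emits a residue, so consecutive columns must agree on every sequence's position and any START-to-END path is a valid alignment. Column probabilities are plain sample frequencies. The summary alignment is simply the path with the largest sum of column frequencies, a simplification of WeaveAlign's own scoring. All names are hypothetical.

```python
from collections import defaultdict

START, END = "START", "END"


def column_keys(alignment):
    """Turn a gapped MSA (list of equal-length rows) into hashable column keys.

    Assumption: each key records, per sequence, (residues consumed before this
    column, whether this column emits a residue); no all-gap columns.
    """
    consumed = [0] * len(alignment)
    keys = []
    for j in range(len(alignment[0])):
        key = []
        for i, row in enumerate(alignment):
            emits = row[j] != "-"
            key.append((consumed[i], emits))
            consumed[i] += int(emits)
        keys.append(tuple(key))
    return keys


def build_dag(samples):
    """Column nodes weighted by sample frequency; edges from observed adjacencies."""
    freq = defaultdict(float)
    edges = defaultdict(set)
    for aln in samples:
        keys = column_keys(aln)
        for key in keys:
            freq[key] += 1.0 / len(samples)
        path = [START] + keys + [END]
        for a, b in zip(path, path[1:]):
            edges[a].add(b)
    return freq, edges


def count_paths(edges):
    """Number of distinct alignments (START-to-END paths) encoded by the DAG."""
    memo = {END: 1}

    def count(node):
        if node not in memo:
            memo[node] = sum(count(nxt) for nxt in edges[node])
        return memo[node]

    return count(START)


def best_path(freq, edges):
    """Path maximising the summed column frequencies (simplified summary score)."""
    memo = {END: (0.0, [])}

    def solve(node):
        if node not in memo:
            score, tail = max((solve(nxt) for nxt in edges[node]), key=lambda st: st[0])
            memo[node] = (score + freq.get(node, 0.0), [node] + tail)
        return memo[node]

    score, path = solve(START)
    return score, path[1:]  # drop START; END is never stored in the path


def to_alignment(path, seqs):
    """Rebuild gapped rows from a path of column keys and the ungapped sequences."""
    rows = [[] for _ in seqs]
    for key in path:
        for i, (pos, emits) in enumerate(key):
            rows[i].append(seqs[i][pos] if emits else "-")
    return ["".join(r) for r in rows]


if __name__ == "__main__":
    seqs = ["AAGAA", "AGA"]
    samples = [["AAGAA", "A-GA-"], ["AAGAA", "-AG-A"], ["AAGAA", "A-GA-"]]
    freq, edges = build_dag(samples)
    print(count_paths(edges))  # 4
    score, path = best_path(freq, edges)
    print(score, to_alignment(path, seqs))
```

In the usage example the two distinct samples share the central G column, so the DAG weaves their left and right halves into 4 valid alignments, illustrating how the graph can encode more alignments than were sampled.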

Crossref

SZTAKI Publication Repository

Springer - Publisher Connector

PubMed Central

Oxford University Research Archive

Large-Scale Phylogenetic Analysis of Emerging Infectious Diseases

Author: A Moilanen
A Phillips
A Tehler
AR Lemmon
B Budowle
B Chang
B Grenfell
B Rannala
B Rannala
BD Redelings
BE Martina
C Ceron
C Scholtissek
D Earn
D Franz
D Janies
D Janies
D Morrison
D Pol
D Sankoff
D Searls
DJ Zwickl
DL Swofford
DL Swofford
DL Swofford
DM Hillis
DM Hillis
E Ghedin
E Holmes
E Ukkonen
EM Rubin
G Laver
H Song
J Antonovics
J Felsenstein
J Felsenstein
J Felsenstein
J Huelsenbeck
J Plotkin
J Silvertown
J Thornton
JD Thompson
JK Taubenberger
JK Taubenberger
JK Taubenberger
JK Taubenberger
JL Thorne
JP Carulli
JS Farris
JS Farris
JS Farris
K Li
K Li
K Ungchusak
KC Nixon
KC Nixon
KP White
L Wang
L Watrous
LA Salter
LH Taylor
LR Foulds
M Gammelin
M Gibbs
M Koopmans
M Metzker
MA Charleston
MA Marra
MD Hendy
MJ Brauer
N Saitou
NM Ferguson
NM Ferguson
P Palese
PA Goloboff
PA Goloboff
PA Rota
PO Lewis
Q Wang
R Fleissner
RG Webster
RM Bush
RM Bush
RM Bush
RS Ross
S Lau
S Li
S Morse
S Poe
T Fanning
T Grant
T Ksiazek
The Chinese SARS Molecular Epidemiology Consortium
W Hennig
W Li
W Wheeler
W Wheeler
WC Wheeler
WC Wheeler
WM Fitch
WM Fitch
WM Fitch
Y Guan
Y Guan
Y Lin
Y Suzuki
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Microorganisms that cause infectious diseases present critical issues of national security, public health, and economic welfare. For example, in recent years highly pathogenic strains of avian influenza have emerged in Asia, spread through Eastern Europe, and threaten to become pandemic. As demonstrated by the coordinated responses to Severe Acute Respiratory Syndrome (SARS) and influenza, agents of infectious disease are being addressed via large-scale genomic sequencing. The goal of genomic sequencing projects is to put large amounts of data into the public domain rapidly, accelerating research on disease surveillance, treatment, and prevention. However, our ability to derive information from large comparative genomic datasets lags far behind our ability to acquire them. Here we review the computational challenges of comparative genomic analyses, specifically sequence alignment and the reconstruction of phylogenetic trees. We present novel analytical results from two important infectious diseases, Severe Acute Respiratory Syndrome (SARS) and influenza.

SARS and influenza have similarities and important differences, both as biological problems and as comparative genomic analysis problems. Influenza viruses (Orthomyxoviridae) are RNA-based. Current evidence indicates that influenza viruses originate in wild populations of aquatic birds. Influenza has been studied for decades via well-coordinated international efforts. These efforts center on surveillance via antibody characterization of the hemagglutinin (HA) and neuraminidase (N) proteins of circulating strains to inform vaccine design. However, we still do not have a clear understanding of 1) the various transmission pathways, such as the role of intermediate hosts like swine and domestic birds, and 2) the key mutation and genomic recombination events that underlie periodic influenza pandemics. In the past 30 years, sequence data from the HA and N loci have become an important data type, and in the past year full-genome data have become prominent. These data present exciting opportunities to address unanswered questions about influenza pandemics.

SARS is caused by a previously unrecognized lineage of coronavirus, SARS-CoV, which, like influenza, has an RNA-based genome. Although SARS-CoV is widely believed to have originated in animals, there remains disagreement over the candidate animal source that led to the original outbreak. In contrast to the long history of influenza research, SARS was only recognized in late 2002, and the virus that causes it has been documented primarily by genomic sequencing.

In the past, most studies of influenza were performed on a limited number of isolates and genes suited to a particular problem. Major goals in science today are to understand emerging diseases in broad geographic, environmental, societal, biological, and genomic contexts. Synthesizing diverse information brought together by various researchers is important for finding out what can be done to prevent future outbreaks {JON03}. Thus, comprehensive means of organizing and analyzing large amounts of diverse information are critical. For example, the relationships among isolates and the patterns of genomic change observed in large datasets might not be consistent with hypotheses formed on partial data. Moreover, when researchers rely on partial datasets, they restrict the range of possible discoveries.

Phylogenetics is well suited to the complex task of understanding emerging infectious disease. Phylogenetic analyses can test many hypotheses by comparing diverse isolates collected from various hosts, environments, and points in time, and by organizing these data into alternative evolutionary scenarios. The products of a phylogenetic analysis are a graphical tree of ancestor-descendant relationships and an inferred summary of mutations, recombination events, host shifts, and the geographic and temporal spread of the viruses. However, this synthesis comes at a price: the computational cost of phylogenetic analysis grows combinatorially as the number of isolates increases. Thus, large datasets like those currently being produced are commonly considered intractable. We address this problem through the synergistic development of heuristic tree search strategies and parallel computing.

Affiliations: Janies, D.: Ohio State University, United States. Pol, Diego: Ohio State University, United States; Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina.
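The combinatorial growth mentioned above can be quantified: for n labelled taxa there are (2n-5)!! distinct fully resolved unrooted tree topologies and (2n-3)!! rooted ones. This is a standard phylogenetics result, not something specific to this paper. The Python sketch below counts them exactly; the function names are illustrative.

```python
def count_unrooted_topologies(n_taxa):
    """(2n-5)!! fully resolved unrooted trees for n labelled taxa (n >= 3)."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


def count_rooted_topologies(n_taxa):
    """(2n-3)!! rooted trees: a rooted tree on n taxa corresponds to an
    unrooted tree on n+1 taxa (treat the root as an extra leaf)."""
    return count_unrooted_topologies(n_taxa + 1)


if __name__ == "__main__":
    for n in (5, 10, 20, 50):
        total = count_unrooted_topologies(n)
        print(f"{n} taxa: {total} unrooted topologies ({len(str(total))} digits)")
```

Already at 10 taxa there are 2,027,025 unrooted topologies, which is why exhaustive search gives way to heuristic search for the dataset sizes discussed here.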

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Make the most of your samples: Bayes factor estimators for high-dimensional models of sequence evolution

Author: A Gelman
AB Prasad
AJ Drummond
BD Redelings
DG Hwang
DJ Zwickl
G Baele
G Baele
G Baele
G Baele
G Baele
G Baele
Guy Baele
GW Oehlert
J Felsenstein
JA Nylander
JP Huelsenbeck
KG Karol
MA Newton
MA Steel
MA Suchard
MH Chen
N Lartillot
N Lartillot
N Rodrigue
Philippe Lemey
RE Kass
Stijn Vansteelandt
TJ DiCiccio
W Xie
XL Meng
XL Meng
Y Fan
Y Ogata
Z Yang
Z Yang
Z Yang
Z Yang
Publication venue: Springer Science and Business Media LLC

Crossref

Measuring Accelerated Rates of Insertions and Deletions Independent of Rates of Nucleotide Substitution

Author: A Löytynoja
A Siepel
BD Redelings
E Rivas
F Liu
G McGuire
G Talavera
H Zhao
J Felsenstein
J Hu
J Thorne
KS Pollard
Mira V. Han
MJ Hubisz
NV Grishin
Omar Navarro Leija
P Sætrom
PC Ng
S Sandhya
S Tao
Sanju Varghese
SLK Pond
WW de Jong
X Gu
Z Yang
Z Zhang
Publication venue: Springer Science and Business Media LLC

Crossref

A gustatory receptor paralogue controls rapid warmth avoidance in Drosophila

Author: AD Flouris
AF Silbering
B Xiao
BD Redelings
E Gingl
FN Hamada
G Wang
H Cho
HM Robertson
J Liu
JG Bernstein
K Kang
K Kang
K Sato
L Vyklický
L Zhong
LB Vosshall
M Gallio
MA Suchard
N Thorne
N Thorne
O Sayeed
PA Garrity
RF Foelix
SH Kim
SQ Le
SR Pulver
ST Sweeney
T Zars
V Viswanath
Y Xiang
Publication venue: Springer Science and Business Media LLC

Crossref

Solving the master equation for Indels

Author: A Bouchard-Côté
A Bouchard-Côté
A Caspi
A Hobolth
A Novak
A Siepel
A Siepel
A Siepel
A Stamatakis
AJ Drummond
AJ Drummond
B Knudsen
B Knudsen
B Knudsen
B Redelings
BD Redelings
BD Redelings
BE Engelhardt
C Burge
C Drosten
CB Do
CE Hinchliff
CJ Michel
CL Strope
D Metzler
D Sankoff
D Sankoff
DA Liberles
DB Searls
DG Arquès
DG Arquès
E Benard
E Benard
E Birney
E Eskin
E Rivas
E Rivas
E Rivas
EA Gaucher
EA Ortlund
F Bielejec
G Hickey
G Lunter
G McGuire
GA Lunter
GA Lunter
GH Gonnet
GH Mealy
H Ashkenazy
H Matsui
HA Schmidt
I Holmes
I Holmes
I Holmes
I Miklós
I Miklós
Ian H. Holmes
IH Holmes
IM Meyer
J Bérard
J Felsenstein
J Felsenstein
J Felsenstein
J Hein
J Kim
J Santiago-Ortiz
J Wang
JA Ugalde
JL Thorne
JL Thorne
JM Bahi
JP McCrow
JS Pedersen
JS Pedersen
JS Pedersen
K Ezawa
K Ezawa
K Ezawa
K Ezawa
K Yamane
KS Pollard
LE Williams
LJ Pollock
M Blanchette
M Hasegawa
M Hsing
M Kimura
M Mohri
M Worobey
MA Suchard
MS Chang
N Goldman
O Westesson
O Westesson
O Westesson
OG Pybus
P Arunapuram
P Liò
PM Zakas
PS Klosterman
RA Cartwright
RA Cartwright
RF Schwarz
RK Bradley
S Hohna
S Lèbre
SA Benner
SL Pond
TH Jukes
U Alcolombri
W Feller
W Fletcher
X Gu
X Gu
Y Fan
Z Yang
Z Yang
Z Yang
Z Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/05/2017

Background: Despite the long-anticipated possibility of putting sequence alignment on the same footing as statistical phylogenetics, theorists have struggled to develop time-dependent evolutionary models for indels that are as tractable as the analogous models for substitution events.

Main text: This paper discusses progress in the area of insertion-deletion models, in view of recent work by Ezawa (BMC Bioinformatics 17:304, 2016; BMC Bioinformatics 17:397, 2016; BMC Bioinformatics 17:457, 2016) on the calculation of time-dependent gap length distributions in pairwise alignments, and of current approaches for extending these calculations from ancestor-descendant pairs to phylogenetic trees.

Conclusions: While approximations that use finite-state machines (Pair HMMs and transducers) currently represent the most practical approach to problems such as sequence alignment and phylogeny, more rigorous approaches that work directly with the matrix exponential of the underlying continuous-time Markov chain also show promise, especially in view of recent advances
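As a toy illustration of working directly with the matrix exponential of an indel continuous-time Markov chain, the sketch below builds the rate matrix for sequence length alone under a TKF91-style model (insertions at rate λ on each of n+1 links, deletions at rate μ per residue) and exponentiates it. Assumptions: the state space is truncated at a hypothetical `max_len`; only length is tracked, not gap structure, so this is far simpler than Ezawa's gap-length calculations; and the exponential uses a hand-rolled scaling-and-squaring Taylor series in place of a library routine such as `scipy.linalg.expm`.

```python
import numpy as np


def tkf91_length_generator(lam, mu, max_len):
    """Rate matrix for sequence length under TKF91-style indels.

    From length n: insertion at rate lam * (n + 1) (one per link, including the
    immortal link) and deletion at rate mu * n (one per residue). Assumption:
    the state space is truncated at max_len, where insertions are switched off.
    """
    size = max_len + 1
    q = np.zeros((size, size))
    for n in range(size):
        if n < max_len:
            q[n, n + 1] = lam * (n + 1)
        if n > 0:
            q[n, n - 1] = mu * n
        q[n, n] = -q[n].sum()  # rows of a generator sum to zero
    return q


def expm(a, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.abs(a).sum(axis=1).max()
    # Scale so that the norm of a / 2**s is at most 1, where the series converges fast.
    s = int(np.ceil(np.log2(norm))) + 1 if norm > 1 else 0
    b = a / 2.0 ** s
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms):
        term = term @ b / k
        result = result + term
    for _ in range(s):  # undo the scaling: exp(a) = exp(a / 2**s) ** (2**s)
        result = result @ result
    return result


if __name__ == "__main__":
    lam, mu = 0.5, 1.0  # illustrative rates with lam < mu
    q = tkf91_length_generator(lam, mu, max_len=40)
    p = expm(q * 1.0)   # transition probabilities over branch length t = 1
    print(p[10, 8:13])  # P(length 8..12 at t = 1 | length 10 at t = 0)
```

For λ < μ the rows of P(t) approach the geometric equilibrium length distribution (1 - λ/μ)(λ/μ)^n as t grows; with the truncation, the limit is that geometric distribution renormalized over lengths 0..max_len.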

Crossref

Ezid

Directory of Open Access Journals

eScholarship - University of California

The tree alignment problem

Author: A Varón
A Varón
Andrés Varón
B Morgenstern
B Schwikowski
B Schwikowski
BD Redelings
RA Cartwright
CB Do
D Sankoff
D Sankoff
D Sankoff
DR Powell
E Ukkonen
F Yue
G Lancia
G Lancia
J Hein
J Hein
JD Thompson
K Katoh
K Liu
L Wang
L Wang
L Wang
L Wang
MS Waterman
MSS Chang
O Gotoh
R Fleissner
R Ravi
RA Cartwright
RC Edgar
S Lehtonen
S Nelesen
SA Benner
SB Needleman
TH Ogden
Ward C Wheeler
WC Wheeler
WC Wheeler
WC Wheeler
WC Wheeler
WC Wheeler
WC Wheeler
X Gu
Z Zhang
Publication venue: Springer Science and Business Media LLC

Crossref