Search CORE

117 research outputs found

Meta-Alignment with Crumble and Prune: Partitioning very large alignment problems for performance and parallelization

Author: A Siepel
AS Schwartz
B Paten
B Rhead
Benedict Paten
C Lee
CN Dewey
David Haussler
DF Feng
G Myers
I Lumb
J Ma
JE Stajich
JS Pedersen
K Katoh
K Kryukov
K Liu
K Reinert
KM Roskin
Krishna M Roskin
M Blanchette
M Hasegawa
M Waterman
N Bray
P Di Tommaso
RC Edgar
RK Bradley
S Griffiths-Jones
S Schwartz
T Kim
U Tönges
W Gentzsch
WJ Kent
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Continuing research into the global multiple sequence alignment problem has resulted in more sophisticated and principled alignment methods. Unfortunately these new algorithms often require large amounts of time and memory to run, making it nearly impossible to run these algorithms on large datasets. As a solution, we present two general methods, Crumble and Prune, for breaking a phylogenetic alignment problem into smaller, more tractable sub-problems. We call Crumble and Prune meta-alignment methods because they use existing alignment algorithms and can be used with many current alignment programs. Crumble breaks long alignment problems into shorter sub-problems. Prune divides the phylogenetic tree into a collection of smaller trees to reduce the number of sequences in each alignment problem. These methods are orthogonal: they can be applied together to provide better scaling in terms of sequence length and in sequence depth. Both methods partition the problem such that many of the sub-problems can be solved independently. The results are then combined to form a solution to the full alignment problem. Results: Crumble and Prune each provide a significant performance improvement with little loss of accuracy. In some cases, a gain in accuracy was observed. Crumble and Prune were tested on real and simulated data. Furthermore, we have implemented a system called Job-tree that allows hierarchical sub-problems to be solved in parallel on a compute cluster, significantly shortening the run-time. Conclusions: These methods enabled us to solve gigabase alignment problems. These methods could enable a new generation of biologically realistic alignment algorithms to be applied to real world, large scale alignment problems.
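
The idea of dividing a phylogenetic tree into smaller trees, as the abstract above describes for Prune, can be illustrated with a toy sketch. This is not the published Prune algorithm: the nested-tuple tree encoding, the function names `leaves` and `split_tree`, and the `max_leaves` cut-off are all assumptions made for illustration, and the step of recombining sub-alignments is left out entirely.

```python
def leaves(tree):
    """Return the leaf names under a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def split_tree(tree, max_leaves):
    """Cut a tree into subtrees that each hold at most max_leaves leaves.

    Hypothetical illustration only: a subtree is kept whole when it is small
    enough, otherwise its children are partitioned recursively.
    """
    if isinstance(tree, str) or len(leaves(tree)) <= max_leaves:
        return [tree]
    parts = []
    for child in tree:
        parts.extend(split_tree(child, max_leaves))
    return parts


# Example species tree (an assumption, not data from the paper).
tree = ((("human", "chimp"), "gorilla"), (("mouse", "rat"), "dog"))
for part in split_tree(tree, 3):
    print(leaves(part))
```

Each returned subtree defines a smaller alignment problem that could, in principle, be handed to an existing aligner independently of the others.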

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Accurate reconstruction of insertion-deletion histories by statistical phylogenetics

Author: A Heger
A Löytynoja
A Siepel
AG Clark
AM Moses
Art F. Y. Poon
B Knudsen
B Paten
B Rannala
Benedict Paten
C Lee
C Strope
DG Higgins
EF Moore
FA Matsen
FR Kschischang
G Lunter
Gerton Lunter
I Holmes
I Miklós
Ian Holmes
J Felsenstein
JD Thompson
JL Thorne
JS Pedersen
K Katoh
K Liu
KM Wong
KS Pollard
L Gomez-Valero
L Zhu
M Larkin
M Mohri
MA Suchard
N de la Chaux
O Kamneva
O Westesson
Oscar Westesson
P Markova-Raina
R Mills
RA Cartwright
RC Edgar
RK Bradley
S Nelesen
S Saccone
S Sinha
T Beissbarth
X Qu
Z Wang
Z Yang
Z Zhang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically-sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes. Comment: 28 pages, 15 figures. arXiv admin note: text overlap with arXiv:1103.434
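
For readers unfamiliar with Felsenstein's pruning algorithm, which the abstract above names as the finite-substitution-model starting point, a minimal sketch for a single alignment column follows. It assumes the Jukes-Cantor (JC69) substitution model with uniform base frequencies, which is a choice made here for brevity and not the model used in the paper; the tree encoding and the names `jc69`, `partial` and `site_likelihood` are likewise hypothetical. It does not cover the automata-based generalization to indels that the paper describes.

```python
import math

BASES = "ACGT"


def jc69(t):
    """Return the 4x4 Jukes-Cantor transition matrix for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def partial(node):
    """Conditional likelihoods P(observed leaves below node | node state x).

    A leaf is a one-letter string; an internal node is a tuple of
    (child, branch_length) pairs. This encoding is an assumption.
    """
    if isinstance(node, str):
        return [1.0 if base == node else 0.0 for base in BASES]
    result = [1.0] * 4
    for child, t in node:
        p = jc69(t)
        below = partial(child)
        for x in range(4):
            # Sum over the child's possible states, then multiply across children.
            result[x] *= sum(p[x][y] * below[y] for y in range(4))
    return result


def site_likelihood(root):
    """Likelihood of one column, with uniform base frequencies at the root."""
    return sum(0.25 * value for value in partial(root))


# Example: ((A:0.1, C:0.2):0.05, G:0.3) with hypothetical branch lengths.
column = (((("A", 0.1), ("C", 0.2)), 0.05), ("G", 0.3))
print(site_likelihood(column))
```

The recursion visits each node once, so the cost per column grows linearly with the number of sequences, which is the property that makes likelihood computation on trees practical.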

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive

FigShare

webPRANK: a phylogeny-aware multiple sequence aligner with interactive alignment browser

Author: A Löytynoja
Ari Löytynoja
B Paten
C Dessimoz
C Kosiol
D Maddison
H McWilliam
J Felsenstein
K Wong
M Hasegawa
Nick Goldman
R Development Core Team
S Whelan
W Fletcher
W Pearson
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Phylogeny-aware progressive alignment has been found to perform well in phylogenetic alignment benchmarks and to produce superior alignments for the inference of selection on codon sequences. Its implementation in the PRANK alignment program package also allows modelling of complex evolutionary processes and inference of posterior probabilities for sequence sites evolving under each distinct scenario, either simultaneously with the alignment of sequences or as a post-processing step for an existing alignment. This has led to software with many advanced features, and users may find it difficult to generate optimal alignments, visualise the full information in their alignment results, or post-process these results, e.g. by objectively selecting subsets of alignment sites. Results: We have created a web server called webPRANK that provides an easy-to-use interface to the PRANK phylogeny-aware alignment algorithm. The webPRANK server supports the alignment of DNA, protein and codon sequences as well as protein-translated alignment of cDNAs, and includes built-in structure models for the alignment of genomic sequences. The resulting alignments can be exported in various formats widely used in evolutionary sequence analyses. The webPRANK server also includes a powerful web-based alignment browser for the visualisation and post-processing of the results in the context of a cladogram relating the sequences, allowing (e.g.) removal of alignment columns with low posterior reliability. In addition to de novo alignments, webPRANK can be used for the inference of ancestral sequences with phylogenetically realistic gap patterns, and for the annotation and post-processing of existing alignments. The webPRANK server is freely available on the web at http://tinyurl.com/webprank. Conclusions: The webPRANK server incorporates phylogeny-aware multiple sequence alignment, visualisation and post-processing in an easy-to-use web interface. It widens the user base of phylogeny-aware multiple sequence alignment and allows the performance of all alignment-related activity for small sequence analysis projects using only a standard web browser.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Fully Phased Human Genome Assembly without Parental Data using Single-Cell Strand Sequencing and Long Reads

Author: Audano P.
Chaisson M.
Devine S.
Ebert P.
Ebler J.
Eichler E.
Ghareghani M.
Harvey W.
Haukness M.
Korbel J.
Lansdorp P.
Lee C.
Marijon P.
Marschall T.
Munson K.
Paten B.
Porubsky D.
Sanders A.
Sorensen M.
Sulovari A.
Vollger M.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

MPG.PuRe

Alignathon: A competitive assessment of whole-genome alignment methods

Author: Beal K
Brudno M
Chang JM
Clawson H
Darling AE
Dubchak I
Earl D
Erb I
Fitzgerald S
Harris RS
Haussler D
Herrero J
Hickey G
Hou M
Kemena C
Kent WJ
Kim J
Ma J
Molodtsov V
Nguyen N
Notredame C
Paten B
Poliakov A
Raney BJ
Seledtsov I
Solovyev V
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/01/2014

© 2014 Earl et al. Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.

Crossref

OPUS - University of Technology Sydney

PubMed Central

eScholarship - University of California

Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model

Author: A Subramanian
B Morgenstern
B Paten
C Notredame
C Peng
DN Cooper
E Segal
F Rodríguez
G Baele
Gayathri Jayaraman
GD Stormo
GR Reeck
GZ Hertz
J Felsenstein
J Kim
J Pei
J Thorne
J Zhu
JD Thompson
JL Thorne
K Tamura
M Brudno
M Hasegawa
M Kimura
M Larkin
M Steel
N Bray
PF Arndt
R Siddharthan
Rahul Siddharthan
RC Edgar
RK Bradley
S Padmanabhan
S Sinha
S Tavaré
T Jukes
T Lassmann
T Uzzell
TF Smith
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that the goal of sequence alignment should be the detection of homology and not similarity, we incorporate an evolutionary model into a previously published multiple sequence alignment program for non-coding DNA, Sigma, as a sensitive likelihood-based way to assess the significance of alignments. Version 1 of Sigma was successful in eliminating spurious alignments but exhibited relatively poor sensitivity on synthetic data. Sigma 1 used a p-value (the probability under the "null hypothesis" of non-homology) to assess the significance of alignments, and, optionally, a background model that captured short-range genomic correlations. Sigma version 2, described here, retains these features, but calculates the p-value using a sophisticated evolutionary model that we describe here, and also allows for a transition matrix for different substitution rates from and to different nucleotides. Our evolutionary model takes separate account of mutation and fixation, and can be extended to allow for locally differing functional constraints on sequence. Results: We demonstrate that, on real and synthetic data, Sigma-2 significantly outperforms other programs in specificity to genuine homology (that is, it minimises alignment of spuriously similar regions that do not have a common ancestry) while it is now as sensitive as the best current programs. Conclusions: Comparing these results with an extrapolation of the best results from other available programs, we suggest that conservation rates in intergenic DNA are often significantly over-estimated. It is increasingly important to align non-coding DNA correctly, in regulatory genomics and in the context of whole-genome alignment, and Sigma-2 is an important step in that direction.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Development of competences while solving real industrial interdisciplinary problems: a successful cooperation with industry

Author: Bellmunt O. G.
Chinowsky P. S.
Chung P.
Crosthwaite C.
Deshpande A. A.
Diana Mesquita
Fernandes S.
Graaff E. d.
Hansen R. C.
Healy A.
Hung I. W.
José Dinis-Carvalho
Kanigolla D.
Kim M. S.
Kolmos A.
Lau H. Y. K.
Le Boterf G.
Lima R. M.
Massey A. P.
McCarthy M.
Mesquita D.
Paten C. J. K.
Pedro Arezes
Pomales-García C.
Powell P. C.
Rui Manuel Lima
Rui Manuel Sousa
Shingo S.
Soares F. O.
Xu X.
Zabalza M.
Publication venue: 'FapUNIFESP (SciELO)'
Publication date: 01/01/2017

The development of projects in industrial context constitutes an exceptional opportunity for engineering students to develop competences expected by the labour market. Therefore, the adoption of this type of interaction within engineering curricula is highly recommended, not only at the end of the degree, but also in the previous years. The main purpose of this paper is to present and analyse a Project-Based Learning (PBL) semester in which six teams of Industrial Engineering and Management (IEM) students integrate different areas of knowledge, while solving real problems of five companies, emphasizing the technical solutions developed by the students and the feedback provided by the companies. Students' feedback will also be addressed. The main outcomes of this study reveal that most of the technical solutions lie in areas of Lean applications and ergonomic improvement of workplaces. Companies were very pleased with the results of this type of University-Business Cooperation (UBC). This work was funded by COMPETE-POCI-01-0145-FEDER-007043 and FCT-UID-CEC-00319-2013.

Universidade do Minho: RepositoriUM

Crossref

The landscape of Neandertal ancestry in present-day humans

Author: A Keinan
B Paten
C Sutton
C-I Wu
D Reich
DB Percival
DC Presgraves
FL Mendez
G Hellenthal
G McVicker
HA Orr
HR Kunsch
J Lachance
JAOHA Coyne
JD Wall
JM Good
K Prüfer
KD Pruitt
L Abi-Rached
LA Hindorff
M Ashburner
M Meyer
PK Tucker
RE Green
RH Byrd
RR Hudson
S Anders
S Gravel
S Myers
S Sankararaman
T Derrien
The 1000 Genomes Project Consortium
V Yotova
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Analyses of Neandertal genomes have revealed that Neandertals have contributed genetic variants to modern humans [1–2]. The antiquity of Neandertal gene flow into modern humans means that regions that derive from Neandertals in any one human today are usually less than a hundred kilobases in size. However, Neandertal haplotypes are also distinctive enough that several studies have been able to detect Neandertal ancestry at specific loci [1,3–8]. Here, we have systematically inferred Neandertal haplotypes in the genomes of 1,004 present-day humans [12]. Regions that harbor a high frequency of Neandertal alleles in modern humans are enriched for genes affecting keratin filaments, suggesting that Neandertal alleles may have helped modern humans adapt to non-African environments. Neandertal alleles also continue to shape human biology, as we identify multiple Neandertal-derived alleles that confer risk for disease. We also identify regions of millions of base pairs that are nearly devoid of Neandertal ancestry and enriched in genes, implying selection to remove genetic material derived from Neandertals. Neandertal ancestry is significantly reduced in genes specifically expressed in testis, and there is an approximately 5-fold reduction of Neandertal ancestry on chromosome X, which is known to harbor a disproportionate fraction of male hybrid sterility genes [20–22]. These results suggest that part of the reduction in Neandertal ancestry near genes is due to Neandertal alleles that reduced fertility in males when moved to a modern human genetic background.

Crossref

Harvard University - DASH

PubMed Central

eScholarship - University of California

MPG.PuRe

Evidence for intron length conservation in a set of mammalian genes associated with embryonic development

Author: A Nott
A Oates
A Vinogradov
APO Aulehla
B Paten
C Castillo-Davis
Cathal Seoighe
D Huang
D Larson
D Rearick
EAR Zdobnov
I Letunic
I Swinburne
J Mattick
L Fedorova
M Lynch
P Flicek
Paul K Korir
R Elkon
R Waterhouse
S Boireau
S Lu
T Brend
T Hubbard
T Tange
Y Bessho
Y Takashima
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: We carried out an analysis of intron length conservation across a diverse group of nineteen mammalian species. Motivated by recent research suggesting a role for time delays associated with intron transcription in gene expression oscillations required for early embryonic patterning, we searched for examples of genes that showed the most extreme conservation of total intron content in mammals. Results: Gene sets annotated as being involved in pattern specification in the early embryo or containing the homeobox DNA-binding domain, were significantly enriched among genes with highly conserved intron content. We used ancestral sequences reconstructed with probabilistic models that account for insertion and deletion mutations to distinguish insertion and deletion events on lineages leading to human and mouse from their last common ancestor. Using a randomization procedure, we show that genes containing the homeobox domain show less change in intron content than expected, given the number of insertion and deletion events within their introns. Conclusions: Our results suggest selection for gene expression precision or the existence of additional development-associated genes for which transcriptional delay is functionally significant.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Access to Research at National University of Ireland, Galway

Fast and robust multiple sequence alignment with phylogeny-aware gap placement

Author: A Biegert
A Löytynoja
A Viterbi
Adam M Szalkowski
AM Altenhoff
AM Szalkowski
B Paten
C Dessimoz
C Grasso
C Lee
D Robinson
DA Dalquen
G Gonnet
GH Gonnet
GW Stuart
J Felsenstein
JD Thompson
JL Thorne
JM Sauder
K Katoh
M Anisimova
M Kimura
O Gascuel
O Gotoh
R Durbin
RC Edgar
S Pascarella
S Whelan
SA Benner
SB Needleman
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref