26 research outputs found

Fast Algorithms for Large-Scale Phylogenetic Reconstruction

Author: Truszkowski Jakub
Publication venue: 'University of Waterloo'
Publication date: 01/01/2013

One of the most fundamental computational problems in biology is inferring the evolutionary histories of groups of species from sequence data. Such evolutionary histories, known as phylogenies, are usually represented as binary trees in which leaves represent extant species and internal nodes represent their shared ancestors. As the amount of sequence data available to biologists increases, very fast phylogenetic reconstruction algorithms are becoming necessary. Large sequence alignments can currently contain up to hundreds of thousands of sequences, making traditional methods, such as Neighbour Joining, computationally prohibitive. To address this problem, we have developed three novel fast phylogenetic algorithms. The first algorithm, QTree, is a quartet-based heuristic that runs in O(n log n) time. It is based on a theoretical algorithm that reconstructs the correct tree, with high probability, assuming every quartet is inferred correctly with constant probability. The core of our algorithm is a balanced search tree structure that enables us to locate an edge in the tree in O(log n) time. Our algorithm is several times faster than all the current methods, while its accuracy approaches that of Neighbour Joining. The second algorithm, LSHTree, is the first sub-quadratic time algorithm with theoretical performance guarantees under a Markov model of sequence evolution. Our new algorithm runs in O(n^{1+γ(g)} log^2 n) time, where γ is an increasing function of an upper bound on the mutation rate along any branch in the phylogeny, and γ(g) < 1 for all g. For phylogenies with very short branches, the running time of our algorithm is close to linear. In experiments, our prototype implementation was more accurate than the current fast algorithms, while being comparably fast. In the final part of this thesis, we apply the algorithmic framework behind LSHTree to the problem of placing large numbers of short sequence reads onto a fixed phylogenetic tree. Our initial results in this area are promising, but there are still many challenges to be resolved.
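The abstract does not say how QTree infers individual quartets. As a rough illustration of what a single quartet query can look like, the sketch below uses the classical four-point method on pairwise distances, which is an assumption here and not necessarily the thesis's choice. The function name `infer_quartet` and the distance table `dist` are hypothetical.

```python
from itertools import combinations


def infer_quartet(dist, a, b, c, d):
    """Return the pairing ((x, y), (z, w)) favoured by the four-point method.

    `dist` maps frozenset({x, y}) to a pairwise distance. For an additive
    tree metric, the true split is the pairing with the smallest sum of
    within-pair distances. This is an illustrative stand-in, not QTree itself.
    """
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def pairing_cost(pairing):
        (x, y), (z, w) = pairing
        return dist[frozenset((x, y))] + dist[frozenset((z, w))]

    return min(pairings, key=pairing_cost)


# Toy additive distances for the tree ((a,b),(c,d)): every leaf edge has
# length 1 and the internal edge has length 2.
leaves = ["a", "b", "c", "d"]
dist = {frozenset(p): 4.0 for p in combinations(leaves, 2)}
dist[frozenset(("a", "b"))] = 2.0
dist[frozenset(("c", "d"))] = 2.0

best = infer_quartet(dist, "a", "b", "c", "d")
```

With noisy distance estimates the smallest sum can point at the wrong pairing, which is the situation the abstract's "inferred correctly with constant probability" assumption is meant to cover.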

University of Waterloo's Institutional Repository

Rapidly Computing the Phylogenetic Transfer Index

Author: Gascuel Olivier
Swenson Krister M.
Truszkowski Jakub
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)
Publication date: 01/01/2019

Given trees T and T_o on the same taxon set, the transfer index phi(b,T_o) is the number of taxa that need to be ignored so that the bipartition induced by branch b in T is equal to some bipartition in T_o. Recently, Lemoine et al. [Lemoine et al., 2018] used the transfer index to design a novel bootstrap analysis technique that improves on Felsenstein's bootstrap on large, noisy data sets. In this work, we propose an algorithm that computes the transfer index for all branches b in T in O(n log^3 n) time, which improves upon the current O(n^2)-time algorithm by Lin, Rajan and Moret [Lin et al., 2012]. Our implementation is able to process pairs of trees with hundreds of thousands of taxa in minutes and considerably speeds up the method of Lemoine et al. on large data sets. We believe our algorithm can be useful for comparing large phylogenies, especially when some taxa are misplaced (e.g. due to horizontal gene transfer, recombination, or reconstruction errors).
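To make the definition concrete, here is a naive sketch that computes the transfer index of one bipartition by brute force against a list of reference bipartitions. It is not the O(n log^3 n) algorithm of the paper. It assumes each bipartition is given as the set of taxa on one side, and the function names are hypothetical.

```python
def transfer_distance(side_a, side_b, taxa):
    """Number of taxa to ignore so that two bipartitions become equal.

    Each bipartition is represented by one of its sides. The two sides of
    the second bipartition are interchangeable, so we take the smaller of
    the mismatch count against `side_b` and against its complement.
    """
    mismatches = len(side_a ^ side_b)  # symmetric difference
    return min(mismatches, len(taxa) - mismatches)


def transfer_index(side_a, reference_sides, taxa):
    """Minimum transfer distance from one bipartition to any reference one."""
    return min(transfer_distance(side_a, s, taxa) for s in reference_sides)


taxa = {"a", "b", "c", "d", "e", "f"}
branch_side = {"a", "b", "c"}                       # bipartition abc|def
reference_sides = [{"a", "b"}, {"a", "b", "d"}, {"e", "f"}]

phi = transfer_index(branch_side, reference_sides, taxa)
```

In this toy input, ignoring taxon c makes abc|def agree with ab|cdef, so `phi` is 1. Each query scans every reference bipartition and touches up to n taxa, so running it for all branches is at least quadratic.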

HAL Descartes

Dagstuhl Research Online Publication Server

HAL-Pasteur

Hal-Diderot

New decoding algorithms for Hidden Markov Models using distance measures on labellings

Author: A Krogh
B Brejová
Daniel G Brown
ELL Sonnhammer
GE Tusnady
Jakub Truszkowski
L Käll
M Stanke
P Fariselli
R Durbin
SL Cawley
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Existing hidden Markov model decoding algorithms do not focus on approximately identifying the sequence feature boundaries. Results: We give a set of algorithms to compute the conditional probability of all labellings "near" a reference labelling λ for a sequence y for a variety of definitions of "near". In addition, we give optimization algorithms to find the best labelling for a sequence in the robust sense of having all of its feature boundaries nearly correct. Natural problems in this domain are NP-hard to optimize. For membrane proteins, our algorithms find the approximate topology of such proteins with comparable success to existing programs, while being substantially more accurate in estimating the positions of transmembrane helix boundaries. Conclusion: More robust HMM decoding may allow for better analysis of sequence features, in reasonable runtimes.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Studia z Dziejów Państwa i Prawa Polskiego T. XXI Badania nad rozwojem instytucji politycznych i prawnych

Author: Bednaruk Waldemar
Bieda Justyna
Cetwiński Marek
Czech-Jezierska Bożena Anna
Dworas-Kulik Judyta
Gajewska Jolanta
Gałędek Michał
Graczyk Konrad
Kazimierczuk Marcin
Kozyra Waldemar
Kruszewski Tomasz
Krzysztofek Katarzyna
Machut-Kowalczyk Joanna
Majdański Paweł
Marszałek Piotr Krzysztof
Mataniak Mateusz
Mielnik Hubert
Pokoj Jakub
Pyter Magdalena
Sitek Bronisław
Szewczak-Daniel Mariola
Truszkowski Bartosz
Wałdoch Jacek
Ługowski Bartłomiej
Publication venue: Oficyna Wydawnicza AFM
Publication date: 01/01/2018

Title funded by: Krakowska Akademia im. Andrzeja Frycza Modrzewskiego, Uniwersytet im. Marii Curie-Skłodowskiej w Lublinie, Uniwersytet Gdański, Uniwersytet Warszawsk

Repozytorium Instytucjonalne Krakowskiej Akademii

A machine learning approach to estimating the geographical origin of timber

Author: Truszkowski Jakub
Publication venue: 'Center for Open Science'
Publication date

Ezid

Computing the probability of gene trees concordant with the species tree in the multispecies coalescent

Author: Pardi Fabio
Scornavacca Celine
Truszkowski Jakub
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

The multispecies coalescent process models the genealogical relationships of genes sampled from several species, enabling useful predictions about phenomena such as the discordance between a gene tree and the species phylogeny due to incomplete lineage sorting. Conversely, knowledge of large collections of gene trees can inform us about several aspects of the species phylogeny, such as its topology and ancestral population sizes. A fundamental open problem in this context is how to efficiently compute the probability of a gene tree topology, given the species phylogeny. Although a number of algorithms for this task have been proposed, they either produce approximate results, or, when they are exact, they do not scale to large data sets. In this paper, we present some progress towards exact and efficient computation of the probability of a gene tree topology. We provide a new algorithm that, given a species tree and the number of genes sampled for each species, calculates the probability that the gene tree topology will be concordant with the species tree. Moreover, we provide an algorithm that computes the probability of any specific gene tree topology concordant with the species tree. Both algorithms run in polynomial time and have been implemented in Python. Experiments show that they are able to analyse data sets where thousands of genes are sampled in a matter of minutes to hours.
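For intuition about the quantity being computed, the smallest non-trivial case has a closed form. For a rooted three-species tree ((A,B),C) with one gene sampled per species, the classical result is that the gene tree matches the species tree with probability 1 - (2/3)e^{-t}, where t is the internal branch length in coalescent units. The sketch below evaluates only this special case and is not the paper's polynomial-time algorithm. The function name is hypothetical.

```python
import math


def concordance_probability_three_taxa(t):
    """P(gene tree topology = species tree) for ((A,B),C), one gene each.

    `t` is the length of the internal branch in coalescent units. With
    probability exp(-t) the A and B lineages fail to coalesce on that
    branch; the three lineages then coalesce in random order in the root
    population, and only one of the three orders is concordant.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


p_zero = concordance_probability_three_taxa(0.0)   # star-like limit
p_long = concordance_probability_three_taxa(10.0)  # long internal branch
```

At t = 0 all three topologies are equally likely, and as t grows the probability approaches 1, reflecting less incomplete lineage sorting.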

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

HAL-IRD

HAL-CIRAD