
38 research outputs found

Longest common substrings with k mismatches

Author: Flouri Tomas
Giaquinta Emanuele
Kobert Kassian
Ukkonen Esko
Publication date: 01/01/2015

The longest common substring with k-mismatches problem is to find, given two strings S_1 and S_2, a longest substring A_1 of S_1 and A_2 of S_2 such that the Hamming distance between A_1 and A_2 is at most k. Peer reviewed.
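The problem statement above can be illustrated with a minimal Python sketch. It assumes the straightforward approach of scanning every diagonal of the two strings with a sliding window that tolerates at most k mismatches; this is not necessarily the method of the listed paper, and the function name `lcs_k_mismatches` is hypothetical.

```python
def lcs_k_mismatches(s1: str, s2: str, k: int) -> tuple[int, int, int]:
    """Return (length, start1, start2) of a longest pair of equal-length
    substrings of s1 and s2 whose Hamming distance is at most k (k >= 0)."""
    n, m = len(s1), len(s2)
    best = (0, 0, 0)
    # Diagonal d = i - j aligns s1[i] with s2[j].
    for d in range(-(m - 1), n):
        i0, j0 = (d, 0) if d >= 0 else (0, -d)
        length = min(n - i0, m - j0)
        left = 0        # start of the current window on this diagonal
        mismatches = 0  # mismatches inside the window [left, right]
        for right in range(length):
            if s1[i0 + right] != s2[j0 + right]:
                mismatches += 1
            # Shrink the window from the left until it is valid again.
            while mismatches > k:
                if s1[i0 + left] != s2[j0 + left]:
                    mismatches -= 1
                left += 1
            if right - left + 1 > best[0]:
                best = (right - left + 1, i0 + left, j0 + left)
    return best
```

Each of the n + m - 1 diagonals is scanned once, so this sketch takes O(nm) time and constant extra space. For example, `lcs_k_mismatches("abcde", "xbcdy", 0)` returns `(3, 1, 1)`, the shared substring "bcd".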

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

Aaltodoc Publication Archive

Helsingin yliopiston digitaalinen arkisto

Correction to: Bayesian Phylogenetic Inference using Relaxed-clocks and the Multispecies Coalescent

Author: Flouri Tomas
Huang Jun
Jiao Xiyun
Kapli Paschalia
Rannala Bruce
Yang Ziheng
Publication venue: OXFORD UNIV PRESS
Publication date: 06/12/2022

UCL Discovery

ModelTest-NG: A New and Scalable Tool for the Selection of DNA and Protein Evolutionary Models

Author: Darriba Diego
Flouri Tomas
Kozlov Alexey M.
Morel Benoit
Posada David
Stamatakis Alexandros
Publication venue: Oxford University Press
Publication date: 01/01/2019

ModelTest-NG is a reimplementation from scratch of jModelTest and ProtTest, two popular tools for selecting the best-fit nucleotide and amino acid substitution models, respectively. ModelTest-NG is one to two orders of magnitude faster than jModelTest and ProtTest but equally accurate and introduces several new features, such as ascertainment bias correction, mixture, and free-rate models, or the automatic processing of single partitions.

Repositorio da Universidade da Coruña

Investigo

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

KITopen

UCL Discovery

VSEARCH: a versatile open source tool for metagenomics

Author: Flouri Tomas
Mahe Frederic
Nichols Ben
Quince Christopher
Rognes Torbjorn
Publication venue: PeerJ
Publication date: 01/01/2016

Background. VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010) for which the source code is not publicly available, algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use. Methods. When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences in order to quickly identify similar sequences; a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads. Results. VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion.
VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-end read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end read merging and dereplication. VSEARCH is available at https://github.com/torognes/vsearch under either the BSD 2-clause license or the GNU General Public License version 3.0. Discussion. VSEARCH has been shown to be a fast, accurate and full-fledged alternative to USEARCH. A free and open-source versatile tool for sequence analysis is now available to the metagenomics community.
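The two-stage search described in the Methods (a shared-word filter followed by full dynamic-programming global alignment) can be sketched roughly in Python as below. This is an illustrative toy, not VSEARCH's implementation: the function names, the word length, the linear gap penalty, and the scoring values are all assumptions made for the example.

```python
def words(seq: str, w: int) -> set[str]:
    """All distinct length-w words (k-mers) occurring in seq."""
    return {seq[i:i + w] for i in range(len(seq) - w + 1)}


def shortlist(query: str, targets: dict[str, str], w: int = 8, top: int = 2) -> list[str]:
    """Rank targets by the number of words shared with the query and keep
    the best `top` candidates that share at least one word."""
    query_words = words(query, w)
    shared = {name: len(query_words & words(seq, w)) for name, seq in targets.items()}
    ranked = sorted(shared, key=lambda name: shared[name], reverse=True)
    return [name for name in ranked[:top] if shared[name] > 0]


def global_alignment_score(a: str, b: str, match: int = 2,
                           mismatch: int = -4, gap: int = -2) -> int:
    """Optimal global alignment score (Needleman-Wunsch, linear gap cost),
    computed with full dynamic programming over two rolling rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]
```

In this sketch only the shortlisted targets would be passed to `global_alignment_score`, which is what keeps the quadratic-time alignment affordable on large databases.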

Crossref

KITopen

Directory of Open Access Journals

PubMed Central

Agritrop

Warwick Research Archives Portal Repository

Enlighten

NORA - Norwegian Open Research Archives

University of East Anglia digital repository

The Phylogenetic Likelihood Library

Author: Aberer Andre J.
Darriba Diego
Flouri Tomas
Haeseler Arndt von
Izquierdo-Carrasco F.
Minh B.Q.
Nguyen Lam-Tung
Stamatakis Alexandros
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2015

We introduce the Phylogenetic Likelihood Library (PLL), a highly optimized application programming interface for developing likelihood-based phylogenetic inference and postanalysis software. The PLL implements appropriate data structures and functions that allow users to quickly implement common, error-prone, and labor-intensive tasks, such as likelihood calculations, model parameter as well as branch length optimization, and tree space exploration. The highly optimized and parallelized implementation of the phylogenetic likelihood function and thorough documentation provide a framework for rapid development of scalable parallel phylogenetic software. By example of two likelihood-based phylogenetic codes we show that the PLL improves the sequential performance of current software by a factor of 2–10 while requiring only 1 month of programming time for integration. We show that, when numerical scaling for preventing floating point underflow is enabled, the double precision likelihood calculations in the PLL are up to 1.9 times faster than those in BEAGLE. On an empirical DNA dataset with 2000 taxa the AVX version of PLL is 4 times faster than BEAGLE (scaling enabled and required).
Funding: DFG, German Research Foundation; STA/860-4. F.I.-C. DFG, German Research Foundation; STA/860-3. DFG, German Research Foundation; STA/860-2. L.-T.N. University of Vienna; I059-N. Austrian Science Fund; I760-B1

Repositorio da Universidade da Coruña

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Genome-scale data reveal deep lineage divergence and a complex demographic history in the Texas horned lizard (Phrynosoma cornutum) throughout the southwestern and central US

Author: Blair Christopher
Bracken Jason T.
Charran Tristan
Farleigh Keaka
Finger Nicholas
Flouri Tomas
François Olivier
Jezkova Tereza
Leaché Adam D.
Williams Dean A.
Yang Ziheng
Publication venue: CUNY Academic Works
Publication date: 26/11/2021

The southwestern and central US serve as an ideal region to test alternative hypotheses regarding biotic diversification. Genomic data can now be combined with sophisticated computational models to quantify the impacts of paleoclimate change, geographic features, and habitat heterogeneity on spatial patterns of genetic diversity. In this study we combine thousands of genotyping-by-sequencing (GBS) loci with mtDNA sequences (ND1) from the Texas Horned Lizard (Phrynosoma cornutum) to quantify relative support for different catalysts of diversification. Phylogenetic and clustering analyses of the GBS data indicate support for at least three primary populations. The spatial distribution of populations appears concordant with habitat type, with desert populations in Arizona and New Mexico showing the largest genetic divergence from the remaining populations. The mtDNA data also support a divergent desert population, but other relationships differ and suggest mtDNA introgression. Genotype-environment associations with bioclimatic variables support divergence along precipitation gradients more than along temperature gradients. Demographic analyses support a complex history, with introgression and gene flow playing an important role during diversification. Bayesian analyses under the multispecies coalescent with introgression (MSci) also suggest that gene flow occurred between populations. Paleo-species distribution models support two southern refugia that geographically correspond to contemporary lineages. We find that divergence times are underestimated and population sizes are over-estimated when introgression occurred and is ignored in coalescent analyses, and furthermore, inference of ancient introgression events and demographic history is sensitive to inclusion of a single recently admixed sample. Our analyses cannot refute the riverine barrier or glacial refugia hypotheses. Results also suggest that populations are continuing to diverge along habitat gradients.
Finally, the strong evidence of admixture, gene flow, and mtDNA introgression among populations suggests that P. cornutum should be considered a single widespread species under the General Lineage Species Concept.

City University of New York

UCL Discovery

PubMed Central

An optimal algorithm for computing all subtree repeats in trees

Author: Flouri Tomas
Kobert Kassian
Pissis Solon
Stamatakis Alexandros
Publication venue: The Royal Society
Publication date: 01/01/2013

Given a labelled tree T, our goal is to group repeating subtrees of T into equivalence classes with respect to their topologies and the node labels. We present an explicit, simple and time-optimal algorithm for solving this problem for unrooted unordered labelled trees and show that the running time of our method is linear with respect to the size of T. By unordered, we mean that the order of the adjacent nodes (children/neighbours) of any node of T is irrelevant. An unrooted tree T does not have a node that is designated as root and can also be referred to as an undirected tree. We show how the presented algorithm can easily be modified to operate on trees that do not satisfy some or any of the aforementioned assumptions on the tree structure; for instance, how it can be applied to rooted, ordered or unlabelled trees
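The grouping task described above can be illustrated for the rooted, unordered, labelled case with a short Python sketch. It assumes a simple canonical-signature approach (each node is identified by its label plus the sorted class identifiers of its children), which is not the linear-time algorithm of the listed paper because the sorting step adds overhead; the function name and the dictionary-based tree representation are hypothetical.

```python
from collections import defaultdict


def subtree_repeat_classes(children: dict, labels: dict, root) -> list[list]:
    """Group the nodes of a rooted, unordered, labelled tree whose subtrees
    are identical in topology and labels. Only classes containing at least
    two nodes (actual repeats) are returned."""
    # Pre-order traversal; its reverse visits every child before its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    class_of_signature = {}  # signature -> class id
    node_class = {}          # node -> class id
    for node in reversed(order):
        child_classes = sorted(node_class[c] for c in children.get(node, []))
        signature = (labels[node], tuple(child_classes))
        node_class[node] = class_of_signature.setdefault(
            signature, len(class_of_signature))

    groups = defaultdict(list)
    for node, class_id in node_class.items():
        groups[class_id].append(node)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Sorting the child class identifiers is what makes the comparison independent of child order; dropping the sort would give the ordered-tree variant, and using the same label for every node would give the unlabelled one.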

Crossref

PubMed Central

King's Research Portal


Correction to: A Bayesian Implementation of the Multispecies Coalescent Model with Introgression for Phylogenomic Analysis.

Author: Flouri Tomas
Jiao Xiyun
Rannala Bruce
Yang Ziheng
Publication venue: eScholarship, University of California
Publication date: 01/11/2022

eScholarship - University of California

DynMap: mapping short reads to multiple related genomes

Author: Flouri Tomas
Iliopoulos Costas S.
Pissis Solon P.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2011

Crossref

King's Research Portal

An algorithm for mapping short reads to a dynamically changing genomic sequence

Author: Flouri Tomas
Holub Jan
Iliopoulos Costas S.
Pissis Solon P.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2010

Crossref

King's Research Portal