Search CORE

47 research outputs found

Ultra-fast sequence clustering from similarity networks with SiLiX

Author: A Krishnamurthy
AJ Enright
AJ Vilella
AY Signorovitch
F Servant
H Li
HJ Atkinson
I Katriel
J Ruan
JL Boore
JM Joseph
KD Pruitt
Laurent Duret
MH Alsuwaiyel
PK Wall
PS Dehal
R Petryszak
R Tarjan
RD Finn
RE Tarjan
S Hartmann
S Hunter
S Penel
S Vishwanathan
SF Altschul
Simon Penel
SK Das
T Meinel
T Wittkop
Vincent Miele
Y Bramoulle
Y Han
Y Loewenstein
Y Tian
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The number of gene sequences available for comparative genomics is increasing extremely quickly. A current challenge is to handle this huge number of sequences in order to build families of homologous sequences in a reasonable time. Results: We present the software package SiLiX, which implements a novel method that reconsiders single linkage clustering with a graph theoretical approach. A parallel version of the algorithms is also presented. As a demonstration of the ability of our software, we clustered more than 3 million sequences from about 2 billion BLAST hits in 7 minutes, with a high clustering quality, both in terms of sensitivity and specificity. Conclusions: Compared with state-of-the-art software, SiLiX presents the best up-to-date capabilities to face the problem of clustering large collections of sequences. SiLiX is freely available at http://lbbe.univ-lyon1.fr/SiLiX.
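As a rough illustration of the idea in this abstract, single linkage clustering can be viewed as finding the connected components of a similarity graph whose nodes are sequences and whose edges are accepted similarity hits. The sketch below is not the SiLiX implementation; it assumes the BLAST hits have already been filtered down to pairs accepted as homologous, and the function name `single_linkage` and the sequence identifiers are hypothetical.

```python
def single_linkage(pairs):
    """Group sequence ids into families (connected components of the hit graph).

    `pairs` is an iterable of (id_a, id_b) tuples, assumed to be similarity
    hits that already passed whatever filtering the pipeline applies.
    """
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up to the root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path directly at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    # Each hit merges the two families its endpoints belong to.
    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Collect members by root and return families in a deterministic order.
    families = {}
    for seq in list(parent):
        families.setdefault(find(seq), set()).add(seq)
    return sorted(sorted(members) for members in families.values())


hits = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
print(single_linkage(hits))
```

Because hits are consumed one at a time and only the parent table is kept in memory, this kind of union-find formulation avoids building the full graph, which is one plausible reason a graph-based view scales to very large hit lists.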

Crossref

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

HAL Descartes

Building a History of Horizontal Gene Transfer in E. Coli

Author: Wilber Matthew
Publication venue: Scholarship @ Claremont
Publication date: 01/01/2016

Bacteria's ability to pass entire genes between one another, a process called Horizontal Gene Transfer (HGT), has a major impact on bacterial evolution. In an ongoing project at Harvey Mudd, computational methods have been used to catalogue the HGT events that have impacted a group of closely related bacteria. This thesis builds on that project by improving our ability to identify gene families, that is, groups of genes in different strains that are related. Previously, similarity was measured only by comparing two genes' DNA sequences, ignoring their positions on the organism's DNA. Here, we leverage genes' relative position to make a better measurement of gene similarity. These improved similarity measurements will improve the existing pipeline's ability to identify HGT events.

Scholarship@Claremont

Complete Genome Sequence of Burkholderia phymatum STM815T, a Broad Host Range and Efficient Nitrogen-Fixing Symbiont of Mimosa Species

Author: Bena Gilles
Booth Kristina
Bristow Jim
Bruce David
Caroline Bournaud
Chain Patrick
Copeland Alex
Hauser Loren
James Euan K.
Klonowska Agnieszka
Kyrpides Nikos
Land Miriam
Lizotte-Waniewski Michelle
Melkonian Rémy
Moulin Lionel
Pitluck Sam
Riley Margaret
Vriezen Jan A.C.
Woyke Tanja
Young Peter W.
Publication venue: Smith ScholarWorks
Publication date: 01/01/2014

Burkholderia phymatum is a soil bacterium able to develop a nitrogen-fixing symbiosis with species of the legume genus Mimosa, and is frequently found associated specifically with Mimosa pudica. The type strain of the species, STM 815T, was isolated from a root nodule in French Guiana in 2000. The strain is an aerobic, motile, non-spore-forming, Gram-negative rod, and is a highly competitive strain for nodulation compared to other Mimosa symbionts, as it also nodulates a broad range of other legume genera and species. The 8,676,562 bp genome is composed of two chromosomes (3,479,187 and 2,697,374 bp), a megaplasmid (1,904,893 bp) and a plasmid hosting the symbiotic functions (595,108 bp).

Smith College: Smith ScholarWorks

Pick Your Poison: Molecular Evolution of Venom Proteins in Asilidae (Insecta: Diptera)

Author: Brewer Michael
Cole T. Jeffrey
Publication venue: MDPI AG
Publication date: 24/11/2020

ScholarShip

Complete Genome sequence of Burkholderia phymatum STM815, a broad host range and efficient nitrogen-fixing symbiont of Mimosa species

Publication venue: BioMed Central

Springer - Publisher Connector

Inference and reconstruction of the heimdallarchaeial ancestry of eukaryotes

Author: Caceres Eva F.
De Anda Valerie
Dombrowski Nina
Eme Laura
multiple additional authors
Reysenbach Anna-Louise
Schön Max E.
Seitz Kiley W.
Stairs Courtney W.
Tamarit Daniel
Publication venue: PDXScholar
Publication date: 01/04/2023

In the ongoing debates about eukaryogenesis (the series of evolutionary events leading to the emergence of the eukaryotic cell from prokaryotic ancestors), members of the Asgard archaea play a key part as the closest archaeal relatives of eukaryotes [1]. However, the nature and phylogenetic identity of the last common ancestor of Asgard archaea and eukaryotes remain unresolved [2–4]. Here we analyse distinct phylogenetic marker datasets of an expanded genomic sampling of Asgard archaea and evaluate competing evolutionary scenarios using state-of-the-art phylogenomic approaches. We find that eukaryotes are placed, with high confidence, as a well-nested clade within Asgard archaea and as a sister lineage to Hodarchaeales, a newly proposed order within Heimdallarchaeia. Using sophisticated gene tree and species tree reconciliation approaches, we show that analogous to the evolution of eukaryotic genomes, genome evolution in Asgard archaea involved significantly more gene duplication and fewer gene loss events compared with other archaea. Finally, we infer that the last common ancestor of Asgard archaea was probably a thermophilic chemolithotroph and that the lineage from which eukaryotes evolved adapted to mesophilic conditions and acquired the genetic potential to support a heterotrophic lifestyle. Our work provides key insights into the prokaryote-to-eukaryote transition and a platform for better understanding the emergence of cellular complexity in eukaryotic cells.

PDXScholar (Portland State University)

kClust: fast and sensitive clustering of large protein sequence databases

Author: Hauser Maria
Mayer Christian E.
Soeding Johannes
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Background: Fueled by rapid progress in high-throughput sequencing, the size of public sequence databases doubles every two years. Searching the ever larger and more redundant databases is getting increasingly inefficient. Clustering can help to organize sequences into homologous and functionally similar groups and can improve the speed, sensitivity, and readability of homology searches. However, because the clustering time is quadratic in the number of sequences, standard sequence search methods are becoming impracticable. Results: Here we present a method to cluster large protein sequence databases such as UniProt within days down to 20%-30% maximum pairwise sequence identity. kClust owes its speed and sensitivity to an alignment-free prefilter that calculates the cumulative score of all similar 6-mers between pairs of sequences, and to a dynamic programming algorithm that operates on pairs of similar 4-mers. To increase sensitivity further, kClust can run in profile-sequence comparison mode, with profiles computed from the clusters of a previous kClust iteration. kClust is two to three orders of magnitude faster than clustering based on NCBI BLAST, and on multidomain sequences of 20%-30% maximum pairwise sequence identity it achieves comparable sensitivity and a lower false discovery rate. It also compares favorably to CD-HIT and UCLUST in terms of false discovery rate, sensitivity, and speed. Conclusions: kClust fills the need for a fast, sensitive, and accurate tool to cluster large protein sequence databases to below 30% sequence identity. kClust is freely available under GPL at ftp://toolkit.lmb.uni-muenchen.de/pub/kClust/
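To give a feel for what an alignment-free k-mer prefilter does, the sketch below counts the 6-mers that two sequences have in common by way of an inverted index, so that only pairs sharing at least one k-mer are ever touched. This is a deliberate simplification and not kClust's method: it assumes exact k-mer matches and a plain count, whereas the abstract describes a cumulative score over similar 6-mers. The function names and the toy sequences are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_counts(seqs, k=6):
    """Count exact k-mers shared by each pair of sequences.

    `seqs` maps a sequence name to its residue string. Pairs that share
    no k-mer never appear in the result, which is the point of a prefilter.
    """
    # Inverted index: k-mer -> names of the sequences that contain it.
    index = defaultdict(set)
    for name, seq in seqs.items():
        for km in kmers(seq, k):
            index[km].add(name)

    # Every k-mer contributes one count to each pair of sequences holding it.
    counts = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            counts[(a, b)] += 1
    return dict(counts)


toy = {"a": "MKVLAAGIV", "b": "MKVLAAGLL", "c": "PPPPPPPPP"}
print(shared_kmer_counts(toy))
```

Pairs whose count clears a chosen threshold would then be passed on to a more expensive alignment step; the threshold itself is a tuning choice that this sketch leaves open.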

Key components of the eight classes of type IV secretion systems involved in bacterial conjugation or protein secretion

Author: Abajy
Alvarez-Martinez
Anthony
Arechaga
Arends
Arutyunov
Bantwal
Baron
Bayer
Berger
Bertrand Néron
Bertsch
Bhatty
Bi
Bonheyo
Brouwer
Cao
Caryl
Chandran
Chen
Christie
Davies
de la Cruz
de Vries
Dunny
Eddy
Edgar
Eduardo P. C. Rocha
Fernando de la Cruz
Finn
Firth
Franco
Fronzes
Garcillan-Barcia
Gascuel
Gillespie
Goessweiner-Mohr
Gogarten
Grohmann
Guglielmini
Haase
Hamilton
Harris
Hofreuter
Judd
Juhas
Julien Guglielmini
Kathir
Kerr
Kim
Klimke
Komano
Laverde Gomez
Lawley
Li
Maneewannakul
María Pilar Garcillán-Barcia
Miele
Moore
Morton
Mossey
Muro-Pastor
Nagai
Ochman
Parsons
Planet
Porter
Rivera-Calzada
Roberts
Rocco
Sakuma
Sampei
Schroder
Segal
Senghas
Serfiotis-Mitsa
Seth-Smith
Smillie
Soding
Sophie S. Abby
Souza
Steen
Tato
Teng
Tettelin
Thompson
Vincent
Zhang
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2014

Conjugation of DNA through a type IV secretion system (T4SS) drives horizontal gene transfer. Yet little is known about the diversity of these nanomachines. We previously found that T4SS can be divided into eight classes based on the phylogeny of the only ubiquitous protein of T4SS (VirB4). Here, we use an ab initio approach to identify protein families systematically and specifically associated with VirB4 in each class. We built profiles for these proteins and used them to scan 2262 genomes for the presence of T4SS. Our analysis led to the identification of thousands of occurrences of 116 protein families for a total of 1623 T4SS. Importantly, our profiles almost always identified the essential genes of well-studied T4SS. This allowed us to build a database with the largest number of T4SS described to date. Using profile-profile alignments, we reveal many new cases of homology between components of distant classes of T4SS. We mapped these similarities on the T4SS phylogenetic tree and thus obtained the patterns of acquisition and loss of these protein families in the history of T4SS. The identification of the key VirB4-associated proteins paves the way toward experimental analysis of poorly characterized T4SS classes.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Population genomics of the maize pathogen Ustilago maydis: demographic history and role of virulence clusters in adaptation

Author: Barroso G.
Dutheil J.
Haider M.
Kahmann R.
Munch K.
Rossel N.
Schweizer G.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2021

The tight interaction between pathogens and their hosts results in reciprocal selective forces that impact the genetic diversity of the interacting species. The footprints of this selection differ between pathosystems because of distinct life-history traits, demographic histories, or genome architectures. Here, we studied the genome-wide patterns of genetic diversity of 22 isolates of the causative agent of the corn smut disease, Ustilago maydis, originating from five locations in Mexico, the presumed center of origin of this species. In this species, many genes encoding secreted effector proteins reside in so-called virulence clusters in the genome, an arrangement that is so far not found in other filamentous plant pathogens. Using a combination of population genomic statistical analyses, we assessed the geographical, historical, and genome-wide variation of genetic diversity in this fungal pathogen. We report evidence of two partially admixed subpopulations that are only loosely associated with geographic origin. Using the multiple sequentially Markov coalescent model, we inferred the demographic history of the two pathogen subpopulations over the last 0.5 Myr. We show that both populations experienced a recent strong bottleneck starting around 10,000 years ago, coinciding with the assumed time of maize domestication. Although the genome average genetic diversity is low compared with other fungal pathogens, we estimated that the rate of nonsynonymous adaptive substitutions is three times higher in genes located within virulence clusters compared with nonclustered genes, including nonclustered effector genes. These results highlight the role that these singular genomic regions play in the evolution of this pathogen.

MPG.PuRe

Computational Analysis of Large-Scale Trends and Dynamics in Eukaryotic Protein Family Evolution

Author: Ahrens Joseph Boehm
Publication venue: FIU Digital Commons
Publication date: 01/01/2019

The myriad protein-coding genes found in present-day eukaryotes arose from a combination of speciation and gene duplication events, spanning more than one billion years of evolution. Notably, as these proteins evolved, the individual residues at each site in their amino acid sequences were replaced at markedly different rates. The relationship between protein structure, protein function, and site-specific rates of amino acid replacement is a topic of ongoing research. Additionally, there is much interest in the different evolutionary constraints imposed on sequences related by speciation (orthologs) versus sequences related by gene duplication (paralogs). A principal aim of this dissertation is to evaluate and characterize several broad trends in eukaryote protein evolution. To this end, I use sequence-based computational predictors of protein structure (intrinsic disorder and protein secondary structure) and protein function (predicted functional domains), in addition to Bayesian phylogenetic inference methods, to analyze thousands of homologous protein sequence clusters from four eukaryotic lineages: animals, plants, fungi and protists. Using these data, I performed large-scale factorial analyses, testing the correlation between protein structure/function and rates of sequence evolution. The combined results of these analyses somewhat corroborate the findings of previous research in the field, but they also illuminate a subtle interaction among multiple drivers of protein sequence evolution, which is consistently observed across multiple eukaryote groups. Furthermore, using the results of Bayesian phylogenetic analysis on real and simulated protein sequence alignments, I show that orthologous and paralogous proteins exhibit significantly different overall patterns of sequence divergence, indicating that paralogs tend to evolve under relaxed selective pressure. 
The acquisition of homologous biological sequence clusters is a prominent component of computational biological research. To assist in the identification of protein families within large sequence databases, I implement a simple, graph-based single-linkage clustering procedure, and I demonstrate its capacity to recover homologous subunits of the Rpt regulatory ring in the 26S proteasome complex

DigitalCommons@Florida International University