3,547 research outputs found

Discriminating Microbial Species Using Protein Sequence Properties and Machine Learning

Author: Breitling Rainer
Gilbert David
Shahib Ali Al-
Publication venue: University of Groningen, Groningen Biomolecular Sciences and Biotechnology Institute (GBB)
Publication date: 01/01/2007

ARTS repository - University of Groningen

Proceedings - University of Groningen

Dissertations of the University of Groningen

Spaced seeds improve k-mer-based metagenomic classification

Author: Brinda Karel
Kucherov Gregory
Sykulski Maciej
Publication venue: 'Oxford University Press (OUP)'
Publication date: 09/07/2015

Metagenomics is a powerful approach to studying the genetic content of environmental samples, and it has been strongly promoted by NGS technologies. To cope with the massive data involved in modern metagenomic projects, recent tools [4, 39] rely on the analysis of k-mers shared between the read to be classified and sampled reference genomes. Within this general framework, we show in this work that spaced seeds provide a significant improvement in classification accuracy over traditional contiguous k-mers. We support this thesis through a series of different computational experiments, including simulations of large-scale metagenomic projects. Scripts and programs used in this study, as well as supplementary material, are available from http://github.com/gregorykucherov/spaced-seeds-for-metagenomics. Comment: 23 pages
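
The difference between contiguous k-mers and spaced seeds can be illustrated with a toy sketch. This is not the authors' implementation: the function name `seed_hits`, the seed patterns `111` and `1101`, and the two short sequences are hypothetical, chosen only to show a case where the two seed types behave differently. The sketch assumes the read is already aligned to the reference at offset 0, whereas real classifiers look k-mers up in an index.

```python
def seed_hits(reference, read, seed):
    """Count offsets where read and reference agree at every '1' of the seed.

    seed is a string of '1' (position must match) and '0' (position ignored).
    A contiguous k-mer is the special case of a seed made only of '1's.
    """
    if len(reference) != len(read):
        raise ValueError("toy example assumes pre-aligned, equal-length sequences")
    care = [j for j, c in enumerate(seed) if c == "1"]  # positions that must match
    span = len(seed)
    return sum(
        all(reference[i + j] == read[i + j] for j in care)
        for i in range(len(read) - span + 1)
    )


reference = "ACGTACG"
read = "ACTTAGG"  # differs from the reference at index 2 and index 5

contiguous = seed_hits(reference, read, "111")   # contiguous 3-mer
spaced = seed_hits(reference, read, "1101")      # spaced seed of the same weight (3)
print(contiguous, spaced)
```

In this hand-picked case every contiguous window of length 3 contains a mismatch, while the spaced seed of the same weight still matches at two offsets because its ignored position can fall on a mismatch.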

arXiv.org e-Print Archive

HAL Descartes

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification

Author: Babakir Mina Muhammed
Bertolazzi Paola
Cella Eleonora
Ciccozzi Massimo
Ciotti Marco
Felici Giovanni
Fiscon Giulia
Giovanetti Marta
Lo Presti Alessandra
Pierangeli Alessandra
Weitschek Emanuel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Continuous improvements in next-generation sequencing technologies have led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists, and whose analysis requires huge computational effort. The classification of species emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple alignments-, phylogenetic trees-, statistical- and character-based methods.

PubMed Central

Archivio della ricerca- Università di Roma La Sapienza

FigShare

Interactions between species introduce spurious associations in microbiome studies

Author: Korolev Kirill S.
Menon Rajita
Ramanan Vivek
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2018

Microbiota contribute to many dimensions of host phenotype, including disease. To link specific microbes to specific phenotypes, microbiome-wide association studies compare microbial abundances between two groups of samples. Abundance differences, however, reflect not only direct associations with the phenotype, but also indirect effects due to microbial interactions. We found that microbial interactions could easily generate a large number of spurious associations that provide no mechanistic insight. Using techniques from statistical physics, we developed a method to remove indirect associations and applied it to the largest dataset on pediatric inflammatory bowel disease. Our method corrected the inflation of p-values in standard association tests and showed that only a small subset of associations is directly linked to the disease. Direct associations had a much higher accuracy in separating cases from controls and pointed to immunomodulation, butyrate production, and the brain-gut axis as important factors in inflammatory bowel disease. Comment: 4 main text figures, 15 supplementary figures (i.e., appendix) and 6 supplementary tables. Overall 49 pages including references

arXiv.org e-Print Archive

Directory of Open Access Journals

Translational Selection Is Ubiquitous in Prokaryotes

Author: Fran Supek
Nives Škunca
Jelena Repar
Kristian Vlahoviček
Tomislav Šmuc
Publication venue: Public Library of Science
Publication date: 01/01/2010

Codon usage bias in prokaryotic genomes is largely a consequence of background substitution patterns in DNA, but highly expressed genes may show a preference towards codons that enable more efficient and/or accurate translation. We introduce a novel approach based on supervised machine learning that detects effects of translational selection on genes, while controlling for local variation in nucleotide substitution patterns represented as sequence composition of intergenic DNA. A cornerstone of our method is a Random Forest classifier that outperformed previous distance measure-based approaches, such as the codon adaptation index, in the task of discerning the (highly expressed) ribosomal protein genes by their codon frequencies. Unlike previous reports, we show evidence that translational selection in prokaryotes is practically universal: in 460 of 461 examined microbial genomes, we find that a subset of genes shows a higher codon usage similarity to the ribosomal proteins than would be expected from the local sequence composition. These genes constitute a substantial part of the genome—between 5% and 33%, depending on genome size—while also exhibiting higher experimentally measured mRNA abundances and tending toward codons that match tRNA anticodons by canonical base pairing. Certain gene functional categories are generally enriched with, or depleted of codon-optimized genes, the trends of enrichment/depletion being conserved between Archaea and Bacteria. Prominent exceptions from these trends might indicate genes with alternative physiological roles; we speculate on specific examples related to detoxication of oxygen radicals and ammonia and to possible misannotations of asparaginyl–tRNA synthetases. 
Since the presence of codon optimizations on genes is a valid proxy for expression levels in fully sequenced genomes, we provide an example of an “adaptome” by highlighting gene functions with expression levels elevated specifically in thermophilic Bacteria and Archaea.
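
The abstract states that the classifier works from the codon frequencies of each gene. A minimal sketch of how such a feature vector could be computed is given below. It is an illustration only, not the authors' pipeline: the function name `codon_frequencies`, the fixed codon ordering, and the example sequence are assumptions, and the Random Forest training step itself is omitted.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every gene yields a vector of the same layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_frequencies(cds):
    """Return the relative frequency of each of the 64 codons in a coding sequence.

    Trailing bases that do not form a full codon are dropped, and codons with
    ambiguous bases (anything other than A, C, G, T) are left out of the total.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


# Hypothetical five-codon sequence: ATG GCT GCT AAA TAA
freqs = codon_frequencies("ATGGCTGCTAAATAA")
print(freqs[CODONS.index("GCT")])
```

Vectors of this kind, one per gene, could then be passed to any supervised classifier, with ribosomal protein genes serving as the positive class as the abstract describes.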

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

An Integrated Metabolomic and Genomic Mining Workflow to Uncover the Biosynthetic Potential of Bacteria

Author: Andreas Klitgaard
Don D. Nguyen
Jane L. Nybo
Jette Melchiorsen
Laura M. Sanchez
Lone Gram
Maria Maansson
Mikael R. Andersen
Nadine Ziemert
Nikolaj G. Vynne
Peter J. Turnbaugh
Pieter C. Dorrestein
Publication venue: 'American Society for Microbiology'
Publication date: 01/01/2016

Microorganisms are a rich source of bioactives; however, chemical identification is a major bottleneck. Strategies that can prioritize the most prolific microbial strains and novel compounds are of great interest. Here, we present an integrated approach to evaluate the biosynthetic richness in bacteria and mine the associated chemical diversity. Thirteen strains closely related to Pseudoalteromonas luteoviolacea isolated from all over the Earth were analyzed using an untargeted metabolomics strategy, and metabolomic profiles were correlated with whole-genome sequences of the strains. We found considerable diversity: only 2% of the chemical features and 7% of the biosynthetic genes were common to all strains, while 30% of all features and 24% of the genes were unique to single strains. The list of chemical features was reduced to 50 discriminating features using a genetic algorithm and support vector machines. Features were dereplicated by tandem mass spectrometry (MS/MS) networking to identify molecular families of the same biosynthetic origin, and the associated pathways were probed using comparative genomics. Most of the discriminating features were related to antibacterial compounds, including the thiomarinols that were reported from P. luteoviolacea here for the first time. By comparative genomics, we identified the biosynthetic cluster responsible for the production of the antibiotic indolmycin, which could not be predicted with standard methods. In conclusion, we present an efficient, integrative strategy for elucidating the chemical richness of a given set of bacteria and link the chemistry to biosynthetic genes.
IMPORTANCE: We here combine chemical analysis and genomics to probe for new bioactive secondary metabolites based on their pattern of distribution within bacterial species. We demonstrate the usefulness of this combined approach in a group of marine Gram-negative bacteria closely related to Pseudoalteromonas luteoviolacea, which is a species known to produce a broad spectrum of chemicals. The approach allowed us to identify new antibiotics and their associated biosynthetic pathways. Combining chemical analysis and genetics is an efficient “mining” workflow for identifying diverse pharmaceutical candidates in a broad range of microorganisms and therefore of great use in bioprospecting.

Directory of Open Access Journals

PubMed Central

Online Research Database In Technology

MACHINE LEARNING AND BIOINFORMATIC INSIGHTS INTO KEY ENZYMES FOR A BIO-BASED CIRCULAR ECONOMY

Author: Gado Japheth E.
Publication venue: UKnowledge
Publication date: 01/01/2021

The world is presently faced with a sustainability crisis; it is becoming increasingly difficult to meet the energy and material needs of a growing global population without depleting and polluting our planet. Greenhouse gases released from the continuous combustion of fossil fuels engender accelerated climate change, and plastic waste accumulates in the environment. There is a need for a circular economy, where energy and materials are renewably derived from waste items, rather than by consuming limited resources. Deconstruction of the recalcitrant linkages in natural and synthetic polymers is crucial for a circular economy, as deconstructed monomers can be used to manufacture new products. In Nature, organisms utilize enzymes for the efficient depolymerization and conversion of macromolecules. Consequently, by employing enzymes industrially, biotechnology holds great promise for energy- and cost-efficient conversion of materials for a circular economy. However, there is a need for enhanced molecular-level understanding of enzymes to enable economically viable technologies that can be applied on a global scale. This work is a computational study of key enzymes that catalyze important reactions that can be utilized for a bio-based circular economy. Specifically, bioinformatics and data-mining approaches were employed to study family 7 glycoside hydrolases (GH7s), which are the principal enzymes in Nature for deconstructing cellulose to simple sugars; a cytochrome P450 enzyme (GcoA) that catalyzes the demethylation of lignin subunits; and MHETase, a tannase-family enzyme utilized by the bacterium Ideonella sakaiensis in the degradation and assimilation of polyethylene terephthalate (PET).
Since enzyme function is fundamentally dependent on the primary amino-acid sequence, we hypothesize that machine-learning algorithms can be trained on an ensemble of functionally related enzymes to reveal functional patterns in the enzyme family, and to map the primary sequence to enzyme function such that functional properties can be predicted for a new enzyme sequence with significant accuracy. We find that supervised machine learning identifies important residues for processivity and accurately predicts functional subtypes and domain architectures in GH7s. Bioinformatic analyses revealed conserved active-site residues in GcoA and informed protein engineering that enabled expanded enzyme specificity and improved activity. Similarly, bioinformatic studies and phylogenetic analysis provided evolutionary context and identified crucial residues for MHET-hydrolase activity in a tannase-family enzyme (MHETase). Lastly, we developed machine-learning models to predict enzyme thermostability, allowing for high-throughput screening of enzymes that can catalyze reactions at elevated temperatures. Altogether, this work provides a solid basis for a computational data-driven approach to understanding, identifying, and engineering enzymes for biotechnological applications towards a more sustainable world.

University of Kentucky

Artificial intelligence-driven antimicrobial peptide discovery

Author: Szczurek Ewa
Szymczak Paulina
Publication venue
Publication date: 21/08/2023

Antimicrobial peptides (AMPs) are emerging as promising agents against antimicrobial resistance, providing an alternative to conventional antibiotics. Artificial intelligence (AI) has revolutionized AMP discovery through both discrimination and generation approaches. The discriminators aid the identification of promising candidates by predicting key peptide properties such as activity and toxicity, while the generators learn the distribution over peptides and enable sampling novel AMP candidates, either de novo or as analogues of a prototype peptide. Moreover, the controlled generation of AMPs with desired properties is achieved by discriminator-guided filtering, positive-only learning, latent space sampling, as well as conditional and optimized generation. Here we review recent achievements in AI-driven AMP discovery, highlighting the most exciting directions.

arXiv.org e-Print Archive