
1,736 research outputs found

Mining semantic networks of bioinformatics e-resources from the literature

Author: Afzal Hammad
Eales James
Nenadic Goran
Stevens Robert
Publication venue: RWTH Aachen University
Publication date: 01/01/2009

The University of Manchester - Institutional Repository

From learning taxonomies to phylogenetic learning: a computational approach to FAME-based bacterial species identification

Author: Slabbinck Bram
Publication venue: Ghent University. Faculty of Bioscience Engineering
Publication date: 01/01/2009

Ghent University Academic Bibliography

Scalable phylogenetic profiling using MinHash uncovers likely eukaryotic sexual reproduction genes

Author: Aguilar Pablo S.
Dessimoz Christophe
Kilchoer Laurent
Moi David
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2020

Phylogenetic profiling is a computational method to predict genes involved in the same biological process by identifying protein families which tend to be jointly lost or retained across the tree of life. Phylogenetic profiling has customarily been more widely used with prokaryotes than eukaryotes, because the method is thought to require many diverse genomes. There are now many eukaryotic genomes available, but these are considerably larger, and typical phylogenetic profiling methods require at least quadratic time as a function of the number of genes. We introduce a fast, scalable phylogenetic profiling approach entitled HogProf, which leverages hierarchical orthologous groups for the construction of large profiles and locality-sensitive hashing for efficient retrieval of similar profiles. We show that the approach outperforms Enhanced Phylogenetic Tree, a phylogeny-based method, and use the tool to reconstruct networks and query for interactors of the kinetochore complex as well as conserved proteins involved in sexual reproduction: Hap2, Spo11 and Gex1. HogProf enables large-scale phylogenetic profiling across the three domains of life, and will be useful to predict biological pathways among the hundreds of thousands of eukaryotic species that will become available in the coming few years. HogProf is available at https://github.com/DessimozLab/HogProf
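The abstract above relies on locality-sensitive hashing to avoid comparing every pair of profiles. The following is a minimal sketch of the basic MinHash idea only, assuming a profile is simply the set of taxa in which a gene family is present. It is not HogProf's implementation; the function names and the toy taxa are hypothetical.

```python
import hashlib


def _hash(seed, item):
    """Deterministic 64-bit hash of `item` under hash function number `seed`."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(taxa, num_hashes=256):
    """MinHash signature of the non-empty set of taxa where a family is present."""
    if not taxa:
        raise ValueError("profile must contain at least one taxon")
    # For each hash function, keep only the smallest hash value over the set.
    return [min(_hash(seed, t) for t in taxa) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots; estimates the Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


# Hypothetical toy profiles: taxa in which each gene family is present.
family_a = {"human", "mouse", "yeast", "arabidopsis"}
family_b = {"human", "mouse", "yeast", "rice"}

sig_a = minhash_signature(family_a)
sig_b = minhash_signature(family_b)
print(exact_jaccard(family_a, family_b))  # 0.6
print(estimated_jaccard(sig_a, sig_b))    # an estimate that should lie near 0.6
```

Because signatures have a fixed length regardless of how many genomes a profile covers, they can be grouped into hash buckets so that only profiles sharing a bucket are compared, which is what makes sub-quadratic retrieval plausible.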

Serveur académique lausannois

UCL Discovery

Comparison of eukaryotic phylogenetic profiling approaches using species tree aware methods

Author: Poch Olivier
Ruano-Rubio Valentín
Thompson Julie D
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Phylogenetic profiling encompasses an important set of methodologies for in silico high-throughput inference of functional relationships between genes. The simplest profiles represent the distribution of gene presence-absence in a set of species as a sequence of 0s and 1s, and it is assumed that functionally related genes will have more similar profiles. The methodology has been successfully used in numerous studies of prokaryotic genomes, although its application in eukaryotes appears problematic, with reported low accuracy due to the complex genomic organization within this domain of life. Recently some groups have proposed an alternative approach based on the correlation of homologous gene group sizes, taking into account all potentially informative genetic events leading to a change in group size, regardless of whether they result in a de novo group gain or total gene group loss. Results: We have compared the performance of classical presence-absence and group size based approaches using a large, diverse set of eukaryotic species. In contrast to most previous comparisons in Eukarya, we take into account the species phylogeny. We also compare the approaches using two different group categories, based on orthology and on domain-sharing. Our results confirm a limited overall performance of phylogenetic profiling in eukaryotes. Although group size based approaches initially showed an increase in performance for the domain-sharing based groups, this seems to be an overestimation due to a simplistic negative control dataset and the choice of null hypothesis rejection criteria. Conclusion: Presence-absence profiling represents a more accurate classifier of related versus non-related profile pairs, when the profiles under consideration have enough information content. Group size based approaches provide a complementary means of detecting domain or family level co-evolution between groups that may be elusive to presence-absence profiling. Moreover, the positive correlation between co-evolution scores and functional links implies that these methods could be used to estimate functional distances between gene groups and to cluster them based on their functional relatedness. This study should have important implications for the future development and application of phylogenetic profiling methods, not only in eukaryotic, but also in prokaryotic datasets.
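To make the two profile types in this abstract concrete, here is a small sketch that scores a pair of gene groups once by presence-absence agreement and once by group-size correlation. It assumes plain Hamming agreement and Pearson correlation as the similarity measures and ignores the species phylogeny, so it is not the species-tree-aware method the paper evaluates; all names and numbers are hypothetical.

```python
from math import sqrt


def to_presence(sizes):
    """Collapse a group-size profile into a 0/1 presence-absence profile."""
    return [1 if s > 0 else 0 for s in sizes]


def hamming_similarity(p, q):
    """Fraction of species in which two presence-absence profiles agree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same species")
    return sum(a == b for a, b in zip(p, q)) / len(p)


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same species")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return cov / sqrt(vx * vy)


# Hypothetical group sizes for two gene groups across six species.
sizes_1 = [1, 2, 4, 1, 3, 2]
sizes_2 = [2, 4, 8, 2, 6, 4]

presence_1 = to_presence(sizes_1)  # all ones: present in every species
presence_2 = to_presence(sizes_2)

print(hamming_similarity(presence_1, presence_2))  # 1.0, but uninformative
print(pearson(presence_1, presence_2))             # 0.0 (constant profiles)
print(pearson(sizes_1, sizes_2))                   # approximately 1.0
```

In this toy case both groups are present everywhere, so the presence-absence profiles have no information content, while the group sizes still co-vary. That is one way to read the abstract's remark that group-size approaches are complementary.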

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

HAL-Inserm

PubMed Central

Distributed Tree Kernels

Author: Dell'Arciprete Lorenzo
Zanzotto Fabio Massimo
Publication date: 01/01/2012

In this paper, we propose the distributed tree kernels (DTK) as a novel method to reduce the time and space complexity of tree kernels. Using a linear-complexity algorithm to compute vectors for trees, we embed the feature spaces of tree fragments in low-dimensional spaces where the kernel computation is done directly with a dot product. We show that DTKs are faster, correlate with tree kernels, and obtain statistically similar performance in two natural language processing tasks. Comment: ICML201
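The core idea of replacing an explicit fragment count with a dot product in a low-dimensional space can be illustrated with random vectors. The sketch below is an assumption-laden simplification: it takes each tree's fragments as a given list of strings and sums one pseudo-random unit vector per fragment. It does not implement the paper's linear-time construction of tree vectors, and all names and fragments are hypothetical.

```python
import math
import random


def fragment_vector(fragment, dim=2048):
    """Pseudo-random unit vector for a fragment, seeded by its string form."""
    rng = random.Random(fragment)  # string seeds give reproducible vectors
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def tree_vector(fragments, dim=2048):
    """Sum of the fragment vectors: one fixed-size vector per tree."""
    total = [0.0] * dim
    for frag in fragments:
        for i, v in enumerate(fragment_vector(frag, dim)):
            total[i] += v
    return total


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical fragment lists for two small parse trees (3 fragments shared).
tree_a = ["(S NP VP)", "(NP D N)", "(VP V NP)", "(D the)", "(N cat)"]
tree_b = ["(S NP VP)", "(NP D N)", "(VP V NP)", "(D a)", "(N dog)"]

exact_kernel = len(set(tree_a) & set(tree_b))  # explicit count of shared fragments
approx_kernel = dot(tree_vector(tree_a), tree_vector(tree_b))

print(exact_kernel)   # 3
print(approx_kernel)  # should be close to 3, up to random-projection noise
```

Random unit vectors in a high-dimensional space are nearly orthogonal, so identical fragments contribute about 1 to the dot product and different fragments contribute about 0. The approximation error shrinks as `dim` grows.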

arXiv.org e-Print Archive

ART

In silico prioritisation of candidate genes for prokaryotic gene function discovery: an application of phylogenetic profiles

Author: C Médigue
C Perez-Iratxeta
CM Fraser
DM Raskin
EA Adie
EC Lin
EM Marcotte
Enrico Coiera
Frank PY Lin
FS Turner
G Michal
IH Witten
J Freudenberg
J Wu
JP Gogarten
JP Vert
KJ Gaulton
M Kanehisa
M Pellegrini
MY Galperin
N López-Bigas
N Tiffin
PD Karp
R Jothi
Ruiting Lan
S Aerts
Vitali Sintchenko
WJ Kent
Y Yamanishi
Y Zheng
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: In silico candidate gene prioritisation (CGP) aids the discovery of gene functions by ranking genes according to an objective relevance score. While several CGP methods have been described for identifying human disease genes, corresponding methods for prokaryotic gene function discovery are lacking. Here we present two prokaryotic CGP methods, based on phylogenetic profiles, to assist with this task. Results: Using gene occurrence patterns in sample genomes, we developed two CGP methods (statistical and inductive CGP) to assist with the discovery of bacterial gene functions. Statistical CGP exploits the differences in gene frequency against phenotypic groups, while inductive CGP applies supervised machine learning to identify gene occurrence patterns across genomes. Three rediscovery experiments were designed to evaluate the CGP frameworks. The first experiment attempted to rediscover peptidoglycan genes with 417 published genome sequences. Both CGP methods achieved best areas under the receiver operating characteristic curve (AUC) of 0.911 in the Escherichia coli K-12 (EC-K12) and 0.978 in the Streptococcus agalactiae 2603 (SA-2603) genomes, with an average improvement in precision of >3.2-fold and a maximum of >27-fold using statistical CGP. A median AUC of >0.95 could still be achieved with as few as 10 genome examples in each group of genome examples in the rediscovery of the peptidoglycan metabolism genes. In the second experiment, a maximum of 109-fold improvement in precision was achieved in the rediscovery of anaerobic fermentation genes in EC-K12. The last experiment attempted to rediscover genes from 31 metabolic pathways in SA-2603, where 14 pathways achieved AUC >0.9 and 28 pathways achieved AUC >0.8 with the best inductive CGP algorithms. Conclusion: Our results demonstrate that the two CGP methods can assist with the study of functionally uncategorised genomic regions and discovery of bacterial gene-function relationships. Our rediscovery experiments also provide a set of standard tasks against which future methods may be compared. 12 page(s
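The evaluation described in this abstract ranks genes by a relevance score and checks how well known genes are recovered. The sketch below shows one way such a ranking could be scored, using the pairwise (Mann-Whitney) form of AUC. The fold-improvement function is an assumed definition (precision among the top k genes divided by the background fraction of true genes); the abstract does not say how the authors computed it. The scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    # A tie between a positive and a negative counts as half a correct pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_fold_improvement(scores, labels, k):
    """Assumed definition: precision in the top k over background prevalence."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of genes")
    prevalence = sum(labels) / len(labels)
    if prevalence == 0:
        raise ValueError("need at least one positive gene")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    precision_at_k = sum(label for _, label in ranked[:k]) / k
    return precision_at_k / prevalence


# Hypothetical relevance scores; label 1 marks a gene known to be in the pathway.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

print(roc_auc(scores, labels))                        # 8 of 9 pairs ranked correctly
print(precision_fold_improvement(scores, labels, 2))  # 2.0
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a ranking that places every known gene above every other gene, which gives a scale for reading the 0.911 and 0.978 figures quoted above.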

Crossref

PubMed Central

UNSWorks

Macquarie University ResearchOnline

Visual and computational analysis of structure-activity relationships in high-throughput screening data

Author: Agrafiotis
Agrafiotis
Ahlberg
Ajay
Ajay
Bayada
Bemis
Bernard
Bonabeau
Brown
Calvert
Card
Chen
Chen
Cho
Christianini
Clark
Clark
Cox
Duda
Edwards
Engels
Frimurer
Gao
Garrido
Ghose
Gillet
Gillet
Hand
Hann
Haupts
Hayward
Hertzberg
Izrailev
Jiang
Jones-Hertzog
Kirew
Kobayashi
Kohonen
Ladd
Lee
Lepre
Martin
Mason
Mello
Meyer
Miller
Mitchell
Oprea
Peter Gedeck
Peter Willett
Poroikov
Rhodes
Roberts
Roberts
Ros
Rusinko
Sadowski
Sadowski
Scherf
Sheridan
Shi
Stanton
Su
Teague
Thompson
Tropsha
Tufte
Tufte
Wagener
Walters
Wang
Wedin
Xie
Xu
Zupan
Publication venue: Elsevier BV
Publication date: 01/08/2001

Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets.

Crossref

White Rose Research Online

A network-based comparative framework to study conservation and divergence of proteomes in plant phylogenies

Author: Ané Jean-Michel
Chakraborty Sanhita
Coon Joshua
Jayaraman Dhileepkumar
Maeda Junko
Marx Harald
Richards Alicia
Roy Sushmita
Shin Junha
Sussman Michael
Vandepoele Klaas
Vaneechoutte Dries
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2021

Comparative functional genomics offers a powerful approach to study species evolution. To date, the majority of these studies have focused on the transcriptome in mammalian and yeast phylogenies. Here, we present a novel multi-species proteomic dataset and a computational pipeline to systematically compare the protein levels across multiple plant species. Globally, we find that protein levels diverge according to phylogenetic distance but are more constrained than mRNA levels. Module-level comparative analysis of groups of proteins shows that proteins that are more highly expressed tend to be more conserved. To interpret the evolutionary patterns of conservation and divergence, we develop a novel network-based integrative analysis pipeline that combines publicly available transcriptomic datasets to define co-expression modules. Our analysis pipeline can be used to relate the changes in protein levels to different species-specific phenotypic traits. We present a case study with the rhizobia-legume symbiosis process that supports the role of autophagy in this symbiotic association.

Ghent University Academic Bibliography

Microbial abundance analysis and phylogenetic adoption in functional metagenomics

Author: Fiona Browne
Wang Haiying / HY
Wassan Jyotsna Talreja
Zheng Huiru
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 05/10/2017

Ulster University's Research Portal