19,837 research outputs found

    MorphDB : prioritizing genes for specialized metabolism pathways and gene ontology categories in plants

    Get PDF
    Recent times have seen an enormous growth of "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene expression based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which was proven to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene centric query utility, we present a comparative network approach that enables researchers to efficiently browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest

    11th German Conference on Chemoinformatics (GCC 2015) : Fulda, Germany. 8-10 November 2015.

    Get PDF

    GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data

    Get PDF
    Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.Department of Agriculture, Food and the MarineEuropean Commission - Seventh Framework Programme (FP7)Science Foundation IrelandUniversity College Dubli

    A measure of centrality based on the spectrum of the Laplacian

    Get PDF
    We introduce a family of new centralities, the k-spectral centralities. k-Spectral centrality is a measurement of importance with respect to the deformation of the graph Laplacian associated with the graph. Due to this connection, k-spectral centralities have various interpretations in terms of spectrally determined information. We explore this centrality in the context of several examples. While for sparse unweighted networks 1-spectral centrality behaves similarly to other standard centralities, for dense weighted networks they show different properties. In summary, the k-spectral centralities provide a novel and useful measurement of relevance (for single network elements as well as whole subnetworks) distinct from other known measures.Comment: 12 pages, 6 figures, 2 table

    Do functional traits improve prediction of predation rates for a disparate group of aphid predators?

    Get PDF
    Aphid predators are a systematically disparate group of arthropods united on the basis that they consume aphids as part of their diet. In Europe, this group includes Araneae, Opiliones, Heteroptera, chrysopids, Forficulina, syrphid larvae, carabids, staphylinids, cantharids and coccinellids. This functional group has no phylogenetic meaning but was created by ecologists as a way of understanding predation, particularly for conservation biological control. We investigated whether trait-based approaches could bring some cohesion and structure to this predator group. A taxonomic hierarchy-based null model was created from taxonomic distances in which a simple multiplicative relationship described the Linnaean hierarchies (species, genera, etc.) of fifty common aphid predators. Using the same fifty species, a functional groups model was developed using ten behavioural traits (e.g. polyphagy, dispersal, activity, etc.) to describe the way in which aphids were predated in the field. The interrelationships between species were then expressed as dissimilarities within each model and separately analysed using PROXSCAL, a multidimensional scaling (MDS) program. When ordinated using PROXSCAL and then statistically compared using Procrustes analysis, we found that only 17% of information was shared between the two configurations. Polyphagy across kingdoms (i.e. predatory behaviour across animal, plant and fungi kingdoms) and the ability to withstand starvation over days, weeks and months were particularly divisive within the functional groups model. Confirmatory MDS indicated poor prediction of aphid predation rates by the configurations derived from either model. The counterintuitive conclusion was that the inclusion of functional traits, pertinent to the way in which predators fed on aphids, did not lead to a large improvement in the prediction of predation rate when compared to the standard taxonomic approach

    A systematic comparison of genome-scale clustering algorithms

    Get PDF
    Background: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices that have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae. Methods: For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each clusters agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard score of the top five scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method. Results: Clusters produced by each method were evaluated based upon the positive match to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods. Conclusions: Validation of clusters against known gene classifications demonstrate that for this data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies is warranted

    How to understand the cell by breaking it: network analysis of gene perturbation screens

    Get PDF
    Modern high-throughput gene perturbation screens are key technologies at the forefront of genetic research. Combined with rich phenotypic descriptors they enable researchers to observe detailed cellular reactions to experimental perturbations on a genome-wide scale. This review surveys the current state-of-the-art in analyzing perturbation screens from a network point of view. We describe approaches to make the step from the parts list to the wiring diagram by using phenotypes for network inference and integrating them with complementary data sources. The first part of the review describes methods to analyze one- or low-dimensional phenotypes like viability or reporter activity; the second part concentrates on high-dimensional phenotypes showing global changes in cell morphology, transcriptome or proteome.Comment: Review based on ISMB 2009 tutorial; after two rounds of revisio
    corecore