MorphDB: prioritizing genes for specialized metabolism pathways and gene ontology categories in plants
Recent times have seen an enormous growth of "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene-expression-based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which was proven to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene-centric query utility, we present a comparative network approach that enables researchers to efficiently browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest.
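To illustrate the general guilt-by-association idea mentioned in this abstract, the following is a minimal sketch that ranks candidate genes by their mean Pearson correlation with known pathway genes. It is not the MORPH algorithm, whose details the abstract does not give; the function names, the scoring rule, and the toy expression profiles are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def rank_candidates(expression, pathway_genes):
    """Rank non-pathway genes by mean correlation with the pathway genes.

    expression: dict mapping gene id -> list of expression values
    pathway_genes: set of gene ids already assigned to the pathway
    (hypothetical scoring rule, not the one used by MORPH)
    """
    known = [g for g in pathway_genes if g in expression]
    if not known:
        raise ValueError("no pathway gene has an expression profile")
    scores = {}
    for gene, profile in expression.items():
        if gene in pathway_genes:
            continue
        scores[gene] = sum(pearson(profile, expression[p]) for p in known) / len(known)
    # Highest mean correlation first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: P1 and P2 are known pathway genes, C1 and C2 are candidates.
toy_expression = {
    "P1": [1, 2, 3, 4],
    "P2": [2, 4, 6, 8],
    "C1": [1, 2, 3, 5],
    "C2": [4, 3, 2, 1],
}
ranked = rank_candidates(toy_expression, {"P1", "P2"})
```

In this toy example the candidate whose profile rises with the pathway genes is ranked above the one whose profile falls.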
GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data
Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. 
The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
Department of Agriculture, Food and the Marine; European Commission - Seventh Framework Programme (FP7); Science Foundation Ireland; University College Dubli
A measure of centrality based on the spectrum of the Laplacian
We introduce a family of new centralities, the k-spectral centralities.
k-Spectral centrality is a measurement of importance with respect to the
deformation of the graph Laplacian associated with the graph. Due to this
connection, k-spectral centralities have various interpretations in terms of
spectrally determined information.
We explore this centrality in the context of several examples. While for
sparse unweighted networks 1-spectral centrality behaves similarly to other
standard centralities, for dense weighted networks it shows different
properties. In summary, the k-spectral centralities provide a novel and useful
measurement of relevance (for single network elements as well as whole
subnetworks) distinct from other known measures.
Comment: 12 pages, 6 figures, 2 table
Do functional traits improve prediction of predation rates for a disparate group of aphid predators?
Aphid predators are a systematically disparate group of arthropods united on the basis that they consume aphids as part of their diet. In Europe, this group includes Araneae, Opiliones, Heteroptera, chrysopids, Forficulina, syrphid larvae, carabids, staphylinids, cantharids and coccinellids. This functional group has no phylogenetic meaning but was created by ecologists as a way of understanding predation, particularly for conservation biological control. We investigated whether trait-based approaches could bring some cohesion and structure to this predator group. A taxonomic hierarchy-based null model was created from taxonomic distances in which a simple multiplicative relationship described the Linnaean hierarchies (species, genera, etc.) of fifty common aphid predators. Using the same fifty species, a functional groups model was developed using ten behavioural traits (e.g. polyphagy, dispersal, activity, etc.) to describe the way in which aphids were predated in the field. The interrelationships between species were then expressed as dissimilarities within each model and separately analysed using PROXSCAL, a multidimensional scaling (MDS) program. When ordinated using PROXSCAL and then statistically compared using Procrustes analysis, we found that only 17% of information was shared between the two configurations. Polyphagy across kingdoms (i.e. predatory behaviour across animal, plant and fungi kingdoms) and the ability to withstand starvation over days, weeks and months were particularly divisive within the functional groups model. Confirmatory MDS indicated poor prediction of aphid predation rates by the configurations derived from either model. The counterintuitive conclusion was that the inclusion of functional traits, pertinent to the way in which predators fed on aphids, did not lead to a large improvement in the prediction of predation rate when compared to the standard taxonomic approach
A systematic comparison of genome-scale clustering algorithms
Background: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices that have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae. Methods: For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard scores of the top five scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method. Results: Clusters produced by each method were evaluated based upon the positive match to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods.
Conclusions: Validation of clusters against known gene classifications demonstrates that, for these data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies is warranted.
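The scoring described in the Methods of this abstract can be sketched roughly as follows: each cluster receives its highest Jaccard similarity against any annotation set, clusters are binned by size, and the top five scores in each bin are averaged. This is a minimal sketch, not the authors' implementation. The size cut-offs (`small_max`, `medium_max`), the function names, and the toy data are assumptions, since the abstract does not give them. The "best" in BAT5 presumably means the best value across the parameter settings tested, and that step is not shown here.

```python
from statistics import mean


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_jaccard(cluster, annotation_sets):
    """Highest Jaccard score of a cluster against any annotation set."""
    return max((jaccard(cluster, genes) for genes in annotation_sets.values()),
               default=0.0)


def size_bin(n, small_max=10, medium_max=100):
    """Assign a cluster size to a bin; the cut-offs are hypothetical."""
    if n <= small_max:
        return "small"
    if n <= medium_max:
        return "medium"
    return "large"


def average_top5(clusters, annotation_sets, top=5):
    """Average the `top` best cluster scores within each size bin."""
    scores_by_bin = {}
    for cluster in clusters:
        score = best_jaccard(cluster, annotation_sets)
        scores_by_bin.setdefault(size_bin(len(cluster)), []).append(score)
    return {
        name: mean(sorted(scores, reverse=True)[:top])
        for name, scores in scores_by_bin.items()
    }


# Toy data: two small clusters scored against one GO and one KEGG set.
toy_clusters = [{"a", "b"}, {"c", "d", "e"}]
toy_annotations = {"GO:1": {"a", "b"}, "KEGG:1": {"c", "d", "x"}}
result = average_top5(toy_clusters, toy_annotations)
```

In the toy example the first cluster matches `GO:1` exactly (score 1.0) and the second shares two of four genes with `KEGG:1` (score 0.5), so the small bin averages to 0.75.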
How to understand the cell by breaking it: network analysis of gene perturbation screens
Modern high-throughput gene perturbation screens are key technologies at the
forefront of genetic research. Combined with rich phenotypic descriptors they
enable researchers to observe detailed cellular reactions to experimental
perturbations on a genome-wide scale. This review surveys the current
state-of-the-art in analyzing perturbation screens from a network point of
view. We describe approaches to make the step from the parts list to the wiring
diagram by using phenotypes for network inference and integrating them with
complementary data sources. The first part of the review describes methods to
analyze one- or low-dimensional phenotypes like viability or reporter activity;
the second part concentrates on high-dimensional phenotypes showing global
changes in cell morphology, transcriptome or proteome.
Comment: Review based on ISMB 2009 tutorial; after two rounds of revisio