36 research outputs found

    Robust multi-group gene set analysis with few replicates

    Get PDF
    Background: Competitive gene set analysis is a standard exploratory tool for gene expression data. Permutation-based competitive gene set analysis methods are preferable to parametric ones because the latter make strong statistical assumptions which are not always met. For permutation-based methods, we permute samples, as opposed to genes, as doing so preserves the inter-gene correlation structure. Unfortunately, up until now, sample permutation-based methods have required a minimum of six replicates per sample group. Results: We propose a new permutation-based competitive gene set analysis method for multi-group gene expression data with as few as three replicates per group. The method is based on advanced sample permutation technique that utilizes all groups within a data set for pairwise comparisons. We present a comprehensive evaluation of different permutation techniques, using multiple data sets and contrast the performance of our method, mGSZm, with other state of the art methods. We show that mGSZm is robust, and that, despite only using less than six replicates, we are able to consistently identify a high proportion of the top ranked gene sets from the analysis of a substantially larger data set. Further, we highlight other methods where performance is highly variable and appears dependent on the underlying data set being analyzed. Conclusions: Our results demonstrate that robust gene set analysis of multi-group gene expression data is permissible with as few as three replicates. In doing so, we have extended the applicability of such approaches to resource constrained experiments where additional data generation is prohibitively difficult or expensive. An R package implementing the proposed method and supplementary materials are available from the website http:// ekhidna.biocenter.helsinki.fi/downloads/pashupati/mGSZm.html.Peer reviewe

    Epigenome-450K-wide methylation signatures of active cigarette smoking : The Young Finns Study

    Get PDF
    Smoking as a major risk factor for morbidity affects numerous regulatory systems of the human body including DNA methylation. Most of the previous studies with genome-wide methylation data are based on conventional association analysis and earliest threshold-based gene set analysis that lacks sensitivity to be able to reveal all the relevant effects of smoking. The aim of the present study was to investigate the impact of active smoking on DNA methylation at three biological levels: 5'-C-phosphate-G-3' (CpG) sites, genes and functionally related genes (gene sets). Gene set analysis was done with mGSZ, a modern threshold-free method previously developed by us that utilizes all the genes in the experiment and their differential methylation scores. Application of such method in DNA methylation study is novel. Epigenome-wide methylation levels were profiled from Young Finns Study (YFS) participants' whole blood from 2011 follow-up using Illumina Infinium Hu-manMethylation450 BeadChips. We identified three novel smoking related CpG sites and replicated 57 of the previously identified ones. We found that smoking is associated with hypomethylation in shore (genomic regions 0-2 kilobases from CpG island). We identified smoking related methylation changes in 13 gene sets with false discovery rate (FDR)Peer reviewe

    An update on the strategies in multicomponent activity monitoring within the phytopharmaceutical field

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>To-date modern drug research has focused on the discovery and synthesis of single active substances. However, multicomponent preparations are gaining increasing importance in the phytopharmaceutical field by demonstrating beneficial properties with respect to efficacy and toxicity.</p> <p>Discussion</p> <p>In contrast to single drug combinations, a botanical multicomponent therapeutic possesses a complex repertoire of chemicals that belong to a variety of substance classes. This may explain the frequently observed pleiotropic bioactivity spectra of these compounds, which may also suggest that they possess novel therapeutic opportunities. Interestingly, considerable bioactivity properties are exhibited not only by remedies that contain high doses of phytochemicals with prominent pharmaceutical efficacy, but also preparations that lack a sole active principle component. Despite that each individual substance within these multicomponents has a low molar fraction, the therapeutic activity of these substances is established via a potentialization of their effects through combined and simultaneous attacks on multiple molecular targets. Although beneficial properties may emerge from such a broad range of perturbations on cellular machinery, validation and/or prediction of their activity profiles is accompanied with a variety of difficulties in generic risk-benefit assessments. Thus, it is recommended that a comprehensive strategy is implemented to cover the entirety of multicomponent-multitarget effects, so as to address the limitations of conventional approaches.</p> <p>Summary</p> <p>An integration of standard toxicological methods with selected pathway-focused bioassays and unbiased data acquisition strategies (such as gene expression analysis) would be advantageous in building an interaction network model to consider all of the effects, whether they were intended or adverse reactions.</p

    Genomic characterization of the most barotolerant Listeria monocytogenes RO15 strain compared to reference strains used to evaluate food high pressure processing

    Get PDF
    BackgroundHigh pressure processing (HPP; i.e. 100-600MPa pressure depending on product) is a non-thermal preservation technique adopted by the food industry to decrease significantly foodborne pathogens, including Listeria monocytogenes, from food. However, susceptibility towards pressure differs among diverse strains of L. monocytogenes and it is unclear if this is due to their intrinsic characteristics related to genomic content. Here, we tested the barotolerance of 10 different L. monocytogenes strains, from food and food processing environments and widely used reference strains including clinical isolate, to pressure treatments with 400 and 600MPa. Genome sequencing and genome comparison of the tested L. monocytogenes strains were performed to investigate the relation between genomic profile and pressure tolerance.ResultsNone of the tested strains were tolerant to 600MPa. A reduction of more than 5 log(10) was observed for all strains after 1min 600MPa pressure treatment. L. monocytogenes strain RO15 showed no significant reduction in viable cell counts after 400MPa for 1min and was therefore defined as barotolerant. Genome analysis of so far unsequenced L. monocytogenes strain RO15, 2HF33, MB5, AB199, AB120, C7, and RO4 allowed us to compare the gene content of all strains tested. This revealed that the three most pressure tolerant strains had more than one CRISPR system with self-targeting spacers. Furthermore, several anti-CRISPR genes were detected in these strains. Pan-genome analysis showed that 10 prophage genes were significantly associated with the three most barotolerant strains.ConclusionsL. monocytogenes strain RO15 was the most pressure tolerant among the selected strains. Genome comparison suggests that there might be a relationship between prophages and pressure tolerance in L. monocytogenes.Peer reviewe

    An Expanded Evaluation of Protein Function Prediction Methods Shows an Improvement In Accuracy

    Get PDF
    Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent

    An expanded evaluation of protein function prediction methods shows an improvement in accuracy

    Get PDF
    Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent. Keywords: Protein function prediction, Disease gene prioritizationpublishedVersio

    SOM-Based Exploratory Analysis of Gene Expression Data

    No full text
    . Applications of new SOM-based exploratory data analysis methods to bioinformatics are described. Cluster structures are revealed in data describing the expression of a set of yeast genes in several experimental treatments. The structures are visualized in an intuitive manner with colors: The similarity of hue corresponds to the similarity of the multivariate data. The clusters can be interpreted by visualizing changes of the data variables (expression in dierent treatments) at the cluster borders. The relationship between the organization of the SOM and the functional classes of the proteins encoded by the genes may additionally reveal interesting relationships between the functional classes, and substructures within them
    corecore