Partial Least Squares: A Versatile Tool for the Analysis of High-Dimensional Genomic Data
Partial Least Squares (PLS) is a highly efficient statistical regression technique that is well suited for the analysis of high-dimensional genomic data. In this paper we review the theory and applications of PLS from both methodological and biological points of view. Focusing on microarray expression data, we provide a systematic comparison of the PLS approaches currently employed, and discuss problems as different as tumor classification, identification of relevant genes, survival analysis and modeling of gene networks.
Simcluster: clustering enumeration gene expression data on the simplex space
Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. As with other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space.

Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster.

Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
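To make the simplex constraint concrete, the following is a minimal sketch of one standard distance for compositional data: the Euclidean distance between centered log-ratio (clr) transforms, often called the Aitchison distance. The abstract does not say which distance Simcluster uses, so this choice, the function names, and the tag counts are all illustrative assumptions. Zero counts would need a replacement strategy that is not covered here.

```python
import math

def closure(counts):
    """Rescale strictly positive counts so that the parts sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of two compositions."""
    cx, cy = clr(x), clr(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))

# Hypothetical tag counts for four transcripts in three libraries.
library_a = closure([120, 30, 50, 800])
library_b = closure([1200, 300, 500, 8000])  # same profile, 10x the depth
library_c = closure([30, 120, 50, 800])      # first two transcripts swapped

print(aitchison_distance(library_a, library_b))  # essentially zero
print(aitchison_distance(library_a, library_c))  # clearly positive
```

Libraries that differ only in sequencing depth come out at distance zero, which is the behavior a distance on raw counts would not give.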
HTself2: Combining p-values to Improve Classification of Differential Gene Expression in HTself
HTself is a web-based bioinformatics tool designed to deal with the classification of differential gene expression for low-replication microarray studies. It is based on a statistical test that uses self-self experiments to derive intensity-dependent cutoffs. The method was previously described in Vêncio et al. (DNA Res. 12: 211-214, 2005). In this work we consider an extension of HTself that calculates p-values instead of using a fixed credibility level α. As before, the statistic used to compute single-spot p-values is obtained from the Gaussian kernel density estimator applied to self-self data. Different spots corresponding to the same biological gene (replicas) give rise to a set of independent p-values, which can be combined by well-known statistical methods. The combined p-value can be used to decide whether or not a gene can be considered differentially expressed. HTself2 is a new version of HTself that uses this idea of combining p-values. It was implemented as a user-friendly desktop application to help laboratories without a bioinformatics infrastructure.
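The abstract does not name the combination rule used in HTself2. As an illustration only, here is a sketch of Fisher's method, one well-known way to combine independent p-values. The function name and the example values are assumptions, not part of HTself2.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Under the null hypothesis, X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom. For even degrees of freedom
    the upper tail has the closed form
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term, tail_sum = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # builds (x/2)^i / i! incrementally
        tail_sum += term
    return math.exp(-half_stat) * tail_sum

# Hypothetical p-values from three replicate spots of the same gene.
print(fisher_combined_pvalue([0.04, 0.10, 0.07]))
```

With a single spot the combined value equals the input p-value, so the rule reduces to the unreplicated case.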
Use of pre-transformation to cope with outlying values in important candidate genes
Outlying values in predictors often strongly affect the results of statistical analyses in high-dimensional settings. Although they frequently occur with most high-throughput techniques, the problem is often ignored in the literature. We suggest using a very simple transformation, proposed before in a different context by Royston and Sauerbrei, as an intermediary step between array normalization and high-level statistical analysis. This straightforward univariate transformation identifies extreme values and considerably reduces the influence of outlying values in all further steps of statistical analysis without eliminating the incriminated observation or feature. The use of the transformation and its effects are demonstrated for diverse univariate and multivariate statistical analyses using nine publicly available microarray data sets.
GliomaPredict: A Clinically Useful Tool for Assigning Glioma Patients to Specific Molecular Subtypes
Background: Advances in generating genome-wide gene expression data have accelerated the development of molecular-based tumor classification systems. Tools that allow the translation of such molecular classification schemas from research into clinical applications are still missing in the emerging era of personalized medicine.
Results: We developed GliomaPredict as a computational tool that allows the fast and reliable classification of glioma patients into one of six previously published stratified subtypes based on sets of extensively validated classifiers derived from hundreds of glioma transcriptomic profiles. Our tool utilizes a principal component analysis (PCA)-based approach to generate a visual representation of the analyses, quantifies the confidence of the underlying subtype assessment and presents results as a printable PDF file. The GliomaPredict tool is implemented as a plugin application for the widely used GenePattern framework.
Conclusions: GliomaPredict provides a user-friendly, clinically applicable novel platform for instantly assigning gene expression-based subtypes in patients with gliomas, thereby aiding in clinical trial design and therapeutic decision-making. Implemented as a user-friendly diagnostic tool, we expect that in time GliomaPredict, and tools like it, will become routinely used in translational/clinical research and in the clinical care of patients with gliomas.
maigesPack: A Computational Environment for Microarray Data Analysis
Microarray technology is still an important way to assess gene expression in molecular biology, mainly because it measures expression profiles for thousands of genes simultaneously, which makes this technology a good option for some studies focused on systems biology. One of its main problems is the complexity of the experimental procedure, which presents several sources of variability and hinders statistical modeling. So far, there is no standard protocol for the generation and evaluation of microarray data. To ease the analysis process, this paper presents an R package, named maigesPack, that helps with data organization. In addition, it makes the data analysis process more robust, reliable and reproducible. maigesPack also aggregates several data analysis procedures reported in the literature, for instance: cluster analysis, differential expression, supervised classifiers, relevance networks and functional classification of gene groups or gene networks.
Web-based Tools for the Analysis of DNA Microarrays
End of project report. DNA microarrays are widely used for gene expression profiling. Raw data resulting from microarray experiments, however, tends to be very noisy and there are many sources of technical variation and bias. This raw data needs to be quality assessed and interactively preprocessed to minimise variation before statistical analysis in order to achieve meaningful results. Therefore microarray analysis requires a combination of visualisation and statistical tools, which vary depending on what microarray platform or experimental design is used. Bioconductor is an existing open source software project that attempts to facilitate analysis of genomic data. It is a collection of packages for the statistical programming language R. Bioconductor is particularly useful in analyzing microarray experiments. The problem is that the R programming language’s command line interface is intimidating to many users who do not have a strong background in computing. This often leads to a situation where biologists resort to commercial software, which often uses antiquated and much less effective statistical techniques and is also expensive. This project aims to bridge this gap by providing a user-friendly web-based interface to the cutting-edge statistical techniques of Bioconductor.
IsoGeneGUI: Multiple Approaches for Dose-Response Analysis of Microarray Data Using R
The analysis of transcriptomic experiments with ordered covariates, such as dose-response data, has become a central topic in bioinformatics, in particular in omics studies. Consequently, multiple R packages on CRAN and Bioconductor are designed to analyse microarray data from various perspectives under the assumption of order restriction. We introduce the new R package IsoGene Graphical User Interface (IsoGeneGUI), an extension of the original IsoGene package that includes methods from most of the available R packages designed for the analysis of order-restricted microarray data, namely orQA, ORIClust, goric and ORCME. The methods included in the new IsoGeneGUI range from inference and estimation to model selection and clustering tools. IsoGeneGUI is not only the most complete tool for the analysis of order-restricted microarray experiments available in R, but it can also be used to analyse other types of dose-response data. The package provides all the methods in a user-friendly fashion, so analyses can be implemented by users with limited knowledge of R programming.
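For readers unfamiliar with estimation under an order restriction, here is a minimal sketch of the pool-adjacent-violators algorithm, a standard way to fit non-decreasing means across ordered doses. It is written in Python for illustration and is not code from IsoGeneGUI or the packages it wraps. The function name and the example means are assumptions.

```python
def isotonic_means(dose_means, weights=None):
    """Weighted pool-adjacent-violators fit: the closest non-decreasing
    sequence (in weighted least squares) to the observed dose means."""
    if weights is None:
        weights = [1.0] * len(dose_means)
    blocks = []  # each block: [fitted value, total weight, doses pooled]
    for value, weight in zip(dose_means, weights):
        blocks.append([float(value), float(weight), 1])
        # Pool backwards while the monotone order is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(v1 * w1 + v2 * w2) / total, total, n1 + n2])
    fitted = []
    for value, _, count in blocks:
        fitted.extend([value] * count)
    return fitted

# Hypothetical mean expression of one gene at four increasing doses.
print(isotonic_means([1.0, 3.0, 2.0, 4.0]))  # → [1.0, 2.5, 2.5, 4.0]
```

The weights would typically be the number of arrays per dose. A non-increasing fit can be obtained by negating the means, fitting, and negating the result.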