Diverse correlation structures in gene expression data and their utility in improving statistical inference
It is well known that correlations in microarray data represent a serious
nuisance deteriorating the performance of gene selection procedures. This paper
is intended to demonstrate that the correlation structure of microarray data
provides a rich source of useful information. We discuss distinct correlation
substructures revealed in microarray gene expression data by an appropriate
ordering of genes. These substructures include stochastic proportionality of
expression signals in a large percentage of all gene pairs, negative
correlations hidden in ordered gene triples, and a long sequence of weakly
dependent random variables associated with ordered pairs of genes. The reported
striking regularities are of general biological interest and they also have
far-reaching implications for theory and practice of statistical methods of
microarray data analysis. We illustrate the latter point with a method for
testing differential expression of nonoverlapping gene pairs. While designed
for testing a different null hypothesis, this method provides an order of
magnitude more accurate control of type 1 error rate compared to conventional
methods of individual gene expression profiling. In addition, this method is
robust to technical noise. Quantitative inference of the correlation
structure has the potential to extend the analysis of microarray data far
beyond currently practiced methods.
Comment: Published at http://dx.doi.org/10.1214/07-AOAS120 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Bioinformatics tools in predictive ecology: Applications to fisheries
This article is made available through the Brunel Open Access Publishing Fund - Copyright © 2012 Tucker et al. There has been a huge effort to advance analytical techniques for molecular biological data over the past decade. This has led to many novel algorithms that are specialized to deal with data associated with biological phenomena, such as gene expression and protein interactions. In contrast, ecological data analysis has remained focused to some degree on off-the-shelf statistical techniques, though this is starting to change with the adoption of state-of-the-art methods where few assumptions can be made about the data and a more explorative approach is required, for example, through the use of Bayesian networks. In this paper, some novel bioinformatics tools for microarray data are discussed along with their ‘crossover potential’, with an application to fisheries data. In particular, the focus is on the development of models that identify functionally equivalent species in different fish communities with the aim of predicting functional collapse.
Circular RNAs in Clear Cell Renal Cell Carcinoma: Their Microarray-Based Identification, Analytical Validation, and Potential Use in a Clinico-Genomic Model to Improve Prognostic Accuracy
Circular RNAs (circRNAs) may act as novel cancer biomarkers. However, a genome-wide evaluation of circRNAs in clear cell renal cell carcinoma (ccRCC) has yet to be conducted. Therefore, the objective of this study was to identify and validate circRNAs in ccRCC tissue, with a focus on evaluating their potential as prognostic biomarkers. A genome-wide identification of circRNAs in total RNA extracted from ccRCC tissue samples was performed using microarray analysis. Three relevant differentially expressed circRNAs were selected (circEGLN3, circNOX4, and circRHOBTB3), their circular nature was experimentally confirmed, and their expression, along with that of their linear counterparts, was measured in 99 malignant and 85 adjacent normal tissue samples using specifically established RT-qPCR assays. The capacity of circRNAs to discriminate between malignant and adjacent normal tissue samples and their prognostic potential after surgery (with the endpoints cancer-specific, recurrence-free, and overall survival) were estimated by C-statistics, the Kaplan-Meier method, univariate and multivariate Cox regression analysis, decision curve analysis, and Akaike and Bayesian information criteria. CircEGLN3 discriminated malignant from normal tissue with 97% accuracy. Using multivariate Cox regression analysis, we generated a prognostic signature for the three endpoints that included circEGLN3, circRHOBTB3, and linRHOBTB3. The predictive accuracy of the clinical models based on clinicopathological factors improved when they were combined with this circRNA-based signature. Bootstrapping as well as Akaike and Bayesian information criteria confirmed the statistical significance and robustness of the combined models. Limitations of this study include its retrospective nature and the lack of external validation. The study demonstrated the promising potential of circRNAs as diagnostic and particularly prognostic biomarkers in ccRCC patients.
Stability and aggregation of ranked gene lists
Ranked gene lists are highly unstable in the sense that similar measures of differential gene expression may yield very different rankings, and that a small change in the data set usually affects the obtained gene list considerably. Stability issues have long been under-considered in the literature, but they have become a hot topic in the last few years, perhaps as a consequence of the increasing skepticism about the reproducibility and clinical applicability of molecular research findings. In this article, we review existing approaches for assessing the stability of ranked gene lists and the related problem of aggregation, give some practical recommendations, and warn against potential misuse of these methods. This overview is illustrated through an application to a recent leukemia data set using the freely available Bioconductor package GeneSelector.
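The instability described in this abstract can be illustrated with a minimal sketch. It is not the GeneSelector API and not the authors' method: the function names (`rank_genes`, `top_k_overlap`, `leave_one_out_stability`), the mean-difference ranking statistic, and the simulated data are all assumptions chosen for illustration. The sketch ranks genes on the full data set, re-ranks them after dropping one sample at a time, and reports how much of the top-k list survives.

```python
import random
import statistics


def rank_genes(data, labels):
    """Rank gene indices by absolute difference in group means, largest first."""
    scores = []
    for gene, row in enumerate(data):
        group0 = [x for x, lab in zip(row, labels) if lab == 0]
        group1 = [x for x, lab in zip(row, labels) if lab == 1]
        diff = abs(statistics.mean(group0) - statistics.mean(group1))
        scores.append((diff, gene))
    return [gene for _, gene in sorted(scores, reverse=True)]


def top_k_overlap(rank_a, rank_b, k):
    """Fraction of genes shared by the top-k entries of two rankings."""
    return len(set(rank_a[:k]) & set(rank_b[:k])) / k


def leave_one_out_stability(data, labels, k):
    """Mean top-k overlap between the full ranking and each
    ranking obtained after removing a single sample."""
    full_ranking = rank_genes(data, labels)
    overlaps = []
    for drop in range(len(labels)):
        keep = [i for i in range(len(labels)) if i != drop]
        sub_data = [[row[i] for i in keep] for row in data]
        sub_labels = [labels[i] for i in keep]
        overlaps.append(top_k_overlap(full_ranking, rank_genes(sub_data, sub_labels), k))
    return statistics.mean(overlaps)


# Simulated data (assumption): 200 genes, 5 samples per group, pure noise.
random.seed(0)
labels = [0] * 5 + [1] * 5
data = [[random.gauss(0, 1) for _ in labels] for _ in range(200)]

stability = leave_one_out_stability(data, labels, k=20)
print(f"mean top-20 overlap under leave-one-out: {stability:.2f}")
```

An overlap near 1 would indicate a stable list, while lower values mean that removing a single sample reshuffles the top of the ranking. Other perturbation schemes (bootstrap resampling, added noise) and other ranking statistics could be substituted without changing the overall structure.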
Physico-chemical foundations underpinning microarray and next-generation sequencing experiments
Hybridization of nucleic acids on solid surfaces is a key process involved in high-throughput technologies such as microarrays and, in some cases, next-generation sequencing (NGS). A physical understanding of the hybridization process helps to determine the accuracy of these technologies. The goal of a widespread research program is to develop reliable transformations between the raw signals reported by the technologies and the individual molecular concentrations in an ensemble of nucleic acids. This research draws on many areas, from bioinformatics and biostatistics, to theoretical and experimental biochemistry and biophysics, to computer simulations. A group of leading researchers met in Ploen, Germany, in 2011 to discuss present knowledge and the limitations of our physico-chemical understanding of high-throughput nucleic acid technologies. This meeting inspired us to write this summary, which provides an overview of state-of-the-art, physico-chemically founded approaches to modeling the hybridization of nucleic acids on solid surfaces. In addition, practical applications of current knowledge are emphasized.
Computational Models for Transplant Biomarker Discovery.
Translational medicine offers rich promise for improved diagnostics and drug discovery in the field of transplantation, where unmet diagnostic and therapeutic needs persist. The recent advent of genomic and proteomic profiling, collectively called "omics", provides new resources for developing novel biomarkers for routine clinical use. Establishing such a marker system depends heavily on the appropriate application of computational algorithms and software, which are based on mathematical theories and models. Understanding these theories helps in applying appropriate algorithms so that biomarker systems succeed. Here, we review the key advances in theories and mathematical models relevant to transplant biomarker development. The advantages and limitations inherent in these models are discussed. The principles of key computational approaches for efficiently selecting the best subset of biomarkers from high-dimensional omics data are highlighted. Prediction models are also introduced, and the integration of multi-microarray data is discussed. Appreciating these key advances would help to accelerate the development of clinically reliable biomarker systems.