Reproducible probe-level analysis of the Affymetrix Exon 1.0 ST array with R/Bioconductor
The presence of different transcripts of a gene across samples can be
analysed with whole-transcriptome microarrays. Reproducing results from published
microarray data is a challenge because of the vast amounts of data and the
large variety of pre-processing and filtering steps employed before the actual
analysis is carried out. To guarantee a firm basis for methodological
development, where results from new methods are compared with previous results,
it is crucial that all analyses are completely reproducible for other
researchers. Here we give a detailed workflow for reproducible
analysis of the GeneChip Human Exon 1.0 ST Array at probe and probeset level
solely in R/Bioconductor, choosing packages based on their simplicity of use.
To exemplify the proposed workflow, we analyse differential splicing
and differential gene expression in a publicly available dataset using various
statistical methods. We believe this study will provide other researchers with
an easy way of accessing gene expression data at different annotation levels
and with the detail needed to develop their own tools for
reproducible analysis of the GeneChip Human Exon 1.0 ST Array.
Early life predictors of intelligence in young adulthood and middle age
BACKGROUND: Studies on early predictors of intelligence often focus on a single or a few predictors, and often on childhood intelligence. This study compared the contributions of a broad selection of potential early predictors of intelligence at different adult ages. METHODS: Information on predictors was recorded prospectively in the Copenhagen Perinatal Cohort during pregnancy, at delivery, and at 1- and 3-year examinations for children born between 1959 and 1961. Adult intelligence was assessed at three independent follow-ups using three different tests of intelligence: Børge Priens Prøve, Wechsler Adult Intelligence Scale, and Intelligenz-Struktur-Test 2000R. From a total of 4697 cohort members, three non-overlapping samples were derived. RESULTS: The included predictors explained between 22.2% and 24.3% of the variance in adult IQ, with parental socioeconomic status and sex explaining 16.2-17.0%. Other consistent predictors were head circumference at birth, increase in head circumference during the first three years, and 3-year milestones. Head circumference was the most important anthropometric measure compared to measures of weight and length. CONCLUSION: Besides social status and sex, the strongest and most consistent early predictors of adult intelligence were physical or behavioural characteristics that to some extent reflect brain and cognitive development.
Unaccounted uncertainty from qPCR efficiency estimates entails uncontrolled false positive rates
BACKGROUND: Accurate adjustment for the amplification efficiency (AE) is an important part of real-time quantitative polymerase chain reaction (qPCR) experiments. The most commonly used correction strategy is to estimate the AE by dilution experiments and use this as a plug-in when efficiency correcting the ΔΔC(q). Currently, it is recommended to determine the AE with high precision, as this plug-in approach does not account for the AE uncertainty and implicitly assumes an infinitely precise AE estimate. Determining the AE with such precision, however, requires tedious laboratory work and vast amounts of biological material. Violation of the assumption leads to overly optimistic standard errors of the ΔΔC(q), confidence intervals, and p-values, which ultimately increase the type I error rate beyond the expected significance level. As qPCR is often used for validation, it should be a high priority to account for the uncertainty of the AE estimate and thereby properly bound the type I error rate and achieve the desired significance level. RESULTS: We suggest and benchmark different methods to obtain the standard error of the efficiency adjusted ΔΔC(q) using the statistical delta method, Monte Carlo integration, or bootstrapping. Our suggested methods are founded in a linear mixed effects model (LMM) framework, but the problem and ideas apply in all qPCR experiments. The methods and impact of the AE uncertainty are illustrated in three qPCR applications and a simulation study. In addition, we validate findings suggesting that MGST1 is differentially expressed between high and low abundance culture initiating cells in multiple myeloma and that microRNA-127 is differentially expressed between testicular and nodal lymphomas. CONCLUSIONS: We conclude that the commonly used efficiency corrected quantities disregard the uncertainty of the AE, which can drastically impact the standard error and lead to increased false positive rates. Our suggestions show that it is possible to easily perform statistical inference of ΔΔC(q) whilst properly accounting for the AE uncertainty and better controlling the false positive rate.
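The gap between the plug-in approach and the delta method described in this abstract can be illustrated with a minimal sketch. It assumes a deliberately simplified setting that is not taken from the paper: a single efficiency shared by all genes, an adjusted quantity of the form log2(AE) × ΔΔC(q), and independent estimates of the two factors. The function names and the numbers are hypothetical, and the authors' LMM-based estimators are not reproduced here.

```python
import math


def plug_in_se(se_ddcq: float, log2_ae: float) -> float:
    """SE of log2(AE) * ddCq when AE is treated as known exactly (plug-in)."""
    return abs(log2_ae) * se_ddcq


def delta_method_se(ddcq: float, se_ddcq: float,
                    log2_ae: float, se_log2_ae: float) -> float:
    """First-order delta-method SE of the product log2(AE) * ddCq.

    Assumption (not from the source): the two estimates are independent,
    so Var(xy) is approximated by y^2 Var(x) + x^2 Var(y).
    """
    variance = (log2_ae ** 2) * se_ddcq ** 2 + (ddcq ** 2) * se_log2_ae ** 2
    return math.sqrt(variance)


# Hypothetical numbers, for illustration only.
ddcq, se_ddcq = 2.0, 0.3
log2_ae, se_log2_ae = 0.95, 0.05

naive = plug_in_se(se_ddcq, log2_ae)
adjusted = delta_method_se(ddcq, se_ddcq, log2_ae, se_log2_ae)
print(f"plug-in SE: {naive:.4f}, delta-method SE: {adjusted:.4f}")
```

Under these assumptions the delta-method standard error is never smaller than the plug-in one, which is the mechanism behind the overly optimistic p-values the abstract warns about.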
Quantification of reproducibility of microarray experiments by semi-parametric mixture models applied to the detection of differentially expressed genes in B-cell subpopulations
High CXCR4 expression impairs rituximab response and the prognosis of R-CHOP-treated diffuse large B-cell lymphoma patients
Table 3: The GO terms, GO molecular function, GO biological process, GO cellular component of the 10 significant colorectal cancer genes.
Background: Colorectal cancer (CRC) is one of the leading cancers worldwide. Several studies have performed microarray data analyses for cancer classification and prognostic analyses. Microarray assays also enable the identification of gene signatures for molecular characterization and treatment prediction. Objective: Microarray gene expression data from the online Gene Expression Omnibus (GEO) database were used to distinguish colorectal cancer from normal colon tissue samples. Methods: We collected microarray data from the GEO database to establish colorectal cancer microarray gene expression datasets for a combined analysis. Using the Prediction Analysis for Microarrays (PAM) method and the GSEA MSigDB resource, we analyzed the 14,698 genes that were identified through an examination of their expression values between normal and tumor tissues. Results: Ten genes (ABCG2, AQP8, SPIB, CA7, CLDN8, SCNN1B, SLC30A10, CD177, PADI2, and TGFBI) were found to be good indicators of the candidate genes that correlate with CRC. From these selected genes, an average of six significant genes were obtained using the PAM method, with an accuracy rate of 95%. The results demonstrate the potential of utilizing a model with the PAM method for data mining. After a detailed review of the published reports, the results confirmed that the screened candidate genes are good indicators for cancer risk analysis using the PAM method. Conclusions: Six genes were selected with 95% accuracy to effectively classify normal and colorectal cancer tissues. We hope that these results will provide the basis for new research projects in clinical practice that aim to rapidly assess colorectal cancer risk using microarray gene expression analysis.
Machine learning and data mining frameworks for predicting drug response in cancer: An overview and a novel in silico screening process based on association rule mining
A major challenge in cancer treatment is predicting the clinical response to anti-cancer drugs on a personalized basis. The success of such a task largely depends on the ability to develop computational resources that integrate big "omic" data into effective drug-response models. Machine learning is both an expanding and an evolving computational field that holds promise to cover such needs. Here we provide a focused overview of: 1) the various supervised and unsupervised algorithms used specifically in drug response prediction applications, 2) the strategies employed to develop these algorithms into applicable models, 3) data resources that are fed into these frameworks and 4) pitfalls and challenges to maximize model performance. In this context we also describe a novel in silico screening process, based on Association Rule Mining, for identifying genes as candidate drivers of drug response and compare it with relevant data mining frameworks, for which we generated a web application freely available at: https://compbio.nyumc.org/drugs/. This pipeline explores large sample-spaces with high efficiency, while being able to detect low frequency events and evaluate statistical significance even in the multidimensional space, presenting the results in the form of easily interpretable rules. We conclude with future prospects and challenges of applying machine learning based drug response prediction in precision medicine.
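As a rough illustration of what an association rule of the kind mentioned in this abstract looks like, the sketch below scores one rule with the standard support, confidence, and lift measures. The feature labels (`GENE_A_mut`, `drug_X_sensitive`), the toy samples, and the function name are hypothetical, and this is not the authors' pipeline or web application.

```python
from typing import FrozenSet, List, Tuple


def rule_metrics(samples: List[FrozenSet[str]],
                 antecedent: FrozenSet[str],
                 consequent: FrozenSet[str]) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(samples)
    n_a = sum(antecedent <= s for s in samples)    # samples containing the antecedent
    n_c = sum(consequent <= s for s in samples)    # samples containing the consequent
    n_ac = sum((antecedent | consequent) <= s for s in samples)  # containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical toy data: each sample is the set of features observed in it.
samples = [
    frozenset({"GENE_A_mut", "drug_X_sensitive"}),
    frozenset({"GENE_A_mut", "drug_X_sensitive"}),
    frozenset({"GENE_A_mut"}),
    frozenset({"drug_X_sensitive"}),
    frozenset(),
]

support, confidence, lift = rule_metrics(
    samples, frozenset({"GENE_A_mut"}), frozenset({"drug_X_sensitive"})
)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than expected if they were independent; judging whether that excess is statistically significant is a separate step not shown here.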
“Did you ever drink more?” A detailed description of pregnant women’s drinking patterns
Statistical Methods for Tracing the Molecular Origin of Treatment Resistance in Diffuse Large B-Cell Lymphoma
