Supervised normalization of microarrays
Motivation: A major challenge in utilizing microarray technologies to measure nucleic acid abundances is ‘normalization’, the goal of which is to separate biologically meaningful signal from other confounding sources of signal, often due to unavoidable technical factors. It is intuitively clear that true biological signal and confounding factors need to be simultaneously considered when performing normalization. However, the most popular normalization approaches do not utilize what is known about the study, both in terms of the biological variables of interest and the known technical factors in the study, such as batch or array processing date
Redefinition of Affymetrix probe sets by sequence overlap with cDNA microarray probes reduces cross-platform inconsistencies in cancer-associated gene expression measurements
BACKGROUND: Comparison of data produced on different microarray platforms often shows surprising discordance. It is not clear whether this discrepancy is caused by noisy data or by improper probe matching between platforms. We investigated whether the significant level of inconsistency between results produced by alternative gene expression microarray platforms could be reduced by stringent sequence matching of microarray probes. We mapped the short oligo probes of the Affymetrix platform onto cDNA clones of the Stanford microarray platform. Affymetrix probes were reassigned to redefined probe sets if they mapped to the same cDNA clone sequence, regardless of the original manufacturer-defined grouping. The NCI-60 gene expression profiles produced by Affymetrix HuFL platform were recalculated using these redefined probe sets and compared to previously published cDNA measurements of the same panel of RNA samples. RESULTS: The redefined probe sets displayed a substantially higher level of cross-platform consistency at the level of gene correlation, cell line correlation and unsupervised hierarchical clustering. The same strategy allowed an almost complete correspondence of breast cancer subtype classification between Affymetrix gene chip and cDNA microarray derived gene expression data, and gave an increased level of similarity between normal lung derived gene expression profiles using the two technologies. In total, two Affymetrix gene-chip platforms were remapped to three cDNA platforms in the various cross-platform analyses, resulting in improved concordance in each case. CONCLUSION: We have shown that probes which target overlapping transcript sequence regions on cDNA microarrays and Affymetrix gene-chips exhibit a greater level of concordance than the corresponding Unigene or sequence matched features. This method will be useful for the integrated analysis of gene expression data generated by multiple disparate measurement platforms
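The regrouping step described in this abstract (reassigning probes to a new probe set whenever they map to the same cDNA clone, regardless of the manufacturer-defined grouping) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the probe-to-clone sequence matches have already been computed, and the median summary, the function names, and the probe and clone identifiers are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def redefine_probe_sets(probe_to_clones):
    """Group probes by the cDNA clone they map to.

    probe_to_clones: dict mapping a probe ID to the list of clone IDs whose
    sequence it matched (assumed to come from a prior alignment step).
    Probes that matched no clone are dropped.
    """
    probe_sets = defaultdict(list)
    for probe, clones in probe_to_clones.items():
        for clone in clones:
            probe_sets[clone].append(probe)
    return {clone: sorted(probes) for clone, probes in probe_sets.items()}


def summarize(probe_sets, intensities):
    """Summarize each redefined probe set by the median probe intensity.

    The median is an assumption made for this sketch; the abstract does not
    say how probe-level values were combined.
    """
    return {
        clone: median(intensities[p] for p in probes)
        for clone, probes in probe_sets.items()
    }


if __name__ == "__main__":
    # Hypothetical identifiers and values, for illustration only.
    mapping = {"p1": ["cloneA"], "p2": ["cloneA"], "p3": ["cloneB"], "p4": []}
    sets_by_clone = redefine_probe_sets(mapping)
    print(sets_by_clone)
    print(summarize(sets_by_clone, {"p1": 10, "p2": 20, "p3": 5}))
```

Keying the redefined sets by clone ID means each set lines up directly with one cDNA feature, which is what makes the per-gene cross-platform correlation straightforward to compute afterwards.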
Genetic background influences murine prostate gene expression: implications for cancer phenotypes
Microarray analyses to quantitate transcript levels in the prostates of five inbred mouse strains identified differences in gene expression in benign epithelium that correlated with the differentiation state of adjacent tumors
Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling
Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease and different breast cancer subtypes are treated differently. Understanding the difference in prognosis for breast cancer based on its molecular and phenotypic features is one avenue for improving treatment by matching the proper treatment with molecular subtypes of the disease. In this work, we employed a competition-based approach to modeling breast cancer prognosis using large datasets containing genomic and clinical information and an online real-time leaderboard program used to speed feedback to the modeling team and to encourage each modeler to work towards achieving a higher ranked submission. We find that machine learning methods combined with molecular features selected based on expert prior knowledge can improve survival predictions compared to current best-in-class methodologies and that ensemble models trained across multiple user submissions systematically outperform individual models within the ensemble. We also find that model scores are highly consistent across multiple independent evaluations. This study serves as the pilot phase of a much larger competition open to the whole research community, with the goal of understanding general strategies for model optimization using clinical and molecular profiling data and providing an objective, transparent system for assessing prognostic models
Expression Profiles of the Mouse Lung Identify a Molecular Signature of Time-to-Birth
A greater understanding of the regulatory processes contributing to lung development could help ameliorate morbidity and mortality in premature infants and identify individuals at risk for congenital and/or chronic lung diseases. Genomics technologies have provided rich gene expression datasets for the developing lung that enable systems biology approaches for identifying large-scale molecular signatures within this complex phenomenon. Here, we applied unsupervised principal component analysis on two developing lung datasets and identified common dominant transcriptomic signatures. Of particular interest, we identify an overlying biological program we term “time-to-birth,” which describes the distance in age from the day of birth. We identify groups of genes contributing to the time-to-birth molecular signature. Statistically overrepresented are genes involved in oxygen and gas transport activity, as expected for a transition to air breathing, as well as host defense function. In addition, we identify genes with expression patterns associated with the initiation of alveolar formation. Finally, we present validation of gene expression patterns across the two datasets, and independent validation of select genes by qPCR and immunohistochemistry. These data contribute to our understanding of genetic components contributing to large-scale biological processes and may be useful, particularly in animal models of abnormal lung development, to predict the state of organ development or preparation for birth
Individual Matrix Metalloproteinases Control Distinct Transcriptional Responses in Airway Epithelial Cells Infected with Pseudomonas aeruginosa
Airway epithelium is the initial point of host-pathogen interaction in Pseudomonas aeruginosa infection, an important pathogen in cystic fibrosis and nosocomial pneumonia. We used global gene expression analysis to determine airway epithelial transcriptional responses dependent on matrilysin (matrix metalloproteinase 7 [MMP-7]) and stromelysin-2 (MMP-10), two MMPs induced by acute P. aeruginosa pulmonary infection. Extraction of differential gene expression (EDGE) analysis of gene expression changes in P. aeruginosa-infected organotypic tracheal epithelial cell cultures from wild-type, Mmp7−/−, and Mmp10−/− mice identified 2,091 matrilysin-dependent and 1,628 stromelysin-2-dependent genes that were differentially expressed. Key node network analysis showed that these MMPs controlled distinct gene expression programs involved in proliferation, cell death, immune responses, and signal transduction, among other host defense processes. Our results demonstrate discrete roles for these MMPs in regulating epithelial responses to Pseudomonas infection and show that a global genomics strategy can be used to assess MMP function
Analysis of strain-dependent differences in prostate gene expression by qRT-PCR
RNAs from preparations used in the microarray analysis or microdissected epithelium were reverse transcribed and amplified using qRT-PCR with primers specific for (), (), () and (). Ribosomal protein S16 expression levels were used to normalize qRT-PCR data. Normalized results are expressed relative to the lowest expressing value. Error bars indicate the standard deviation of four independent biological replicates. qRT-PCR for microdissected epithelium is represented by one sample per strain for each gene. White bars denote measurements from the microarray analysis. Black bars denote measurements generated by qRT-PCR from whole prostate. Diagonal lines denote measurements generated by qRT-PCR from microdissected prostate epithelium. Copyright information: Taken from "Genetic background influences murine prostate gene expression: implications for cancer phenotypes". http://genomebiology.com/2007/8/6/R117. Genome Biology 2007;8(6):R117. Published online 18 Jun 2007. PMCID: PMC2394769
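The normalization described in the legend above (dividing by the S16 reference, then expressing each strain relative to the lowest value, with the standard deviation across replicates as the error bar) can be sketched as follows. This is a minimal sketch under stated assumptions: the inputs are linear-scale quantities rather than raw Ct values, "lowest expressing value" is taken to mean the lowest strain mean, and the function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def normalize_to_reference(target, reference):
    """Divide each target-gene measurement by the matching reference-gene
    (here S16) measurement from the same replicate."""
    if len(target) != len(reference):
        raise ValueError("target and reference need one value per replicate")
    return [t / r for t, r in zip(target, reference)]


def relative_to_lowest(normalized_by_strain):
    """Scale per-strain results so the lowest strain mean equals 1.

    normalized_by_strain: dict mapping strain name to a list of normalized
    replicate values (at least two replicates per strain).
    Returns a dict mapping strain name to (relative mean, relative SD).
    """
    means = {s: mean(v) for s, v in normalized_by_strain.items()}
    lowest = min(means.values())  # assumption: lowest strain mean is the baseline
    return {
        s: (means[s] / lowest, stdev(v) / lowest)
        for s, v in normalized_by_strain.items()
    }


if __name__ == "__main__":
    # Hypothetical values: four biological replicates for each of two strains.
    s16 = [2.0, 2.0, 2.0, 2.0]
    normalized = {
        "strainA": normalize_to_reference([2.0, 4.0, 2.0, 4.0], s16),
        "strainB": normalize_to_reference([6.0, 6.0, 6.0, 6.0], s16),
    }
    for strain, (rel_mean, rel_sd) in relative_to_lowest(normalized).items():
        print(strain, round(rel_mean, 3), round(rel_sd, 3))
```

Scaling the standard deviation by the same baseline keeps the error bars in the same units as the plotted relative means.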