134 research outputs found

    Microarray Data from a Statistician’s Point of View

    Changes Across 25 Years of Statistics in Medicine

    This piece is a series of interviews with giants in the field of medicine on their views of how statistics is changing medicine. I interviewed the editor of the New England Journal of Medicine, a preeminent doctor/researcher of lung cancer, the director of the LA County Department of Public Health, and a Harvard statistician who sits on the editorial board of the New England Journal of Medicine.

    Yeast through the Ages: a Statistical Analysis of Genetic Changes in Aging Yeast

    Microarray technology allows the expression levels of thousands of genes in a cell to be measured simultaneously. The technology holds great potential in biology and medicine, as the analysis of data obtained from microarray experiments gives insight into the roles of specific genes and the associated changes across experimental conditions (e.g., aging, mutation, radiation therapy, drug dosage). The application of statistical tools to microarray data can help make sense of the experiment and thereby advance genetic, biological, and medical research. Likewise, microarrays provide an exciting means through which to explore statistical techniques.

    Analyzing DNA Microarrays with Undergraduate Statisticians

    With advances in technology, biologists have been saddled with high-dimensional data that need modern statistical methodology for analysis. DNA microarrays can simultaneously measure thousands of genes (and the activity of those genes) in a single sample. Biologists use microarrays to trace connections between pathways or to identify all genes that respond to a signal. The statistical tools we usually teach our undergraduates are inadequate for analyzing thousands of measurements on tens of samples. The project materials include readings on microarrays as well as computer lab activities. The topics covered include image analysis, filtering and normalization techniques, and statistical methods. The course materials are designed for someone with little or no statistical background, but because the concepts covered are novel, they could easily be adjusted to accommodate students with practically any background.

    Evaluation of Multiple Models to Distinguish Closely Related Forms of Disease Using DNA Microarray Data: an Application to Multiple Myeloma

    Motivation: Standard laboratory classification of the plasma cell dyscrasia monoclonal gammopathy of undetermined significance (MGUS) and the overt plasma cell neoplasm multiple myeloma (MM) is quite accurate, yet, for the most part, biologically uninformative. Most, if not all, cancers are caused by inherited or acquired genetic mutations that manifest themselves in altered gene expression patterns in the clonally related cancer cells. Microarray technology allows for qualitative and quantitative measurements of the expression levels of thousands of genes simultaneously, and it has now been used both to classify cancers that are morphologically indistinguishable and to predict response to therapy. It is anticipated that this information can also be used to develop molecular diagnostic models and to provide insight into mechanisms of disease progression, e.g., transition from healthy to benign hyperplasia or conversion of a benign hyperplasia to overt malignancy. However, standard data analysis techniques are not trivial to employ on these large data sets. Methodology designed to handle large data sets (or modified to do so) is needed to access the vital information contained in the genetic samples, which in turn can be used to develop more robust and accurate methods of clinical diagnostics and prognostics.
    Results: Here we report on the application of a panel of statistical and data mining methodologies to classify groups of samples based on expression of 12,000 genes derived from a high density oligonucleotide microarray analysis of highly purified plasma cells from newly diagnosed MM, MGUS, and normal healthy donors. The three groups of samples are each tested against each other. The methods are found to be similar in their ability to predict group membership; all do quite well at predicting MM vs. normal and MGUS vs. normal. However, no method appears able to distinguish explicitly between the genetic mechanisms of MM and MGUS. We believe this might be due to the lack of genetic differences between these two conditions, and may not be due to the failure of the models. We report the prediction errors for each of the models and each of the methods. Additionally, we report ROC curves for the results on group prediction.
    Availability: Logistic regression: standard software, available, for example, in SAS. Decision trees and boosted trees: C5.0 from www.rulequest.com. SVM: SVM-light is publicly available from svmlight.joachims.org. Naïve Bayes and ensemble of voters are publicly available from www.biostat.wisc.edu/~mwaddell/eov.html. Nearest Shrunken Centroids is publicly available from http://www-stat.stanford.edu/~tibs/PAM

    Geologic Map of the Ganiki Planitia Quadrangle (V–14), Venus

    Our current research focuses on addressing four specific questions. Has the dominant style of volcanic expression within the quadrangle varied in a systematic fashion over time? Does the tectonic deformation within the quadrangle record significant regional patterns that vary spatially or temporally, and if so, what are the scales, orientations, and sources of the stress fields driving this deformation? If mantle upwelling and downwelling have played a significant role in the formation of Atla Regio and Atalanta Planitia, as has been proposed, does the geology of Ganiki Planitia record evidence of northwest-directed lateral mantle flow connecting the two sites? Finally, can integration of the tectonic and volcanic histories preserved within the quadrangle help constrain competing resurfacing models for Venus?

    A robust measure of correlation between two genes on a microarray

    Background: The underlying goal of microarray experiments is to identify gene expression patterns across different experimental conditions. Genes that are contained in a particular pathway or that respond similarly to experimental conditions could be co-expressed and show similar patterns of expression on a microarray. Using any of a variety of clustering methods or gene network analyses we can partition genes of interest into groups, clusters, or modules based on measures of similarity. Typically, Pearson correlation is used to measure distance (or similarity) before implementing a clustering algorithm. Pearson correlation is quite susceptible to outliers, however, which is an unfortunate characteristic when dealing with microarray data (well known to be typically quite noisy).
    Results: We propose a resistant similarity metric based on Tukey's biweight estimate of multivariate scale and location. The resistant metric is simply the correlation obtained from a resistant covariance matrix of scale. We give results which demonstrate that our correlation metric is much more resistant than the Pearson correlation while being more efficient than other nonparametric measures of correlation (e.g., Spearman correlation). Additionally, our method gives a systematic gene flagging procedure which is useful when dealing with large amounts of noisy data.
    Conclusion: When dealing with microarray data, which are known to be quite noisy, robust methods should be used. Specifically, robust distances, including the biweight correlation, should be used in clustering and gene network analysis.
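
The idea of a biweight-based resistant correlation can be illustrated with a short sketch. Note that this is not the multivariate Tukey biweight estimator described in the abstract: it is the simpler median/MAD-based biweight midcorrelation, swapped in here because it fits in a few lines. The function name `biweight_midcorrelation` and the tuning constant `c=9.0` are assumptions made for illustration only.

```python
import numpy as np

def biweight_midcorrelation(x, y, c=9.0):
    """Resistant correlation of two expression profiles.

    Illustrative stand-in (hypothetical helper), not the paper's
    multivariate estimator. c is an assumed tuning constant.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def weighted_deviations(v):
        center = np.median(v)
        mad = np.median(np.abs(v - center))
        if mad == 0:
            raise ValueError("MAD is zero; biweight weights are undefined")
        # Scaled distance from the median; points with |u| >= 1 get weight 0.
        u = (v - center) / (c * mad)
        w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
        d = (v - center) * w
        # Normalize so the final sum of products lies in [-1, 1].
        return d / np.sqrt(np.sum(d**2))

    return float(np.sum(weighted_deviations(x) * weighted_deviations(y)))

# Two perfectly co-expressed profiles, then one corrupted measurement.
gene_a = np.arange(20, dtype=float)
gene_b = 2.0 * gene_a + 1.0
gene_b_noisy = gene_b.copy()
gene_b_noisy[-1] = -500.0  # a single gross outlier

pearson_noisy = float(np.corrcoef(gene_a, gene_b_noisy)[0, 1])
bicor_clean = biweight_midcorrelation(gene_a, gene_b)
bicor_noisy = biweight_midcorrelation(gene_a, gene_b_noisy)
```

In this toy case the single outlier turns the Pearson correlation negative, while the biweight version gives the outlying point zero weight and stays strongly positive. Points that receive zero weight are also natural candidates for flagging, which is in the spirit of the flagging procedure the abstract mentions.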