
    A first principles approach to differential expression in microarray data analysis

    Background: The disparate results from the methods commonly used to determine differential expression in Affymetrix microarray experiments may well result from the wide variety of probe set and probe level models employed. Here we take the approach of making the fewest assumptions about the structure of the microarray data. Specifically, we only require that, under the null hypothesis that a gene is not differentially expressed for specified conditions, for any probe position in the gene's probe set: a) the probe amplitudes are independent and identically distributed over the conditions, and b) the distributions of the replicated probe amplitudes are amenable to classical analysis of variance (ANOVA). Log-amplitudes that have been standardized within-chip meet these conditions well enough for our approach, which is to perform ANOVA across conditions for each probe position and then take the median of the resulting (1 - p) values as a gene-level measure of differential expression. Results: We applied the technique to the HG-U133A, HG-U95A, and "Golden Spike" spike-in data sets. The resulting receiver operating characteristic (ROC) curves compared favorably with other published results. The procedure is quite sensitive, so much so that it has revealed probe sets that might properly be called "unanticipated positives" rather than "false positives", because plots of these probe sets strongly suggest that they are differentially expressed. Conclusion: The median ANOVA (1 - p) approach presented here is a very simple methodology that does not depend on any specific probe set or probe level models, and requires no pre-processing other than within-chip standardization of probe level log-amplitudes. Its performance is comparable to other published methods on the standard spike-in data sets, and it has revealed new categories of probe sets, "unanticipated positives" and "unanticipated negatives", that need to be taken into account when using spike-in data sets as "truthed" test beds.
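    The per-gene procedure can be sketched as follows (a minimal illustration, not the authors' code; `anova_f` and `gene_score` are hypothetical names, and the median per-probe F statistic stands in for the median (1 - p), to which it is monotonically related when every probe shares the same degrees of freedom):

```python
from statistics import mean, median

def anova_f(groups):
    """One-way ANOVA F statistic across condition groups of replicated,
    standardized log-amplitudes for a single probe position."""
    k = len(groups)                                 # number of conditions
    n = sum(len(g) for g in groups)                 # total replicates
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def gene_score(probe_sets):
    """Gene-level score: the median over probe positions of the per-probe
    F statistic. With equal group sizes all probes share the same degrees
    of freedom, so ranking genes by median F is equivalent to ranking by
    the paper's median (1 - p)."""
    return median(anova_f(probe) for probe in probe_sets)
```

    A larger median score indicates stronger evidence of differential expression across the conditions.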

    Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer

    Background: Tissue microarrays (TMAs) are increasingly used to generate data on the molecular phenotype of tumours in clinical epidemiology studies, such as studies of disease prognosis. However, TMA data are particularly prone to missingness. A variety of methods to deal with missing data are available, but the validity of the various approaches depends on the structure of the missing data, and there are few empirical studies dealing with missing data from molecular pathology. The purpose of this study was to investigate the results of four commonly used approaches to handling missing data in a large, multi-centre study of the molecular pathological determinants of prognosis in breast cancer. Patients and Methods: We pooled data from over 11 000 cases of invasive breast cancer from five studies that collected information on seven prognostic indicators together with survival time data. We compared the results of a multi-variate Cox regression using four approaches to handling missing data: complete case analysis (CCA), mean substitution (MS), and multiple imputation without (MI) and with (MI+) inclusion of the outcome. We also performed an analysis in which missing data were simulated under different assumptions and the results of the four methods were compared. Results: Over half the cases had missing data on at least one of the seven variables, and 11% had missing data on four or more. The multi-variate hazard ratio estimates based on the multiple imputation models were very similar to those derived after using MS, with similar standard errors. Hazard ratio estimates based on CCA were only slightly different, but less precise, as the standard errors were large. However, in data simulated to be missing completely at random (MCAR) or missing at random (MAR), estimates for MI were least biased and most accurate, whereas estimates for CCA were most biased and least accurate. Conclusion: In this study, empirical results from analyses using CCA, MS, MI and MI+ were similar, although results from CCA were less precise. The simulations suggest that in general MI is likely to be the best approach. Given the ease of implementing MI in standard statistical software, the results of MI and CCA should be compared in any multi-variate analysis where missing data are a problem. © 2011 Cancer Research UK. All rights reserved.
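    Two of the simpler approaches compared above are easy to see on toy data (a hypothetical sketch; the study's imputation models are not reproduced here):

```python
# Toy records (hypothetical data): (marker value, event indicator);
# None marks a missing marker measurement.
data = [(2.0, 1), (None, 0), (4.0, 1), (None, 1), (6.0, 0)]

# Complete case analysis (CCA): drop every case with a missing value.
cca = [(x, y) for x, y in data if x is not None]

# Mean substitution (MS): replace missing values with the observed mean.
observed = [x for x, _ in data if x is not None]
xbar = sum(observed) / len(observed)
ms = [(x if x is not None else xbar, y) for x, y in data]
```

    CCA shrinks the sample (inflating standard errors), while MS keeps every case but understates the variability of the substituted values; multiple imputation addresses the latter by drawing several plausible values and pooling the results.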

    Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines

    Background: Multiple imputation (MI) provides an effective approach to handling missing covariate data in prognostic modelling studies, as it can properly account for missing-data uncertainty. The multiply imputed datasets are each analysed using standard prognostic modelling techniques to obtain the estimates of interest. The estimates from each imputed dataset are then combined into one overall estimate and variance, incorporating both the within- and between-imputation variability. Rubin's rules for combining these multiply imputed estimates are based on asymptotic theory. The resulting combined estimates are more accurate when the posterior distribution of the population parameter of interest is well approximated by the normal distribution. However, the normality assumption may not be appropriate for all parameters of interest in prognostic modelling studies, such as predicted survival probabilities and model performance measures. Methods: Guidelines for combining the estimates of interest when analysing prognostic modelling studies are provided. A literature review was performed to identify current practice for combining such estimates in prognostic modelling studies. Results: Methods for combining all reported estimates after MI were not well reported in the current literature. Where any method was stated, Rubin's rules without any transformation were the standard approach. Conclusion: The proposed simple guidelines for combining estimates after MI may lead to a wider and more appropriate use of MI in future prognostic modelling studies.
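    Rubin's rules themselves are compact. A minimal sketch (function name hypothetical), pooling m per-imputation point estimates with their squared standard errors:

```python
from statistics import mean

def rubin_pool(estimates, variances):
    """Combine per-imputation estimates via Rubin's rules.

    estimates -- point estimate from each imputed dataset
    variances -- squared standard error from each imputed dataset
    Returns (pooled estimate, total variance)."""
    m = len(estimates)
    qbar = mean(estimates)                                  # pooled estimate
    w = mean(variances)                                     # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    total = w + (1 + 1 / m) * b                             # Rubin's total variance
    return qbar, total
```

    For parameters where normality is doubtful, such as a survival probability near 0 or 1, the abstract's point is that pooling on a suitable transformed scale before back-transforming may be more appropriate than applying these rules directly.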

    Quantum states made to measure

    Recent progress in manipulating quantum states of light and matter brings quantum-enhanced measurements closer to prospective applications. The current challenge is to make quantum metrological strategies robust against imperfections. Comment: 4 pages, 3 figures; Commentary for Nature Photonics.

    Imputation of continuous variables missing at random using the method of simulated scores

    For multivariate datasets with missing values, we present a procedure of statistical inference and state its "optimal" properties. Two main assumptions are needed: (1) data are missing at random (MAR); (2) the data generating process is a multivariate normal linear regression. Disentangling the problem of convergence of the iterative estimation/imputation procedure, we show that the estimator is a "method of simulated scores" (a particular case of McFadden's "method of simulated moments"); thus the estimator is equivalent to maximum likelihood if the number of replications is sufficiently large, and the whole procedure can be considered an optimal parametric technique for the imputation of missing data.
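    A stripped-down sketch of the iterative estimation/imputation cycle for a single normally distributed variable with one fully observed covariate (illustrative only; names and data are hypothetical, and the actual procedure is multivariate):

```python
import random

def iterative_impute(xs, zs, n_iter=20, seed=0):
    """Iterate estimation and simulation-based imputation under a normal
    linear model x = a + b*z + e. Each pass (1) re-fits the regression by
    OLS on the current completed data, then (2) redraws the missing x
    values from the fitted conditional normal (prediction + noise)."""
    rng = random.Random(seed)
    missing = [i for i, x in enumerate(xs) if x is None]
    filled = [x if x is not None else 0.0 for x in xs]  # crude start
    n = len(filled)
    for _ in range(n_iter):
        zbar = sum(zs) / n
        xbar = sum(filled) / n
        b = (sum((z - zbar) * (x - xbar) for z, x in zip(zs, filled))
             / sum((z - zbar) ** 2 for z in zs))        # OLS slope
        a = xbar - b * zbar                             # OLS intercept
        sigma2 = sum((x - a - b * z) ** 2
                     for z, x in zip(zs, filled)) / (n - 2)
        for i in missing:                               # simulate, don't just predict
            filled[i] = a + b * zs[i] + rng.gauss(0.0, sigma2 ** 0.5)
    return filled, a, b
```

    Drawing from the fitted conditional distribution, rather than plugging in the point prediction, is what preserves the residual variability that deterministic imputation would destroy.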

    Developing a multidisciplinary syndromic surveillance academic research programme in the United Kingdom: benefits for public health surveillance

    Syndromic surveillance is growing in stature internationally as a recognised and innovative approach to public health surveillance. Syndromic surveillance research uses data captured by syndromic surveillance systems to investigate specific hypotheses or questions. However, this research is often undertaken either within established public health organisations or the academic setting, but rarely in both together. Public health organisations can provide access to health-related data and expertise in infectious and non-infectious disease epidemiology and clinical interpretation of data. Academic institutions can optimise methodological rigour and intellectual clarity, and establish routes for applying to external research funding bodies to attract money to fund projects. Together, these competencies can complement each other to enhance the public health benefits of syndromic surveillance research. This paper describes the development of a multidisciplinary syndromic surveillance academic research programme in England, United Kingdom, and its aims, goals and benefits to public health.

    Subaru FOCAS Spectroscopic Observations of High-Redshift Supernovae

    We present spectra of high-redshift supernovae (SNe) taken with FOCAS, the Subaru low-resolution optical spectrograph. These SNe were found in SN surveys with Suprime-Cam on Subaru, the CFH12k camera on the Canada-France-Hawaii Telescope (CFHT), and the Advanced Camera for Surveys (ACS) on the Hubble Space Telescope (HST). These surveys specifically targeted z > 1 Type Ia supernovae (SNe Ia). From the spectra of 39 candidates, we obtain redshifts for 32 and spectroscopically identify 7 of the candidates as probable SNe Ia, including one at z = 1.35, the most distant SN Ia to be spectroscopically confirmed with a ground-based telescope. An additional 4 candidates are identified as likely SNe Ia from the spectrophotometric properties of their host galaxies. Seven candidates are not SNe Ia, being either SNe of another type or active galactic nuclei. When SNe Ia are observed within a week of maximum light, we find that we can spectroscopically identify most of them up to z = 1.1. Beyond this redshift, very few candidates were spectroscopically identified as SNe Ia. The current generation of super red-sensitive, fringe-free CCDs will push this redshift limit higher. Comment: 19 pages, 26 figures; PASJ in press. See http://www.supernova.lbl.gov/2009ClusterSurvey/ for additional information pertaining to the HST Cluster SN Survey.

    Identification and Dynamics of a Heparin-Binding Site in Hepatocyte Growth Factor †

    Hepatocyte growth factor (HGF) is a heparin-binding, multipotent growth factor that transduces a wide range of biological signals, including mitogenesis, motogenesis, and morphogenesis. Heparin or closely related heparan sulfate has profound effects on HGF signaling. A heparin-binding site in the N-terminal (N) domain of HGF was proposed on the basis of the clustering of surface positive charges [Zhou, H., Mazzulla, M. J., Kaufman, J. D., Stahl, S. J., Wingfield, P. T., Rubin, J. S., Bottaro, D. P., and Byrd, R. A. (1998) Structure 6, 109-116]. In the present study, we confirmed this binding site in a heparin titration experiment monitored by nuclear magnetic resonance spectroscopy, and we estimated the apparent dissociation constant (Kd) of the heparin-protein complex by NMR and fluorescence techniques. The primary heparin-binding site is composed of Lys60, Lys62, and Arg73, with additional contributions from the adjacent Arg76, Lys78, and N-terminal basic residues. The Kd of binding is in the micromolar range. A heparin disaccharide analogue, sucrose octasulfate, binds with similar affinity to the N domain and to a naturally occurring HGF isoform, NK1, at nearly the same region as in heparin binding. 15N relaxation data indicate structural flexibility on a microsecond-to-millisecond time scale around the primary binding site in the N domain. This flexibility appears to be dramatically reduced by ligand binding. On the basis of the NK1 crystal structure, we propose a model in which heparin binds to the two primary binding sites and the N-terminal regions of the N domains and stabilizes an NK1 dimer.
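    For intuition about how a micromolar-range Kd is read off a titration, a 1:1 binding isotherm and a least-squares fit can be sketched as follows (hypothetical titration values, not the paper's NMR or fluorescence data):

```python
def fraction_bound(ligand, kd):
    """1:1 binding isotherm: fraction of protein bound at free ligand
    concentration `ligand` (same units as kd, e.g. micromolar)."""
    return ligand / (kd + ligand)

def fit_kd(ligands, signals, grid):
    """Least-squares grid search for the apparent Kd, assuming the
    normalized titration signal tracks the fraction bound."""
    def sse(kd):
        return sum((s - fraction_bound(L, kd)) ** 2
                   for L, s in zip(ligands, signals))
    return min(grid, key=sse)
```

    At ligand concentration equal to Kd the protein is half saturated, which is why Kd sets the concentration scale over which binding turns on.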