A first principles approach to differential expression in microarray data analysis
Background: The disparate results from the methods commonly used to determine differential expression in Affymetrix microarray experiments may well result from the wide variety of probe-set and probe-level models employed. Here we take the approach of making the fewest assumptions about the structure of the microarray data. Specifically, we only require that, under the null hypothesis that a gene is not differentially expressed for the specified conditions, for any probe position in the gene's probe set: (a) the probe amplitudes are independent and identically distributed over the conditions, and (b) the distributions of the replicated probe amplitudes are amenable to classical analysis of variance (ANOVA). Log-amplitudes that have been standardized within-chip meet these conditions well enough for our approach, which is to perform an ANOVA across conditions at each probe position and then take the median of the resulting (1 - p) values as a gene-level measure of differential expression.
Results: We applied the technique to the HG-U133A, HG-U95A and "Golden Spike" spike-in data sets. The resulting receiver operating characteristic (ROC) curves compared favorably with other published results. The procedure is quite sensitive, so much so that it has revealed probe sets that might properly be called "unanticipated positives" rather than "false positives", because plots of these probe sets strongly suggest that they are differentially expressed.
Conclusion: The median ANOVA (1 - p) approach presented here is a very simple methodology that does not depend on any specific probe-level or probe-set model, and requires no pre-processing other than within-chip standardization of probe-level log-amplitudes. Its performance is comparable to that of other published methods on the standard spike-in data sets, and it has revealed new categories of probe sets, "unanticipated positives" and "unanticipated negatives", that need to be taken into account when using spike-in data sets as "truthed" test beds.
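The gene-level score described in this abstract is simple enough to sketch directly. The following is an illustrative sketch, not the authors' code: it assumes the probe-level log-amplitudes are already standardized within-chip, uses SciPy's one-way ANOVA for the per-probe test, and the example data are synthetic.

```python
import numpy as np
from scipy.stats import f_oneway

def median_anova_score(conditions):
    """conditions: list of (replicates x probes) arrays of within-chip
    standardized log-amplitudes, one array per experimental condition.
    Returns the median over probe positions of (1 - p), where p is the
    one-way ANOVA p-value across conditions at that probe position."""
    n_probes = conditions[0].shape[1]
    one_minus_p = []
    for j in range(n_probes):
        groups = [cond[:, j] for cond in conditions]  # replicate amplitudes per condition
        _, p = f_oneway(*groups)
        one_minus_p.append(1.0 - p)
    return float(np.median(one_minus_p))

# Synthetic example: 11 probe positions, 6 replicates, 2 conditions.
rng = np.random.default_rng(0)
null_gene = [rng.normal(size=(6, 11)) for _ in range(2)]   # no shift between conditions
de_gene = [rng.normal(size=(6, 11)),
           rng.normal(loc=2.0, size=(6, 11))]              # consistent shift in condition 2

s_null = median_anova_score(null_gene)
s_de = median_anova_score(de_gene)
```

A gene shifted consistently across its probe positions scores near 1, while a null gene's score hovers around 0.5; the median makes the score robust to a few badly behaved probes.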
Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer
Background: Tissue micro-arrays (TMAs) are increasingly used to generate data on the molecular phenotype of tumours in clinical epidemiology studies, such as studies of disease prognosis. However, TMA data are particularly prone to missingness. A variety of methods to deal with missing data are available, but the validity of the various approaches depends on the structure of the missing data, and there are few empirical studies dealing with missing data from molecular pathology. The purpose of this study was to investigate the results of four commonly used approaches to handling missing data in a large, multi-centre study of the molecular pathological determinants of prognosis in breast cancer.
Patients and Methods: We pooled data from over 11 000 cases of invasive breast cancer from five studies that collected information on seven prognostic indicators together with survival time data. We compared the results of a multivariate Cox regression using four approaches to handling missing data: complete case analysis (CCA), mean substitution (MS), multiple imputation without inclusion of the outcome (MI) and multiple imputation with inclusion of the outcome (MI+). We also performed an analysis in which missing data were simulated under different assumptions and the results of the four methods were compared.
Results: Over half the cases had missing data on at least one of the seven variables, and 11% had missing data on four or more. The multivariate hazard ratio estimates based on the multiple imputation models were very similar to those derived after using MS, with similar standard errors. Hazard ratio estimates based on the CCA were only slightly different, but they were less precise, as the standard errors were large.
However, in data simulated to be missing completely at random (MCAR) or missing at random (MAR), estimates from multiple imputation were the least biased and most accurate, whereas estimates from CCA were the most biased and least accurate.
Conclusion: In this study, empirical results from analyses using CCA, MS, MI and MI+ were similar, although results from CCA were less precise. The simulation results suggest that, in general, multiple imputation is likely to be the best approach. Given the ease of implementing multiple imputation in standard statistical software, the results of multiple imputation and CCA should be compared in any multivariate analysis where missing data are a problem. © 2011 Cancer Research UK. All rights reserved
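A toy simulation makes the contrast between these approaches concrete. This is an illustrative sketch with made-up data, not the study's analysis: a marker y is made missing at random, with missingness depending on a fully observed correlated variable x, so the complete cases are a biased sample of y and mean substitution inherits that bias while also deflating the variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)                 # fully observed variable
y = x + rng.normal(size=n)             # marker of interest; true mean is 0
miss = rng.random(n) < 0.8 * (x > 0)   # y missing 80% of the time when x > 0 (MAR)

cca = y[~miss]                         # complete case analysis: drop cases with missing y
ms = np.where(miss, cca.mean(), y)     # mean substitution: fill in the observed mean

# Complete cases over-represent x <= 0, so cca.mean() sits well below the
# true mean of 0; mean substitution leaves that bias in place and shrinks
# the variance, because every imputed value is identical.
```

Multiple imputation would instead draw each missing y from its estimated conditional distribution given x, which is what lets it recover near-unbiased estimates in the MCAR/MAR simulations described above.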
Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines
Background: Multiple imputation (MI) provides an effective approach to handle missing covariate
data within prognostic modelling studies, as it can properly account for the missing data
uncertainty. The multiply imputed datasets are each analysed using standard prognostic modelling
techniques to obtain the estimates of interest. The estimates from each imputed dataset are then
combined into one overall estimate and variance, incorporating both the within and between
imputation variability. Rubin's rules for combining these multiply imputed estimates are based on
asymptotic theory. The resulting combined estimates may be more accurate if the posterior
distribution of the population parameter of interest is better approximated by the normal
distribution. However, the normality assumption may not be appropriate for all the parameters of
interest when analysing prognostic modelling studies, such as predicted survival probabilities and
model performance measures.
Methods: Guidelines for combining the estimates of interest when analysing prognostic modelling
studies are provided. A literature review is performed to identify current practice for combining
such estimates in prognostic modelling studies.
Results: Methods for combining all reported estimates after MI were not well reported in the
current literature. Rubin's rules without applying any transformations were the standard approach
used, when any method was stated.
Conclusion: The proposed simple guidelines for combining estimates after MI may lead to a wider
and more appropriate use of MI in future prognostic modelling studies.
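Rubin's rules referred to above are short enough to state in code. A minimal sketch, assuming the m per-imputation point estimates and their variances are already in hand; the log hazard ratio values in the example are hypothetical.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Combine m per-imputation estimates by Rubin's rules.
    Returns the pooled estimate and its total variance, which adds the
    within-imputation variance W to the between-imputation variance B
    inflated by the finite-m correction (1 + 1/m)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                        # pooled point estimate
    w = u.mean()                            # within-imputation variance W
    b = q.var(ddof=1)                       # between-imputation variance B
    return q_bar, w + (1.0 + 1.0 / m) * b   # total variance T

# Hypothetical log hazard ratios from m = 3 imputed datasets. Pooling is done
# on the log scale, where the normality assumption behind Rubin's rules is a
# better approximation, and only then exponentiated: the kind of
# transformation the guidelines recommend for skewed parameters.
est, var = rubin_pool([1.0, 1.2, 0.8], [0.04, 0.05, 0.06])
```

For quantities such as predicted survival probabilities, the same pooling would be applied after a suitable transformation (e.g. complementary log-log) rather than on the raw scale.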
Quantum states made to measure
Recent progress in manipulating quantum states of light and matter brings
quantum-enhanced measurements closer to prospective applications. The current
challenge is to make quantum metrologic strategies robust against
imperfections.
Comment: 4 pages, 3 figures. Commentary for Nature Photonics.
Imputation of continuous variables missing at random using the method of simulated scores
For multivariate datasets with missing values, we present a procedure of statistical inference and state its "optimal" properties. Two main assumptions are needed: (1) data are missing at random (MAR); (2) the data generating process is a multivariate normal linear regression. Disentangling the problem of convergence of the iterative estimation/imputation procedure, we show that the estimator is a "method of simulated scores" (a particular case of McFadden's "method of simulated moments"); thus the estimator is equivalent to maximum likelihood if the number of replications is conveniently large, and the whole procedure can be considered an optimal parametric technique for imputation of missing data.
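The iterative estimation/imputation procedure can be illustrated in its simplest setting: one normal regression whose response is missing at random given a fully observed covariate. This is a stylized sketch of the fit-then-redraw idea, not the paper's multivariate method-of-simulated-scores estimator; all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)    # true model: y = 1 + 2x + eps
miss = rng.random(n) < 1.0 / (1.0 + np.exp(-2 * x))  # MAR: missingness depends on x only
y_obs = np.where(miss, np.nan, y)

# Alternate between (1) estimating the regression on the completed data and
# (2) redrawing each missing y from the fitted conditional normal. The
# stochastic redraw, rather than a deterministic fill-in, keeps the residual
# variance honest.
A = np.column_stack([np.ones(n), x])
y_cur = np.where(miss, np.nanmean(y_obs), y_obs)     # crude starting fill-in
for _ in range(30):
    beta, *_ = np.linalg.lstsq(A, y_cur, rcond=None)
    sigma = (y_cur - A @ beta).std()
    y_cur = np.where(miss, A @ beta + rng.normal(scale=sigma, size=n), y_obs)
beta, *_ = np.linalg.lstsq(A, y_cur, rcond=None)
```

At the procedure's stochastic fixed point the parameter estimates recover the true coefficients (1, 2) and residual scale 0.5, consistent with the equivalence to maximum likelihood claimed above.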
Developing a multidisciplinary syndromic surveillance academic research programme in the United Kingdom: benefits for public health surveillance
Syndromic surveillance is growing in stature internationally as a recognised and innovative approach to public health surveillance. Syndromic surveillance research uses data captured by syndromic surveillance systems to investigate specific hypotheses or questions. However, this research is often undertaken either within established public health organisations or the academic setting, but often not together. Public health organisations can provide access to health-related data and expertise in infectious and non-infectious disease epidemiology and clinical interpretation of data. Academic institutions can optimise methodological rigour, intellectual clarity and establish routes for applying to external research funding bodies to attract money to fund projects. Together, these competencies can complement each other to enhance the public health benefits of syndromic surveillance research. This paper describes the development of a multidisciplinary syndromic surveillance academic research programme in England, United Kingdom, its aims, goals and benefits to public health.
Subaru FOCAS Spectroscopic Observations of High-Redshift Supernovae
We present spectra of high-redshift supernovae (SNe) that were taken with the
Subaru low resolution optical spectrograph, FOCAS. These SNe were found in SN
surveys with Suprime-Cam on Subaru, the CFH12k camera on the
Canada-France-Hawaii Telescope (CFHT), and the Advanced Camera for Surveys
(ACS) on the Hubble Space Telescope (HST). These SN surveys specifically
targeted z>1 Type Ia supernovae (SNe Ia). From the spectra of 39 candidates, we
obtain redshifts for 32 candidates and spectroscopically identify 7
candidates as probable SNe Ia, including one at z=1.35, which is the most
distant SN Ia to be spectroscopically confirmed with a ground-based telescope.
An additional 4 candidates are identified as likely SNe Ia from the
spectrophotometric properties of their host galaxies. Seven candidates are not
SNe Ia, either being SNe of another type or active galactic nuclei. When SNe Ia
are observed within a week of maximum light, we find that we can
spectroscopically identify most of them up to z=1.1. Beyond this redshift, very
few candidates were spectroscopically identified as SNe Ia. The current
generation of super red-sensitive, fringe-free CCDs will push this redshift
limit higher.
Comment: 19 pages, 26 figures. PASJ in press. See
http://www.supernova.lbl.gov/2009ClusterSurvey/ for additional information
pertaining to the HST Cluster SN Survey.
Identification and Dynamics of a Heparin-Binding Site in Hepatocyte Growth Factor
Hepatocyte growth factor (HGF) is a heparin-binding, multipotent growth factor that transduces a wide range of biological signals, including mitogenesis, motogenesis, and morphogenesis. Heparin or closely related heparan sulfate has profound effects on HGF signaling. A heparin-binding site in the N-terminal (N) domain of HGF was proposed on the basis of the clustering of surface positive charges [Zhou, H., Mazzulla, M. J., Kaufman, J. D., Stahl, S. J., Wingfield, P. T., Rubin, J. S., Bottaro, D. P., and Byrd, R. A. (1998) Structure 6, 109-116]. In the present study, we confirmed this binding site in a heparin titration experiment monitored by nuclear magnetic resonance spectroscopy, and we estimated the apparent dissociation constant (K(d)) of the heparin-protein complex by NMR and fluorescence techniques. The primary heparin-binding site is composed of Lys60, Lys62, and Arg73, with additional contributions from the adjacent Arg76, Lys78, and N-terminal basic residues. The K(d) of binding is in the micromolar range. A heparin disaccharide analogue, sucrose octasulfate, binds with similar affinity to the N domain and to a naturally occurring HGF isoform, NK1, at nearly the same region as in heparin binding. (15)N relaxation data indicate structural flexibility on a microsecond-to-millisecond time scale around the primary binding site in the N domain. This flexibility appears to be dramatically reduced by ligand binding. On the basis of the NK1 crystal structure, we propose a model in which heparin binds to the two primary binding sites and the N-terminal regions of the N domains and stabilizes an NK1 dimer.