A first principles approach to differential expression in microarray data analysis
Background: The disparate results from the methods commonly used to determine differential expression in Affymetrix microarray experiments may well result from the wide variety of probe-set and probe-level models employed. Here we take the approach of making the fewest assumptions about the structure of the microarray data. Specifically, we only require that, under the null hypothesis that a gene is not differentially expressed for the specified conditions, for any probe position in the gene's probe set: (a) the probe amplitudes are independent and identically distributed over the conditions, and (b) the distributions of the replicated probe amplitudes are amenable to classical analysis of variance (ANOVA). Log-amplitudes that have been standardized within-chip meet these conditions well enough for our approach, which is to perform an ANOVA across conditions at each probe position and then take the median of the resulting (1 - p) values as a gene-level measure of differential expression.
Results: We applied the technique to the HG-U133A, HG-U95A and "Golden Spike" spike-in data sets. The resulting receiver operating characteristic (ROC) curves compared favorably with other published results. The procedure is quite sensitive, so much so that it has revealed probe sets that might properly be called "unanticipated positives" rather than "false positives", because plots of these probe sets strongly suggest that they are differentially expressed.
Conclusion: The median ANOVA (1 - p) approach presented here is a very simple methodology that does not depend on any specific probe-level or probe-set model, and requires no pre-processing other than within-chip standardization of probe-level log-amplitudes. Its performance is comparable to that of other published methods on the standard spike-in data sets, and it has revealed new categories of probe sets, "unanticipated positives" and "unanticipated negatives", that need to be taken into account when using spike-in data sets as "truthed" test beds.
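The gene-level score described in this abstract is simple enough to sketch directly. The following is an illustrative sketch, not the authors' code: it assumes the probe-level log-amplitudes are already standardized within-chip, uses SciPy's one-way ANOVA for the per-probe test, and the example data are synthetic.

```python
import numpy as np
from scipy.stats import f_oneway

def median_anova_score(conditions):
    """conditions: list of (replicates x probes) arrays of within-chip
    standardized log-amplitudes, one array per experimental condition.
    Returns the median over probe positions of (1 - p), where p is the
    one-way ANOVA p-value across conditions at that probe position."""
    n_probes = conditions[0].shape[1]
    one_minus_p = []
    for j in range(n_probes):
        groups = [cond[:, j] for cond in conditions]  # replicate amplitudes per condition
        _, p = f_oneway(*groups)
        one_minus_p.append(1.0 - p)
    return float(np.median(one_minus_p))

# Synthetic example: 11 probe positions, 6 replicates, 2 conditions.
rng = np.random.default_rng(0)
null_gene = [rng.normal(size=(6, 11)) for _ in range(2)]   # no shift between conditions
de_gene = [rng.normal(size=(6, 11)),
           rng.normal(loc=2.0, size=(6, 11))]              # consistent shift in condition 2

s_null = median_anova_score(null_gene)
s_de = median_anova_score(de_gene)
```

A gene shifted consistently across its probe positions scores near 1, while a null gene's score hovers around 0.5; the median makes the score robust to a few badly behaved probes.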
Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer
Background: Tissue micro-arrays (TMAs) are increasingly used to generate data on the molecular phenotype of tumours in clinical epidemiology studies, such as studies of disease prognosis. However, TMA data are particularly prone to missingness. A variety of methods to deal with missing data are available, but the validity of the various approaches depends on the structure of the missing data, and there are few empirical studies dealing with missing data from molecular pathology. The purpose of this study was to investigate the results of four commonly used approaches to handling missing data in a large, multi-centre study of the molecular pathological determinants of prognosis in breast cancer.
Patients and Methods: We pooled data from over 11 000 cases of invasive breast cancer from five studies that collected information on seven prognostic indicators together with survival time data. We compared the results of a multivariate Cox regression using four approaches to handling missing data: complete case analysis (CCA), mean substitution (MS), multiple imputation without inclusion of the outcome (MI) and multiple imputation with inclusion of the outcome (MI+). We also performed an analysis in which missing data were simulated under different assumptions and the results of the four methods were compared.
Results: Over half the cases had missing data on at least one of the seven variables, and 11% had missing data on four or more. The multivariate hazard ratio estimates based on the multiple imputation models were very similar to those derived after using MS, with similar standard errors. Hazard ratio estimates based on the CCA were only slightly different, but they were less precise, as the standard errors were large.
However, in data simulated to be missing completely at random (MCAR) or missing at random (MAR), estimates from multiple imputation were the least biased and most accurate, whereas estimates from CCA were the most biased and least accurate.
Conclusion: In this study, empirical results from analyses using CCA, MS, MI and MI+ were similar, although results from CCA were less precise. The simulation results suggest that, in general, multiple imputation is likely to be the best approach. Given the ease of implementing multiple imputation in standard statistical software, the results of multiple imputation and CCA should be compared in any multivariate analysis where missing data are a problem. © 2011 Cancer Research UK. All rights reserved
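A toy simulation makes the contrast between these approaches concrete. This is an illustrative sketch with made-up data, not the study's analysis: a marker y is made missing at random, with missingness depending on a fully observed correlated variable x, so the complete cases are a biased sample of y and mean substitution inherits that bias while also deflating the variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)                 # fully observed variable
y = x + rng.normal(size=n)             # marker of interest; true mean is 0
miss = rng.random(n) < 0.8 * (x > 0)   # y missing 80% of the time when x > 0 (MAR)

cca = y[~miss]                         # complete case analysis: drop cases with missing y
ms = np.where(miss, cca.mean(), y)     # mean substitution: fill in the observed mean

# Complete cases over-represent x <= 0, so cca.mean() sits well below the
# true mean of 0; mean substitution leaves that bias in place and shrinks
# the variance, because every imputed value is identical.
```

Multiple imputation would instead draw each missing y from its estimated conditional distribution given x, which is what lets it recover near-unbiased estimates in the MCAR/MAR simulations described above.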
Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines
Background: Multiple imputation (MI) provides an effective approach to handle missing covariate
data within prognostic modelling studies, as it can properly account for the missing data
uncertainty. The multiply imputed datasets are each analysed using standard prognostic modelling
techniques to obtain the estimates of interest. The estimates from each imputed dataset are then
combined into one overall estimate and variance, incorporating both the within and between
imputation variability. Rubin's rules for combining these multiply imputed estimates are based on
asymptotic theory. The resulting combined estimates may be more accurate if the posterior
distribution of the population parameter of interest is better approximated by the normal
distribution. However, the normality assumption may not be appropriate for all the parameters of
interest when analysing prognostic modelling studies, such as predicted survival probabilities and
model performance measures.
Methods: Guidelines for combining the estimates of interest when analysing prognostic modelling
studies are provided. A literature review is performed to identify current practice for combining
such estimates in prognostic modelling studies.
Results: Methods for combining all reported estimates after MI were not well reported in the
current literature. Rubin's rules without applying any transformations were the standard approach
used, when any method was stated.
Conclusion: The proposed simple guidelines for combining estimates after MI may lead to a wider
and more appropriate use of MI in future prognostic modelling studies.
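Rubin's rules referred to above are short enough to state in code. A minimal sketch, assuming the m per-imputation point estimates and their variances are already in hand; the log hazard ratio values in the example are hypothetical.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Combine m per-imputation estimates by Rubin's rules.
    Returns the pooled estimate and its total variance, which adds the
    within-imputation variance W to the between-imputation variance B
    inflated by the finite-m correction (1 + 1/m)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                        # pooled point estimate
    w = u.mean()                            # within-imputation variance W
    b = q.var(ddof=1)                       # between-imputation variance B
    return q_bar, w + (1.0 + 1.0 / m) * b   # total variance T

# Hypothetical log hazard ratios from m = 3 imputed datasets. Pooling is done
# on the log scale, where the normality assumption behind Rubin's rules is a
# better approximation, and only then exponentiated: the kind of
# transformation the guidelines recommend for skewed parameters.
est, var = rubin_pool([1.0, 1.2, 0.8], [0.04, 0.05, 0.06])
```

For quantities such as predicted survival probabilities, the same pooling would be applied after a suitable transformation (e.g. complementary log-log) rather than on the raw scale.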
Quantum states made to measure
Recent progress in manipulating quantum states of light and matter brings
quantum-enhanced measurements closer to prospective applications. The current
challenge is to make quantum metrologic strategies robust against
imperfections.
Comment: 4 pages, 3 figures. Commentary for Nature Photonics.
Imputation of continuous variables missing at random using the method of simulated scores
For multivariate datasets with missing values, we present a procedure of statistical inference and state its "optimal" properties. Two main assumptions are needed: (1) data are missing at random (MAR); (2) the data generating process is a multivariate normal linear regression. Disentangling the problem of convergence of the iterative estimation/imputation procedure, we show that the estimator is a "method of simulated scores" (a particular case of McFadden's "method of simulated moments"); thus the estimator is equivalent to maximum likelihood if the number of replications is conveniently large, and the whole procedure can be considered an optimal parametric technique for imputation of missing data.
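The iterative estimation/imputation procedure can be illustrated in its simplest setting: one normal regression whose response is missing at random given a fully observed covariate. This is a stylized sketch of the fit-then-redraw idea, not the paper's multivariate method-of-simulated-scores estimator; all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)    # true model: y = 1 + 2x + eps
miss = rng.random(n) < 1.0 / (1.0 + np.exp(-2 * x))  # MAR: missingness depends on x only
y_obs = np.where(miss, np.nan, y)

# Alternate between (1) estimating the regression on the completed data and
# (2) redrawing each missing y from the fitted conditional normal. The
# stochastic redraw, rather than a deterministic fill-in, keeps the residual
# variance honest.
A = np.column_stack([np.ones(n), x])
y_cur = np.where(miss, np.nanmean(y_obs), y_obs)     # crude starting fill-in
for _ in range(30):
    beta, *_ = np.linalg.lstsq(A, y_cur, rcond=None)
    sigma = (y_cur - A @ beta).std()
    y_cur = np.where(miss, A @ beta + rng.normal(scale=sigma, size=n), y_obs)
beta, *_ = np.linalg.lstsq(A, y_cur, rcond=None)
```

At the procedure's stochastic fixed point the parameter estimates recover the true coefficients (1, 2) and residual scale 0.5, consistent with the equivalence to maximum likelihood claimed above.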
Developing a multidisciplinary syndromic surveillance academic research programme in the United Kingdom: benefits for public health surveillance
Syndromic surveillance is growing in stature internationally as a recognised and innovative approach to public health surveillance. Syndromic surveillance research uses data captured by syndromic surveillance systems to investigate specific hypotheses or questions. However, this research is often undertaken either within established public health organisations or the academic setting, but often not together. Public health organisations can provide access to health-related data and expertise in infectious and non-infectious disease epidemiology and clinical interpretation of data. Academic institutions can optimise methodological rigour, intellectual clarity and establish routes for applying to external research funding bodies to attract money to fund projects. Together, these competencies can complement each other to enhance the public health benefits of syndromic surveillance research. This paper describes the development of a multidisciplinary syndromic surveillance academic research programme in England, United Kingdom, its aims, goals and benefits to public health.
Subaru FOCAS Spectroscopic Observations of High-Redshift Supernovae
We present spectra of high-redshift supernovae (SNe) that were taken with the
Subaru low resolution optical spectrograph, FOCAS. These SNe were found in SN
surveys with Suprime-Cam on Subaru, the CFH12k camera on the
Canada-France-Hawaii Telescope (CFHT), and the Advanced Camera for Surveys
(ACS) on the Hubble Space Telescope (HST). These SN surveys specifically
targeted z>1 Type Ia supernovae (SNe Ia). From the spectra of 39 candidates, we
obtain redshifts for 32 candidates and spectroscopically identify 7
candidates as probable SNe Ia, including one at z=1.35, which is the most
distant SN Ia to be spectroscopically confirmed with a ground-based telescope.
An additional 4 candidates are identified as likely SNe Ia from the
spectrophotometric properties of their host galaxies. Seven candidates are not
SNe Ia, either being SNe of another type or active galactic nuclei. When SNe Ia
are observed within a week of maximum light, we find that we can
spectroscopically identify most of them up to z=1.1. Beyond this redshift, very
few candidates were spectroscopically identified as SNe Ia. The current
generation of super red-sensitive, fringe-free CCDs will push this redshift
limit higher.
Comment: 19 pages, 26 figures. PASJ in press. See
http://www.supernova.lbl.gov/2009ClusterSurvey/ for additional information
pertaining to the HST Cluster SN Survey.
Identification and Dynamics of a Heparin-Binding Site in Hepatocyte Growth Factor
Hepatocyte growth factor (HGF) is a heparin-binding, multipotent growth factor that transduces a wide range of biological signals, including mitogenesis, motogenesis, and morphogenesis. Heparin or closely related heparan sulfate has profound effects on HGF signaling. A heparin-binding site in the N-terminal (N) domain of HGF was proposed on the basis of the clustering of surface positive charges [Zhou, H., Mazzulla, M. J., Kaufman, J. D., Stahl, S. J., Wingfield, P. T., Rubin, J. S., Bottaro, D. P., and Byrd, R. A. (1998) Structure 6, 109-116]. In the present study, we confirmed this binding site in a heparin titration experiment monitored by nuclear magnetic resonance spectroscopy, and we estimated the apparent dissociation constant (K(d)) of the heparin-protein complex by NMR and fluorescence techniques. The primary heparin-binding site is composed of Lys60, Lys62, and Arg73, with additional contributions from the adjacent Arg76, Lys78, and N-terminal basic residues. The K(d) of binding is in the micromolar range. A heparin disaccharide analogue, sucrose octasulfate, binds with similar affinity to the N domain and to a naturally occurring HGF isoform, NK1, at nearly the same region as in heparin binding. (15)N relaxation data indicate structural flexibility on a microsecond-to-millisecond time scale around the primary binding site in the N domain. This flexibility appears to be dramatically reduced by ligand binding. On the basis of the NK1 crystal structure, we propose a model in which heparin binds to the two primary binding sites and the N-terminal regions of the N domains and stabilizes an NK1 dimer.