
    Statistical Challenges in Combining Information from Big and Small Data Sources

    Full text link
    Social media, electronic health records, credit card transactions, administrative data, web scraping, and numerous other ways of collecting information have changed the landscape for those interested in addressing policy-relevant research questions. Over the same period, the traditional sources of data, such as large-scale surveys, that have been a stable basis for policy-relevant research have suffered setbacks due to high nonresponse and increasing data collection costs. The non-survey data usually contain detailed information on certain behaviors for a large number of individuals (such as all credit card transactions) but very little background information on those individuals (such as important covariates needed to address the policy-relevant question). The survey data, on the other hand, contain detailed information on covariates but less detailed information on the behaviors. Neither data source may be ideal for the target population of interest. This paper develops and evaluates a framework for linking information from multiple imperfect data sources, along with Census data, to draw statistical inference. An explicit modeling framework involving selection into the big data source, the sampling and nonresponse mechanisms in the survey data, the distribution of the key variables of interest, and certain marginal distributions from the Census data is used as the set of building blocks for drawing inference about the population quantity of interest.
    http://deepblue.lib.umich.edu/bitstream/2027.42/120417/1/NAS-Paper.pdf
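
    The abstract names the building blocks only at a high level. As one hypothetical illustration of the selection-adjustment ingredient (not the paper's actual estimator), the Python sketch below stacks a self-selected "big data" sample with a small simple-random reference survey, fits a logistic model for membership in the big-data source, and uses the inverse odds as pseudo-weights; every name and setting is invented for the toy simulation.

```python
# Toy sketch of pseudo-weighting a non-probability "big data" sample against a
# reference probability survey. Hypothetical data; not the paper's estimator.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated population: covariate x drives both selection and the outcome y.
N = 100_000
x = rng.normal(size=N)
y = 2.0 + 1.5 * x + rng.normal(size=N)

# The "big data" source over-selects high-x units (self-selection, no design).
p_select = 1.0 / (1.0 + np.exp(-(x - 1.0)))
in_big = rng.random(N) < p_select

# A small reference survey: a simple random sample (equal design weights).
survey_idx = rng.choice(N, size=2_000, replace=False)

# Stack big-data and survey records and model membership in the big-data set.
x_stack = np.concatenate([x[in_big], x[survey_idx]])[:, None]
z_stack = np.concatenate([np.ones(in_big.sum()), np.zeros(2_000)])
clf = LogisticRegression(max_iter=1_000).fit(x_stack, z_stack)

# Pseudo-weights: the inverse odds of big-data membership are proportional to
# the inverse selection propensity when the reference survey is an SRS.
p_big = clf.predict_proba(x[in_big][:, None])[:, 1]
w = (1.0 - p_big) / p_big

print("naive big-data mean  :", round(y[in_big].mean(), 3))
print("pseudo-weighted mean :", round(np.average(y[in_big], weights=w), 3))
print("true population mean :", round(y.mean(), 3))
```

    In this simulation the naive big-data mean is biased upward (the source over-represents high-x units), while the pseudo-weighted estimate recovers the population mean; the framework in the paper goes further by also modeling survey nonresponse and calibrating to Census margins.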

    Bayesian sensitivity analysis of incomplete data: bridging pattern‐mixture and selection models

    Full text link
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/109600/1/sim6302.pdf

    An Approximate Test for Homogeneity of Correlated Correlation Coefficients

    Full text link
    This paper develops and evaluates an approximate procedure for testing homogeneity of an arbitrary subset of correlation coefficients among variables measured on the same set of individuals. The sample may have some missing data. The simple test statistic is a multiple of the variance of the Fisher r-to-z transformed correlation coefficients relevant to the null hypothesis being tested, and is referred to a chi-square distribution. The use of this test is illustrated through several examples. Given the approximate nature of the test statistic, the procedure was evaluated in a simulation study that assessed the agreement between the nominal and actual significance levels of the test for several null hypotheses of interest.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/43560/1/11135_2004_Article_394854.pdf
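
    For concreteness, the Python sketch below implements the simple independent-samples version of this construction: Fisher r-to-z transform each correlation, form the precision-weighted variance of the transformed values, and refer it to a chi-square distribution. It omits the paper's actual contribution, the adjustment for correlations measured on the same individuals and for missing data, so it is a simplified analogue rather than the proposed procedure.

```python
# Homogeneity test for correlations from INDEPENDENT samples via Fisher's z.
# The paper generalizes this to correlated correlations with missing data.
import numpy as np
from scipy import stats

def homogeneity_test(rs, ns):
    """Test H0: all correlations are equal (independent samples).

    rs : sample correlation coefficients
    ns : sample sizes; Var[arctanh(r_i)] is approximately 1/(n_i - 3)
    """
    rs, ns = np.asarray(rs, float), np.asarray(ns, float)
    z = np.arctanh(rs)                 # Fisher r-to-z transform
    w = ns - 3.0                       # approximate inverse variances
    z_bar = np.sum(w * z) / np.sum(w)  # precision-weighted mean
    q = np.sum(w * (z - z_bar) ** 2)   # ~ chi-square with k-1 df under H0
    df = len(rs) - 1
    return q, df, stats.chi2.sf(q, df)

q, df, p = homogeneity_test([0.42, 0.55, 0.48], [100, 120, 90])
print(f"Q = {q:.3f}, df = {df}, p-value = {p:.3f}")
```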

    Improving on analyses of self-reported data in a large-scale health survey by using information from an examination-based survey

    Full text link
    Common data sources for assessing the health of a population of interest include large-scale surveys based on interviews that often pose questions requiring a self-report, such as, ‘Has a doctor or other health professional ever told you that you have ⟨health condition of interest⟩?’ or ‘What is your ⟨height/weight⟩?’ Answers to such questions might not always reflect the true prevalences of health conditions (for example, if a respondent misreports height/weight or does not have access to a doctor or other health professional). Such ‘measurement error’ in health data could affect inferences about measures of health and health disparities. Drawing on two surveys conducted by the National Center for Health Statistics, this paper describes an imputation-based strategy for using clinical information from an examination-based health survey to improve on analyses of self-reported data in a larger interview-based health survey. Models predicting clinical values from self-reported values and covariates are fitted to data from the National Health and Nutrition Examination Survey (NHANES), which asks self-report questions during an interview component and also obtains clinical measurements during a physical examination component. The fitted models are used to multiply impute clinical values for the National Health Interview Survey (NHIS), a larger survey that obtains data solely via interviews. Illustrations involving hypertension, diabetes, and obesity suggest that estimates of health measures based on the multiply imputed clinical values are different from those based on the NHIS self-reported data alone and have smaller estimated standard errors than those based solely on the NHANES clinical data. The paper discusses the relationship of the methods used in the study to two-phase/two-stage/validation sampling and estimation, along with limitations, practical considerations, and areas for future research. Published in 2009 by John Wiley & Sons, Ltd.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/65032/1/3809_ftp.pdf
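
    A hypothetical Python sketch of the general strategy follows (not the paper's exact models or data): fit a model predicting the clinical value from the self-report and covariates in an examination survey, multiply impute clinical values for an interview-only survey, and combine the completed-data estimates with Rubin's rules. All variables and parameter values below are simulated stand-ins.

```python
# Toy multiple-imputation sketch: impute "clinical" BMI for an interview-only
# survey using a model fitted in an examination survey. Simulated data only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# --- toy examination survey: self-reported AND clinical BMI both observed ---
n_exam = 1_000
age = rng.uniform(20, 80, n_exam)
bmi_clin = rng.normal(27, 5, n_exam)
bmi_self = bmi_clin - 0.5 + rng.normal(0, 1.5, n_exam)  # under-report + noise

X_exam = sm.add_constant(np.column_stack([bmi_self, age]))
fit = sm.OLS(bmi_clin, X_exam).fit()

# --- toy interview-only survey: only self-reports and covariates available ---
n_int = 5_000
age_i = rng.uniform(20, 80, n_int)
bmi_self_i = rng.normal(26.5, 5.2, n_int)
X_int = sm.add_constant(np.column_stack([bmi_self_i, age_i]))

# Multiply impute clinical BMI, propagating parameter and residual uncertainty
# (fully proper MI would also draw the residual variance from its posterior).
M = 20
est, var_within = [], []
for _ in range(M):
    beta_m = rng.multivariate_normal(fit.params, fit.cov_params())
    imp = X_int @ beta_m + rng.normal(0, np.sqrt(fit.scale), n_int)
    est.append(imp.mean())
    var_within.append(imp.var(ddof=1) / n_int)

# Rubin's rules: total variance = within + (1 + 1/M) * between.
q_bar = np.mean(est)
t_var = np.mean(var_within) + (1 + 1 / M) * np.var(est, ddof=1)
print(f"MI estimate of mean clinical BMI: {q_bar:.2f} (SE {np.sqrt(t_var):.3f})")
```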

    Bayesian Variable Selection with Joint Modeling of Categorical and Survival Outcomes: An Application to Individualizing Chemotherapy Treatment in Advanced Colorectal Cancer

    Full text link
    Colorectal cancer is the second leading cause of cancer-related deaths in the United States, with more than 130,000 new cases of colorectal cancer diagnosed each year. Clinical studies have shown that genetic alterations lead to different responses to the same treatment, despite the morphologic similarities of tumors. A molecular test prior to treatment could help in determining an optimal treatment for a patient with regard to both toxicity and efficacy. This article introduces a statistical method appropriate for predicting and comparing multiple endpoints given different treatment options and the molecular profile of an individual. A latent-variable-based multivariate regression model with a structured variance-covariance matrix is considered here. The latent variables account for the correlated nature of multiple endpoints and accommodate the fact that some clinical endpoints are categorical variables and others are censored variables. The mixture normal hierarchical structure admits a natural variable selection rule. Inference was conducted by sampling from the posterior distribution using Markov chain Monte Carlo methods. We analyzed the finite-sample properties of the proposed method using simulation studies. The application to the advanced colorectal cancer study revealed associations between multiple endpoints and particular biomarkers, demonstrating the potential of individualizing treatment based on genetic profiles.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/66395/1/j.1541-0420.2008.01181.x.pdf
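
    The Python sketch below isolates one ingredient named in the abstract, the mixture normal ("spike-and-slab") prior that yields a natural variable selection rule, in a deliberately simplified setting: a plain linear regression with known error variance, sampled by a two-step Gibbs sampler. The paper's full model (latent variables linking categorical and censored survival endpoints, structured covariance) is not reproduced here.

```python
# Minimal spike-and-slab (mixture normal) variable selection for linear
# regression with known error variance. A simplified sketch, not the paper's
# multivariate latent-variable model.
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: 10 candidate predictors, only the first two truly active.
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5] + [0.0] * (p - 2))
sigma2 = 1.0
y = X @ beta_true + rng.normal(0, np.sqrt(sigma2), n)

tau0, tau1, pi = 0.01, 2.0, 0.2   # spike sd, slab sd, prior inclusion prob.
gamma = np.ones(p, dtype=bool)    # inclusion indicators
XtX, Xty = X.T @ X, X.T @ y
incl_count = np.zeros(p)

def normpdf(b, s):
    # Normal density up to the 1/sqrt(2*pi) constant, which cancels in ratios.
    return np.exp(-0.5 * (b / s) ** 2) / s

n_iter, burn = 2_000, 500
for it in range(n_iter):
    # 1. beta | gamma: conjugate normal update, prior sd tau1 (slab) or tau0.
    prior_prec = np.where(gamma, 1 / tau1**2, 1 / tau0**2)
    post_cov = np.linalg.inv(XtX / sigma2 + np.diag(prior_prec))
    beta = rng.multivariate_normal(post_cov @ Xty / sigma2, post_cov)

    # 2. gamma_j | beta_j: Bernoulli, comparing slab vs spike densities.
    p_slab = pi * normpdf(beta, tau1)
    p_spike = (1 - pi) * normpdf(beta, tau0)
    gamma = rng.random(p) < p_slab / (p_slab + p_spike)

    if it >= burn:
        incl_count += gamma

print("posterior inclusion probabilities:",
      np.round(incl_count / (n_iter - burn), 2))
```

    The two truly active predictors should receive posterior inclusion probabilities near one and the rest near the prior; this is the "natural variable selection rule" the mixture structure admits.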

    A Bayesian model for longitudinal count data with non-ignorable dropout

    Full text link
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/73907/1/j.1467-9876.2008.00628.x.pdf