Ordinal Probit Functional Regression Models with Application to Computer-Use Behavior in Rhesus Monkeys
Research in functional regression has made great strides in expanding to
non-Gaussian functional outcomes; however, the exploration of ordinal functional
outcomes remains limited. Motivated by a study of computer-use behavior in
rhesus macaques (Macaca mulatta), we introduce the Ordinal Probit
Functional Regression Model (OPFRM) to perform ordinal function-on-scalar
regression. The OPFRM is flexibly formulated to allow for the choice of
different basis functions including penalized B-splines, wavelets, and
O'Sullivan splines. We demonstrate the operating characteristics of the model
in simulation under a variety of underlying covariance patterns, showing that the
model performs reasonably well in estimation under multiple basis functions. We
also present and compare two approaches for conducting posterior inference,
showing that joint credible intervals tend to outperform point-wise credible
intervals. Finally, in application, we determine demographic factors associated with the
monkeys' computer use over the course of a year and provide a brief analysis of
the findings.
Joint analysis of SNP and gene expression data in genetic association studies of complex diseases
Genetic association studies have been a popular approach for assessing the
association between common Single Nucleotide Polymorphisms (SNPs) and complex
diseases. However, other genomic data involved in the mechanism from SNPs to
disease, for example, gene expressions, are usually neglected in these
association studies. In this paper, we propose to exploit gene expression
information to more powerfully test the association between SNPs and diseases
by jointly modeling the relations among SNPs, gene expressions and diseases. We
propose a variance component test for the total effect of SNPs and a gene
expression on disease risk. We cast the test within the causal mediation
analysis framework with the gene expression as a potential mediator. For eQTL
SNPs, the use of gene expression information can enhance power to test for the
total effect of a SNP-set, which is the combined direct and indirect effects of
the SNPs mediated through the gene expression, on disease risk. We show that
the test statistic under the null hypothesis follows a mixture of $\chi^2$
distributions, which can be evaluated analytically or empirically using the
resampling-based perturbation method. We construct tests for each of three
disease models that are determined by SNPs only, SNPs and gene expression, or
include also their interactions. As the true disease model is unknown in
practice, we further propose an omnibus test to accommodate different
underlying disease models. We evaluate the finite sample performance of the
proposed methods using simulation studies, and show that our proposed test
performs well and the omnibus test can almost reach the optimal power where the
disease model is known and correctly specified. We apply our method to
reanalyze the overall effect of the SNP-set and expression of the ORMDL3 gene
on the risk of asthma.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS690 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
The relative efficiency of time-to-progression and continuous measures of cognition in presymptomatic Alzheimer's disease.
Introduction: Clinical trials on preclinical Alzheimer's disease are challenging because of the slow rate of disease progression. We use a simulation study to demonstrate that models of repeated cognitive assessments detect treatment effects more efficiently than models of time to progression. Methods: Multivariate continuous data are simulated from a Bayesian joint mixed-effects model fit to data from the Alzheimer's Disease Neuroimaging Initiative. Simulated progression events are algorithmically derived from the continuous assessments using a random forest model fit to the same data. Results: We find that power is approximately doubled with models of repeated continuous outcomes compared with the time-to-progression analysis. The simulations also demonstrate that a plausible informative missing data pattern can induce a bias that inflates treatment effects, yet 5% type I error is maintained. Discussion: Given the relative inefficiency of time to progression, it should be avoided as a primary analysis approach in clinical trials of preclinical Alzheimer's disease.
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field.
Graphical Markov models: overview
We describe how graphical Markov models started to emerge in the last 40
years, based on three essential concepts that had been developed independently
more than a century ago. Sequences of joint or single regressions and their
regression graphs are singled out as being best suited for analyzing
longitudinal data and for tracing developmental pathways. Interpretations are
illustrated using two sets of data and some of the more recent, important
results for sequences of regressions are summarized.
Comment: 22 pages, 9 figures