Joint Clustering and Registration of Functional Data
Curve registration and clustering are fundamental tools in the analysis of
functional data. While several methods have been developed and explored for
either task individually, limited work has been done to infer functional
clusters and register curves simultaneously. We propose a hierarchical model
for joint curve clustering and registration. Our proposal combines a Dirichlet
process mixture model for clustering of common shapes, with a reproducing
kernel representation of phase variability for registration. We show how
inference can be carried out by applying standard posterior simulation algorithms
and compare our method to several alternatives on both engineered data and a
benchmark analysis of the Berkeley growth data. We conclude our investigation
with an application to time course gene expression.
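As a rough illustration of what registration means in the abstract above, the following minimal sketch aligns one curve to a template by a grid search over a constant time shift. This is a deliberately simpler technique than the hierarchical model the abstract describes (it involves no clustering and no reproducing kernel representation), and the function name `shift_register`, the sine-shaped curves, and the grids are all hypothetical choices made for the example.

```python
import math


def shift_register(curve, template, grid, shifts):
    """Return the shift s in `shifts` that minimizes the squared error
    between curve(t + s) and template(t) over the points in `grid`."""
    def squared_error(s):
        return sum((curve(t + s) - template(t)) ** 2 for t in grid)
    return min(shifts, key=squared_error)


# Hypothetical example: the observed curve is the template delayed by 0.5.
template = math.sin
curve = lambda t: math.sin(t - 0.5)

grid = [i * 0.1 for i in range(63)]          # evaluation points on [0, 6.2]
shifts = [i * 0.1 for i in range(-10, 11)]   # candidate shifts in [-1, 1]

best_shift = shift_register(curve, template, grid, shifts)
```

Here `best_shift` recovers the delay, so that `curve(t + best_shift)` overlays the template. A model-based approach would replace the constant shift with a flexible warping function and estimate it jointly with cluster membership.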
Bayesian Inference for Latent Biologic Structure with Determinantal Point Processes (DPP)
We discuss the use of the determinantal point process (DPP) as a prior for
latent structure in biomedical applications, where inference often centers on
the interpretation of latent features as biologically or clinically meaningful
structure. Typical examples include mixture models, when the terms of the
mixture are meant to represent clinically meaningful subpopulations (of
patients, genes, etc.). Another class of examples is feature allocation
models. We propose the DPP prior as a repulsive prior on latent mixture
components in the first example, and as a prior on feature-specific parameters in
the second case. We argue that the DPP is in general an attractive prior model
for latent structure when biologically relevant interpretation of such
structure is desired. We illustrate the advantages of the DPP prior in three case
studies, including inference in mixture models for magnetic resonance images
(MRI) and for protein expression, and a feature allocation model for gene
expression using data from The Cancer Genome Atlas. An important part of our
argument is the availability of efficient and straightforward posterior simulation methods. We
implement a variation of reversible jump Markov chain Monte Carlo simulation
for inference under the DPP prior, using a density with respect to the unit
rate Poisson process
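To make the notion of a repulsive prior concrete, here is a minimal sketch of why a determinant penalizes configurations with similar components. It assumes a Gaussian similarity kernel with a hypothetical `length_scale` parameter and evaluates only the unnormalized determinant of the kernel matrix; it is not the density with respect to the unit rate Poisson process used in the paper, and all function names are illustrative.

```python
import math


def gaussian_kernel_matrix(points, length_scale=1.0):
    """Similarity matrix with entries exp(-(a - b)^2 / (2 * length_scale^2))."""
    return [[math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))
             for b in points] for a in points]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]   # work on a copy
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det               # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def dpp_unnormalized_density(points, length_scale=1.0):
    """Determinant of the kernel matrix: small when points are close together."""
    return determinant(gaussian_kernel_matrix(points, length_scale))


# Hypothetical component locations: tightly packed versus well separated.
close = dpp_unnormalized_density([0.0, 0.1, 0.2])
spread = dpp_unnormalized_density([0.0, 2.0, 4.0])
```

Nearly identical locations make the rows of the kernel matrix almost linearly dependent, so `close` is near zero while `spread` is near one. Under such a prior, mixture components are discouraged from sitting on top of each other, which is the property the abstract argues helps interpretation.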
Comparison of Clustering Methods for Time Course Genomic Data: Applications to Aging Effects
Time course microarray data provide insight about dynamic biological
processes. While several clustering methods have been proposed for the analysis
of these data structures, comparison and selection of appropriate clustering
methods are seldom discussed. We compared probability-based clustering
methods and distance-based clustering methods for time course microarray
data. Among probability-based methods, we considered: smoothing spline clustering
(SSC), also known as model-based functional data analysis (MFDA), functional
clustering models for sparsely sampled data (FCM) and model-based clustering
(MCLUST). Among distance-based methods, we considered: weighted gene
co-expression network analysis (WGCNA), clustering with dynamic time warping
distance (DTW) and clustering with autocorrelation-based distance (ACF). We
studied these algorithms in both simulated settings and case study data. Our
investigations showed that FCM performed very well when gene curves were short
and sparse. DTW and WGCNA performed well when gene curves were medium or long
( observations). SSC performed very well when there were clusters of gene
curves similar to one another. Overall, ACF performed poorly in these
applications. In terms of computation time, FCM, SSC and DTW were considerably
slower than MCLUST and WGCNA. WGCNA outperformed MCLUST by generating more
accurate and biologically meaningful clustering results. WGCNA and MCLUST are the
best methods among the 6 methods compared, when performance and computation
time are both taken into account. WGCNA outperforms MCLUST, but MCLUST provides
model-based inference and an uncertainty measure for clustering results.
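For readers unfamiliar with the dynamic time warping distance mentioned above, the following minimal sketch shows the classic dynamic programming recursion with an absolute-difference local cost and no window constraint. The study's actual implementation and settings are not given in the abstract, so the function name `dtw_distance` and the toy expression profiles are assumptions made for illustration.

```python
def dtw_distance(x, y):
    """Dynamic time warping distance between two numeric sequences,
    using absolute difference as the local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y stays on the same point
                cost[i][j - 1],      # y advances, x stays on the same point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Hypothetical expression profiles: same shape, one with a delayed onset.
profile_a = [0, 1, 2, 1, 0]
profile_b = [0, 0, 1, 2, 1, 0]
same_shape = dtw_distance(profile_a, profile_b)

# Profiles that differ by a constant level are not fully reconciled by warping.
offset = dtw_distance([1, 2, 3], [2, 3, 4])
```

The resulting pairwise distances can be passed to any distance-based clustering routine. The quadratic cost of the recursion in sequence length is consistent with the abstract's remark that DTW was among the slower methods.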
Modeling dependent gene expression
In this paper we propose a Bayesian approach for inference about dependence
of high throughput gene expression. Our goals are to use prior knowledge about
pathways to anchor inference about dependence among genes; to account for this
dependence while making inferences about differences in mean expression across
phenotypes; and to explore differences in the dependence itself across
phenotypes. Useful features of the proposed approach are a model-based
parsimonious representation of expression as an ordinal outcome, a novel and
flexible representation of prior information on the nature of dependencies, and
the use of a coherent probability model over both the structure and strength of
the dependencies of interest. We evaluate our approach through simulations and
in the analysis of data on expression of genes in the Complement and
Coagulation Cascade pathway in ovarian cancer.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS525 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Modeling Criminal Careers as Departures from a Unimodal Population Age-Crime Curve: The Case of Marijuana Use
A major aim of longitudinal analyses of life course data is to describe the within- and between-individual variability in a behavioral outcome, such as crime. Statistical analyses of such data typically draw on mixture and mixed-effects growth models. In this work, we present a functional analytic point of view and develop an alternative method that models individual crime trajectories as departures from a population age-crime curve. Drawing on empirical and theoretical claims in criminology, we assume a unimodal population age-crime curve and allow individual expected crime trajectories to differ by their levels of offending and patterns of temporal misalignment. We extend Bayesian hierarchical curve registration methods to accommodate count data and to incorporate the influence of baseline covariates on individual behavioral trajectories. Analyzing self-reported counts of yearly marijuana use from the Denver Youth Survey, we examine the influence of race and gender categories on differences in levels and timing of marijuana smoking. We find that our approach offers a flexible and realistic model for longitudinal crime trajectories that fits individual observations well and allows for a rich array of inferences of interest to criminologists and drug abuse researchers.
Has Toxicity Testing Moved into the 21st Century? A Survey and Analysis of Perceptions in the Field of Toxicology.
Background: Ten years ago, leaders in the field of toxicology called for a transformation of the discipline and a shift from primarily relying on traditional animal testing to incorporating advances in biotechnology and predictive methodologies into alternative testing strategies (ATS). Governmental agencies and academic and industry partners initiated programs to support such a transformation, but a decade later, the outcomes of these efforts are not well understood. Objectives: We aimed to assess the use of ATS and the perceived barriers and drivers to their adoption by toxicologists and by others working in, or closely linked with, the field of toxicology. Methods: We surveyed 1,381 toxicologists and experts in associated fields regarding the viability and use of ATS and the perceived barriers and drivers of ATS for a range of applications. We performed ranking, hierarchical clustering, and correlation analyses of the survey data. Results: Many respondents indicated that they were already using ATS, or believed that ATS were already viable approaches, for toxicological assessment of one or more end points in their primary area of interest or concern (26-86%, depending on the specific ATS/application pair). However, the proportions of respondents reporting use of ATS in the previous 12 months were smaller (4.5-41%). Concern about regulatory acceptance was the most commonly cited factor inhibiting the adoption of ATS, and a variety of technical concerns were also cited as significant barriers to ATS viability. The factors most often cited as playing a significant role (currently or in the future) in driving the adoption of ATS were the need for expedited toxicology information, the need for reduced toxicity testing costs, demand by regulatory agencies, and ethical or moral concerns. Conclusions: Our findings indicate that the transformation of the field of toxicology is partly implemented, but significant barriers to acceptance and adoption remain. https://doi.org/10.1289/EHP1435
Region-Referenced Spectral Power Dynamics of EEG Signals: A Hierarchical Modeling Approach
Functional brain imaging through electroencephalography (EEG) relies upon the
analysis and interpretation of high-dimensional, spatially organized time
series. We propose to represent time-localized frequency domain
characterizations of EEG data as region-referenced functional data. This
representation is coupled with a hierarchical modeling approach to multivariate
functional observations. Within this familiar setting, we discuss how several
prior models relate to structural assumptions about multivariate covariance
operators. An overarching modeling framework, based on infinite factorial
decompositions, is finally proposed to balance flexibility and efficiency in
estimation. The motivating application stems from a study of implicit auditory
learning, in which typically developing (TD) children, and children with autism
spectrum disorder (ASD) were exposed to a continuous speech stream. Using the
proposed model, we examine differential band power dynamics as brain function
is interrogated throughout the duration of a computer-controlled experiment.
Our work offers a novel look at previous findings in psychiatry, and provides
further insights into the understanding of ASD. Our approach to inference is
fully Bayesian and implemented in a highly optimized Rcpp package
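The time-localized frequency domain characterization mentioned above can be pictured as band power computed over successive windows of a signal. The sketch below uses a plain discrete Fourier transform on non-overlapping windows; it is not the paper's hierarchical model or its Rcpp implementation, and the function names, the 64 Hz sampling rate, the 8 to 12 Hz band, and the synthetic signal are all assumptions chosen for the example.

```python
import cmath
import math


def band_power(segment, fs, low, high):
    """Sum of squared DFT magnitudes (scaled by 1/n^2) over the
    non-negative frequency bins that fall inside [low, high] Hz."""
    n = len(segment)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if low <= freq <= high:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(segment))
            power += abs(coeff) ** 2 / n ** 2
    return power


def windowed_band_power(signal, fs, window, step, low, high):
    """Band power in each sliding window: a time series of spectral power."""
    return [band_power(signal[start:start + window], fs, low, high)
            for start in range(0, len(signal) - window + 1, step)]


# Synthetic single-channel signal (hypothetical): one second of a 10 Hz
# oscillation followed by one second of a 20 Hz oscillation.
fs = 64
signal = ([math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
          + [math.sin(2 * math.pi * 20 * t / fs) for t in range(fs)])

alpha_power = windowed_band_power(signal, fs, window=fs, step=fs, low=8, high=12)
```

In this toy signal the 8 to 12 Hz power is concentrated in the first window and essentially vanishes in the second. Repeating the computation per electrode or scalp region would yield the kind of region-referenced functional observations the abstract describes.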
Modeling Protein Expression and Protein Signaling Pathways
High-throughput functional proteomic technologies provide a way to quantify the expression of proteins of interest. Statistical inference centers on identifying the activation state of proteins and their patterns of molecular interaction formalized as dependence structure. Inference on dependence structure is particularly important when proteins are selected because they are part of a common molecular pathway. In that case inference on dependence structure reveals properties of the underlying pathway. We propose a probability model that represents molecular interactions at the level of hidden binary latent variables that can be interpreted as indicators for active versus inactive states of the proteins. The proposed approach exploits available expert knowledge about the target pathway to define an informative prior on the hidden conditional dependence structure. An important feature of this prior is that it provides an instrument to explicitly anchor the model space to a set of interactions of interest, favoring a local search approach to model determination. We apply our model to reverse phase protein array data from a study on acute myeloid leukemia. Our inference identifies relevant sub-pathways in relation to the unfolding of the biological process under study