
115 research outputs found

Joint Clustering and Registration of Functional Data

Author: Telesca Donatello
Zhang Yafeng
Publication date: 01/01/2014

Curve registration and clustering are fundamental tools in the analysis of functional data. While several methods have been developed and explored for either task individually, limited work has been done to infer functional clusters and register curves simultaneously. We propose a hierarchical model for joint curve clustering and registration. Our proposal combines a Dirichlet process mixture model for clustering of common shapes with a reproducing kernel representation of phase variability for registration. We show how inference can be carried out by applying standard posterior simulation algorithms, and we compare our method to several alternatives in both engineered data and a benchmark analysis of the Berkeley growth data. We conclude our investigation with an application to time course gene expression.
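The clustering half of this model rests on a Dirichlet process mixture. As a minimal sketch, assuming only a concentration parameter `alpha` (a name introduced here for illustration), the partition that a Dirichlet process prior induces on n curves can be simulated with the Chinese restaurant process below. It covers only the prior over cluster labels; the shape templates and the reproducing-kernel registration step described in the abstract are not modeled.

```python
import random
from collections import Counter


def crp_partition(n, alpha, seed=0):
    """Draw cluster labels for n curves from a Chinese restaurant process.

    `alpha` is an assumed Dirichlet process concentration parameter:
    larger values favour opening more clusters.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of curves already assigned to cluster k
    for _ in range(n):
        # An existing cluster k is chosen with weight counts[k];
        # a new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels


labels = crp_partition(20, alpha=1.0)
print(Counter(labels))  # cluster sizes for one prior draw
```

The "rich get richer" weighting is what lets the number of shape clusters be inferred rather than fixed in advance.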

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California

Bayesian Inference for Latent Biologic Structure with Determinantal Point Processes (DPP)

Author: Mueller Peter
Telesca Donatello
Xu Yanxun
Publication venue: Wiley
Publication date: 16/11/2015

We discuss the use of the determinantal point process (DPP) as a prior for latent structure in biomedical applications, where inference often centers on the interpretation of latent features as biologically or clinically meaningful structure. Typical examples include mixture models in which the terms of the mixture are meant to represent clinically meaningful subpopulations (of patients, genes, etc.). Another class of examples consists of feature allocation models. We propose the DPP prior as a repulsive prior on latent mixture components in the first case, and as a prior on feature-specific parameters in the second. We argue that the DPP is in general an attractive prior model for latent structure when a biologically relevant interpretation of such structure is desired. We illustrate the advantages of the DPP prior in three case studies, including inference in mixture models for magnetic resonance images (MRI) and for protein expression, and a feature allocation model for gene expression using data from The Cancer Genome Atlas. An important part of our argument is the availability of efficient and straightforward posterior simulation methods. We implement a variation of reversible jump Markov chain Monte Carlo simulation for inference under the DPP prior, using a density with respect to the unit rate Poisson process.
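The repulsion that the DPP prior places on latent component locations shows up in the determinant of a kernel matrix. The sketch below assumes a one-dimensional Gaussian kernel with a hypothetical `bandwidth` parameter and computes det(L_A), which is proportional to the weight an L-ensemble DPP assigns to a configuration A. It is not the authors' kernel or their reversible jump sampler; it only shows why nearby locations are penalized: they make the kernel matrix nearly singular.

```python
import math


def gaussian_kernel(points, bandwidth=1.0):
    """Assumed kernel L[i][j] = exp(-(x_i - x_j)^2 / (2 * bandwidth^2))."""
    return [[math.exp(-(a - b) ** 2 / (2 * bandwidth ** 2)) for b in points]
            for a in points]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0  # (numerically) singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def dpp_unnormalized(points, bandwidth=1.0):
    """det(L_A): unnormalized DPP weight of the configuration `points`."""
    return det(gaussian_kernel(points, bandwidth))


close = dpp_unnormalized([0.0, 0.1])  # two nearly coincident components
far = dpp_unnormalized([0.0, 3.0])    # two well-separated components
print(close < far)  # True
```

For two points at distance d this weight reduces to 1 - exp(-d^2 / bandwidth^2), so it vanishes as the components merge.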

arXiv.org e-Print Archive

PubMed Central

eScholarship - University of California

Comparison of Clustering Methods for Time Course Genomic Data: Applications to Aging Effects

Author: Horvath Steve
Ophoff Roel
Telesca Donatello
Zhang Yafeng
Publication date: 29/04/2014

Time course microarray data provide insight into dynamic biological processes. While several clustering methods have been proposed for the analysis of these data structures, comparison and selection of appropriate clustering methods are seldom discussed. We compared 3 probabilistic clustering methods and 3 distance-based clustering methods for time course microarray data. Among probabilistic methods, we considered smoothing spline clustering (SSC), also known as model-based functional data analysis (MFDA); functional clustering models for sparsely sampled data (FCM); and model-based clustering (MCLUST). Among distance-based methods, we considered weighted gene co-expression network analysis (WGCNA), clustering with dynamic time warping distance (DTW), and clustering with autocorrelation-based distance (ACF). We studied these algorithms in both simulated settings and case study data. Our investigations showed that FCM performed very well when gene curves were short and sparse. DTW and WGCNA performed well when gene curves were medium or long (at least 10 observations). SSC performed very well when there were clusters of gene curves similar to one another. Overall, ACF performed poorly in these applications. In terms of computation time, FCM, SSC, and DTW were considerably slower than MCLUST and WGCNA. WGCNA outperformed MCLUST by generating more accurate and biologically meaningful clustering results. When performance and computation time are both taken into account, WGCNA and MCLUST are the best of the 6 methods compared. WGCNA outperforms MCLUST, but MCLUST provides model-based inference and uncertainty measures for the clustering results.
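Of the distance-based methods compared, DTW is the simplest to state. The abstract does not say which DTW variant was used, so the following assumes the textbook dynamic-programming recursion with an absolute-difference local cost; it is a sketch, not the study's implementation. Unlike Euclidean distance, it can compare gene curves of different lengths and tolerates shifts in timing.

```python
def dtw_distance(x, y):
    """Dynamic time warping distance with absolute-difference local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = smallest cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance x only
                cost[i][j - 1],      # advance y only
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


# Same expression profile, with the second gene delayed by one time point.
print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))  # 0.0
```

A matrix of pairwise `dtw_distance` values could then be passed to any distance-based clustering routine, such as hierarchical clustering.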

arXiv.org e-Print Archive

eScholarship - University of California

Modeling dependent gene expression

Author: Freedman Ralph S.
Müller Peter
Parmigiani Giovanni
Telesca Donatello
Publication venue: Institute of Mathematical Statistics
Publication date: 28/06/2012

In this paper we propose a Bayesian approach for inference about dependence in high-throughput gene expression. Our goals are to use prior knowledge about pathways to anchor inference about dependence among genes; to account for this dependence while making inferences about differences in mean expression across phenotypes; and to explore differences in the dependence itself across phenotypes. Useful features of the proposed approach are a model-based parsimonious representation of expression as an ordinal outcome, a novel and flexible representation of prior information on the nature of dependencies, and the use of a coherent probability model over both the structure and strength of the dependencies of interest. We evaluate our approach through simulations and in the analysis of data on expression of genes in the Complement and Coagulation Cascade pathway in ovarian cancer.

Comment: Published at http://dx.doi.org/10.1214/11-AOAS525 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Modeling Criminal Careers as Departures from a Unimodal Population Age-Crime Curve: The Case of Marijuana Use

Author: Erosheva Elena
Kreager Derek
Matsueda Ross
Telesca Donatello
Publication venue: Collection of Biostatistics Research Archive
Publication date: 10/12/2011

A major aim of longitudinal analyses of life course data is to describe the within- and between-individual variability in a behavioral outcome, such as crime. Statistical analyses of such data typically draw on mixture and mixed-effects growth models. In this work, we present a functional analytic point of view and develop an alternative method that models individual crime trajectories as departures from a population age-crime curve. Drawing on empirical and theoretical claims in criminology, we assume a unimodal population age-crime curve and allow individual expected crime trajectories to differ by their levels of offending and patterns of temporal misalignment. We extend Bayesian hierarchical curve registration methods to accommodate count data and to incorporate the influence of baseline covariates on individual behavioral trajectories. Analyzing self-reported counts of yearly marijuana use from the Denver Youth Survey, we examine the influence of race and gender categories on differences in levels and timing of marijuana smoking. We find that our approach offers a flexible and realistic model for longitudinal crime trajectories that fits individual observations well and allows for a rich array of inferences of interest to criminologists and drug abuse researchers.
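The idea of individual trajectories as departures from one unimodal population curve can be sketched numerically. Everything below is an assumption made for illustration: a Gaussian-shaped population rate with hypothetical `peak`, `width`, and `log_height` values, an individual level multiplier `exp(level)`, and an affine time warp with `shift` and `speed`. In a count model, an individual's yearly counts might then be treated as, for example, Poisson with this mean; the paper's actual warping functions and likelihood may differ.

```python
import math


def population_curve(age, peak=18.0, width=3.0, log_height=1.0):
    """Assumed unimodal population age-crime curve (Gaussian-shaped rate)."""
    return math.exp(log_height - (age - peak) ** 2 / (2 * width ** 2))


def individual_rate(age, level, shift, speed=1.0, peak=18.0):
    """Expected yearly count for one person.

    The population curve is evaluated at the warped age
    h(age) = peak + speed * (age - peak - shift), so this person's peak
    falls at age peak + shift; exp(level) scales their overall offending.
    """
    warped_age = peak + speed * (age - peak - shift)
    return math.exp(level) * population_curve(warped_age, peak=peak)


ages = [10 + 0.5 * k for k in range(41)]  # ages 10 to 30 in half-year steps
late_starter = [individual_rate(a, level=0.0, shift=2.0) for a in ages]
print(ages[late_starter.index(max(late_starter))])  # 20.0
```

Separating `level` (how much) from `shift` and `speed` (when) mirrors the abstract's distinction between levels of offending and patterns of temporal misalignment.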

Collection Of Biostatistics Research Archive


Has Toxicity Testing Moved into the 21st Century? A Survey and Analysis of Perceptions in the Field of Toxicology.

Author: Allard Patrick
Beryt Elizabeth
Doherty Joseph
Malloy Timothy
Parodi Daniela
Telesca Donatello
Zaunbrecher Virginia
Publication venue: eScholarship, University of California
Publication date: 01/08/2017

Background: Ten years ago, leaders in the field of toxicology called for a transformation of the discipline and a shift from primarily relying on traditional animal testing to incorporating advances in biotechnology and predictive methodologies into alternative testing strategies (ATS). Governmental agencies and academic and industry partners initiated programs to support such a transformation, but a decade later, the outcomes of these efforts are not well understood.

Objectives: We aimed to assess the use of ATS and the perceived barriers and drivers to their adoption by toxicologists and by others working in, or closely linked with, the field of toxicology.

Methods: We surveyed 1,381 toxicologists and experts in associated fields regarding the viability and use of ATS and the perceived barriers and drivers of ATS for a range of applications. We performed ranking, hierarchical clustering, and correlation analyses of the survey data.

Results: Many respondents indicated that they were already using ATS, or believed that ATS were already viable approaches, for toxicological assessment of one or more end points in their primary area of interest or concern (26-86%, depending on the specific ATS/application pair). However, the proportions of respondents reporting use of ATS in the previous 12 mo were smaller (4.5-41%). Concern about regulatory acceptance was the most commonly cited factor inhibiting the adoption of ATS, and a variety of technical concerns were also cited as significant barriers to ATS viability. The factors most often cited as playing a significant role (currently or in the future) in driving the adoption of ATS were the need for expedited toxicology information, the need for reduced toxicity testing costs, demand by regulatory agencies, and ethical or moral concerns.

Conclusions: Our findings indicate that the transformation of the field of toxicology is partly implemented, but significant barriers to acceptance and adoption remain. https://doi.org/10.1289/EHP1435

eScholarship - University of California

Region-Referenced Spectral Power Dynamics of EEG Signals: A Hierarchical Modeling Approach

Author: DiStefano Charlotte
Jeste Shafali
Li Qian
Senturk Damla
Shamshoian John
Sugar Catherine
Telesca Donatello
Publication date: 22/07/2020

Functional brain imaging through electroencephalography (EEG) relies upon the analysis and interpretation of high-dimensional, spatially organized time series. We propose to represent time-localized frequency domain characterizations of EEG data as region-referenced functional data. This representation is coupled with a hierarchical modeling approach to multivariate functional observations. Within this familiar setting, we discuss how several prior models relate to structural assumptions about multivariate covariance operators. Finally, we propose an overarching modeling framework, based on infinite factorial decompositions, to balance flexibility and efficiency in estimation. The motivating application stems from a study of implicit auditory learning, in which typically developing (TD) children and children with autism spectrum disorder (ASD) were exposed to a continuous speech stream. Using the proposed model, we examine differential band power dynamics as brain function is interrogated throughout the duration of a computer-controlled experiment. Our work offers a novel look at previous findings in psychiatry and provides further insights into the understanding of ASD. Our approach to inference is fully Bayesian and implemented in a highly optimized Rcpp package.
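A time-localized frequency domain summary of the kind described here can be as simple as band power computed over successive windows. The sketch below assumes a periodogram built from a plain discrete Fourier transform, a hypothetical 100 Hz sampling rate, and conventional 8-12 Hz (alpha) and 13-30 Hz (beta) band limits; it is not the preprocessing used in the study or in its Rcpp package. Each window yields one value, so the output of `band_power_trajectory` traces a band power curve over the course of a recording.

```python
import math


def band_power(signal, fs, low, high):
    """Sum of periodogram ordinates |X_k|^2 / n for frequencies in [low, high] Hz.

    Uses a plain O(n^2) discrete Fourier transform, for illustration only.
    """
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):  # non-negative frequencies up to Nyquist
        freq = k * fs / n
        if low <= freq <= high:
            re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
            im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
            power += (re * re + im * im) / n
    return power


def band_power_trajectory(signal, fs, window, low, high):
    """Band power over consecutive non-overlapping windows of `window` samples."""
    return [band_power(signal[s:s + window], fs, low, high)
            for s in range(0, len(signal) - window + 1, window)]


fs = 100.0  # assumed sampling rate in Hz
# One second of a 10 Hz sinusoid, i.e. pure alpha-band activity.
segment = [math.sin(2 * math.pi * 10 * t / fs) for t in range(100)]
alpha = band_power(segment, fs, 8, 12)
beta = band_power(segment, fs, 13, 30)
print(alpha > beta)  # True
```

Periodogram scaling conventions vary; only relative comparisons across windows, bands, or groups are meaningful under a fixed convention like this one.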

arXiv.org e-Print Archive

eScholarship - University of California

Modeling Dependent Gene Expression

Author: Freedman Ralph S.
Muller Peter
Parmigiani Giovanni
Telesca Donatello
Publication venue: Collection of Biostatistics Research Archive
Publication date: 11/02/2010

Collection Of Biostatistics Research Archive

Modeling Protein Expression and Protein Signaling Pathways

Author: Ji Yuan
Kornblau Steven
Muller Peter
Suchard Marc
Telesca Donatello
Publication venue: Collection of Biostatistics Research Archive
Publication date: 01/01/2011

High-throughput functional proteomic technologies provide a way to quantify the expression of proteins of interest. Statistical inference centers on identifying the activation state of proteins and their patterns of molecular interaction formalized as dependence structure. Inference on dependence structure is particularly important when proteins are selected because they are part of a common molecular pathway. In that case, inference on dependence structure reveals properties of the underlying pathway. We propose a probability model that represents molecular interactions at the level of hidden binary latent variables that can be interpreted as indicators for active versus inactive states of the proteins. The proposed approach exploits available expert knowledge about the target pathway to define an informative prior on the hidden conditional dependence structure. An important feature of this prior is that it provides an instrument to explicitly anchor the model space to a set of interactions of interest, favoring a local search approach to model determination. We apply our model to reverse phase protein array data from a study on acute myeloid leukemia. Our inference identifies relevant sub-pathways in relation to the unfolding of the biological process under study.

Crossref

PubMed Central

Collection Of Biostatistics Research Archive