15,292 research outputs found

    Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures

    Get PDF
    Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture (DPM) models provide a nonparametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper, we present a case study of the application of nonparametric Bayesian clustering methods to the clustering of high-dimensional nontime series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a DPM model as a measure of the similarity of these gene expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results are obtained from the Rosetta compendium of expression profiles which extend previously published cluster analyses of this data

    Efficient Correlated Topic Modeling with Topic Embedding

    Full text link
    Correlated topic modeling has been limited to small model and problem sizes due to their high computational cost and poor scaling. In this paper, we propose a new model which learns compact topic embeddings and captures topic correlations through the closeness between the topic vectors. Our method enables efficient inference in the low-dimensional embedding space, reducing previous cubic or quadratic time complexity to linear w.r.t the topic size. We further speedup variational inference with a fast sampler to exploit sparsity of topic occurrence. Extensive experiments show that our approach is capable of handling model and data scales which are several orders of magnitude larger than existing correlation results, without sacrificing modeling quality by providing competitive or superior performance in document classification and retrieval.Comment: KDD 2017 oral. The first two authors contributed equall

    Multivariate adaptive regression splines for estimating riverine constituent concentrations

    Get PDF
    Regression-based methods are commonly used for riverine constituent concentration/flux estimation, which is essential for guiding water quality protection practices and environmental decision making. This paper developed a multivariate adaptive regression splines model for estimating riverine constituent concentrations (MARS-EC). The process, interpretability and flexibility of the MARS-EC modelling approach, was demonstrated for total nitrogen in the Patuxent River, a major river input to Chesapeake Bay. Model accuracy and uncertainty of the MARS-EC approach was further analysed using nitrate plus nitrite datasets from eight tributary rivers to Chesapeake Bay. Results showed that the MARS-EC approach integrated the advantages of both parametric and nonparametric regression methods, and model accuracy was demonstrated to be superior to the traditionally used ESTIMATOR model. MARS-EC is flexible and allows consideration of auxiliary variables; the variables and interactions can be selected automatically. MARS-EC does not constrain concentration-predictor curves to be constant but rather is able to identify shifts in these curves from mathematical expressions and visual graphics. The MARS-EC approach provides an effective and complementary tool along with existing approaches for estimating riverine constituent concentrations

    Regularized parametric system identification: a decision-theoretic formulation

    Full text link
    Parametric prediction error methods constitute a classical approach to the identification of linear dynamic systems with excellent large-sample properties. A more recent regularized approach, inspired by machine learning and Bayesian methods, has also gained attention. Methods based on this approach estimate the system impulse response with excellent small-sample properties. In several applications, however, it is desirable to obtain a compact representation of the system in the form of a parametric model. By viewing the identification of such models as a decision, we develop a decision-theoretic formulation of the parametric system identification problem that bridges the gap between the classical and regularized approaches above. Using the output-error model class as an illustration, we show that this decision-theoretic approach leads to a regularized method that is robust to small sample-sizes as well as overparameterization.Comment: 10 pages, 8 figure

    Modeling Dynamic Functional Connectivity with Latent Factor Gaussian Processes

    Get PDF
    Dynamic functional connectivity, as measured by the time-varying covariance of neurological signals, is believed to play an important role in many aspects of cognition. While many methods have been proposed, reliably establishing the presence and characteristics of brain connectivity is challenging due to the high dimensionality and noisiness of neuroimaging data. We present a latent factor Gaussian process model which addresses these challenges by learning a parsimonious representation of connectivity dynamics. The proposed model naturally allows for inference and visualization of time-varying connectivity. As an illustration of the scientific utility of the model, application to a data set of rat local field potential activity recorded during a complex non-spatial memory task provides evidence of stimuli differentiation

    Massive Science with VO and Grids

    Full text link
    There is a growing need for massive computational resources for the analysis of new astronomical datasets. To tackle this problem, we present here our first steps towards marrying two new and emerging technologies; the Virtual Observatory (e.g, AstroGrid) and the computational grid (e.g. TeraGrid, COSMOS etc.). We discuss the construction of VOTechBroker, which is a modular software tool designed to abstract the tasks of submission and management of a large number of computational jobs to a distributed computer system. The broker will also interact with the AstroGrid workflow and MySpace environments. We discuss our planned usages of the VOTechBroker in computing a huge number of n-point correlation functions from the SDSS data and massive model-fitting of millions of CMBfast models to WMAP data. We also discuss other applications including the determination of the XMM Cluster Survey selection function and the construction of new WMAP maps.Comment: Invited talk at ADASSXV conference published as ASP Conference Series, Vol. XXX, 2005 C. Gabriel, C. Arviset, D. Ponz and E. Solano, eds. 9 page

    Feature discovery and visualization of robot mission data using convolutional autoencoders and Bayesian nonparametric topic models

    Full text link
    The gap between our ability to collect interesting data and our ability to analyze these data is growing at an unprecedented rate. Recent algorithmic attempts to fill this gap have employed unsupervised tools to discover structure in data. Some of the most successful approaches have used probabilistic models to uncover latent thematic structure in discrete data. Despite the success of these models on textual data, they have not generalized as well to image data, in part because of the spatial and temporal structure that may exist in an image stream. We introduce a novel unsupervised machine learning framework that incorporates the ability of convolutional autoencoders to discover features from images that directly encode spatial information, within a Bayesian nonparametric topic model that discovers meaningful latent patterns within discrete data. By using this hybrid framework, we overcome the fundamental dependency of traditional topic models on rigidly hand-coded data representations, while simultaneously encoding spatial dependency in our topics without adding model complexity. We apply this model to the motivating application of high-level scene understanding and mission summarization for exploratory marine robots. Our experiments on a seafloor dataset collected by a marine robot show that the proposed hybrid framework outperforms current state-of-the-art approaches on the task of unsupervised seafloor terrain characterization.Comment: 8 page
    • …
    corecore