14 research outputs found

    Probabilistic Structured Models for Plant Trait Analysis

    University of Minnesota Ph.D. dissertation. March 2017. Major: Communication Sciences and Disorders. Advisor: Arindam Banerjee. 1 computer file (PDF); xii, 171 pages. Many fields in modern science and engineering, such as ecology, computational biology, astronomy, signal processing, climate science, brain imaging, and natural language processing, involve collecting data sets in which the dimensionality of the data p exceeds the sample size n. Since it is usually impossible to obtain consistent procedures unless p < n, a line of recent work has studied models with various types of low-dimensional structure, including sparse vectors, sparse structured graphical models, low-rank matrices, and combinations thereof. In such settings, a general approach to estimation is to solve a regularized optimization problem, which combines a loss function measuring how well the model fits the data with a regularization function that encourages the assumed structure. Of particular interest is structure learning of graphical models in the high-dimensional setting. Most statistical analyses of graphical model estimation assume that all the data are fully observed and that the data points are sampled from the same distribution, and they provide the sample complexity and convergence rate by considering only one graphical structure for all the observations. In this thesis, we extend these results to estimate the structure of graphical models where the data are partially observed or sampled from multiple distributions. First, we consider the problem of estimating the change in the dependency structure of two p-dimensional models, based on samples drawn from two graphical models. The change is assumed to be structured, e.g., sparse, block sparse, node-perturbed sparse, etc., such that it can be characterized by a suitable (atomic) norm. 
We present and analyze a norm-regularized estimator for directly estimating the change in structure, without having to estimate the structures of the individual graphical models. Next, we consider the problem of estimating the sparse structure of Gaussian copula distributions (corresponding to non-paranormal distributions) using samples with missing values. We prove that our proposed estimators consistently estimate the non-paranormal correlation matrix, where the convergence rate depends on the probability of missing values. In the second part of the thesis, we consider the matrix completion problem. Low-rank matrix completion methods have been successful in a variety of settings such as recommendation systems. However, most existing matrix completion methods provide only a point estimate of the missing entries and do not characterize the uncertainty of the predictions. First, we illustrate that the posterior distribution in latent factor models, such as probabilistic matrix factorization, when marginalized over one latent factor, follows the Matrix Generalized Inverse Gaussian (MGIG) distribution. We show that the MGIG is unimodal, and that the mode can be obtained by solving an Algebraic Riccati Equation. The characterization leads to a novel Collapsed Monte Carlo inference algorithm for such latent factor models. Next, we propose a Bayesian hierarchical probabilistic matrix factorization (BHPMF) model to 1) incorporate hierarchical side information, and 2) provide uncertainty-quantified predictions. The former yields significant performance improvements in the problem of plant trait prediction, a key problem in ecology, by leveraging the taxonomic hierarchy in the plant kingdom. The latter is helpful in identifying predictions of low confidence, which can in turn be used to guide field work for data collection efforts. Finally, we consider applications of probabilistic structured models to plant trait analysis. We apply the BHPMF model to fill the gaps in the TRY database. 
The BHPMF model is the state-of-the-art model for plant trait prediction and is gaining visibility and usage in plant trait analysis. We have submitted an R package for BHPMF to CRAN. Next, we apply the Gaussian graphical model structure estimators to obtain the trait-trait interactions. We study the structure of trait-trait interactions in different climate zones and among different plant growth forms, and uncover the dependence of traits on climate and on vegetation.

    Extracting Signals and Graphical Models from Compressed Measurements

    This thesis gives an integrated approach to efficiently learning the interdependency relations among high-dimensional signal components and reconstructing signals from observations collected in a linear sensing system. Broadly speaking, the research consists of three parts: (i) interdependency relation learning; (ii) sensing system design; and (iii) signal reconstruction. In the interdependency relation learning part, we consider both parametric and non-parametric methods to learn the graphical structure from noisy indirect measurements. In the sensing system design part, we introduce, for the first time, a density evolution framework for designing sensing systems for compressive sensing. In the signal reconstruction part, we focus on signal reconstruction with a given sensing system, which covers three cases: signal reconstruction with inexact knowledge of the sensing system; signal reconstruction with the signal contaminated by undesired noise; and signal reconstruction with the signal belonging to a union of convex sets. Ph.D.

    The Econometrics of Bayesian Graphical Models: A Review With Financial Application

    Recent advances in empirical finance have shown that the adoption of network theory is critical to understanding contagion and systemic vulnerabilities. While interdependencies among financial markets have been widely examined, only a few studies review networks, and they do not focus on the econometric aspects. This paper presents a state-of-the-art review of the interface between statistics and econometrics in the inference and application of Bayesian graphical models. We specifically highlight the connections and possible applications of network models in financial econometrics, in the context of systemic risk.

    Human Guidance Behavior Decomposition and Modeling

    University of Minnesota Ph.D. dissertation. December 2017. Major: Aerospace Engineering. Advisor: Berenice Mettler. 1 computer file (PDF); x, 128 pages. Trained humans are capable of high-performance, adaptable, and robust first-person dynamic motion guidance behavior. This behavior is exhibited in a wide variety of activities such as driving, piloting aircraft, skiing, biking, and many others. Human performance in such activities far exceeds the current capability of autonomous systems in terms of adaptability to new tasks, real-time motion planning, robustness, and trading safety for performance. The present work investigates the structure of human dynamic motion guidance that enables these performance qualities. This work uses a first-person experimental framework that presents a driving task to the subject, measuring control inputs, vehicle motion, and operator visual gaze movement. The resulting data are decomposed into subspace segment clusters that form primitive elements of action-perception interactive behavior. Subspace clusters are defined by both agent-environment system dynamic constraints and operator control strategies. A key contribution of this work is to define transitions between subspace cluster segments, or subgoals, as points where the set of active constraints, either system or operator defined, changes. This definition provides necessary conditions to determine transition points for a given task-environment scenario that allow a solution trajectory to be planned from known behavior elements. In addition, human gaze behavior during this task contains predictive behavior elements, indicating that the identified control modes are internally modeled. Based on these ideas, a generative, autonomous guidance framework is introduced that efficiently generates optimal dynamic motion behavior in new tasks. 
The new subgoal planning algorithm is shown to generate solutions to certain tasks more quickly than existing approaches currently used in robotics.

    Compute Faster and Learn Better: Model-based Nonconvex Optimization for Machine Learning

    Nonconvex optimization naturally arises in many machine learning problems. Machine learning researchers exploit various nonconvex formulations to gain modeling flexibility, estimation robustness, adaptivity, and computational scalability. Although classical computational complexity theory has shown that solving nonconvex optimization problems is generally NP-hard in the worst case, practitioners have proposed numerous heuristic optimization algorithms that achieve outstanding empirical performance in real-world applications. To bridge this gap between practice and theory, we propose a new generation of model-based optimization algorithms and theory that incorporate statistical thinking into modern optimization. In particular, when designing practical computational algorithms, we take the underlying statistical models into consideration. Our novel algorithms exploit hidden geometric structures behind many nonconvex optimization problems, and can obtain global optima with the desired statistical properties in polynomial time with high probability.

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.