2,699 research outputs found
Kernel Belief Propagation
We propose a nonparametric generalization of belief propagation, Kernel
Belief Propagation (KBP), for pairwise Markov random fields. Messages are
represented as functions in a reproducing kernel Hilbert space (RKHS), and
message updates are simple linear operations in the RKHS. KBP makes none of the
assumptions commonly required in classical BP algorithms: the variables need
not arise from a finite domain or a Gaussian distribution, nor must their
relations take any particular parametric form. Rather, the relations between
variables are represented implicitly, and are learned nonparametrically from
training data. KBP has the advantage that it may be used on any domain where
kernels are defined (Rd, strings, groups), even where explicit parametric
models are not known, or closed form expressions for the BP updates do not
exist. The computational cost of message updates in KBP is polynomial in the
training data size. We also propose a constant time approximate message update
procedure by representing messages using a small number of basis functions. In
experiments, we apply KBP to image denoising, depth prediction from still
images, and protein configuration prediction: KBP is faster than competing
classical and nonparametric approaches (by orders of magnitude, in some cases),
while providing significantly more accurate results
Recommended from our members
Optimization for Probabilistic Machine Learning
We have access to great variety of datasets more than any time in the history. Everyday, more data is collected from various natural resources and digital platforms. Great advances in the area of machine learning research in the past few decades have relied strongly on availability of these datasets. However, analyzing them imposes significant challenges that are mainly due to two factors. First, the datasets have complex structures with hidden interdependencies. Second, most of the valuable datasets are high dimensional and are largely scaled. The main goal of a machine learning framework is to design a model that is a valid representative of the observations and develop a learning algorithm to make inference about unobserved or latent data based on the observations. Discovering hidden patterns and inferring latent characteristics in such datasets is one of the greatest challenges in the area of machine learning research. In this dissertation, I will investigate some of the challenges in modeling and algorithm design, and present my research results on how to overcome these obstacles.
Analyzing data generally involves two main stages. The first stage is designing a model that is flexible enough to capture complex variation and latent structures in data and is robust enough to generalize well to the unseen data. Designing an expressive and interpretable model is one of crucial objectives in this stage. The second stage involves training learning algorithm on the observed data and measuring the accuracy of model and learning algorithm. This stage usually involves an optimization problem whose objective is to tune the model to the training data and learn the model parameters. Finding global optimal or sufficiently good local optimal solution is one of the main challenges in this step.
Probabilistic models are one of the best known models for capturing data generating process and quantifying uncertainties in data using random variables and probability distributions. They are powerful models that are shown to be adaptive and robust and can scale well to large datasets. However, most probabilistic models have a complex structure. Training them could become challenging commonly due to the presence of intractable integrals in the calculation. To remedy this, they require approximate inference strategies that often results in non-convex optimization problems. The optimization part ensures that the model is the best representative of data or data generating process. The non-convexity of an optimization problem take away the general guarantee on finding a global optimal solution. It will be shown later in this dissertation that inference for a significant number of probabilistic models require solving a non-convex optimization problem.
One of the well-known methods for approximate inference in probabilistic modeling is variational inference. In the Bayesian setting, the target is to learn the true posterior distribution for model parameters given the observations and prior distributions. The main challenge involves marginalization of all the other variables in the model except for the variable of interest. This high-dimensional integral is generally computationally hard, and for many models there is no known polynomial time algorithm for calculating them exactly. Variational inference deals with finding an approximate posterior distribution for Bayesian models where finding the true posterior distribution is analytically or numerically impossible. It assumes a family of distribution for the estimation, and finds the closest member of that family to the true posterior distribution using a distance measure. For many models though, this technique requires solving a non-convex optimization problem that has no general guarantee on reaching a global optimal solution. This dissertation presents a convex relaxation technique for dealing with hardness of the optimization involved in the inference.
The proposed convex relaxation technique is based on semidefinite optimization that has a general applicability to polynomial optimization problem. I will present theoretical foundations and in-depth details of this relaxation in this work. Linear dynamical systems represent the functionality of many real-world physical systems. They can describe the dynamics of a linear time-varying observation which is controlled by a controller unit with quadratic cost function objectives. Designing distributed and decentralized controllers is the goal of many of these systems, which computationally, results in a non-convex optimization problem. In this dissertation, I will further investigate the issues arising in this area and develop a convex relaxation framework to deal with the optimization challenges.
Setting the correct number of model parameters is an important aspect for a good probabilistic model. If there are only a few parameters, model may lack capturing all the essential relations and components in the observations while too many parameters may cause significant complications in learning or overfit to the observations. Non-parametric models are suitable techniques to deal with this issue. They allow the model to learn the appropriate number of parameters to describe the data and make predictions. In this dissertation, I will present my work on designing Bayesian non-parametric models as powerful tools for learning representations of data. Moreover, I will describe the algorithm that we derived to efficiently train the model on the observations and learn the number of model parameters.
Later in this dissertation, I will present my works on designing probabilistic models in combination with deep learning methods for representing sequential data. Sequential datasets comprise a significant portion of resources in the area of machine learning research. Designing models to capture dependencies in sequential datasets are of great interest and have a wide variety of applications in engineering, medicine and statistics. Recent advances in deep learning research has shown exceptional promises in this area. However, they lack interpretability in their general form. To remedy this, I will present my work on mixing probabilistic models with neural network models that results in better performance and expressiveness of the results
A markov classification model for metabolic pathways
<p>Abstract</p> <p>Background</p> <p>This paper considers the problem of identifying pathways through metabolic networks that relate to a specific biological response. Our proposed model, HME3M, first identifies frequently traversed network paths using a Markov mixture model. Then by employing a hierarchical mixture of experts, separate classifiers are built using information specific to each path and combined into an ensemble prediction for the response.</p> <p>Results</p> <p>We compared the performance of HME3M with logistic regression and support vector machines (SVM) for both simulated pathways and on two metabolic networks, glycolysis and the pentose phosphate pathway for <it>Arabidopsis thaliana</it>. We use AltGenExpress microarray data and focus on the pathway differences in the developmental stages and stress responses of <it>Arabidopsis</it>. The results clearly show that HME3M outperformed the comparison methods in the presence of increasing network complexity and pathway noise. Furthermore an analysis of the paths identified by HME3M for each metabolic network confirmed known biological responses of <it>Arabidopsis</it>.</p> <p>Conclusions</p> <p>This paper clearly shows HME3M to be an accurate and robust method for classifying metabolic pathways. HME3M is shown to outperform all comparison methods and further is capable of identifying known biologically active pathways within microarray data.</p
Learning Instrumental Variables with Structural and Non-Gaussianity Assumptions
Learning a causal effect from observational data requires strong assumptions. One possible method is to use instrumental variables, which are typically justified by background knowledge. It is possible, under further assumptions, to discover whether a variable is structurally instrumental to a target causal effect X→YX→Y. However, the few existing approaches are lacking on how general these assumptions can be, and how to express possible equivalence classes of solutions. We present instrumental variable discovery methods that systematically characterize which set of causal effects can and cannot be discovered under local graphical criteria that define instrumental variables, without reconstructing full causal graphs. We also introduce the first methods to exploit non-Gaussianity assumptions, highlighting identifiability problems and solutions. Due to the difficulty of estimating such models from finite data, we investigate how to strengthen assumptions in order to make the statistical problem more manageable
Optimised Method of Resource Allocation for Hadoop on Cloud
— Many case studies have proved that the data generated at industries and academia are growing rapidly, which are difficult to store using existing database system. Due to the usage of internet many applications are created and has helped many industries such as finance, health care etc, which are also the source of producing massive data. The smart grid is a technology which delivers energy in an optimal manner, phasor measurement unit (PMU) installed in smart grid is used to check the critical power paths and also generate massive sample data. Using parallel detrending fluctuation analysis algorithm (PDFA) fast detection of events from PMU samples are made. Storing and analyzing the events are made easy using MapReduce model, hadoop is an open source implemented MapReduce framework. Many cloud service providers (CSP) are extending their service for Hadoop which makes easy for user’s to run their hadoop application on cloud. The major task is, it is users responsibility to estimate the time and resources required to complete the job within deadlines. In this paper, machine learning techniquies such as local weighted linear regression and the parallel glowworm swarm optimization (GSO) algorithm are used to estimate the resource and job completion time
Recommended from our members
Constrained estimation via the fused lasso and some generalizations
This dissertation studies structurally constrained statistical estimators. Two entwined main themes are developed: computationally efficient algorithms, and strong statistical guarantees of estimators across a wide range of frameworks.
In the first chapter we discuss a unified view of optimization problems that enforces constrains, such as smoothness, in statistical inference. This in turn helps to incorporate spatial and/or temporal information about
data.
The second chapter studies the fused lasso, a non-parametric regression estimator commonly used for graph denoising. This has been widely used in applications where the graph structure indicates that neighbor nodes
have similar signal values. I prove for the fused lasso on arbitrary graphs, an upper bound on the mean squared error that depends on the total variation
of the underlying signal on the graph. Moreover, I provide a surrogate estimator that can be found in linear time and attains the same upper–bound.
In the third chapter I present an approach for penalized tensor decomposition (PTD) that estimates smoothly varying latent factors in multiway data. This generalizes existing work on sparse tensor decomposition
and penalized matrix decomposition, in a manner parallel to the generalized lasso for regression and smoothing problems. I present an efficient coordinate-wise optimization algorithm for PTD, and characterize its convergence properties.
The fourth chapter proposes histogram trend filtering, a novel approach for density estimation. This estimator arises from looking at surrogate Poisson model for counts of observations in a partition of the support
of the data.
The fifth chapter develops a class of estimators for deconvolution in mixture models based on a simple two-step bin-and-smooth procedure, applied to histogram counts. The method is both statistically and computationally efficient. By exploiting recent advances in convex optimization,we are able to provide a full deconvolution path that shows the estimate for
the mixing distribution across a range of plausible degrees of smoothness, at far less cost than a full Bayesian analysis.
Finally, the sixth chapter summarizes my contributions and provides possible directions for future work.Statistic
Geometric deep learning: going beyond Euclidean data
Many scientific fields study data with an underlying structure that is a
non-Euclidean space. Some examples include social networks in computational
social sciences, sensor networks in communications, functional networks in
brain imaging, regulatory networks in genetics, and meshed surfaces in computer
graphics. In many applications, such geometric data are large and complex (in
the case of social networks, on the scale of billions), and are natural targets
for machine learning techniques. In particular, we would like to use deep
neural networks, which have recently proven to be powerful tools for a broad
range of problems from computer vision, natural language processing, and audio
analysis. However, these tools have been most successful on data with an
underlying Euclidean or grid-like structure, and in cases where the invariances
of these structures are built into networks used to model them. Geometric deep
learning is an umbrella term for emerging techniques attempting to generalize
(structured) deep neural models to non-Euclidean domains such as graphs and
manifolds. The purpose of this paper is to overview different examples of
geometric deep learning problems and present available solutions, key
difficulties, applications, and future research directions in this nascent
field
Recommended from our members
Algebraic Statistics
Algebraic Statistics is concerned with the interplay of techniques from commutative algebra, combinatorics, (real) algebraic geometry, and related fields with problems arising in statistics and data science. This workshop was the first at Oberwolfach dedicated to this emerging subject area. The participants highlighted recent achievements in this field, explored exciting new applications, and mapped out future directions for research
- …