143,844 research outputs found
Learning Tuple Probabilities
Learning the parameters of complex probabilistic-relational models from
labeled training data is a standard technique in machine learning, which has
been intensively studied in the subfield of Statistical Relational Learning
(SRL), but---so far---this is still an under-investigated topic in the context
of Probabilistic Databases (PDBs). In this paper, we focus on learning the
probability values of base tuples in a PDB from labeled lineage formulas. The
resulting learning problem can be viewed as the inverse problem to confidence
computations in PDBs: given a set of labeled query answers, learn the
probability values of the base tuples, such that the marginal probabilities of
the query answers again yield the assigned probability labels. We analyze
the learning problem from a theoretical perspective, cast it into an
optimization problem, and provide an algorithm based on stochastic gradient
descent. Finally, we conclude with an experimental evaluation on three real-world datasets and one synthetic dataset, comparing our approach to various techniques from SRL, reasoning in information extraction, and optimization.
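The inverse relationship between labeled lineage formulas and tuple probabilities can be sketched in a few lines. The following is a minimal illustration, not the paper's algorithm: it assumes independent base tuples, read-once lineage formulas (so marginals follow the usual and/or rules), a squared-error objective, and numeric partial derivatives; the toy formulas and labels are hypothetical.

```python
# Minimal sketch (not the paper's algorithm): independent base tuples,
# read-once lineage formulas, squared-error loss, numeric gradients.

def marginal(formula, p):
    # formula: ('var', i) | ('and', f, g) | ('or', f, g)
    op = formula[0]
    if op == 'var':
        return p[formula[1]]
    a, b = marginal(formula[1], p), marginal(formula[2], p)
    return a * b if op == 'and' else a + b - a * b  # tuple independence

def learn(labeled, p, lr=0.5, epochs=1000, eps=1e-6):
    # Gradient descent on (marginal - label)^2 per labeled lineage formula;
    # probabilities are clipped to stay inside (0, 1).
    for _ in range(epochs):
        for f, label in labeled:
            base = marginal(f, p)
            grads = []
            for i in range(len(p)):
                p[i] += eps
                grads.append(2 * (base - label) * (marginal(f, p) - base) / eps)
                p[i] -= eps
            for i, g in enumerate(grads):
                p[i] = min(1 - 1e-9, max(1e-9, p[i] - lr * g))
    return p

# Two base tuples, two labeled query answers (a disjunction and a conjunction).
labeled = [(('or', ('var', 0), ('var', 1)), 0.88),
           (('and', ('var', 0), ('var', 1)), 0.28)]
p = learn(labeled, [0.6, 0.4])
```

After training, evaluating the two lineage formulas under the learned tuple probabilities reproduces the assigned labels, which is exactly the inverse of a confidence computation.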
Task-Driven Dictionary Learning
Modeling data with linear combinations of a few elements from a learned
dictionary has been the focus of much recent research in machine learning,
neuroscience and signal processing. For signals such as natural images that
admit such sparse representations, it is now well established that these models
are well suited to restoration tasks. In this context, learning the dictionary
amounts to solving a large-scale matrix factorization problem, which can be
done efficiently with classical optimization tools. The same approach has also
been used for learning features from data for other purposes, e.g., image
classification, but tuning the dictionary in a supervised way for these tasks
has proven to be more difficult. In this paper, we present a general
formulation for supervised dictionary learning adapted to a wide variety of
tasks, and present an efficient algorithm for solving the corresponding
optimization problem. Experiments on handwritten digit classification, digital
art identification, nonlinear inverse image problems, and compressed sensing
demonstrate that our approach is effective in large-scale settings, and is well
suited to supervised and semi-supervised classification, as well as regression
tasks for data that admit sparse representations.
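The large-scale matrix factorization view mentioned above can be illustrated with the classical unsupervised alternation the paper builds on (not the task-driven formulation itself): ISTA for the sparse-coding stage, then a least-squares dictionary update with atom renormalization. The sizes, regularization weight, and toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 20, 30, 200              # signal dim, number of atoms, signals
Y = rng.standard_normal((n, m))    # toy training signals

def ista(D, Y, lam, iters=50):
    # Sparse coding: ISTA for min_X 0.5*||Y - D X||_F^2 + lam*||X||_1
    X = np.zeros((D.shape[1], Y.shape[1]))
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant
    for _ in range(iters):
        G = X - step * (D.T @ (D @ X - Y))
        X = np.sign(G) * np.maximum(np.abs(G) - step * lam, 0.0)
    return X

D = rng.standard_normal((n, k))
D /= np.linalg.norm(D, axis=0)     # unit-norm atoms
lam = 0.1
for _ in range(10):                # alternate the two stages
    X = ista(D, Y, lam)
    D = Y @ np.linalg.pinv(X)      # least-squares dictionary update
    D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)

X = ista(D, Y, lam)
recon_err = np.linalg.norm(Y - D @ X) / np.linalg.norm(Y)
```

Supervised (task-driven) dictionary learning replaces the reconstruction objective above with a task loss evaluated on the sparse codes, which is what makes the optimization substantially harder.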
Consistency analysis of bilevel data-driven learning in inverse problems
One fundamental problem when solving inverse problems is how to find
regularization parameters. This article considers solving this problem using
data-driven bilevel optimization, i.e. we consider the adaptive learning of the
regularization parameter from data by means of optimization. This approach can
be interpreted as solving an empirical risk minimization problem, and we
analyze its performance in the large data sample size limit for general
nonlinear problems. We demonstrate how to implement our framework on linear
inverse problems, where we can further show that the accuracy of the inversion does not
depend on the ambient space dimension. To reduce the associated computational
cost, online numerical schemes are derived using the stochastic gradient
descent method. We prove convergence of these numerical schemes under suitable
assumptions on the forward problem. Numerical experiments are presented
illustrating the theoretical results and demonstrating the applicability and
efficiency of the proposed approaches for various linear and nonlinear inverse
problems, including Darcy flow, the eikonal equation, and an image denoising
example.
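For the linear case described above, the lower-level Tikhonov problem has a closed-form solution, so the gradient of the empirical risk with respect to the regularization parameter is available analytically and a stochastic gradient scheme is easy to sketch. The toy example below illustrates this idea only, not the paper's implementation; the forward operator, noise level, and step size are all made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, N = 15, 10, 50                       # obs dim, unknown dim, sample count
A = rng.standard_normal((n, d))            # toy linear forward operator
sigma = 1.0                                # noise level
X_true = rng.standard_normal((d, N))       # ground-truth unknowns
Y = A @ X_true + sigma * rng.standard_normal((n, N))

def solve_lower(y, lam):
    # Lower-level Tikhonov solution x(lam) = (A^T A + lam I)^{-1} A^T y
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)

def risk(lam):
    # Empirical risk of the reconstructions over the whole training set
    Xhat = solve_lower(Y, lam)
    return np.mean(np.sum((Xhat - X_true) ** 2, axis=0))

# Online scheme: SGD on theta = log(lam), so lam stays positive.
theta, lr = np.log(0.1), 0.05
for _ in range(1500):
    i = rng.integers(N)
    lam = np.exp(theta)
    H = A.T @ A + lam * np.eye(d)
    x = np.linalg.solve(H, A.T @ Y[:, i])
    dx_dlam = -np.linalg.solve(H, x)            # d x(lam) / d lam
    grad = (x - X_true[:, i]) @ dx_dlam * lam   # chain rule through exp
    theta -= lr * grad
lam_learned = float(np.exp(theta))
```

The learned parameter yields a lower empirical risk than both an essentially unregularized and a heavily over-regularized reconstruction, which is the behavior the consistency analysis formalizes.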
Efficient Exploration of Microstructure-Property Spaces via Active Learning
In materials design, supervised learning plays an important role in the optimization and inverse modeling of microstructure-property relations. To successfully apply supervised learning models, it is essential to train them on suitable data. Here, suitable means that the data covers the microstructure and property space sufficiently and, especially for optimization and inverse modeling, that the property space is explored broadly. For virtual materials design, data is typically generated by numerical simulations, which implies that data pairs can be sampled on demand at arbitrary locations in microstructure space. However, exploring the space of properties remains challenging. To tackle this problem, interactive learning techniques known as active learning can be applied. The present work is the first to investigate the applicability of the active learning strategy query-by-committee for efficient property space exploration. Furthermore, an extension to active learning strategies is described that prevents the exploration of regions with properties out of scope (i.e., properties that are physically not meaningful or not reachable by manufacturing processes).
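A query-by-committee loop of the kind described can be sketched as follows: train a committee on bootstrap resamples of the current data, then query the simulator at the candidate where committee predictions disagree most. The one-dimensional "simulator", the polynomial committee, and all parameters below are hypothetical stand-ins for a real microstructure-property simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(x):
    # Hypothetical stand-in for a numerical microstructure-property simulation
    return np.sin(3 * x) + 0.1 * x

X = list(rng.uniform(0.0, 1.0, 4))       # small initial design, clustered in [0, 1]
y = [simulate(v) for v in X]
candidates = np.linspace(0.0, 3.0, 301)  # full microstructure parameter range

for _ in range(20):
    # Committee: degree-2 polynomial fits on bootstrap resamples of the data.
    preds = []
    for _ in range(8):
        idx = rng.integers(0, len(X), len(X))
        coef = np.polyfit(np.asarray(X)[idx], np.asarray(y)[idx], deg=2)
        preds.append(np.polyval(coef, candidates))
    disagreement = np.var(preds, axis=0)      # committee disagreement per candidate
    x_new = float(candidates[np.argmax(disagreement)])
    X.append(x_new)                           # query the simulator there
    y.append(simulate(x_new))
```

Because the initial design only covers part of the parameter range, the disagreement criterion drives queries into the unexplored region; the out-of-scope extension described in the abstract would additionally mask candidates whose predicted properties are not physically meaningful.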
Inverse Parametric Optimization For Learning Utility Functions From Optimal and Satisficing Decisions
Inverse optimization is a method to determine optimization model parameters from observed decisions. Despite being a learning method, inverse optimization is not part of a data scientist's toolkit in practice,
especially as many general-purpose machine learning packages are widely available as an alternative. In this dissertation, we examine and remedy two aspects of inverse optimization that prevent it from becoming more widely used by practitioners. These aspects are the alternative-based approach to inverse optimization modeling and the assumption that observations should be optimal.
In the first part of the dissertation, we position inverse optimization as a learning method in analogy to supervised machine learning. This part provides a starting point toward identifying the characteristics that make inverse optimization more efficient than general out-of-the-box supervised machine learning approaches, focusing on the problem of imputing the objective function of a parametric convex optimization problem.
The second part of this dissertation provides an attribute-based perspective to inverse optimization modeling. Inverse attribute-based optimization imputes the importance of the decision attributes that result in minimally suboptimal decisions instead of imputing the importance of decisions. This perspective expands the range of inverse optimization applicability. We demonstrate that it facilitates the application of inverse optimization in assortment optimization, where changing product selections is a defining feature and accurate predictions of demand are essential.
Finally, in the third part of the dissertation, we expand inverse parametric optimization to a more general setting in which the assumption that the observations are optimal is relaxed to requiring only feasibility. The proposed inverse satisfaction method can deal with both feasible and minimally suboptimal solutions. We mathematically prove that the inverse satisfaction method provides statistically consistent estimates of the unknown parameters and can learn from both optimal and feasible decisions.
Dictionary optimization for representing sparse signals using Rank-One Atom Decomposition (ROAD)
Dictionary learning has attracted growing research interest in recent years. As it is a bilinear inverse problem, one typical way to address it is to iteratively alternate between two stages: sparse coding and dictionary update. The general principle of the alternating approach is to fix one variable and optimize the other. Unfortunately, for the alternating method, an ill-conditioned dictionary arising during training may not only introduce numerical instability but also trap the overall training process at a singular point. Moreover, it makes convergence difficult to analyze, and few dictionary learning algorithms have been proven to converge globally. For other bilinear inverse problems, such as short-and-sparse deconvolution (SaSD) and convolutional dictionary learning (CDL), the alternating method is still a popular choice. As these bilinear inverse problems are also ill-posed and complicated, they are tricky to handle: additional inner iterative methods are usually required for both updating stages, which aggravates the difficulty of analyzing the convergence of the whole learning process. It is also challenging to determine the number of iterations for each stage, as over-tuning either stage can trap the whole process in a local minimum far from the ground truth.
To mitigate the issues resulting from the alternating method, this thesis proposes a novel algorithm termed rank-one atom decomposition (ROAD), which recasts a bilinear inverse problem as an optimization problem with respect to a single variable, namely a set of rank-one matrices. The resulting algorithm is therefore single-stage: it minimizes the sparsity of the coefficients while maintaining the data consistency constraint throughout the whole learning process. Inspired by recent advances in applying the alternating direction method of multipliers (ADMM) to nonconvex nonsmooth problems, an ADMM solver is adopted to address ROAD problems, and a lower bound on the penalty parameter is derived to guarantee convergence of the augmented Lagrangian despite the nonconvexity of the optimization formulation. Compared to two-stage dictionary learning methods, ROAD simplifies the learning process, eases the difficulty of analyzing convergence, and avoids the singular-point issue. From a practical point of view, ROAD reduces the number of tuning parameters required by other benchmark algorithms. Numerical tests reveal that ROAD outperforms other benchmark algorithms in both synthetic data tests and single-image super-resolution applications. Beyond dictionary learning, the ROAD formulation can also be extended to solve the SaSD and CDL problems, recasting them as one-variable optimization problems as well. Numerical tests illustrate that ROAD has better performance in estimating convolutional kernels than the latest SaSD and CDL algorithms.
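A core primitive behind optimizing over sets of rank-one matrices M_j = d_j c_j^T is projecting an arbitrary matrix onto the rank-one set, which by Eckart-Young is given by the leading singular triplet. The sketch below shows only this projection step, not the ROAD ADMM solver; the matrix sizes and noise level are illustrative.

```python
import numpy as np

def rank_one_project(M):
    # Nearest rank-one matrix in Frobenius norm (Eckart-Young):
    # keep the leading singular triplet of M.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return s[0] * np.outer(U[:, 0], Vt[0])

rng = np.random.default_rng(0)
d = rng.standard_normal(6)          # hypothetical atom
c = rng.standard_normal(8)          # hypothetical coefficient row
M = np.outer(d, c) + 0.01 * rng.standard_normal((6, 8))  # noisy rank-one matrix
R = rank_one_project(M)
```

In a full solver this projection would be applied within the ADMM iterations that also enforce data consistency and coefficient sparsity.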
Learning Tuple Probabilities in Probabilistic Databases
Learning the parameters of complex probabilistic-relational models from labeled training data is a standard technique in machine learning, which has been intensively studied in the subfield of Statistical Relational Learning (SRL), but so far it remains an under-investigated topic in the context of Probabilistic Databases (PDBs). In this paper, we focus on learning the probability values of base tuples in a PDB from query answers, the latter of which are represented as labeled lineage formulas. Specifically, we consider labels in the form of pairs, each consisting of a Boolean lineage formula and a marginal probability that comes attached to the corresponding query answer. The resulting learning problem can be viewed as the inverse problem to confidence computations in PDBs: given a set of labeled query answers, learn the probability values of the base tuples, such that the marginal probabilities of the query answers again yield the assigned probability labels. We analyze the learning problem from a theoretical perspective, devise two optimization-based objectives, and provide an efficient algorithm (based on Stochastic Gradient Descent) for solving these objectives. Finally, we conclude this work with an experimental evaluation on three real-world datasets and one synthetic dataset, comparing our approach against various techniques from SRL, reasoning in information extraction, and optimization.
Learning Linear Programs: Inverse Optimization as a Form of Machine Learning
Conventionally, in an optimization problem, we aim to determine the values of the decision variables to minimize the objective function subject to the constraints. We refer to this problem as the forward optimization problem (FOP). In an inverse optimization (IO) problem, the goal is to determine the coefficients of an FOP such that an observed solution becomes an optimal solution of the learned FOP model.
In this dissertation, we focus on the inverse linear optimization problem whose FOP has the form of linear programming.
We adopt an interdisciplinary approach, leveraging concepts and methods from machine learning to address a very general form of this problem.
Firstly, we study the general form of the inverse linear optimization problem, that is, learning all model coefficients individually or jointly, where the unknown model coefficients may or may not depend on exogenous parameters. We are the first to cast the IO problem as a form of deep learning and solve it with a gradient-based algorithm. To compute the gradients, we differentiate through the steps of an optimization process, in particular, the barrier interior-point method. We develop new sets of benchmark instances and show good performance of our algorithm on three IO tasks: (1) learning the cost vector of a linear program; (2) learning the cost vector and constraints of a linear program jointly; and (3) learning unknown parameters in the objective and constraints of a parametric linear program.
To the best of our knowledge, this algorithm is the first IO approach in the literature to be able to handle all three types of tasks.
Secondly, we formulate the inverse linear optimization problem as a bilevel optimization problem and explicitly encode constraints in the outer problem to ensure that observed solutions remain feasible with respect to the constraints. Again by leveraging a machine learning perspective on inverse linear optimization, we develop a general-purpose framework to solve the bilevel model with gradient-based algorithms. We investigate different methods for differentiating through an optimization problem and specialize them to LP. Additionally, we focus on an objective-based loss function and derive a closed-form expression for computing gradients.
Experimental results show that our framework is capable of solving synthetic parametric linear program and multi-commodity flow problem instances that could not previously be solved by methods in the IO literature. Additionally, we show that our closed-form expression is orders of magnitude faster than other methods for computing the gradients.
Finally, we focus on a special case of learning the objective only. We present four different methods for solving this problem, including three mathematical formulations and one general gradient-based algorithm. We test all four methods on synthetic parametric linear program and multi-commodity flow problem instances, and show that all four methods can successfully solve all experimental instances. All three mathematical models are solved in a commercial optimization solver and, due to their specialized nature, outperform the more generic gradient-based algorithm in runtime. Additionally, we show that the parametric linear programs learned from the KKT-based and strong duality-based formulations produce the best predictions on testing data.
In summary, the main contribution of this dissertation is the development of a general gradient-based framework that is able to solve problems that were previously not tackled in the IO literature. In addition, we extend and unify models for the special case of learning the objective only, which has been the focus of previous IO work. This dissertation unifies machine learning and inverse optimization at both the modelling and algorithmic levels.
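For the special case of learning the objective only, the gradient-based idea can be sketched directly: minimize the suboptimality loss L(c) = c.x_obs - min_x c.x by subgradient descent, where a subgradient is x_obs - argmin_x c.x. The toy polytope below is solved by vertex enumeration rather than an LP solver, the normalization that excludes the trivial c = 0 is a common heuristic, and none of this is the dissertation's algorithm; all data are illustrative.

```python
import numpy as np

vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # toy feasible polytope
x_obs = np.array([1.0, 0.0])       # observed (assumed optimal) decision

def solve_lp(c):
    # The forward LP min_x c.x over the polytope, by vertex enumeration.
    return vertices[np.argmin(vertices @ c)]

c = np.array([-0.3, -0.9])         # initial cost-vector guess
for _ in range(50):
    x_c = solve_lp(c)
    loss = c @ x_obs - c @ x_c     # suboptimality of x_obs under c (>= 0)
    if loss <= 1e-9:
        break                      # x_obs is optimal for the learned c
    c -= 0.2 * (x_obs - x_c)       # subgradient step on L(c)
    c /= np.linalg.norm(c)         # heuristic: exclude the trivial c = 0
```

The loss is convex in c, so the subgradient scheme drives it to zero, at which point the observed decision is optimal for the learned cost vector; the KKT- and duality-based formulations in the dissertation achieve the same goal through a single mathematical program.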