Efficient Low-Rank Semidefinite Programming with Robust Loss Functions
In real-world applications, it is important for machine learning algorithms
to be robust against data outliers or corruptions. In this paper, we focus on
improving the robustness of a large class of learning algorithms that are
formulated as low-rank semi-definite programming (SDP) problems. Traditional
formulations use square loss, which is notorious for being sensitive to
outliers. We propose to replace this with more robust noise models, including
the ℓ1-loss and other nonconvex losses. However, the resulting
optimization problem becomes difficult as the objective is no longer convex or
smooth. To alleviate this problem, we design an efficient algorithm based on
majorization-minimization. The crux lies in constructing a good optimization
surrogate, and we show that this surrogate can be efficiently obtained by the
alternating direction method of multipliers (ADMM). By properly monitoring
ADMM's convergence, the proposed algorithm is empirically efficient and also
theoretically guaranteed to converge to a critical point. Extensive experiments
are performed on four machine learning applications using both synthetic and
real-world data sets. Results show that the proposed algorithm is not only fast
but also outperforms the state of the art.
Comment: Preprint version. Final version is accepted to "IEEE Transactions on
Pattern Analysis and Machine Intelligence".
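The abstract above swaps the square loss for the ℓ1 loss and handles the resulting nonsmooth objective by majorization-minimization (MM). The sketch below shows only the MM principle on a deliberately tiny problem, estimating a single location parameter under the ℓ1 loss; it is not the paper's low-rank SDP algorithm and uses no ADMM. The function name `mm_l1_location` and the floor `eps` are assumptions of this sketch.

```python
def mm_l1_location(xs, iters=100, eps=1e-9):
    """Minimize sum_i |x_i - m| over m by majorization-minimization.

    At iterate m_k, each |r| is majorized by r**2 / (2*|r_k|) + |r_k| / 2,
    which touches |r| at r = r_k. Minimizing the summed quadratic surrogate
    gives a weighted mean with weights 1 / |x_i - m_k|. The floor `eps`
    (an assumption of this sketch) avoids division by zero.
    """
    m = sum(xs) / len(xs)  # start from the square-loss (mean) estimate
    for _ in range(iters):
        weights = [1.0 / max(abs(x - m), eps) for x in xs]
        m = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    return m


data = [1.0, 2.0, 3.0, 4.0, 100.0]  # one gross outlier
mean = sum(data) / len(data)        # square-loss minimizer, pulled by the outlier
robust = mm_l1_location(data)       # l1-loss minimizer, i.e. the median
print(mean, round(robust, 6))
```

Each MM step solves an easy weighted least-squares surrogate, which is the same "hard objective, easy surrogate" pattern the abstract describes at much larger scale.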
Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment
In most machine learning training paradigms a fixed, often handcrafted, loss
function is assumed to be a good proxy for an underlying evaluation metric. In
this work we assess this assumption by meta-learning an adaptive loss function
to directly optimize the evaluation metric. We propose a sample-efficient
reinforcement learning approach for adapting the loss dynamically during
training. We empirically show how this formulation improves performance by
simultaneously optimizing the evaluation metric and smoothing the loss
landscape. We verify our method in metric learning and classification
scenarios, showing considerable improvements over the state-of-the-art on a
diverse set of tasks. Importantly, our method is applicable to a wide range of
loss functions and evaluation metrics. Furthermore, the learned policies are
transferable across tasks and data, demonstrating the versatility of the
method.
Comment: Accepted to ICML 201
Consistent Second-Order Conic Integer Programming for Learning Bayesian Networks
Bayesian Networks (BNs) represent conditional probability relations among a
set of random variables (nodes) in the form of a directed acyclic graph (DAG),
and have found diverse applications in knowledge discovery. We study the
problem of learning the sparse DAG structure of a BN from continuous
observational data. The central problem can be modeled as a mixed-integer
program with an objective function composed of a convex quadratic loss function
and a regularization penalty subject to linear constraints. The optimal
solution to this mathematical program is known to have desirable statistical
properties under certain conditions. However, the state-of-the-art optimization
solvers are not able to obtain provably optimal solutions to the existing
mathematical formulations for medium-size problems within reasonable
computational times. To address this difficulty, we tackle the problem from
both computational and statistical perspectives. On the one hand, we propose a
concrete early stopping criterion to terminate the branch-and-bound process in
order to obtain a near-optimal solution to the mixed-integer program, and
establish the consistency of this approximate solution. On the other hand, we
improve the existing formulations by replacing the linear "big-M" constraints
that represent the relationship between the continuous and binary indicator
variables with second-order conic constraints. Our numerical results
demonstrate the effectiveness of the proposed approaches.
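To make the big-M versus conic distinction concrete, the block below writes both constraint families in a notation that is our assumption, not necessarily the paper's: β_jk is the coefficient of arc j → k, z_jk ∈ {0,1} indicates whether that arc is present, M is a large constant bound, and s_jk is an auxiliary variable standing in for β_jk² in a separable quadratic part of the objective. The conic form shown is the standard perspective-style reformulation; the paper's exact constraints may differ.

```latex
% big-M (linear): M must be large enough not to cut off the optimal coefficients
-M\,z_{jk} \;\le\; \beta_{jk} \;\le\; M\,z_{jk}, \qquad z_{jk} \in \{0,1\}

% conic (rotated second-order cone): z_{jk} = 0 forces \beta_{jk} = 0 without any bound M
\beta_{jk}^{2} \;\le\; s_{jk}\, z_{jk}, \qquad s_{jk} \ge 0,\; z_{jk} \in \{0,1\}
```

Both families force β_jk = 0 when the arc is absent; the conic version typically yields a tighter continuous relaxation, which is what helps branch-and-bound close the optimality gap.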
A Survey on State Estimation Techniques and Challenges in Smart Distribution Systems
This paper presents a review of the literature on State Estimation (SE) in
power systems. While covering some works related to SE in transmission systems,
the main focus of this paper is Distribution System State Estimation (DSSE).
The paper discusses a few critical topics of DSSE, including mathematical
problem formulation, application of pseudo-measurements, metering instrument
placement, network topology issues, impacts of renewable penetration, and
cyber-security. Both conventional techniques and modern data-driven and
probabilistic techniques are reviewed. This paper can provide researchers and
utility engineers with insights into the technical achievements, barriers, and
future research directions of DSSE.
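Conventional state estimation is commonly posed as weighted least squares (WLS). Below is a minimal sketch of a linear WLS estimator with two state variables, assuming a linear(ized) measurement model z ≈ Hx; the function name is hypothetical, and real DSSE measurement functions are nonlinear with many more states. Giving a pseudo-measurement a small weight is one common way to express low confidence in it.

```python
def wls_estimate(H, z, w):
    """Weighted least squares: minimize sum_i w_i * (z_i - H_i x)^2 for a 2-state x.

    Solves the normal equations (H^T W H) x = H^T W z with an explicit 2x2 inverse.
    """
    # Accumulate G = H^T W H (symmetric 2x2) and b = H^T W z (length 2).
    g11 = g12 = g22 = b1 = b2 = 0.0
    for (h1, h2), zi, wi in zip(H, z, w):
        g11 += wi * h1 * h1
        g12 += wi * h1 * h2
        g22 += wi * h2 * h2
        b1 += wi * h1 * zi
        b2 += wi * h2 * zi
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("state is unobservable with these measurements")
    return ((g22 * b1 - g12 * b2) / det, (g11 * b2 - g12 * b1) / det)


# Two direct measurements plus one difference measurement, all consistent.
H = [(1.0, 0.0), (0.0, 1.0), (1.0, -1.0)]
x1, x2 = wls_estimate(H, [1.0, 0.5, 0.5], [1.0, 1.0, 1.0])

# Add a low-confidence pseudo-measurement of the second state.
p1, p2 = wls_estimate(H + [(0.0, 1.0)], [1.0, 0.5, 0.5, 0.4], [1.0, 1.0, 1.0, 0.01])
```

The pseudo-measurement pulls the second state only slightly toward 0.4 because its weight is small relative to the real measurements.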
Hardware-Aware Machine Learning: Modeling and Optimization
Recent breakthroughs in Deep Learning (DL) applications have made DL models a
key component in almost every modern computing system. The increased popularity
of DL applications deployed on a wide spectrum of platforms has resulted in a
plethora of design challenges related to the constraints introduced by the
hardware itself. What is the latency or energy cost for an inference made by a
Deep Neural Network (DNN)? Is it possible to predict this latency or energy
consumption before a model is trained? If yes, how can machine learners take
advantage of these models to design the hardware-optimal DNN for deployment?
From lengthening battery life of mobile devices to reducing the runtime
requirements of DL models executing in the cloud, the answers to these
questions have drawn significant attention.
One cannot optimize what isn't properly modeled. Therefore, it is important
to understand the hardware efficiency of DL models at inference time, even
before the model is trained. This key observation has motivated
the use of predictive models to capture the hardware performance or energy
efficiency of DL applications. Furthermore, DL practitioners face
the task of designing the DNN model, i.e., of tuning the hyper-parameters
of the DNN architecture, while optimizing for both accuracy of the DL model and
its hardware efficiency. Therefore, state-of-the-art methodologies have
proposed hardware-aware hyper-parameter optimization techniques. In this paper,
we provide a comprehensive assessment of state-of-the-art work and selected
results on the hardware-aware modeling and optimization for DL applications. We
also highlight several open questions that are poised to give rise to novel
hardware-aware designs in the next few years, as DL applications continue to
significantly impact associated hardware systems and platforms.
Comment: ICCAD'18 Invited Paper
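The idea of predicting hardware cost before training, then using that prediction inside hyper-parameter search, can be illustrated with the simplest possible predictor. This sketch assumes latency is roughly linear in FLOPs, a strong simplification of the models the survey covers, and every number in it (profiling data, candidate names, accuracies) is hypothetical, chosen only to make the example checkable.

```python
def fit_linear(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx


def pick_under_budget(candidates, predict, budget_ms):
    """Return the most accurate candidate whose predicted latency fits the budget."""
    feasible = [c for c in candidates if predict(c["gflops"]) <= budget_ms]
    return max(feasible, key=lambda c: c["accuracy"]) if feasible else None


# Hypothetical profiling data: GFLOPs vs measured latency in milliseconds.
gflops = [1.0, 2.0, 3.0, 4.0]
latency_ms = [2.5, 4.5, 6.5, 8.5]
slope, intercept = fit_linear(gflops, latency_ms)


def predict(g):
    """Predicted latency (ms) for a model of g GFLOPs, no training needed."""
    return slope * g + intercept


# Hypothetical architecture candidates with made-up accuracies.
candidates = [
    {"name": "small", "gflops": 1.5, "accuracy": 0.70},
    {"name": "medium", "gflops": 3.0, "accuracy": 0.75},
    {"name": "large", "gflops": 5.0, "accuracy": 0.80},
]
best = pick_under_budget(candidates, predict, budget_ms=7.0)
```

The design point is that the latency check runs on the predictor alone, so infeasible candidates can be discarded before any of them is trained.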
Machine learning for cognitive networks: technology assessment and research challenges
The field of machine learning has made major strides over the last 20 years. This document summarizes the major problem formulations that the discipline has studied, then reviews three tasks in cognitive networking and briefly discusses how aspects of those tasks fit these formulations. After this, it discusses challenges for machine learning research raised by Knowledge Plane applications and closes with proposals for the evaluation of learning systems developed for these problems.
Cautious NMPC with Gaussian Process Dynamics for Autonomous Miniature Race Cars
This paper presents an adaptive high-performance control method for
autonomous miniature race cars. Racing dynamics are notoriously hard to model
from first principles, which is addressed by means of a cautious nonlinear
model predictive control (NMPC) approach that learns to improve its dynamics
model from data and safely increases racing performance. The approach makes use
of a Gaussian Process (GP) and takes residual model uncertainty into account
through a chance constrained formulation. We present a sparse GP approximation
with dynamically adjusting inducing inputs, enabling a real-time implementable
controller. The formulation is demonstrated in simulations, which show
significant improvement with respect to both lap time and constraint
satisfaction compared to an NMPC without model learning.
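The cautious part of such a controller relies on the GP returning a variance as well as a mean. The sketch below is plain exact GP regression on a scalar input with a squared-exponential kernel, plus a helper that tightens a constraint limit by a multiple of the predictive standard deviation. It is not the paper's sparse GP with dynamically adjusted inducing inputs; the kernel hyperparameters, `beta`, and all function names are assumptions.

```python
import math


def rbf(a, b, length=1.0, sigma_f=1.0):
    """Squared-exponential kernel on scalar inputs."""
    return sigma_f ** 2 * math.exp(-0.5 * (a - b) ** 2 / length ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(xs, ys, x_star, noise=1e-2, length=1.0, sigma_f=1.0):
    """Exact GP posterior mean and variance at x_star (zero prior mean)."""
    K = [[rbf(a, b, length, sigma_f) + (noise ** 2 if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward(L, forward(L, ys))  # alpha = K^{-1} y
    k_star = [rbf(x_star, a, length, sigma_f) for a in xs]
    mean = sum(k * al for k, al in zip(k_star, alpha))
    v = forward(L, k_star)
    var = rbf(x_star, x_star, length, sigma_f) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)  # clip tiny negative values from round-off


def tightened_limit(limit, var, beta=2.0):
    """Cautious constraint: back off the limit by beta predictive std devs."""
    return limit - beta * math.sqrt(var)
```

Far from the training inputs the predictive variance returns to the prior level, so the tightened limit automatically becomes more conservative exactly where the learned residual model knows least.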
Learning Features for Offline Handwritten Signature Verification using Deep Convolutional Neural Networks
Verifying the identity of a person using handwritten signatures is
challenging in the presence of skilled forgeries, where a forger has access to
a person's signature and deliberately attempts to imitate it. In offline
(static) signature verification, the dynamic information of the signature
writing process is lost, and it is difficult to design good feature extractors
that can distinguish genuine signatures from skilled forgeries. This is reflected
in relatively poor performance, with verification errors around 7% in the best
systems in the literature. To address the difficulty of obtaining good
features and to improve system performance, we propose learning the
representations from signature images, in a Writer-Independent format, using
Convolutional Neural Networks. In particular, we propose a novel formulation of
the problem that includes knowledge of skilled forgeries from a subset of users
in the feature learning process, which aims to capture visual cues that
distinguish genuine signatures from forgeries regardless of the user. Extensive
experiments were conducted on four datasets: GPDS, MCYT, CEDAR and Brazilian
PUC-PR datasets. On GPDS-160, we obtained a large improvement in
state-of-the-art performance, achieving 1.72% Equal Error Rate, compared to
6.97% in the literature. We also verified that the features generalize beyond
the GPDS dataset, surpassing the state-of-the-art performance in the other
datasets, without requiring the representation to be fine-tuned to each
particular dataset.
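The 1.72% and 6.97% figures above are Equal Error Rates (EER), the operating point where the false acceptance rate equals the false rejection rate. Here is a minimal sketch of estimating an EER from verification scores, assuming higher scores mean "more likely genuine"; the function name and the plain threshold sweep are our own choices, and published protocols may interpolate between thresholds instead.

```python
def equal_error_rate(genuine, forgery):
    """Approximate EER by sweeping thresholds over all observed scores.

    A signature is accepted when its score >= threshold. FRR is the share of
    genuine signatures rejected; FAR is the share of forgeries accepted. The EER
    is taken at the threshold where the two rates are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(forgery)):
        frr = sum(s < t for s in genuine) / len(genuine)
        far = sum(s >= t for s in forgery) / len(forgery)
        if best is None or abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]


separable = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
overlapping = equal_error_rate([0.4, 0.6, 0.8, 0.9], [0.1, 0.3, 0.5, 0.7])
```

Perfectly separated scores give an EER of zero; any overlap between genuine and forgery scores pushes it up.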
Feature-based tuning of simulated annealing applied to the curriculum-based course timetabling problem
We consider the university course timetabling problem, which is one of the
most studied problems in educational timetabling. In particular, we focus our
attention on the formulation known as the curriculum-based course timetabling
problem, which has been tackled by many researchers and for which there are
many available benchmarks.
The contribution of this paper is twofold. First, we propose an effective and
robust single-stage simulated annealing method for solving the problem.
Second, we design and apply an extensive and statistically principled
methodology for the parameter tuning procedure. The outcome of this analysis is
a methodology for modeling the relationship between search method parameters
and instance features that allows us to set the parameters for unseen instances
on the basis of a simple inspection of the instance itself. Using this
methodology, our algorithm, despite its apparent simplicity, has been able to
achieve high-quality results on a set of popular benchmarks.
A final contribution of the paper is a novel set of real-world instances,
which could be used as a benchmark for future comparison.
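A minimal sketch of a single-stage simulated annealing loop on a toy version of the problem: events that share a curriculum must not share a timeslot, and the cost counts such violations. The real curriculum-based formulation has many more hard and soft constraints, and the paper's contribution is setting parameters such as the start temperature and cooling rate from instance features; here `t0`, `cooling`, and `iters` are fixed, hypothetical defaults, and all names are our own.

```python
import math
import random


def conflicts(assign, edges):
    """Count pairs of conflicting events placed in the same timeslot."""
    return sum(assign[u] == assign[v] for u, v in edges)


def simulated_annealing(n_events, n_slots, edges, t0=2.0, cooling=0.995,
                        iters=5000, seed=0):
    rng = random.Random(seed)
    current = [rng.randrange(n_slots) for _ in range(n_events)]
    cost = conflicts(current, edges)
    best, best_cost = current[:], cost
    temp = t0
    for _ in range(iters):
        # Neighbourhood move: put one random event into a random timeslot.
        event, slot = rng.randrange(n_events), rng.randrange(n_slots)
        old = current[event]
        current[event] = slot
        delta = conflicts(current, edges) - cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost += delta
            if cost < best_cost:
                best, best_cost = current[:], cost
        else:
            current[event] = old  # undo the rejected move
        temp = max(temp * cooling, 1e-6)  # geometric cooling with a floor
    return best, best_cost


# Six events whose curricula overlap in a ring, scheduled into three timeslots.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
schedule, violations = simulated_annealing(6, 3, ring)
```

Feature-based tuning would replace the fixed defaults with values computed from properties of each instance (for example its size or conflict density) before the loop starts.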
Optimistic Robust Optimization With Applications To Machine Learning
Robust optimization has traditionally taken a pessimistic, or worst-case,
viewpoint of uncertainty, motivated by a desire to find sets of optimal
policies that maintain feasibility under a variety of operating conditions. In
this paper, we explore an optimistic, or best-case, view of uncertainty and show
that it can be a fruitful approach. We show that these techniques can be used
to address a wide variety of problems. First, we apply our methods in the
context of robust linear programming, providing a method for reducing
conservatism in intuitive ways that encode economically realistic modeling
assumptions. Second, we look at problems in machine learning and find that this
approach is strongly connected to the existing literature. Specifically, we
provide a new interpretation for popular sparsity inducing non-convex
regularization schemes. Additionally, we show that successful approaches for
dealing with outliers and noise can be interpreted as optimistic robust
optimization problems. Although many of the problems resulting from our
approach are non-convex, we find that DCA (the difference-of-convex algorithm)
or DCA-like optimization approaches can be intuitive and efficient.
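To illustrate what a DCA iteration looks like, the sketch applies it to a one-dimensional least-squares problem with a capped-ℓ1 penalty. We picked capped-ℓ1 as one example of a sparsity-inducing non-convex regularizer; the abstract does not name the regularizers the paper covers, and this is not the paper's optimistic robust formulation. Function names are hypothetical.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t*|x|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def dca_capped_l1(a, lam, theta, x0=0.0, iters=50):
    """DCA for min_x 0.5*(x - a)**2 + lam*min(|x|, theta).

    Decomposition: g(x) = 0.5*(x - a)**2 + lam*|x|  (convex)
                   h(x) = lam*max(|x| - theta, 0)   (convex)
    so the objective equals g - h. Each step takes a subgradient y_k of h at
    x_k and minimizes the convex model g(x) - y_k*x in closed form.
    """
    x = x0
    for _ in range(iters):
        y = lam * math.copysign(1.0, x) if abs(x) > theta else 0.0  # y_k in dh(x_k)
        x = soft_threshold(a + y, lam)
    return x


large = dca_capped_l1(3.0, 1.0, 0.5)  # large signal: no shrinkage at the fixed point
small = dca_capped_l1(0.8, 1.0, 0.5)  # small signal: set exactly to zero
```

Compared with the convex ℓ1 penalty, which would return soft_threshold(3.0, 1.0) = 2.0, the capped version keeps the large coefficient unbiased while still zeroing the small one; each DCA step is just a convex problem, which is why such schemes can be both intuitive and cheap.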