Doubly Stochastic Primal-Dual Coordinate Method for Bilinear Saddle-Point Problem
We propose a doubly stochastic primal-dual coordinate optimization algorithm
for empirical risk minimization, which can be formulated as a bilinear
saddle-point problem. In each iteration, our method randomly samples a block of
coordinates of the primal and dual solutions to update. The linear convergence
of our method can be established in terms of (1) the distance from the current
iterate to the optimal solution and (2) the primal-dual objective gap. We show
that the proposed method has a lower overall complexity than existing
coordinate methods when either the data matrix has a factorized structure or
the proximal mapping on each block is computationally expensive, e.g.,
involving an eigenvalue decomposition. The efficiency of the proposed method is
confirmed by empirical studies on several real applications, such as the
multi-task large margin nearest neighbor problem.
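As a rough illustration of the update pattern described above — sampling one primal and one dual coordinate per iteration on a bilinear saddle point — here is a toy sketch on a ridge-regression saddle point. The data matrix, step sizes, and proximal steps are all assumptions for the demo, not the paper's algorithm or parameters.

```python
import random

# Toy bilinear saddle point for ridge regression:
#   min_x max_y  y^T (A x - b) - (1/2)||y||^2 + (lam/2)||x||^2
# Each iteration samples ONE primal and ONE dual coordinate to update
# (the "doubly stochastic" pattern; step sizes here are ad hoc).
random.seed(0)
A = [[1.0, 0.0], [0.0, 1.0]]   # identity data matrix keeps the demo simple
b = [1.0, 2.0]
lam, eta, sigma = 0.1, 0.5, 0.5
n = 2
x = [0.0] * n
y = [0.0] * n

for _ in range(5000):
    i = random.randrange(n)            # sampled dual coordinate
    j = random.randrange(n)            # sampled primal coordinate
    Ax_i = sum(A[i][k] * x[k] for k in range(n))
    # prox step on y_i for the (1/2)||y||^2 term
    y[i] = (y[i] + sigma * (Ax_i - b[i])) / (1.0 + sigma)
    ATy_j = sum(A[k][j] * y[k] for k in range(n))
    # prox step on x_j for the (lam/2)||x||^2 term
    x[j] = (x[j] - eta * ATy_j) / (1.0 + eta * lam)

x_star = [bi / (1.0 + lam) for bi in b]   # closed-form ridge solution here
err = max(abs(x[k] - x_star[k]) for k in range(n))
print(err)
```

With a factorized or block-structured data matrix, each coordinate update touches only one block of the data, which is the source of the complexity advantage claimed in the abstract.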
One-Class Kernel Spectral Regression
The paper introduces a new efficient nonlinear one-class classifier
formulated as the optimisation of a Rayleigh quotient criterion. The method,
operating in a reproducing kernel Hilbert space, minimises the scatter of
target distribution along an optimal projection direction while at the same
time keeping projections of positive observations distant from the mean of the
negative class. We provide a graph embedding view of the problem which can then
be solved efficiently using the spectral regression approach. In this sense,
unlike previous similar methods which often require costly eigen-computations
of dense matrices, the proposed approach casts the problem under consideration
into a regression framework which is computationally more efficient. In
particular, it is shown that the dominant complexity of the proposed method is
the complexity of computing the kernel matrix. Additional appealing
characteristics of the proposed one-class classifier are: (1) the ability to be
trained in an incremental fashion (allowing for application in streaming-data
scenarios while also reducing the computational complexity in a non-streaming
operation mode); (2) being unsupervised, but providing the option of refining
the solution using negative training examples, when available; and, last but
not least, (3) the use of the kernel trick, which facilitates a nonlinear
mapping of the data into a high-dimensional feature space to seek better
solutions.
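A minimal sketch of the spectral-regression idea — replacing a dense eigen-decomposition with a regularized kernel regression — is shown below. This is an illustrative stand-in, not the paper's exact formulation: the RBF kernel, the regularization weight, and the all-ones target embedding are assumptions.

```python
import numpy as np

# Sketch: solve a regularized kernel regression that maps target (positive)
# samples to a common response, instead of a dense eigenproblem.
rng = np.random.default_rng(0)

def rbf_kernel(X, Z, gamma=0.5):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

X_train = rng.normal(0.0, 0.5, size=(40, 2))       # target-class samples
K = rbf_kernel(X_train, X_train)
delta = 1e-2                                       # regularization (assumed)
y = np.ones(len(X_train))                          # common embedding value
alpha = np.linalg.solve(K + delta * np.eye(len(K)), y)

def novelty_score(x):
    # distance of the projection from the target value 1 -> higher = more novel
    k = rbf_kernel(np.atleast_2d(x), X_train)[0]
    return abs(k @ alpha - 1.0)

score_in = novelty_score(np.array([0.1, -0.2]))    # near the target cluster
score_out = novelty_score(np.array([5.0, 5.0]))    # far outlier
print(score_in, score_out)
```

The dominant cost here is building the kernel matrix, matching the complexity claim in the abstract; the linear solve replaces the costly eigen-computation.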
Online Machine Learning in Big Data Streams
The area of online machine learning in big data streams covers algorithms
that are (1) distributed and (2) work from data streams with only a limited
possibility to store past data. The first requirement mostly concerns software
architectures and efficient algorithms. The second one also imposes nontrivial
theoretical restrictions on the modeling methods: In the data stream model,
older data is no longer available to revise earlier suboptimal modeling
decisions as fresh data arrives.
In this article, we provide an overview of distributed software architectures
and libraries as well as machine learning models for online learning. We
highlight the most important ideas for classification, regression,
recommendation, and unsupervised modeling from streaming data, and we show how
they are implemented in various distributed data stream processing systems.
This article is reference material, not a survey. We do not attempt to
be comprehensive in describing all existing methods and solutions; rather, we
give pointers to the most important resources in the field. The related
sub-fields (online algorithms, online learning, and distributed data
processing) all feature prominently in current research and development, with
conceptually new research results and software components emerging at the time
of writing. In
this article, we refer to several survey results, both for distributed data
processing and for online machine learning. Compared to past surveys, our
article is different in that we discuss recommender systems in extended detail.
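The stream constraints described in the abstract — predict before you update, then discard the example — can be illustrated with a tiny prequential ("test-then-train") run of online logistic regression. The synthetic concept and learning rate below are made up for the demo.

```python
import math, random

# One pass over a stream: predict first, then take a single SGD step,
# never storing past examples (the data stream model).
random.seed(0)
w_true = [1.0, -2.0]                       # hidden concept (made up)
w = [0.0, 0.0]
lr = 0.1
hits = []

for t in range(2000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    label = 1 if x[0] * w_true[0] + x[1] * w_true[1] > 0 else 0
    # predict BEFORE updating (prequential evaluation)
    z = w[0] * x[0] + w[1] * x[1]
    p = 1.0 / (1.0 + math.exp(-z))
    hits.append(1 if (p > 0.5) == (label == 1) else 0)
    # single SGD step; the example is then discarded
    g = p - label
    w[0] -= lr * g * x[0]
    w[1] -= lr * g * x[1]

acc_late = sum(hits[-500:]) / 500.0
print(acc_late)
```

Prequential accuracy over the tail of the stream shows the model adapting from data it can no longer revisit.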
Stochastic Optimization for Machine Learning
It has been found that stochastic algorithms often find good solutions much
more rapidly than inherently-batch approaches. Indeed, a very useful rule of
thumb is that, when solving a machine learning problem, an iterative technique
which performs a very large number of relatively inexpensive updates will
often outperform one which performs a smaller number of much "smarter" but
computationally expensive updates.
In this thesis, we will consider the application of stochastic algorithms to
two of the most important machine learning problems. Part I is concerned with
the supervised problem of binary classification using kernelized linear
classifiers, for which the data have labels belonging to exactly two classes
(e.g. "has cancer" or "doesn't have cancer"), and the learning problem is to
find a linear classifier which is best at predicting the label. In Part II, we
will consider the unsupervised problem of Principal Component Analysis, for
which the learning task is to find the directions which contain most of the
variance of the data distribution.
Our goal is to present stochastic algorithms for both problems which are,
above all, practical--they work well on real-world data, in some cases better
than all known competing algorithms. A secondary, but still very important,
goal is to derive theoretical bounds on the performance of these algorithms
which are at least competitive with, and often better than, those known for
other approaches. Comment: PhD Thesis.
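The stochastic-PCA problem of Part II has a classic one-sample-at-a-time solution, Oja's rule, which a sketch can illustrate; the thesis's own algorithms and step sizes differ, and the data distribution below is invented for the demo.

```python
import math, random

# Oja's rule: estimate the top principal direction from one sample at a
# time — a stochastic algorithm for the PCA problem described above.
random.seed(1)
d1 = (3 / math.sqrt(10), 1 / math.sqrt(10))    # dominant direction
d2 = (-1 / math.sqrt(10), 3 / math.sqrt(10))   # orthogonal direction
w = [1.0, 0.0]
eta = 0.01

for _ in range(20000):
    a = random.gauss(0, 3.0)                   # large variance along d1
    c = random.gauss(0, 0.5)                   # small variance along d2
    x = (a * d1[0] + c * d2[0], a * d1[1] + c * d2[1])
    s = w[0] * x[0] + w[1] * x[1]              # projection of sample on w
    w = [w[0] + eta * s * x[0], w[1] + eta * s * x[1]]
    norm = math.sqrt(w[0] ** 2 + w[1] ** 2)    # renormalize to stay bounded
    w = [w[0] / norm, w[1] / norm]

alignment = abs(w[0] * d1[0] + w[1] * d1[1])   # |cos angle| to true direction
print(alignment)
```

Each update costs O(d) per sample, which is why such methods scale where batch eigen-decompositions do not.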
Model updating in structural dynamics: advanced parametrization, optimal regularization, and symmetry considerations
Numerical models are pervasive tools in science and engineering for simulation, design, and assessment of physical systems. In structural engineering, finite element (FE) models are extensively used to predict responses and estimate risk for built structures. While FE models attempt to exactly replicate the physics of their corresponding structures, discrepancies always exist between measured and model output responses. Discrepancies are related to aleatoric uncertainties, such as measurement noise, and epistemic uncertainties, such as modeling errors. Epistemic uncertainties indicate that the FE model may not fully represent the built structure, greatly limiting its utility for simulation and structural assessment. Model updating is used to reduce error between measurement and model-output responses through adjustment of uncertain FE model parameters, typically using data from structural vibration studies. However, the model updating problem is often ill-posed with more unknown parameters than available data, such that parameters cannot be uniquely inferred from the data.
This dissertation focuses on two approaches to remedy ill-posedness in FE model updating: parametrization and regularization. Parametrization produces a reduced set of updating parameters to estimate, thereby improving posedness. An ideal parametrization should incorporate model uncertainties, effectively reduce errors, and use as few parameters as possible. This is a challenging task since a large number of candidate parametrizations are available in any model updating problem. To ameliorate this, three new parametrization techniques are proposed: improved parameter clustering with residual-based weighting, singular vector decomposition-based parametrization, and incremental reparametrization. All of these methods utilize local system sensitivity information, providing effective reduced-order parametrizations which incorporate FE model uncertainties.
The other focus of this dissertation is regularization, which improves posedness by providing additional constraints on the updating problem, such as a minimum-norm parameter solution constraint. Optimal regularization is proposed for use in model updating to provide an optimal balance between residual reduction and parameter change minimization. This approach links computationally-efficient deterministic model updating with asymptotic Bayesian inference to provide regularization based on maximal model evidence. Estimates are also provided for uncertainties and model evidence, along with an interesting measure of parameter efficiency.
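The residual-versus-parameter-change trade-off that regularization controls can be shown on a toy underdetermined sensitivity system. The sketch below simply sweeps a Tikhonov weight; the dissertation instead selects it by maximizing model evidence, and the matrices here are random stand-ins.

```python
import numpy as np

# Toy regularized model updating: an underdetermined sensitivity system
# S * dtheta = r (more parameters than measurements), solved with a
# Tikhonov penalty on the parameter change.
rng = np.random.default_rng(0)
S = rng.normal(size=(4, 10))       # 4 measurements, 10 updating parameters
r = rng.normal(size=4)             # measured-minus-model residual (made up)

def update(lam):
    # dtheta = argmin ||S d - r||^2 + lam * ||d||^2
    return np.linalg.solve(S.T @ S + lam * np.eye(10), S.T @ r)

d_small = update(1e-8)             # nearly unregularized
d_large = update(10.0)             # heavily regularized
res_small = np.linalg.norm(S @ d_small - r)
res_large = np.linalg.norm(S @ d_large - r)
print(np.linalg.norm(d_small), np.linalg.norm(d_large), res_small, res_large)
```

Increasing the weight shrinks the parameter change at the cost of a larger residual — the balance that "optimal regularization" in the dissertation sets automatically.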
Managing Randomization in the Multi-Block Alternating Direction Method of Multipliers for Quadratic Optimization
The Alternating Direction Method of Multipliers (ADMM) has gained a lot of
attention for solving large-scale and objective-separable constrained
optimization. However, the two-block variable structure of the ADMM still
limits the practical computational efficiency of the method, because one big
matrix factorization is needed at least once even for linear and convex
quadratic programming. This drawback may be overcome by enforcing a multi-block
structure of the decision variables in the original optimization problem.
Unfortunately, the multi-block ADMM, with more than two blocks, is not
guaranteed to be convergent. On the other hand, two positive developments have
been made: first, if in each cyclic loop one randomly permutes the updating
order of the multiple blocks, then the method converges in expectation for
solving any system of linear equations with any number of blocks. Secondly,
such a randomly permuted ADMM also works for equality-constrained convex
quadratic programming even when the objective function is not separable. The
goal of this paper is twofold. First, we add more randomness into the ADMM by
developing a randomly assembled cyclic ADMM (RAC-ADMM) where the decision
variables in each block are randomly assembled. We discuss the theoretical
properties of RAC-ADMM and show when random assembling helps and when it hurts,
and develop a criterion to guarantee that it converges almost surely. Secondly,
using the theoretical guidance on RAC-ADMM, we conduct multiple numerical tests
on solving both randomly generated and large-scale benchmark quadratic
optimization problems, which include continuous problems, binary
graph-partitioning and quadratic assignment problems, and selected machine
learning problems. Our numerical tests show that the RAC-ADMM, with a
variable-grouping strategy, can significantly improve computational efficiency
in solving most quadratic optimization problems. Comment: Expanded and streamlined theoretical sections. Added comparisons with
other multi-block ADMM variants. Updated Computational Studies Section on
continuous problems -- reporting primal and dual residuals instead of
objective value gap. Added selected machine learning problems
(ElasticNet/Lasso and Support Vector Machine) to Computational Studies
Section.
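The random-permutation idea — each cyclic loop visits the variable blocks in a fresh random order — can be illustrated on a much simpler stand-in than RAC-ADMM: randomly permuted block coordinate minimization of a least-squares objective. The problem sizes and matrix below are arbitrary.

```python
import numpy as np

# Randomly permuted block updates: each sweep draws a new update order
# for the blocks, then exactly minimizes ||Ax - b||^2 over one block at
# a time. (An illustrative proxy, not the paper's RAC-ADMM.)
rng = np.random.default_rng(0)
n, blk = 6, 2
A = rng.normal(size=(n, n)) + 3.0 * np.eye(n)   # well-conditioned (assumed)
b = rng.normal(size=n)
x = np.zeros(n)
blocks = [list(range(i, i + blk)) for i in range(0, n, blk)]

for sweep in range(300):
    order = rng.permutation(len(blocks))        # random permutation per loop
    for bi in order:
        idx = blocks[bi]
        A_B = A[:, idx]
        resid = b - A @ x + A_B @ x[idx]        # residual excluding block B
        x[idx] = np.linalg.solve(A_B.T @ A_B, A_B.T @ resid)

final_residual = np.linalg.norm(A @ x - b)
print(final_residual)
```

Random assembly in RAC-ADMM goes a step further by also reshuffling which variables form each block; the abstract notes this can either help or hurt, hence the paper's convergence criterion.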
Online Metric-Weighted Linear Representations for Robust Visual Tracking
In this paper, we propose a visual tracker based on a metric-weighted linear
representation of appearance. In order to capture the interdependence of
different feature dimensions, we develop two online distance metric learning
methods using proximity comparison information and structured output learning.
The learned metric is then incorporated into a linear representation of
appearance.
We show that online distance metric learning significantly improves the
robustness of the tracker, especially on those sequences exhibiting drastic
appearance changes. In order to bound growth in the number of training samples,
we design a time-weighted reservoir sampling method.
Moreover, we enable our tracker to automatically perform object
identification during the process of object tracking, by introducing a
collection of static template samples belonging to several object classes of
interest. Object identification results for an entire video sequence are
achieved by systematically combining the tracking information and visual
recognition at each frame. Experimental results on challenging video sequences
demonstrate the effectiveness of the method for both inter-frame tracking and
object identification. Comment: 51 pages. Appearing in IEEE Transactions on
Pattern Analysis and Machine Intelligence.
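The bounded-growth device mentioned in the abstract — time-weighted reservoir sampling — can be sketched with the standard Efraimidis–Spirakis weighted reservoir scheme (key = u^(1/w), keep the k largest keys). The exponential time weight below is an assumption for illustration, not the paper's exact weighting.

```python
import heapq, random

# Time-weighted reservoir sampling: keep a fixed-size sample of a stream,
# giving newer items larger weights so the reservoir tracks recent
# appearance changes.
random.seed(0)
k = 50
reservoir = []                     # min-heap of (key, item_index)

for t in range(5000):              # stream of 5000 "training samples"
    w = 1.001 ** t                 # time weight: later items weigh more
    key = random.random() ** (1.0 / w)
    if len(reservoir) < k:
        heapq.heappush(reservoir, (key, t))
    elif key > reservoir[0][0]:
        heapq.heapreplace(reservoir, (key, t))

kept = [t for _, t in reservoir]
mean_kept = sum(kept) / len(kept)
print(len(kept), mean_kept)       # size stays bounded; recency bias visible
```

The reservoir never exceeds k items, and its mean timestamp sits well above the stream midpoint, showing the recency bias that suits tracking under drastic appearance changes.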
Real-time 3D scene description using Spheres, Cones and Cylinders
The paper describes a novel real-time algorithm for finding 3D geometric
primitives (cylinders, cones and spheres) from 3D range data. In its core, it
performs a fast model fitting with a model update in constant time (O(1)) for
each new data point added to the model. We use a three-stage approach. The
first step inspects 1.5D subspaces to find ellipses. The next stage uses these
ellipses as input by examining their neighborhood structure to form sets of
candidates for the 3D geometric primitives. Finally, candidate ellipses are
fitted to the geometric primitives. The complexity of point processing is
O(n); additional lower-order time is needed to process the significantly
smaller number of mid-level objects. This allows the approach to process 30
frames per second on Kinect depth data, which makes it suitable as a
pre-processing step for real-time higher-level 3D tasks in robotics, such as
tracking or feature-based mapping. Comment: 8 pages, 16th International
Conference on Advanced Robotics (ICAR 2013). Montevideo, Uruguay, November
2013.
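A constant-time-per-point model update, as claimed above, can be illustrated with an algebraic sphere fit: since ||p − c||² = r² is linear in (c, d) with d = r² − ||c||², each new point only adds to fixed-size running sums. This is an illustration of the O(1)-update idea, not the paper's fitting pipeline.

```python
import numpy as np

# Algebraic sphere fit via running normal equations: 2 c . p + d = ||p||^2,
# so every new range point updates only a 4x4 sum and a length-4 vector —
# O(1) work per point, independent of how many points came before.
rng = np.random.default_rng(0)
c_true, r_true = np.array([1.0, 2.0, 3.0]), 2.0

M = np.zeros((4, 4))               # running normal-equation sums
v = np.zeros(4)

for _ in range(50):                # stream of range points on the sphere
    u = rng.normal(size=3)
    p = c_true + r_true * u / np.linalg.norm(u)
    a = np.array([2 * p[0], 2 * p[1], 2 * p[2], 1.0])
    M += np.outer(a, a)            # O(1) update per point
    v += a * (p @ p)

sol = np.linalg.solve(M, v)        # solve whenever an estimate is needed
c_est, d = sol[:3], sol[3]
r_est = np.sqrt(d + c_est @ c_est)
print(c_est, r_est)
```

Cylinders and cones admit similar incremental formulations once candidate ellipses pin down their axes.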
Minimum Spectral Connectivity Projection Pursuit
We study the problem of determining the optimal low dimensional projection
for maximising the separability of a binary partition of an unlabelled dataset,
as measured by spectral graph theory. This is achieved by finding projections
which minimise the second eigenvalue of the graph Laplacian of the projected
data, which corresponds to a non-convex, non-smooth optimisation problem. We
show that the optimal univariate projection based on spectral connectivity
converges to the vector normal to the maximum margin hyperplane through the
data, as the scaling parameter is reduced to zero. This establishes a
connection between connectivity as measured by spectral graph theory and
maximal Euclidean separation. The computational cost associated with each
eigen-problem is quadratic in the number of data points. To mitigate this issue, we
propose an approximation method using microclusters with provable approximation
error bounds. Combining multiple binary partitions within a divisive
hierarchical model allows us to construct clustering solutions admitting
clusters with varying scales and lying within different subspaces. We evaluate
the performance of the proposed method on a large collection of benchmark
datasets and find that it compares favourably with existing methods for
projection pursuit and dimension reduction for data clustering.
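The objective being minimised above can be computed directly: project the data onto a direction, build a similarity graph on the projections, and take the second-smallest eigenvalue of the graph Laplacian. The data, similarity kernel, and scaling below are made up; the sketch only shows that a cluster-separating direction yields lower spectral connectivity than an orthogonal one.

```python
import numpy as np

# Spectral connectivity of a 1-D projection: second-smallest eigenvalue
# (Fiedler value) of the Laplacian of a Gaussian similarity graph built
# on the projected points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal((-3, 0), 0.5, size=(30, 2)),
               rng.normal((+3, 0), 0.5, size=(30, 2))])

def lambda2(v):
    p = X @ (v / np.linalg.norm(v))                  # 1-D projection
    W = np.exp(-(p[:, None] - p[None, :]) ** 2 / 2)  # similarity graph
    L = np.diag(W.sum(axis=1)) - W                   # graph Laplacian
    return np.sort(np.linalg.eigvalsh(L))[1]         # Fiedler value

l2_sep = lambda2(np.array([1.0, 0.0]))   # separates the two clusters
l2_mix = lambda2(np.array([0.0, 1.0]))   # mixes them
print(l2_sep, l2_mix)
```

Each evaluation costs an n-by-n eigenproblem, which is the quadratic cost the paper's microcluster approximation is designed to avoid.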
Methods for online voltage stability monitoring
Online voltage stability monitoring is the process of obtaining voltage stability information in real time for a given operating condition. The prediction should be fast and accurate so that control signals can be sent to appropriate locations quickly and effectively. One approach is to obtain the stability information directly from phasor measurements taken at the operating conditions. This approach is simple and requires few computations. The methods proposed are based on the Thevenin equivalent of a system. The Thevenin equivalent, according to the maximum power transfer theorem, gives the upper limit of the power transfer to a load bus. To compute the Thevenin equivalent we need at least two sets of phasor measurements. It is found that the Thevenin equivalent gives a highly optimistic approximation of the power margin. The work done in this thesis compensates for the optimistic prediction by applying reactive power availability information of the system. The accuracy of this approach is very high compared to the plain Thevenin equivalent.
The thesis also presents an improvement on the decision tree method for online voltage stability monitoring through attribute selection. The role of data mining approaches such as decision trees is vital in using the available accurate measurement data in the power system. It is also very important to extract the important data, or attributes, so that the tree is robust, reliable, and easy to compute. Data mining itself offers information-based (gain ratio), statistical (k-nearest neighbor), probabilistic (naive Bayes), and other criteria for attribute selection. There are also analytical approaches in power systems which can characterize attributes. Can these attributes be used for attribute selection for decision trees? The hypothesis has been tested using the tangent vector information of attributes. The accuracy of the selected attributes on decision trees is very high. Attributes with higher sensitivity were found to be better indicators of voltage instability.
Attribute selection will be very helpful when it comes to large systems with a huge volume of data.
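The Thevenin-equivalent identification from two phasor measurement sets can be sketched directly: with the load bus modeled as V = E − Z·I, two measurements (V1, I1) and (V2, I2) determine the equivalent source E and impedance Z. All numerical values below are invented for illustration.

```python
# Thevenin identification from two PMU snapshots at a load bus.
E_true = complex(1.05, 0.0)        # hidden Thevenin source (p.u.)
Z_true = complex(0.02, 0.10)       # hidden Thevenin impedance (p.u.)

def bus_voltage(I):
    return E_true - Z_true * I     # what a PMU would measure at the bus

I1, I2 = complex(0.8, -0.3), complex(1.0, -0.4)   # two load conditions
V1, V2 = bus_voltage(I1), bus_voltage(I2)

# Two complex equations V = E - Z*I in two complex unknowns (E, Z):
Z_est = (V1 - V2) / (I2 - I1)
E_est = V1 + Z_est * I1
# Maximum power transfer occurs when the load impedance magnitude matches
# |Z_est|; margins derived this way tend to be optimistic, as noted above.
print(E_est, Z_est)
```

In practice the two snapshots must differ enough (I1 ≠ I2) for the division to be well conditioned, which is why a change in operating condition is needed between measurements.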