
    Doubly Stochastic Primal-Dual Coordinate Method for Bilinear Saddle-Point Problem

    We propose a doubly stochastic primal-dual coordinate optimization algorithm for empirical risk minimization, which can be formulated as a bilinear saddle-point problem. In each iteration, our method randomly samples a block of coordinates of the primal and dual solutions to update. The linear convergence of our method can be established in terms of (1) the distance from the current iterate to the optimal solution and (2) the primal-dual objective gap. We show that the proposed method has a lower overall complexity than existing coordinate methods when either the data matrix has a factorized structure or the proximal mapping on each block is computationally expensive, e.g., involving an eigenvalue decomposition. The efficiency of the proposed method is confirmed by empirical studies on several real applications, such as the multi-task large margin nearest neighbor problem.
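
    A minimal sketch of one doubly stochastic iteration may help fix ideas. It treats the saddle-point form min_x max_y y^T A x + (lam/2)||x||^2 - (gam/2)||y||^2, i.e., quadratic g and f* so the block proximal steps have closed forms; the function name, step sizes, and block sizes below are illustrative assumptions, not the paper's notation.

    ```python
    import numpy as np

    def dspdc_step(A, x, y, i_blk, j_blk, tau, sigma, lam, gam):
        # Dual ascent on the sampled dual block (closed-form prox for quadratic f*).
        y[j_blk] = (y[j_blk] + sigma * (A[j_blk, :] @ x)) / (1.0 + sigma * gam)
        # Primal descent on the sampled primal block (closed-form prox for quadratic g).
        x[i_blk] = (x[i_blk] - tau * (A[:, i_blk].T @ y)) / (1.0 + tau * lam)
        return x, y

    rng = np.random.default_rng(0)
    n, d, blk = 200, 50, 5
    A = rng.standard_normal((n, d)) / np.sqrt(n)
    x, y = np.zeros(d), np.zeros(n)
    for t in range(2000):
        i_blk = rng.choice(d, size=blk, replace=False)   # random primal block
        j_blk = rng.choice(n, size=blk, replace=False)   # random dual block
        x, y = dspdc_step(A, x, y, i_blk, j_blk, tau=0.1, sigma=0.1, lam=1.0, gam=1.0)
    ```

    Only the sampled blocks of x and y are touched in each iteration, which is where the savings over full primal-dual updates come from.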

    One-Class Kernel Spectral Regression

    The paper introduces a new efficient nonlinear one-class classifier formulated as an optimisation of the Rayleigh quotient criterion. The method, operating in a reproducing kernel Hilbert space, minimises the scatter of the target distribution along an optimal projection direction while at the same time keeping projections of positive observations distant from the mean of the negative class. We provide a graph embedding view of the problem which can then be solved efficiently using the spectral regression approach. In this sense, unlike previous similar methods which often require costly eigen-computations of dense matrices, the proposed approach casts the problem under consideration into a regression framework which is computationally more efficient. In particular, it is shown that the dominant complexity of the proposed method is the complexity of computing the kernel matrix. Additional appealing characteristics of the proposed one-class classifier are: (1) the ability to be trained in an incremental fashion (allowing for application in streaming-data scenarios while also reducing the computational complexity in a non-streaming operation mode); (2) being unsupervised, but providing the option to refine the solution using negative training examples when available; and (3) the use of the kernel trick, which facilitates a nonlinear mapping of the data into a high-dimensional feature space to seek better solutions.
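
    A minimal sketch of the spectral-regression idea: instead of eigen-decomposing a dense matrix, fix a target embedding for the positive class and recover the projection by regularised regression against the kernel matrix, so the dominant cost is forming K. The constant target vector, regulariser, and names are illustrative assumptions, not the paper's exact formulation.

    ```python
    import numpy as np

    def rbf_kernel(X, Y, gamma=0.5):
        # Gaussian kernel from pairwise squared distances.
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def one_class_spectral_regression(X_target, delta=1e-3):
        # With only positive (target) data, take a constant response vector
        # and solve the regularised RKHS regression (K + delta I) alpha = t.
        n = X_target.shape[0]
        K = rbf_kernel(X_target, X_target)
        t = np.ones(n)                              # target embedding, positive class
        alpha = np.linalg.solve(K + delta * np.eye(n), t)
        return alpha, K

    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 4))
    alpha, K = one_class_spectral_regression(X)
    scores = K @ alpha                               # projections of the training points
    ```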

    Online Machine Learning in Big Data Streams

    The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software architectures and efficient algorithms. The second one also imposes nontrivial theoretical restrictions on the modeling methods: in the data stream model, older data is no longer available to revise earlier suboptimal modeling decisions as the fresh data arrives. In this article, we provide an overview of distributed software architectures and libraries as well as machine learning models for online learning. We highlight the most important ideas for classification, regression, recommendation, and unsupervised modeling from streaming data, and we show how they are implemented in various distributed data stream processing systems. This article is reference material, not a survey. We do not attempt to be comprehensive in describing all existing methods and solutions; rather, we give pointers to the most important resources in the field. All related sub-fields (online algorithms, online learning, and distributed data processing) are highly active in current research and development, with conceptually new research results and software components emerging at the time of writing. In this article, we refer to several survey results, both for distributed data processing and for online machine learning. Compared to past surveys, our article is different because we discuss recommender systems in extended detail.
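
    The single-pass restriction in (2) is easy to see in code. Below is a small, library-agnostic sketch of an online learner (logistic regression trained by SGD) that uses each example once and then discards it; the function name and learning rate are illustrative assumptions.

    ```python
    import numpy as np

    def stream_logistic_sgd(stream, dim, lr=0.05):
        # One pass over the stream: each (x, y) is consumed and forgotten,
        # as the data stream model requires.
        w = np.zeros(dim)
        for x, y in stream:                      # labels y in {0, 1}
            p = 1.0 / (1.0 + np.exp(-w @ x))     # current prediction
            w -= lr * (p - y) * x                # one cheap gradient step
        return w

    rng = np.random.default_rng(2)
    stream = ((rng.standard_normal(3), rng.integers(0, 2)) for _ in range(10_000))
    w = stream_logistic_sgd(stream, dim=3)
    ```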

    Stochastic Optimization for Machine Learning

    It has been found that stochastic algorithms often find good solutions much more rapidly than inherently-batch approaches. Indeed, a very useful rule of thumb is that, when solving a machine learning problem, an iterative technique which relies on performing a very large number of relatively inexpensive updates will often outperform one which performs a smaller number of much "smarter" but computationally expensive updates. In this thesis, we will consider the application of stochastic algorithms to two of the most important machine learning problems. Part I is concerned with the supervised problem of binary classification using kernelized linear classifiers, for which the data have labels belonging to exactly two classes (e.g. "has cancer" or "doesn't have cancer"), and the learning problem is to find a linear classifier which is best at predicting the label. In Part II, we will consider the unsupervised problem of Principal Component Analysis, for which the learning task is to find the directions which contain most of the variance of the data distribution. Our goal is to present stochastic algorithms for both problems which are, above all, practical: they work well on real-world data, in some cases better than all known competing algorithms. A secondary, but still very important, goal is to derive theoretical bounds on the performance of these algorithms which are at least competitive with, and often better than, those known for other approaches. Comment: PhD Thesis.
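
    As a flavour of the "many cheap updates" rule of thumb applied to the PCA problem of Part II, here is a sketch of Oja's rule, a standard stochastic estimator of the top principal direction that performs one inexpensive update per sample. It is shown for illustration and is not necessarily the algorithm developed in the thesis.

    ```python
    import numpy as np

    def oja_pca(stream, dim, lr=0.01, seed=3):
        # Streaming estimate of the top principal direction via Oja's rule.
        w = np.random.default_rng(seed).standard_normal(dim)
        w /= np.linalg.norm(w)
        for x in stream:
            w += lr * (x @ w) * x          # cheap stochastic update
            w /= np.linalg.norm(w)         # keep ||w|| = 1
        return w

    rng = np.random.default_rng(4)
    C = np.diag([3.0, 1.0, 0.5])           # most variance along the first axis
    stream = (C @ rng.standard_normal(3) for _ in range(20_000))
    w = oja_pca(stream, dim=3)             # converges toward (1, 0, 0), up to sign
    ```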

    Model updating in structural dynamics: advanced parametrization, optimal regularization, and symmetry considerations

    Numerical models are pervasive tools in science and engineering for simulation, design, and assessment of physical systems. In structural engineering, finite element (FE) models are extensively used to predict responses and estimate risk for built structures. While FE models attempt to exactly replicate the physics of their corresponding structures, discrepancies always exist between measured and model-output responses. Discrepancies are related to aleatoric uncertainties, such as measurement noise, and epistemic uncertainties, such as modeling errors. Epistemic uncertainties indicate that the FE model may not fully represent the built structure, greatly limiting its utility for simulation and structural assessment. Model updating is used to reduce the error between measured and model-output responses through adjustment of uncertain FE model parameters, typically using data from structural vibration studies. However, the model updating problem is often ill-posed, with more unknown parameters than available data, such that parameters cannot be uniquely inferred from the data. This dissertation focuses on two approaches to remedy ill-posedness in FE model updating: parametrization and regularization. Parametrization produces a reduced set of updating parameters to estimate, thereby improving posedness. An ideal parametrization should incorporate model uncertainties, effectively reduce errors, and use as few parameters as possible. This is a challenging task since a large number of candidate parametrizations are available in any model updating problem. To ameliorate this, three new parametrization techniques are proposed: improved parameter clustering with residual-based weighting, singular vector decomposition-based parametrization, and incremental reparametrization. All of these methods utilize local system sensitivity information, providing effective reduced-order parametrizations which incorporate FE model uncertainties. The other focus of this dissertation is regularization, which improves posedness by providing additional constraints on the updating problem, such as a minimum-norm parameter solution constraint. Optimal regularization is proposed for use in model updating to provide an optimal balance between residual reduction and parameter change minimization. This approach links computationally-efficient deterministic model updating with asymptotic Bayesian inference to provide regularization based on maximal model evidence. Estimates are also provided for uncertainties and model evidence, along with an interesting measure of parameter efficiency.
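
    The regularization half of the dissertation can be illustrated by a single Tikhonov-regularized updating step, which balances residual reduction against parameter change. In the dissertation the weight lam is chosen by maximising asymptotic model evidence; this sketch (with hypothetical names and a toy sensitivity matrix) simply takes it as given.

    ```python
    import numpy as np

    def regularised_update_step(J, r, lam):
        # Solve  min_dtheta ||r - J dtheta||^2 + lam ||dtheta||^2
        # via the regularised normal equations.
        n = J.shape[1]
        return np.linalg.solve(J.T @ J + lam * np.eye(n), J.T @ r)

    # Toy ill-posed problem: 3 measurements, 5 uncertain FE parameters.
    rng = np.random.default_rng(5)
    J = rng.standard_normal((3, 5))          # local sensitivity matrix
    r = rng.standard_normal(3)               # measured-minus-model residual
    dtheta = regularised_update_step(J, r, lam=0.1)
    ```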

    Managing Randomization in the Multi-Block Alternating Direction Method of Multipliers for Quadratic Optimization

    The Alternating Direction Method of Multipliers (ADMM) has gained a lot of attention for solving large-scale and objective-separable constrained optimization. However, the two-block variable structure of the ADMM still limits the practical computational efficiency of the method, because at least one big matrix factorization is needed even for linear and convex quadratic programming. This drawback may be overcome by enforcing a multi-block structure of the decision variables in the original optimization problem. Unfortunately, the multi-block ADMM, with more than two blocks, is not guaranteed to be convergent. On the other hand, two positive developments have been made: first, if in each cyclic loop one randomly permutes the updating order of the multiple blocks, then the method converges in expectation for solving any system of linear equations with any number of blocks; second, such a randomly permuted ADMM also works for equality-constrained convex quadratic programming even when the objective function is not separable. The goal of this paper is twofold. First, we add more randomness into the ADMM by developing a randomly assembled cyclic ADMM (RAC-ADMM) in which the decision variables in each block are randomly assembled. We discuss the theoretical properties of RAC-ADMM, show when random assembling helps and when it hurts, and develop a criterion to guarantee that it converges almost surely. Second, using the theoretical guidance on RAC-ADMM, we conduct multiple numerical tests on solving both randomly generated and large-scale benchmark quadratic optimization problems, including continuous problems, binary graph-partition and quadratic-assignment problems, and selected machine learning problems. Our numerical tests show that RAC-ADMM, with a variable-grouping strategy, can significantly improve computational efficiency on most quadratic optimization problems. Comment: Expanded and streamlined theoretical sections. Added comparisons with other multi-block ADMM variants. Updated Computational Studies Section on continuous problems -- reporting primal and dual residuals instead of objective value gap. Added selected machine learning problems (ElasticNet/Lasso and Support Vector Machine) to Computational Studies Section.
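
    The random-assembly mechanism is easy to prototype. The sketch below re-partitions the decision variables into fresh blocks at every cycle and exactly minimises the augmented Lagrangian over each block for an equality-constrained convex QP; the block solver, penalty value, and cycle count are assumptions for illustration, not the paper's implementation.

    ```python
    import numpy as np

    def rac_admm_qp(Q, c, A, b, beta=1.0, n_blocks=4, n_cycles=100, seed=0):
        # min 0.5 x'Qx + c'x  s.t.  Ax = b, by randomly assembled cyclic ADMM.
        rng = np.random.default_rng(seed)
        n = Q.shape[0]
        x, y = np.zeros(n), np.zeros(A.shape[0])
        for _ in range(n_cycles):
            perm = rng.permutation(n)                  # random assembly of blocks
            for blk in np.array_split(perm, n_blocks):
                g = c + Q @ x + A.T @ (y + beta * (A @ x - b))   # full gradient
                H = Q[np.ix_(blk, blk)] + beta * A[:, blk].T @ A[:, blk]
                x[blk] -= np.linalg.solve(H, g[blk])   # exact block minimiser
            y += beta * (A @ x - b)                    # dual ascent
        return x

    rng = np.random.default_rng(6)
    n, m = 20, 5
    M = rng.standard_normal((n, n))
    Q = M @ M.T + np.eye(n)                            # positive definite objective
    A, b, c = rng.standard_normal((m, n)), rng.standard_normal(m), rng.standard_normal(n)
    x = rac_admm_qp(Q, c, A, b)                        # Ax ~ b at convergence
    ```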

    Online Metric-Weighted Linear Representations for Robust Visual Tracking

    In this paper, we propose a visual tracker based on a metric-weighted linear representation of appearance. In order to capture the interdependence of different feature dimensions, we develop two online distance metric learning methods using proximity comparison information and structured output learning. The learned metric is then incorporated into a linear representation of appearance. We show that online distance metric learning significantly improves the robustness of the tracker, especially on those sequences exhibiting drastic appearance changes. In order to bound growth in the number of training samples, we design a time-weighted reservoir sampling method. Moreover, we enable our tracker to automatically perform object identification during the process of object tracking, by introducing a collection of static template samples belonging to several object classes of interest. Object identification results for an entire video sequence are achieved by systematically combining the tracking information and visual recognition at each frame. Experimental results on challenging video sequences demonstrate the effectiveness of the method for both inter-frame tracking and object identification. Comment: 51 pages. Appearing in IEEE Transactions on Pattern Analysis and Machine Intelligence.
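
    The sample-bounding idea can be sketched with a weighted reservoir. Below, the standard Efraimidis-Spirakis A-Res scheme (keep the k items with the largest keys u**(1/w)) is combined with a recency-increasing weight as a plausible stand-in for the paper's time-weighted variant; the decay parameter and function name are assumptions.

    ```python
    import heapq, math, random

    def time_weighted_reservoir(stream, k, decay=1e-3, seed=0):
        # Weighted reservoir sampling: newer items get larger weights,
        # so the k-sample training set is biased toward fresh frames.
        rng = random.Random(seed)
        heap = []                                   # min-heap of (key, t, item)
        for t, item in enumerate(stream):
            w = math.exp(decay * t)                 # recency-increasing weight
            key = rng.random() ** (1.0 / w)
            if len(heap) < k:
                heapq.heappush(heap, (key, t, item))
            elif key > heap[0][0]:
                heapq.heapreplace(heap, (key, t, item))
        return [item for _, _, item in heap]

    samples = time_weighted_reservoir(range(100_000), k=50)   # skewed to late items
    ```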

    Real-time 3D scene description using Spheres, Cones and Cylinders

    The paper describes a novel real-time algorithm for finding 3D geometric primitives (cylinders, cones and spheres) in 3D range data. At its core, it performs fast model fitting with a model update in constant time (O(1)) for each new data point added to the model. We use a three-stage approach. The first step inspects 1.5D subspaces to find ellipses. The next stage uses these ellipses as input, examining their neighborhood structure to form sets of candidates for the 3D geometric primitives. Finally, candidate ellipses are fitted to the geometric primitives. The complexity of point processing is O(n); additional time of lower order is needed to work on a significantly smaller number of mid-level objects. This allows the approach to process 30 frames per second on Kinect depth data, which makes it suitable as a pre-processing step for 3D real-time higher-level tasks in robotics, such as tracking or feature-based mapping. Comment: 8 pages, 16th International Conference on Advanced Robotics (ICAR 2013), Montevideo, Uruguay, November 2013.
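
    The constant-time model update can be shown on the simplest primitive. The sketch below maintains the 4x4 normal equations of an algebraic sphere fit, so each new range point costs O(1) regardless of how many points the model already holds; it is a generic incremental fit for illustration, not the paper's exact estimator.

    ```python
    import numpy as np

    class IncrementalSphereFit:
        # Algebraic sphere fit: 2 p.c + d = ||p||^2 with d = r^2 - ||c||^2,
        # accumulated as 4x4 normal equations so updates are O(1).
        def __init__(self):
            self.M = np.zeros((4, 4))   # sum of a a^T
            self.v = np.zeros(4)        # sum of a * ||p||^2

        def add(self, p):               # O(1) per point
            a = np.array([2 * p[0], 2 * p[1], 2 * p[2], 1.0])
            self.M += np.outer(a, a)
            self.v += a * (p @ p)

        def solve(self):
            cx, cy, cz, d = np.linalg.solve(self.M, self.v)
            c = np.array([cx, cy, cz])
            return c, np.sqrt(d + c @ c)    # centre and radius

    rng = np.random.default_rng(7)
    fit = IncrementalSphereFit()
    for _ in range(500):                    # noisy points on a known sphere
        u = rng.standard_normal(3)
        u /= np.linalg.norm(u)
        fit.add(np.array([1.0, -2.0, 0.5]) + 3.0 * u + 0.01 * rng.standard_normal(3))
    c, r = fit.solve()                      # approx centre (1, -2, 0.5), radius 3
    ```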

    Minimum Spectral Connectivity Projection Pursuit

    We study the problem of determining the optimal low-dimensional projection for maximising the separability of a binary partition of an unlabelled dataset, as measured by spectral graph theory. This is achieved by finding projections which minimise the second eigenvalue of the graph Laplacian of the projected data, which corresponds to a non-convex, non-smooth optimisation problem. We show that the optimal univariate projection based on spectral connectivity converges to the vector normal to the maximum-margin hyperplane through the data as the scaling parameter is reduced to zero. This establishes a connection between connectivity as measured by spectral graph theory and maximal Euclidean separation. The computational cost associated with each eigen-problem is quadratic in the number of data points. To mitigate this issue, we propose an approximation method using microclusters with provable approximation error bounds. Combining multiple binary partitions within a divisive hierarchical model allows us to construct clustering solutions admitting clusters with varying scales and lying within different subspaces. We evaluate the performance of the proposed method on a large collection of benchmark datasets and find that it compares favourably with existing methods for projection pursuit and dimension reduction for data clustering.
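
    The objective being minimised is short to state in code: project the data, build a similarity graph on the projections, and read off the second-smallest Laplacian eigenvalue. The sketch below uses a Gaussian similarity with an assumed scale parameter and the dense O(n^2) construction that the microcluster approximation is designed to avoid.

    ```python
    import numpy as np

    def spectral_connectivity(X, v, scale=1.0):
        # lambda_2 of the graph Laplacian of the data projected onto v.
        p = X @ (v / np.linalg.norm(v))               # univariate projections
        W = np.exp(-((p[:, None] - p[None, :]) ** 2) / (2 * scale**2))
        np.fill_diagonal(W, 0.0)
        L = np.diag(W.sum(1)) - W                     # unnormalised Laplacian
        return np.linalg.eigvalsh(L)[1]               # algebraic connectivity

    rng = np.random.default_rng(8)
    X = np.vstack([rng.standard_normal((50, 2)) - [3, 0],
                   rng.standard_normal((50, 2)) + [3, 0]])
    print(spectral_connectivity(X, np.array([1.0, 0.0])))   # small: clusters split
    print(spectral_connectivity(X, np.array([0.0, 1.0])))   # larger: clusters mixed
    ```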

    Methods for online voltage stability monitoring

    Online voltage stability monitoring is the process of obtaining voltage stability information in real time for a given operating condition. The prediction should be fast and accurate so that control signals can be sent to appropriate locations quickly and effectively. One approach is to get the stability information directly from the phasor measurements obtained for operating conditions. This approach is simple and requires few computations. The proposed methods are based on the Thévenin equivalent of a system. The Thévenin equivalent, according to the maximum power transfer theorem, gives the upper limit of the power transfer to a load bus. To get the Thévenin equivalent, we need at least two sets of phasor measurements. It is found that the Thévenin equivalent gives a highly optimistic approximation of the power margin. The work done in this thesis compensates for the optimistic prediction by applying reactive-power availability information of the system. The accuracy of this approach is very high compared to the plain Thévenin equivalent. The thesis also presents an improvement on the decision tree method for online voltage stability monitoring through attribute selection. The role of data mining approaches such as decision trees is vital in using the available accurate measurement data in the power system. It is also very important to extract the important data, or attributes, so that the tree is robust, reliable, and easy to compute. Data mining itself offers information-based (gain ratio), statistical (k-nearest neighbor), probabilistic (naïve Bayes), and other criteria for attribute selection. There are analytical approaches in power systems which can characterize attributes as well. Can these attributes be used for attribute selection for decision trees? The hypothesis has been tested using the tangent vector information of attributes. The accuracy of decision trees built on the selected attributes is very high. Attributes with higher sensitivity were found to be better indicators of voltage instability. Attribute selection will be very helpful when it comes to large systems with a huge volume of data.
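
    The two-snapshot Thévenin estimate reduces to a pair of complex linear equations; a minimal sketch with made-up per-unit phasors (real PMU data would be noisy and need filtering):

    ```python
    # V = E_th - Z_th * I at the load bus; two snapshots give two complex
    # equations in the two complex unknowns E_th and Z_th.
    V1, I1 = 0.98 + 0.00j, 0.50 - 0.10j      # snapshot 1 (per unit, hypothetical)
    V2, I2 = 0.95 - 0.01j, 0.65 - 0.14j      # snapshot 2 (heavier load)
    Z_th = (V1 - V2) / (I2 - I1)             # Thevenin impedance
    E_th = V1 + Z_th * I1                    # Thevenin source voltage
    # Maximum power transfer: the stability margin closes as |Z_load| -> |Z_th|.
    Z_load = V2 / I2
    print(abs(Z_th), abs(Z_load))
    ```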