
    Parametric Polynomial Time Perceptron Rescaling Algorithm

    Let us consider a linear feasibility problem, with a possibly infinite number of inequality constraints, posed in an online setting: an algorithm suggests a candidate solution, and an oracle either confirms its feasibility or outputs a violated constraint vector. This model can be solved by subgradient optimisation algorithms for non-smooth functions, known in the machine learning community as perceptron algorithms, and its solvability depends on the problem dimension and the radius of the constraint set. The classical perceptron algorithm may have exponential complexity in the worst case, when the radius is infinitesimal [1]. To overcome this difficulty, the space dilation technique was exploited in the ellipsoid algorithm to make its running time polynomial [3]. A special case of space dilation, the rescaling procedure, is utilised in the perceptron rescaling algorithm [2] with a probabilistic approach to choosing the direction of dilation. A parametric version of the perceptron rescaling algorithm is the focus of this work. It is demonstrated that certain fixed parameters of the latter algorithm (the initial estimate of the radius and the relaxation parameter) may be modified and adapted for particular problems. The generalised theoretical framework allows one to determine the convergence of the algorithm for any chosen set of values of these parameters, and suggests a potential way of decreasing the complexity of the algorithm, which remains the subject of current research.
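    To make the oracle model concrete, here is a minimal sketch of the classical perceptron iteration for a homogeneous linear feasibility problem (find x with a^T x > 0 for every constraint a) — not the paper's parametric rescaling algorithm. The oracle is implemented here as a simple scan over a finite constraint list; in the paper's setting it may be arbitrary, and all names and the example data are illustrative.

```python
import numpy as np

def violation_oracle(A, x):
    """Return a violated constraint row of A (a @ x <= 0), or None if x is feasible."""
    for a in A:
        if a @ x <= 0:
            return a
    return None

def perceptron_feasibility(A, max_iters=10_000):
    x = np.zeros(A.shape[1])
    for _ in range(max_iters):
        a = violation_oracle(A, x)
        if a is None:
            return x                      # feasible point found
        x += a / np.linalg.norm(a)        # subgradient step toward feasibility
    return None                           # no feasible point within the budget

A = np.array([[1.0, 2.0], [2.0, -1.0], [0.5, 0.5]])
print(perceptron_feasibility(A))
```

    The classical iteration bound scales like 1/rho^2, where rho is the radius (margin) of the feasible cone — hence the exponential worst case for infinitesimal rho that rescaling techniques are designed to avoid.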

    Second-Order Kernel Online Convex Optimization with Adaptive Sketching

    Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration and, when the only information on the losses is their convexity, achieve a minimax optimal $\mathcal{O}(\sqrt{T})$ regret. Nonetheless, many common losses in kernel problems, such as the squared loss, logistic loss, and squared hinge loss, possess stronger curvature that can be exploited. In this case, second-order KOCO methods achieve $\mathcal{O}(\log(\det(\boldsymbol{K})))$ regret, which we show scales as $\mathcal{O}(d_{\text{eff}}\log T)$, where $d_{\text{eff}}$ is the effective dimension of the problem and is usually much smaller than $\mathcal{O}(\sqrt{T})$. The main drawback of second-order methods is their much higher $\mathcal{O}(t^2)$ space and time complexity. In this paper, we introduce kernel online Newton step (KONS), a new second-order KOCO method that also achieves $\mathcal{O}(d_{\text{eff}}\log T)$ regret. To address the computational complexity of second-order methods, we introduce a new matrix sketching algorithm for the kernel matrix $\boldsymbol{K}_t$, and show that for a chosen parameter $\gamma \leq 1$ our Sketched-KONS reduces the space and time complexity by a factor of $\gamma^2$ to $\mathcal{O}(t^2\gamma^2)$ space and time per iteration, while incurring only $1/\gamma$ times more regret.
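    A minimal finite-dimensional sketch of an online Newton step — the kind of second-order update that KONS lifts to an RKHS — may help fix ideas. For curved losses such as the squared loss used below, the preconditioner accumulates gradient outer products; the step size eta, regularizer eps, and the synthetic data are illustrative choices, not the paper's.

```python
import numpy as np

def online_newton_step(stream, dim, eta=1.0, eps=1.0):
    w = np.zeros(dim)
    A = eps * np.eye(dim)                 # regularized second-order preconditioner
    for x, y in stream:
        g = 2.0 * (w @ x - y) * x         # gradient of the squared loss at w
        A += np.outer(g, g)               # rank-one curvature update
        w -= eta * np.linalg.solve(A, g)  # preconditioned (Newton-like) step
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)
print(online_newton_step(zip(X, y), dim=3))
```

    Maintaining the preconditioner costs O(d^2) per step — the finite-dimensional analogue of the O(t^2) cost in the kernel setting, which the paper's sketching reduces by a factor of gamma^2.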

    Neural Networks

    We present an overview of current research on artificial neural networks, emphasizing a statistical perspective. We view neural networks as parameterized graphs that make probabilistic assumptions about data, and view learning algorithms as methods for finding parameter values that look probable in the light of the data. We discuss basic issues in representation and learning, and treat some of the practical issues that arise in fitting networks to data. We also discuss links between neural networks and the general formalism of graphical models.
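    As a toy illustration of this statistical view, the sketch below fits a one-layer "network" (logistic regression) by minimizing the negative log-likelihood of the observed labels, i.e. by seeking parameters that make the data look probable. The data and settings are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_mle(X, y, lr=0.1, steps=500):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)                    # model's probability of label 1
        w -= lr * X.T @ (p - y) / len(y)      # gradient of the mean negative log-likelihood
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = (X @ np.array([2.0, -1.0]) + 0.3 * rng.normal(size=500) > 0).astype(float)
print(fit_mle(X, y))   # recovers the separating direction up to scale
```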

    Kernel methods in machine learning

    We review machine learning methods employing positive definite kernels. These methods formulate learning and estimation problems in a reproducing kernel Hilbert space (RKHS) of functions defined on the data domain, expanded in terms of a kernel. Working in linear spaces of functions has the benefit of facilitating the construction and analysis of learning algorithms while at the same time allowing large classes of functions. The latter include nonlinear functions as well as functions defined on nonvectorial data. We cover a wide range of methods, ranging from binary classifiers to sophisticated methods for estimation with structured data.

    Comment: Published at http://dx.doi.org/10.1214/009053607000000677 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
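    The central RKHS idea — that the learned function is expanded in kernel evaluations at the training points, f(x) = sum_i alpha_i k(x_i, x) — can be shown with kernel ridge regression as a concrete instance. This is a minimal sketch; the Gaussian kernel width and ridge parameter are arbitrary illustrative choices.

```python
import numpy as np

def gaussian_kernel(A, B, width=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def kernel_ridge_fit(X, y, lam=0.1):
    K = gaussian_kernel(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # expansion coefficients
    return alpha

def kernel_ridge_predict(X_train, alpha, X_new):
    return gaussian_kernel(X_new, X_train) @ alpha        # f(x) = sum_i alpha_i k(x_i, x)

X = np.linspace(0, 2 * np.pi, 30)[:, None]
y = np.sin(X).ravel()
alpha = kernel_ridge_fit(X, y)
print(kernel_ridge_predict(X, alpha, np.array([[1.0]])))  # roughly sin(1)
```

    Note how the nonlinearity lives entirely in the kernel: the coefficients alpha are found by solving a linear system, which is precisely the benefit of working in a linear space of functions.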

    The Shallow and the Deep: A biased introduction to neural networks and old school machine learning

    The Shallow and the Deep is a collection of lecture notes that offers an accessible introduction to neural networks and machine learning in general. However, it was clear from the beginning that these notes would not be able to cover this rapidly changing and growing field in its entirety. The focus lies on classical machine learning techniques, with a bias towards classification and regression. Other learning paradigms and many recent developments in, for instance, Deep Learning are not addressed or only briefly touched upon. Biehl argues that having a solid knowledge of the foundations of the field is essential, especially for anyone who wants to explore the world of machine learning with an ambition that goes beyond the application of some software package to some data set. Therefore, The Shallow and the Deep places emphasis on fundamental concepts and theoretical background. This also involves delving into the history and pre-history of neural networks, where the foundations for most of the recent developments were laid. These notes aim to demystify machine learning and neural networks without losing the appreciation for their impressive power and versatility.

    Recognition of transport means in GPS data using machine-learning methods

    Bicycle transport is today one of the most important measures in urban traffic with a view to moving towards more sustainable mobility. Nowadays, smartphones are equipped with the Global Positioning System (GPS), which allows cyclists, through smartphone applications, to record their own routes on a daily basis; this is very useful information for traffic and transport planners. The problem appears when there is invalid data due to errors in the measurement or in the GPS signal. The solution is transport mode recognition, which consists of classifying the different existing transport modes on the basis of a set of data. As other studies have shown, the emerging techniques of machine learning allow the development of very powerful models capable of recognizing means of transport with great effectiveness. Accordingly, this study aims to separate GPS bicycle tracks from the other modes studied (inner-city train (S-Bahn), walk, bike, tram, bus), also classifying the tracks of each means of transport separately. The key contribution of this study is the design and implementation of a machine learning model capable of classifying existing modes of transport in urban traffic in the city of Dresden, Germany. For this purpose, a cascading classifiers model was designed so that in each phase tracks belonging to a different mode are separated, studying in each phase which of the machine learning algorithms used (Decision Tree, Support Vector Machine and Neural Network) has the best performance. The GPS data was collected with the Cyface smartphone application, and from there the data was structured and the features that serve as inputs of the model were calculated and selected. To separate inner-city train (S-Bahn), bike and walk tracks (the first three phases), accuracy values above 98 % are obtained for any of the mentioned algorithms. For the fourth phase, where the classification between bus and tram tracks is carried out, the performance of the model is not as outstanding, due to their similar characteristics, but it nevertheless reaches an accuracy value of 83 % using a Neural Network (Multi-layer Perceptron) model. The great performance of the model after the training phase allowed its implementation on unlabeled tracks, achieving very good results with an accuracy of 92.6 % in the prediction of the tracks used, making mistakes only in distinguishing between tram and bus tracks.
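    A schematic sketch of such a cascade (not the thesis code) is shown below: each stage trains a binary classifier that separates one mode from the remainder, and unclaimed tracks flow to the next stage, with the last remaining class as the fallthrough. Feature extraction from raw GPS tracks is assumed to have happened already (X is a numeric feature matrix); the stage order follows the abstract, while the specific scikit-learn model settings are illustrative assumptions.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

STAGES = [              # one binary classifier per mode, applied in order
    ("s_bahn", DecisionTreeClassifier(max_depth=8)),
    ("bike",   SVC()),
    ("walk",   DecisionTreeClassifier(max_depth=8)),
    ("tram",   MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)),
]                       # whatever is left is labelled "bus"

def fit_cascade(X, labels):
    fitted = []
    remaining = list(range(len(X)))
    for mode, clf in STAGES:
        y = [1 if labels[i] == mode else 0 for i in remaining]
        clf.fit(X[remaining], y)          # train this stage on tracks not yet claimed
        fitted.append((mode, clf))
        remaining = [i for i in remaining if labels[i] != mode]  # pass the rest on
    return fitted

def predict_cascade(fitted, x):
    for mode, clf in fitted:
        if clf.predict(x.reshape(1, -1))[0] == 1:
            return mode
    return "bus"        # fallthrough: the last remaining class
```

    Training each stage only on the tracks that earlier stages did not claim mirrors the phase-by-phase design described above, and lets each phase use whichever of the three algorithms performs best.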