    Similarity Learning for High-Dimensional Sparse Data

    A good measure of similarity between data points is crucial to many tasks in machine learning. Similarity and metric learning methods learn such measures automatically from data, but they do not scale well with respect to the dimensionality of the data. In this paper, we propose a method that can efficiently learn a similarity measure from high-dimensional sparse data. The core idea is to parameterize the similarity measure as a convex combination of rank-one matrices with specific sparsity structures. The parameters are then optimized with an approximate Frank-Wolfe procedure to maximally satisfy relative similarity constraints on the training data. Our algorithm greedily incorporates one pair of features at a time into the similarity measure, providing an efficient way to control the number of active features and thus reduce overfitting. It enjoys very appealing convergence guarantees, and its time and memory complexity depend on the sparsity of the data instead of the dimension of the feature space. Our experiments on real-world high-dimensional datasets demonstrate its potential for classification, dimensionality reduction and data exploration.
    Comment: 14 pages. Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015). Matlab code: https://github.com/bellet/HDS
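
    The greedy, one-pair-of-features-at-a-time update described in this abstract can be illustrated with a minimal sketch. It is not the authors' Matlab implementation. It assumes a bilinear similarity x^T M y, basis matrices of the form scale * (e_i ± e_j)(e_i ± e_j)^T, a smoothed hinge loss over triplets (x, y, z) meaning "x is more similar to y than to z", and the standard Frank-Wolfe step size 2/(k+2). All function names and the `scale` parameter are hypothetical, and dense lists are used for readability, so the sparsity-dependent complexity claimed in the abstract is not reproduced here.

```python
from itertools import combinations


def smoothed_hinge(margin):
    """Smoothed hinge loss (an assumed choice): zero once the margin reaches 1."""
    if margin >= 1.0:
        return 0.0
    if margin <= 0.0:
        return 0.5 - margin
    return 0.5 * (1.0 - margin) ** 2


def smoothed_hinge_grad(margin):
    """Derivative of the smoothed hinge loss with respect to the margin."""
    if margin >= 1.0:
        return 0.0
    if margin <= 0.0:
        return -1.0
    return margin - 1.0


def bilinear(M, x, y):
    """Bilinear similarity x^T M y for dense lists."""
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def margin(M, triplet):
    """How much more similar x is to y than to z under M."""
    x, y, z = triplet
    return bilinear(M, x, y) - bilinear(M, x, z)


def average_loss(M, triplets):
    return sum(smoothed_hinge(margin(M, t)) for t in triplets) / len(triplets)


def loss_gradient(M, triplets, dim):
    """Gradient of the average loss with respect to M."""
    G = [[0.0] * dim for _ in range(dim)]
    for x, y, z in triplets:
        g = smoothed_hinge_grad(margin(M, (x, y, z))) / len(triplets)
        if g == 0.0:
            continue  # this triplet is already satisfied with margin >= 1
        for i in range(dim):
            for j in range(dim):
                G[i][j] += g * x[i] * (y[j] - z[j])
    return G


def best_basis(G, dim):
    """Linear minimization step: pick the feature pair (i, j) and sign whose
    basis (e_i + sign*e_j)(e_i + sign*e_j)^T has the smallest inner product
    with the gradient. The positive scale factor does not affect the argmin."""
    def score(candidate):
        i, j, sign = candidate
        return G[i][i] + G[j][j] + sign * (G[i][j] + G[j][i])

    candidates = [(i, j, s)
                  for i, j in combinations(range(dim), 2) for s in (1, -1)]
    return min(candidates, key=score)


def frank_wolfe_similarity(triplets, dim, scale=1.0, iterations=10):
    """Build M as a convex combination of scaled rank-one bases."""
    M = [[0.0] * dim for _ in range(dim)]
    for k in range(iterations):
        G = loss_gradient(M, triplets, dim)
        i, j, sign = best_basis(G, dim)
        step = 2.0 / (k + 2.0)
        # Shrink the current matrix, then add the chosen rank-one basis.
        M = [[(1.0 - step) * v for v in row] for row in M]
        M[i][i] += step * scale
        M[j][j] += step * scale
        M[i][j] += step * scale * sign
        M[j][i] += step * scale * sign
    return M


# Toy data: features 0 and 1 should be treated as similar, feature 2 as not.
toy_triplets = [
    ([1, 0, 0], [0, 1, 0], [0, 0, 1]),
    ([0, 1, 0], [1, 0, 0], [0, 0, 1]),
]
learned = frank_wolfe_similarity(toy_triplets, dim=3, scale=1.0, iterations=5)
print("average loss:", average_loss(learned, toy_triplets))
```

    Each iteration touches only four entries of M beyond the uniform shrinkage, which is what keeps the number of active features under control. Stopping after fewer iterations yields a sparser similarity matrix.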

    On the usage of active learning for SHM

    The key element of this work is to demonstrate a strategy for using pattern recognition algorithms to investigate correlations between feature variables for Structural Health Monitoring (SHM). The task will take advantage of data from a bridge. An informative chain of artificial intelligence tools will allow an active learning interaction between the unfolded shapes of the manifold of online data by characterising the physical shape between variables. In many data mining and machine learning applications, there is a significant supply of unlabelled data but an important undersupply of labelled data. Semi-supervised active learning, which combines both labelled and unlabelled data, can offer access to useful information and may be the crucial element in successful decision making regarding the health of structures.

    Is it worth changing pattern recognition methods for structural health monitoring?

    The key element of this work is to demonstrate alternative strategies for using pattern recognition algorithms whilst investigating structural health monitoring. This paper looks to determine whether the choice among a range of established classification techniques makes any difference: from decision trees and support vector machines to Gaussian processes. Classification algorithms are tested on adjustable synthetic data to establish performance metrics, then all techniques are applied to real SHM data. To aid the selection of training data, an informative chain of artificial intelligence tools is used to explore an active learning interaction between meaningful clusters of data.