Similarity Learning for High-Dimensional Sparse Data
A good measure of similarity between data points is crucial to many tasks in
machine learning. Similarity and metric learning methods learn such measures
automatically from data, but they do not scale well with respect to the
dimensionality of the data. In this paper, we propose a method that can
efficiently learn a similarity measure from high-dimensional sparse data. The core idea
is to parameterize the similarity measure as a convex combination of rank-one
matrices with specific sparsity structures. The parameters are then optimized
with an approximate Frank-Wolfe procedure to maximally satisfy relative
similarity constraints on the training data. Our algorithm greedily
incorporates one pair of features at a time into the similarity measure,
providing an efficient way to control the number of active features and thus
reduce overfitting. It enjoys appealing convergence guarantees, and its
time and memory complexity depend on the sparsity of the data instead of the
dimension of the feature space. Our experiments on real-world high-dimensional
datasets demonstrate its potential for classification, dimensionality reduction
and data exploration.
Comment: 14 pages. Proceedings of the 18th International Conference on
Artificial Intelligence and Statistics (AISTATS 2015). Matlab code:
https://github.com/bellet/HDS
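The abstract does not spell out the rank-one bases or the loss function, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes bases of the form lam * (e_i + s*e_j)(e_i + s*e_j)^T with s in {+1, -1}, a squared hinge loss over triplets (x, y, z) meaning "x should be more similar to y than to z", and the standard Frank-Wolfe step size 2/(k+2). All function names are hypothetical and are not taken from the authors' Matlab code. For readability the gradient is computed densely, so this sketch does not reproduce the sparsity-dependent complexity claimed above.

```python
from itertools import combinations


def similarity(M, x, y):
    """Bilinear similarity x^T M y, with sparse M stored as {(i, j): value}."""
    return sum(v * x[i] * y[j] for (i, j), v in M.items())


def gradient(M, triplets, d):
    """Dense gradient of the mean squared hinge loss (assumed loss)."""
    G = [[0.0] * d for _ in range(d)]
    for x, y, z in triplets:
        margin = 1.0 - (similarity(M, x, y) - similarity(M, x, z))
        if margin > 0:
            coef = -2.0 * margin / len(triplets)
            for i in range(d):
                if x[i] == 0:
                    continue
                for j in range(d):
                    G[i][j] += coef * x[i] * (y[j] - z[j])
    return G


def best_basis(G, d, lam):
    """Linear minimization step: pick the feature pair (i, j) and sign s
    whose basis B = lam * (e_i + s*e_j)(e_i + s*e_j)^T minimizes <B, G>."""
    best = None
    for i, j in combinations(range(d), 2):
        for s in (1.0, -1.0):
            score = lam * (G[i][i] + G[j][j] + s * (G[i][j] + G[j][i]))
            if best is None or score < best[0]:
                best = (score, i, j, s)
    _, i, j, s = best
    return {(i, i): lam, (j, j): lam, (i, j): s * lam, (j, i): s * lam}


def frank_wolfe(triplets, d, lam=1.0, iters=10):
    """Each iteration adds at most one pair of features to M."""
    M = {}
    for k in range(iters):
        B = best_basis(gradient(M, triplets, d), d, lam)
        gamma = 2.0 / (k + 2)  # gamma = 1 at k = 0, so M starts at a basis
        keys = set(M) | set(B)
        M = {key: (1 - gamma) * M.get(key, 0.0) + gamma * B.get(key, 0.0)
             for key in keys}
    return M


# Toy example: x should be closer to y than to z.
triplets = [([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0])]
M = frank_wolfe(triplets, d=4, lam=1.0, iters=10)
x, y, z = triplets[0]
print(similarity(M, x, y) > similarity(M, x, z))
```

Because every basis touches only two features, the number of nonzero entries in M grows by at most four per iteration, which is one way to read the abstract's remark about controlling the number of active features.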
On the usage of active learning for SHM
The key element of this work is to demonstrate a strategy for using pattern recognition algorithms to investigate
correlations between feature variables for Structural Health Monitoring (SHM). The task will take advantage
of data from a bridge. An informative chain of artificial intelligence tools will allow an active learning
interaction between the unfolded shapes of the manifold of online data by characterising the physical shape
between variables. In many data mining and machine learning applications, there is a significant supply
of unlabelled data but an important undersupply of labelled data. Semi-supervised active learning, which
combines both labelled and unlabelled data, can offer access to useful information and may be the
crucial element in successful decision making regarding the health of structures.
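The abstract does not name a query strategy, so the following is a hedged sketch of one common option, pool-based uncertainty sampling, and not the paper's method. It assumes a nearest-centroid classifier and a hypothetical `oracle` function standing in for an expert who labels a queried observation. All names and the toy one-dimensional feature values are illustrative.

```python
import math


def centroids(labelled):
    """Mean feature vector per class from (features, label) pairs."""
    sums, counts = {}, {}
    for x, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def margin(x, cents):
    """Gap between the two nearest class centroids; a small gap means
    the classifier is uncertain about x. Needs at least two classes."""
    dists = sorted(math.dist(x, c) for c in cents.values())
    return dists[1] - dists[0]


def active_learning(labelled, pool, oracle, budget):
    """Repeatedly query the label of the most uncertain unlabelled point."""
    labelled, pool = list(labelled), list(pool)
    for _ in range(min(budget, len(pool))):
        cents = centroids(labelled)
        query = min(pool, key=lambda x: margin(x, cents))
        pool.remove(query)
        labelled.append((query, oracle(query)))
    return labelled


# Hypothetical oracle: stands in for an engineer inspecting the structure.
def oracle(x):
    return "damaged" if x[0] > 0.5 else "healthy"


labelled = [([0.0], "healthy"), ([1.0], "damaged")]
pool = [[0.1], [0.4], [0.9], [0.7]]
result = active_learning(labelled, pool, oracle, budget=2)
print(result[2:])
```

With a labelling budget of two, the loop spends both queries on the points nearest the class boundary and leaves the easy points unlabelled, which is the motivation for active learning when labelled data are scarce.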
Is it worth changing pattern recognition methods for structural health monitoring?
The key element of this work is to demonstrate alternative strategies for using pattern
recognition algorithms whilst investigating structural health monitoring. This paper seeks to
determine whether the choice among a range of established classification techniques,
from decision trees and support vector machines to Gaussian processes, makes any difference.
Classification algorithms are tested on adjustable synthetic data to establish performance metrics,
then all techniques are applied to real SHM data. To aid the selection of training data, an
informative chain of artificial intelligence tools is used to explore an active learning interaction
between meaningful clusters of data.