43,832 research outputs found
RankMerging: A supervised learning-to-rank framework to predict links in large social network
Uncovering unknown or missing links in social networks is a difficult task
because of their sparsity and because links may represent different types of
relationships, characterized by different structural patterns. In this paper,
we define a simple yet efficient supervised learning-to-rank framework, called
RankMerging, which aims at combining information provided by various
unsupervised rankings. We illustrate our method on three different kinds of
social networks and show that it substantially improves the performances of
unsupervised metrics of ranking. We also compare it to other combination
strategies based on standard methods. Finally, we explore various aspects of
RankMerging, such as feature selection and parameter estimation and discuss its
area of relevance: the prediction of an adjustable number of links on large
networks.Comment: 43 pages, published in Machine Learning Journa
Feature Selection for Linear SVM with Provable Guarantees
We give two provably accurate feature-selection techniques for the linear
SVM. The algorithms run in deterministic and randomized time respectively. Our
algorithms can be used in an unsupervised or supervised setting. The supervised
approach is based on sampling features from support vectors. We prove that the
margin in the feature space is preserved to within -relative error of
the margin in the full feature space in the worst-case. In the unsupervised
setting, we also provide worst-case guarantees of the radius of the minimum
enclosing ball, thereby ensuring comparable generalization as in the full
feature space and resolving an open problem posed in Dasgupta et al. We present
extensive experiments on real-world datasets to support our theory and to
demonstrate that our method is competitive and often better than prior
state-of-the-art, for which there are no known provable guarantees.Comment: Appearing in Proceedings of 18th AISTATS, JMLR W&CP, vol 38, 201
Unsupervised spectral sub-feature learning for hyperspectral image classification
Spectral pixel classification is one of the principal techniques used in hyperspectral image (HSI) analysis. In this article, we propose an unsupervised feature learning method for classification of hyperspectral images. The proposed method learns a dictionary of sub-feature basis representations from the spectral domain, which allows effective use of the correlated spectral data. The learned dictionary is then used in encoding convolutional samples from the hyperspectral input pixels to an expanded but sparse feature space. Expanded hyperspectral feature representations enable linear separation between object classes present in an image. To evaluate the proposed method, we performed experiments on several commonly used HSI data sets acquired at different locations and by different sensors. Our experimental results show that the proposed method outperforms other pixel-wise classification methods that make use of unsupervised feature extraction approaches. Additionally, even though our approach does not use any prior knowledge, or labelled training data to learn features, it yields either advantageous, or comparable, results in terms of classification accuracy with respect to recent semi-supervised methods
A random forest system combination approach for error detection in digital dictionaries
When digitizing a print bilingual dictionary, whether via optical character
recognition or manual entry, it is inevitable that errors are introduced into
the electronic version that is created. We investigate automating the process
of detecting errors in an XML representation of a digitized print dictionary
using a hybrid approach that combines rule-based, feature-based, and language
model-based methods. We investigate combining methods and show that using
random forests is a promising approach. We find that in isolation, unsupervised
methods rival the performance of supervised methods. Random forests typically
require training data so we investigate how we can apply random forests to
combine individual base methods that are themselves unsupervised without
requiring large amounts of training data. Experiments reveal empirically that a
relatively small amount of data is sufficient and can potentially be further
reduced through specific selection criteria.Comment: 9 pages, 7 figures, 10 tables; appeared in Proceedings of the
Workshop on Innovative Hybrid Approaches to the Processing of Textual Data,
April 201
FSL-BM: Fuzzy Supervised Learning with Binary Meta-Feature for Classification
This paper introduces a novel real-time Fuzzy Supervised Learning with Binary
Meta-Feature (FSL-BM) for big data classification task. The study of real-time
algorithms addresses several major concerns, which are namely: accuracy, memory
consumption, and ability to stretch assumptions and time complexity. Attaining
a fast computational model providing fuzzy logic and supervised learning is one
of the main challenges in the machine learning. In this research paper, we
present FSL-BM algorithm as an efficient solution of supervised learning with
fuzzy logic processing using binary meta-feature representation using Hamming
Distance and Hash function to relax assumptions. While many studies focused on
reducing time complexity and increasing accuracy during the last decade, the
novel contribution of this proposed solution comes through integration of
Hamming Distance, Hash function, binary meta-features, binary classification to
provide real time supervised method. Hash Tables (HT) component gives a fast
access to existing indices; and therefore, the generation of new indices in a
constant time complexity, which supersedes existing fuzzy supervised algorithms
with better or comparable results. To summarize, the main contribution of this
technique for real-time Fuzzy Supervised Learning is to represent hypothesis
through binary input as meta-feature space and creating the Fuzzy Supervised
Hash table to train and validate model.Comment: FICC201
- …