How is a data-driven approach better than random choice in label space division for multi-label classification?
We propose using five data-driven community detection approaches from social
networks to partition the label space for the task of multi-label
classification as an alternative to random partitioning into equal subsets as
performed by RAkELd: modularity-maximizing fastgreedy and leading eigenvector,
infomap, walktrap and label propagation algorithms. We construct a label
co-occurrence graph (both weighted and unweighted versions) based on training
data and perform community detection to partition the label set. We include
Binary Relevance and Label Powerset classification methods for comparison. We
use gini-index based Decision Trees as the base classifier. We compare educated
approaches to label space divisions against random baselines on 12 benchmark
data sets over five evaluation measures. We show that in almost all cases seven
educated guess approaches are more likely to outperform RAkELd than otherwise
in all measures except Hamming Loss. We show that fastgreedy and walktrap
community detection methods on weighted label co-occurrence graphs are 85-92%
more likely to yield better F1 scores than random partitioning. Infomap on the
unweighted label co-occurrence graphs is on average better than random
partitioning 90% of the time in terms of Subset Accuracy and 89% of the time in
terms of Jaccard similarity. Weighted fastgreedy is better on average than
RAkELd when it comes to Hamming Loss.
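The partitioning step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it builds the label co-occurrence graph from per-sample label sets and then applies a naive greedy modularity agglomeration as a simplified stand-in for the fastgreedy method named above. The function names and the toy label sets are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(label_sets, weighted=True):
    """Map each co-occurring label pair to an edge weight.

    Weighted: number of training samples that contain both labels.
    Unweighted: 1.0 for every pair that co-occurs at least once.
    """
    edges = defaultdict(float)
    for labels in label_sets:
        for pair in combinations(sorted(set(labels)), 2):
            if weighted:
                edges[pair] += 1.0
            else:
                edges[pair] = 1.0
    return dict(edges)


def greedy_modularity_partition(edges, labels):
    """Naive agglomeration: repeatedly merge the two communities whose
    merge gives the largest positive modularity gain (simplified sketch,
    not the optimized fastgreedy algorithm)."""
    m = sum(edges.values())  # total edge weight
    members = {label: {label} for label in labels}
    degree = {label: 0.0 for label in labels}
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    between = dict(edges)  # weight between pairs of communities
    while True:
        best_gain, best_pair = 0.0, None
        for (a, b), w in between.items():
            # Modularity gain of merging communities a and b.
            gain = w / m - degree[a] * degree[b] / (2.0 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge improves modularity
        a, b = best_pair
        members[a] |= members.pop(b)
        degree[a] += degree.pop(b)
        merged = defaultdict(float)
        for (u, v), w in between.items():
            u = a if u == b else u
            v = a if v == b else v
            if u != v:
                merged[tuple(sorted((u, v)))] += w
        between = merged
    return sorted(sorted(group) for group in members.values())


# Hypothetical training label sets, one per sample.
samples = [["cat", "dog"]] * 3 + [["bus", "car"]] * 3 + [["car", "dog"]]
all_labels = sorted({label for s in samples for label in s})
partition = greedy_modularity_partition(cooccurrence_graph(samples), all_labels)
```

Each resulting group of labels would then be handed to its own multi-label classifier, in place of the random equal-size subsets used by RAkELd.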
Return of the features. Efficient feature selection and interpretation for photometric redshifts
The explosion of data in recent years has generated an increasing need for
new analysis techniques in order to extract knowledge from massive datasets.
Machine learning has proved particularly useful to perform this task. Fully
automated methods have recently gained great popularity, even though those
methods often lack physical interpretability. In contrast, feature-based
approaches can provide both well-performing models and understandable
causalities with respect to the correlations found between features and
physical processes. Efficient feature selection is an essential tool to boost
the performance of machine learning models. In this work, we propose a forward
selection method in order to compute, evaluate, and characterize better
performing features for regression and classification problems. Given the
importance of photometric redshift estimation, we adopt it as our case study.
We synthetically created 4,520 features by combining magnitudes, errors, radii,
and ellipticities of quasars, taken from the SDSS. We apply a forward selection
process, a recursive method in which a huge number of feature sets is tested
through a kNN algorithm, leading to a tree of feature sets. The branches of the
tree are then used to perform experiments with the random forest, in order to
validate the best set with an alternative model. We demonstrate that the sets
of features determined with our approach significantly improve the performance
of the regression models when compared to the classic features from the
literature. The features found are unexpected and surprising, being very
different from the classic features. Therefore, a method to
interpret some of them in a physical context is presented. The
methodology described here is very general and can be used to improve the
performance of machine learning models for any regression or classification
task.
Comment: 21 pages, 11 figures, accepted for publication in A&A, final version
after language revisio
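The forward selection idea in this abstract can be sketched as follows. This is a simplified illustration and not the authors' pipeline: it follows only the single best branch rather than a tree of feature sets, scores candidate sets with a small pure-Python kNN regressor under leave-one-out error, and omits the random forest validation. The function names, the parameter k, and the toy data are assumptions.

```python
def knn_loo_error(X, y, features, k=2):
    """Leave-one-out mean squared error of a kNN regressor that only
    sees the columns listed in `features`."""
    n = len(X)
    total = 0.0
    for i in range(n):
        # Squared distance to every other sample; ties broken by index.
        dists = sorted(
            (sum((X[i][f] - X[j][f]) ** 2 for f in features), j)
            for j in range(n)
            if j != i
        )
        pred = sum(y[j] for _, j in dists[:k]) / k
        total += (pred - y[i]) ** 2
    return total / n


def forward_selection(X, y, max_features, k=2):
    """Greedily add the feature that lowers the kNN error the most and
    stop when no candidate improves on the current set."""
    selected, best_err = [], float("inf")
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        err, feat = min(
            (knn_loo_error(X, y, selected + [f], k), f) for f in remaining
        )
        if err >= best_err:
            break
        selected.append(feat)
        remaining.remove(feat)
        best_err = err
    return selected, best_err


# Toy data (hypothetical): column 0 determines the target, column 1 does not.
X = [[float(i), float((i * 7) % 10)] for i in range(10)]
y = [float(i) for i in range(10)]
selected, err = forward_selection(X, y, max_features=2)
```

Keeping several of the best candidates at each step, instead of only the minimum, would turn this greedy chain into the tree of feature sets that the abstract describes.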
Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields
This work presents a first evaluation of using spatio-temporal receptive
fields from a recently proposed time-causal spatio-temporal scale-space
framework as primitives for video analysis. We propose a new family of video
descriptors based on regional statistics of spatio-temporal receptive field
responses and evaluate this approach on the problem of dynamic texture
recognition. Our approach generalises a previously used method, based on joint
histograms of receptive field responses, from the spatial to the
spatio-temporal domain and from object recognition to dynamic texture
recognition. The time-recursive formulation enables computationally efficient
time-causal recognition. The experimental evaluation demonstrates competitive
performance compared to the state of the art. In particular, it is shown that
binary versions of our dynamic texture descriptors achieve improved performance
compared to a large range of similar methods using different primitives, either
handcrafted or learned from data. Further, our qualitative and quantitative
investigation into parameter choices and the use of different sets of receptive
fields highlights the robustness and flexibility of our approach. Together,
these results support the descriptive power of this family of time-causal
spatio-temporal receptive fields, validate our approach for dynamic texture
recognition and point towards the possibility of designing a range of video
analysis methods based on these new time-causal spatio-temporal primitives.
Comment: 29 pages, 16 figure
Multi-View Object Instance Recognition in an Industrial Context
We present a fast object recognition system that encodes shape through viewpoint-invariant geometric relations and appearance information. In our advanced industrial work-cell, the system observes the robot's work space through three pairs of Kinect and stereo cameras, which provides reliable and complete object information. From these sensors, we derive global viewpoint-invariant shape features and robust color features that make use of color normalization techniques.
We show that in such a set-up, our system can achieve high performance with very few training samples, which is crucial for user acceptance, and that the use of multiple views is crucial for performance. This indicates that our approach can be used in controlled but realistic industrial contexts that require, besides high reliability, fast processing and intuitive, easy use on the end-user side.
European Union; Danish Council for Strategic Researc