Online learning and detection of faces with low human supervision
The final publication is available at link.springer.com. We present an efficient, online, and interactive approach for computing a classifier, called Wild Lady Ferns (WiLFs), for face learning and detection with little human supervision. More precisely, on the one hand, WiLFs combine online boosting and extremely randomized trees (Random Ferns) to progressively compute an efficient and discriminative classifier. On the other hand, WiLFs use an interactive human-machine approach that combines two complementary learning strategies to considerably reduce the degree of human supervision during learning. The first strategy is query-by-boosting active learning, which requests human assistance on difficult samples according to the classifier confidence; the second is memory-based learning, which uses k Exemplar-based Nearest Neighbors (kENN) to assist the classifier automatically. A pre-trained Convolutional Neural Network (CNN) is used to perform kENN with high-level feature descriptors. The proposed approach is therefore fast (WiLFs run at 1 FPS using code that is not fully optimized), accurate (we obtain detection rates over 82% on complex datasets), and labor-saving (human assistance percentages of less than 20%).
As a byproduct, we demonstrate that WiLFs also perform semi-automatic annotation during learning: while the classifier is being computed, WiLFs discover face instances in the input images, which are subsequently used to train the classifier online. The advantages of our approach are demonstrated on synthetic and publicly available databases, showing detection rates comparable to those of offline approaches that require larger amounts of handmade training data. Peer Reviewed. Postprint (author's final draft)
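The confidence-driven request for human assistance described in this abstract can be illustrated with a very small sketch. It is not the query-by-boosting procedure of the paper: it swaps in a plain uncertainty band around 0.5, and the function names and the `margin` value are assumptions made only for illustration.

```python
def needs_human_label(confidence, margin=0.15):
    """Return True when the classifier is too uncertain to label a sample itself.

    confidence: estimated probability in [0, 1] that the sample is a face.
    margin: hypothetical half-width of the uncertainty band around 0.5.
    """
    return abs(confidence - 0.5) < margin


def route_samples(confidences, margin=0.15):
    """Split sample indices into human queries and self-labelled samples."""
    ask_human, self_labelled = [], []
    for idx, conf in enumerate(confidences):
        if needs_human_label(conf, margin):
            ask_human.append(idx)
        else:
            # Confident samples receive the classifier's own label.
            self_labelled.append((idx, conf >= 0.5))
    return ask_human, self_labelled


def human_assistance_rate(confidences, margin=0.15):
    """Fraction of samples that would be sent to the human annotator."""
    if not confidences:
        return 0.0
    ask_human, _ = route_samples(confidences, margin)
    return len(ask_human) / len(confidences)
```

Under these assumptions, a wider `margin` sends more samples to the annotator, so it trades labeling effort against the risk of training on wrong self-assigned labels.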
Interactive multiple object learning with scanty human supervision
© 2016. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/ We present a fast and online human-robot interaction approach that progressively learns multiple object classifiers using scanty human supervision. Given an input video stream recorded during the human-robot interaction, the user just needs to annotate a small fraction of frames to compute object-specific classifiers based on random ferns which share the same features. The resulting methodology is fast (complex object appearances can be learned in a few seconds), versatile (it can be applied to unconstrained scenarios), scalable (real experiments show we can model up to 30 different object classes), and minimizes the amount of human intervention by leveraging the uncertainty measures associated with each classifier. We thoroughly validate the approach on synthetic data and on real sequences acquired with a mobile platform in indoor and outdoor scenarios containing a multitude of different objects. We show that, with little human assistance, we are able to build object classifiers robust to viewpoint changes, partial occlusions, varying lighting and cluttered backgrounds. (C) 2016 Elsevier Inc. All rights reserved. Peer Reviewed. Postprint (author's final draft)
Data mining for detecting Bitcoin Ponzi schemes
Soon after its introduction in 2009, Bitcoin was adopted by
cyber-criminals, who rely on its pseudonymity to implement virtually
untraceable scams. Among the typical scams that operate on Bitcoin are the
so-called Ponzi schemes. These are fraudulent investments which repay users
with the funds invested by new users that join the scheme, and implode when it
is no longer possible to find new investments. Despite being illegal in many
countries, Ponzi schemes are now proliferating on Bitcoin, and they keep
luring new victims, who are plundered of millions of dollars. We apply data
mining techniques to detect Bitcoin addresses related to Ponzi schemes. Our
starting point is a dataset of features of real-world Ponzi schemes, which we
construct by analysing, on the Bitcoin blockchain, the transactions used to
perform the scams. We use this dataset to experiment with various machine
learning algorithms, and we assess their effectiveness through standard
validation protocols and performance metrics. The best of the classifiers we
have experimented with can identify most of the Ponzi schemes in the dataset,
with a low number of false positives.
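The abstract summarizes classifier quality as finding most Ponzi schemes with few false positives. As a rough illustration of how such figures are commonly derived, the sketch below computes confusion counts, recall, and precision. The label encoding (1 for a Ponzi address, 0 otherwise) and the function names are assumptions, not details taken from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives.

    Labels are assumed to be 1 (Ponzi address) or 0 (other address).
    """
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif pred:
            fp += 1  # flagged as Ponzi but actually legitimate
        elif truth:
            fn += 1  # real Ponzi scheme that was missed
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    """Share of real Ponzi addresses that the classifier identified."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp, fp):
    """Share of flagged addresses that really are Ponzi addresses."""
    return tp / (tp + fp) if tp + fp else 0.0
```

Because Ponzi addresses are presumably rare relative to ordinary ones, recall and the false-positive count are usually more informative here than plain accuracy.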
Analysis of group evolution prediction in complex networks
In a world in which acceptance by and identification with social
communities are highly desired, the ability to predict the evolution of groups
over time appears to be a vital but very complex research problem. Therefore, we
propose a new, adaptable, generic and multi-stage method for Group Evolution
Prediction (GEP) in complex networks that facilitates reasoning about the
future states of recently discovered groups. The precise GEP modularity
enabled us to carry out extensive and versatile empirical studies on many
real-world complex / social networks to analyze the impact of numerous setups
and parameters like time window type and size, group detection method,
evolution chain length, prediction models, etc. Additionally, many new
predictive features reflecting the group state at a given time have been
identified and tested. Other research problems, such as enriching learning
evolution chains with external data, have been analyzed as well.
Learning from Data Streams with Randomized Forests
Non-stationary streaming data poses a familiar challenge in machine learning: the need to
obtain fast and accurate predictions. A data stream is a continuously generated sequence of
data, typically arriving rapidly. Data streams are often characterised by a non-stationary
generative process, with concept drift occurring as the process changes. Such processes are
commonly seen in the real world, for example in advertising, shopping trends, environmental
conditions, electricity monitoring and traffic monitoring.
Typical stationary algorithms are ill-suited for use with concept drifting data, thus necessitating
more targeted methods. Tree-based methods are a popular approach to this problem,
traditionally focussing on the use of the Hoeffding bound in order to guarantee performance
relative to a stationary scenario. However, there are limited single learners available for
regression scenarios, and those that do exist often struggle to choose between similarly
discriminative splits, leading to longer training times and worse performance. This limited
pool of single learners in turn hampers the performance of ensemble approaches in which
they act as base learners.
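For reference, the Hoeffding bound mentioned above states that, with probability 1 - δ, the mean of n independent observations of a variable with range R lies within ε = sqrt(R² ln(1/δ) / (2n)) of its true mean. The sketch below shows the bound and the split test it usually supports in streaming trees. The function names and the `tie_threshold` parameter are illustrative assumptions, not taken from the thesis.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the observed mean is within epsilon of the true mean
    with probability 1 - delta after n observations of range value_range."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n,
                 tie_threshold=0.05):
    """Split when the best candidate beats the runner-up by more than epsilon,
    or when epsilon is so small that the two are effectively tied."""
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold
```

The tie rule shows why similarly discriminative splits are costly: when two candidates score almost the same, the learner keeps collecting samples until ε shrinks below the threshold.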
In this thesis we seek to remedy this gap in the literature, developing methods which
focus on increasing randomization to both improve predictive performance and reduce the
training times of tree-based ensemble methods. In particular, we have chosen to investigate
the use of randomization as it is known to be able to improve generalization error in
ensembles, and is also expected to lead to fast training times, thus being a natural method
of handling the problems typically experienced by single learners.
We begin in a regression scenario, introducing the Adaptive Trees for Streaming with
Extreme Randomization (ATSER) algorithm, a partially randomized approach based on
the concept of Extremely Randomized (extra) trees. The ATSER algorithm incrementally
trains trees, using the Hoeffding bound to select the best of a random selection of splits.
Simultaneously, the trees also detect and adapt to changes in the data stream. Unlike many
traditional streaming algorithms, ATSER trees can easily be extended to include nominal
features. We find that, compared to other contemporary methods, ensembles of ATSER
trees lead to improved predictive performance whilst also reducing run times.
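The idea of choosing the best of a random selection of splits under a Hoeffding test can be sketched as follows. This is not the ATSER algorithm itself. It scores candidates on a small in-memory buffer rather than on incremental statistics, it uses variance reduction as the split score, and it applies a ratio test between the two best candidates. All of these are assumptions made to keep the example short.

```python
import math
import random


def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction(samples, feature, threshold):
    """Drop in target variance from splitting on x[feature] <= threshold."""
    left = [y for x, y in samples if x[feature] <= threshold]
    right = [y for x, y in samples if x[feature] > threshold]
    if not left or not right:
        return 0.0
    n = len(samples)
    total = variance([y for _, y in samples])
    return (total - len(left) / n * variance(left)
            - len(right) / n * variance(right))


def random_candidate_splits(samples, n_candidates, rng):
    """Draw (feature, threshold) pairs uniformly, as in extra trees."""
    n_features = len(samples[0][0])
    candidates = []
    for _ in range(n_candidates):
        feature = rng.randrange(n_features)
        column = [x[feature] for x, _ in samples]
        candidates.append((feature, rng.uniform(min(column), max(column))))
    return candidates


def choose_split(samples, candidates, delta=1e-3):
    """Return the best (feature, threshold), or None if the evidence is weak.

    Needs at least two candidates. The ratio second/best lies in [0, 1], so
    the Hoeffding bound is applied with range 1 (an assumed design choice).
    """
    scored = sorted(
        ((variance_reduction(samples, f, t), f, t) for f, t in candidates),
        reverse=True,
    )
    (best_gain, feature, threshold), (second_gain, _, _) = scored[0], scored[1]
    if best_gain <= 0.0:
        return None
    eps = math.sqrt(math.log(1.0 / delta) / (2.0 * len(samples)))
    if second_gain / best_gain < 1.0 - eps:
        return feature, threshold
    return None
```

Candidates would typically come from `random_candidate_splits(samples, 10, random.Random(0))`. Evaluating only a handful of random thresholds, instead of every possible one, is where the expected savings in training time come from.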
We then demonstrate the Adaptive Categorisation Trees for Streaming with Extreme
Randomization (ACTSER) algorithm, an adaptation of the ATSER algorithm to the more
traditional categorization scenario, again showing improved predictive performance and
reduced runtimes. The inclusion of nominal features is particularly novel in this setting
since typical categorization approaches struggle to handle them.
Finally, we examine a completely randomized scenario, where an ensemble of trees is generated
prior to having access to the data stream, while also considering multivariate splits
in addition to the traditional axis-aligned approach. We find that through the combination
of a forgetting mechanism in linear models and dynamic weighting for ensemble members,
we are able to avoid explicitly testing for concept drift. This leads to fast ensembles
with strong predictive performance, whilst also requiring fewer parameters than other
contemporary methods.
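One way to see how dynamic weighting can stand in for an explicit drift test is the sketch below. Each ensemble member carries an exponentially discounted error, and members are weighted by how small that error currently is. The class name, the `forgetting` factor, and the exponential weighting rule are assumptions chosen for illustration. The thesis applies its forgetting mechanism inside linear models, which is not reproduced here.

```python
import math


class DiscountedEnsemble:
    """Weighted average of fixed predictors with recency-weighted errors."""

    def __init__(self, members, forgetting=0.9):
        self.members = list(members)   # callables mapping x -> prediction
        self.forgetting = forgetting   # closer to 1.0 means a longer memory
        self.errors = [0.0] * len(self.members)

    def weights(self):
        """Normalized weights; a lower discounted error gives a higher weight."""
        lowest = min(self.errors)
        # Subtracting the lowest error keeps exp() from underflowing to zero.
        raw = [math.exp(-(e - lowest)) for e in self.errors]
        total = sum(raw)
        return [r / total for r in raw]

    def predict(self, x):
        return sum(w * m(x) for w, m in zip(self.weights(), self.members))

    def update(self, x, y):
        """Fold the newest squared error into each member's running error."""
        for i, member in enumerate(self.members):
            loss = (member(x) - y) ** 2
            self.errors[i] = (self.forgetting * self.errors[i]
                              + (1.0 - self.forgetting) * loss)
```

After the target concept changes, the errors of outdated members grow while old mistakes of the others fade, so the weights shift without any drift detector being run.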
For each of the proposed methods in this thesis, we demonstrate empirically that they are
effective over a variety of different non-stationary data streams, including on multiple
types of concept drift. Furthermore, in comparison to other contemporary data streaming
algorithms, we find the biggest improvements in performance are on noisy data streams.
Engineers Gat