Hellinger Distance Trees for Imbalanced Streams
Classifiers trained on data sets possessing an imbalanced class distribution
are known to exhibit poor generalisation performance. This is known as the
imbalanced learning problem. The problem becomes particularly acute when we
consider incremental classifiers operating on imbalanced data streams,
especially when the learning objective is rare class identification. As
accuracy may provide a misleading impression of performance on imbalanced data,
existing stream classifiers based on accuracy can suffer poor minority class
performance on imbalanced streams, with the result being low minority class
recall rates. In this paper we address this deficiency by proposing the use of
the Hellinger distance measure as a very fast decision tree split criterion.
We demonstrate that by using Hellinger distance, a statistically significant
improvement in recall rates on imbalanced data streams can be achieved, with an
acceptable increase in the false positive rate.
Comment: 6 pages, 2 figures, to be published in Proceedings 22nd International
Conference on Pattern Recognition (ICPR) 201
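As a rough illustration of the split criterion named in the abstract above, the following Python sketch scores a candidate split by the Hellinger distance between the two class-conditional distributions over the branches. The function name and the count-based inputs are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
import math


def hellinger_split_score(pos_counts, neg_counts):
    """Hellinger distance between the branch distributions of two classes.

    pos_counts[j] and neg_counts[j] are the numbers of minority and majority
    examples that a candidate split sends to branch j (hypothetical inputs).
    """
    total_pos = sum(pos_counts)
    total_neg = sum(neg_counts)
    if total_pos == 0 or total_neg == 0:
        # With one class absent there is nothing to separate.
        return 0.0
    squared = 0.0
    for p, n in zip(pos_counts, neg_counts):
        # Each class is normalised by its own total, not by the branch size.
        squared += (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
    return math.sqrt(squared)
```

Because each class is normalised by its own total, the score does not change when the majority class is scaled up: `hellinger_split_score([8, 2], [100, 900])` and `hellinger_split_score([8, 2], [10, 90])` agree up to rounding. A split that separates the classes perfectly scores the square root of 2, and an uninformative split scores 0.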
A Penalty Approach to Differential Item Functioning in Rasch Models
A new diagnostic tool for the identification of differential item functioning (DIF) is proposed. Classical approaches to DIF can consider only a few subpopulations, such as ethnic groups, when investigating whether the solution of an item depends on membership in a subpopulation. We propose an explicit model for differential item functioning that includes a set of variables, containing metric as well as categorical components, as potential candidates for inducing DIF. The ability to include a set of covariates entails that the model contains a large number of parameters. Regularized estimators, in particular penalized maximum likelihood estimators, are used to solve the estimation problem and to identify the items that induce DIF. It is shown that the method is able to detect items with DIF. Simulations and two applications demonstrate the applicability of the method.
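One form such a model could take is sketched below. The notation is assumed for illustration and is not taken from the abstract: person p has ability \theta_p and covariate vector x_p, item i has difficulty \beta_i and an item-specific DIF vector \gamma_i, and \lambda is a tuning parameter.

```latex
\[
\log\frac{P(Y_{pi}=1)}{P(Y_{pi}=0)}
  = \theta_p - \bigl(\beta_i + \mathbf{x}_p^{\top}\boldsymbol{\gamma}_i\bigr),
\qquad
\ell_{\mathrm{pen}}(\theta,\beta,\gamma)
  = \ell(\theta,\beta,\gamma) - \lambda \sum_{i=1}^{I} \lVert \boldsymbol{\gamma}_i \rVert
\]
```

Under this assumed form, setting every \gamma_i to zero recovers the ordinary Rasch model, and an item is flagged as inducing DIF when its estimated \gamma_i is not shrunk to zero by the penalty.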
Random Forests : An Application To Tumour Classification
In this thesis, machine learning approaches, namely decision trees and random forests, are discussed. A mathematical foundation of decision trees is given, followed by a discussion of their advantages and disadvantages. Further, the application of decision trees as part of random forests is presented. A real-life study of brain tumours is discussed with regard to the use of random forests. The data cover six different types of brain tumours and were acquired by Raman spectroscopy. After the data have been curated, a random forest model is used to classify the tumour type. At this point the results seem promising, but they require further experimentation.
The Error is the Feature: how to Forecast Lightning using a Model Prediction Error
Despite the progress within the last decades, weather forecasting is still a
challenging and computationally expensive task. Current satellite-based
approaches to predict thunderstorms are usually based on the analysis of the
observed brightness temperatures in different spectral channels and emit a
warning if a critical threshold is reached. Recent progress in data science,
however, demonstrates that machine learning can be successfully applied to many
research fields in science, especially in areas dealing with large datasets. We
therefore present a new approach to the problem of predicting thunderstorms
based on machine learning. The core idea of our work is to use the error of
two-dimensional optical flow algorithms applied to images of meteorological
satellites as a feature for machine learning models. We interpret that optical
flow error as an indication of convection potentially leading to thunderstorms
and lightning. To factor in spatial proximity we use various manual convolution
steps. We also consider effects such as the time of day or the geographic
location. We train different tree classifier models as well as a neural network
to predict lightning within the next few hours (called nowcasting in
meteorology) based on these features. In our evaluation section we compare the
predictive power of the different models and the impact of different features
on the classification result. Our results show a high accuracy of 96% for
predictions over the next 15 minutes which slightly decreases with increasing
forecast period but still remains above 83% for forecasts of up to five hours.
The high false positive rate of nearly 6%, however, needs further investigation
to allow for operational use of our approach.
Comment: 10 pages, 7 figures
HBST: A Hamming Distance embedding Binary Search Tree for Visual Place Recognition
Reliable and efficient Visual Place Recognition is a major building block of
modern SLAM systems. Building on our prior work, in this paper we present a
Hamming Distance embedding Binary Search Tree (HBST) approach for binary
Descriptor Matching and Image Retrieval. HBST allows for descriptor Search and
Insertion in logarithmic time by exploiting particular properties of binary
Feature descriptors. We support the idea behind our search structure with a
thorough analysis on the exploited descriptor properties and their effects on
completeness and complexity of search and insertion. To validate our claims we
conducted comparative experiments for HBST and several state-of-the-art methods
on a broad range of publicly available datasets. HBST is available as a compact
open-source C++ header-only library.
Comment: Submitted to IEEE Robotics and Automation Letters (RA-L) 2018 with
International Conference on Intelligent Robots and Systems (IROS) 2018
option, 8 pages, 10 figures
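The idea of routing binary descriptors through a tree by individual bit values, as described in the abstract above, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the HBST library's API: descriptors are modelled as Python integers, and the names `Node`, `insert`, `search`, `n_bits`, and `max_leaf_size` are hypothetical. The rule of splitting on the most evenly divided bit is also an assumption chosen to keep the tree balanced.

```python
class Node:
    def __init__(self):
        self.bit = None        # bit index tested here; None marks a leaf
        self.children = None   # (child for bit 0, child for bit 1)
        self.descriptors = []  # descriptors stored in a leaf


def hamming(a, b):
    """Number of differing bits between two integer descriptors."""
    return bin(a ^ b).count("1")


def find_leaf(node, descriptor):
    # Follow one bit test per level until a leaf is reached.
    while node.bit is not None:
        node = node.children[(descriptor >> node.bit) & 1]
    return node


def split(leaf, n_bits):
    count = len(leaf.descriptors)

    def ones(k):
        return sum((d >> k) & 1 for d in leaf.descriptors)

    # Pick the bit that divides the leaf most evenly (assumed heuristic).
    bit = min(range(n_bits), key=lambda k: abs(2 * ones(k) - count))
    if ones(bit) in (0, count):
        return  # no bit separates these descriptors; keep the leaf
    leaf.children = (Node(), Node())
    for d in leaf.descriptors:
        leaf.children[(d >> bit) & 1].descriptors.append(d)
    leaf.descriptors = []
    leaf.bit = bit


def insert(root, descriptor, n_bits, max_leaf_size=4):
    leaf = find_leaf(root, descriptor)
    leaf.descriptors.append(descriptor)
    if len(leaf.descriptors) > max_leaf_size:
        split(leaf, n_bits)


def search(root, query):
    """Best match by Hamming distance within the single leaf the query reaches."""
    leaf = find_leaf(root, query)
    if not leaf.descriptors:
        return None
    best = min(leaf.descriptors, key=lambda d: hamming(d, query))
    return best, hamming(best, query)
```

For example, after calling `insert(root, d, n_bits=8)` for each of `0x00, 0xFF, 0x0F, 0xF0, 0xAA, 0x55` on a fresh `root = Node()`, `search(root, 0x0F)` returns `(0x0F, 0)`. Both operations descend one bit test per level, so their cost grows with tree depth, which is logarithmic in the number of stored descriptors only while the splits stay balanced. Because only one leaf is examined, a query that differs from its true nearest neighbour in a tested bit can miss it; this is the completeness trade-off the abstract refers to.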
Reduction of non-regression time through Artificial Intelligence