6,437 research outputs found

Hellinger Distance Trees for Imbalanced Streams

Authors: Brooke J. M.; Knowles J. D.; Lyon R. J.; Stappers B. W.
Publication date: 01/01/2014

Classifiers trained on data sets with an imbalanced class distribution are known to exhibit poor generalisation performance. This is known as the imbalanced learning problem. The problem becomes particularly acute for incremental classifiers operating on imbalanced data streams, especially when the learning objective is rare class identification. Because accuracy can give a misleading impression of performance on imbalanced data, existing stream classifiers based on accuracy can perform poorly on the minority class, resulting in low minority class recall rates. In this paper we address this deficiency by proposing the use of the Hellinger distance measure as a very fast decision tree split criterion. We demonstrate that using Hellinger distance yields a statistically significant improvement in recall rates on imbalanced data streams, with an acceptable increase in the false positive rate.

Comment: 6 pages, 2 figures, to be published in Proceedings 22nd International Conference on Pattern Recognition (ICPR) 201
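To illustrate why a Hellinger-based split criterion is insensitive to class skew, here is a minimal sketch of the count-based form commonly used in Hellinger distance decision trees. It is an assumption that this form matches the paper: the abstract does not give the formula, and the function name and arguments below are hypothetical.

```python
import math

def hellinger_split_score(pos_counts, neg_counts):
    """Hellinger distance between the class-conditional branch distributions.

    pos_counts[j] / neg_counts[j] are the numbers of minority / majority
    examples sent to branch j by a candidate split (hypothetical inputs).
    """
    total_pos = sum(pos_counts)
    total_neg = sum(neg_counts)
    if total_pos == 0 or total_neg == 0:
        return 0.0  # no information about one class: treat the split as useless
    return math.sqrt(sum(
        (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
        for p, n in zip(pos_counts, neg_counts)
    ))

# Each class is normalised by its own total, so the score does not change
# when the majority class is made 100 times larger.
balanced = hellinger_split_score([10, 0], [0, 10])
skewed = hellinger_split_score([10, 0], [0, 1000])
```

Because the score depends only on how each class distributes across branches, not on class priors, a rare class influences the choice of split as much as the common one.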

arXiv.org e-Print Archive

Crossref

University of Birmingham Research Portal

Edge Hill University Research Information Repository

The University of Manchester - Institutional Repository

A Penalty Approach to Differential Item Functioning in Rasch Models

Authors: Schauberger Gunther; Tutz Gerhard
Publication date: 07/12/2012

A new diagnostic tool for the identification of differential item functioning (DIF) is proposed. Classical approaches to DIF allow only a few subpopulations, such as ethnic groups, to be considered when investigating whether the solution of items depends on membership in a subpopulation. We propose an explicit model for differential item functioning that includes a set of variables, containing metric as well as categorical components, as potential candidates for inducing DIF. Because a whole set of covariates can be included, the model contains a large number of parameters. Regularized estimators, in particular penalized maximum likelihood estimators, are used to solve the estimation problem and to identify the items that induce DIF. It is shown that the method is able to detect items with DIF. Simulations and two applications demonstrate the applicability of the method.
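One way such a model can be written is sketched below. The notation is an assumption, since the abstract gives no formulas: person ability $\theta_p$, item difficulty $\beta_i$, person covariates $\mathbf{x}_p$, item-specific DIF parameters $\boldsymbol{\gamma}_i$, and a group-type penalty with tuning parameter $\lambda$.

```latex
\log\frac{P(Y_{pi}=1)}{P(Y_{pi}=0)}
  = \theta_p - \beta_i - \mathbf{x}_p^{\top}\boldsymbol{\gamma}_i ,
\qquad
\ell_{\lambda}(\theta,\beta,\gamma)
  = \ell(\theta,\beta,\gamma) - \lambda \sum_{i=1}^{I} \lVert \boldsymbol{\gamma}_i \rVert_2 .
```

Under this sketch, item $i$ is DIF-free when $\boldsymbol{\gamma}_i = \mathbf{0}$. A penalty on the whole vector $\boldsymbol{\gamma}_i$ can shrink it exactly to zero, so the items whose estimates stay non-zero are the ones flagged as inducing DIF.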

Open Access LMU

Random Forests: An Application To Tumour Classification

Author: Kanervo Aleksi
Publication date: 27/05/2022

In this thesis, two machine learning approaches, decision trees and random forests, are discussed. A mathematical foundation of decision trees is given, followed by a discussion of their advantages and disadvantages. Further, the use of decision trees as components of random forests is presented. A real-life study of brain tumours illustrates the use of random forests. The data covers six different types of brain tumours and was acquired by Raman spectroscopy. After the data has been curated, a random forest model is used to classify the tumour type. So far the results seem promising, but they require further experimentation.

UTUPub

The Error is the Feature: how to Forecast Lightning using a Model Prediction Error

Authors: Andersson T.; Bott Andreas; Ernst Ludwig Planck Max Karl; Lang P; Ruiz Anne; Veillette Mark S.; Williams John K.; Zach Christopher
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/02/2019

Despite the progress of recent decades, weather forecasting is still a challenging and computationally expensive task. Current satellite-based approaches to predicting thunderstorms usually analyse the observed brightness temperatures in different spectral channels and emit a warning if a critical threshold is reached. Recent progress in data science, however, demonstrates that machine learning can be successfully applied to many research fields in science, especially in areas dealing with large datasets. We therefore present a new approach to the problem of predicting thunderstorms based on machine learning. The core idea of our work is to use the error of two-dimensional optical flow algorithms applied to images of meteorological satellites as a feature for machine learning models. We interpret that optical flow error as an indication of convection potentially leading to thunderstorms and lightning. To factor in spatial proximity we use various manual convolution steps. We also consider effects such as the time of day or the geographic location. We train different tree classifier models as well as a neural network to predict lightning within the next few hours (called nowcasting in meteorology) based on these features. In our evaluation section we compare the predictive power of the different models and the impact of different features on the classification result. Our results show a high accuracy of 96% for predictions over the next 15 minutes, which decreases slightly with increasing forecast period but still remains above 83% for forecasts of up to five hours. The high false positive rate of nearly 6%, however, needs further investigation to allow for an operational use of our approach.

Comment: 10 pages, 7 figures
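The idea of using a motion-prediction error as a feature can be sketched as follows. This is a toy illustration, not the authors' pipeline: a uniform integer shift stands in for a real optical flow field, a box filter stands in for the manual convolution steps, and the function name and parameters are hypothetical.

```python
import numpy as np

def flow_error_feature(prev_frame, next_frame, shift, window=3):
    """Smoothed absolute error between an extrapolated and an observed frame."""
    # Toy stand-in for optical flow: move the whole image by (rows, cols).
    predicted = np.roll(prev_frame, shift=shift, axis=(0, 1))
    error = np.abs(next_frame - predicted)

    # Box filter written as an explicit sum of shifted views, so that each
    # pixel also reflects the error in its spatial neighbourhood.
    pad = window // 2
    padded = np.pad(error, pad, mode="edge")
    rows, cols = error.shape
    smoothed = np.zeros((rows, cols), dtype=float)
    for dy in range(window):
        for dx in range(window):
            smoothed += padded[dy:dy + rows, dx:dx + cols]
    return smoothed / window ** 2

# A cloud that only moves is predicted perfectly; a newly appearing cell
# (a possible sign of convection) shows up as a local error.
prev_frame = np.zeros((8, 8))
prev_frame[2, 2] = 1.0
next_frame = np.roll(prev_frame, shift=(0, 1), axis=(0, 1))
next_frame[5, 5] = 1.0
feature = flow_error_feature(prev_frame, next_frame, shift=(0, 1))
```

In a classifier, each pixel's feature value would sit alongside other inputs such as time of day or location.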

arXiv.org e-Print Archive

Crossref

HBST: A Hamming Distance embedding Binary Search Tree for Visual Place Recognition

Authors: Grisetti Giorgio; Schlegel Dominik
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

Reliable and efficient Visual Place Recognition is a major building block of modern SLAM systems. Building on our prior work, in this paper we present a Hamming Distance embedding Binary Search Tree (HBST) approach for binary descriptor matching and image retrieval. HBST allows for descriptor search and insertion in logarithmic time by exploiting particular properties of binary feature descriptors. We support the idea behind our search structure with a thorough analysis of the exploited descriptor properties and their effects on the completeness and complexity of search and insertion. To validate our claims we conducted comparative experiments for HBST and several state-of-the-art methods on a broad range of publicly available datasets. HBST is available as a compact open-source C++ header-only library.

Comment: Submitted to IEEE Robotics and Automation Letters (RA-L) 2018 with International Conference on Intelligent Robots and Systems (IROS) 2018 option, 8 pages, 10 figures
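The general idea of a binary search tree over descriptor bits can be sketched as below. This is not the HBST library's API: the class and method names are hypothetical, descriptors are plain Python integers, and the split rule (choose the unused bit whose values are most evenly divided in the leaf) is an assumption made for illustration.

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded descriptors."""
    return bin(a ^ b).count("1")

class Node:
    def __init__(self):
        self.bit = None                # bit index tested here; None marks a leaf
        self.children = [None, None]   # subtrees for bit value 0 and 1
        self.descriptors = []          # descriptors stored in a leaf

class BitTree:
    def __init__(self, n_bits, max_leaf_size=4):
        self.n_bits = n_bits
        self.max_leaf_size = max_leaf_size
        self.root = Node()

    def _leaf_for(self, d):
        """Follow the bits of d down to a leaf; return it and the bits used."""
        node, used = self.root, set()
        while node.bit is not None:
            used.add(node.bit)
            node = node.children[(d >> node.bit) & 1]
        return node, used

    def insert(self, d):
        leaf, used = self._leaf_for(d)
        leaf.descriptors.append(d)
        if len(leaf.descriptors) > self.max_leaf_size:
            self._split(leaf, used)

    def _split(self, leaf, used):
        items = leaf.descriptors
        candidates = [b for b in range(self.n_bits) if b not in used]

        def imbalance(b):
            ones = sum((d >> b) & 1 for d in items)
            return abs(ones / len(items) - 0.5)

        if not candidates:
            return
        bit = min(candidates, key=imbalance)
        if imbalance(bit) == 0.5:
            return  # every candidate bit is constant: the leaf cannot be split
        leaf.bit = bit
        leaf.children = [Node(), Node()]
        for d in items:
            leaf.children[(d >> bit) & 1].descriptors.append(d)
        leaf.descriptors = []

    def nearest(self, query):
        """Approximate match: best Hamming distance within the reached leaf."""
        leaf, _ = self._leaf_for(query)
        if not leaf.descriptors:
            return None
        return min(leaf.descriptors, key=lambda d: hamming(d, query))

tree = BitTree(n_bits=8, max_leaf_size=2)
for descriptor in (0b00000000, 0b11111111, 0b00001111, 0b11110000, 0b10101010):
    tree.insert(descriptor)
match = tree.nearest(0b11111111)
```

Search and insertion touch one node per level, so the cost grows with tree depth, not with the number of stored descriptors. The trade-off in this sketch is completeness: a query that differs from its true nearest neighbour on a split bit descends into a different leaf and can miss it.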

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza

Reduction of non-regression time through Artificial Intelligence

Authors: Krzesinski AES; Muller KE
Publication date: 01/01/2006

Crossref

Scipedia

Repositório Aberto da Universidade do Porto

Stellenbosch University SUNScholar Repository