J Regularization Improves Imbalanced Multiclass Segmentation
We propose a new loss formulation to further advance the multiclass segmentation of cluttered cells under weakly supervised conditions. By adding a Youden's J statistic regularization term to the cross entropy loss, we improve the separation of touching and immediate cells, obtaining sharp segmentation boundaries with high adequacy. This regularization intrinsically supports class imbalance, thus eliminating the need to explicitly use weights to balance training. Simulations demonstrate this capability and show how the regularization leads to correct results by helping advance the optimization when cross entropy stagnates. We build upon our previous work on multiclass segmentation by adding yet another training class representing gaps between adjacent cells. This addition helps the classifier identify narrow gaps as background and no longer as touching regions. We present results of our methods for 2D and 3D images, from bright field images to confocal stacks containing different types of cells, and we show that they accurately segment individual cells after training with a limited number of images, some of which are poorly annotated.
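The idea of regularizing cross entropy with Youden's J can be sketched as follows. This is a minimal illustration, not the authors' formulation: the soft (probability-based) estimate of J, the `1 - J` penalty form, the weight `lam`, and the function name `j_regularized_loss` are all assumptions made for the example. The only fact relied on is the definition J = sensitivity + specificity - 1.

```python
import math

def j_regularized_loss(probs, labels, num_classes, lam=1.0, eps=1e-12):
    """Cross entropy plus a soft Youden's J penalty (illustrative form only).

    probs:  list of per-sample probability vectors (each sums to 1)
    labels: list of integer class labels
    lam:    hypothetical weight of the regularization term
    """
    n = len(labels)
    # Standard mean cross entropy.
    ce = -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / n

    # Soft Youden's J per class: J = sensitivity + specificity - 1,
    # with hard counts replaced by predicted probabilities.
    j_values = []
    for c in range(num_classes):
        pos = [p[c] for p, y in zip(probs, labels) if y == c]
        neg = [p[c] for p, y in zip(probs, labels) if y != c]
        if not pos or not neg:
            continue  # J is undefined without both positives and negatives
        sensitivity = sum(pos) / len(pos)
        specificity = sum(1.0 - q for q in neg) / len(neg)
        j_values.append(sensitivity + specificity - 1.0)

    mean_j = sum(j_values) / len(j_values) if j_values else 0.0
    # The penalty vanishes when every class is perfectly separated (J = 1).
    return ce + lam * (1.0 - mean_j)
```

Because sensitivity and specificity are each normalized by the size of their own group, a rare class contributes to the penalty on the same scale as a frequent one, which is one way to see why a J-based term needs no explicit class weights.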
Radar-based Feature Design and Multiclass Classification for Road User Recognition
The classification of individual traffic participants is a complex task,
especially for challenging scenarios with multiple road users or under bad
weather conditions. Radar sensors provide a way of measuring such scenes that is
orthogonal to well-established camera systems. In order to gain
accurate classification results, 50 different features are extracted from the
measurement data and tested on their performance. From these features a
suitable subset is chosen and passed to random forest and long short-term
memory (LSTM) classifiers to obtain class predictions for the radar input.
Moreover, it is shown why data imbalance is an inherent problem in automotive
radar classification when the dataset is not sufficiently large. To overcome
this issue, classifier binarization is used among other techniques in order to
better account for underrepresented classes. A new method to couple the
resulting probabilities is proposed and compared to others with great success.
Final results show substantial improvements when compared to ordinary
multiclass classification.
Comment: 8 pages, 6 figure
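Classifier binarization replaces one multiclass model with several binary ones, whose outputs must then be coupled into a single prediction. The sketch below shows the simplest coupling for a one-vs-rest decomposition, plain normalization. It is not the new coupling method the abstract proposes, and the class names and function names are hypothetical.

```python
def couple_one_vs_rest(binary_probs):
    """Turn independent one-vs-rest probabilities into a multiclass distribution.

    binary_probs: dict mapping class name -> P(class vs. rest) from that
    class's binary classifier. The values need not sum to 1.
    """
    total = sum(binary_probs.values())
    if total == 0:
        # No classifier responded; fall back to a uniform distribution.
        return {c: 1.0 / len(binary_probs) for c in binary_probs}
    return {c: p / total for c, p in binary_probs.items()}

def predict(binary_probs):
    """Return the class with the highest coupled probability."""
    coupled = couple_one_vs_rest(binary_probs)
    return max(coupled, key=coupled.get)

# Hypothetical road-user classes and scores, for illustration only.
scores = {"car": 0.6, "pedestrian": 0.3, "cyclist": 0.3}
print(couple_one_vs_rest(scores))
print(predict(scores))
```

Each binary classifier can be trained with its own resampling or weighting, which is what makes binarization attractive for underrepresented classes.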
Axiomatic Interpretability for Multiclass Additive Models
Generalized additive models (GAMs) are favored in many regression and binary
classification problems because they are able to fit complex, nonlinear
functions while still remaining interpretable. In the first part of this paper,
we generalize a state-of-the-art GAM learning algorithm based on boosted trees
to the multiclass setting, and show that this multiclass algorithm outperforms
existing GAM learning algorithms and sometimes matches the performance of full
complexity models such as gradient boosted trees.
In the second part, we turn our attention to the interpretability of GAMs in
the multiclass setting. Surprisingly, the natural interpretability of GAMs
breaks down when there are more than two classes. Naive interpretation of
multiclass GAMs can lead to false conclusions. Inspired by binary GAMs, we
identify two axioms that any additive model must satisfy in order to not be
visually misleading. We then develop a technique called Additive
Post-Processing for Interpretability (API), that provably transforms a
pre-trained additive model to satisfy the interpretability axioms without
sacrificing accuracy. The technique works not just on models trained with our
learning algorithm, but on any multiclass additive model, including multiclass
linear and logistic regression. We demonstrate the effectiveness of API on a
12-class infant mortality dataset.
Comment: KDD 201
Anatomical Pattern Analysis for decoding visual stimuli in human brains
Background: A universal unanswered question in neuroscience and machine
learning is whether computers can decode the patterns of the human brain.
Multi-Voxels Pattern Analysis (MVPA) is a critical tool for addressing this
question. However, there are two challenges in the previous MVPA methods, which
include decreasing sparsity and noise in the extracted features and increasing
the performance of prediction.
Methods: To overcome these challenges, this paper proposes Anatomical
Pattern Analysis (APA) for decoding visual stimuli in the human brain. This
framework develops a novel anatomical feature extraction method and a new
imbalance AdaBoost algorithm for binary classification. Further, it utilizes an
Error-Correcting Output Codes (ECOC) method for multiclass prediction. APA can
automatically detect active regions for each category of the visual stimuli.
Moreover, it enables us to combine homogeneous datasets for applying advanced
classification.
Results and Conclusions: Experimental studies on 4 visual categories (words,
consonants, objects and scrambled photos) demonstrate that the proposed
approach achieves superior performance to state-of-the-art methods.
Comment: Published in Cognitive Computatio
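Error-Correcting Output Codes reduce a multiclass problem to several binary ones: each class gets a binary codeword, one binary classifier is trained per codeword bit, and a prediction is decoded to the nearest codeword. A minimal sketch of the decoding step follows. The four class names come from the abstract, but the 7-bit code matrix is an assumption (a standard exhaustive code for four classes), not the one used in APA.

```python
# Assumed code matrix: one codeword per class, one column per binary classifier.
# Any two codewords differ in 4 bits, so a single wrong bit can be corrected.
CODE_MATRIX = {
    "words":            (1, 1, 1, 1, 1, 1, 1),
    "consonants":       (0, 0, 0, 0, 1, 1, 1),
    "objects":          (0, 0, 1, 1, 0, 0, 1),
    "scrambled photos": (0, 1, 0, 1, 0, 1, 0),
}

def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(bits):
    """Map the binary classifiers' outputs to the class with the nearest codeword."""
    return min(CODE_MATRIX, key=lambda c: hamming(CODE_MATRIX[c], bits))

# The last binary classifier errs (1 -> 0), yet decoding still recovers the class.
print(decode((0, 0, 1, 1, 0, 0, 0)))  # objects
```

The redundancy in the codewords is what lets the ensemble tolerate mistakes by individual binary classifiers.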
Detection of Dispersed Radio Pulses: A machine learning approach to candidate identification and classification
Searching for extraterrestrial, transient signals in astronomical data sets
is an active area of current research. However, machine learning techniques are
lacking in the literature concerning single-pulse detection. This paper
presents a new, two-stage approach for identifying and classifying dispersed
pulse groups (DPGs) in single-pulse search output. The first stage identified
DPGs and extracted features to characterize them using a new peak
identification algorithm which tracks sloping tendencies around local maxima in
plots of signal-to-noise ratio vs. dispersion measure. The second stage used
supervised machine learning to classify DPGs. We created four benchmark data
sets: one unbalanced and three balanced versions using three different
imbalance treatments. We empirically evaluated 48 classifiers by training and
testing binary and multiclass versions of six machine learning algorithms on
each of the four benchmark versions. While each classifier had advantages and
disadvantages, all classifiers with imbalance treatments had higher recall
values than those with unbalanced data, regardless of the machine learning
algorithm used. Based on the benchmarking results, we selected a subset of
classifiers to classify the full, unlabelled data set of over 1.5 million DPGs
identified in 42,405 observations made by the Green Bank Telescope. Overall,
the classifiers using a multiclass ensemble tree learner in combination with
two oversampling imbalance treatments were the most efficient; they identified
additional known pulsars not in the benchmark data set and provided six
potential discoveries, with significantly fewer false positives than the other
classifiers.
Comment: 13 pages, accepted for publication in MNRAS, ref. MN-15-1713-MJ.R
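The abstract does not name its oversampling treatments, so the sketch below uses plain random oversampling with replacement as a stand-in. The function name, the `seed` parameter, and the example labels are hypothetical.

```python
import random
from collections import Counter, defaultdict

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the class's own members.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Hypothetical unbalanced benchmark: 8 noise candidates, 2 pulse candidates.
xs = list(range(10))
ys = ["noise"] * 8 + ["pulse"] * 2
new_x, new_y = random_oversample(xs, ys)
print(Counter(new_y))
```

Resampling of this kind is normally applied to the training split only, so that test-set metrics such as recall still reflect the original class distribution.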
PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison
The selection, development, or comparison of machine learning methods in data
mining can be a difficult task based on the target problem and goals of a
particular study. Numerous publicly available real-world and simulated
benchmark datasets have emerged from different sources, but their organization
and adoption as standards have been inconsistent. As such, selecting and
curating specific benchmarks remains an unnecessary burden on machine learning
practitioners and data scientists. The present study introduces an accessible,
curated, and developing public benchmark resource to facilitate identification
of the strengths and weaknesses of different machine learning methodologies. We
compare meta-features among the current set of benchmark datasets in this
resource to characterize the diversity of available data. Finally, we apply a
number of established machine learning methods to the entire benchmark suite
and analyze how datasets and algorithms cluster in terms of performance. This
work is an important first step towards understanding the limitations of
popular benchmarking suites and developing a resource that connects existing
benchmarking standards to more diverse and efficient standards in the future.
Comment: 14 pages, 5 figures, submitted for review to JML
An empirical evaluation of imbalanced data strategies from a practitioner's point of view
This research tested the following well known strategies to deal with binary
imbalanced data on 82 different real life data sets (sampled to imbalance rates
of 5%, 3%, 1%, and 0.1%): class weight, SMOTE, Underbagging, and a baseline
(just the base classifier). As base classifiers we used SVM with RBF kernel,
random forests, and gradient boosting machines and we measured the quality of
the resulting classifier using 6 different metrics (Area under the curve,
Accuracy, F-measure, G-mean, Matthews correlation coefficient and Balanced
accuracy). The best strategy strongly depends on the metric used to measure the
quality of the classifier. For AUC and accuracy, class weight and the baseline
perform better; for F-measure and MCC, SMOTE performs better; and for G-mean
and balanced accuracy, underbagging
- …
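Five of the six metrics named in the preceding abstract can be computed directly from a binary confusion matrix (AUC needs ranked scores and is left out). The sketch below uses their standard definitions. The function name and the zero-denominator conventions are assumptions made for the example.

```python
import math

def imbalance_metrics(tp, fp, tn, fn):
    """Accuracy, balanced accuracy, G-mean, F-measure and MCC from confusion counts.

    Ratios with a zero denominator are reported as 0.0 (a convention chosen here).
    """
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall on the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0   # recall on the negative class
    precision = tp / (tp + fp) if tp + fp else 0.0

    pr_sum = precision + sensitivity
    f_measure = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": math.sqrt(sensitivity * specificity),
        "f_measure": f_measure,
        "mcc": mcc,
    }

# A classifier that labels everything negative at a 1% imbalance rate.
print(imbalance_metrics(tp=0, fp=0, tn=990, fn=10))
```

On this example accuracy is 0.99 while balanced accuracy is 0.5 and G-mean, F-measure and MCC are all 0, which illustrates why the ranking of strategies can change with the metric.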
