On Classification from Outlier View
Classification is the basis of cognition. Unlike other solutions, this study
approaches it from the view of outliers. We present an expanding algorithm to
detect outliers in univariate datasets, together with the underlying
foundation. The expanding algorithm runs in a holistic way, making it a rather
robust solution. Synthetic and real data experiments show its power.
Furthermore, an application for multi-class problems leads to the introduction
of the oscillator algorithm. The corresponding result implies the potential
wide use of the expanding algorithm.
Comment: Conclusion renewed; IAENG International Journal of Computer Science,
Volume 37, Issue 4, Nov, 201
Amplifying Inter-message Distance: On Information Divergence Measures in Big Data
Message identification (M-I) divergence is an important measure of the
information distance between probability distributions, similar to
Kullback-Leibler (K-L) and Rényi divergence. In fact, the variable parameter
of M-I divergence affects how the distinction between two distributions is
characterized. Furthermore, by choosing an appropriate parameter of
M-I divergence, it is possible to amplify the information distance between
adjacent distributions while maintaining enough gap between two nonadjacent
ones. Therefore, M-I divergence can play a vital role in distinguishing
distributions more clearly. In this paper, we first define a parametric M-I
divergence in the view of information theory and then present its major
properties. In addition, we design an M-I divergence estimation algorithm by
means of the ensemble estimator of the proposed weight kernel estimators, which
can improve the convergence rate of the mean squared error. We also discuss the decision with
M-I divergence for clustering or classification, and investigate its
performance in a statistical sequence model of big data for the outlier
detection problem.
Comment: 30 pages, 4 figures
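The abstract above compares M-I divergence with the Kullback-Leibler and Rényi divergences but does not give the M-I definition, so the following is only a minimal sketch of the two reference measures for discrete distributions. The function names and the example distributions are illustrative assumptions, and the code assumes `q[i] > 0` wherever `p[i] > 0`. It shows how a tunable parameter (here the Rényi order `alpha`) changes the measured distance between the same pair of distributions.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0 (illustrative helper, not from the paper).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha (alpha > 0, alpha != 1) in nats."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1)

# Toy distributions chosen only for illustration.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

for alpha in (0.5, 2.0):
    print(alpha, renyi_divergence(p, q, alpha))
print("KL", kl_divergence(p, q))
```

The Rényi divergence is non-decreasing in `alpha` and approaches the K-L divergence as `alpha` tends to 1, so a larger order reports a larger distance for the same pair.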
3DCapsule: Extending the Capsule Architecture to Classify 3D Point Clouds
This paper introduces the 3DCapsule, which is a 3D extension of the recently
introduced Capsule concept that makes it applicable to unordered point sets.
The original Capsule relies on the existence of a spatial relationship between
the elements in the feature map it is presented with, whereas in point
permutation invariant formulations of 3D point set classification methods, such
relationships are typically lost. Here, a new layer called ComposeCaps is
introduced that, in lieu of a spatially relevant feature mapping, learns a new
mapping that can be exploited by the 3DCapsule. Previous works in the 3D point
set classification domain have focused on other parts of the architecture,
whereas the 3DCapsule is a drop-in replacement for the commonly used
fully connected classifier. An ablation study demonstrates that, when
the 3DCapsule is applied to recent 3D point set classification architectures,
it consistently shows an improvement, in particular when subjected to noisy
data. Similarly, the ComposeCaps layer is evaluated and demonstrates an
improvement over the baseline. In an apples-to-apples comparison against
state-of-the-art methods, again, better performance is demonstrated by the
3DCapsule.
Comment: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV)
Fast Approximate L_infty Minimization: Speeding Up Robust Regression
Minimization of the $L_\infty$ norm, which can be viewed as approximately
solving the non-convex least median estimation problem, is a powerful method
for outlier removal and hence robust regression. However, current techniques
for solving the problem at the heart of $L_\infty$ norm minimization are slow,
and therefore cannot scale to large problems. A new method for the minimization
of the $L_\infty$ norm is presented here, which provides a speedup of multiple
orders of magnitude for data with high dimension. This method, termed Fast
$L_\infty$ Minimization, allows robust regression to be applied to a class of
problems which were previously inaccessible. It is shown how the $L_\infty$
norm minimization problem can be broken up into smaller sub-problems, which can
then be solved extremely efficiently. Experimental results demonstrate the
radical reduction in computation time, along with robustness against large
numbers of outliers in a few model-fitting problems.
Comment: 11 pages
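To make the link between minimax fitting and outlier removal concrete, here is a minimal sketch for the simplest possible model, a single constant fitted to scalar data. It is not the fast method of the paper above, whose details the abstract does not give. The function names, the tolerance `tol`, and the sample data are illustrative assumptions. For a constant model the minimax fit is the midrange, and the points attaining the largest residual are the two extremes, so each round discards both of them.

```python
def linf_fit_constant(values):
    """Return (c, r): the constant c minimizing max|v - c| and that minimax residual r."""
    lo, hi = min(values), max(values)
    return (lo + hi) / 2.0, (hi - lo) / 2.0

def remove_outliers_linf(values, tol):
    """Repeat: fit under the max-norm, stop if the residual is within tol,
    otherwise drop the points that attain the largest residual (the extremes)."""
    kept = list(values)
    while kept:
        c, r = linf_fit_constant(kept)
        if r <= tol:
            return kept, c
        lo, hi = min(kept), max(kept)
        kept = [v for v in kept if lo < v < hi]
    return [], None  # nothing survived: tol was too strict for this data

# Toy measurements with one gross outlier (illustrative values only).
data = [9.8, 10.1, 10.0, 9.9, 10.2, 42.0]
inliers, centre = remove_outliers_linf(data, tol=0.5)
print(inliers, centre)
```

Because both extremes are removed in each round, this scheme can discard a valid point together with each outlier; that trade-off is inherent to dropping the whole set of maximal-residual points.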
Outlier absorbing based on a Bayesian approach
The presence of outliers is prevalent in machine learning applications and
may produce misleading results. In this paper, a new method for dealing with
outliers and anomalous samples is proposed. To overcome the outlier issue, the
proposed method combines the global and local views of the samples. Combining
these views makes the algorithm robust. The
experimental results show the capabilities of the proposed method.
GlidarCo: gait recognition by 3D skeleton estimation and biometric feature correction of flash lidar data
Gait recognition using noninvasively acquired data has been attracting
increasing interest in the last decade. Among various modalities of data
sources, it is experimentally found that data involving skeletal
representation are amenable to reliable feature compaction and fast
processing. Model-based gait recognition methods that exploit features from a
fitted model, such as a skeleton, are recognized for their view- and scale-invariant
properties. We propose a model-based gait recognition method, using sequences
recorded by a single flash lidar. Existing state-of-the-art model-based
approaches that exploit features from high quality skeletal data collected by
Kinect and Mocap are limited to controlled laboratory environments. The
performance of conventional research efforts is negatively affected by poor
data quality. We address the problem of gait recognition under challenging
scenarios, such as the lower-quality and noisy imaging process of lidar, which
degrade the performance of state-of-the-art skeleton-based systems. We present
GlidarCo to attain high accuracy on gait recognition under the described
conditions. A filtering mechanism corrects faulty skeleton joint measurements,
and robust statistics are integrated with conventional feature moments to encode
the dynamics of the motion. As a comparison, length-based and vector-based
features extracted from the noisy skeletons are investigated for outlier
removal. Experimental results illustrate the efficacy of the proposed
methodology in improving gait recognition given noisy, low-resolution lidar
data.
MacroBase: Prioritizing Attention in Fast Data
As data volumes continue to rise, manual inspection is becoming increasingly
untenable. In response, we present MacroBase, a data analytics engine that
prioritizes end-user attention in high-volume fast data streams. MacroBase
enables efficient, accurate, and modular analyses that highlight and aggregate
important and unusual behavior, acting as a search engine for fast data.
MacroBase is able to deliver order-of-magnitude speedups over alternatives by
optimizing the combination of explanation and classification tasks and by
leveraging a new reservoir sampler and heavy-hitters sketch specialized for
fast data streams. As a result, MacroBase delivers accurate results at speeds
of up to 2M events per second per query on a single core. The system has
delivered meaningful results in production, including at a telematics company
monitoring hundreds of thousands of vehicles.
Comment: SIGMOD 201
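The abstract above mentions a new reservoir sampler specialized for fast data streams but does not describe it. As background only, the following sketch shows the classic uniform reservoir sampler (often called Algorithm R), which keeps a fixed-size uniform sample of a stream of unknown length in one pass. It is not MacroBase's sampler; the function name and parameters are illustrative assumptions.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Classic uniform reservoir sampling (Algorithm R).

    Keeps k items such that every item seen so far is retained with equal
    probability, using O(k) memory and a single pass over the stream.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # uniform index in [0, i], inclusive
            if j < k:
                reservoir[j] = item     # replace with probability k / (i + 1)
    return reservoir

sample = reservoir_sample(range(1000), 10, random.Random(0))
print(sample)
```

Passing a seeded `random.Random` instance makes the sample reproducible, which is convenient when comparing runs over the same stream.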
A Less Biased Evaluation of Out-of-distribution Sample Detectors
In the real world, a learning system could receive an input that is unlike
anything it has seen during training. Unfortunately, out-of-distribution
samples can lead to unpredictable behaviour. We need to know whether any given
input belongs to the population distribution of the training/evaluation data to
prevent unpredictable behaviour in deployed systems. A recent surge of interest
in this problem has led to the development of sophisticated techniques in the
deep learning literature. However, due to the absence of a standard problem
definition or an exhaustive evaluation, it is not evident if we can rely on
these methods. What makes this problem different from a typical supervised
learning setting is that the distribution of outliers used in training may not
be the same as the distribution of outliers encountered in the application.
Classical approaches that learn inliers vs. outliers with only two datasets can
yield optimistic results. We introduce OD-test, a three-dataset evaluation
scheme as a more reliable strategy to assess progress on this problem. We
present an exhaustive evaluation of a broad set of methods from related areas
on image classification tasks. Contrary to the existing results, we show that
for realistic applications of high-dimensional images the previous techniques
have low accuracy and are not reliable in practice.
Comment: to appear in BMVC 2019; v2 is more compact, with more results
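The difference between a two-dataset and a three-dataset evaluation can be illustrated with a small sketch. It assumes a detector that outputs a scalar outlier score per input and a simple thresholding rule. The exact protocol of OD-test is not given in the abstract, so the function names, the thresholding rule, and the toy scores below are all illustrative assumptions. The threshold is tuned on inliers plus one outlier set, then evaluated on a different, unseen outlier set.

```python
def accuracy(threshold, inlier_scores, outlier_scores):
    """Fraction of inputs classified correctly when scores above threshold are flagged."""
    correct = sum(s <= threshold for s in inlier_scores)
    correct += sum(s > threshold for s in outlier_scores)
    return correct / (len(inlier_scores) + len(outlier_scores))

def best_threshold(inlier_scores, outlier_scores):
    """Pick the candidate threshold with the highest accuracy on the given scores."""
    candidates = sorted(set(inlier_scores) | set(outlier_scores))
    return max(candidates, key=lambda t: accuracy(t, inlier_scores, outlier_scores))

# Toy outlier scores, invented for illustration only.
val_inliers = [0.1, 0.2, 0.3, 0.4]
val_outliers = [0.8, 0.9, 0.95]     # outlier set used for tuning
test_inliers = [0.15, 0.25, 0.35]
test_outliers = [0.3, 0.45, 0.9]    # different outlier set, unseen during tuning

t = best_threshold(val_inliers, val_outliers)
two_dataset_acc = accuracy(t, val_inliers, val_outliers)
three_dataset_acc = accuracy(t, test_inliers, test_outliers)
print(t, two_dataset_acc, three_dataset_acc)
```

In this toy setting the accuracy measured on the tuning outliers is higher than the accuracy on the unseen outlier set, which is the kind of optimism the abstract attributes to two-dataset evaluation.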
Applications of Data Mining Techniques for Vehicular Ad hoc Networks
Due to the recent advances in vehicular ad hoc networks (VANETs), smart
applications have been incorporating the data generated from these networks to
provide quality of life services. In this paper, we propose a taxonomy of
data mining techniques that have been applied in this domain, in addition to a
classification of these techniques. Our contribution is to highlight the
research methodologies in the literature and allow for comparing them
using different characteristics. The proposed taxonomy covers elementary data
mining techniques such as preprocessing, outlier detection, clustering, and
classification of data. In addition, it covers centralized, distributed,
offline, and online techniques from the literature.
The MATLAB Toolbox SciXMiner: User's Manual and Programmer's Guide
The Matlab toolbox SciXMiner is designed for the visualization and analysis
of time series and features, with a special focus on classification problems. It
was developed at the Institute of Applied Computer Science of the Karlsruhe
Institute of Technology (KIT), a member of the Helmholtz Association of German
Research Centres in Germany. The aim was to provide an open platform for the
development and improvement of data mining methods and their applications to
various medical and technical problems. SciXMiner is based on Matlab (tested with
version 2017a). Many functions do not require additional standard toolboxes,
but some parts of the Signal, Statistics and Wavelet toolboxes are used for special
cases. The decision for a Matlab-based solution was made in order to use the wide
mathematical functionality of this package provided by The Mathworks Inc.
SciXMiner is controlled by a graphical user interface (GUI) with menu items and
control elements like popup lists, checkboxes and edit elements. This makes it
easier for inexperienced users to work with SciXMiner. Furthermore,
automation and batch standardization of analyses are possible using macros.
The standard Matlab style using the command line is also available. SciXMiner
is open source software. The download page is
http://sourceforge.net/projects/SciXMiner. It is licensed under the conditions
of the GNU General Public License (GNU-GPL) of The Free Software Foundation.