Exploring Symmetry of Binary Classification Performance Metrics
Selecting the proper performance metric constitutes a key issue for most classification problems in the field of machine learning. Although the specialized literature has addressed several topics regarding these metrics, their symmetries have yet to be systematically studied. This research focuses on ten metrics based on a binary confusion matrix and formally defines their symmetric behaviour under all types of transformations. Through simulated experiments, which cover the full range of datasets and classification results, the symmetric behaviour of these metrics is explored by exposing them to hundreds of simple or combined symmetric transformations. Cross-symmetries among the metrics and statistical symmetries are also explored. The results obtained show that, in all cases, three and only three types of symmetries arise: labelling inversion (between positive and negative classes); scoring inversion (concerning good and bad classifiers); and the combination of these two inversions. Additionally, certain metrics have been shown to be independent of the imbalance in the dataset, and two cross-symmetries have been identified. The results regarding their symmetries give deeper insight into the behaviour of various performance metrics and offer both an indicator for properly interpreting their values and a guide for selecting them for specific applications. University of Seville (Spain) by Telefónica Chair “Intelligence in Networks
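As a rough illustration of what labelling inversion means for confusion-matrix metrics, the sketch below swaps the positive and negative classes and compares a few metrics before and after. The four metrics, the function names and the example counts are assumptions chosen for illustration; they are not necessarily among the ten metrics studied in the paper.

```python
def metrics(tp, fn, fp, tn):
    """Return a few metrics computed from binary confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


def invert_labels(tp, fn, fp, tn):
    """Swap the roles of the positive and negative classes.

    Old true negatives become true positives, old false positives become
    false negatives, and vice versa.
    """
    return tn, fp, fn, tp


# Hypothetical counts, in the order (tp, fn, fp, tn).
original = (40, 10, 5, 45)
before = metrics(*original)
after = metrics(*invert_labels(*original))

# Accuracy is unchanged, while sensitivity and specificity trade places.
print(before["accuracy"], after["accuracy"])
print(before["sensitivity"], after["specificity"])
```

In this sketch accuracy is symmetric under labelling inversion, whereas sensitivity maps onto specificity, which is one way a relationship between two different metrics can appear.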
Further thoughts on precision
Background: There has been much discussion amongst automated software defect prediction researchers regarding use of the precision and false positive rate classifier performance metrics. Aim: To demonstrate and explain why failing to report precision when using data with highly imbalanced class distributions may provide an overly optimistic view of classifier performance. Method: Well documented examples of how dependent class distribution affects the suitability of performance measures. Conclusions: When using data where the minority class represents less than around 5 to 10 percent of data points in total, failing to report precision may be a critical mistake. Furthermore, deriving the precision values omitted from studies can reveal valuable insight into true classifier performance. Peer reviewed. Final Accepted Version.
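The effect described above can be checked with a small calculation. The sketch below derives precision from recall, false positive rate and the share of the minority (positive) class; the function name and the example rates are assumptions made for illustration, not values taken from the paper.

```python
def precision_from_rates(prevalence, recall, fpr):
    """Derive precision from recall, false positive rate and class prevalence.

    prevalence: fraction of data points in the positive (minority) class.
    recall:     true positive rate.
    fpr:        false positive rate.
    """
    true_positives = prevalence * recall
    false_positives = (1.0 - prevalence) * fpr
    return true_positives / (true_positives + false_positives)


# Same classifier rates (hypothetical), two different class distributions.
balanced = precision_from_rates(prevalence=0.50, recall=0.8, fpr=0.1)
imbalanced = precision_from_rates(prevalence=0.05, recall=0.8, fpr=0.1)

print(f"precision at 50% positives: {balanced:.3f}")
print(f"precision at  5% positives: {imbalanced:.3f}")
```

With identical recall and false positive rate, precision falls from roughly 0.89 on the balanced data to roughly 0.30 when only 5 percent of points are positive, which is why the two rates alone can look optimistic.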
Optimal Representation of Anuran Call Spectrum in Environmental Monitoring Systems Using Wireless Sensor Networks
The analysis and classification of the sounds produced by certain animal species, notably anurans, have revealed these amphibians to be a potentially strong indicator of temperature fluctuations and therefore of the existence of climate change. Environmental monitoring systems using Wireless Sensor Networks are therefore of interest to obtain indicators of global warming. For the automatic classification of the sounds recorded on such systems, the proper representation of the sound spectrum is essential since it contains the information required for cataloguing anuran calls. The present paper focuses on this process of feature extraction by exploring three alternatives: the standardized MPEG-7, the Filter Bank Energy (FBE), and the Mel Frequency Cepstral Coefficients (MFCC). Moreover, various values for every option in the extraction of spectrum features have been considered. Throughout the paper, it is shown that representing the frame spectrum with pure FBE offers slightly worse results than using the MPEG-7 features. This performance can easily be increased, however, by rescaling the FBE in a double dimension: vertically, by taking the logarithm of the energies; and, horizontally, by applying mel scaling in the filter banks. On the other hand, representing the spectrum in the cepstral domain, as in MFCC, has shown additional marginal improvements in classification performance. University of Seville: Telefónica Chair "Intelligence Networks
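The two rescalings mentioned above, logarithmic energies and mel-spaced filter banks, can be sketched as follows. The mel formula used here is one common variant, and the function names, band count and frequency range are assumptions for illustration; the paper's own parameter choices are not given in this abstract.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (one widely used variant of the scale)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Horizontal rescaling: band edges in Hz, equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


def log_energies(energies, floor=1e-10):
    """Vertical rescaling: logarithm of band energies, floored to avoid log(0)."""
    return [math.log(max(e, floor)) for e in energies]


# Hypothetical setup: 10 bands between 0 Hz and 8 kHz.
edges = mel_band_edges(0.0, 8000.0, 10)
print([round(e) for e in edges])
```

Bands spaced this way are narrow at low frequencies and widen towards high frequencies.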
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine
learning techniques is the Nearest Neighbour Classifier -- classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of particular importance because issues of poor run-time
performance are not such a problem these days with the computational power that
is available. This paper presents an overview of techniques for Nearest
Neighbour classification focusing on: mechanisms for assessing similarity
(distance), computational issues in identifying nearest neighbours and
mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods. Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
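The basic idea described in this abstract, finding the nearest neighbours of a query and letting them vote on its class, can be sketched in a few lines. This is a minimal illustration using Euclidean distance and majority voting; it is not the code from the paper's appendix, and the function name and toy data are assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples.

    train: list of (feature_tuple, label) pairs.
    query: feature tuple with the same number of dimensions.
    """
    # Sort training examples by Euclidean distance to the query, keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters.
train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]

print(knn_predict(train, (0.5, 0.5)))
print(knn_predict(train, (5.5, 5.5)))
```

Sorting the whole training set for every query is the brute-force approach; the speed-up mechanisms the paper surveys address exactly this cost.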
Construction of embedded fMRI resting state functional connectivity networks using manifold learning
We construct embedded functional connectivity networks (FCN) from benchmark
resting-state functional magnetic resonance imaging (rsfMRI) data acquired from
patients with schizophrenia and healthy controls based on linear and nonlinear
manifold learning algorithms, namely, Multidimensional Scaling (MDS), Isometric
Feature Mapping (ISOMAP) and Diffusion Maps. Furthermore, based on key global
graph-theoretical properties of the embedded FCN, we compare their
classification potential using machine learning techniques. We also assess the
performance of two metrics that are widely used for the construction of FCN
from fMRI, namely the Euclidean distance and the lagged cross-correlation
metric. We show that the FCN constructed with Diffusion Maps and the lagged
cross-correlation metric outperform the other combinations
Learning to rank in person re-identification with metric ensembles
We propose an effective structured learning based approach to the problem of
person re-identification which outperforms the current state-of-the-art on most
benchmark data sets evaluated. Our framework is built on the basis of multiple
low-level hand-crafted and high-level visual features. We then formulate two
optimization algorithms, which directly optimize evaluation measures commonly
used in person re-identification, also known as the Cumulative Matching
Characteristic (CMC) curve. Our new approach is practical for many real-world
surveillance applications as the re-identification performance can be
concentrated in the range of most practical importance. The combination of
these factors leads to a person re-identification system which outperforms most
existing algorithms. More importantly, we advance state-of-the-art results on
person re-identification by improving the rank- recognition rates from
to on the iLIDS benchmark, to on the PRID2011
benchmark, to on the VIPeR benchmark, to on the
CUHK01 benchmark and to on the CUHK03 benchmark. Comment: 10 page
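For readers unfamiliar with the Cumulative Matching Characteristic curve mentioned above, the sketch below computes it from the rank at which each query's correct match appears in the returned gallery list. It only illustrates the evaluation measure, not the paper's optimization algorithms, and the function name and example ranks are assumptions.

```python
def cmc_curve(match_ranks, max_rank):
    """Compute the CMC curve up to `max_rank`.

    match_ranks: for each query, the 1-based rank of its correct match.
    Returns a list whose entry k-1 is the fraction of queries whose correct
    match appears within the top k results.
    """
    n_queries = len(match_ranks)
    return [
        sum(rank <= k for rank in match_ranks) / n_queries
        for k in range(1, max_rank + 1)
    ]


# Hypothetical ranks for five queries.
ranks = [1, 3, 2, 1, 5]
curve = cmc_curve(ranks, max_rank=5)
print(curve)  # first entry is the rank-1 recognition rate
```

The first entry of the curve is the rank-1 recognition rate, and the curve never decreases as the rank grows.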
Sensing Methods for Two-Target and Four-Target Detection in Time-Constrained Vector Poisson and Gaussian Channels
In this dissertation we consider a sensor scheduling, or resource management, problem for vector Poisson and Gaussian channels. The input is a binary random vector and the output is a set of conditionally independent Poisson or Gaussian random variables. The objective is to design a scaling matrix, a linear transformation whose purpose is to entangle the different inputs, under a given total energy/time constraint. Two metrics are adopted to quantify the performance of the designed scaling matrix: mutual information and Bayesian inference. In other words, it is an experimental design problem in which the objective is to glean information about the binary inputs and to classify the input random vector, which is transmitted through a vector Poisson or Gaussian channel, from the output observations within a fixed time resource.
No optimal solution is claimed in this dissertation for the above problem, for either the Poisson or the Gaussian channel, from either of the two perspectives: mutual information or Bayes detection. However, time-symmetry does exist in the problem. It is further noted, based on observations in the computational results, that the problem is concave in its domain (i.e. the sensing times) under the mutual information criterion. If this concavity does hold, then together with the time-symmetry result it can be deduced that the optimal solution is symmetric too, which would reduce the exponentially growing dimensionality of the search space to a linear one (w.r.t. the dimension of the input random vector). However, the objective function in the Bayes framework is not concave.
Further, it is noted that the classification criterion in the above two channels and the mutual information criterion do not generally lead to the same solution when subjected to the same fixed time constraint and model parameters.
It is also noted that the combinatorial explosion that occurs when addressing the problem computationally is inevitable, even when exploiting the inherent time-symmetry and the concavity of the objective. This curse of dimensionality is the main obstacle to exploring the problem for more than four targets (i.e. for an input vector of dimension greater than 4).
Identifying networks with common organizational principles
Many complex systems can be represented as networks, and the problem of
network comparison is becoming increasingly relevant. There are many techniques
for network comparison, from simply comparing network summary statistics to
sophisticated but computationally costly alignment-based approaches. Yet it
remains challenging to accurately cluster networks that are of a different size
and density, but hypothesized to be structurally similar. In this paper, we
address this problem by introducing a new network comparison methodology that
is aimed at identifying common organizational principles in networks. The
methodology is simple, intuitive and applicable in a wide variety of settings
ranging from the functional classification of proteins to tracking the
evolution of a world trade network. Comment: 26 pages, 7 figure