10,774 research outputs found

    Exploring Symmetry of Binary Classification Performance Metrics

    Selecting the proper performance metric constitutes a key issue for most classification problems in the field of machine learning. Although the specialized literature has addressed several topics regarding these metrics, their symmetries have yet to be systematically studied. This research focuses on ten metrics based on a binary confusion matrix and their symmetric behaviour is formally defined under all types of transformations. Through simulated experiments, which cover the full range of datasets and classification results, the symmetric behaviour of these metrics is explored by exposing them to hundreds of simple or combined symmetric transformations. Cross-symmetries among the metrics and statistical symmetries are also explored. The results obtained show that, in all cases, three and only three types of symmetries arise: labelling inversion (between positive and negative classes); scoring inversion (concerning good and bad classifiers); and the combination of these two inversions. Additionally, certain metrics have been shown to be independent of the imbalance in the dataset and two cross-symmetries have been identified. The results regarding their symmetries reveal a deeper insight into the behaviour of various performance metrics and offer an indicator to properly interpret their values and a guide for their selection for certain specific applications. University of Seville (Spain) by Telefónica Chair “Intelligence in Networks
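
    As a rough illustration of what labelling inversion means for a binary confusion matrix, the sketch below swaps the positive and negative classes and compares a few standard metrics before and after. The `Confusion` class, the four metrics chosen and the example counts are assumptions made for illustration; they are not the paper's ten metrics or its experimental procedure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix counts (hypothetical helper type)."""
    tp: int
    fn: int
    fp: int
    tn: int

    def invert_labels(self) -> "Confusion":
        # Renaming positive <-> negative swaps TP with TN and FP with FN.
        return Confusion(tp=self.tn, fn=self.fp, fp=self.fn, tn=self.tp)


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp)


def sensitivity(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    return c.tn / (c.tn + c.fp)


# Made-up counts, chosen only to make the comparison visible.
original = Confusion(tp=40, fn=10, fp=5, tn=45)
inverted = original.invert_labels()

# Accuracy is unchanged by the relabelling; precision is not.
print(math.isclose(accuracy(original), accuracy(inverted)))
print(math.isclose(precision(original), precision(inverted)))
# Sensitivity of the original equals specificity of the relabelled matrix.
print(math.isclose(sensitivity(original), specificity(inverted)))
```

    In this toy case accuracy behaves symmetrically under the relabelling, precision does not, and sensitivity maps onto specificity, which is the kind of relationship between two different metrics that the abstract calls a cross-symmetry.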

    Further thoughts on precision

    Background: There has been much discussion amongst automated software defect prediction researchers regarding use of the precision and false positive rate classifier performance metrics. Aim: To demonstrate and explain why failing to report precision when using data with highly imbalanced class distributions may provide an overly optimistic view of classifier performance. Method: Well documented examples of how dependent class distribution affects the suitability of performance measures. Conclusions: When using data where the minority class represents less than around 5 to 10 percent of data points in total, failing to report precision may be a critical mistake. Furthermore, deriving the precision values omitted from studies can reveal valuable insight into true classifier performance. Peer reviewed. Final Accepted Version
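
    The dependence of precision on class distribution can be seen with a small calculation. The sketch below derives precision from recall, false positive rate and minority-class prevalence using the standard confusion-matrix definitions; the function name and the example rates are assumptions chosen for illustration, not figures from the paper.

```python
def precision_from_rates(prevalence: float, recall: float, fpr: float) -> float:
    """Precision implied by recall and false positive rate at a given prevalence.

    prevalence: fraction of data points in the positive (minority) class.
    recall:     TP / (TP + FN).
    fpr:        FP / (FP + TN).
    """
    true_positives = prevalence * recall
    false_positives = (1.0 - prevalence) * fpr
    return true_positives / (true_positives + false_positives)


# Same hypothetical classifier (recall 0.8, false positive rate 0.1)
# evaluated on a balanced data set and on one with a 2% minority class.
balanced = precision_from_rates(prevalence=0.50, recall=0.8, fpr=0.1)
imbalanced = precision_from_rates(prevalence=0.02, recall=0.8, fpr=0.1)
print(f"balanced:   {balanced:.3f}")
print(f"imbalanced: {imbalanced:.3f}")
```

    With identical recall and false positive rate, precision falls from roughly 0.89 on the balanced data to roughly 0.14 at 2% prevalence, which is why reporting only the false positive rate can look optimistic on imbalanced data.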

    Optimal Representation of Anuran Call Spectrum in Environmental Monitoring Systems Using Wireless Sensor Networks

    The analysis and classification of the sounds produced by certain animal species, notably anurans, have revealed these amphibians to be a potentially strong indicator of temperature fluctuations and therefore of the existence of climate change. Environmental monitoring systems using Wireless Sensor Networks are therefore of interest to obtain indicators of global warming. For the automatic classification of the sounds recorded on such systems, the proper representation of the sound spectrum is essential since it contains the information required for cataloguing anuran calls. The present paper focuses on this process of feature extraction by exploring three alternatives: the standardized MPEG-7, the Filter Bank Energy (FBE), and the Mel Frequency Cepstral Coefficients (MFCC). Moreover, various values for every option in the extraction of spectrum features have been considered. Throughout the paper, it is shown that representing the frame spectrum with pure FBE offers slightly worse results than using the MPEG-7 features. This performance can easily be increased, however, by rescaling the FBE in a double dimension: vertically, by taking the logarithm of the energies; and, horizontally, by applying mel scaling in the filter banks. On the other hand, representing the spectrum in the cepstral domain, as in MFCC, has shown additional marginal improvements in classification performance. University of Seville: Telefónica Chair "Intelligence Networks
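
    The two rescalings mentioned above (logarithm of the energies, mel spacing of the filter banks) can be sketched as follows. The code uses one widely used mel-scale formula, mel = 2595 log10(1 + f/700); the abstract does not say which variant the paper uses, and the function names, frequency range and filter count are assumptions for illustration.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    # A common mel-scale formula; assumed here, not taken from the paper.
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_centre_frequencies(f_min: float, f_max: float, n_filters: int) -> list[float]:
    """Centre frequencies (Hz) of filters spaced evenly on the mel axis."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


def log_energies(energies: list[float], floor: float = 1e-10) -> list[float]:
    """'Vertical' rescaling: natural log of each filter-bank energy."""
    # The floor avoids log(0) for silent bands.
    return [math.log(max(e, floor)) for e in energies]


centres = mel_centre_frequencies(f_min=0.0, f_max=8000.0, n_filters=8)
print([round(c) for c in centres])
print(log_energies([1.0, 0.5, 0.0]))
```

    Filters spaced evenly in mel are packed more densely at low frequencies and spread out at high frequencies, which is the "horizontal" rescaling; the logarithm compresses the dynamic range of the energies.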

    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier: classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods. Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
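
    The basic idea described above, finding the nearest neighbours of a query and letting them vote, fits in a few lines. This is a minimal brute-force sketch assuming Euclidean distance and a simple majority vote; it is not the code from the paper's Appendix, and the function name and toy data are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Brute force: rank every training example by Euclidean distance to the query.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy two-class data set (made up).
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_x, train_y, query=(0.5, 0.5)))
print(knn_predict(train_x, train_y, query=(5.5, 5.5)))
```

    Every query scans the full training set, so the cost grows linearly with the number of stored examples; that is the run-time issue the abstract refers to and the motivation for the retrieval speed-up techniques it mentions.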

    Construction of embedded fMRI resting state functional connectivity networks using manifold learning

    We construct embedded functional connectivity networks (FCN) from benchmark resting-state functional magnetic resonance imaging (rsfMRI) data acquired from patients with schizophrenia and healthy controls based on linear and nonlinear manifold learning algorithms, namely, Multidimensional Scaling (MDS), Isometric Feature Mapping (ISOMAP) and Diffusion Maps. Furthermore, based on key global graph-theoretical properties of the embedded FCN, we compare their classification potential using machine learning techniques. We also assess the performance of two metrics that are widely used for the construction of FCN from fMRI, namely the Euclidean distance and the lagged cross-correlation metric. We show that the FCN constructed with Diffusion Maps and the lagged cross-correlation metric outperform the other combinations

    Learning to rank in person re-identification with metric ensembles

    We propose an effective structured learning based approach to the problem of person re-identification which outperforms the current state-of-the-art on most benchmark data sets evaluated. Our framework is built on the basis of multiple low-level hand-crafted and high-level visual features. We then formulate two optimization algorithms, which directly optimize evaluation measures commonly used in person re-identification, also known as the Cumulative Matching Characteristic (CMC) curve. Our new approach is practical for many real-world surveillance applications as the re-identification performance can be concentrated in the range of most practical importance. The combination of these factors leads to a person re-identification system which outperforms most existing algorithms. More importantly, we advance state-of-the-art results on person re-identification by improving the rank-1 recognition rates from 40% to 50% on the iLIDS benchmark, 16% to 18% on the PRID2011 benchmark, 43% to 46% on the VIPeR benchmark, 34% to 53% on the CUHK01 benchmark and 21% to 62% on the CUHK03 benchmark. Comment: 10 pages
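
    For readers unfamiliar with the CMC curve and the rank-1 rate quoted above, the sketch below computes a CMC curve from a query-by-gallery distance matrix. It assumes the single-shot setting in which every query identity appears exactly once in the gallery; the function name and the toy distances are invented for illustration and say nothing about the paper's optimization algorithms.

```python
def cmc_curve(dist, query_ids, gallery_ids):
    """Cumulative Matching Characteristic curve.

    dist[q][g] is the distance between query q and gallery item g.
    Returns a list whose entry r-1 is the fraction of queries whose
    true match appears within the top r gallery items (rank-r rate).
    Assumes each query identity occurs exactly once in the gallery.
    """
    n_gallery = len(gallery_ids)
    hits_at_rank = [0] * n_gallery
    for q, row in enumerate(dist):
        # Gallery indices sorted from most to least similar.
        order = sorted(range(n_gallery), key=lambda g: row[g])
        match_rank = next(
            r for r, g in enumerate(order) if gallery_ids[g] == query_ids[q]
        )
        hits_at_rank[match_rank] += 1

    curve, running = [], 0
    for hits in hits_at_rank:
        running += hits
        curve.append(running / len(query_ids))
    return curve


# Toy example: 3 queries against a 3-item gallery (distances made up).
dist = [
    [0.1, 0.9, 0.5],  # query id 1: true match ranked first
    [0.8, 0.7, 0.2],  # query id 2: true match ranked second
    [0.3, 0.6, 0.4],  # query id 3: true match ranked second
]
curve = cmc_curve(dist, query_ids=[1, 2, 3], gallery_ids=[1, 2, 3])
print(curve)
```

    The first entry of the curve is the rank-1 recognition rate; in this toy case one of three queries is matched at rank 1 and all three within rank 2.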

    Sensing Methods for Two-Target and Four-Target Detection in Time-Constrained Vector Poisson and Gaussian Channels

    In this dissertation we consider a sensor scheduling or resource management problem for vector Poisson and Gaussian channels. The input is a binary random vector and the output is a set of conditionally independent Poisson or Gaussian random variables. The objective is to design a scaling matrix, a linear transformation whose purpose is to entangle the different inputs, under a given total energy/time constraint. Two metrics are adopted to quantify the performance of the designed scaling matrix: mutual information and Bayesian inference. In other words, it is an experimental design problem where the objective is to glean information about the binary inputs and to classify the input random vector, which is transmitted through a vector Poisson or Gaussian channel, from the output observations within a fixed time resource. No optimal solution is claimed in this dissertation for either the Poisson or the Gaussian channel, from either of the two perspectives (mutual information or Bayes detection). However, time-symmetry does exist in the problem. Computational results further suggest that, under the mutual information criterion, the problem is concave in its domain (i.e. the sensing times). If this concavity holds, then together with the time-symmetry result it can be deduced that the optimal solution is also symmetric, which would reduce the exponentially growing dimensionality of the search space to a linear one (with respect to the dimension of the input random vector). However, the objective function is not concave in the Bayes framework. Further, the classification criterion and the mutual information criterion do not generally lead to the same solution in the two channels when subjected to the same fixed time constraint and model parameters.
It is also noted that the combinatorial explosion that occurs when addressing the problem computationally is inevitable, even when exploiting the inherent time-symmetry and the concavity of the objective. This curse of dimensionality is the main obstacle to exploring the problem for more than four targets (i.e. for an input vector of dimension greater than 4).

    Identifying networks with common organizational principles

    Many complex systems can be represented as networks, and the problem of network comparison is becoming increasingly relevant. There are many techniques for network comparison, from simply comparing network summary statistics to sophisticated but computationally costly alignment-based approaches. Yet it remains challenging to accurately cluster networks that are of a different size and density, but hypothesized to be structurally similar. In this paper, we address this problem by introducing a new network comparison methodology that is aimed at identifying common organizational principles in networks. The methodology is simple, intuitive and applicable in a wide variety of settings ranging from the functional classification of proteins to tracking the evolution of a world trade network. Comment: 26 pages, 7 figures