
    A Non-Deterministic and Computationally Viable Method for Detecting Outliers in Large Datasets

    This paper presents an outlier detection method based on a Variable Precision Rough Set Model (VPRSM). This method generalizes the standard set inclusion relation, which is the foundation of the Rough Sets Basic Model (RSBM). The main contribution of this research is an improvement in the quality of detection, because this generalization allows us to classify objects when there is some degree of uncertainty. From the proposed method, a computationally viable algorithm for large volumes of data is also derived. The experiments performed in a real scenario, and a comparison of the results with the RSBM-based method, demonstrate the efficiency of both the method and the algorithm in diverse contexts that involve large volumes of data. This work has been supported by grant TIN2016-78103-C2-2-R and by University of Alicante projects GRE14-02 and Smart University.
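    The β-relaxed inclusion relation at the heart of VPRSM can be sketched in a few lines. The snippet below is an illustrative toy, not the authors' algorithm: it computes the inclusion degree of each equivalence class (block) in a concept set and assigns the block to the positive, boundary, or negative region, with `beta` controlling the tolerated misclassification; the names `inclusion_degree` and `vprsm_regions` are assumptions for this sketch. Objects falling outside the positive region of every well-supported concept would be outlier candidates.

    ```python
    def inclusion_degree(block, concept):
        """Fraction of an equivalence class (block) contained in the concept set."""
        if not block:
            return 0.0
        return len(block & concept) / len(block)

    def vprsm_regions(blocks, concept, beta=0.8):
        """Classify blocks with precision threshold beta (0.5 < beta <= 1.0).

        beta = 1.0 recovers the standard rough-set inclusion of the RSBM;
        lower beta tolerates partial inclusion, i.e. some uncertainty.
        """
        positive, boundary, negative = [], [], []
        for block in blocks:
            d = inclusion_degree(block, concept)
            if d >= beta:
                positive.append(block)      # block (almost) fully inside concept
            elif d > 1 - beta:
                boundary.append(block)      # uncertain membership
            else:
                negative.append(block)      # block (almost) fully outside
        return positive, boundary, negative
    ```

    For example, with `beta=0.7`, a block three-quarters covered by the concept lands in the positive region, while one only a third covered stays in the boundary.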

    Unsupervised quantum machine learning for fraud detection

    We develop quantum protocols for anomaly detection and apply them to the task of credit card fraud detection (FD). First, we establish classical benchmarks based on supervised and unsupervised machine learning methods, where average precision is chosen as a robust metric for detecting anomalous data. We focus on kernel-based approaches for ease of direct comparison, basing our unsupervised modelling on one-class support vector machines (OC-SVM). Next, we employ quantum kernels of different types for performing anomaly detection, and observe that quantum FD can challenge equivalent classical protocols as the number of features (equal to the number of qubits used for data embedding) increases. Performing simulations with registers of up to 20 qubits, we find that quantum kernels with re-uploading demonstrate better average precision, with the advantage increasing with system size. Specifically, at 20 qubits we reach a quantum-classical separation in average precision of 15%. We discuss the prospects of fraud detection with near- and mid-term quantum hardware, and describe possible future improvements.
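    As a simplified illustration of the kernel-based one-class idea underlying the classical baseline (not the paper's quantum kernels or its exact OC-SVM formulation), the sketch below scores test points by their mean kernel similarity to a reference set of normal transactions: low similarity suggests an anomaly. The function names and the RBF kernel choice are assumptions for this sketch; swapping in a quantum fidelity kernel would change only the `kernel` argument.

    ```python
    import numpy as np

    def rbf_kernel(X, Y, gamma=1.0):
        """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)

    def anomaly_scores(train, test, kernel=rbf_kernel):
        """Mean kernel similarity of each test point to the training set.

        A crude stand-in for an OC-SVM decision function: points far from
        the bulk of normal data receive scores near zero.
        """
        K = kernel(test, train)
        return K.mean(axis=1)
    ```

    A point sitting inside the training cluster scores near 1, while a distant (fraud-like) point scores near 0.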

    Contextual Anomaly Detection Framework for Big Sensor Data

    Performing predictive modelling, such as anomaly detection, on Big Data is a difficult task. This problem is compounded as more and more sources of Big Data are generated by environmental sensors, logging applications, and the Internet of Things. Further, most current techniques for anomaly detection consider only the content of the data source, i.e. the data itself, without concern for its context. As data becomes more complex, it is increasingly important to bias anomaly detection techniques towards the context, whether spatial, temporal, or semantic. The work proposed in this thesis outlines a contextual anomaly detection framework for use in Big sensor Data systems. The framework uses a well-defined content anomaly detection algorithm for real-time point anomaly detection. Additionally, we present a post-processing context-aware anomaly detection algorithm based on sensor profiles, which are groups of contextually similar sensors generated by a multivariate clustering algorithm. The contextual anomaly detection framework is evaluated on two Big sensor Data sets: one from electrical sensors, and another from temperature sensors within a building.
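    The sensor-profile idea can be sketched concisely. Assuming cluster labels have already been produced by some multivariate clustering step (the thesis's own algorithm is not reproduced here), the toy below builds a mean/standard-deviation profile per cluster of contextually similar sensors and flags a reading as contextually anomalous when it deviates from its cluster's profile by more than `k` standard deviations; all names and the z-score rule are assumptions for this sketch.

    ```python
    import numpy as np

    def build_profiles(readings, labels):
        """Per-cluster (mean, std) profile over contextually similar sensors."""
        labels = np.asarray(labels)
        return {c: (readings[labels == c].mean(), readings[labels == c].std())
                for c in set(labels.tolist())}

    def contextual_anomaly(value, cluster, profiles, k=3.0):
        """Flag a reading that strays more than k sigmas from its profile."""
        mu, sigma = profiles[cluster]
        return abs(value - mu) > k * sigma
    ```

    A reading that is unremarkable globally can still be flagged here if it is unusual for its own group of sensors, which is the point of conditioning on context.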

    Outlier Detection In Bayesian Neural Networks

    Exploring different ways of describing uncertainty in neural networks is of great interest. Artificial intelligence models can be used with greater confidence by having solid methods for identifying and quantifying uncertainty. This is especially important in high-risk areas such as medical applications, autonomous vehicles, and financial systems. This thesis explores how to detect classification outliers in Bayesian Neural Networks. A few methods exist for quantifying uncertainty in Bayesian Neural Networks, such as computing the entropy of the prediction vector. Is there a more accurate and broadly applicable way of detecting classification outliers in Bayesian Neural Networks? If a sample is detected as an outlier, is there a way of distinguishing between different types of outliers? We try to answer these questions by using the pre-activation neuron values of a Bayesian Neural Network. We compare, in total, three different methods using simulated data, the Breast Cancer Wisconsin dataset, and the MNIST dataset. The first method uses the well-researched predictive entropy, which acts as a baseline method. The second method uses the pre-activation neuron values in the output layer of a Bayesian Neural Network; this is done by comparing the pre-activation neuron value from a given data sample with the pre-activation neuron values from the training data. Lastly, the third method is a combination of the first two. The results show that the performance may depend on the dataset type. The proposed method outperforms the baseline method on the simulated data. When using the Breast Cancer Wisconsin dataset, we see that the proposed method is significantly better than the baseline. Interestingly, we observe that with the MNIST dataset, the baseline model outperforms the proposed method in most scenarios. Common to all three datasets is that the combination of the two methods performs approximately as well as the better of the two.
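    The baseline method, predictive entropy, has a standard form: average the class-probability vectors over Monte Carlo samples from the posterior, then take the entropy of that mean distribution. The helper below is a minimal sketch of this baseline only (the thesis's pre-activation method is not reproduced); the function name is an assumption.

    ```python
    import math

    def predictive_entropy(prob_samples):
        """Entropy of the mean predictive distribution.

        prob_samples: list of per-sample class-probability vectors, one per
        Monte Carlo draw from the (approximate) posterior. Higher entropy
        means higher predictive uncertainty, i.e. a likelier outlier.
        """
        n_draws = len(prob_samples)
        n_classes = len(prob_samples[0])
        mean_p = [sum(draw[c] for draw in prob_samples) / n_draws
                  for c in range(n_classes)]
        return -sum(p * math.log(p) for p in mean_p if p > 0)
    ```

    A maximally uncertain two-class prediction yields entropy log 2, while confident predictions approach zero, so thresholding this value gives a simple outlier detector.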

    Surrogate regression modelling for fast seismogram generation and detection of microseismic events in heterogeneous velocity models

    This is the author accepted manuscript. The final version is available from Oxford University Press (OUP) via the DOI in this record. Given a 3D heterogeneous velocity model with a few million voxels, fast generation of accurate seismic responses at specified receiver positions from known microseismic event locations is a well-known challenge in geophysics, since it typically involves numerical solution of the computationally expensive elastic wave equation. Thousands of such forward simulations are often a routine requirement for parameter estimation of microseismic events via a suitable source inversion process. Parameter estimation based on forward modelling is often advantageous over a direct regression-based inversion approach when there is an unknown number of parameters to be estimated and the seismic data has complicated noise characteristics that may not always allow a stable and unique solution in a direct inversion process. In this paper, starting from Graphics Processing Unit (GPU) based synthetic simulations of a few thousand forward seismic shots due to microseismic events, obtained via a pseudo-spectral solution of the elastic wave equation, we develop a step-by-step process to generate a surrogate regression modelling framework, using machine learning techniques, that can produce accurate seismograms at specified receiver locations. The trained surrogate models can then be used as a high-speed meta-model/emulator or proxy for the original full elastic wave propagator to generate seismic responses for other microseismic event locations as well. The accuracies of the surrogate models have been evaluated using two independent sets of training and testing Latin hypercube (LH) quasi-random samples drawn from a heterogeneous marine velocity model. The predicted seismograms have been used thereafter to calculate batch likelihood functions with specified noise characteristics. Finally, the trained models on 23 receivers placed at the sea-bed in a marine velocity model are used to determine the maximum likelihood estimate (MLE) of the event locations, which can in future be used in a Bayesian analysis for microseismic event detection. This work has been supported by Shell Projects and Technology. The Wilkes high-performance GPU computing service at the University of Cambridge has been used in this work.
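    The final likelihood-and-MLE step admits a compact sketch. Assuming i.i.d. Gaussian observation noise of known standard deviation (one common choice; the paper's exact noise model is not reproduced here), the log-likelihood of an observed seismogram under a surrogate prediction is the usual Gaussian form, and the MLE over a discrete grid of candidate event locations is just the argmax. All names below are assumptions for this illustration.

    ```python
    import numpy as np

    def gaussian_log_likelihood(observed, predicted, sigma):
        """Log-likelihood of an observed trace under a surrogate prediction,
        assuming i.i.d. zero-mean Gaussian noise with std sigma."""
        r = observed - predicted
        n = r.size
        return -0.5 * (r @ r) / sigma**2 - n * np.log(sigma * np.sqrt(2 * np.pi))

    def mle_event(observed, candidate_predictions, sigma=0.1):
        """Index of the candidate event location maximizing the likelihood.

        candidate_predictions: surrogate-generated seismograms, one per
        candidate source location on the search grid.
        """
        lls = [gaussian_log_likelihood(observed, p, sigma)
               for p in candidate_predictions]
        return int(np.argmax(lls))
    ```

    Because the surrogate replaces the full elastic-wave solver, evaluating thousands of such candidate likelihoods, as a source inversion or a later Bayesian analysis requires, becomes cheap.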