
    A Non-Deterministic and Computationally Viable Method for Detecting Outliers in Large Datasets

    This paper presents an outlier detection method based on a Variable Precision Rough Set Model (VPRSM). This method generalizes the standard set inclusion relation, which is the foundation of the Rough Sets Basic Model (RSBM). The main contribution of this research is an improvement in the quality of detection, because this generalization allows us to classify objects when there is some degree of uncertainty. From the proposed method, a computationally viable algorithm for large volumes of data is also derived. The experiments performed in a real scenario, and a comparison of the results with the RSBM-based method, demonstrate the efficiency of both the method and the algorithm in diverse contexts that involve large volumes of data. This work has been supported by grant TIN2016-78103-C2-2-R and by University of Alicante projects GRE14-02 and Smart University.
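    The β-relaxed inclusion relation at the heart of VPRSM can be sketched in a few lines. The snippet below is an illustrative toy, not the authors' algorithm: it computes the inclusion degree of each equivalence class (block) in a concept set and assigns the block to the positive, boundary, or negative region, with `beta` controlling the tolerated misclassification; the names `inclusion_degree` and `vprsm_regions` are assumptions for this sketch. Objects falling outside the positive region of every well-supported concept would be outlier candidates.

    ```python
    def inclusion_degree(block, concept):
        """Fraction of an equivalence class (block) contained in the concept set."""
        if not block:
            return 0.0
        return len(block & concept) / len(block)

    def vprsm_regions(blocks, concept, beta=0.8):
        """Classify blocks with precision threshold beta (0.5 < beta <= 1.0).

        beta = 1.0 recovers the standard rough-set inclusion of the RSBM;
        lower beta tolerates partial inclusion, i.e. some uncertainty.
        """
        positive, boundary, negative = [], [], []
        for block in blocks:
            d = inclusion_degree(block, concept)
            if d >= beta:
                positive.append(block)      # block (almost) fully inside concept
            elif d > 1 - beta:
                boundary.append(block)      # uncertain membership
            else:
                negative.append(block)      # block (almost) fully outside
        return positive, boundary, negative
    ```

    For example, with `beta=0.7`, a block three-quarters covered by the concept lands in the positive region, while one only a third covered stays in the boundary.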

    Unsupervised quantum machine learning for fraud detection

    We develop quantum protocols for anomaly detection and apply them to the task of credit card fraud detection (FD). First, we establish classical benchmarks based on supervised and unsupervised machine learning methods, where average precision is chosen as a robust metric for detecting anomalous data. We focus on kernel-based approaches for ease of direct comparison, basing our unsupervised modelling on one-class support vector machines (OC-SVM). Next, we employ quantum kernels of different types for performing anomaly detection, and observe that quantum FD can challenge equivalent classical protocols as the number of features (equal to the number of qubits used for data embedding) increases. Performing simulations with registers of up to 20 qubits, we find that quantum kernels with re-uploading demonstrate better average precision, with the advantage increasing with system size. Specifically, at 20 qubits we reach a quantum-classical separation in average precision of 15%. We discuss the prospects of fraud detection with near- and mid-term quantum hardware, and describe possible future improvements.
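    As a simplified illustration of the kernel-based one-class idea underlying the classical baseline (not the paper's quantum kernels or its exact OC-SVM formulation), the sketch below scores test points by their mean kernel similarity to a reference set of normal transactions: low similarity suggests an anomaly. The function names and the RBF kernel choice are assumptions for this sketch; swapping in a quantum fidelity kernel would change only the `kernel` argument.

    ```python
    import numpy as np

    def rbf_kernel(X, Y, gamma=1.0):
        """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)

    def anomaly_scores(train, test, kernel=rbf_kernel):
        """Mean kernel similarity of each test point to the training set.

        A crude stand-in for an OC-SVM decision function: points far from
        the bulk of normal data receive scores near zero.
        """
        K = kernel(test, train)
        return K.mean(axis=1)
    ```

    A point sitting inside the training cluster scores near 1, while a distant (fraud-like) point scores near 0.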

    Contextual Anomaly Detection Framework for Big Sensor Data

    Performing predictive modelling, such as anomaly detection, on Big Data is a difficult task. This problem is compounded as more and more sources of Big Data are generated by environmental sensors, logging applications, and the Internet of Things. Further, most current techniques for anomaly detection consider only the content of the data source, i.e. the data itself, without concern for its context. As data becomes more complex, it is increasingly important to bias anomaly detection techniques towards the context, whether spatial, temporal, or semantic. The work proposed in this thesis outlines a contextual anomaly detection framework for use in Big sensor Data systems. The framework uses a well-defined content anomaly detection algorithm for real-time point anomaly detection. Additionally, we present a post-processing context-aware anomaly detection algorithm based on sensor profiles, which are groups of contextually similar sensors generated by a multivariate clustering algorithm. The contextual anomaly detection framework is evaluated on two Big sensor Data sets: one from electrical sensors, and another from temperature sensors within a building.
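    The sensor-profile idea can be sketched concisely. Assuming cluster labels have already been produced by some multivariate clustering step (the thesis's own algorithm is not reproduced here), the toy below builds a mean/standard-deviation profile per cluster of contextually similar sensors and flags a reading as contextually anomalous when it deviates from its cluster's profile by more than `k` standard deviations; all names and the z-score rule are assumptions for this sketch.

    ```python
    import numpy as np

    def build_profiles(readings, labels):
        """Per-cluster (mean, std) profile over contextually similar sensors."""
        labels = np.asarray(labels)
        return {c: (readings[labels == c].mean(), readings[labels == c].std())
                for c in set(labels.tolist())}

    def contextual_anomaly(value, cluster, profiles, k=3.0):
        """Flag a reading that strays more than k sigmas from its profile."""
        mu, sigma = profiles[cluster]
        return abs(value - mu) > k * sigma
    ```

    A reading that is unremarkable globally can still be flagged here if it is unusual for its own group of sensors, which is the point of conditioning on context.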

    Outlier Detection In Bayesian Neural Networks

    Exploring different ways of describing uncertainty in neural networks is of great interest. Artificial intelligence models can be used with greater confidence by having solid methods for identifying and quantifying uncertainty. This is especially important in high-risk areas such as medical applications, autonomous vehicles, and financial systems. This thesis explores how to detect classification outliers in Bayesian Neural Networks. A few methods exist for quantifying uncertainty in Bayesian Neural Networks, such as computing the entropy of the prediction vector. Is there a more accurate and broadly applicable way of detecting classification outliers in Bayesian Neural Networks? If a sample is detected as an outlier, is there a way of distinguishing between different types of outliers? We try to answer these questions by using the pre-activation neuron values of a Bayesian Neural Network. We compare, in total, three different methods using simulated data, the Breast Cancer Wisconsin dataset, and the MNIST dataset. The first method uses the well-researched predictive entropy, which acts as a baseline method. The second method uses the pre-activation neuron values in the output layer of a Bayesian Neural Network; this is done by comparing the pre-activation neuron value from a given data sample with the pre-activation neuron values from the training data. Lastly, the third method is a combination of the first two. The results show that the performance may depend on the dataset type. The proposed method outperforms the baseline method on the simulated data. When using the Breast Cancer Wisconsin dataset, we see that the proposed method is significantly better than the baseline. Interestingly, we observe that with the MNIST dataset, the baseline model outperforms the proposed method in most scenarios. Common to all three datasets is that the combination of the two methods performs approximately as well as the better of the two.
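    The baseline method, predictive entropy, has a standard form: average the class-probability vectors over Monte Carlo samples from the posterior, then take the entropy of that mean distribution. The helper below is a minimal sketch of this baseline only (the thesis's pre-activation method is not reproduced); the function name is an assumption.

    ```python
    import math

    def predictive_entropy(prob_samples):
        """Entropy of the mean predictive distribution.

        prob_samples: list of per-sample class-probability vectors, one per
        Monte Carlo draw from the (approximate) posterior. Higher entropy
        means higher predictive uncertainty, i.e. a likelier outlier.
        """
        n_draws = len(prob_samples)
        n_classes = len(prob_samples[0])
        mean_p = [sum(draw[c] for draw in prob_samples) / n_draws
                  for c in range(n_classes)]
        return -sum(p * math.log(p) for p in mean_p if p > 0)
    ```

    A maximally uncertain two-class prediction yields entropy log 2, while confident predictions approach zero, so thresholding this value gives a simple outlier detector.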

    Surrogate regression modelling for fast seismogram generation and detection of microseismic events in heterogeneous velocity models

    This is the author accepted manuscript. The final version is available from Oxford University Press (OUP) via the DOI in this record. Given a 3D heterogeneous velocity model with a few million voxels, fast generation of accurate seismic responses at specified receiver positions from known microseismic event locations is a well-known challenge in geophysics, since it typically involves numerical solution of the computationally expensive elastic wave equation. Thousands of such forward simulations are often a routine requirement for parameter estimation of microseismic events via a suitable source inversion process. Parameter estimation based on forward modelling is often advantageous over a direct regression-based inversion approach when there is an unknown number of parameters to be estimated and the seismic data has complicated noise characteristics that may not always allow a stable and unique solution in a direct inversion process. In this paper, starting from Graphics Processing Unit (GPU) based synthetic simulations of a few thousand forward seismic shots due to microseismic events, obtained via a pseudo-spectral solution of the elastic wave equation, we develop a step-by-step process to generate a surrogate regression modelling framework, using machine learning techniques, that can produce accurate seismograms at specified receiver locations. The trained surrogate models can then be used as a high-speed meta-model/emulator or proxy for the original full elastic wave propagator to generate seismic responses for other microseismic event locations as well. The accuracies of the surrogate models have been evaluated using two independent sets of training and testing Latin hypercube (LH) quasi-random samples drawn from a heterogeneous marine velocity model. The predicted seismograms have been used thereafter to calculate batch likelihood functions with specified noise characteristics. Finally, the trained models on 23 receivers placed at the sea-bed in a marine velocity model are used to determine the maximum likelihood estimate (MLE) of the event locations, which can in future be used in a Bayesian analysis for microseismic event detection. This work has been supported by Shell Projects and Technology. The Wilkes high-performance GPU computing service at the University of Cambridge has been used in this work.
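    The final likelihood-and-MLE step admits a compact sketch. Assuming i.i.d. Gaussian observation noise of known standard deviation (one common choice; the paper's exact noise model is not reproduced here), the log-likelihood of an observed seismogram under a surrogate prediction is the usual Gaussian form, and the MLE over a discrete grid of candidate event locations is just the argmax. All names below are assumptions for this illustration.

    ```python
    import numpy as np

    def gaussian_log_likelihood(observed, predicted, sigma):
        """Log-likelihood of an observed trace under a surrogate prediction,
        assuming i.i.d. zero-mean Gaussian noise with std sigma."""
        r = observed - predicted
        n = r.size
        return -0.5 * (r @ r) / sigma**2 - n * np.log(sigma * np.sqrt(2 * np.pi))

    def mle_event(observed, candidate_predictions, sigma=0.1):
        """Index of the candidate event location maximizing the likelihood.

        candidate_predictions: surrogate-generated seismograms, one per
        candidate source location on the search grid.
        """
        lls = [gaussian_log_likelihood(observed, p, sigma)
               for p in candidate_predictions]
        return int(np.argmax(lls))
    ```

    Because the surrogate replaces the full elastic-wave solver, evaluating thousands of such candidate likelihoods, as a source inversion or a later Bayesian analysis requires, becomes cheap.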