10 research outputs found

    Multi-source change-point detection over local observation models

    In this work, we address the problem of change-point detection (CPD) on high-dimensional, multi-source, and heterogeneous sequential data with missing values. We present a new CPD methodology based on local latent variable models and adaptive factorizations that enhances the fusion of multi-source observations with different statistical data types and addresses the problem of high dimensionality. Our motivation comes from behavioral change detection in healthcare, measured through smartphone-monitored data and Electronic Health Records. Due to the high dimension of the observations and the differences in the relevance of each information source, other works fail to obtain reliable estimates of the change-point locations. This leads to methods that are not sensitive enough when dealing with interspersed changes of different intensity within the same sequence or with partially missing components. Through the definition of local observation models (LOMs), we transfer the local CP information to homogeneous latent spaces and propose several factorizations that weight the contribution of each source to the global CPD. With the presented methods, we demonstrate a reduction in both the detection delay and the number of undetected CPs, together with robustness against the presence of missing values, on a synthetic dataset. We illustrate its application on real-world data from a smartphone-based monitoring study and add explainability regarding the degree to which each source contributes to the detection. This work has been partly supported by the Spanish government (AEI/MCI) under grants RTI2018-099655-B-100, PID2021-123182OB-I00, PID2021-125159NB-I00, and TED2021-131823B-I00, by Comunidad de Madrid under grants IND2018/TIC-9649 and IND2022/TIC-23550, by the European Union (FEDER) and the European Research Council (ERC) through the European Union's Horizon 2020 research and innovation program under Grant 714161, and by Comunidad de Madrid and FEDER through IntCARE-CM.
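    To make the terms "detection delay" and "undetected change-point" in the abstract above concrete, the following is a minimal sketch of a classical two-sided CUSUM detector on a single one-dimensional source. It is a generic baseline, not the LOM-based method the abstract describes; the function name `cusum_detect`, the parameters `drift` and `threshold`, and the convention of marking missing values with `None` are all assumptions made for illustration.

```python
def cusum_detect(xs, target_mean, drift=0.5, threshold=5.0):
    """Return the first index where a two-sided CUSUM statistic exceeds
    `threshold`, or None if no change is flagged (an undetected CP).

    Missing observations are represented by None and simply skipped
    (an assumption of this sketch, not the paper's treatment).
    """
    pos = 0.0  # accumulates evidence for an upward mean shift
    neg = 0.0  # accumulates evidence for a downward mean shift
    for i, x in enumerate(xs):
        if x is None:
            continue
        deviation = x - target_mean
        pos = max(0.0, pos + deviation - drift)
        neg = max(0.0, neg - deviation - drift)
        if pos > threshold or neg > threshold:
            return i
    return None


if __name__ == "__main__":
    true_change = 20
    sequence = [0.0] * true_change + [2.0] * 20  # mean shifts from 0 to 2
    detected = cusum_detect(sequence, target_mean=0.0)
    if detected is None:
        print("change-point not detected")
    else:
        print("detection delay:", detected - true_change)
```

    The detection delay is the gap between the true change index and the index at which the statistic first crosses the threshold; a larger `threshold` lowers false alarms at the cost of a longer delay or a missed change.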

    Designing a streaming algorithm for outlier detection in data mining: an incremental approach

    Designing an algorithm for detecting outliers over streaming data has become an important task in many common applications, arising in areas such as fraud detection, network analysis, environment monitoring and so forth. Because real-time data may arrive in the form of streams rather than batches, properties such as concept drift, temporal context, transiency, and uncertainty need to be considered. In addition, data processing needs to be incremental, scalable, and able to run with limited memory resources. These requirements create big challenges for existing outlier detection algorithms in terms of their accuracy when they are implemented in an incremental fashion, especially in the streaming environment. To address these problems, we first propose C_KDE_WR, which uses a sliding window and a kernel function to process the streaming data online, and report results demonstrating high throughput on real-time streaming data, implemented in a CUDA framework on a Graphics Processing Unit (GPU). We also present another algorithm, C_LOF, based on a very popular and effective outlier detection algorithm called Local Outlier Factor (LOF), which unfortunately works only on batched data. Using a novel incremental approach that compensates for the high complexity of LOF, we show how to implement it in a streaming context and obtain results in a timely manner. Like C_KDE_WR, C_LOF also employs a sliding window and a statistical summary to help make decisions based on the data in the current window. It also addresses all the challenges of streaming data addressed by C_KDE_WR. In addition, we report a comparative evaluation of the accuracy of C_KDE_WR against the state-of-the-art SOD_GPU using Precision, Recall and F-score metrics. Furthermore, a t-test is performed to demonstrate the significance of the improvement.
We further report the testing results of C_LOF under different parameter settings and draw ROC and PR curves, with their area under the curve (AUC) and Average Precision (AP) values calculated respectively. Experimental results show that C_LOF can overcome the masquerading problem, which often exists in outlier detection on streaming data. We provide a complexity analysis and report experimental results on the accuracy of both the C_KDE_WR and C_LOF algorithms in order to evaluate their effectiveness as well as their efficiency.
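The general idea of combining a sliding window with a kernel function, as mentioned in the abstract above, can be sketched as follows. This is a minimal single-threaded illustration on one-dimensional data, not the C_KDE_WR algorithm itself (whose details and CUDA implementation are not given here); the names `gaussian_kde_density` and `stream_outliers` and the parameters `window_size`, `bandwidth`, `threshold`, and `warmup` are assumptions chosen for the example.

```python
import math
from collections import deque


def gaussian_kde_density(x, window, bandwidth):
    """Gaussian kernel density estimate at x from the points in the window."""
    if not window:
        return 0.0
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - w) / bandwidth) ** 2) for w in window)
    return norm * total / len(window)


def stream_outliers(stream, window_size=50, bandwidth=1.0,
                    threshold=0.01, warmup=10):
    """Yield (index, value, density, is_outlier) for each arriving point.

    Each point is scored against the current window only, so memory use
    is bounded by `window_size` regardless of the stream length.
    """
    window = deque(maxlen=window_size)
    for i, x in enumerate(stream):
        density = gaussian_kde_density(x, window, bandwidth)
        # Do not flag anything until the window holds enough points.
        is_outlier = len(window) >= warmup and density < threshold
        yield i, x, density, is_outlier
        window.append(x)  # the oldest point is evicted automatically


if __name__ == "__main__":
    data = [10.0 + 0.1 * (i % 5) for i in range(40)]
    data[30] = 25.0  # injected anomaly
    flagged = [i for i, _, _, out in stream_outliers(data) if out]
    print(flagged)
```

Because old points fall out of the window, the density estimate follows gradual changes in the data, which is one simple way to accommodate concept drift; the window size trades adaptivity against the stability of the estimate.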

    Anomaly-based exploratory analysis and detection of exploits in Android mediaserver

    Smartphone platforms are becoming increasingly complex, which gives rise to software vulnerabilities that are difficult to identify and that might allow malware developers to gain unauthorised privileges through technical exploitation. However, the authors maintain that these types of attacks indirectly produce a number of unexpected behaviours in the system that can be profiled. In this work, the authors present CoME, an anomaly-based methodology aimed at detecting software exploitation in Android systems. CoME models the normal behaviour of a given software component or service and is capable of identifying any unanticipated behaviour. To this end, they first monitor the normal operation of a given exploitable component through lightweight virtual introspection. Then, they use a multivariate analysis approach to estimate the normality model and detect anomalies. They evaluate their system against one of the most critical, vulnerable and widely exploited services in Android, i.e. the mediaserver. Results show that the proposed approach can not only provide a meaningful explanation of the discriminant features of illegitimate activities, but can also be used to accurately detect malicious software exploitation at runtime.

    Error sensitivity analysis of Delta divergence - a novel measure for classifier incongruence detection

    The state of classifier incongruence in decision making systems incorporating multiple classifiers is often an indicator of anomaly caused by an unexpected observation or an unusual situation. Its assessment is important as one of the key mechanisms for domain anomaly detection. In this paper, we investigate the sensitivity of Delta divergence, a novel measure of classifier incongruence, to estimation errors. Statistical properties of Delta divergence are analysed both theoretically and experimentally. The results of the analysis provide guidelines on the selection of a threshold for classifier incongruence detection based on this measure.

    Detecting Outliers for Improving the Quality of Incident Duration Prediction

    To circumvent the need for domain expertise and extensive data when developing a knowledge-based prediction system such as the I-95 incident duration estimation model, this study has developed an efficient transferability analysis method to assess the applicability of adopting the prediction rules from an existing well-developed model for a different highway. The proposed analysis method considers the common nature of incident response operations and locally specific incident characteristics in assessing the transferability of available knowledge-based rules for estimating the required clearance duration of different types of incidents. Evaluation of the proposed method with the I-695 incident records clearly shows that the prediction model developed with such a transferring method can achieve the same level of performance as the original rule-searching and refinement method. Since most incident records for model development are collected on-line during the emergency incident response process, some of the key data are likely to be misrecorded, which inevitably causes many existing models to yield undesirable performance, especially for incidents with insufficient records or excessively long durations. As such, this study has also developed a two-phase outlier detection process for identifying outliers and removing those viewed as faulty records from the dataset used for model calibration and model evaluation. In a case study using the I-695 incident records, the proposed two-phase outlier detection process has proved promising for filtering faulty data from the incident records prior to their use in model development.

    Featured Anomaly Detection Methods and Applications

    Anomaly detection is a fundamental research topic that has been widely investigated. From critical industrial systems, e.g., network intrusion detection systems, to people’s daily activities, e.g., mobile fraud detection, anomaly detection has become a vital first resort for protecting and securing public and personal property. Although anomaly detection methods have been under consistent development over the years, the explosive growth of data volume and the continued dramatic variation of data patterns pose great challenges to anomaly detection systems and are fuelling demand for more intelligent anomaly detection methods with distinct characteristics to cope with various needs. To this end, this thesis starts by presenting a thorough review of existing anomaly detection strategies and methods. The advantages and disadvantages of the strategies and methods are elaborated. Afterward, four distinctive anomaly detection methods, especially for time series, are proposed in this work, aiming to address specific needs of anomaly detection under different scenarios, e.g., enhanced accuracy, interpretable results, and self-evolving models. Experiments are presented and analysed to offer a better understanding of the performance of the methods and their distinct features. To be more specific, the key contents of this thesis are summarised as follows: 1) Support Vector Data Description (SVDD) is investigated as a primary method to achieve accurate anomaly detection. The applicability of SVDD to noisy time series datasets is carefully examined and it is demonstrated that relaxing the decision boundary of SVDD always results in better accuracy in network time series anomaly detection. Theoretical analysis of the parameter utilised in the model is also presented to ensure the validity of the relaxation of the decision boundary.
2) To support a clear explanation of the detected time series anomalies, i.e., anomaly interpretation, the periodic pattern of time series data is considered as contextual information to be integrated into SVDD for anomaly detection. The formulation of SVDD with contextual information maintains multiple discriminants which help in distinguishing the root causes of the anomalies. 3) In an attempt to further analyse a dataset for anomaly detection and interpretation, Convex Hull Data Description (CHDD) is developed for realising one-class classification together with data clustering. CHDD approximates the convex hull of a given dataset with the extreme points, which constitute a dictionary of data representatives. Using this dictionary, CHDD is capable of representing and clustering all the normal data instances so that anomaly detection is realised with a degree of interpretation. 4) Besides better anomaly detection accuracy and interpretability, better solutions for anomaly detection over streaming data with evolving patterns are also investigated. Under the framework of Reinforcement Learning (RL), a time series anomaly detector that is consistently trained to cope with the evolving patterns is designed. Because the anomaly detector is trained with labeled time series, it avoids the cumbersome work of threshold setting and the uncertain definitions of anomalies in time series anomaly detection tasks.

    IMAGE UNDERSTANDING OF MOLAR PREGNANCY BASED ON ANOMALIES DETECTION

    Cancer occurs when normal cells grow and multiply without normal control. As the cells multiply, they form an area of abnormal cells, known as a tumour. Many tumours exhibit abnormal chromosomal segregation at cell division. These anomalies play an important role in detecting molar pregnancy cancer. Molar pregnancy, also known as hydatidiform mole, can be categorised into partial (PHM) and complete (CHM) mole, persistent gestational trophoblastic and choriocarcinoma. Hydatidiform moles are most commonly found in women under the age of 17 or over the age of 35. Hydatidiform moles can be detected by morphological and histopathological examination. Even experienced pathologists cannot easily distinguish between complete and partial hydatidiform moles. However, this distinction is important in order to recommend the appropriate treatment method. Therefore, research into molar pregnancy image analysis and understanding is critical. The hypothesis of this research project is that an anomaly detection approach to analysing molar pregnancy images can improve image analysis and classification of normal, PHM and CHM villi. The primary aim of this research project is to develop a novel method, based on anomaly detection, to identify and classify anomalous villi in stained molar pregnancy images. The novel method is developed to simulate expert pathologists’ approach to the diagnosis of anomalous villi. The knowledge and heuristics elicited from two expert pathologists are combined with the morphological domain knowledge of molar pregnancy to develop a heuristic multi-neural network architecture designed to classify the villi into their appropriate anomalous types. This study confirmed that a single feature cannot give enough discriminative power for villi classification.
Whereas expert pathologists consider size and shape before textural features, this thesis demonstrated that the textural feature has a higher discriminative power than size and shape. The first heuristic-based multi-neural network, which was based on 15 elicited features, achieved an improved average accuracy of 81.2%, compared to the traditional multi-layer perceptron (80.5%); however, the recall of the CHM villi class was still low (64.3%). Two further textural features, which were elicited and added to the second heuristic-based multi-neural network, improved the average accuracy from 81.2% to 86.1% and the recall of the CHM villi class from 64.3% to 73.5%. The precision of multi-neural network II also increased from 82.7% to 89.5% for the normal villi class, from 81.3% to 84.7% for the PHM villi class and from 80.8% to 86% for the CHM villi class. To help pathologists visualise the results of the segmentation, a software tool, Hydatidiform Mole Analysis Tool (HYMAT), was developed, compiling the morphological and pathological data for each villus analysis.
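For readers less familiar with the per-class figures quoted in the abstract above, the sketch below shows how precision, recall, and overall accuracy are conventionally derived from a multi-class confusion matrix. The function names and the confusion-matrix counts are invented purely for illustration and are not taken from the thesis; it is also an assumption that "average accuracy" there corresponds to the overall accuracy computed here.

```python
def per_class_metrics(confusion, labels):
    """Return {label: (precision, recall)}.

    confusion[i][j] is the number of samples whose true class is
    labels[i] and whose predicted class is labels[j].
    """
    n = len(labels)
    metrics = {}
    for k, label in enumerate(labels):
        true_positives = confusion[k][k]
        predicted_as_k = sum(confusion[i][k] for i in range(n))  # column sum
        actually_k = sum(confusion[k])                           # row sum
        precision = true_positives / predicted_as_k if predicted_as_k else 0.0
        recall = true_positives / actually_k if actually_k else 0.0
        metrics[label] = (precision, recall)
    return metrics


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    labels = ["normal", "PHM", "CHM"]
    # Hypothetical counts, for illustration only.
    confusion = [
        [8, 1, 1],
        [1, 7, 2],
        [0, 2, 8],
    ]
    for label, (p, r) in per_class_metrics(confusion, labels).items():
        print(f"{label}: precision={p:.3f} recall={r:.3f}")
    print(f"accuracy={overall_accuracy(confusion):.3f}")
```

A class can have high precision and low recall at the same time, which is why the abstract reports the CHM recall separately from the average accuracy.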