9 research outputs found

    An ensemble learning framework for anomaly detection in building energy consumption

    During building operation, a significant amount of energy is wasted due to equipment and human-related faults. To reduce waste, today's smart buildings monitor energy usage with the aim of identifying abnormal consumption behaviour and notifying the building manager to implement appropriate energy-saving procedures. To this end, this research proposes a new pattern-based anomaly classifier, the collective contextual anomaly detection using sliding window (CCAD-SW) framework. The CCAD-SW framework identifies anomalous consumption patterns using overlapping sliding windows. To enhance the anomaly detection capacity of the CCAD-SW, this research also proposes the ensemble anomaly detection (EAD) framework. The EAD is a generic framework that combines several anomaly detection classifiers using majority voting. To ensure diversity of anomaly classifiers, the EAD is implemented by combining pattern-based (e.g., CCAD-SW) and prediction-based anomaly classifiers. The research was evaluated using real-world data provided by Powersmiths, located in Brampton, Ontario, Canada. Results show that the EAD framework improved the sensitivity of the CCAD-SW by 3.6% and reduced the false alarm rate by 2.7%.
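
The two mechanisms named in this abstract, overlapping sliding windows and majority voting, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window size and step, the readings, and the per-window votes are all hypothetical values chosen for illustration.

```python
from typing import List, Sequence


def sliding_windows(readings: Sequence[float], size: int, step: int) -> List[List[float]]:
    """Split a series into windows of `size`; a step smaller than size makes them overlap."""
    return [list(readings[i:i + size]) for i in range(0, len(readings) - size + 1, step)]


def majority_vote(votes: Sequence[bool]) -> bool:
    """Flag an anomaly when more than half of the equally weighted classifiers flag it."""
    return sum(votes) * 2 > len(votes)


# Hypothetical consumption readings, not data from the study.
readings = [10.0, 11.0, 10.5, 30.0, 31.0, 10.2]
windows = sliding_windows(readings, size=3, step=1)

# Hypothetical votes per window from three classifiers
# (for example one pattern-based and two prediction-based).
votes_per_window = [
    [False, False, False],
    [True, False, True],
    [True, True, True],
    [True, False, False],
]
flags = [majority_vote(v) for v in votes_per_window]
```

With a step of 1 and a size of 3, each reading (apart from those at the edges) falls into several windows, which is what lets a pattern-based classifier judge a run of readings rather than a single point.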

    Collective Contextual Anomaly Detection for Building Energy Consumption

    Commercial and residential buildings are responsible for a substantial portion of total global energy consumption and as a result make a significant contribution to global carbon emissions. Hence, energy-saving goals that target buildings can have a major impact in reducing environmental damage. During building operation, a significant amount of energy is wasted due to equipment and human-related faults. To reduce waste, today's smart buildings monitor energy usage with the aim of identifying abnormal consumption behaviour and notifying the building manager to implement appropriate energy-saving procedures. To this end, this research proposes the ensemble anomaly detection (EAD) framework. The EAD is a generic framework that combines several anomaly detection classifiers using majority voting. These anomaly detection classifiers are built using existing machine learning algorithms. It is assumed that each anomaly classifier has equal weight. More importantly, to ensure diversity of anomaly classifiers, the EAD is implemented by combining pattern-based and prediction-based anomaly classifiers. For this reason, this research also proposes a new pattern-based anomaly classifier, the collective contextual anomaly detection using sliding window (CCAD-SW) framework. The CCAD-SW is also a machine learning-based framework; it identifies anomalous consumption patterns using overlapping sliding windows. The EAD framework combines the CCAD-SW, which is implemented using an autoencoder, with two prediction-based anomaly classifiers that are implemented using the support vector regression and random forest machine-learning algorithms. In addition, it determines an ensemble threshold that yields an anomaly classifier with optimal anomaly detection capability and false positive minimization. Results show that the EAD performs better than the individual anomaly detection classifiers. 
In the EAD framework, the optimal ensemble anomaly classifier is not attained by combining the individual learners at their respective optimal performance levels. Instead, an ensemble threshold combination that yields the optimal anomaly classifier was identified by searching through the ensemble threshold space. The research was evaluated using real-world data provided by Powersmiths, located in Brampton, Ontario, Canada.
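
The idea of searching the ensemble threshold space, rather than fixing each learner at its own best threshold, might be sketched as an exhaustive grid search. Everything below is an assumption made for illustration: the scores, labels, and grid are toy values, and the objective (true positive rate minus false positive rate) is only one plausible way to balance detection against false positives; the abstract does not state which criterion the authors used.

```python
from itertools import product
from typing import List, Sequence, Tuple


def ensemble_flags(scores: Sequence[Sequence[float]], thresholds: Sequence[float]) -> List[bool]:
    """Majority vote: classifier k votes 'anomaly' when its score exceeds thresholds[k]."""
    n_classifiers = len(scores)
    flags = []
    for i in range(len(scores[0])):
        votes = sum(scores[k][i] > thresholds[k] for k in range(n_classifiers))
        flags.append(votes * 2 > n_classifiers)
    return flags


def rates(flags: Sequence[bool], labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("labels must contain both classes")
    tp = sum(f and l for f, l in zip(flags, labels))
    fp = sum(f and not l for f, l in zip(flags, labels))
    return tp / positives, fp / negatives


def search_thresholds(scores, labels, grid):
    """Try every threshold combination and keep the first one with the best objective."""
    best = None
    for combo in product(grid, repeat=len(scores)):
        tpr, fpr = rates(ensemble_flags(scores, combo), labels)
        objective = tpr - fpr  # assumed objective, not taken from the source
        if best is None or objective > best[0]:
            best = (objective, combo)
    return best


# Toy anomaly scores from three hypothetical classifiers over four samples.
scores = [
    [0.1, 0.9, 0.4, 0.8],
    [0.2, 0.7, 0.6, 0.9],
    [0.3, 0.8, 0.2, 0.4],
]
labels = [False, True, False, True]
best_objective, best_combo = search_thresholds(scores, labels, grid=[0.3, 0.5, 0.7])
```

The search scores each combination by how the voted ensemble performs, so the selected thresholds need not be the ones that are best for any single classifier on its own.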

    Ensemble Methods for Anomaly Detection

    Anomaly detection has many applications in numerous areas such as intrusion detection, fraud detection, and medical diagnosis. Most current techniques are specialized for detecting one type of anomaly, and work well on specific domains and when the data satisfies specific assumptions. We address this problem, proposing ensemble anomaly detection techniques that perform well in many applications, with four major contributions: using bootstrapping to better detect anomalies on multiple subsamples, sequential application of diverse detection algorithms, a novel adaptive sampling and learning algorithm in which the anomalies are iteratively examined, and improving the random forest algorithms for detecting anomalies in streaming data. We design and evaluate multiple ensemble strategies using score normalization, rank aggregation and majority voting, to combine the results from six well-known base algorithms. We propose a bootstrapping algorithm in which anomalies are evaluated from multiple subsets of the data. Results show that our independent ensemble performs better than the base algorithms, and using bootstrapping achieves competitive quality and faster runtime compared with existing works. We develop new sequential ensemble algorithms in which the second algorithm performs anomaly detection based on the first algorithm's outputs; best results are obtained by combining algorithms that are substantially different. We propose a novel adaptive sampling algorithm which uses the score output of the base algorithm to determine the hard-to-detect examples, and iteratively resamples more points from such examples in a completely unsupervised context. On streaming datasets, we analyze the impact of parameters used in random trees, and propose new algorithms that work well with high-dimensional data, improving performance without increasing the number of trees or their heights. We show that further improvements can be obtained with an Evolutionary Algorithm.
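
Two of the combination strategies mentioned in this abstract, score normalization and rank aggregation, can be sketched as follows. The sketch assumes min-max scaling and simple averaging, which are common choices but are not confirmed by the abstract, and it uses two hypothetical detectors with made-up scores rather than the six base algorithms of the thesis.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so detectors with different scales are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def ranks(scores: Sequence[float]) -> List[int]:
    """Rank 1 is the most anomalous (highest score); ties are broken by position."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def average_normalized(score_lists: Sequence[Sequence[float]]) -> List[float]:
    """Score normalization: average the min-max scaled scores per sample."""
    normalized = [min_max(s) for s in score_lists]
    return [sum(col) / len(score_lists) for col in zip(*normalized)]


def average_rank(score_lists: Sequence[Sequence[float]]) -> List[float]:
    """Rank aggregation: average each sample's rank across detectors."""
    ranked = [ranks(s) for s in score_lists]
    return [sum(col) / len(score_lists) for col in zip(*ranked)]


# Hypothetical raw scores on very different scales.
detector_a = [0.1, 0.2, 5.0, 0.3]
detector_b = [10.0, 12.0, 90.0, 11.0]

combined_scores = average_normalized([detector_a, detector_b])
combined_ranks = average_rank([detector_a, detector_b])
```

Rank aggregation discards score magnitudes entirely, which makes it robust to one detector producing extreme values, while normalized averaging keeps some information about how far a point stands out.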

    EDMON - Electronic Disease Surveillance and Monitoring Network: A Personalized Health Model-based Digital Infectious Disease Detection Mechanism using Self-Recorded Data from People with Type 1 Diabetes

    Through time, we as a society have been tested with infectious disease outbreaks of different magnitude, which often pose major public health challenges. To mitigate the challenges, research endeavors have been focused on early detection mechanisms through identifying potential data sources, mode of data collection and transmission, and case and outbreak detection methods. Driven by the ubiquitous nature of smartphones and wearables, the current endeavor is targeted towards individualizing the surveillance effort through a personalized health model, where the case detection is realized by exploiting self-collected physiological data from wearables and smartphones. This dissertation aims to demonstrate the concept of a personalized health model as a case detector for outbreak detection by utilizing self-recorded data from people with type 1 diabetes. The results have shown that infection onset triggers substantial deviations, i.e. prolonged hyperglycemia despite higher insulin injections and lower carbohydrate consumption. Per the findings, key parameters such as blood glucose level, insulin, carbohydrate, and insulin-to-carbohydrate ratio are found to carry high discriminative power. A personalized health model devised based on a one-class classifier and unsupervised method using selected parameters achieved promising detection performance. Experimental results show the superior performance of the one-class classifier, and models such as the one-class support vector machine, k-nearest neighbor, and k-means achieved better performance. Further, the results also revealed the effect of input parameters, data granularity, and sample sizes on model performance. The presented results have practical significance for understanding the effect of infection episodes amongst people with type 1 diabetes, and the potential of a personalized health model in outbreak detection settings. 
The added benefit of the personalized health model concept introduced in this dissertation lies in its usefulness beyond the surveillance purpose, i.e. to devise decision support tools and learning platforms for the patient to manage infection-induced crises.
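
A one-class detector of the k-nearest-neighbor kind mentioned above can be sketched in a few lines: it is fitted only on a person's normal records and flags a new record whose distance to its k-th nearest normal record is unusually large. This is an illustrative sketch, not the dissertation's model. The feature pairs (blood glucose, insulin-to-carbohydrate ratio) are invented toy values, the threshold rule is an assumption, and real features would need scaling before distances are meaningful.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kth_neighbor_distance(point: Point, train: Sequence[Point], k: int) -> float:
    """Distance from `point` to its k-th nearest neighbor in `train`."""
    distances = sorted(math.dist(point, t) for t in train)
    return distances[k - 1]


def fit_threshold(train: List[Point], k: int) -> float:
    """Assumed rule: use the largest leave-one-out k-th neighbor distance seen in normal data."""
    return max(
        kth_neighbor_distance(p, train[:i] + train[i + 1:], k)
        for i, p in enumerate(train)
    )


def is_anomalous(point: Point, train: List[Point], k: int, threshold: float) -> bool:
    """Flag a record that lies farther from normal data than any normal record does."""
    return kth_neighbor_distance(point, train, k) > threshold


# Invented toy records for one person: (blood glucose, insulin-to-carbohydrate ratio).
normal_days = [(5.5, 1.0), (6.0, 1.1), (5.8, 0.9), (6.2, 1.0), (5.6, 1.05)]
K = 2
threshold = fit_threshold(normal_days, K)

suspect = is_anomalous((12.0, 2.0), normal_days, K, threshold)
typical = is_anomalous((5.9, 1.0), normal_days, K, threshold)
```

Because the detector is fitted per person and only on that person's normal data, it needs no labelled infection episodes, which matches the one-class and unsupervised setting the abstract describes.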

    An Ensemble Self-Structuring Neural Network Approach to Solving Classification Problems with Virtual Concept Drift and its Application to Phishing Websites

    Classification in data mining is one of the well-known tasks that aim to construct a classification model from a labelled input data set. Most classification models are devoted to a static environment where the complete training data set is presented to the classification algorithm. This data set is assumed to cover all information needed to learn the pertinent concepts (rules and patterns) related to how to classify unseen examples to predefined classes. However, in dynamic (non-stationary) domains, the set of features (input data attributes) may change over time. For instance, some features that are considered significant at time Ti might become useless or irrelevant at time Ti+j. This situation results in a phenomenon called Virtual Concept Drift. Yet, the set of features that are dropped at time Ti+j might return to become significant again in the future. Such a situation results in the so-called Cyclical Concept Drift, which is a direct result of what is often called the catastrophic forgetting dilemma. Catastrophic forgetting happens when the learning of new knowledge completely removes the previously learned knowledge. Phishing is a dynamic classification problem where a virtual concept drift might occur. Yet, the virtual concept drift that occurs in phishing might be guided by some malevolent intelligent agent rather than occurring naturally. One reason why phishers keep changing the feature combination when creating phishing websites might be that they have the ability to interpret the anti-phishing tool and thus they pick a new set of features that can circumvent it. However, besides the generalisation capability, fault tolerance, and strong ability to learn, a Neural Network (NN) classification model is considered as a black box. 
Hence, if someone has the skills to hack into the NN-based classification model, they might have difficulty interpreting and understanding how the NN processes the input data in order to produce the final decision (assign class value). In this thesis, we investigate the problem of virtual concept drift by proposing a framework that can keep pace with the continuous changes in the input features. The proposed framework has been applied to the phishing website classification problem and it shows competitive results with respect to various evaluation measures (Harmonic Mean (F1-score), precision, accuracy, etc.) when compared to several other data mining techniques. The framework creates an ensemble of classifiers (group of classifiers) and it offers a balance between stability (maintaining previously learned knowledge) and plasticity (learning knowledge from the newly offered training data set). Hence, the framework can also handle the cyclical concept drift. The classifiers that constitute the ensemble are created using an improved Self-Structuring Neural Networks algorithm (SSNN). Traditionally, NN modelling techniques rely on trial and error, which is a tedious and time-consuming process. The SSNN simplifies structuring NN classifiers with minimum intervention from the user. The framework evaluates the ensemble whenever a new data set chunk is collected. If the overall accuracy of the combined results from the ensemble drops significantly, a new classifier is created using the SSNN and added to the ensemble. Overall, the experimental results show that the proposed framework affords a balance between stability and plasticity and can effectively handle the virtual concept drift when applied to the phishing website classification problem. Most of the chapters of this thesis have been subject to publicatio
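
The chunk-by-chunk loop described above (evaluate the ensemble on each new chunk, and add a classifier when accuracy drops significantly) can be sketched as below. Several parts are assumptions for illustration only: a trivial single-feature learner stands in for the SSNN, the drop is measured against the accuracy recorded after the last update with a hypothetical `max_drop` of 0.1, and vote ties go to the most recently added member. The thesis may define all of these differently.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Features = Tuple[int, ...]
Sample = Tuple[Features, int]  # (binary features, binary label)
Classifier = Callable[[Features], int]


def train_stump(chunk: Sequence[Sample]) -> Classifier:
    """Stand-in learner (NOT the SSNN): predict with the feature that best matches the label."""
    n_features = len(chunk[0][0])
    best = max(range(n_features), key=lambda j: sum(x[j] == y for x, y in chunk))
    return lambda x: x[best]


def ensemble_predict(ensemble: List[Classifier], x: Features) -> int:
    """Majority vote; on a tie, the newest member among the tied labels decides (assumed)."""
    votes = Counter(clf(x) for clf in ensemble)
    top = max(votes.values())
    for clf in reversed(ensemble):
        if votes[clf(x)] == top:
            return clf(x)
    raise RuntimeError("unreachable for a non-empty ensemble")


def accuracy(ensemble: List[Classifier], chunk: Sequence[Sample]) -> float:
    return sum(ensemble_predict(ensemble, x) == y for x, y in chunk) / len(chunk)


def process_stream(chunks: Sequence[Sequence[Sample]], max_drop: float = 0.1) -> List[Classifier]:
    """Grow the ensemble whenever accuracy on a new chunk falls by more than `max_drop`."""
    ensemble = [train_stump(chunks[0])]
    baseline = accuracy(ensemble, chunks[0])
    for chunk in chunks[1:]:
        if baseline - accuracy(ensemble, chunk) > max_drop:
            ensemble.append(train_stump(chunk))  # older members are kept, not replaced
            baseline = accuracy(ensemble, chunk)
    return ensemble


# Toy chunks: the label follows feature 0 at first, then feature 1 (a drift in relevant features).
chunk_a = [((1, 0), 1), ((0, 1), 0), ((1, 1), 1), ((0, 0), 0)]
chunk_b = [((1, 0), 0), ((0, 1), 1), ((1, 1), 1), ((0, 0), 0)]

grown = process_stream([chunk_a, chunk_b])
stable = process_stream([chunk_a, chunk_a])
```

Keeping the older members instead of retraining from scratch is what gives the stability side of the stability and plasticity balance: if the earlier feature set becomes relevant again, a classifier that learned it is still in the ensemble.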

    Real Time Anomaly Detection Using Ensembles

    No full text

    Ensemble methods in intrusion detection

    As services are being deployed on the internet, there is the need to secure the infrastructure from malicious attacks. Intrusion detection serves as a second line of defense apart from firewalls and cryptography. Many techniques are employed in intrusion detection, including signature-based, anomaly-based, and specification-based detection systems. These techniques often trade off accuracy with false positive rate. In this study, anomaly detection using ensembles is used to automatically classify and detect attack patterns. It has been proven that ensembles of classifiers outperform their base classifiers. Several multiples of classifiers have been combined to improve the performance of intrusion detection systems. Commonly used classifiers include Support Vector Machines, Decision Trees, Genetic Algorithms, Fuzzy, and Principal Component Analysis. The study employed KStar clustering and Instance Based classification algorithms to detect intrusions in the NSL-KDD dataset. The results show that the ensemble we designed has a 1-error rate of 99.67% and false positive 0.33%. The response time of the anomaly is 0.18 seconds. The chosen ensemble outperformed the rest of the ensembles (rPART & SMO and J48) and the base classifiers. The performance of the combiners has shown that the study has built a model with high detection and reduced error.