1,082 research outputs found

    Concept Drift Identification using Classifier Ensemble Approach

    Get PDF
    Abstract:-In Internetworking system, the huge amount of data is scattered, generated and processed over the network. The data mining techniques are used to discover the unknown pattern from the underlying data. A traditional classification model is used to classify the data based on past labelled data. However in many current applications, data is increasing in size with fluctuating patterns. Due to this new feature may arrive in the data. It is present in many applications like sensornetwork, banking and telecommunication systems, financial domain, Electricity usage and prices based on its demand and supplyetc .Thus change in data distribution reduces the accuracy of classifying the data. It may discover some patterns as frequent while other patterns tend to disappear and wrongly classify. To mine such data distribution, traditionalclassification techniques may not be suitable as the distribution generating the items can change over time so data from the past may become irrelevant or even false for the current prediction. For handlingsuch varying pattern of data, concept drift mining approach is used to improve the accuracy of classification techniques. In this paper we have proposed ensemble approach for improving the accuracy of classifier. The ensemble classifier is applied on 3 different data sets. We investigated different features for the different chunk of data which is further given to ensemble classifier. We observed the proposed approach improves the accuracy of classifier for different chunks of data

    Automated anomaly recognition in real time data streams for oil and gas industry.

    Get PDF
    There is a growing demand for computer-assisted real-time anomaly detection - from the identification of suspicious activities in cyber security, to the monitoring of engineering data for various applications across the oil and gas, automotive and other engineering industries. To reduce the reliance on field experts' knowledge for identification of these anomalies, this thesis proposes a deep-learning anomaly-detection framework that can help to create an effective real-time condition-monitoring framework. The aim of this research is to develop a real-time and re-trainable generic anomaly-detection framework, which is capable of predicting and identifying anomalies with a high level of accuracy - even when a specific anomalous event has no precedent. Machine-based condition monitoring is preferable in many practical situations where fast data analysis is required, and where there are harsh climates or otherwise life-threatening environments. For example, automated conditional monitoring systems are ideal in deep sea exploration studies, offshore installations and space exploration. This thesis firstly reviews studies about anomaly detection using machine learning. It then adopts the best practices from those studies in order to propose a multi-tiered framework for anomaly detection with heterogeneous input sources, which can deal with unseen anomalies in a real-time dynamic problem environment. The thesis then applies the developed generic multi-tiered framework to two fields of engineering: data analysis and malicious cyber attack detection. Finally, the framework is further refined based on the outcomes of those case studies and is used to develop a secure cross-platform API, capable of re-training and data classification on a real-time data feed

    Optimized and Automated Machine Learning Techniques Towards IoT Data Analytics and Cybersecurity

    Get PDF
    The Internet-of-Things (IoT) systems have emerged as a prevalent technology in our daily lives. With the wide spread of sensors and smart devices in recent years, the data generation volume and speed of IoT systems have increased dramatically. In most IoT systems, massive volumes of data must be processed, transformed, and analyzed on a frequent basis to enable various IoT services and functionalities. Machine Learning (ML) approaches have shown their capacity for IoT data analytics. However, applying ML models to IoT data analytics tasks still faces many difficulties and challenges. The first challenge is to process large amounts of dynamic IoT data to make accurate and informed decisions. The second challenge is to automate and optimize the data analytics process. The third challenge is to protect IoT devices and systems against various cyber threats and attacks. To address the IoT data analytics challenges, this thesis proposes various ML-based frameworks and data analytics approaches in several applications. Specifically, the first part of the thesis provides a comprehensive review of applying Automated Machine Learning (AutoML) techniques to IoT data analytics tasks. It discusses all procedures of the general ML pipeline. The second part of the thesis proposes several supervised ML-based novel Intrusion Detection Systems (IDSs) to improve the security of the Internet of Vehicles (IoV) systems and connected vehicles. Optimization techniques are used to obtain optimized ML models with high attack detection accuracy. The third part of the thesis developed unsupervised ML algorithms to identify network anomalies and malicious network entities (e.g., attacker IPs, compromised machines, and polluted files/content) to protect Content Delivery Networks (CDNs) from service targeting attacks, including distributed denial of service and cache pollution attacks. The proposed framework is evaluated on real-world CDN access log data to illustrate its effectiveness. The fourth part of the thesis proposes adaptive online learning algorithms for addressing concept drift issues (i.e., data distribution changes) and effectively handling dynamic IoT data streams in order to provide reliable IoT services. The development of drift adaptive learning methods can effectively adapt to data distribution changes and avoid data analytics model performance degradation

    HOFA: Twitter Bot Detection with Homophily-Oriented Augmentation and Frequency Adaptive Attention

    Full text link
    Twitter bot detection has become an increasingly important and challenging task to combat online misinformation, facilitate social content moderation, and safeguard the integrity of social platforms. Though existing graph-based Twitter bot detection methods achieved state-of-the-art performance, they are all based on the homophily assumption, which assumes users with the same label are more likely to be connected, making it easy for Twitter bots to disguise themselves by following a large number of genuine users. To address this issue, we proposed HOFA, a novel graph-based Twitter bot detection framework that combats the heterophilous disguise challenge with a homophily-oriented graph augmentation module (Homo-Aug) and a frequency adaptive attention module (FaAt). Specifically, the Homo-Aug extracts user representations and computes a k-NN graph using an MLP and improves Twitter's homophily by injecting the k-NN graph. For the FaAt, we propose an attention mechanism that adaptively serves as a low-pass filter along a homophilic edge and a high-pass filter along a heterophilic edge, preventing user features from being over-smoothed by their neighborhood. We also introduce a weight guidance loss to guide the frequency adaptive attention module. Our experiments demonstrate that HOFA achieves state-of-the-art performance on three widely-acknowledged Twitter bot detection benchmarks, which significantly outperforms vanilla graph-based bot detection techniques and strong heterophilic baselines. Furthermore, extensive studies confirm the effectiveness of our Homo-Aug and FaAt module, and HOFA's ability to demystify the heterophilous disguise challenge.Comment: 11 pages, 7 figure

    Time-series clustering for sensor fault detection in large-scale Cyber-Physical Systems

    Get PDF
    Large-scale Cyber-Physical Systems (CPSs) are information systems that involve a vast network of sensor nodes and other devices that stream observations in real-time and typically are deployed in uncontrolled, broad geographical terrains. Sensor node failures are inevitable and unpredictable events in large-scale CPSs, which compromise the integrity of the sensors measurements and potentially reduce the quality of CPSs services and raise serious concerns related to CPSs safety, reliability, performance, and security. While many studies were conducted to tackle the challenge of sensor nodes failure detection using domain-specific solutions, this paper proposes a novel sensor nodes failure detection approach and empirically evaluates its validity using a real-world case study. This paper investigates time-series clustering techniques as a feasible solution to identify sensor nodes malfunctions by detecting long-segmental outliers in their observations' time series. Three different time-series clustering techniques have been investigated using real-world observations collected from two various sensor node networks, one of which consists of 275 temperature sensors distributed around London. This study demonstrates that time-series clustering effectively detects sensor node's continuous (halting/repeating) and incipient faults. It also showed that the feature-based time series clustering technique is a more efficient long-segmental outliers detection mechanism compared to shape-based time-series clustering techniques such as DTW and K-Shape, mainly when applied to shorter time-series windows

    Minimizing Computational Resources for Deep Machine Learning: A Compression and Neural Architecture Search Perspective for Image Classification and Object Detection

    Get PDF
    Computational resources represent a significant bottleneck across all current deep learning computer vision approaches. Image and video data storage requirements for training deep neural networks have led to the widespread use of image and video compression, the use of which naturally impacts the performance of neural network architectures during both training and inference. The prevalence of deep neural networks deployed on edge devices necessitates efficient network architecture design, while training neural networks requires significant time and computational resources, despite the acceleration of both hardware and software developments within the field of artificial intelligence (AI). This thesis addresses these challenges in order to minimize computational resource requirements across the entire end-to-end deep learning pipeline. We determine the extent to which data compression impacts neural network architecture performance, and by how much this performance can be recovered by retraining neural networks with compressed data. The thesis then focuses on the accessibility of the deployment of neural architecture search (NAS) to facilitate automatic network architecture generation for image classification suited to resource-constrained environments. A combined hard example mining and curriculum learning strategy is developed to minimize the image data processed during a given training epoch within the NAS search phase, without diminishing performance. We demonstrate the capability of the proposed framework across all gradient-based, reinforcement learning, and evolutionary NAS approaches, and a simple but effective method to extend the approach to the prediction-based NAS paradigm. The hard example mining approach within the proposed NAS framework depends upon the effectiveness of an autoencoder to regulate the latent space such that similar images have similar feature embeddings. This thesis conducts a thorough investigation to satisfy this constraint within the context of image classification. Based upon the success of the overall proposed NAS framework, we subsequently extend the approach towards object detection. Despite the resultant multi-label domain presenting a more difficult challenge for hard example mining, we propose an extension to the autoencoder to capture the additional object location information encoded within the training labels. The generation of an implicit attention layer within the autoencoder network sufficiently improves its capability to enforce similar images to have similar embeddings, thus successfully transferring the proposed NAS approach to object detection. Finally, the thesis demonstrates the resilience to compression of the general two-stage NAS approach upon which our proposed NAS framework is based

    A Computational Academic Integrity Framework

    Get PDF
    L'abast creixent i la naturalesa canviant dels programes acadèmics constitueixen un repte per a la integritat dels protocols tradicionals de proves i exàmens. L'objectiu d'aquesta tesi és introduir una alternativa als enfocaments tradicionals d'integritat acadèmica, per a cobrir la bretxa del buit de l'anonimat i donar la possibilitat als instructors i administradors acadèmics de fer servir nous mitjans que permetin mantenir la integritat acadèmica i promoguin la responsabilitat, accessibilitat i eficiència, a més de preservar la privadesa i minimitzin la interrupció en el procés d'aprenentatge. Aquest treball té com a objectiu començar un canvi de paradigma en les pràctiques d'integritat acadèmica. La recerca en l'àrea de la identitat de l'estudiant i la garantia de l'autoria són importants perquè la concessió de crèdits d'estudi a entitats no verificades és perjudicial per a la credibilitat institucional i la seguretat pública. Aquesta tesi es basa en la noció que la identitat de l'alumne es compon de dues capes diferents, física i de comportament, en les quals tant els criteris d'identitat com els d'autoria han de ser confirmats per a mantenir un nivell raonable d'integritat acadèmica. Per a això, aquesta tesi s'organitza en tres seccions, cadascuna de les quals aborda el problema des d'una de les perspectives següents: (a) teòrica, (b) empírica i (c) pragmàtica.El creciente alcance y la naturaleza cambiante de los programas académicos constituyen un reto para la integridad de los protocolos tradicionales de pruebas y exámenes. El objetivo de esta tesis es introducir una alternativa a los enfoques tradicionales de integridad académica, para cubrir la brecha del vacío anonimato y dar la posibilidad a los instructores y administradores académicos de usar nuevos medios que permitan mantener la integridad académica y promuevan la responsabilidad, accesibilidad y eficiencia, además de preservar la privacidad y minimizar la interrupción en el proceso de aprendizaje. Este trabajo tiene como objetivo iniciar un cambio de paradigma en las prácticas de integridad académica. La investigación en el área de la identidad del estudiante y la garantía de la autoría son importantes porque la concesión de créditos de estudio a entidades no verificadas es perjudicial para la credibilidad institucional y la seguridad pública. Esta tesis se basa en la noción de que la identidad del alumno se compone de dos capas distintas, física y de comportamiento, en las que tanto los criterios de identidad como los de autoría deben ser confirmados para mantener un nivel razonable de integridad académica. Para ello, esta tesis se organiza en tres secciones, cada una de las cuales aborda el problema desde una de las siguientes perspectivas: (a) teórica, (b) empírica y (c) pragmática.The growing scope and changing nature of academic programmes provide a challenge to the integrity of traditional testing and examination protocols. The aim of this thesis is to introduce an alternative to the traditional approaches to academic integrity, bridging the anonymity gap and empowering instructors and academic administrators with new ways of maintaining academic integrity that preserve privacy, minimize disruption to the learning process, and promote accountability, accessibility and efficiency. This work aims to initiate a paradigm shift in academic integrity practices. Research in the area of learner identity and authorship assurance is important because the award of course credits to unverified entities is detrimental to institutional credibility and public safety. This thesis builds upon the notion of learner identity consisting of two distinct layers (a physical layer and a behavioural layer), where the criteria of identity and authorship must both be confirmed to maintain a reasonable level of academic integrity. To pursue this goal in organized fashion, this thesis has the following three sections: (a) theoretical, (b) empirical, and (c) pragmatic
    corecore