193 research outputs found

Real-time pre-processing technique for drift detection, feature tracking, and feature selection using adaptive micro-clusters for data stream classification

Author: Hammoodi Mahmood Shakir
Publication date: 01/01/2018

Data streams are unbounded, sequential data instances that are generated with high velocity. Data streams arrive online (i.e., instance by instance) and there is no control over the order in which data instances arrive, either within a data stream or across data streams. Classifying sequential data instances is a challenging problem in machine learning with applications in network intrusion detection, financial markets and sensor networks. The automatic labelling of unseen instances from the stream in real-time is the main challenge that data stream classification faces. For this, the classifier needs to adapt to concept drifts and can make only a single pass through the data with a limited amount of memory if the stream is generating data instances at a high velocity. Nowadays, the focus of Data Stream Mining (DSM) lies in the development of data mining algorithms rather than on pre-processing techniques. To the best of the author's knowledge, at present, there are no developments for truly real-time feature selection in a streaming setting. This research work presents a real-time pre-processing technique, in particular, feature tracking in combination with concept drift detection. The feature tracking is designed to improve DSM classification algorithms by enabling real-time feature selection. The pre-processing technique is based on tracking adaptive statistical summaries of the data and class label distributions, known as Micro-Clusters. Thus the three objectives of this research were to develop a real-time pre-processing technique that can (1) detect a concept drift, (2) identify features that were involved in concept drift and thus potentially change their relevance and (3) build a real-time feature selection method based on the developments mentioned above.
The evaluation of the developed technique is based on artificial data streams with known ground truth and real datasets with and without artificially induced concept drift (i.e., controlled and uncontrolled real datasets). It was observed that the developed method for concept drift detection detected induced concept drifts very well compared with alternative concept drift detection methods. Overall, the research represents a first attempt to resolve real-time feature selection for DSM tasks. It has been shown that the technique can indeed identify concept drift, track features, and identify features that may have changed their relevance for the DSM task in real-time. It has also been shown that the developed method for real-time feature selection can improve the accuracy of data stream classification tasks.
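
The idea of tracking statistical summaries in a single pass can be illustrated with a minimal sketch. It assumes a summary made of an instance count plus per-feature linear and squared sums, which is one common way to build such summaries; the abstract does not specify the thesis's actual Micro-Cluster structure or drift test. The class `MicroCluster` and the function `feature_shift` are hypothetical names introduced only for this illustration.

```python
import math
from typing import List, Sequence


class MicroCluster:
    """Hypothetical single-pass summary: count, per-feature sum and squared sum."""

    def __init__(self, n_features: int) -> None:
        self.n = 0
        self.linear_sum = [0.0] * n_features
        self.squared_sum = [0.0] * n_features

    def add(self, x: Sequence[float]) -> None:
        """Absorb one instance; the instance itself is not stored."""
        self.n += 1
        for i, v in enumerate(x):
            self.linear_sum[i] += v
            self.squared_sum[i] += v * v

    def centroid(self) -> List[float]:
        if self.n == 0:
            raise ValueError("empty summary has no centroid")
        return [s / self.n for s in self.linear_sum]

    def variance(self) -> List[float]:
        """Per-feature population variance, clipped at zero for rounding error."""
        if self.n == 0:
            raise ValueError("empty summary has no variance")
        return [
            max(sq / self.n - (s / self.n) ** 2, 0.0)
            for s, sq in zip(self.linear_sum, self.squared_sum)
        ]


def feature_shift(old: MicroCluster, new: MicroCluster, eps: float = 1e-9) -> List[float]:
    """Assumed drift score: centroid shift per feature, scaled by the old std."""
    return [
        abs(c_new - c_old) / (math.sqrt(var) + eps)
        for c_old, c_new, var in zip(old.centroid(), new.centroid(), old.variance())
    ]


# Usage: summarise an older and a more recent window, then rank features.
old = MicroCluster(n_features=2)
for instance in [(1.0, 4.0), (3.0, 6.0)]:
    old.add(instance)

new = MicroCluster(n_features=2)
for instance in [(2.0, 9.0), (2.0, 11.0)]:
    new.add(instance)

scores = feature_shift(old, new)
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, ranked)
```

In this sketch a feature whose score is large would be a candidate for having changed its relevance. The threshold for declaring a drift is deliberately left out, since the abstract gives no value for it.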

Advanced Adaptive Classifier Methods for Data Streams

Author: Gunasekara Nuwan Amila
Publication venue: The University of Waikato
Publication date: 14/11/2023

The exponential growth of the internet has resulted in an overwhelming influx of big data. However, traditional batch learning models face significant obstacles in effectively learning from these vast and constantly evolving data streams and generating up-to-date outcomes. To overcome these limitations, Stream Learning (SL) has emerged as a promising solution that enables continuous learning from evolving data streams and adapts to changes in input distributions. This thesis focuses on the classification task of SL, specifically investigating streaming gradient-boosted trees and Neural Networks (NNs). Firstly, we introduce Streaming Gradient Boosted Trees (SGBT), a novel gradient-boosted method designed explicitly for SL classification. Next, we propose Continuously Adaptive Neural Networks for Data Streams (CAND), an architecture-agnostic NN approach for evolving data stream classification. Both SGBT and CAND outperform current state-of-the-art bagging and random forest-based SL methods, demonstrating their superiority in handling evolving data stream classification tasks. Online Continual Learning (OCL) addresses the issue where an NN learning from an evolving data stream forgets its past knowledge when confronted with a distribution shift. Online Domain Incremental Continual Learning (ODICL) is a specific variant of OCL where the input data distribution changes from one task to another. We propose two innovative methods for ODICL: Online Domain Incremental Pool (ODIP) and Online Domain Incremental Networks (ODIN). The proposed methods leverage existing well-researched SL techniques described in Online Streaming Continual Learning (OSCL). ODIP and ODIN outperform current regularization methods without needing a replay buffer. ODIN achieves competitive results compared to replay-based methods. Both methods are ideal candidates for privacy-concerned ODICL scenarios, offering alternatives to regularization-based approaches.
Overall, this thesis explores advancements in SL classification and ODICL, presenting novel techniques that surpass existing approaches in their respective domains. These contributions have significant implications for addressing the challenges posed by evolving data streams in the era of big data.

Online Moving Object Visualization with Geo-Referenced Data

Author: Zhao Guangqiang
Publication venue: FIU Digital Commons
Publication date: 13/11/2015

As a result of the rapid evolution of smart mobile devices and the wide application of satellite-based positioning devices, the moving object database (MOD) has become a hot research topic in recent years. Moving objects generate a large amount of geo-referenced data of different types, such as video, audio, images and sensor logs. In order to better analyze and utilize the data, it is useful and necessary to visualize the data on a map. With the rise of web mapping, visualizing moving object and geo-referenced data has never been so easy. While displaying the trajectory of a moving object is a mature technology, there is little research on visualizing both the location and data of the moving objects in a synchronized manner. This dissertation proposes a general moving object visualization model to address the above problem. This model divides the spatial data visualization systems into four categories. Another contribution of this dissertation is to provide a framework, which deals with all these visualization tasks with synchronization control in mind. This platform relies on the TerraFly web mapping system. To evaluate the universality and effectiveness of the proposed framework, this dissertation presents four visualization systems to deal with a variety of situations and different data types.

Inferring Complex Activities for Context-aware Systems within Smart Environments

Author: Triboan Darpan
Publication venue: Faculty of Computing, Engineering and Media
Publication date: 01/04/2020

The rising ageing population worldwide and the prevalence of age-related conditions such as physical fragility, mental impairments and chronic diseases have significantly impacted the quality of life and caused a shortage of health and care services. Over-stretched healthcare providers are leading to a paradigm shift in public healthcare provisioning. Thus, Ambient Assisted Living (AAL) using Smart Homes (SH) technologies has been rigorously investigated to help address the aforementioned problems. Human Activity Recognition (HAR) is a critical component in AAL systems which enables applications such as just-in-time assistance, behaviour analysis, anomaly detection and emergency notifications. This thesis is aimed at investigating challenges faced in accurately recognising Activities of Daily Living (ADLs) performed by single or multiple inhabitants within smart environments. Specifically, this thesis explores five complementary research challenges in HAR. The first study contributes to knowledge by developing a semantic-enabled data segmentation approach with user-preferences. The second study takes the segmented set of sensor data to investigate and recognise human ADLs at multi-granular action levels: coarse- and fine-grained. At the coarse-grained action level, semantic relationships between the sensor, object and ADLs are deduced, whereas, at the fine-grained action level, object usage at the satisfactory threshold with the evidence fused from multimodal sensor data is leveraged to verify the intended actions. Moreover, due to imprecise/vague interpretations of multimodal sensors and data fusion challenges, fuzzy set theory and fuzzy web ontology language (fuzzy-OWL) are leveraged. The third study focuses on incorporating uncertainties caused in HAR due to factors such as technological failure, object malfunction, and human errors.
Hence, existing uncertainty theories and approaches are analysed and, based on the findings, a probabilistic ontology (PR-OWL) based HAR approach is proposed. The fourth study extends the first three studies to distinguish activities conducted by more than one inhabitant in a shared smart environment with the use of discriminative sensor-based techniques and time-series pattern analysis. The final study investigates a suitable system architecture with a real-time smart environment tailored to AAL systems and proposes a microservices architecture with sensor-based off-the-shelf and bespoke sensing methods. The initial semantic-enabled data segmentation study achieved 100% and 97.8% accuracy in segmenting sensor events under single and mixed activity scenarios, respectively. However, the average time taken to segment each sensor event was high, at 3971 ms and 62183 ms for single and mixed activity scenarios, respectively. The second study, on detecting fine-grained user actions, was evaluated with 30 and 153 fuzzy rules to detect two fine-grained movements with a pre-collected dataset from the real-time smart environment. The results of the second study indicate good average accuracy of 83.33% and 100%, but with high average durations of 24648 ms and 105318 ms, posing further challenges for the scalability of fusion rule creation. The third study was evaluated by incorporating the PR-OWL ontology with ADL ontologies and the Semantic-Sensor-Network (SSN) ontology to define four types of uncertainties present in a kitchen-based activity. The fourth study presented a case study extending single-user AR to multi-user AR by combining RFID tags and fingerprint sensors as discriminative sensors to identify and associate user actions with the aid of time-series analysis. The last study responds to the computational and performance requirements of the four studies by analysing and proposing a microservices-based system architecture for AAL systems.
Future research towards moving from cloud computing to fog/edge computing paradigms is discussed, with the aims of higher availability, reduced network traffic, energy use and cost, and a decentralised system. As a result of the five studies, this thesis develops a knowledge-driven framework to estimate and recognise multi-user activities at the level of fine-grained user actions. This framework integrates three complementary ontologies to conceptualise factual, fuzzy and uncertain knowledge about the environment/ADLs, together with time-series analysis and a discriminative sensing environment. Moreover, a distributed software architecture, multimodal sensor-based hardware prototypes, and other supportive utility tools such as a simulator and a synthetic ADL data generator were developed to support the evaluation of the proposed approaches. The distributed system is platform-independent and currently supported by an Android mobile application and web-browser based client interfaces for retrieving information such as live sensor events and HAR results.

De Montfort University Open Research Archive

A Comparative Performance Study of Feature Selection Methods for the Anti-spam Filtering Domain

Authors: Corchado Rodríguez Juan Manuel; Díaz Gómez Fernando; Fernández Riverola Florentino; Iglesias E. L.; Méndez Jose R.
Publication venue: Springer Science + Business Media
Publication date: 01/01/2006

In this paper we analyse the strengths and weaknesses of the most commonly used feature selection methods in text categorization when they are applied to the spam problem domain. Several experiments with different feature selection methods and content-based filtering techniques are carried out and discussed. The Information Gain, χ²-test, Mutual Information and Document Frequency feature selection methods have been analysed in conjunction with Naïve Bayes, boosting trees, Support Vector Machines and ECUE models in different scenarios. From the experiments carried out, the underlying ideas behind feature selection methods are identified and applied to improve the feature selection process of SpamHunting, a novel anti-spam filtering software able to accurately classify suspicious e-mails.
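
Three of the feature selection scores named above can be sketched for a single term and a binary spam/ham label. This is a minimal illustration using the standard textbook definitions over a 2x2 contingency table; it is not taken from the paper, whose exact formulations and SpamHunting integration are not given in the abstract. The function names and the table layout (`a`, `b`, `c`, `d`) are assumptions made for this example.

```python
import math
from typing import Sequence

# Assumed 2x2 table layout for one term:
#   a = spam messages containing the term    b = ham messages containing it
#   c = spam messages without the term       d = ham messages without it


def document_frequency(a: int, b: int) -> int:
    """Number of messages, of either class, that contain the term."""
    return a + b


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of independence between term presence and class."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so the term carries no signal
    return n * (a * d - b * c) ** 2 / denom


def entropy(counts: Sequence[int]) -> float:
    """Shannon entropy in bits of a count distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def information_gain(a: int, b: int, c: int, d: int) -> float:
    """Reduction in class entropy from knowing whether the term is present."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    conditional = 0.0
    for group in ((a, b), (c, d)):  # term present, term absent
        size = sum(group)
        if size:
            conditional += (size / n) * entropy(group)
    return entropy((a + c, b + d)) - conditional


# Usage: score a hypothetical term seen in 40 of 50 spam and 5 of 50 ham messages.
print(document_frequency(40, 5), chi_square(40, 5, 10, 45), information_gain(40, 5, 10, 45))
```

Ranking all terms by one of these scores and keeping the top k gives a simple filter-style feature selector. Which score and which k work best is the kind of question the comparative study addresses.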

Spatio-Temporal Multimedia Big Data Analytics Using Deep Neural Networks

Author: Pouyanfar Samira
Publication venue: FIU Digital Commons
Publication date: 01/01/2019

With the proliferation of online services and mobile technologies, the world has stepped into a multimedia big data era, where new opportunities and challenges appear with the high diversity multimedia data together with the huge amount of social data. Nowadays, multimedia data consisting of audio, text, image, and video has grown tremendously. With such an increase in the amount of multimedia data, the main question raised is how one can analyze this high volume and variety of data in an efficient and effective way. A vast amount of research work has been done in the multimedia area, targeting different aspects of big data analytics, such as the capture, storage, indexing, mining, and retrieval of multimedia big data. However, there is insufficient research that provides a comprehensive framework for multimedia big data analytics and management. To address the major challenges in this area, a new framework is proposed based on deep neural networks for multimedia semantic concept detection with a focus on spatio-temporal information analysis and rare event detection. The proposed framework is able to discover the pattern and knowledge of multimedia data using both static deep data representation and temporal semantics. Specifically, it is designed to handle data with skewed distributions. 
The proposed framework includes the following components: (1) a synthetic data generation component based on simulation and adversarial networks for data augmentation and deep learning training, (2) an automatic sampling model to overcome the imbalanced data issue in multimedia data, (3) a deep representation learning model leveraging novel deep learning techniques to generate the most discriminative static features from multimedia data, (4) an automatic hyper-parameter learning component for faster training and convergence of the learning models, (5) a spatio-temporal deep learning model to analyze dynamic features from multimedia data, and finally (6) a multimodal deep learning fusion model to integrate different data modalities. The whole framework has been evaluated using various large-scale multimedia datasets that include the newly collected disaster-events video dataset and other public datasets.