Search CORE

65 research outputs found

Prediction of Anomalies for a Dynamic Data System in an Optimized Approach

Author: Nethravathi K, Dr.Anna Saro Vijendran
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/09/2014
Field of study

No Abstrac

International Journal on Recent and Innovation Trends in Computing and Communication

Positive Unlabeled Learning Algorithm for One Class Classification of Social Text Stream with only very few Positive Training Samples

Author: Vishwakarma Abhinandan
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 30/03/2015
Field of study

Text classification using a small labelled set (positive data set) and large unlabeled data is seen as a promising technique especially in case of text stream classification where it is highly possible that only few positive data and no negative data is available. This paper studies how to devise a positive and unlabeled learning technique for the text stream environment. Our proposed approach works in two steps. Firstly we use the PNLH (Positive example and negative example labelling heuristic) approach for extracting both positive and negative example from unlabeled data. This extraction would enable us to obtain an enriched vector representation for the new test messages. Secondly we construct a one class classifier by using one class SVM classifier. Using the enriched vector representation as the input in one class SVM classifier predicts the importance level of each text message. Keywords: Positive and unlabeled learning, one class SVM (Support Vector Machine), one class classification, text stream classification

CiteSeerX

International Institute for Science, Technology and Education (IISTE): E-Journals

An Incremental Construction of Deep Neuro Fuzzy System for Continual Learning of Non-stationary Data Streams

Author: Pedrycz Witold
Pratama Mahardhika
Webb Geoffrey I.
Publication venue
Publication date: 01/01/2019
Field of study

Existing FNNs are mostly developed under a shallow network configuration having lower generalization power than those of deep structures. This paper proposes a novel self-organizing deep FNN, namely DEVFNN. Fuzzy rules can be automatically extracted from data streams or removed if they play limited role during their lifespan. The structure of the network can be deepened on demand by stacking additional layers using a drift detection method which not only detects the covariate drift, variations of input space, but also accurately identifies the real drift, dynamic changes of both feature space and target space. DEVFNN is developed under the stacked generalization principle via the feature augmentation concept where a recently developed algorithm, namely gClass, drives the hidden layer. It is equipped by an automatic feature selection method which controls activation and deactivation of input attributes to induce varying subsets of input features. A deep network simplification procedure is put forward using the concept of hidden layer merging to prevent uncontrollable growth of dimensionality of input space due to the nature of feature augmentation approach in building a deep network structure. DEVFNN works in the sample-wise fashion and is compatible for data stream applications. The efficacy of DEVFNN has been thoroughly evaluated using seven datasets with non-stationary properties under the prequential test-then-train protocol. It has been compared with four popular continual learning algorithms and its shallow counterpart where DEVFNN demonstrates improvement of classification accuracy. Moreover, it is also shown that the concept drift detection method is an effective tool to control the depth of network structure while the hidden layer merging scenario is capable of simplifying the network complexity of a deep network with negligible compromise of generalization performance.Comment: This paper has been published in IEEE Transactions on Fuzzy System

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Dynamic Data Mining: Methodology and Algorithms

Author: Deng Xiong
Deng Xiong
Publication venue: Computing, Imperial College London
Publication date: 01/07/2011
Field of study

Supervised data stream mining has become an important and challenging data mining task in modern organizations. The key challenges are threefold: (1) a possibly infinite number of streaming examples and time-critical analysis constraints; (2) concept drift; and (3) skewed data distributions. To address these three challenges, this thesis proposes the novel dynamic data mining (DDM) methodology by effectively applying supervised ensemble models to data stream mining. DDM can be loosely defined as categorization-organization-selection of supervised ensemble models. It is inspired by the idea that although the underlying concepts in a data stream are time-varying, their distinctions can be identified. Therefore, the models trained on the distinct concepts can be dynamically selected in order to classify incoming examples of similar concepts. First, following the general paradigm of DDM, we examine the different concept-drifting stream mining scenarios and propose corresponding effective and efficient data mining algorithms. • To address concept drift caused merely by changes of variable distributions, which we term pseudo concept drift, base models built on categorized streaming data are organized and selected in line with their corresponding variable distribution characteristics. • To address concept drift caused by changes of variable and class joint distributions, which we term true concept drift, an effective data categorization scheme is introduced. A group of working models is dynamically organized and selected for reacting to the drifting concept. Secondly, we introduce an integration stream mining framework, enabling the paradigm advocated by DDM to be widely applicable for other stream mining problems. Therefore, we are able to introduce easily six effective algorithms for mining data streams with skewed class distributions. In addition, we also introduce a new ensemble model approach for batch learning, following the same methodology. Both theoretical and empirical studies demonstrate its effectiveness. Future work would be targeted at improving the effectiveness and efficiency of the proposed algorithms. Meantime, we would explore the possibilities of using the integration framework to solve other open stream mining research problems

Spiral - Imperial College Digital Repository

Predictive Pattern Discovery in Dynamic Data Systems

Author: Zhang Wenjing
Publication venue: e-Publications@Marquette
Publication date: 01/01/2013
Field of study

This dissertation presents novel methods for analyzing nonlinear time series in dynamic systems. The purpose of the newly developed methods is to address the event prediction problem through modeling of predictive patterns. Firstly, a novel categorization mechanism is introduced to characterize different underlying states in the system. A new hybrid method was developed utilizing both generative and discriminative models to address the event prediction problem through optimization in multivariate systems. Secondly, in addition to modeling temporal dynamics, a Bayesian approach is employed to model the first-order Markov behavior in the multivariate data sequences. Experimental evaluations demonstrated superior performance over conventional methods, especially when the underlying system is chaotic and has heterogeneous patterns during state transitions. Finally, the concept of adaptive parametric phase space is introduced. The equivalence between time-domain phase space and associated parametric space is theoretically analyzed

epublications@Marquette

Multiple adaptive mechanisms for predictive models on streaming data.

Author: Bakirov Rashid
Publication venue
Publication date
Field of study

Making predictions on non-stationary streaming data remains a challenge in many application areas. Changes in data may cause a decrease in predictive accuracy, which in a streaming setting require a prompt response. In recent years many adaptive predictive models have been proposed for dealing with these issues. Most of these methods use more than one adaptive mechanism, deploying all of them at the same time at regular intervals or in some other fixed manner. However, this manner is often determined in an ad-hoc way, as the effects of adaptive mechanisms are largely unexplored. This thesis therefore investigates different aspects of adaptation with multiple adaptive mechanisms with the aim to increase knowledge in the area, and propose heuristic approaches for more accurate adaptive predictive models. This is done by systematising and formalising the “adaptive mechanism” notion, proposing a categorisation of adaptive mechanisms and a metric to measure their usefulness, comparing the results after deployment of different orders of adaptive mechanisms during the run of the predictive method, and suggesting techniques on how to select the most appropriate adaptive mechanisms. The literature review suggests that during the prediction process, adaptive mechanisms are selected to be deployed in a certain order which is usually fixed beforehand at the design time of the algorithm. For this reason, it was investigated whether changing the selection method for the adaptive mechanisms significantly affects predictive accuracy and whether there are certain deployment orders which provide better results than others. Commonly used adaptive mechanism selection methods are then examined and new methods are proposed. A novel regression ensemble method which uses several common adaptive mechanisms has been developed to be used as a vehicle for the experimentation. The predictive accuracy and behaviour of adaptive mechanisms while predicting on different real world datasets from the process industry were analysed. Empirical results suggest that different selection of adaptive mechanisms result in significantly different performance. It has been found that while some adaptive mechanisms adapt the predictive model better than others, there is none which is the best at all times. Finally, flexible orders of adaptive mechanisms generated using the proposed selection techniques often result in significantly more accurate models than fixed orders commonly used in literature

Bournemouth University Research Online

Machine learning: statistical physics based theory and smart industry applications

Author: Straat Michiel
Publication venue: 'University of Groningen Press'
Publication date: 01/01/2022
Field of study

The increasing computational power and the availability of data have made it possible to train ever-bigger artificial neural networks. These so-called deep neural networks have been used for impressive applications, like advanced driver assistance and support in medical diagnoses. However, various vulnerabilities have been revealed and there are many open questions concerning the workings of neural networks. Theoretical analyses are therefore essential for further progress. One current question is: why is it that networks with Rectified Linear Unit (ReLU) activation seemingly perform better than networks with sigmoidal activation?We contribute to the answer to this question by comparing ReLU networks with sigmoidal networks in diverse theoretical learning scenarios. In contrast to analysing specific datasets, we use a theoretical modelling using methods from statistical physics. They give the typical learning behaviour for chosen model scenarios. We analyse both the learning behaviour on a fixed dataset and on a data stream in the presence of a changing task. The emphasis is on the analysis of the network’s transition to a state wherein specific concepts have been learnt. We find significant benefits of ReLU networks: they exhibit continuous increases of their performance and adapt more quickly to changing tasks.In the second part of the thesis we treat applications of machine learning: we design a quick quality control method for material in a production line and study the relationship with product faults. Furthermore, we introduce a methodology for the interpretable classification of time series data

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

On utilising change over time in data mining

Author: Böttcher Mirko
Publication venue: Universitätsbibl.
Publication date
Field of study

Magdeburg, Univ., Fak. für Informatik, Diss., 2013von Mirko Böttche

Digital University Library Saxony-Anhalt

Computational Intelligence for the Micro Learning

Author: Lin Jiayin
Publication venue: School of Computing and Information Technology
Publication date: 01/01/2021
Field of study

The developments of the Web technology and the mobile devices have blurred the time and space boundaries of people’s daily activities, which enable people to work, entertain, and learn through the mobile device at almost anytime and anywhere. Together with the life-long learning requirement, such technology developments give birth to a new learning style, micro learning. Micro learning aims to effectively utilise learners’ fragmented spare time and carry out personalised learning activities. However, the massive volume of users and the online learning resources force the micro learning system deployed in the context of enormous and ubiquitous data. Hence, manually managing the online resources or user information by traditional methods are no longer feasible. How to utilise computational intelligence based solutions to automatically managing and process different types of massive information is the biggest research challenge for realising the micro learning service. As a result, to facilitate the micro learning service in the big data era efficiently, we need an intelligent system to manage the online learning resources and carry out different analysis tasks. To this end, an intelligent micro learning system is designed in this thesis. The design of this system is based on the service logic of the micro learning service. The micro learning system consists of three intelligent modules: learning material pre-processing module, learning resource delivery module and the intelligent assistant module. The pre-processing module interprets the content of the raw online learning resources and extracts key information from each resource. The pre-processing step makes the online resources ready to be used by other intelligent components of the system. The learning resources delivery module aims to recommend personalised learning resources to the target user base on his/her implicit and explicit user profiles. The goal of the intelligent assistant module is to provide some evaluation or assessment services (such as student dropout rate prediction and final grade prediction) to the educational resource providers or instructors. The educational resource providers can further refine or modify the learning materials based on these assessment results

Research Online

Demand forecasting using exogenous leading indicators

Author: Aghezzaf El-Houssaine
Desmet Bram
Kourentzes Nikolaos
Sagaert Yves
Publication venue
Publication date: 01/01/2014
Field of study

Ghent University Academic Bibliography