Real-time pre-processing technique for drift detection, feature tracking, and feature selection using adaptive micro-clusters for data stream classification
Data streams are unbounded, sequential data instances that are generated at high velocity.
Data streams arrive online (i.e., instance by instance) and there is no control over the order
in which data instances arrive either within a data stream or across data streams. Classifying
sequential data instances is a challenging problem in machine learning with applications in
network intrusion detection, financial markets and sensor networks. The automatic labelling
of unseen instances from the stream in real-time is the main challenge that data stream classification
faces. To do this, the classifier needs to adapt to concept drifts and can only make a
single pass through the data with a limited amount of memory if the stream generates data
instances at high velocity. Currently, the focus of Data Stream Mining (DSM) lies in the
development of data mining algorithms rather than on pre-processing techniques. To the best
of the author's knowledge, there are at present no developments for truly real-time feature selection
in a streaming setting. This research work presents a real-time pre-processing technique,
in particular, feature tracking in combination with concept drift detection. The feature tracking
is designed to improve DSM classification algorithms by enabling real-time feature selection.
The pre-processing technique is based on tracking adaptive statistical summaries of the data
and class label distributions, known as Micro-Clusters. Thus the three objectives of this research
were to develop a real-time pre-processing technique that can (1) detect a concept drift,
(2) identify features that were involved in concept drift and thus potentially change their relevance
and (3) build a real-time feature selection method based on the developments mentioned
above. The evaluation of the developed technique is based on artificial data streams with known
ground truth and real datasets with and without artificially induced concept drift (i.e., controlled
and uncontrolled real datasets). The developed method for concept drift
detection detected induced concept drifts very well compared with alternative concept drift
detection methods. Overall, the research represents a first attempt to resolve real-time feature
selection for DSM tasks. It has been shown that the technique can indeed identify concept drift,
track features, and identify features that may have changed their relevance for the DSM task in
real time. It has also been shown that the developed method for real-time feature selection can
improve the accuracy of data stream classification tasks.
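The abstract describes Micro-Clusters only as adaptive statistical summaries of the data and class label distributions. As an illustration of that general idea, the following minimal sketch keeps one summary per class label (a count plus per-feature sums and sums of squares) and updates it in a single pass, so no instance has to be stored. The `MicroCluster` class, its fields and the toy stream are assumptions made for this example; they are not the data structure used in the work itself.

```python
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    """Hypothetical running summary of the instances seen for one class label."""
    label: str
    n: int = 0
    linear_sum: list = field(default_factory=list)   # per-feature sum of values
    squared_sum: list = field(default_factory=list)  # per-feature sum of squares

    def absorb(self, x):
        """Fold one instance into the summary; the instance itself is not kept."""
        if not self.linear_sum:
            self.linear_sum = [0.0] * len(x)
            self.squared_sum = [0.0] * len(x)
        self.n += 1
        for i, v in enumerate(x):
            self.linear_sum[i] += v
            self.squared_sum[i] += v * v

    def centroid(self):
        """Per-feature mean, recovered from the sums."""
        return [s / self.n for s in self.linear_sum]

    def variance(self):
        """Per-feature population variance, recovered from the sums."""
        return [max(sq / self.n - (s / self.n) ** 2, 0.0)
                for s, sq in zip(self.linear_sum, self.squared_sum)]


# Toy labelled stream (made-up values), processed instance by instance.
stream = [([1.0, 10.0], "a"), ([3.0, 10.0], "a"), ([8.0, 2.0], "b")]
clusters = {}
for x, y in stream:
    clusters.setdefault(y, MicroCluster(y)).absorb(x)

centroid_a = clusters["a"].centroid()
variance_a = clusters["a"].variance()
```

Because the summary is additive, memory use depends on the number of features and class labels rather than on the length of the stream, which fits the single-pass, limited-memory constraint stated above.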
Real-time feature selection technique with concept drift detection using adaptive micro-clusters for data stream mining
Data streams are unbounded, sequential data instances that are generated at high velocity. Classifying sequential data instances is a very challenging problem in machine learning, with applications in network intrusion detection, financial markets and real-time situation assessment based on sensor networks. Data stream classification is concerned with the automatic labelling of unseen instances from the stream in real time. To do this, the classifier needs to adapt to concept drifts and can only make a single pass through the data if the stream is fast moving. This research paper presents work on a real-time pre-processing technique, in particular feature tracking. The feature tracking technique is designed to improve Data Stream Mining (DSM) classification algorithms by enabling and optimising real-time feature selection. The technique is based on tracking adaptive statistical summaries of the data and class label distributions, known as Micro-Clusters. Currently, the technique is able to detect concept drifts and identify which features have been influential in the drift.
Towards real-time feature tracking technique using adaptive micro-clusters
Data streams are unbounded, sequential data instances that are generated at high velocity. Classifying sequential data instances is a very challenging problem in machine learning, with applications in network intrusion detection, financial markets and sensor networks. Data stream classification is concerned with the automatic labelling of unseen instances from the stream in real time. To do this, the classifier needs to adapt to concept drifts and can only make a single pass through the data if the stream is fast. This research paper presents our work on a real-time pre-processing technique, in particular a feature tracking technique that takes concept drift into consideration. The feature tracking technique is designed to improve Data Stream Mining (DSM) classification algorithms by enabling real-time feature selection. The technique is based on adaptive summaries of the data and class distributions, known as Micro-Clusters. Currently, the technique is able to detect concept drift and identify which features have been involved.
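The abstract does not say how the features involved in a drift are identified. As a generic illustration of the idea only, the sketch below compares a reference window with a recent window and scores each feature by how far its mean has moved, scaled by the reference spread. The function names, the window contents and the threshold of 2.0 are hypothetical choices for this example, not the method of the paper.

```python
from statistics import mean, pstdev


def feature_shift_scores(reference, recent):
    """Return one score per feature: |mean shift| scaled by the reference spread."""
    scores = []
    # zip(*rows) transposes a list of instances into per-feature columns.
    for ref_col, new_col in zip(zip(*reference), zip(*recent)):
        spread = pstdev(ref_col) or 1.0  # fall back to 1.0 for constant features
        scores.append(abs(mean(new_col) - mean(ref_col)) / spread)
    return scores


def drifted_features(reference, recent, threshold=2.0):
    """Indices of features whose score exceeds a hypothetical threshold."""
    scores = feature_shift_scores(reference, recent)
    return [i for i, s in enumerate(scores) if s > threshold]


# Made-up windows: feature 0 stays stable, feature 1 shifts upwards.
reference = [[1.0, 5.0], [2.0, 5.5], [3.0, 4.5], [2.0, 5.0]]
recent = [[2.0, 9.0], [2.5, 9.5], [1.5, 8.5], [2.0, 9.0]]
flagged = drifted_features(reference, recent)
```

In a streaming setting the two windows would be replaced by incrementally maintained summaries, so that the comparison can run without storing the instances.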