
    A Historical Context for Data Streams

    Machine learning from data streams is an active and growing research area. Research on learning from streaming data typically makes strict assumptions linked to computational resource constraints, including requirements for stream mining algorithms to inspect each instance no more than once and to be ready to give a prediction at any time. Here we review the historical context of data streams research, placing the common assumptions used in machine learning over data streams in their historical context. Comment: 9 pages
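    The two constraints mentioned above, a single pass over each instance and anytime prediction, are commonly operationalised as a test-then-train (prequential) loop. The sketch below is a minimal illustration of that loop, assuming a synthetic stream and scikit-learn's SGDClassifier as a stand-in incremental learner; it is not a protocol taken from the review.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
# Synthetic stream of (features, label) pairs; any real stream source works here.
stream = ((rng.normal(size=5), int(rng.integers(0, 2))) for _ in range(10_000))

model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])
correct = seen = 0

for x, y in stream:
    x = x.reshape(1, -1)
    if seen > 0:                                  # anytime prediction: test first ...
        correct += int(model.predict(x)[0] == y)
    model.partial_fit(x, [y], classes=classes)    # ... then learn, touching x only once
    seen += 1

print(f"prequential accuracy: {correct / (seen - 1):.3f}")
```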

    A review on data stream classification

    At present, the significance of data streams cannot be denied, as many researchers in databases, statistics, and computer science have placed their focus on them. Data streams are ordered sequences of data points that are potentially unbounded and are generated by non-stationary information-producing processes. The typical data mining tasks applied to data streams include clustering, classification, and frequent pattern mining. This paper presents several density-based data stream clustering approaches and attempts to explain how the related algorithms work, covering both semi-supervised and active learning, along with a review of a number of recent studies.
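    As a hedged illustration of the density-based stream clustering family surveyed here, the sketch below maintains decaying micro-clusters: a point joins the nearest micro-cluster when it falls within a fixed radius and otherwise seeds a new one, while stale clusters are pruned. The radius, decay, and pruning threshold are illustrative assumptions rather than parameters of any specific algorithm in the review.

```python
import numpy as np

RADIUS, DECAY, PRUNE_BELOW = 0.5, 0.99, 0.1   # illustrative assumptions
clusters = []                                  # each: {"centre": ndarray, "weight": float}

def update(point):
    for c in clusters:
        c["weight"] *= DECAY                   # old structure fades over time
    placed = False
    if clusters:
        dists = [np.linalg.norm(point - c["centre"]) for c in clusters]
        i = int(np.argmin(dists))
        if dists[i] <= RADIUS:                 # dense neighbourhood: absorb the point
            c = clusters[i]
            c["centre"] = (c["centre"] * c["weight"] + point) / (c["weight"] + 1)
            c["weight"] += 1
            placed = True
    if not placed:                             # sparse region: seed a new micro-cluster
        clusters.append({"centre": point.astype(float), "weight": 1.0})
    clusters[:] = [c for c in clusters if c["weight"] >= PRUNE_BELOW]

rng = np.random.default_rng(1)
for _ in range(2_000):                         # stream drawn from two well-separated modes
    centre = rng.choice([0.0, 3.0])
    update(rng.normal(loc=centre, scale=0.3, size=2))
print(f"{len(clusters)} micro-clusters maintained")
```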

    A survey on online active learning

    Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream. The problem of minimizing the cost associated with collecting labeled observations has gained a lot of attention in recent years, particularly in real-world applications where data is only available in unlabeled form. Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data. To overcome this issue, many active learning strategies have been proposed over the last decades, aiming to select the most informative observations for labeling in order to improve the performance of machine learning models. These approaches can be broadly divided into two categories: static pool-based and stream-based active learning. Pool-based active learning involves selecting a subset of observations from a closed pool of unlabeled data, and it has been the focus of many surveys and literature reviews. However, the growing availability of data streams has led to an increase in the number of approaches that focus on online active learning, which involves continuously selecting and labeling observations as they arrive in a stream. This work aims to provide an overview of the most recently proposed approaches for selecting the most informative observations from data streams in the context of online active learning. We review the various techniques that have been proposed and discuss their strengths and limitations, as well as the challenges and opportunities that exist in this area of research. Our review aims to provide a comprehensive and up-to-date overview of the field and to highlight directions for future work.
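    A minimal sketch of the stream-based setting the survey focuses on is given below: an incremental classifier queries a label only when its prediction is uncertain and a fixed labelling budget has not been exhausted. The SGDClassifier, the uncertainty margin, and the budget are assumptions for illustration, not a strategy taken from the survey.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])
BUDGET = 0.10                                  # label at most ~10% of the stream
queried = seen = 0

def oracle(x):                                 # stand-in for a human annotator
    return int(x.sum() > 0)

for _ in range(5_000):
    x = rng.normal(size=(1, 4))
    seen += 1
    if queried == 0:                           # bootstrap: the first label is always queried
        uncertain = True
    else:
        p = model.predict_proba(x)[0, 1]
        uncertain = abs(p - 0.5) < 0.15        # small margin => informative instance
    if uncertain and queried < BUDGET * seen:
        model.partial_fit(x, [oracle(x)], classes=classes)
        queried += 1

print(f"labelled {queried} of {seen} instances ({queried / seen:.1%})")
```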

    Online Tool Condition Monitoring Based on Parsimonious Ensemble+

    Accurate diagnosis of tool wear in the metal turning process remains an open challenge for both scientists and industrial practitioners because of inhomogeneities in workpiece material, nonstationary machining settings to suit production requirements, and nonlinear relations between measured variables and tool wear. Common methodologies for tool condition monitoring still rely on batch approaches, which cannot cope with the fast sampling rate of the metal cutting process. Furthermore, they require a retraining process to be completed from scratch when dealing with a new set of machining parameters. This paper presents an online tool condition monitoring approach based on Parsimonious Ensemble+ (pENsemble+). The unique feature of pENsemble+ lies in its highly flexible principle, where both the ensemble structure and the base-classifier structure can automatically grow and shrink on the fly based on the characteristics of the data streams. Moreover, an online feature selection scenario is integrated to actively sample relevant input attributes. The paper presents an advancement of a newly developed ensemble learning algorithm, pENsemble+, where an online active learning scenario is incorporated to reduce operator labelling effort. An ensemble merging scenario is proposed, which allows a reduction of ensemble complexity while retaining its diversity. Experimental studies utilising real-world manufacturing data streams and comparisons with well-known algorithms were carried out. Furthermore, the efficacy of pENsemble was examined using benchmark concept drift data streams. It has been found that pENsemble+ incurs low structural complexity and results in a significant reduction of operator labelling effort. Comment: this paper has been published in IEEE Transactions on Cybernetics
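    The grow-and-shrink principle described above can be sketched, in simplified form, as an ensemble that adds a fresh base learner when the recent error rate spikes and prunes members whose weights have decayed. The code below is such a sketch under assumed thresholds and window sizes; it is not the pENsemble+ algorithm itself.

```python
from collections import deque
import numpy as np
from sklearn.linear_model import SGDClassifier

CLASSES = np.array([0, 1])

class GrowShrinkEnsemble:
    def __init__(self):
        self.members, self.weights = [], []
        self.window = deque(maxlen=200)            # recent test-then-train errors

    def _spawn(self, x, y):
        m = SGDClassifier(loss="log_loss").partial_fit(x, [y], classes=CLASSES)
        self.members.append(m)
        self.weights.append(1.0)

    def predict(self, x):
        if not self.members:
            return int(CLASSES[0])
        votes = np.zeros(len(CLASSES))
        for m, w in zip(self.members, self.weights):
            votes[int(m.predict(x)[0])] += w        # weighted majority vote
        return int(np.argmax(votes))

    def learn(self, x, y):
        self.window.append(int(self.predict(x) != y))
        for i, m in enumerate(self.members):
            if m.predict(x)[0] != y:
                self.weights[i] *= 0.9              # shrink weight on mistakes
            m.partial_fit(x, [y], classes=CLASSES)
        if not self.members or (len(self.window) == self.window.maxlen
                                and np.mean(self.window) > 0.35):
            self._spawn(x, y)                       # grow when windowed error is high
            self.window.clear()
        keep = [i for i, w in enumerate(self.weights) if w > 0.05]
        keep = keep or [int(np.argmax(self.weights))]   # never drop every member
        self.members = [self.members[i] for i in keep]
        self.weights = [self.weights[i] for i in keep]
```

    A pENsemble+-style system would additionally adapt the internal structure of each base classifier and perform online feature selection and active labelling, all of which this sketch omits.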

    Online Active Learning for Human Activity Recognition from Sensory Data Streams

    Human activity recognition (HAR) is highly relevant to many real-world domains like safety, security, and in particular healthcare. The current machine learning technology for HAR is highly human-dependent, which makes it costly and unreliable in non-stationary environments. Existing HAR algorithms assume that training data is collected and annotated by a human prior to the training phase. Furthermore, the data is assumed to exhibit the true characteristics of the underlying distribution. In this paper, we propose a new autonomous approach that consists of novel algorithms. In particular, we adopt an active learning (AL) strategy to selectively query the user/resident about the labels of particular activities in order to improve the model accuracy. This strategy helps overcome the challenge of labelling sequential data with time dependency, which is highly time-consuming and difficult. Because of the changes that may affect the way activities are performed, we regard sensor data as a stream and human activity learning as an online continuous process. In such a process the learner can adapt to changes, incorporate novel activities, and discard obsolete ones. To this end, we propose a novel online semi-supervised classifier (OSC) that works together with a novel Bayesian stream-based active learning (BSAL) strategy. Because of the changes in the sensor layouts across different houses' settings, we use a Conditional Restricted Boltzmann Machine (CRBM) to handle the feature engineering issue by learning the features regardless of the environment settings. CRBM is applied to extract low-level features from unlabelled raw high-dimensional activity input. The resulting approach tackles the challenges of activity recognition using a three-module architecture composed of a feature extractor (CRBM) and an online semi-supervised classifier (OSC) equipped with BSAL. CRBM-BSAL-OSC allows completely autonomous learning that adjusts to the environment setting, explores the changes, and adapts to them. The paper provides the theoretical details of the proposed approach as well as an extensive empirical study to evaluate its performance.
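    To make the three-module flow concrete, the sketch below wires together a feature extractor, an online classifier that self-trains on confident pseudo-labels, and an uncertainty-triggered query step. A random projection stands in for the CRBM, predictive entropy stands in for BSAL, and simple self-training stands in for OSC; all names, thresholds, and the oracle callback are assumptions for illustration, not the paper's algorithms.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
CLASSES = np.array([0, 1])

extractor = GaussianRandomProjection(n_components=8, random_state=0)
extractor.fit(rng.normal(size=(100, 64)))        # fix the feature map up front
clf = SGDClassifier(loss="log_loss")
fitted = False

def entropy(p):
    p = np.clip(p, 1e-9, 1.0)
    return float(-(p * np.log(p)).sum())

def process(raw_x, ask_oracle):
    """One stream step: extract features, then query, self-train, or skip."""
    global fitted
    z = extractor.transform(raw_x.reshape(1, -1))
    if not fitted:                               # bootstrap with one queried label
        clf.partial_fit(z, [ask_oracle(raw_x)], classes=CLASSES)
        fitted = True
        return
    p = clf.predict_proba(z)[0]
    if entropy(p) > 0.6:                         # uncertain: query the resident
        clf.partial_fit(z, [ask_oracle(raw_x)], classes=CLASSES)
    elif p.max() > 0.95:                         # confident: accept pseudo-label
        clf.partial_fit(z, [int(np.argmax(p))], classes=CLASSES)

for _ in range(3_000):                           # example stream of raw sensor frames
    frame = rng.normal(size=64)
    process(frame, ask_oracle=lambda raw: int(raw.mean() > 0))
```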

    Clustering based active learning for evolving data streams

    Data labeling is an expensive and time-consuming task. Choosing which labels to use is becoming increasingly important. In the active learning setting, a classifier is trained by asking for labels for only a small fraction of all instances. While many works deal with this issue in non-streaming scenarios, few exist for the data stream setting. In this paper we propose a new active learning approach for evolving data streams based on a pre-clustering step for selecting the most informative instances for labeling. We consider a batch-incremental setting: when a new batch arrives, we first cluster the examples and then select the best instances to train the learner. The clustering approach allows us to cover the whole data space, avoiding oversampling examples from only a few areas. We compare our method with state-of-the-art active learning strategies over real datasets. The results highlight the improvement in performance of our proposal. Experiments on parameter sensitivity are also reported.
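    The batch-incremental procedure reads roughly as: cluster each incoming batch, query the label of one representative per cluster, and update an incremental learner on just those labelled points. The sketch below follows that outline with KMeans and SGDClassifier under assumed batch and cluster sizes; the representative-selection rule (closest point to each centroid) is an illustrative choice, not necessarily the paper's exact criterion.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
CLASSES = np.array([0, 1])
clf = SGDClassifier(loss="log_loss")

def oracle(x):                                   # stand-in for a human labeller
    return int(x[0] > 0)

for batch_id in range(20):                       # stream arriving as batches
    batch = rng.normal(size=(500, 10))
    km = KMeans(n_clusters=8, n_init=10, random_state=batch_id).fit(batch)
    picks = []
    for k in range(km.n_clusters):               # most central point per cluster
        members = np.flatnonzero(km.labels_ == k)
        if members.size == 0:
            continue
        d = np.linalg.norm(batch[members] - km.cluster_centers_[k], axis=1)
        picks.append(members[int(np.argmin(d))])
    X = batch[picks]
    y = [oracle(x) for x in X]
    clf.partial_fit(X, y, classes=CLASSES)       # train only on the queried labels
```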