Search CORE

Data stream mining of event and complex event streams: a survey of existing and future technologies and applications in big data

Authors: Di Fatta Giuseppe
Karthikeyan Vidhyalakshmi
Nauck Detlef D.
Stahl Frederic
Wrench Chris
Publication venue: IGI Global
Publication date: 01/06/2016

Central Archive at the University of Reading

Concept drift and machine learning model for detecting fraudulent transactions in streaming environment

Authors: Patil Rudragoud
Shahapurkar Arati
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/10/2023

In a streaming environment, data is continuously generated and processed, and fraudulent transactions must be detected quickly to prevent significant financial losses. This paper therefore proposes a machine learning-based approach for detecting fraudulent transactions in a streaming environment, with a focus on addressing concept drift. The approach uses the extreme gradient boosting (XGBoost) algorithm and additionally employs four algorithms for detecting drift in continuous streams. Its effectiveness is evaluated on two datasets: a credit card dataset and a Twitter dataset containing social media data related to financial fraud. The approach is evaluated using cross-validation, and the results show that it outperforms traditional machine learning models in accuracy, precision, and recall and is more robust to concept drift. The proposed approach can serve as a real-time fraud detection system in various industries, including finance, insurance, and e-commerce.

Institute of Advanced Engineering and Science
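The abstract does not name the four drift detectors it uses. As a hedged illustration of what a stream drift detector does, the sketch below implements the classic Drift Detection Method (DDM, Gama et al., 2004), which monitors a classifier's online error rate and raises a warning or drift signal when that rate rises significantly above its best observed level. DDM is an assumption chosen for illustration, not necessarily one of the authors' four algorithms; the class name, the default thresholds and the synthetic error stream are our own.

```python
import math


class DDM:
    """Minimal sketch of the Drift Detection Method (Gama et al., 2004).

    Illustrative only: not necessarily one of the four detectors used in the paper.
    """

    def __init__(self, min_samples=30, warning_level=2.0, drift_level=3.0):
        self.min_samples = min_samples      # samples seen before any alarm can fire
        self.warning_level = warning_level  # multiples of s_min that trigger a warning
        self.drift_level = drift_level      # multiples of s_min that trigger a drift
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0           # running error rate
        self.s = 0.0           # standard deviation of the error rate
        self.p_min = math.inf  # error rate at the best point seen so far
        self.s_min = math.inf

    def update(self, error):
        """Feed 1 if the model misclassified the instance, else 0.

        Returns "drift", "warning" or "stable".
        """
        self.n += 1
        self.p += (error - self.p) / self.n
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Remember the lowest p + s observed (the model's best state so far).
        if self.p + self.s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s >= self.p_min + self.drift_level * self.s_min:
            self.reset()  # start monitoring the new concept from scratch
            return "drift"
        if self.p + self.s >= self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


if __name__ == "__main__":
    # Hypothetical error stream: about 10% errors, then the model fails on every instance.
    errors = [1 if i % 10 == 0 else 0 for i in range(200)] + [1] * 50
    detector = DDM()
    states = [detector.update(e) for e in errors]
    print("first drift at index", states.index("drift"))
```

In a fraud detection pipeline, one plausible design is to call `update` with 1 whenever the deployed model (XGBoost in the paper) misclassifies a labelled transaction and to retrain on recent data when "drift" is returned; the paper may handle retraining differently.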

Unsupervised learning approaches for non-stationary data streams

Author: Dearo Garcia Kemilly
Publication venue: University of Twente
Publication date: 16/04/2021

University of Twente Research Information

OEBench: Investigating Open Environment Challenges in Real-World Relational Data Streams

Authors: Diao Yiqun
He Bingsheng
Li Qinbin
Lu Mian
Yang Yutong
Publication date: 03/09/2023

How to obtain timely insights from relational data streams is an active research topic. This type of data stream can present unique challenges, such as distribution drifts, outliers, emerging classes, and changing features, which have recently been described as open environment challenges for machine learning. Although incremental learning for data streams has been studied, existing evaluations are mostly conducted on manually partitioned datasets. A natural question is therefore what these open environment challenges look like in real-world relational data streams and how existing incremental learning algorithms perform on real datasets. To fill this gap, we develop an Open Environment Benchmark, OEBench, to evaluate open environment challenges in relational data streams. Specifically, we investigate 55 real-world relational data streams and establish that open environment scenarios are indeed widespread in real-world datasets, which presents significant challenges for stream learning algorithms. Benchmarking existing incremental learning algorithms, we find that increased data quantity may not consistently improve model accuracy in open environment scenarios, where machine learning models can be significantly compromised by missing values, distribution shifts, or anomalies in real-world data streams. Current techniques are insufficient to mitigate the challenges posed by open environments effectively, and more research is needed to address them. All datasets and code are open-sourced at https://github.com/sjtudyq/OEBench

arXiv.org e-Print Archive
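A common way to evaluate incremental learners on streams is prequential (test-then-train) evaluation: each instance is first used to test the current model and only then to update it. The sketch below illustrates this under stated assumptions; the nearest-centroid learner, the synthetic two-class stream whose class centres swap halfway through (an abrupt distribution shift), and all function names are hypothetical and are not part of OEBench's code. Because the centroids average over all past data, accuracy collapses after the swap and the accumulated data slows recovery, which loosely mirrors the abstract's observation that more data need not improve accuracy under distribution shift.

```python
import random


class OnlineCentroidClassifier:
    """Hypothetical incremental learner: keeps a running mean vector per class."""

    def __init__(self):
        self.sums = {}    # label -> per-feature sum of all instances seen
        self.counts = {}  # label -> number of instances seen

    def predict(self, x):
        if not self.counts:
            return None  # no class seen yet
        def sq_dist(label):
            centroid = [s / self.counts[label] for s in self.sums[label]]
            return sum((a - b) ** 2 for a, b in zip(x, centroid))
        return min(self.counts, key=sq_dist)

    def learn(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
            self.counts[y] = 0
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1


def drifting_stream(n=1000, drift_at=500, seed=0):
    """Hypothetical stream: two Gaussian classes whose centres swap at `drift_at`."""
    rng = random.Random(seed)
    for i in range(n):
        y = rng.randint(0, 1)
        centre = 3.0 if (y == 1) != (i >= drift_at) else 0.0
        yield [rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)], y


def prequential_accuracy(model, stream, window=100):
    """Test-then-train: predict each instance before learning it; report per-window accuracy."""
    accuracies, correct = [], 0
    for i, (x, y) in enumerate(stream, start=1):
        correct += model.predict(x) == y  # test first...
        model.learn(x, y)                 # ...then train on the same instance
        if i % window == 0:
            accuracies.append(correct / window)
            correct = 0
    return accuracies


if __name__ == "__main__":
    print(prequential_accuracy(OnlineCentroidClassifier(), drifting_stream()))
```

A learner that forgets old data (for example, a sliding window or decayed centroids) would typically recover faster after the swap; which strategy works best on real relational streams is exactly the kind of question OEBench is designed to measure.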