Concept drift and machine learning model for detecting fraudulent transactions in streaming environment
In a streaming environment, data is generated and processed continuously, and fraudulent transactions must be detected quickly to prevent significant financial losses. This paper therefore proposes a machine learning-based approach for detecting fraudulent transactions in a streaming environment, with a focus on addressing concept drift. The approach uses the extreme gradient boosting (XGBoost) algorithm and employs four algorithms for detecting drift in continuous streams. To evaluate its effectiveness, two datasets are used: a credit card dataset and a Twitter dataset containing financial fraud-related social media data. The approach is evaluated using cross-validation, and the results demonstrate that it outperforms traditional machine learning models in terms of accuracy, precision, and recall, and is more robust to concept drift. The proposed approach can be used as a real-time fraud detection system in various industries, including finance, insurance, and e-commerce.
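To make the idea of stream drift detection concrete, the sketch below implements one well-known detector, the Drift Detection Method (DDM), which monitors a classifier's running error rate. The abstract does not name the four detectors used, so the choice of DDM is an assumption made purely for illustration. The class name, its parameters, and the synthetic error stream are also hypothetical; no XGBoost model is trained here.

```python
import math


class DDM:
    """Drift Detection Method over a stream of 0/1 prediction errors.

    Illustrative only: the abstract does not say which detectors the paper uses.
    """

    def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
        self.min_samples = min_samples  # observations required before testing
        self.warn_level = warn_level    # warning at p_min + 2 * s_min
        self.drift_level = drift_level  # drift at p_min + 3 * s_min
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0            # running error rate
        self.p_min = math.inf   # error rate at the best point seen so far
        self.s_min = math.inf   # its standard deviation

    def update(self, error):
        """Feed one error flag (1 = misclassified); return a status string."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start fresh statistics for the new concept
            return "drift"
        if self.p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"


# Synthetic, deterministic error stream (an assumption, not real data):
# 10% errors for the first 1000 samples, then 50% errors afterwards.
errors = [1 if i % 10 == 0 else 0 for i in range(1, 1001)]
errors += [1 if i % 2 == 0 else 0 for i in range(1001, 2001)]

detector = DDM()
drifts = [i for i, e in enumerate(errors, start=1) if detector.update(e) == "drift"]
print("drift signalled at sample(s):", drifts)
```

In a pipeline like the one the abstract describes, each error flag would come from comparing the model's prediction with the true label once it arrives, and a drift signal would presumably trigger retraining or updating the model. That wiring is an assumption here, not a detail taken from the paper.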
OEBench: Investigating Open Environment Challenges in Real-World Relational Data Streams
How to get insights from relational data streams in a timely manner is a hot
research topic. This type of data stream can present unique challenges, such as
distribution drifts, outliers, emerging classes, and changing features, which
have recently been described as open environment challenges for machine
learning. While incremental learning for data streams has been studied,
existing evaluations are mostly conducted with manually partitioned
datasets. Thus, a natural question is what those open environment challenges
look like in real-world relational data streams and how existing incremental
learning algorithms perform on real datasets. To fill this gap, we develop an
Open Environment Benchmark named OEBench to evaluate open environment
challenges in relational data streams. Specifically, we investigate 55
real-world relational data streams and establish that open environment
scenarios are indeed widespread in real-world datasets, which presents
significant challenges for stream learning algorithms. Through benchmarks with
existing incremental learning algorithms, we find that increased data quantity
may not consistently enhance the model accuracy when applied in open
environment scenarios, where machine learning models can be significantly
compromised by missing values, distribution shifts, or anomalies in real-world
data streams. Current techniques are insufficient to effectively mitigate
these challenges posed by open environments. More research is needed to
address real-world open environment challenges. All datasets and code are
open-sourced at https://github.com/sjtudyq/OEBench.