SMOClust: Synthetic Minority Oversampling based on Stream Clustering for Evolving Data Streams
Many real-world data stream applications suffer not only from concept drift
but also from class imbalance. Yet very few existing studies have investigated this
joint challenge. Data difficulty factors, which have been shown to be key
challenges in class-imbalanced data streams, are not taken into account by
existing approaches when learning from class-imbalanced data streams. In this work,
we propose a drift-adaptable oversampling strategy that synthesises minority class
examples based on stream clustering. The motivation is that stream clustering
methods continuously update themselves to reflect the characteristics of the
current underlying concept, including data difficulty factors. This property can
potentially be used to compress past information without explicitly caching data
in memory. Based on the compressed information, synthetic examples can
be created within the region that recently generated new minority class
examples. Experiments with artificial and real-world data streams show that the
proposed approach handles concept drift involving different minority class
decompositions better than existing approaches, especially when the data stream
is severely class-imbalanced and presents high proportions of safe and
borderline minority class examples.
Comment: 59 pages, 85 figures
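The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, assuming a simple fading micro-cluster summary of the minority class and SMOTE-style interpolation between each new minority example and the centroid of the micro-cluster that absorbed it. The classes `MicroCluster` and `MinorityClusterOversampler` and the `radius`, `decay` and `min_weight` parameters are hypothetical and are not SMOClust's actual design.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical fading summary of nearby minority examples (weight + linear sum)."""
    n: float
    ls: list

    def centroid(self):
        return [v / self.n for v in self.ls]

    def absorb(self, x):
        self.n += 1.0
        self.ls = [a + b for a, b in zip(self.ls, x)]


class MinorityClusterOversampler:
    """Sketch only: fading micro-clusters plus SMOTE-style interpolation (not SMOClust itself)."""

    def __init__(self, radius=1.0, decay=0.99, min_weight=0.1, seed=0):
        self.radius = radius          # max distance for absorbing a point into a cluster
        self.decay = decay            # per-arrival fading factor for old information
        self.min_weight = min_weight  # clusters that fade below this weight are dropped
        self.clusters = []
        self.rng = random.Random(seed)

    def _fade(self):
        # Scaling weight and linear sum together fades a cluster without moving its centroid.
        for c in self.clusters:
            c.n *= self.decay
            c.ls = [v * self.decay for v in c.ls]
        self.clusters = [c for c in self.clusters if c.n >= self.min_weight]

    def update(self, x, n_synthetic=1):
        """Absorb a new minority example x and return synthetic examples near it."""
        self._fade()
        nearest = min(self.clusters, key=lambda c: math.dist(c.centroid(), x), default=None)
        if nearest is None or math.dist(nearest.centroid(), x) > self.radius:
            nearest = MicroCluster(n=1.0, ls=list(x))
            self.clusters.append(nearest)
        else:
            nearest.absorb(x)
        centre = nearest.centroid()
        # Interpolate between x and the centroid of the region that just produced x.
        # For a brand-new cluster the centroid is x itself, so this degenerates to duplication.
        return [
            [xi + u * (ci - xi) for xi, ci in zip(x, centre)]
            for u in (self.rng.random() for _ in range(n_synthetic))
        ]
```

Because only compact cluster summaries are kept and they fade with every arrival, the region used for synthesis tracks recent minority data without storing raw examples, which is the property the abstract attributes to stream clustering.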
Resampling-Based Ensemble Methods for Online Class Imbalance Learning
Online class imbalance learning is a new learning problem that combines the challenges of both online learning and class imbalance learning. It deals with data streams having very skewed class distributions. This type of problem commonly exists in real-world applications, such as fault diagnosis of real-time control monitoring systems and intrusion detection in computer networks. In our earlier work, we defined class imbalance in an online setting and proposed two learning algorithms, OOB and UOB, that build an ensemble model overcoming class imbalance in real time through resampling and time-decayed metrics. In this paper, we further improve the resampling strategy inside OOB and UOB, and look into their performance in both static and dynamic data streams. We give the first comprehensive analysis of class imbalance in data streams, in terms of data distributions, imbalance rates and changes in class imbalance status. We find that UOB is better at recognizing minority-class examples in static data streams, and OOB is more robust against dynamic changes in class imbalance status. The data distribution is a major factor affecting their performance. Based on the insight gained, we then propose two new ensemble methods, called WEOB1 and WEOB2, that maintain both OOB and UOB with adaptive weights for final predictions. They are shown to possess the strengths of OOB and UOB, with good accuracy and robustness.
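As a rough illustration of resampling driven by time-decayed metrics, the sketch below assumes one common formulation of OOB and UOB: each class k keeps a time-decayed size w_k, updated as w_k ← θ·w_k + (1 − θ)·[y = k], and each base learner trains on an arriving example of class y K ~ Poisson(λ) times, with λ = max_k w_k / w_y for OOB (oversampling) or λ = min_k w_k / w_y for UOB (undersampling). The class name `ImbalanceAwareOnlineBagging`, the `theta` default and the Knuth Poisson sampler are illustrative choices; the base learners are left abstract, and the improved resampling strategy described in the paper may differ.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class ImbalanceAwareOnlineBagging:
    """Sketch of OOB/UOB-style resampling rates; base learners are left abstract."""

    def __init__(self, classes, n_learners=10, theta=0.9, mode="oob", seed=0):
        if mode not in ("oob", "uob"):
            raise ValueError("mode must be 'oob' or 'uob'")
        self.w = {c: 1.0 / len(classes) for c in classes}  # time-decayed class sizes
        self.n_learners = n_learners
        self.theta = theta
        self.mode = mode
        self.rng = random.Random(seed)

    def update_class_sizes(self, y):
        # w_k <- theta * w_k + (1 - theta) * [y == k]; the sizes keep summing to 1.
        for c in self.w:
            self.w[c] = self.theta * self.w[c] + (1.0 - self.theta) * (1.0 if c == y else 0.0)

    def rate(self, y):
        # OOB raises the minority rate above 1; UOB lowers the majority rate below 1.
        ref = max(self.w.values()) if self.mode == "oob" else min(self.w.values())
        return ref / self.w[y]

    def draws(self, y):
        """Update class sizes with label y, then draw a training count per base learner."""
        self.update_class_sizes(y)
        lam = self.rate(y)
        return [poisson(lam, self.rng) for _ in range(self.n_learners)]
```

In a full ensemble, each base learner would be updated with the example as many times as its draw indicates, and a draw of 0 means that learner skips the example. The decay factor θ controls how quickly the class-size estimates, and therefore the resampling rates, react to changes in class imbalance status.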
Transaction profile estimation of queueing network models for IT systems using a search-based technique
peer-reviewed
The software and hardware systems required to deliver modern Web-based services
are becoming increasingly complex. Managing and evolving these systems requires
periodic analysis of performance and capacity to maintain quality of service and
maximise the efficient use of resources. In this work we present a method that uses
a repeated local search technique to improve the accuracy of modelling such systems
while also reducing the complexity and time required to perform this task. The
accuracy of the model derived from the search-based approach is validated by
extrapolating its performance to multiple load levels, which enables system capacity
and performance to be planned and managed more efficiently.
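The abstract does not state the model form or the search operators, so the sketch below assumes an open network of independent M/M/1 stations, where the transaction profile is the vector of per-station service demands D_i and the predicted response time at arrival rate λ is R(λ) = Σ D_i / (1 − λ·D_i). A compass (pattern) search that halves its step when stuck serves as the local search, and it is restarted from random starting points; all function names and defaults are hypothetical.

```python
import random


def predicted_response_time(demands, arrival_rate):
    """Open network of M/M/1 stations: R = sum(D_i / (1 - lambda * D_i))."""
    total = 0.0
    for d in demands:
        utilisation = arrival_rate * d
        if utilisation >= 1.0:
            return float("inf")  # a saturated station has unbounded response time
        total += d / (1.0 - utilisation)
    return total


def squared_error(demands, measurements):
    """Sum of squared differences between modelled and measured response times."""
    return sum((predicted_response_time(demands, lam) - r) ** 2 for lam, r in measurements)


def local_search(start, measurements, step=0.005, min_step=1e-9):
    """Compass search: try +/- step on each demand, halve the step when no move helps."""
    best, best_err = list(start), squared_error(start, measurements)
    while step > min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                cand = list(best)
                cand[i] = max(1e-9, cand[i] + delta)  # demands must stay positive
                cand_err = squared_error(cand, measurements)
                if cand_err < best_err:
                    best, best_err, improved = cand, cand_err, True
        if not improved:
            step /= 2.0
    return best, best_err


def repeated_local_search(n_stations, measurements, restarts=10, seed=0):
    """Run the local search from several random starts and keep the best fit."""
    rng = random.Random(seed)
    # Draw starting demands that keep every station below saturation at the highest load.
    upper = 0.9 / max(lam for lam, _ in measurements)
    best, best_err = None, float("inf")
    for _ in range(restarts):
        start = [rng.uniform(1e-4, upper) for _ in range(n_stations)]
        cand, cand_err = local_search(start, measurements)
        if best is None or cand_err < best_err:
            best, best_err = cand, cand_err
    return best, best_err


if __name__ == "__main__":
    # Synthetic measurements generated from known demands, for illustration only.
    true_demands = [0.01, 0.03]
    loads = (5.0, 10.0, 15.0, 20.0)
    measurements = [(lam, predicted_response_time(true_demands, lam)) for lam in loads]
    fitted, err = repeated_local_search(2, measurements)
    print("fitted demands:", [round(d, 4) for d in fitted])
    print("extrapolated R at 25 req/s:", predicted_response_time(fitted, 25.0))
```

Once fitted, `predicted_response_time(fitted, rate)` can be evaluated at a load level that was not measured, which mirrors the extrapolation-based validation the abstract describes. Restarting the local search guards against a single run settling in a poor region of the demand space.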