
    A suite of swarm dynamic multi-objective algorithms for rebalancing extremely imbalanced datasets

    © 2017 Imbalanced datasets can be found in many fields; they are commonly regarded as big data because of their sheer volume and high attribute dimensionality. As the name suggests, imbalanced big datasets come with an extremely imbalanced ratio between the number of majority-class and minority-class samples. Traditional methods have been attempted but still cannot fully, effectively, and reliably solve the imbalanced class classification problem, especially when the class distribution is exceedingly imbalanced. In this paper, we propose a collection of algorithms to solve the problem of imbalanced datasets in binary data classification. Most traditional methods rebalance the imbalanced dataset merely by matching the data quantities of the two classes. Our proposed algorithms, which take the form of a suite of variants, focus on guaranteeing the credibility of the classification model and reaching the greatest possible accuracy by dynamically rebalancing the training dataset with multi-objective swarm intelligence optimisation. The new algorithms extend those we proposed earlier, which had a single objective: first find a set of solutions that satisfy the Kappa criterion, then search that set for the solution offering the highest accuracy. Two main modifications are made in the new algorithms. Multi-objective optimisation aims to find a solution that satisfies several criteria at once, such as accuracy and a list of credibility indicators. The other enhancement is the incremental operation of the multi-objective optimisation. Incremental optimisation is imperative for processing data feeds that arrive in a streaming manner. Instead of waiting for the full data archive to be available before optimisation, incremental optimisation rebalances the data feed segment by segment on the fly.
The experimental results from the suite of proposed algorithms show that they effectively attain better and more stable performance from the classification model, with much greater credibility than the other five traditional methods, when imbalanced datasets are used as training data for inducing a classifier.
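The single-objective scheme the new algorithms extend (first filter candidate solutions by the Kappa criterion, then pick the most accurate survivor) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the confusion-matrix tuple representation, and the 0.4 Kappa threshold are all illustrative assumptions.

```python
def kappa_and_accuracy(tp, fp, fn, tn):
    """Cohen's Kappa and accuracy from a binary confusion matrix."""
    n = tp + fp + fn + tn
    po = (tp + tn) / n  # observed agreement (accuracy)
    # expected agreement by chance, from the row/column marginals
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (po - pe) / (1 - pe) if pe < 1 else 0.0
    return kappa, po

def select_solution(candidates, kappa_min=0.4):
    """Kappa-first selection: keep candidates whose model is credible
    (Kappa above the threshold), then return the most accurate one."""
    feasible = [c for c in candidates
                if kappa_and_accuracy(*c)[0] >= kappa_min]
    if not feasible:
        return None  # no candidate meets the credibility criterion
    return max(feasible, key=lambda c: kappa_and_accuracy(*c)[1])
```

The multi-objective variants described in the abstract would replace the two-stage filter-then-maximise step with a search that trades Kappa, accuracy, and the other credibility indicators off simultaneously.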

    Hot Topics in Cloud Computing


    Collaborative Virtual Learning Model for Web Intelligence

    Abstract. The integration of Learning Object Repositories, information visualization, the Web, and new visual interaction techniques will change and expand the paradigms of how learners currently work on the Web. Virtual learning will improve the visual communication that takes place in all elements of user collaboration and provide decreased "time-to-enlightenment". Virtual learning is a process that applies information visualization technology to the challenge of discovering and exploiting information for the purpose of learning. This article examines the issue faced by most eLearning systems: how to turn data into understandable learning knowledge, and how to make this knowledge accessible to the peers who rely on it. It introduces a generic design model for Collaborative Virtual Learning based on the Model-View-Controller design pattern.
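The Model-View-Controller split the abstract refers to can be illustrated with a minimal sketch. The class and method names below (LearningObjectModel, VisualizationView, LearningController) are hypothetical stand-ins for the article's design model, not its actual API.

```python
class LearningObjectModel:
    """Model: holds learning objects and notifies attached views on change."""
    def __init__(self):
        self._objects = []
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def add_object(self, obj):
        self._objects.append(obj)
        for obs in self._observers:  # push the updated state to every view
            obs.refresh(self._objects)

class VisualizationView:
    """View: renders the current repository state for a learner."""
    def __init__(self):
        self.rendered = []

    def refresh(self, objects):
        self.rendered = list(objects)

class LearningController:
    """Controller: mediates learner actions into model updates."""
    def __init__(self, model):
        self.model = model

    def submit(self, obj):
        self.model.add_object(obj)
```

The separation lets several collaborating learners attach their own views to one shared model, which is the point of the pattern in a collaborative learning setting.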

    Discovering sub-patterns from time series using a normalized cross-match algorithm

    Time series data stream mining has attracted considerable research interest in recent years. Pattern discovery is a challenging problem in time series data stream mining. Because the data are updated continuously and the sampling rates may differ, dynamic time warping (DTW)-based approaches are used to solve the pattern discovery problem in time series data streams. However, the naive form of the DTW-based approach is computationally expensive. Therefore, Toyoda proposed the CrossMatch (CM) approach to discover the patterns between two time series data streams (sequences), which requires only O(n) time per data update, where n is the length of one sequence. CM, however, does not support normalization, which is required for some kinds of sequences (e.g. stock prices, ECG data). Therefore, we propose a normalized-CrossMatch approach that extends CM to enforce normalization while maintaining the same performance capabilities.
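To see why normalization matters for DTW-based matching, here is a minimal sketch, assuming classic z-normalization and textbook dynamic-programming DTW rather than the paper's streaming CrossMatch formulation. Two sequences that differ only by offset and scale (e.g. the same ECG shape at different amplitudes) become identical after z-normalization, so their DTW distance drops to zero.

```python
def znorm(seq):
    """Z-normalize a sequence: zero mean, unit (population) variance."""
    n = len(seq)
    mean = sum(seq) / n
    std = (sum((x - mean) ** 2 for x in seq) / n) ** 0.5
    if std == 0:
        return [0.0] * n  # constant sequence: all zeros after normalization
    return [(x - mean) / std for x in seq]

def dtw(a, b):
    """Classic O(n*m) dynamic-programming DTW distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]
```

For example, `[7, 9, 11, 13]` is `2*x + 5` applied to `[1, 2, 3, 4]`; after z-normalization the two are identical and their DTW distance is zero, whereas without normalization it is not. CM's contribution, per the abstract, is achieving this effect incrementally in O(n) per update instead of recomputing the full table.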