8 research outputs found

    Mining recurring concepts in a dynamic feature space

    Most data stream classification techniques assume that the underlying feature space is static. However, in real-world applications the set of features and their relevance to the target concept may change over time. In addition, when the underlying concepts reappear, reusing previously learnt models can enhance the learning process in terms of accuracy and processing time at the expense of manageable memory consumption. In this paper, we propose mining recurring concepts in a dynamic feature space (MReC-DFS), a data stream classification system to address the challenges of learning recurring concepts in a dynamic feature space while simultaneously reducing the memory cost associated with storing past models. MReC-DFS is able to detect and adapt to concept changes using the performance of the learning process and contextual information. To handle recurring concepts, stored models are combined in a dynamically weighted ensemble. Incremental feature selection is performed to reduce the combined feature space. This contribution allows MReC-DFS to store only the features most relevant to the learnt concepts, which in turn increases the memory efficiency of the technique. In addition, an incremental feature selection method is proposed that dynamically determines the threshold between relevant and irrelevant features. Experimental results demonstrating the high accuracy of MReC-DFS compared with state-of-the-art techniques on a variety of real datasets are presented. The results also show the superior memory efficiency of MReC-DFS
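One way to picture the "dynamically weighted ensemble" of stored models mentioned in this abstract is the following minimal sketch. It is not the MReC-DFS algorithm itself: the class name `WeightedEnsemble`, the sliding-window accuracy weighting, and the neutral starting weight of 0.5 are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class WeightedEnsemble:
    """Weighted vote over stored models; weights track recent accuracy.

    Hypothetical illustration only, not the weighting used by MReC-DFS.
    """

    def __init__(self, models, window=50):
        self.models = list(models)  # each model is a callable: x -> label
        # One sliding window of 0/1 correctness flags per stored model.
        self.hits = [deque(maxlen=window) for _ in self.models]

    def weights(self):
        # A model with no history yet gets a neutral weight of 0.5 (assumption).
        return [sum(h) / len(h) if h else 0.5 for h in self.hits]

    def predict(self, x):
        # Each model votes for its predicted label with its current weight.
        votes = defaultdict(float)
        for model, w in zip(self.models, self.weights()):
            votes[model(x)] += w
        return max(votes, key=votes.get)

    def update(self, x, y):
        # Record whether each stored model was right on the labelled example.
        for model, h in zip(self.models, self.hits):
            h.append(1 if model(x) == y else 0)
```

When a previously seen concept returns, the stored model that matches it quickly regains weight as labelled examples arrive, so the ensemble can follow the recurrence without retraining from scratch.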

    Ensemble-based prediction of business processes bottlenecks with recurrent concept drifts

    Bottleneck prediction is an important sub-task of process mining that aims to optimize discovered process models by avoiding congestion. This paper discusses ongoing work on incorporating recurrent concept drift into bottleneck prediction in a real-world scenario. In the field of process mining, we develop a method for predicting whether bottlenecks are likely to appear, and which ones, based on data known before a case starts. We then introduce GRAEC, a carefully designed weighting mechanism to deal with concept drift. The weighting decays over time and can be extended to adapt to seasonality in the data. The methods are then applied to a simulation and to a real-world invoicing process in the field of installation services. The results show improved prediction accuracy compared with retraining a model on the most recent data.
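The idea of a weighting that decays over time and can be extended for seasonality might be sketched as below. This is not GRAEC's actual formula, which the abstract does not give: the function name `decayed_weights`, the exponential half-life decay, and the cosine-shaped seasonal boost are assumptions chosen only to illustrate the general idea.

```python
import math


def decayed_weights(ages, half_life, period=None, seasonal_boost=0.0):
    """Return one training weight per case, given each case's age.

    Hypothetical sketch: `ages`, `half_life` and `period` share one time unit.
    """
    weights = []
    for age in ages:
        # Exponential decay: the weight halves every `half_life` time units.
        w = 0.5 ** (age / half_life)
        if period is not None:
            # Seasonal factor in [1, 1 + seasonal_boost]; it peaks for cases
            # whose age is a whole number of periods (same season as today).
            w *= 1.0 + seasonal_boost * 0.5 * (1.0 + math.cos(2 * math.pi * age / period))
        weights.append(w)
    return weights
```

Weights of this kind could be passed as per-sample weights to any learner that accepts them, so that older cases still contribute but count for less than recent or same-season ones.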

    Machine Learning for Financial Prediction Under Regime Change Using Technical Analysis: A Systematic Review

    Recent crises, recessions and bubbles have stressed the non-stationary nature and the presence of drastic structural changes in the financial domain. The most recent literature suggests the use of conventional machine learning and statistical approaches in this context. Unfortunately, several of these techniques are unable or slow to adapt to changes in the price-generation process. This study aims to survey the relevant literature on Machine Learning for financial prediction under regime change employing a systematic approach. It reviews key papers with a special emphasis on technical analysis. The study discusses the growing number of contributions that are bridging the gap between two separate communities, one focused on data stream learning and the other on economic research. However, it also makes apparent that we are still in an early stage. The range of machine learning algorithms that have been tested in this domain is very wide, but the results of the study do not suggest that currently there is a specific technique that is clearly dominant

    PATTERN RECOGNITION IN CLASS IMBALANCED DATASETS

    Class imbalanced datasets constitute a significant portion of the machine learning problems of interest, where recognizing the ‘rare class’ is the primary objective for most applications. Traditional linear machine learning algorithms are often not effective in recognizing the rare class. In this research work, a specifically optimized feed-forward artificial neural network (ANN) is proposed and developed to train on moderately to highly imbalanced datasets. The proposed methodology addresses the difficulty of the classification task in multiple stages: optimizing the training dataset, modifying the kernel function used to generate the Gram matrix, and optimizing the NN structure. First, the training dataset is extracted from the available sample set through an iterative process of selective under-sampling. Then, the proposed artificial NN incorporates a kernel function optimizer that enhances class boundaries for imbalanced datasets by conformally transforming the kernel functions. Finally, a weighted neural network structure with a single hidden layer is proposed to train models from the imbalanced dataset. The proposed NN architecture is derived to effectively classify any binary dataset, even with a very high imbalance ratio, given appropriate parameter tuning and a sufficient number of processing elements. The effectiveness of the proposed method is tested on accuracy-based performance metrics, achieving close to and above 90% on several imbalanced datasets of a generic nature, and is compared with state-of-the-art methods. The proposed model is also used to classify a 25GB computed tomographic colonography database to test its applicability to big data. The effect of under-sampling and kernel optimization on training the NN model from the modified kernel Gram matrix representing the imbalanced data distribution is also analyzed experimentally. Computation time analysis shows the feasibility of the system for practical purposes. The report concludes with a discussion of the prospects of the developed model and suggestions for further development in this direction.

    Incremental learning algorithms and applications

    Incremental learning refers to learning from streaming data, which arrive over time, with limited memory resources and, ideally, without sacrificing model accuracy. This setting fits different application scenarios where lifelong learning is relevant, e.g. due to changing environments, and it offers an elegant scheme for big data processing by means of its sequential treatment. In this contribution, we formalise the concept of incremental learning, we discuss particular challenges which arise in this setting, and we give an overview of popular approaches, their theoretical foundations, and applications which have emerged in recent years.

    A classifier graph based recurring concept detection and prediction approach

    It is common in real-world data streams for previously seen concepts to reappear, which gives rise to a particular kind of concept drift known as recurring concepts. Unfortunately, most existing algorithms do not take full account of this case. Motivated by this challenge, a novel paradigm is proposed for capturing and exploiting recurring concepts in data streams. It not only incorporates a distribution-based change detector for handling concept drift but also captures recurring concepts by storing them in a classifier graph. The ability to detect recurring drifts allows previously learnt models to be reused, enhancing the overall learning performance. Extensive experiments on both synthetic and real-world data streams reveal that the approach performs significantly better than state-of-the-art algorithms, especially when concepts reappear.

    Learning Recurring Concepts from Data Streams in Ubiquitous Environments

    Due to recent scientific and technological advances in information systems, it is now possible to continuously record data at high speeds in a wide range of devices. The need to make sense of such massive amounts of data opens an opportunity to create new data stream classification techniques to model and predict the behavior of streaming data. When learning from data streams, the problem of concept drift means that the underlying data distributions can change over time. This has a strong impact on classification techniques, as predictive models become invalid and have to be updated. Furthermore, these changes in concept are usually a consequence of changes in context, and this relationship could be exploited to handle concept drift. Recurring concepts are a particular case of concept drift, where concepts that have drifted can suddenly reoccur. In this situation it may be possible to avoid relearning these previously observed concepts. However, the few existing approaches that take advantage of concept recurrence are designed neither to take context into consideration nor to take into account the resources required to store representations of past concepts. Both issues are of particular significance for ubiquitous data stream mining, where the learning process is executed in dynamically changing environments using resource-constrained devices. Moreover, most existing techniques assume that the underlying data stream feature space is static. However, in many real-world applications the set of features and their relevance to the target concept may change over time. Despite its importance, this issue has received little attention, particularly regarding how it can be efficiently addressed when tracking recurring concepts. Sharing knowledge among ubiquitous devices to collaboratively improve the modeling of local concepts is another interesting idea which has not been properly explored. This could improve the accuracy of the local model, as it would benefit from patterns similar to the local concept that were observed in other ubiquitous devices but not yet locally. In addition, the deployment of data stream classification as an autonomous and adaptive service to support the data analysis requirements of ubiquitous applications is still an open issue that lacks research in the field of ubiquitous data stream mining. This PhD thesis addresses the aforementioned open issues, focusing on learning anytime, anywhere classification models from data streams in ubiquitous environments, where the underlying concepts may change over time, with special emphasis on recurring concepts. Four main contributions are presented:
    - The MReC (Mining Recurring Concepts) approach, which integrates context with previously learned concepts to improve the adaptation to recurring concepts. Moreover, to deal with situations of resource constraints, an intelligent strategy to discard models is also proposed.
    - The MReC-DFS (Mining Recurring Concepts in a Dynamic Feature Space) approach, which extends MReC to address the challenges of a dynamic feature space while simultaneously reducing the memory cost of storing past models. In addition, a novel incremental feature selection method is proposed that dynamically determines the threshold used to select the most relevant features for a certain concept.
    - A Collaborative Data Stream Mining (Coll-Stream) approach that explores the knowledge available in the community to improve local classification accuracy. Coll-Stream integrates community knowledge using an ensemble method where the classifiers are selected and weighted based on their local accuracy for different partitions of the instance space.
    - A UDSM (Ubiquitous Data Stream Mining) Service to support the data analysis requirements of ubiquitous applications. As the basis for our service we describe a general mechanism, which autonomously adapts the execution of the data stream classification process to each situation, using context and resource awareness.
    Finally, the experimental validation of the proposed contributions using synthetic and real datasets allows us to achieve the objectives and answer the research questions proposed for this dissertation.