8 research outputs found

    On the performance of deep learning models for time series classification in streaming

    Processing data streams that arrive at high speed requires models that can provide fast and accurate predictions. Although deep neural networks are the state of the art for many machine learning tasks, their performance in real-time data streaming scenarios is a research area that has not yet been fully addressed. Nevertheless, there have been recent efforts to adapt complex deep learning models for streaming tasks by reducing their processing rate. The design of the asynchronous dual-pipeline deep learning framework makes it possible to predict on incoming instances and update the model simultaneously using two separate layers. The aim of this work is to assess the performance of different types of deep architectures for data streaming classification using this framework. We evaluate models such as multi-layer perceptrons and recurrent, convolutional and temporal convolutional neural networks over several time-series datasets that are simulated as streams. The results indicate that convolutional architectures achieve higher performance in terms of accuracy and efficiency. Comment: Paper submitted to the 15th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2020).

    Meta-learning for dynamic tuning of active learning on stream classification

    Supervised data stream learning depends on the incoming sample's true label to update a classifier's model. In real life, obtaining the ground truth for each instance is challenging: it is costly and time-consuming. Active Learning addresses this gap by finding a reduced set of instances to support the creation of a reliable stream classifier. However, identifying a small number of informative instances that still supports suitable classifier updates and drift adaptation is difficult. To better adapt to concept drifts using a reduced number of samples, we propose an online tuning of the Uncertainty Sampling threshold using a meta-learning approach. Our approach exploits statistical meta-features from adaptive windows to meta-recommend a suitable threshold that addresses the trade-off between the number of labelling queries and high accuracy. Experiments showed that the proposed approach provides the best trade-off between accuracy and query reduction by dynamically tuning the uncertainty threshold using lightweight meta-features.
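The threshold that the abstract above proposes to tune can be illustrated with a minimal sketch of plain Uncertainty Sampling. This is not the authors' method: the threshold is fixed, not meta-recommended, and the uncertainty measure (one minus the highest class probability) and the names `should_query_label` and `count_queries` are assumptions made for illustration only.

```python
from typing import Iterable, Sequence


def should_query_label(class_probs: Sequence[float], threshold: float) -> bool:
    """Request the true label when the prediction is uncertain enough.

    Assumption: uncertainty is measured as 1 - max class probability.
    """
    uncertainty = 1.0 - max(class_probs)
    return uncertainty >= threshold


def count_queries(stream_probs: Iterable[Sequence[float]], threshold: float) -> int:
    """Count how many instances of a stream would trigger a labelling query."""
    return sum(should_query_label(p, threshold) for p in stream_probs)


# Hypothetical class-probability outputs of a stream classifier.
stream_probs = [
    [0.95, 0.05],  # confident prediction
    [0.55, 0.45],  # uncertain prediction
    [0.70, 0.30],  # moderately confident prediction
    [0.51, 0.49],  # very uncertain prediction
]

# A higher threshold asks for fewer labels; a lower one asks for more.
queries_high_threshold = count_queries(stream_probs, threshold=0.40)
queries_low_threshold = count_queries(stream_probs, threshold=0.20)
```

The two counts differ only because of the threshold, which is the trade-off between labelling cost and accuracy that a dynamically tuned threshold is meant to manage.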

    A survey on machine learning for recurring concept drifting data streams

    The problem of concept drift has gained a lot of attention in recent years. This aspect is key in many domains exhibiting non-stationary as well as cyclic patterns and structural breaks affecting their generative processes. In this survey, we review the literature on dealing with regime changes in the behaviour of continuous data streams. The study starts with a general introduction to the field of data stream learning, describing recent works on passive or active mechanisms to adapt to or detect concept drifts, frequent challenges in this area, and related performance metrics. Then, different supervised and unsupervised approaches that can be used to deal with seasonalities in a data stream, such as online ensembles, meta-learning and model-based clustering, are covered. The aim is to point out new research trends and give future research directions on the use of machine learning techniques for data streams, which can help in the event of shifts and recurrences in continuous learning scenarios in near real-time.

    Algorithm recommendation in data streams with concept drift (Recomendação de algoritmos em fluxos de dados com mudança de conceito)

    Undergraduate final project, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2020. Many companies have been taking advantage of data mining to identify valuable information in massive datasets generated at high frequency, the so-called Big Data. Machine Learning techniques can be applied to information discovery, since they can extract patterns from data to induce models that predict future events. However, dynamic and evolving environments commonly generate non-stationary data streams. Models trained in this scenario therefore tend to degrade over time due to seasonality or concept drift. Periodic retraining can help, but a fixed hypothesis space may not be the most appropriate for the phenomenon. An alternative solution is to use meta-learning for continuous algorithm selection in time-changing environments, choosing the bias that best matches the current data. In this work, we present an improved framework for algorithm selection in data streams based on MetaStream. Our approach uses meta-learning and incremental learning to actively select the best algorithm for the current concept in time-changing environments. Unlike previous work, we use a diverse collection of state-of-the-art meta-features and an incremental learning approach for the meta-level based on the LightGBM algorithm. The results show that this new strategy can improve the accuracy of recommending the best algorithm in time-changing data.
    In recent decades, many companies have had a growing interest in the “digital oil”, also called Big Data. Data mining has been applied to these massive volumes of data to obtain valuable information for clients and industries worldwide. Machine Learning, a prominent technique for data mining, can be used to extract patterns from data and induce models to predict future events. Still, complex environments that are constantly evolving usually generate non-stationary data streams. Thus, these models may degrade in this scenario due to concept drift. Retraining periodically can help, but the algorithm bias may no longer be appropriate. A response to this is to use meta-learning for regular algorithm selection in time-changing environments, choosing the hypothesis space that best suits the current data. In this work, we enhanced MetaStream, a framework for data stream algorithm selection, through a rich set of state-of-the-art meta-features and an incremental learning approach at the meta-level based on LightGBM, combining these to actively select the best algorithm for the current concept in a time-changing environment. The results show that this new strategy can improve the recommendation accuracy of the best algorithm in time-changing data.

    Process-Oriented Stream Classification Pipeline: A Literature Review

    Featured Application: Nowadays, many applications and disciplines work on the basis of stream data. Common examples are the IoT sector (e.g., sensor data analysis), or video, image, and text analysis applications (e.g., in social media analytics or astronomy). With our work, we gather different approaches and terminology, and give a broad overview of the topic. Our main target groups are practitioners and newcomers to the field of data stream classification. Due to the rise of continuous data-generating applications, analyzing data streams has gained increasing attention over the past decades. A core research area in stream data is stream classification, which categorizes or detects data points within an evolving stream of observations. Areas of stream classification are diverse, ranging, e.g., from monitoring sensor data to analyzing a wide range of (social) media applications. Research in stream classification is related to developing methods that adapt to the changing and potentially volatile data stream. It focuses on individual aspects of the stream classification pipeline, e.g., designing suitable algorithm architectures, an efficient train and test procedure, or detecting so-called concept drifts. As a result of the many different research questions and strands, the field is challenging to grasp, especially for beginners. This survey explores, summarizes, and categorizes work within the domain of stream classification and identifies core research threads over the past few years. It is structured based on the stream classification process to facilitate coordination within this complex topic, including common application scenarios and benchmarking data sets. Thus, both newcomers to the field and experts who want to widen their scope can gain (additional) insight into this research area and find starting points and pointers to more in-depth literature on specific issues and research directions in the field.

    Evaluating k-NN in the Classification of Data Streams with Concept Drift

    Data streams are often defined as large amounts of data flowing continuously at high speed. Moreover, these data are likely subject to changes in data distribution, known as concept drift. For these reasons, learning from streams is often online and under restrictions of memory consumption and run-time. Although many classification algorithms exist, most of the works published in the area use Naive Bayes (NB) and Hoeffding Trees (HT) as base learners in their experiments. This article proposes an in-depth evaluation of k-Nearest Neighbors (k-NN) as a candidate for classifying data streams subject to concept drift. It also analyses the time complexity and the two main parameters of k-NN, i.e., the number of nearest neighbors used for predictions (k) and the window size (w). We compare different parameter values for k-NN and contrast it with NB and HT, both with and without a drift detector (RDDM), on many datasets. We formulated and answered 10 research questions, which led to the conclusion that k-NN is a worthy candidate for data stream classification, especially when the run-time constraint is not too restrictive. Comment: 25 pages, 10 tables, 7 figures + 30 pages appendix
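The roles of the two parameters named in the abstract above, k and w, can be sketched with a small sliding-window k-NN classifier. This is an illustrative sketch only, not the implementation evaluated in the article: the class name `WindowedKNN`, its methods, the Euclidean distance and the brute-force neighbor search are all assumptions.

```python
import math
from collections import Counter, deque


class WindowedKNN:
    """k-NN over the w most recent labelled instances (hypothetical sketch)."""

    def __init__(self, k: int, w: int):
        self.k = k
        # deque(maxlen=w) silently discards the oldest instance once full.
        self.window = deque(maxlen=w)

    def learn_one(self, x, y):
        """Store a labelled instance in the sliding window."""
        self.window.append((tuple(x), y))

    def predict_one(self, x):
        """Majority vote among the k nearest instances still in the window."""
        if not self.window:
            return None
        point = tuple(x)
        nearest = sorted(self.window, key=lambda item: math.dist(item[0], point))
        votes = Counter(label for _, label in nearest[: self.k])
        return votes.most_common(1)[0][0]


model = WindowedKNN(k=3, w=4)

# Old concept: points near the origin are labelled "a".
for x in [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.2, 0.1)]:
    model.learn_one(x, "a")
prediction_before_drift = model.predict_one((0.0, 0.0))

# Concept drift: the same region is now labelled "b".
for x in [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.2, 0.1)]:
    model.learn_one(x, "b")
prediction_after_drift = model.predict_one((0.0, 0.0))
```

Because the window holds only w instances, the old concept is forgotten once w new instances arrive. This is one way a windowed k-NN can adapt to drift without an explicit detector, at the cost of a scan over the window for every prediction.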

    Recurring concept meta-learning for evolving data streams
