393 research outputs found

    New perspectives and methods for stream learning in the presence of concept drift.

    Get PDF
    153 p.Applications that generate data in the form of fast streams from non-stationary environments, that is,those where the underlying phenomena change over time, are becoming increasingly prevalent. In thiskind of environments the probability density function of the data-generating process may change overtime, producing a drift. This causes that predictive models trained over these stream data become obsoleteand do not adapt suitably to the new distribution. Specially in online learning scenarios, there is apressing need for new algorithms that adapt to this change as fast as possible, while maintaining goodperformance scores. Examples of these applications include making inferences or predictions based onfinancial data, energy demand and climate data analysis, web usage or sensor network monitoring, andmalware/spam detection, among many others.Online learning and concept drift are two of the most hot topics in the recent literature due to theirrelevance for the so-called Big Data paradigm, where nowadays we can find an increasing number ofapplications based on training data continuously available, named as data streams. Thus, learning in nonstationaryenvironments requires adaptive or evolving approaches that can monitor and track theunderlying changes, and adapt a model to accommodate those changes accordingly. In this effort, Iprovide in this thesis a comprehensive state-of-the-art approaches as well as I identify the most relevantopen challenges in the literature, while focusing on addressing three of them by providing innovativeperspectives and methods.This thesis provides with a complete overview of several related fields, and tackles several openchallenges that have been identified in the very recent state of the art. Concretely, it presents aninnovative way to generate artificial diversity in ensembles, a set of necessary adaptations andimprovements for spiking neural networks in order to be used in online learning scenarios, and finally, adrift detector based on this former algorithm. All of these approaches together constitute an innovativework aimed at presenting new perspectives and methods for the field

    New perspectives and methods for stream learning in the presence of concept drift.

    Get PDF
    153 p.Applications that generate data in the form of fast streams from non-stationary environments, that is,those where the underlying phenomena change over time, are becoming increasingly prevalent. In thiskind of environments the probability density function of the data-generating process may change overtime, producing a drift. This causes that predictive models trained over these stream data become obsoleteand do not adapt suitably to the new distribution. Specially in online learning scenarios, there is apressing need for new algorithms that adapt to this change as fast as possible, while maintaining goodperformance scores. Examples of these applications include making inferences or predictions based onfinancial data, energy demand and climate data analysis, web usage or sensor network monitoring, andmalware/spam detection, among many others.Online learning and concept drift are two of the most hot topics in the recent literature due to theirrelevance for the so-called Big Data paradigm, where nowadays we can find an increasing number ofapplications based on training data continuously available, named as data streams. Thus, learning in nonstationaryenvironments requires adaptive or evolving approaches that can monitor and track theunderlying changes, and adapt a model to accommodate those changes accordingly. In this effort, Iprovide in this thesis a comprehensive state-of-the-art approaches as well as I identify the most relevantopen challenges in the literature, while focusing on addressing three of them by providing innovativeperspectives and methods.This thesis provides with a complete overview of several related fields, and tackles several openchallenges that have been identified in the very recent state of the art. Concretely, it presents aninnovative way to generate artificial diversity in ensembles, a set of necessary adaptations andimprovements for spiking neural networks in order to be used in online learning scenarios, and finally, adrift detector based on this former algorithm. All of these approaches together constitute an innovativework aimed at presenting new perspectives and methods for the field

    Exploiting a Stimuli Encoding Scheme of Spiking Neural Networks for Stream Learning

    Get PDF
    Stream data processing has gained progressive momentum with the arriving of new stream applications and big data scenarios. One of the most promising techniques in stream learn- ing is the Spiking Neural Network, and some of them use an interesting population encod- ing scheme to transform the incoming stimuli into spikes. This study sheds lights on the key issue of this encoding scheme, the Gaussian receptive fields, and focuses on applying them as a pre-processing technique to any dataset in order to gain representativeness, and to boost the predictive performance of the stream learning methods. Experiments with synthetic and real data sets are presented, and lead to confirm that our approach can be applied successfully as a general pre-processing technique in many real cases

    Evolving Spiking Neural Networks for online learning over drifting data streams

    Get PDF
    Nowadays huge volumes of data are produced in the form of fast streams, which are further affected by non-stationary phenomena. The resulting lack of stationarity in the distribution of the produced data calls for efficient and scalable algorithms for online analysis capable of adapting to such changes (concept drift). The online learning field has lately turned its focus on this challenging scenario, by designing incremental learning algorithms that avoid becoming obsolete after a concept drift occurs. Despite the noted activity in the literature, a need for new efficient and scalable algorithms that adapt to the drift still prevails as a research topic deserving further effort. Surprisingly, Spiking Neural Networks, one of the major exponents of the third generation of artificial neural networks, have not been thoroughly studied as an online learning approach, even though they are naturally suited to easily and quickly adapting to changing environments. This work covers this research gap by adapting Spiking Neural Networks to meet the processing requirements that online learning scenarios impose. In particular the work focuses on limiting the size of the neuron repository and making the most of this limited size by resorting to data reduction techniques. Experiments with synthetic and real data sets are discussed, leading to the empirically validated assertion that, by virtue of a tailored exploitation of the neuron repository, Spiking Neural Networks adapt better to drifts, obtaining higher accuracy scores than naive versions of Spiking Neural Networks for online learning environments.This work was supported by the EU project Pacific AtlanticNetwork for Technical Higher Education and Research—PANTHER(grant number 2013-5659/004-001 EMA2)

    LUNAR: Cellular automata for drifting data streams

    Get PDF
    With the advent of fast data streams, real-time machine learning has become a challenging task, demanding many processing resources. In addition, they can be affected by the concept drift effect, by which learning methods have to detect changes in the data distribution and adapt to these evolving conditions. Several emerging paradigms such as the so-called Smart Dust, Utility Fog, or Swarm Robotics are in need for efficient and scalable solutions in real-time scenarios, and where usually computing resources are constrained. Cellular automata, as low-bias and robust-to-noise pattern recognition methods with competitive classification performance, meet the requirements imposed by the aforementioned paradigms mainly due to their simplicity and parallel nature. In this work we propose LUNAR, a streamified version of cellular automata devised to successfully meet the aforementioned requirements. LUNAR is able to act as a real incremental learner while adapting to drifting conditions. Furthermore, LUNAR is highly interpretable, as its cellular structure represents directly the mapping between the feature space and the labels to be predicted. Extensive simulations with synthetic and real data will provide evidence of its competitive behavior in terms of classification performance when compared to long-established and successful online learning methods

    Design and validation of novel methods for long-term road traffic forecasting

    Get PDF
    132 p.Road traffic management is a critical aspect for the design and planning of complex urban transport networks for which vehicle flow forecasting is an essential component. As a testimony of its paramount relevance in transport planning and logistics, thousands of scientific research works have covered the traffic forecasting topic during the last 50 years. In the beginning most approaches relied on autoregressive models and other analysis methods suited for time series data. During the last two decades, the development of new technology, platforms and techniques for massive data processing under the Big Data umbrella, the availability of data from multiple sources fostered by the Open Data philosophy and an ever-growing need of decision makers for accurate traffic predictions have shifted the spotlight to data-driven procedures. Even in this convenient context, with abundance of open data to experiment and advanced techniques to exploit them, most predictive models reported in literature aim for shortterm forecasts, and their performance degrades when the prediction horizon is increased. Long-termforecasting strategies are more scarce, and commonly based on the detection and assignment to patterns. These approaches can perform reasonably well unless an unexpected event provokes non predictable changes, or if the allocation to a pattern is inaccurate.The main core of the work in this Thesis has revolved around datadriven traffic forecasting, ultimately pursuing long-term forecasts. This has broadly entailed a deep analysis and understanding of the state of the art, and dealing with incompleteness of data, among other lesser issues. Besides, the second part of this dissertation presents an application outlook of the developed techniques, providing methods and unexpected insights of the local impact of traffic in pollution. The obtained results reveal that the impact of vehicular emissions on the pollution levels is overshadowe

    Design and validation of novel methods for long-term road traffic forecasting

    Get PDF
    132 p.Road traffic management is a critical aspect for the design and planning of complex urban transport networks for which vehicle flow forecasting is an essential component. As a testimony of its paramount relevance in transport planning and logistics, thousands of scientific research works have covered the traffic forecasting topic during the last 50 years. In the beginning most approaches relied on autoregressive models and other analysis methods suited for time series data. During the last two decades, the development of new technology, platforms and techniques for massive data processing under the Big Data umbrella, the availability of data from multiple sources fostered by the Open Data philosophy and an ever-growing need of decision makers for accurate traffic predictions have shifted the spotlight to data-driven procedures. Even in this convenient context, with abundance of open data to experiment and advanced techniques to exploit them, most predictive models reported in literature aim for shortterm forecasts, and their performance degrades when the prediction horizon is increased. Long-termforecasting strategies are more scarce, and commonly based on the detection and assignment to patterns. These approaches can perform reasonably well unless an unexpected event provokes non predictable changes, or if the allocation to a pattern is inaccurate.The main core of the work in this Thesis has revolved around datadriven traffic forecasting, ultimately pursuing long-term forecasts. This has broadly entailed a deep analysis and understanding of the state of the art, and dealing with incompleteness of data, among other lesser issues. Besides, the second part of this dissertation presents an application outlook of the developed techniques, providing methods and unexpected insights of the local impact of traffic in pollution. The obtained results reveal that the impact of vehicular emissions on the pollution levels is overshadowe
    corecore