6 research outputs found

    Learning from non-stationary data using a growing network of prototypes

    Get PDF
    Proceedings of: 2013 IEEE Congress on Evolutionary Computation (CEC), Cancun, 20-23 June 2013. Learning from non-stationary data requires methods that can deal with a continuous stream of data instances, possibly of infinite size, where the class distributions may drift over time. To handle such datasets, we propose a new method that incrementally creates and adapts a network of prototypes for classifying complex data received in an online fashion. The algorithm includes both accuracy-based and time-based forgetting mechanisms, which ensure that the model size does not grow indefinitely on large datasets. On seven benchmark datasets, we have compared our proposal with several approaches from the literature, including ensemble algorithms built on two different base classifiers. The results show that our algorithm is comparable to the best of the ensemble classifiers in terms of the accuracy/time trade-off. Moreover, our approach appears to have significant advantages for dealing with data that has a complex, non-linearly separable topology. This article has been funded by the Spanish Ministry of Science and Innovation under the project MOVES with grant reference TIN2011-28336, and by NSERC-Canada.
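
    The prototype-network idea in this abstract lends itself to a compact illustration. The Python sketch below shows one plausible shape of such a learner: it keeps a set of labelled prototypes, attracts the nearest prototype toward correctly classified instances, spawns a new prototype for distant misclassified ones, and prunes prototypes by age and by accuracy, mirroring the two forgetting mechanisms described. The thresholds, learning rate, and pruning rules are illustrative assumptions, not the paper's exact algorithm.

```python
# Hypothetical sketch of an online prototype network with forgetting.
# All parameter values and update rules are assumptions for illustration.
import numpy as np

class PrototypeStream:
    def __init__(self, create_dist=1.0, lr=0.1, max_age=500, min_acc=0.3):
        self.protos = []                # each: dict(center, label, age, correct, total)
        self.create_dist = create_dist  # distance beyond which new prototypes spawn
        self.lr = lr                    # attraction rate for correct winners
        self.max_age = max_age          # time-based forgetting horizon
        self.min_acc = min_acc          # accuracy-based forgetting threshold

    def _nearest(self, x):
        dists = [np.linalg.norm(p["center"] - x) for p in self.protos]
        i = int(np.argmin(dists))
        return i, dists[i]

    def predict(self, x):
        if not self.protos:
            return None
        i, _ = self._nearest(np.asarray(x, dtype=float))
        return self.protos[i]["label"]

    def learn(self, x, y):
        x = np.asarray(x, dtype=float)
        for p in self.protos:
            p["age"] += 1
        if not self.protos:
            self.protos.append(dict(center=x.copy(), label=y, age=0, correct=0, total=0))
            return
        i, d = self._nearest(x)
        p = self.protos[i]
        p["total"] += 1
        if p["label"] == y:
            p["correct"] += 1
            p["center"] += self.lr * (x - p["center"])  # attract winner toward x
            p["age"] = 0
        elif d > self.create_dist:
            # distant misclassified instance: create a new prototype for it
            self.protos.append(dict(center=x.copy(), label=y, age=0, correct=0, total=0))
        # forgetting: drop stale or chronically inaccurate prototypes
        self.protos = [q for q in self.protos
                       if q["age"] < self.max_age
                       and (q["total"] < 10 or q["correct"] / q["total"] >= self.min_acc)]
```

    Because only the nearest prototype is updated per instance, the per-example cost stays proportional to the current model size, which the two forgetting mechanisms keep bounded.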

    Online classification via self-organizing space partitioning

    Get PDF
    The authors study online supervised learning under the empirical zero-one loss and introduce a novel classification algorithm with strong theoretical guarantees. The proposed method is a highly dynamic, self-organizing decision tree that adaptively partitions the feature space into small regions and combines (takes the union of) the simple local classification models specialized in those regions. The approach sequentially and directly minimizes the cumulative loss by jointly learning the optimal feature-space partitioning and the corresponding individual partition-region classifiers. Overtraining is mitigated by using basic linear classifiers in each region, while the hierarchical, data-adaptive structure provides superior modeling power. The computational complexity of the introduced algorithm scales linearly with the dimensionality of the feature space and the depth of the tree. The algorithm can be applied to any streaming data without requiring a training phase or a priori information, processing data on the fly and then discarding it; it is therefore especially suitable for applications requiring sequential data processing at large scales or high rates. The authors present a comprehensive experimental study in stationary and non-stationary environments, comparing their algorithm with state-of-the-art methods on well-known benchmark datasets and showing it to be computationally far more efficient. The proposed algorithm significantly outperforms the competing methods in stationary settings and demonstrates remarkable adaptation to non-stationarity in the presence of drifting concepts and abrupt concept changes.
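
    A rough feel for the tree-of-local-classifiers construction can be given in a few lines of Python. The sketch below is a simplification under stated assumptions: the splits are fixed random projections rather than learned (the published method also adapts the partitioning itself), each node on the root-to-leaf path holds an online perceptron, and the per-level predictions are combined with multiplicative (Hedge-style) weights. Note that the per-example cost is O(depth x dim), consistent with the linear scaling claimed above.

```python
# Illustrative sketch only: a depth-limited tree of online perceptrons whose
# per-level predictions are combined by multiplicative (Hedge-style) weights.
# Assumption: splits are fixed random projections, unlike the adaptive
# partitioning of the published algorithm.
import numpy as np

class TreeOfPerceptrons:
    def __init__(self, dim, depth=3, eta=0.5, seed=0):
        rng = np.random.default_rng(seed)
        n_nodes = 2 ** (depth + 1) - 1                     # full binary tree, root = 0
        self.depth = depth
        self.splits = rng.standard_normal((n_nodes, dim))  # fixed split directions
        self.w = np.zeros((n_nodes, dim))                  # one linear model per region
        self.level_w = np.ones(depth + 1)                  # combiner weight per level
        self.eta = eta                                     # Hedge learning rate

    def _path(self, x):
        # Descend from the root; each node routes by the sign of a projection.
        node, path = 0, [0]
        for _ in range(self.depth):
            node = 2 * node + (1 if self.splits[node] @ x > 0 else 2)
            path.append(node)
        return path

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        votes = np.array([np.sign(self.w[n] @ x) or 1.0 for n in self._path(x)])
        return 1 if self.level_w @ votes > 0 else -1

    def learn(self, x, y):                                 # y in {-1, +1}
        x = np.asarray(x, dtype=float)
        for lvl, n in enumerate(self._path(x)):
            pred = np.sign(self.w[n] @ x) or 1.0
            if pred != y:
                self.w[n] += y * x                         # perceptron update
                self.level_w[lvl] *= np.exp(-self.eta)     # demote the erring level
```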

    Online non-stationary boosting

    No full text
    Oza’s Online Boosting algorithm provides a version of AdaBoost that can be trained online on stationary problems. One perspective is that this brings the power of the boosting framework to datasets too large to fit in memory. The online boosting algorithm assumes the data to be independent and identically distributed (i.i.d.) and therefore has no provision for concept drift. We present an algorithm called Online Non-Stationary Boosting (ONSBoost) that, like Online Boosting, uses a static ensemble size without generating new members each time new examples are presented, but that also adapts to a changing data distribution. We evaluate the new algorithm against Online Boosting using the STAGGER dataset and three challenging datasets derived from a learning problem inside a parallelising virtual machine. We find that the new algorithm provides equivalent performance on the STAGGER dataset and an improvement of up to 3% on the parallelisation datasets.
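
    The Poisson-based weighting at the heart of Oza's Online Boosting, which ONSBoost extends, is easy to sketch. In the hypothetical Python below, each incoming example carries a weight that grows for ensemble members that misclassify it and shrinks for those that classify it correctly; each member trains on the example Poisson(lambda) times, mimicking AdaBoost's reweighting by resampling. The perceptron base learners are an assumption for illustration, and ONSBoost's mechanism for replacing poorly performing members under drift is not reproduced here.

```python
# Sketch of the Poisson-based update at the core of Oza's Online Boosting,
# which ONSBoost extends. Perceptron base learners are an assumption for
# illustration; ONSBoost's drift-driven member replacement is not shown.
import numpy as np

class OzaBoost:
    def __init__(self, dim, n_models=10, seed=0):
        self.rng = np.random.default_rng(seed)
        self.w = np.zeros((n_models, dim))  # one perceptron per ensemble member
        self.lam_sc = np.zeros(n_models)    # cumulative weight classified correctly
        self.lam_sw = np.zeros(n_models)    # cumulative weight classified wrongly

    def _vote(self, m, x):
        return np.sign(self.w[m] @ x) or 1.0

    def learn(self, x, y):                  # y in {-1, +1}
        x = np.asarray(x, dtype=float)
        lam = 1.0                           # example weight, updated per member
        for m in range(len(self.w)):
            for _ in range(self.rng.poisson(lam)):  # reweighting by resampling
                if self._vote(m, x) != y:
                    self.w[m] += y * x      # perceptron update
            if self._vote(m, x) == y:
                self.lam_sc[m] += lam
                lam *= (self.lam_sc[m] + self.lam_sw[m]) / (2 * self.lam_sc[m])
            else:
                self.lam_sw[m] += lam
                lam *= (self.lam_sc[m] + self.lam_sw[m]) / (2 * self.lam_sw[m])

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        eps = 1e-9
        err = np.clip(self.lam_sw / np.maximum(self.lam_sc + self.lam_sw, eps),
                      eps, 1 - eps)
        alpha = np.log((1 - err) / err)     # AdaBoost-style member weights
        votes = np.array([self._vote(m, x) for m in range(len(self.w))])
        return 1 if alpha @ votes > 0 else -1
```

    A static-size extension in the spirit of ONSBoost would monitor each member's recent error and reinitialize the worst performer when the distribution shifts, rather than growing the ensemble.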