    An Online Tree-Based Approach for Mining Non-Stationary High-Speed Data Streams

     This paper presents a new learning algorithm for inducing decision trees from data streams. In such settings, large amounts of data arrive continuously over time, possibly at high speed. The proposed algorithm builds trees by top-down induction, recursively splitting leaf nodes until none can be expanded further, and combines two split methods during induction. The first method guarantees, with statistical significance, that each chosen split matches the one that would be chosen given infinitely many examples, so that the tree induced online stays close to the optimal model. However, this method often requires many examples before committing to a split, which delays accuracy improvements in the online predictive model. The second method therefore splits nodes more quickly, speeding up tree growth; it is motivated by the observation that larger trees can store more information about the training examples and represent more complex concepts. The first method is also used to correct splits previously suggested by the second, once sufficient evidence has accumulated, and an additional procedure rebuilds the tree according to the suggestions that reach an adequate level of statistical significance. The proposed algorithm is empirically compared with several well-known algorithms for learning decision trees from data streams; across various synthetic and real-world datasets, it proves competitive in both accuracy and model size.
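
    The two-method split decision described above can be illustrated with a minimal Python sketch. It assumes the statistically significant test is a Hoeffding-bound comparison of information gains (the standard device for this kind of infinite-sample guarantee in stream tree learners) and models the fast second method as simply splitting on the current best attribute after a fixed grace period. The names (`choose_split`, `tentative_after`), the toy data, and the thresholds are illustrative assumptions, not taken from the paper, and the correction/rebuild procedure is omitted:

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(examples, attr):
    """Information gain of splitting (features, label) pairs on `attr`."""
    labels = [y for _, y in examples]
    groups = {}
    for x, y in examples:
        groups.setdefault(x[attr], []).append(y)
    remainder = sum(len(g) / len(examples) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def hoeffding_bound(value_range, delta, n):
    """Margin within which the observed gain matches the true gain, w.p. 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def choose_split(examples, attrs, delta=1e-7, tentative_after=50):
    """Two-method split decision at a leaf (illustrative sketch).

    'confident': the gap between the two best gains exceeds the Hoeffding
    bound, so the winner matches the infinite-sample choice w.p. 1 - delta.
    'tentative': not yet statistically separable, but enough examples have
    arrived to justify a fast split that a later confident test may correct.
    """
    n = len(examples)
    if n == 0 or len(attrs) < 2:
        return None, "wait"
    gains = sorted(((info_gain(examples, a), a) for a in attrs), reverse=True)
    (g1, best), (g2, _) = gains[0], gains[1]
    eps = hoeffding_bound(1.0, delta, n)  # gain ranges over [0, 1] for binary labels
    if g1 - g2 > eps:
        return best, "confident"   # method 1: statistically significant split
    if n >= tentative_after:
        return best, "tentative"   # method 2: fast split, to be revisited
    return None, "wait"

# 'a' predicts the label while 'b' is noise: the gain gap soon beats the bound.
random.seed(0)
stream = [({"a": i % 2, "b": random.randint(0, 1)}, i % 2) for i in range(200)]
print(choose_split(stream, ["a", "b"]))   # -> ('a', 'confident')

# Two equally predictive attributes never separate by more than eps,
# so the fast method fires once `tentative_after` examples have arrived.
tied = [({"a": i % 2, "b": i % 2}, i % 2) for i in range(200)]
print(choose_split(tied, ["a", "b"]))     # gains tie -> (..., 'tentative')
```

    A real stream learner would maintain per-leaf sufficient statistics incrementally instead of storing examples, and would periodically rerun the confident test on tentative splits to trigger the correction and rebuild steps the abstract describes.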