
    Técnicas de mineração incrementais em recuperação de informação (Incremental mining techniques in information retrieval)

    [EN] A desirable property of learning algorithms is the ability to incorporate new data incrementally. Incremental algorithms have received attention in recent years, but comparatively little in the case of Bayesian networks, owing to the hardness of the task: a single example can change the whole structure of a Bayesian network. In this thesis we focus on incremental induction of the Tree Augmented Naive Bayes (TAN) algorithm. An incremental version of TAN saves computing time and is better suited to data mining and to concept drift. However, as is usual in Bayesian learning, TAN is restricted to discrete attributes. Complementary to the incremental TAN, we propose an incremental discretization algorithm, which is necessary to evaluate TAN in domains with continuous attributes. Although discretization is a fundamental pre-processing step for several well-known algorithms, incremental discretization has received little attention from the community. This thesis therefore makes two major contributions, both in incremental learning: one for TAN and one for discretization. We present and test an algorithm that rebuilds the network structure of Tree Augmented Naive Bayes (TAN) from the weighted sum of vectors containing the mutual information. We also present a new discretization method that works in two layers. This two-layer architecture is very flexible: it can be used in supervised or unsupervised mode, and any base discretization method can be used for the second layer (equal width, equal frequency, recursive entropy discretization, chi-merge, etc.). The most relevant aspect is that the boundaries of the second-layer intervals can change when new data become available. We tested the incremental approach to discretization experimentally with both batch and incremental learners. The experimental evaluation of incremental TAN shows performance similar to the batch version, and similar remarks apply to incremental discretization. This is a relevant result, because few works in machine learning address the fundamental issue of incremental discretization; we believe that with incremental discretization the evaluation of incremental algorithms can become more realistic and accurate. We evaluated two versions of incremental discretization, supervised and unsupervised, and observed that this feature can improve the accuracy of incremental learners and make the prediction of future algorithm performance more precise. The method has further advantages: it can be used with large data sets or in dynamic environments with concept drift, settings where batch discretization is difficult or inadequate.

    [ES] This thesis aimed to study an incremental Bayesian network classifier (TAN). In the course of the work, a gap was identified in the area of incremental discretization for the evaluation of incremental algorithms. The contribution to the field is therefore not only an incremental Bayesian classifier but also a correct way of evaluating it. Information Retrieval systems carry out the tasks of indexing, searching, and classifying documents (expressed in textual form) in order to satisfy an individual's information need, generally expressed through queries. This information need can be understood as the search for answers to particular questions that must be resolved, the retrieval of documents that deal with a given subject, or even the relationship between subjects.
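The abstract says the TAN structure is rebuilt from a weighted sum of vectors of mutual information, without giving details. The sketch below is one minimal, assumed reading of that idea in Python: class-conditional mutual information is estimated per attribute pair on each batch, blended into a stored vector with a fading factor alpha, and the augmenting tree is rebuilt Chow-Liu style as a maximum-weight spanning tree. All function names, the value of alpha, and the spanning-tree step are illustrative assumptions, not the thesis' exact procedure.

    import itertools
    import math
    from collections import Counter

    def cmi_vector(rows, labels, n_attrs):
        """Estimate conditional mutual information I(Xi; Xj | C) for every
        attribute pair on one batch of discrete data (hypothetical helper)."""
        n = len(rows)
        pc = Counter(labels)
        cmi = {}
        for i, j in itertools.combinations(range(n_attrs), 2):
            joint = Counter((r[i], r[j], c) for r, c in zip(rows, labels))
            pi = Counter((r[i], c) for r, c in zip(rows, labels))
            pj = Counter((r[j], c) for r, c in zip(rows, labels))
            value = 0.0
            for (xi, xj, c), n_ijc in joint.items():
                value += (n_ijc / n) * math.log(
                    n_ijc * pc[c] / (pi[(xi, c)] * pj[(xj, c)]))
            cmi[(i, j)] = value
        return cmi

    def update_cmi(stored, new, alpha=0.9):
        """Weighted sum of the stored vector and the vector from the new
        batch; alpha is an assumed fading factor."""
        keys = set(stored) | set(new)
        return {k: alpha * stored.get(k, 0.0) + (1 - alpha) * new.get(k, 0.0)
                for k in keys}

    def rebuild_tan_tree(cmi, n_attrs):
        """Rebuild the augmenting tree of TAN as a maximum-weight spanning
        tree over the attributes (Prim's algorithm), using the stored
        mutual-information vector as edge weights."""
        parents = {0: None}                      # attribute 0 as arbitrary root
        while len(parents) < n_attrs:
            best = None
            for i in parents:
                for j in range(n_attrs):
                    if j in parents:
                        continue
                    w = cmi.get((min(i, j), max(i, j)), 0.0)
                    if best is None or w > best[0]:
                        best = (w, i, j)
            _, parent, child = best
            parents[child] = parent
        return parents                           # child -> parent edges

    # Assumed usage on a stream of batches:
    # cmi = cmi_vector(first_rows, first_labels, n_attrs)
    # for rows, labels in stream:
    #     cmi = update_cmi(cmi, cmi_vector(rows, labels, n_attrs))
    #     tree = rebuild_tan_tree(cmi, n_attrs)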
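The two-layer discretization is likewise described only at a high level. The following sketch assumes one plausible arrangement: the first layer keeps a fine-grained histogram that is cheap to update example by example, and the second layer recomputes coarse cut points from those counts on demand, here with equal frequency as the base method (any of the base methods listed in the abstract could be plugged in instead). The class name, the bin counts, and the fixed value range are illustrative assumptions.

    class TwoLayerDiscretizer:
        """Sketch of a two-layer incremental discretizer: layer 1 stores
        fine-grained counts updated online; layer 2 derives the final
        intervals from those counts whenever they are requested, so its
        boundaries can change as new data arrive."""

        def __init__(self, lo, hi, n_layer1=200, n_layer2=5):
            self.lo, self.hi = lo, hi
            self.n1, self.n2 = n_layer1, n_layer2
            self.step = (hi - lo) / n_layer1
            self.counts = [0] * n_layer1          # layer-1 histogram

        def update(self, x):
            """Layer 1: route one new value into its fine-grained bin."""
            idx = int((x - self.lo) / self.step)
            self.counts[min(max(idx, 0), self.n1 - 1)] += 1

        def boundaries(self):
            """Layer 2: equal-frequency cut points computed from the layer-1
            counts; only the counts, never the raw data, are revisited."""
            total = sum(self.counts)
            if total == 0:
                return []
            per_interval = total / self.n2
            cuts, acc = [], 0
            for i, c in enumerate(self.counts):
                acc += c
                if acc >= per_interval and len(cuts) < self.n2 - 1:
                    cuts.append(self.lo + (i + 1) * self.step)
                    acc = 0
            return cuts

    # Assumed usage on a stream of continuous values:
    # disc = TwoLayerDiscretizer(lo=0.0, hi=1.0, n_layer2=3)
    # for x in value_stream:
    #     disc.update(x)
    # print(disc.boundaries())   # cut points reflect all data seen so far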