Attribute scoring based on performance of a learning algorithm on samples of attribute space

Abstract

In machine learning and knowledge discovery in databases, attributes (features) play a central role, so it is reasonable to question their quality and importance for a given problem. Because this is in general a difficult problem, the thesis focuses on developing a new method for estimating attribute importance. The method samples the attribute space, evaluates the performance of machine learning algorithms on the samples, and infers the importance of individual attributes from the resulting scores. More specifically, different combinations of attributes are first chosen and smaller data sets containing only those attributes are prepared. On each of these data sets, a testing procedure with sampling estimates the performance of an arbitrarily chosen learning algorithm. The performance estimates are then statistically processed for each attribute according to whether the attribute was present, and combined by a given formula into a final score for each attribute. To determine how well different variants of the new method work, an appropriate experimental methodology and many diverse data sets were prepared. Some of the successful variants were also tested in more detail to reinforce the conclusion that certain variants of the new method are statistically significantly better than the conventional, widely used methods for this problem. Unfortunately, an improved version of the best of those conventional methods still appears to perform better. The thesis concludes with a discussion of the results and various ideas for further work, improvements and applications of the method.
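The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation from the thesis: the abstract does not give the scoring formula, so the sketch assumes the simplest plausible one, the mean performance over sampled subsets that contain an attribute minus the mean over subsets that do not. The nearest-centroid learner, the repeated 70/30 hold-out, the inclusion probability of 0.5, the synthetic data, and all function and parameter names (`score_attributes`, `nearest_centroid_accuracy`, `make_data`, `n_subsets`, `n_splits`) are likewise assumptions made for illustration.

```python
import random
from statistics import mean


def nearest_centroid_accuracy(train, test, attrs):
    """Hold-out accuracy of a nearest-centroid classifier that sees only `attrs`.

    Stand-in for the "arbitrarily chosen learning algorithm" (an assumption).
    """
    sums, counts = {}, {}
    for x, y in train:
        total = sums.setdefault(y, [0.0] * len(attrs))
        for j, a in enumerate(attrs):
            total[j] += x[a]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        # Class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((x[a] - c) ** 2 for a, c in zip(attrs, centroids[y])),
        )

    return mean(1.0 if predict(x) == y else 0.0 for x, y in test)


def score_attributes(data, n_attrs, evaluate=nearest_centroid_accuracy,
                     n_subsets=100, n_splits=3, seed=0):
    """Score attributes from learner performance on sampled attribute subsets.

    Assumed formula: mean performance with the attribute present
    minus mean performance with it absent.
    """
    rng = random.Random(seed)
    present = [[] for _ in range(n_attrs)]
    absent = [[] for _ in range(n_attrs)]
    for _ in range(n_subsets):
        # Sample a combination of attributes (each kept with probability 0.5).
        subset = [a for a in range(n_attrs) if rng.random() < 0.5]
        if not subset:
            subset = [rng.randrange(n_attrs)]
        # Testing procedure with sampling: repeated random 70/30 hold-out.
        accuracies = []
        for _ in range(n_splits):
            shuffled = data[:]
            rng.shuffle(shuffled)
            cut = int(0.7 * len(shuffled))
            accuracies.append(evaluate(shuffled[:cut], shuffled[cut:], subset))
        perf = mean(accuracies)
        # File the estimate under "present" or "absent" for every attribute.
        for a in range(n_attrs):
            (present if a in subset else absent)[a].append(perf)
    return [
        mean(present[a]) - mean(absent[a]) if present[a] and absent[a] else 0.0
        for a in range(n_attrs)
    ]


def make_data(n=200, seed=1):
    """Synthetic two-class data: attribute 0 is informative, 1 and 2 are noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randrange(2)
        x = [2.0 * y + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)]
        data.append((x, y))
    return data


if __name__ == "__main__":
    for index, score in enumerate(score_attributes(make_data(), n_attrs=3)):
        print(f"attribute {index}: {score:+.3f}")
```

On this synthetic data the informative attribute should receive a clearly higher score than the two noise attributes. Passing the learner in as the `evaluate` argument keeps the scoring loop independent of any particular algorithm, which mirrors the abstract's point that the learning algorithm can be chosen freely.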

This paper was published in ePrints.FRI.
