3 research outputs found

    Unsupervised feature construction for improving data representation and semantics

    The attribute-based format is the main data representation used by machine learning algorithms. When the attributes do not properly describe the initial data, performance starts to degrade. Some algorithms address this problem by internally changing the representation space, but the newly constructed features rarely have any meaning. We seek to construct, in an unsupervised way, new attributes that are more appropriate for describing a given dataset and, at the same time, comprehensible to a human user. We propose two algorithms that construct the new attributes as conjunctions of the initial primitive attributes or their negations. The generated feature sets have reduced correlations between features and capture some of the hidden relations between individuals in a dataset. For example, a feature like sky ∧ ¬building ∧ panorama would be true for non-urban images and is more informative than simple features expressing the presence or absence of an object. The notion of Pareto optimality is used to evaluate feature sets and to obtain a balance between total correlation and the complexity of the resulting feature set. Statistical hypothesis testing is employed to automatically determine the values of the parameters used for constructing a data-dependent feature set. We experimentally show that our approaches construct informative feature sets for multiple datasets. © 2013 Springer Science+Business Media New York
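    The abstract's core idea, building new boolean features as conjunctions of primitive attributes or their negations (e.g. sky ∧ ¬building ∧ panorama), can be sketched as follows. This is an illustrative toy, not the paper's actual algorithms; the function names and the enumeration strategy are assumptions, and the Pareto-based evaluation is omitted.

    ```python
    # Illustrative sketch (not the paper's algorithm): enumerate candidate
    # features as conjunctions of primitive attributes or their negations,
    # then evaluate one such feature on a toy dataset.
    from itertools import combinations, product

    def conjunction_features(primitives, max_len=3):
        """Enumerate conjunctions of 2..max_len literals, where each literal
        is a primitive attribute taken positively (True) or negated (False)."""
        feats = []
        for size in range(2, max_len + 1):
            for attrs in combinations(primitives, size):
                for signs in product((True, False), repeat=size):
                    feats.append(tuple(zip(attrs, signs)))
        return feats

    def evaluate(feature, row):
        """row: dict mapping primitive attribute name -> bool."""
        return all(row[attr] == sign for attr, sign in feature)

    # Toy dataset: presence/absence of objects in two images.
    data = [
        {"sky": True, "building": False, "panorama": True},   # non-urban
        {"sky": True, "building": True, "panorama": False},   # urban
    ]
    feats = conjunction_features(["sky", "building", "panorama"])
    # The abstract's example feature: sky AND NOT building AND panorama.
    target = (("sky", True), ("building", False), ("panorama", True))
    print([evaluate(target, row) for row in data])  # -> [True, False]
    ```

    The target feature separates the non-urban image from the urban one with a single test, which is the sense in which such conjunctions are more informative than individual presence/absence attributes.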

    Adding Domain Knowledge to SBL through Feature Construction

    No full text
    This paper presents two methods for adding domain knowledge to similarity-based learning through feature construction, a form of representation change in which new features are constructed from relationships detected among existing features. In the first method, domain-knowledge constraints are used to eliminate less desirable new features before they are constructed. In the second method, domain-dependent transformations generalize new features in ways meaningful to the current problem. These two uses of domain knowledge are illustrated in CITRE, where they are shown to improve hypothesis accuracy and conciseness on a tic-tac-toe classification problem. Introduction: One advantage of explanation-based learning (EBL) is its ability to learn from few examples by exploiting domain-specific constraints represented in a domain theory. Similarity-based learning (SBL), on the other hand, requires relatively large numbers of training instances, but is more readily applicable because a domain ..
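    The paper's first method, using domain-knowledge constraints to eliminate candidate features before they are constructed, can be sketched as below. This is a minimal illustration under assumed names, not CITRE's actual implementation; the specific constraint shown is hypothetical.

    ```python
    # Illustrative sketch (not CITRE's implementation): prune candidate
    # pairwise conjunctions using a domain-knowledge constraint before
    # the features are ever built.
    from itertools import combinations

    def candidate_pairs(attributes):
        """All pairwise conjunction candidates over the primitives."""
        return list(combinations(attributes, 2))

    def prune(pairs, forbidden):
        """Drop candidates the domain constraint rules out."""
        return [p for p in pairs if frozenset(p) not in forbidden]

    # Tic-tac-toe-style primitives: board squares (names hypothetical).
    attrs = ["top_left", "top_mid", "top_right", "mid_left"]
    # Hypothetical domain constraint: this particular pair of squares
    # should never be combined into one feature.
    forbidden = {frozenset(("top_left", "top_right"))}
    pairs = prune(candidate_pairs(attrs), forbidden)
    print(len(pairs))  # -> 5 (6 candidates, 1 eliminated)
    ```

    The point of pruning before construction, rather than after, is that disallowed combinations never incur construction or evaluation cost.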