6 research outputs found

    Redundancy Is Not Necessarily Detrimental in Classification Problems

    Get PDF
    In feature selection, redundancy is one of the major concerns since the removal of redun dancy in data is connected with dimensionality reduction. Despite the evidence of such a connection, few works present theoretical studies regarding redundancy. In this work, we analyze the effect of redundant features on the performance of classification models. We can summarize the contribution of this work as follows: (i) develop a theoretical framework to analyze feature construction and selection, (ii) show that certain properly defined features are redundant but make the data linearly separable, and (iii) propose a formal criterion to validate feature construction methods. The results of experiments suggest that a large number of redundant features can reduce the classification error. The results imply that it is not enough to analyze features solely using criteria that measure the amount of information provided by such features.CONACYT - Consejo Nacional de Ciencia y TecnologĂ­aPROCIENCI

    Interactive feature space construction using semantic information

    No full text
    Specifying an appropriate feature space is an important aspect of achieving good performance when designing systems based upon learned classifiers. Effectively incorporating information regarding semantically related words into the feature space is known to produce robust, accurate classifiers and is one apparent motivation for efforts to automatically generate such resources. However, naive incorporation of this semantic information may result in poor performance due to increased ambiguity. To overcome this limitation, we introduce the interactive feature space construction protocol, where the learner identifies inadequate regions of the feature space and in coordination with a domain expert adds descriptiveness through existing semantic resources. We demonstrate effectiveness on an entity and relation extraction system including both performance improvements and robustness to reductions in annotated data.
    corecore