Redundancy Is Not Necessarily Detrimental in Classification Problems

Author: Deysi Natalia Leguizamon Correa
Diego P. Pinto-Roa
Francisco Gómez-Vela
Jacques Facon
José Luis Vázquez Noguera
Julio César Mello Román
Laura Raquel Bareiro Paniagua
Luis Salgueiro Romero
Miguel García-Torres
Sebastián Alberto Grillo
Publication date: 01/01/2021

In feature selection, redundancy is one of the major concerns since the removal of redundancy in data is connected with dimensionality reduction. Despite the evidence of such a connection, few works present theoretical studies regarding redundancy. In this work, we analyze the effect of redundant features on the performance of classification models. We can summarize the contribution of this work as follows: (i) develop a theoretical framework to analyze feature construction and selection, (ii) show that certain properly defined features are redundant but make the data linearly separable, and (iii) propose a formal criterion to validate feature construction methods. The results of experiments suggest that a large number of redundant features can reduce the classification error. The results imply that it is not enough to analyze features solely using criteria that measure the amount of information provided by such features.

CONACYT - Consejo Nacional de Ciencia y Tecnología, PROCIENCI
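A toy example may help make claim (ii) of the abstract above concrete. The sketch below assumes that "redundant" means a feature computed deterministically from existing features; the paper's formal definition may differ. The XOR data, the product feature, and the helper names `predict` and `train_perceptron` are illustrative choices, not taken from the paper.

```python
# Illustrative sketch only: the XOR data and the product feature are
# assumptions chosen for this example, not the paper's datasets or method.

def predict(weights, bias, x):
    """Linear threshold unit: 1 if w.x + b > 0, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(samples, labels, epochs=100):
    """Run the classic perceptron rule; return the final training error count."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            delta = y - predict(weights, bias, x)
            if delta != 0:
                mistakes += 1
                weights = [w + delta * xi for w, xi in zip(weights, x)]
                bias += delta
        if mistakes == 0:  # every point classified correctly: converged
            break
    return sum(predict(weights, bias, x) != y for x, y in zip(samples, labels))

# XOR is not linearly separable in its two original features.
raw = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

# x1 * x2 adds no new information (it is a function of x1 and x2),
# yet XOR = x1 + x2 - 2 * x1 * x2 is linear in the augmented space.
augmented = [(x1, x2, x1 * x2) for x1, x2 in raw]

errors_raw = train_perceptron(raw, labels)
errors_augmented = train_perceptron(augmented, labels)
print(errors_raw, errors_augmented)
```

On the raw features, no linear classifier can label all four points correctly, so at least one training error remains. With the constructed feature the perceptron reaches zero training errors.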

Repositorio Institucional CONACYT

Directory of Open Access Journals

End-user feature labeling: Supervised and semi-supervised approaches based on locally-weighted logistic regression

Author: Ian Oberst
Kevin McIntosh
Margaret Burnett
Shubhomoy Das
Simone Stumpf
Travis Moore
Weng-Keen Wong
Publication venue: 'Elsevier BV'
Publication date: 01/11/2013

When intelligent interfaces, such as intelligent desktop assistants, email classifiers, and recommender systems, customize themselves to a particular end user, such customizations can decrease productivity and increase frustration due to inaccurate predictions, especially in early stages when training data is limited. The end user can improve the learning algorithm by tediously labeling a substantial amount of additional training data, but this takes time and is too ad hoc to target a particular area of inaccuracy. To solve this problem, we propose new supervised and semi-supervised learning algorithms based on locally weighted logistic regression for feature labeling by end users, enabling them to point out which features are important for a class, rather than provide new training instances. We first evaluate our algorithms against other feature labeling algorithms under idealized conditions using feature labels generated by an oracle. In addition, another of our contributions is an evaluation of feature labeling algorithms under real world conditions using feature labels harvested from actual end users in our user study. Our user study is the first statistical user study for feature labeling involving a large number of end users (43 participants), all of whom have no background in machine learning. Our supervised and semi-supervised algorithms were among the best performers when compared to other feature labeling algorithms in the idealized setting and they are also robust to poor quality feature labels provided by ordinary end users in our study. We also perform an analysis to investigate the relative gains of incorporating the different sources of knowledge available in the labeled training set, the feature labels and the unlabeled data. Together, our results strongly suggest that feature labeling by end users is both viable and effective for allowing end users to improve the learning algorithm behind their customized applications.
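For readers unfamiliar with the base technique named in the abstract above, here is a minimal sketch of plain locally weighted logistic regression in one dimension. It does not implement the paper's feature-labeling algorithms. The Gaussian kernel, the gradient-ascent fit, the function name `lwlr_predict`, and all parameter values are assumptions made for illustration.

```python
import math

def lwlr_predict(xs, ys, query, bandwidth=1.0, steps=500, lr=0.1):
    """Fit a logistic model weighted toward `query`; return P(y=1 | query).

    Hypothetical helper: each training point gets a Gaussian kernel weight
    based on its distance to the query, so nearby points dominate the fit.
    """
    kernel = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        grad_slope = grad_intercept = 0.0
        for x, y, k in zip(xs, ys, kernel):
            p = 1.0 / (1.0 + math.exp(-(slope * x + intercept)))
            # Gradient of the kernel-weighted log-likelihood.
            grad_slope += k * (y - p) * x
            grad_intercept += k * (y - p)
        slope += lr * grad_slope
        intercept += lr * grad_intercept
    return 1.0 / (1.0 + math.exp(-(slope * query + intercept)))

# Made-up 1-D data: label 1 for positive x, 0 otherwise.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [0, 0, 0, 0, 1, 1, 1]

p_right = lwlr_predict(xs, ys, query=2.0)
p_left = lwlr_predict(xs, ys, query=-2.0)
print(p_right > 0.5, p_left < 0.5)
```

A separate model is fitted for every query point, which is what makes the method "local". The bandwidth controls how quickly the influence of distant training points falls off.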

End-user feature engineering in the presence of class imbalance

Author: Burnett Margaret
Kulesza Todd
Moore Travis
Oberst Ian
Riche Yann
Stumpf Simone
Wong Weng-Keen

Intelligent user interfaces, such as recommender systems and email classifiers, use machine learning algorithms to customize their behavior to the preferences of an end user. Although these learning systems are somewhat reliable, they are not perfectly accurate. Traditionally, end users who need to correct these learning systems can only provide more labeled training data. In this paper, we focus on incorporating new features suggested by the end user into machine learning systems. To investigate the effects of user-generated features on accuracy, we developed an auto-coding application that enables end users to assist a machine-learned program in coding a transcript by adding custom features. Our results show that adding user-generated features to the machine learning algorithm can result in modest improvements to its F1 score. Further improvements are possible if the algorithm accounts for class imbalance in the training data and deals with low-quality user-generated features that add noise to the learning algorithm. We show that addressing class imbalance improves performance to an extent but improving the quality of features brings about the most beneficial change. Finally, we discuss changes to the user interface that can help end users avoid the creation of low-quality features.

Keywords: Feature Engineering, Class Imbalance, machine learning, artificial intelligence, end-user programming, HC
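The abstract above reports results as F1 scores under class imbalance. The sketch below shows why plain accuracy can mislead in that setting. The label vectors are made up for illustration and are not data from the paper; `f1_score` and `accuracy` are small hand-written helpers, not a library API.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95

# A degenerate classifier that never predicts the minority class.
always_negative = [0] * 100

# A classifier that finds 3 of the 5 positives with no false alarms.
finds_three = [1, 1, 1, 0, 0] + [0] * 95

print(accuracy(y_true, always_negative), f1_score(y_true, always_negative))
print(accuracy(y_true, finds_three), f1_score(y_true, finds_three))
```

The degenerate classifier scores 95% accuracy but an F1 of zero, while the second classifier's F1 reflects that it recovers part of the minority class.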

ScholarsArchive@OSU

End-User Feature Labeling: Supervised and Semi-supervised Approaches Based on Locally-Weighted Logistic Regression

Author: Burnett Margaret
Das Shubhomoy
McIntosh Kevin
Moore Travis
Oberst Ian
Stumpf Simone
Wong Weng-Keen
Publication venue: 'Elsevier BV'

When intelligent interfaces, such as intelligent desktop assistants, email classifiers, and recommender systems, customize themselves to a particular end user, such customizations can decrease productivity and increase frustration due to inaccurate predictions, especially in early stages when training data is limited. The end user can improve the learning algorithm by tediously labeling a substantial amount of additional training data, but this takes time and is too ad hoc to target a particular area of inaccuracy. To solve this problem, we propose new supervised and semi-supervised learning algorithms based on locally weighted logistic regression for feature labeling by end users, enabling them to point out which features are important for a class, rather than provide new training instances. We first evaluate our algorithms against other feature labeling algorithms under idealized conditions using feature labels generated by an oracle. In addition, another of our contributions is an evaluation of feature labeling algorithms under real world conditions using feature labels harvested from actual end users in our user study. Our user study is the first statistical user study for feature labeling involving a large number of end users (43 participants), all of whom have no background in machine learning. Our supervised and semi-supervised algorithms were among the best performers when compared to other feature labeling algorithms in the idealized setting and they are also robust to poor quality feature labels provided by ordinary end users in our study. We also perform an analysis to investigate the relative gains of incorporating the different sources of knowledge available in the labeled training set, the feature labels and the unlabeled data. Together, our results strongly suggest that feature labeling by end users is both viable and effective for allowing end users to improve the learning algorithm behind their customized applications.

Keywords: Locally weighted logistic regression, Semi-supervised learning, Feature labeling, Machine learning, Intelligent interfaces

ScholarsArchive@OSU

Interactive feature space construction using semantic information

Author: Dan Roth
Kevin Small
Publication date: 01/01/2009

Specifying an appropriate feature space is an important aspect of achieving good performance when designing systems based upon learned classifiers. Effectively incorporating information regarding semantically related words into the feature space is known to produce robust, accurate classifiers and is one apparent motivation for efforts to automatically generate such resources. However, naive incorporation of this semantic information may result in poor performance due to increased ambiguity. To overcome this limitation, we introduce the interactive feature space construction protocol, where the learner identifies inadequate regions of the feature space and in coordination with a domain expert adds descriptiveness through existing semantic resources. We demonstrate effectiveness on an entity and relation extraction system including both performance improvements and robustness to reductions in annotated data.

CiteSeerX

Crossref