34,680 research outputs found

End-user feature labeling: Supervised and semi-supervised approaches based on locally-weighted logistic regression

Author: Ian Oberst
Kevin McIntosh
Margaret Burnett
Shubhomoy Das
Simone Stumpf
Travis Moore
Weng-Keen Wong
Publication venue: Elsevier BV
Publication date: 01/11/2013

When intelligent interfaces, such as intelligent desktop assistants, email classifiers, and recommender systems, customize themselves to a particular end user, such customizations can decrease productivity and increase frustration due to inaccurate predictions, especially in early stages when training data is limited. The end user can improve the learning algorithm by tediously labeling a substantial amount of additional training data, but this takes time and is too ad hoc to target a particular area of inaccuracy. To solve this problem, we propose new supervised and semi-supervised learning algorithms based on locally weighted logistic regression for feature labeling by end users, enabling them to point out which features are important for a class rather than provide new training instances. We first evaluate our algorithms against other feature labeling algorithms under idealized conditions using feature labels generated by an oracle. Another contribution is an evaluation of feature labeling algorithms under real-world conditions using feature labels harvested from actual end users in our user study. Our user study is the first statistical user study for feature labeling involving a large number of end users (43 participants), none of whom had a background in machine learning. Our supervised and semi-supervised algorithms were among the best performers when compared to other feature labeling algorithms in the idealized setting, and they are also robust to poor-quality feature labels provided by ordinary end users in our study. We also analyze the relative gains of incorporating the different sources of knowledge available: the labeled training set, the feature labels, and the unlabeled data. Together, our results strongly suggest that feature labeling by end users is both viable and effective for allowing end users to improve the learning algorithm behind their customized applications.
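To make locally weighted logistic regression concrete, here is a minimal NumPy sketch. For each query it fits a separate logistic regression in which every training instance is weighted by a Gaussian kernel on its distance to the query. How end-user feature labels enter is an illustrative assumption, not necessarily the authors' formulation: a hypothetical `feature_weights` vector scales each feature's contribution to that distance, so a feature the user labeled as important counts more. `bandwidth`, `l2`, `iters`, and `lr` are likewise hypothetical parameters.

```python
import numpy as np


def kernel_weights(X, x_query, feature_weights, bandwidth):
    """Gaussian kernel weight of each training row relative to the query.

    feature_weights scales each feature's squared difference; giving
    user-labeled features a larger weight is a hypothetical scheme.
    """
    d2 = np.sum(feature_weights * (X - x_query) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def lwlr_predict(X, y, x_query, feature_weights, bandwidth=1.0,
                 l2=1e-3, iters=500, lr=0.5):
    """Return P(y=1 | x_query) from a logistic regression fit locally around x_query."""
    w = kernel_weights(X, x_query, feature_weights, bandwidth)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column
    theta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(Xb @ theta)))
        # Gradient of the kernel-weighted, L2-regularized log-loss.
        grad = Xb.T @ (w * (p - y)) / w.sum() + l2 * theta
        theta -= lr * grad
    xq = np.concatenate(([1.0], x_query))
    return 1.0 / (1.0 + np.exp(-(xq @ theta)))


# Toy data: feature 0 determines the class, feature 1 is noise.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
# Suppose the user labeled feature 0 as important (hypothetical weighting).
fw = np.array([3.0, 1.0])
p_pos = lwlr_predict(X, y, np.array([1.0, 0.5]), fw)
p_neg = lwlr_predict(X, y, np.array([0.0, 0.5]), fw)
```

Larger entries in `feature_weights` shrink the influence of training instances that differ from the query on a user-labeled feature, which is one simple way a feature label can reshape the local fit without adding new training instances.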

On semi-supervised estimation using exponential tilt mixture models

Author: Tan Zhiqiang
Tian Ye
Zhang Xinwei
Publication date: 14/11/2023

Consider a semi-supervised setting with a labeled dataset of binary responses and predictors and an unlabeled dataset with only the predictors. Logistic regression is equivalent to an exponential tilt model in the labeled population. For semi-supervised estimation, we develop further analysis and understanding of a statistical approach using exponential tilt mixture (ETM) models and maximum nonparametric likelihood estimation, while allowing the class proportions to differ between the unlabeled and labeled data. We derive asymptotic properties of ETM-based estimation and demonstrate improved efficiency over supervised logistic regression in a random sampling setup and in an outcome-stratified sampling setup used in previous work. Moreover, we reconcile this efficiency improvement with existing semiparametric efficiency theory when the class proportions in the unlabeled and labeled data are restricted to be the same. We also provide a simulation study to numerically illustrate our theoretical findings.

arXiv.org e-Print Archive
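The equivalence between logistic regression and an exponential tilt model, stated in the abstract above, can be checked numerically. The example below uses a standard textbook instance chosen for illustration (it is an assumption, not taken from the paper): two Gaussian class-conditional densities with a shared variance. Their density ratio f1(x)/f0(x) is exactly exp(alpha + beta*x), and Bayes' rule then gives P(y=1 | x) = sigmoid(log(pi1/(1 - pi1)) + alpha + beta*x). The sketch compares the two computations for several class proportions `pi1`.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def tilt_params(mu0, mu1, sigma):
    """alpha, beta such that f1(x) = f0(x) * exp(alpha + beta * x) for equal-variance Gaussians."""
    beta = (mu1 - mu0) / sigma ** 2
    alpha = -(mu1 ** 2 - mu0 ** 2) / (2.0 * sigma ** 2)
    return alpha, beta


def posterior_bayes(x, pi1, mu0, mu1, sigma):
    """P(y=1 | x) computed directly from Bayes' rule."""
    f0 = normal_pdf(x, mu0, sigma)
    f1 = normal_pdf(x, mu1, sigma)
    return pi1 * f1 / (pi1 * f1 + (1.0 - pi1) * f0)


def posterior_logistic(x, pi1, alpha, beta):
    """P(y=1 | x) as a logistic model: intercept log(pi1/(1-pi1)) + alpha, slope beta."""
    logit = math.log(pi1 / (1.0 - pi1)) + alpha + beta * x
    return 1.0 / (1.0 + math.exp(-logit))


mu0, mu1, sigma = 0.0, 1.5, 1.2
alpha, beta = tilt_params(mu0, mu1, sigma)
for pi1 in (0.2, 0.5, 0.8):  # different class proportions, same tilt (alpha, beta)
    for x in (-2.0, 0.0, 0.7, 3.0):
        assert abs(posterior_bayes(x, pi1, mu0, mu1, sigma)
                   - posterior_logistic(x, pi1, alpha, beta)) < 1e-9
```

Note that changing `pi1` shifts only the logistic intercept; the tilt parameter `beta` (the logistic slope) stays the same, which is what lets the class proportion be treated as a separate quantity from the tilt.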

Adapting to Change: Robust Counterfactual Explanations in Dynamic Data Landscapes

Author: Kasneci Gjergji
Leemann Tobias
Prenkaj Bardh
Villaizan-Vallelado Mario
Publication date: 04/08/2023

We introduce a novel semi-supervised Graph Counterfactual Explainer (GCE) methodology, Dynamic GRAph Counterfactual Explainer (DyGRACE). It leverages initial knowledge about the data distribution to search for valid counterfactuals while avoiding the use of information from potentially outdated decision functions in subsequent time steps. Employing two graph autoencoders (GAEs), DyGRACE learns the representation of each class in a binary classification scenario. The GAEs minimise the reconstruction error between the original graph and its learned representation during training. The method involves (i) optimising a parametric density function (implemented as a logistic regression function) to identify counterfactuals by maximising the factual autoencoder's reconstruction error, (ii) minimising the counterfactual autoencoder's error, and (iii) maximising the similarity between the factual and counterfactual graphs. This semi-supervised approach is independent of an underlying black-box oracle. A logistic regression model is trained on a set of graph pairs to learn weights that aid in finding counterfactuals. At inference, for each unseen graph, the logistic regressor identifies the best counterfactual candidate using these learned weights, while the GAEs can be iteratively updated to represent the continual adaptation of the learned graph representation over iterations. DyGRACE is effective and can also act as a drift detector, identifying distributional drift based on differences in reconstruction errors between iterations. It avoids relying on the oracle's predictions in successive iterations, thereby increasing the efficiency of counterfactual discovery. With its capacity for contrastive learning and drift detection, DyGRACE offers new avenues for semi-supervised learning and explanation generation.

arXiv.org e-Print Archive
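The candidate-selection and drift-detection steps described in the DyGRACE abstract above can be illustrated with a small NumPy sketch. It does not train graph autoencoders. The factual and counterfactual reconstruction errors are passed in as precomputed numbers, and graphs are binary adjacency matrices. The similarity measure (fraction of matching adjacency entries), the hand-set logistic weights, and the mean-error drift threshold are all hypothetical stand-ins, not DyGRACE's actual components. The signs of the weights follow the three objectives listed in the abstract: reward a high factual-autoencoder error, penalize a high counterfactual-autoencoder error, and reward similarity to the factual graph.

```python
import numpy as np


def graph_similarity(a, b):
    """Hypothetical similarity: fraction of matching entries in two adjacency matrices."""
    return float(np.mean(a == b))


def candidate_features(factual_adj, cand_adj, err_factual_ae, err_cf_ae):
    """The three signals named in the abstract, as a feature vector for the logistic scorer."""
    return np.array([err_factual_ae, err_cf_ae, graph_similarity(factual_adj, cand_adj)])


def best_counterfactual(factual_adj, candidates, weights, bias=0.0):
    """Score each (adjacency, factual-AE error, counterfactual-AE error) candidate
    with a logistic function and return the index of the highest-scoring one."""
    scores = []
    for adj, e_f, e_cf in candidates:
        z = weights @ candidate_features(factual_adj, adj, e_f, e_cf) + bias
        scores.append(1.0 / (1.0 + np.exp(-z)))
    return int(np.argmax(scores)), scores


def drift_detected(prev_errors, curr_errors, threshold):
    """Flag drift when the mean reconstruction error shifts by more than threshold."""
    return bool(abs(np.mean(curr_errors) - np.mean(prev_errors)) > threshold)


factual = np.zeros((3, 3), dtype=int)
cand_a = factual.copy()                  # identical to the factual graph
cand_b = factual.copy()
cand_b[0, 1] = cand_b[1, 0] = 1          # one undirected edge added
candidates = [
    (cand_a, 0.1, 0.9),  # still well reconstructed by the factual-class GAE
    (cand_b, 0.8, 0.1),  # well reconstructed by the counterfactual-class GAE
]
weights = np.array([1.0, -1.0, 2.0])     # hypothetical; DyGRACE learns these from graph pairs
best, scores = best_counterfactual(factual, candidates, weights)
```

Here the second candidate wins despite being slightly less similar, because its reconstruction errors indicate it looks like the other class. The drift check compares mean errors between two iterations; a real detector would likely need a more careful statistic than a fixed threshold.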

Semi-Supervised Factored Logistic Regression for High-Dimensional Neuroimaging Data

Author: Bzdok Danilo
Eickenberg Michael
Grisel Olivier
Thirion Bertrand
Varoquaux Gaël
Publication venue: HAL CCSD
Publication date: 07/12/2015

Imaging neuroscience links human behavior to aspects of brain biology in ever-increasing datasets. Existing neuroimaging methods typically perform either discovery of unknown neural structure or testing of neural structure associated with mental tasks. However, testing hypotheses on the neural correlates underlying larger sets of mental tasks necessitates adequate representations for the observations. We therefore propose to blend representation modelling and task classification into a unified statistical learning problem. A multinomial logistic regression is introduced that is constrained by factored coefficients and coupled with an autoencoder. We show that this approach yields more accurate and interpretable neural models of psychological tasks in a reference dataset, as well as better generalization to other datasets.

INRIA a CCSD electronic archive server

HAL-CEA
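The factored multinomial logistic regression in the neuroimaging abstract above can be sketched as follows. The coefficient matrix is written as the product of a class matrix U (classes by k) and a shared projection V (k by voxels), so the effective coefficients have rank at most k, and the same projection V also drives a linear autoencoder. Several details are assumptions for illustration, not the authors' model: tied encoder/decoder weights, a hypothetical mixing weight `lam` between the classification and reconstruction terms, computing the reconstruction term on labeled and unlabeled rows together (one plausible reading of "semi-supervised"), and all dimensions. The sketch only evaluates the combined loss; it does not reproduce the training procedure.

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def factored_lr_loss(X_lab, y, X_unlab, V, U, b, lam=0.5):
    """Combined loss of a factored multinomial logistic regression and a linear autoencoder.

    X_lab: (n, d) labeled samples; y: (n,) integer labels in [0, c).
    X_unlab: (m, d) unlabeled samples (used only by the reconstruction term).
    V: (k, d) shared projection; U: (c, k) class weights; b: (c,) biases.
    lam: hypothetical trade-off between the two terms.
    """
    n = X_lab.shape[0]
    probs = softmax(X_lab @ V.T @ U.T + b)              # (n, c) class probabilities
    ce = -np.mean(np.log(probs[np.arange(n), y] + 1e-12))
    X_all = np.vstack([X_lab, X_unlab])
    X_hat = (X_all @ V.T) @ V                            # tied-weight reconstruction (assumption)
    recon = np.mean((X_all - X_hat) ** 2)
    return lam * ce + (1.0 - lam) * recon, probs


rng = np.random.default_rng(0)
n, m, d, k, c = 20, 30, 50, 2, 4                        # k < c: low-rank coefficients
X_lab = rng.standard_normal((n, d))
X_unlab = rng.standard_normal((m, d))
y = rng.integers(0, c, size=n)
V = 0.1 * rng.standard_normal((k, d))
U = rng.standard_normal((c, k))
b = np.zeros(c)
loss, probs = factored_lr_loss(X_lab, y, X_unlab, V, U, b)
W = U @ V                                                # effective (c, d) coefficient matrix
```

Because every class's coefficient vector is a combination of the same k rows of V, the classifier is forced to share a small set of spatial components across tasks, and the reconstruction term pushes those components to also describe the data itself.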