34,680 research outputs found
End-user feature labeling: Supervised and semi-supervised approaches based on locally-weighted logistic regression
When intelligent interfaces, such as intelligent desktop assistants, email classifiers, and recommender systems, customize themselves to a particular end user, such customizations can decrease productivity and increase frustration due to inaccurate predictions, especially in early stages when training data is limited. The end user can improve the learning algorithm by tediously labeling a substantial amount of additional training data, but this takes time and is too ad hoc to target a particular area of inaccuracy. To solve this problem, we propose new supervised and semi-supervised learning algorithms based on locally weighted logistic regression for feature labeling by end users, enabling them to point out which features are important for a class, rather than provide new training instances.
We first evaluate our algorithms against other feature labeling algorithms under idealized conditions using feature labels generated by an oracle. In addition, another of our contributions is an evaluation of feature labeling algorithms under real world conditions using feature labels harvested from actual end users in our user study. Our user study is the first statistical user study for feature labeling involving a large number of end users (43 participants), all of whom have no background in machine learning.
Our supervised and semi-supervised algorithms were among the best performers when compared to other feature labeling algorithms in the idealized setting, and they are also robust to poor quality feature labels provided by ordinary end users in our study. We also perform an analysis to investigate the relative gains of incorporating the different sources of knowledge available in the labeled training set, the feature labels and the unlabeled data. Together, our results strongly suggest that feature labeling by end users is both viable and effective for allowing end users to improve the learning algorithm behind their customized applications.
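The abstract above builds on locally weighted logistic regression. As a rough illustration of that base technique only (not the paper's feature-labeling algorithms), the sketch below fits a separate logistic model for each query point, weighting training instances by a Gaussian kernel centered on the query. The function name, the Gaussian kernel, the ridge penalty and the Newton solver are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    # Clip the logit to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def locally_weighted_logreg_predict(X, y, x_query, bandwidth=1.0, l2=0.1, n_iter=25):
    """Estimate P(y=1 | x_query) with a logistic model fit around x_query.

    Hypothetical helper: training points are weighted by a Gaussian kernel
    on their distance to the query, and the weighted, ridge-penalized
    log-likelihood is maximized with Newton steps.
    """
    # Kernel weights: nearby training points count more.
    sq_dist = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-sq_dist / (2.0 * bandwidth ** 2))

    # Add an intercept column (the intercept is penalized too, for simplicity).
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    theta = np.zeros(Xb.shape[1])
    ridge = l2 * np.eye(Xb.shape[1])

    for _ in range(n_iter):
        p = sigmoid(Xb @ theta)
        grad = Xb.T @ (w * (p - y)) + l2 * theta
        curvature = w * p * (1.0 - p)
        hess = Xb.T @ (Xb * curvature[:, None]) + ridge
        theta -= np.linalg.solve(hess, grad)

    xq = np.concatenate([[1.0], x_query])
    return float(sigmoid(xq @ theta))


# Usage on synthetic data: the label depends noisily on the first feature.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
print(locally_weighted_logreg_predict(X_train, y_train, np.array([1.5, 0.0])))
```

A smaller `bandwidth` makes each fit more local at the cost of higher variance, which is the usual trade-off when training data is limited.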
On semi-supervised estimation using exponential tilt mixture models
Consider a semi-supervised setting with a labeled dataset of binary responses
and predictors and an unlabeled dataset with only the predictors. Logistic
regression is equivalent to an exponential tilt model in the labeled
population. For semi-supervised estimation, we develop further analysis and
understanding of a statistical approach using exponential tilt mixture (ETM)
models and maximum nonparametric likelihood estimation, while allowing that the
class proportions may differ between the unlabeled and labeled data. We derive
asymptotic properties of ETM-based estimation and demonstrate improved
efficiency over supervised logistic regression in a random sampling setup and
an outcome-stratified sampling setup previously used. Moreover, we reconcile
such efficiency improvement with the existing semiparametric efficiency theory
when the class proportions in the unlabeled and labeled data are restricted to
be the same. We also provide a simulation study to numerically illustrate our
theoretical findings.
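The equivalence between logistic regression and an exponential tilt model mentioned in the abstract above follows from Bayes' rule. The derivation below is a sketch in notation assumed here, not taken from the paper: $g_0$ and $g_1$ are the predictor densities given $y=0$ and $y=1$, $\pi$ is the class-1 proportion in the labeled data, and $\rho$ is the class-1 proportion in the unlabeled data, which may differ from $\pi$. The last line additionally assumes that the class-conditional densities are shared between the labeled and unlabeled populations.

```latex
\begin{align*}
P(y=1 \mid x) &= \frac{\exp(\alpha + \beta^\top x)}{1 + \exp(\alpha + \beta^\top x)} \\
\frac{P(y=1 \mid x)}{P(y=0 \mid x)} &= \frac{\pi\, g_1(x)}{(1-\pi)\, g_0(x)} = \exp(\alpha + \beta^\top x) \\
g_1(x) &= \exp(\alpha^* + \beta^\top x)\, g_0(x), \qquad \alpha^* = \alpha - \log\frac{\pi}{1-\pi} \\
g_u(x) &= (1-\rho)\, g_0(x) + \rho\, g_1(x) = \bigl\{ 1 - \rho + \rho \exp(\alpha^* + \beta^\top x) \bigr\}\, g_0(x)
\end{align*}
```

The third line is the exponential tilt relation between the two class-conditional densities, and the fourth shows why the unlabeled predictors, drawn from the mixture $g_u$, still carry information about $\beta$.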
Adapting to Change: Robust Counterfactual Explanations in Dynamic Data Landscapes
We introduce a novel semi-supervised Graph Counterfactual Explainer (GCE)
methodology, Dynamic GRAph Counterfactual Explainer (DyGRACE). It leverages
initial knowledge about the data distribution to search for valid
counterfactuals while avoiding using information from potentially outdated
decision functions in subsequent time steps. Employing two graph autoencoders
(GAEs), DyGRACE learns the representation of each class in a binary
classification scenario. The GAEs minimise the reconstruction error between the
original graph and its learned representation during training. The method
involves (i) optimising a parametric density function (implemented as a
logistic regression function) to identify counterfactuals by maximising the
factual autoencoder's reconstruction error, (ii) minimising the counterfactual
autoencoder's error, and (iii) maximising the similarity between the factual
and counterfactual graphs. This semi-supervised approach is independent of an
underlying black-box oracle. A logistic regression model is trained on a set of
graph pairs to learn weights that aid in finding counterfactuals. At inference,
for each unseen graph, the logistic regressor identifies the best
counterfactual candidate using these learned weights, while the GAEs can be
iteratively updated to represent the continual adaptation of the learned graph
representation over iterations. DyGRACE is quite effective and can act as a
drift detector, identifying distributional drift based on differences in
reconstruction errors between iterations. It avoids reliance on the oracle's
predictions in successive iterations, thereby increasing the efficiency of
counterfactual discovery. DyGRACE, with its capacity for contrastive learning
and drift detection, will offer new avenues for semi-supervised learning and
explanation generation.
Semi-Supervised Factored Logistic Regression for High-Dimensional Neuroimaging Data
Imaging neuroscience links human behavior to aspects of brain biology in ever-increasing datasets. Existing neuroimaging methods typically perform either discovery of unknown neural structure or testing of neural structure associated with mental tasks. However, testing hypotheses on the neural correlates underlying larger sets of mental tasks necessitates adequate representations for the observations. We therefore propose to blend representation modelling and task classification into a unified statistical learning problem. A multinomial logistic regression is introduced that is constrained by factored coefficients and coupled with an autoencoder. We show that this approach yields more accurate and interpretable neural models of psychological tasks in a reference dataset, as well as better generalization to other datasets.