Ridge Regression, Hubness, and Zero-Shot Learning
This paper discusses the effect of hubness in zero-shot learning when ridge
regression is used to find a mapping between the example space and the label
space. Contrary to the existing approach, which attempts to find a mapping from
the example space to the label space, we show that mapping labels into the
example space is desirable to suppress the emergence of hubs in the subsequent
nearest neighbor search step. Assuming a simple data model, we prove that the
proposed approach indeed reduces hubness. This was verified empirically on the
tasks of bilingual lexicon extraction and image labeling: hubness was reduced
on both tasks, and accuracy improved accordingly. Comment: To be presented at ECML/PKDD 201
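As a rough illustration of the two mapping directions the abstract contrasts, the following sketch fits closed-form ridge regression both ways on synthetic data and counts how often each label is retrieved as a nearest neighbor. The data model, the dimensions, the regularizer `lam`, and the helper names (`ridge_fit`, `k_occurrence`, `skewness`) are assumptions made for this example, not the paper's setup; skewness of the k-occurrence counts is used here as one common hubness indicator.

```python
import numpy as np

def ridge_fit(A, B, lam=1.0):
    """Closed-form ridge regression: W minimizing ||A W - B||_F^2 + lam * ||W||_F^2."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ B)

def k_occurrence(queries, candidates, k=1):
    """Count how often each candidate is among the k nearest neighbors of the queries."""
    # Squared Euclidean distances, shape (n_queries, n_candidates).
    d2 = ((queries[:, None, :] - candidates[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]
    return np.bincount(nn.ravel(), minlength=candidates.shape[0])

def skewness(counts):
    """Skewness of the k-occurrence distribution; large positive values suggest hubs."""
    c = counts.astype(float)
    s = c.std()
    return 0.0 if s == 0 else ((c - c.mean()) ** 3).mean() / s ** 3

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n_train, n_test, n_labels, dx, dy = 200, 100, 50, 20, 10
Y_labels = rng.normal(size=(n_labels, dy))   # one vector per label
M = rng.normal(size=(dy, dx))                # hypothetical link between the spaces

def sample(n):
    idx = rng.integers(0, n_labels, size=n)
    X = Y_labels[idx] @ M + 0.5 * rng.normal(size=(n, dx))
    return X, idx

X_train, idx_train = sample(n_train)
X_test, idx_test = sample(n_test)

# Direction 1: map examples into the label space, search among label vectors.
W_fwd = ridge_fit(X_train, Y_labels[idx_train])
counts_fwd = k_occurrence(X_test @ W_fwd, Y_labels, k=1)

# Direction 2: map labels into the example space, search among mapped labels.
W_rev = ridge_fit(Y_labels[idx_train], X_train)
counts_rev = k_occurrence(X_test, Y_labels @ W_rev, k=1)

print("skewness, examples -> labels:", skewness(counts_fwd))
print("skewness, labels -> examples:", skewness(counts_rev))
```

Whether the second direction shows lower skewness on a given dataset is an empirical question; the sketch only shows how the comparison could be set up.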
PCA-based dimensionality reduction for face recognition
In this paper, we conduct a comprehensive study of dimensionality reduction (DR) techniques and discuss in detail the most widely used statistical DR technique, principal component analysis (PCA), with a view to addressing the classical face recognition problem. We then propose a solution to either typical face or individual face recognition based on the principal components, which are constructed by applying PCA to the face images. We simulate the proposed solution with several training and test sets of manually captured face images and also with the popular Olivetti Research Laboratory (ORL) and Yale face databases. The performance measure of the proposed face recognizer signifies its superiority
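The general idea of recognizing faces from principal components can be sketched as follows: compute the principal directions of the flattened training images, project every image onto them, and label a probe image by its nearest neighbor in that reduced space. This is a generic eigenface-style sketch on random stand-in data, not the paper's recognizer; the function names, the image size, and the number of components are assumptions chosen for the example.

```python
import numpy as np

def fit_pca(images, n_components):
    """Return the mean image and the top principal directions (as rows)."""
    mean = images.mean(axis=0)
    centered = images - mean
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean, Vt[:n_components]

def project(images, mean, components):
    """Project flattened images onto the principal components."""
    return (images - mean) @ components.T

def recognize(probe, gallery_proj, gallery_labels, mean, components):
    """Label a probe image by its nearest gallery image in PCA space."""
    p = project(probe[None, :], mean, components)
    dists = np.linalg.norm(gallery_proj - p, axis=1)
    return gallery_labels[np.argmin(dists)]

# Stand-in data: 5 hypothetical subjects, 4 noisy 8x8 "images" each.
rng = np.random.default_rng(0)
n_subjects, per_subject, n_pixels = 5, 4, 64
prototypes = rng.normal(size=(n_subjects, n_pixels))
labels = np.repeat(np.arange(n_subjects), per_subject)
gallery = prototypes[labels] + 0.1 * rng.normal(size=(labels.size, n_pixels))

mean, components = fit_pca(gallery, n_components=5)
gallery_proj = project(gallery, mean, components)

probe = prototypes[2] + 0.1 * rng.normal(size=n_pixels)
print("predicted subject:", recognize(probe, gallery_proj, labels, mean, components))
```

With real face databases, the rows of `gallery` would be flattened grayscale images, and the number of components would typically be chosen from the explained variance.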
Positive Definite Kernels in Machine Learning
This survey is an introduction to positive definite kernels and the set of
methods they have inspired in the machine learning literature, namely kernel
methods. We first discuss some properties of positive definite kernels as well
as reproducing kernel Hilbert spaces, the natural extension of the set of
functions associated with a kernel defined
on a space. We discuss at length the construction of kernel
functions that take advantage of well-known statistical models. We provide an
overview of numerous data-analysis methods which take advantage of reproducing
kernel Hilbert spaces and discuss the idea of combining several kernels to
improve the performance on certain tasks. We also provide a short cookbook of
different kernels which are particularly useful for certain data-types such as
images, graphs or speech segments. Comment: draft. corrected a typo in figure
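To make positive definiteness and kernel combination concrete, the sketch below builds Gram matrices for two standard kernels (Gaussian RBF and linear), checks numerically that their eigenvalues are nonnegative, and verifies that a convex combination of the two passes the same check. The helper names, the tolerance, and the random data are assumptions for illustration and are not taken from the survey.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix of the Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative round-off

def linear_gram(X):
    """Gram matrix of the linear kernel k(x, y) = <x, y>."""
    return X @ X.T

def is_psd(K, tol=1e-8):
    """Numerical check: K is symmetric with no eigenvalue below -tol."""
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))

K_rbf = rbf_gram(X, gamma=0.5)
K_lin = linear_gram(X)
K_mix = 0.5 * K_rbf + 0.5 * K_lin   # nonnegative combination of two kernels

print(is_psd(K_rbf), is_psd(K_lin), is_psd(K_mix))
```

A finite check like this cannot prove that a function is a positive definite kernel, but a negative eigenvalue on any sample is enough to show that it is not.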
Learning Representations of Social Media Users
User representations are routinely used in recommendation systems by platform
developers, targeted advertisements by marketers, and by public policy
researchers to gauge public opinion across demographic groups. Computer
scientists consider the problem of inferring user representations more
abstractly: how does one extract a stable user representation - effective for
many downstream tasks - from a medium as noisy and complicated as social media?
The quality of a user representation is ultimately task-dependent (e.g. does
it improve classifier performance, make more accurate recommendations in a
recommendation system) but there are proxies that are less sensitive to the
specific task. Is the representation predictive of latent properties such as a
person's demographic features, socioeconomic class, or mental health state? Is
it predictive of the user's future behavior?
In this thesis, we begin by showing how user representations can be learned
from multiple types of user behavior on social media. We apply several
extensions of generalized canonical correlation analysis to learn these
representations and evaluate them on three tasks: predicting future hashtag
mentions, friending behavior, and demographic features. We then show how user
features can be employed as distant supervision to improve topic model fit.
Finally, we show how user features can be integrated into and improve existing
classifiers in the multitask learning framework. We treat user representations
- ground truth gender and mental health features - as auxiliary tasks to
improve mental health state prediction. We also use distributed user
representations learned in the first chapter to improve tweet-level stance
classifiers, showing that distant user information can inform classification
tasks at the granularity of a single message. Comment: PhD thesis
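For readers unfamiliar with generalized canonical correlation analysis, here is a minimal sketch of the plain MAXVAR formulation: each view is turned into a regularized projection matrix, and the shared representation is taken from the top eigenvectors of their sum. The thesis applies extensions of GCCA that are not reproduced here; the function name `gcca_maxvar`, the ridge term `reg`, and the synthetic "views" are assumptions for this example.

```python
import numpy as np

def gcca_maxvar(views, k, reg=1e-3):
    """Plain MAXVAR GCCA: shared representation G plus one linear map per view."""
    n = views[0].shape[0]
    centered = [X - X.mean(axis=0) for X in views]
    total = np.zeros((n, n))
    for Xc in centered:
        d = Xc.shape[1]
        # Regularized projection onto the column space of this view.
        total += Xc @ np.linalg.solve(Xc.T @ Xc + reg * np.eye(d), Xc.T)
    total = (total + total.T) / 2            # guard against round-off asymmetry
    _, eigvecs = np.linalg.eigh(total)       # eigenvalues in ascending order
    G = eigvecs[:, ::-1][:, :k]              # top-k eigenvectors
    maps = [
        np.linalg.solve(Xc.T @ Xc + reg * np.eye(Xc.shape[1]), Xc.T @ G)
        for Xc in centered
    ]
    return G, maps

# Synthetic stand-in for several behavior views of the same 60 users.
rng = np.random.default_rng(0)
n_users, k = 60, 2
Z = rng.normal(size=(n_users, k))            # hypothetical shared latent factors
dims = [8, 12, 5]
views = [Z @ rng.normal(size=(k, d)) + 0.1 * rng.normal(size=(n_users, d)) for d in dims]

G, maps = gcca_maxvar(views, k)
print(G.shape, [U.shape for U in maps])
```

Each row of `G` would serve as one user's embedding; the per-view maps allow embedding a user from a single view when the others are unavailable.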