3,055 research outputs found
Cross-Lingual Adaptation using Structural Correspondence Learning
Cross-lingual adaptation, a special case of domain adaptation, refers to the
transfer of classification knowledge between two languages. In this article we
describe an extension of Structural Correspondence Learning (SCL), a recently
proposed algorithm for domain adaptation, for cross-lingual adaptation. The
proposed method uses unlabeled documents from both languages, along with a word
translation oracle, to induce cross-lingual feature correspondences. From these
correspondences a cross-lingual representation is created that enables the
transfer of classification knowledge from the source to the target language.
The main advantages of this approach over other approaches are its resource
efficiency and task specificity.
We conduct experiments in the area of cross-language topic and sentiment
classification involving English as source language and German, French, and
Japanese as target languages. The results show a significant improvement of the
proposed method over a machine translation baseline, reducing the relative
error due to cross-lingual adaptation by an average of 30% (topic
classification) and 59% (sentiment classification). We further report on
empirical analyses that reveal insights into the use of unlabeled data, the
sensitivity with respect to important hyperparameters, and the nature of the
induced cross-lingual correspondences
Labeling the Features Not the Samples: Efficient Video Classification with Minimal Supervision
Feature selection is essential for effective visual recognition. We propose
an efficient joint classifier learning and feature selection method that
discovers sparse, compact representations of input features from a vast sea of
candidates, with an almost unsupervised formulation. Our method requires only
the following knowledge, which we call the \emph{feature sign}---whether or not
a particular feature has on average stronger values over positive samples than
over negatives. We show how this can be estimated using as few as a single
labeled training sample per class. Then, using these feature signs, we extend
an initial supervised learning problem into an (almost) unsupervised clustering
formulation that can incorporate new data without requiring ground truth
labels. Our method works both as a feature selection mechanism and as a fully
competitive classifier. It has important properties, low computational cost and
excellent accuracy, especially in difficult cases of very limited training
data. We experiment on large-scale recognition in video and show superior speed
and performance to established feature selection approaches such as AdaBoost,
Lasso, greedy forward-backward selection, and powerful classifiers such as SVM.Comment: arXiv admin note: text overlap with arXiv:1411.771
Bounded Coordinate-Descent for Biological Sequence Classification in High Dimensional Predictor Space
We present a framework for discriminative sequence classification where the
learner works directly in the high dimensional predictor space of all
subsequences in the training set. This is possible by employing a new
coordinate-descent algorithm coupled with bounding the magnitude of the
gradient for selecting discriminative subsequences fast. We characterize the
loss functions for which our generic learning algorithm can be applied and
present concrete implementations for logistic regression (binomial
log-likelihood loss) and support vector machines (squared hinge loss).
Application of our algorithm to protein remote homology detection and remote
fold recognition results in performance comparable to that of state-of-the-art
methods (e.g., kernel support vector machines). Unlike state-of-the-art
classifiers, the resulting classification models are simply lists of weighted
discriminative subsequences and can thus be interpreted and related to the
biological problem
Beyond Sparsity: Tree Regularization of Deep Models for Interpretability
The lack of interpretability remains a key barrier to the adoption of deep
models in many applications. In this work, we explicitly regularize deep models
so human users might step through the process behind their predictions in
little time. Specifically, we train deep time-series models so their
class-probability predictions have high accuracy while being closely modeled by
decision trees with few nodes. Using intuitive toy examples as well as medical
tasks for treating sepsis and HIV, we demonstrate that this new tree
regularization yields models that are easier for humans to simulate than
simpler L1 or L2 penalties without sacrificing predictive power.Comment: To appear in AAAI 2018. Contains 9-page main paper and appendix with
supplementary materia
Dropout Training as Adaptive Regularization
Dropout and other feature noising schemes control overfitting by artificially
corrupting the training data. For generalized linear models, dropout performs a
form of adaptive regularization. Using this viewpoint, we show that the dropout
regularizer is first-order equivalent to an L2 regularizer applied after
scaling the features by an estimate of the inverse diagonal Fisher information
matrix. We also establish a connection to AdaGrad, an online learning
algorithm, and find that a close relative of AdaGrad operates by repeatedly
solving linear dropout-regularized problems. By casting dropout as
regularization, we develop a natural semi-supervised algorithm that uses
unlabeled data to create a better adaptive regularizer. We apply this idea to
document classification tasks, and show that it consistently boosts the
performance of dropout training, improving on state-of-the-art results on the
IMDB reviews dataset.Comment: 11 pages. Advances in Neural Information Processing Systems (NIPS),
201
- …