71 research outputs found
Transfer learning for image classification with sparse prototype representations
To learn a new visual category from few examples, prior knowledge from unlabeled data as well as previous related categories may be useful. We develop a new method for transfer learning which exploits available unlabeled data and an arbitrary kernel function; we form a representation based on kernel distances to a large set of unlabeled data points. To transfer knowledge from previous related problems we observe that a category might be learnable using only a small subset of reference prototypes. Related problems may share a significant number of relevant prototypes; we find such a reduced representation by performing a joint loss minimization over the training sets of related problems with a shared regularization penalty that minimizes the total number of prototypes involved in the approximation.This optimization problem can be formulated as a linear program thatcan be solved efficiently. We conduct experiments on a news-topic prediction task where the goal is to predict whether an image belongs to a particularnews topic. Our results show that when only few examples are available for training a target topic, leveraging knowledge learnt from other topics can significantly improve performance
Latent-Dynamic Discriminative Models for Continuous Gesture Recognition
Many problems in vision involve the prediction of a class label for each frame in an unsegmented sequence. In this paper we develop a discriminative framework for simultaneous sequence segmentation and labeling which can capture both intrinsic and extrinsic class dynamics. Our approach incorporates hidden state variables which model the sub-structure of a class sequence and learn the dynamics between class labels. Each class label has a disjoint set of associated hidden states, which enables efficient training and inference in our model. We evaluated our method on the task of recognizing human gestures from unsegmented video streams and performed experiments on three different datasets of head and eye gestures. Our results demonstrate that our model for visual gesture recognition outperform models based on Support Vector Machines, Hidden Markov Models, and Conditional Random Fields
A Projected Subgradient Method for Scalable Multi-Task Learning
Recent approaches to multi-task learning have investigated the use of a variety of matrix norm regularization schemes for promoting feature sharing across tasks.In essence, these approaches aim at extending the l1 framework for sparse single task approximation to the multi-task setting. In this paper we focus on the computational complexity of training a jointly regularized model and propose an optimization algorithm whose complexity is linear with the number of training examples and O(n log n) with n being the number of parameters of the joint model. Our algorithm is based on setting jointly regularized loss minimization as a convex constrained optimization problem for which we develop an efficient projected gradient algorithm. The main contribution of this paper is the derivation of a gradient projection method with l1ââ constraints that can be performed efficiently and which has convergence rates
Transfering Nonlinear Representations using Gaussian Processes with a Shared Latent Space
When a series of problems are related, representations derived fromlearning earlier tasks may be useful in solving later problems. Inthis paper we propose a novel approach to transfer learning withlow-dimensional, non-linear latent spaces. We show how suchrepresentations can be jointly learned across multiple tasks in adiscriminative probabilistic regression framework. When transferred tonew tasks with relatively few training examples, learning can befaster and/or more accurate. Experiments on a digit recognition taskshow significantly improved performance when compared to baselineperformance with the original feature representation or with arepresentation derived from a semi-supervised learning approach
Learning task-specific bilexical embeddings
We present a method that learns bilexical operators over distributional representations of words and leverages supervised data for a linguistic relation. The learning algorithm exploits lowrank bilinear forms and induces low-dimensional embeddings of the lexical space tailored for the target linguistic relation. An advantage of imposing low-rank constraints is that prediction
is expressed as the inner-product between low-dimensional embeddings, which can have great computational benefits. In experiments with multiple linguistic bilexical relations we show that our method effectively learns using embeddings of a few dimensions.Peer ReviewedPostprint (published version
Transferring Nonlinear Representations using Gaussian Processes with a Shared Latent Space
When a series of problems are related, representations derived from learning earlier tasks may be useful in solving later problems. In this paper we propose a novel approach to transfer learning with low-dimensional, non-linear latent spaces. We show how such representations can be jointly learned across multiple tasks in a Gaussian Process framework. When transferred to new tasks with relatively few training examples, learning can be faster and/or more accurate. Experiments on digit recognition and newsgroup classification tasks show significantly improved performance when compared to baseline performance with a representation derived from a semi-supervised learning approach or with a discriminative approach that uses only the target data
Spectral regularization for max-margin sequence tagging
We frame max-margin learning of latent variable structured prediction models as a convex optimization problem, making use of scoring functions computed by input-output observable operator models. This learning problem can be expressed as an optimization problem involving a low-rank Hankel matrix that represents the inputoutput operator model. The direct outcome of our work is a new spectral regularization method for max-margin structured prediction. Our experiments confirm that our proposed regularization framework leads to an effective way of controlling the capacity of structured prediction models.Peer ReviewedPostprint (published version
Entity disambiguation on a tight labeling budget
Many real-world NLP applications face the challenge of training an entity disambiguation model for a specific domain with a small labeling budget. In this setting there is often access to a large unlabeled pool of documents. It is then natural to ask the question: which samples should be selected for annotation? In this paper we propose a solution that combines feature diversity with low rank correction. Our sampling strategy is formulated in the context of bilinear tensor models. Our experiments show that the proposed approach can significantly reduce the amount of labeled data necessary to achieve a given performance.This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under grant agreement No 853459. The authors gratefully acknowledge the computer resources at ARTEMISA, funded by the European Union ERDF and Comunitat Valenciana as well as the technical support provided by the Instituto de Física Corpuscular, IFIC (CSIC-UV). This research is supported by a recognition 2021 SGR-Cat (01266 LQMC) from AGAUR (Generalitat de Catalunya).Peer ReviewedPostprint (published version
Spectral learning of transducers over continuous sequences
In this paper we present a spectral algorithm for learning weighted nite state transducers (WFSTs) over paired input-output sequences, where the input is continuous and the output discrete. WFSTs are an important tool for modeling paired input-output sequences and have numerous applications in
real-world problems. Recently, Balle et al (2011) proposed a spectral method for learning WFSTs that overcomes some of the well known limitations of gradient-based or EM optimizations which can be computationally expensive and su er from local optima issues. Their algorithm can model distributions where both inputs and outputs are sequences from a discrete alphabet.
However, many real world problems require modeling paired sequences where the inputs are not discrete but continuos sequences. Modelling continuous sequences with spectral methods has been studied in the context of HMMs (Song et al 2010), where a spectral algorithm for this case was derived. In this
paper we follow that line of work and propose a spectral learning algorithm
for modelling paired input-output sequences where the inputs are continuous and the outputs are discrete. Our approach is based on generalizing the class of weighted nite state transducers over discrete input-output sequences to a class where transitions are linear combinations of elementary transitions and the weights of this linear combinations are determined by dynamic features of the continuous input sequence.
At its core, the algorithm is simple and scalable to large data sets. We present experiments on a real task that validate the eff ectiveness of the proposed approach.Postprint (published version
- …
