    Empirical Gaussian priors for cross-lingual transfer learning

    Sequence model learning algorithms typically maximize log-likelihood minus the norm of the model (or minimize Hamming loss plus a norm). In cross-lingual part-of-speech (POS) tagging, our target-language training data consists of sequences of sentences with word-by-word labels projected, via word alignments, from translations in k languages for which we have labeled data. Our training data is therefore very noisy, and if Rademacher complexity is high, learning algorithms are prone to overfitting. Norm-based regularization assumes a constant-width, zero-mean prior. We instead propose to use the k source-language models to estimate the parameters of a Gaussian prior for learning new POS taggers. This leads to significantly better performance in multi-source transfer setups. We also present a drop-out version that injects (empirical) Gaussian noise during online learning. Finally, we note that using empirical Gaussian priors leads to much lower Rademacher complexity and is superior to optimally weighted model interpolation.
    Comment: Presented at the NIPS 2015 Workshop on Transfer and Multi-Task Learning
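    The abstract's key move is replacing the usual zero-mean, constant-width L2 prior with a per-parameter Gaussian estimated from the source-language models. Below is a minimal sketch of that idea, assuming the k source models share a feature space; the function names, the smoothing constant, and the placeholder gradient are illustrative, not the authors' code.

```python
import numpy as np

def empirical_gaussian_prior(source_weights):
    """Estimate a per-parameter Gaussian prior from k source models.

    source_weights: array of shape (k, d), one weight vector per
    source-language model.
    """
    mu = source_weights.mean(axis=0)          # empirical prior mean
    var = source_weights.var(axis=0) + 1e-2   # empirical prior variance (smoothing constant assumed)
    return mu, var

def prior_penalized_gradient(grad_loglik, w, mu, var):
    """Gradient of log-likelihood minus the penalty sum_j (w_j - mu_j)^2 / (2 var_j).

    With mu = 0 and constant var this reduces to ordinary L2 regularization.
    """
    return grad_loglik - (w - mu) / var

# Toy usage: k = 3 stand-in source models over d = 5 features.
rng = np.random.default_rng(0)
mu, var = empirical_gaussian_prior(rng.normal(size=(3, 5)))

w = np.zeros(5)
grad_loglik = -w  # placeholder; a real tagger supplies this per example
w += 0.01 * prior_penalized_gradient(grad_loglik, w, mu, var)

# The drop-out variant mentioned in the abstract could instead inject the
# empirical noise during online updates, e.g.:
w_noisy = w + rng.normal(scale=np.sqrt(var))
```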

    On Correcting Inputs: Inverse Optimization for Online Structured Prediction

    Algorithm designers typically assume that the input data are correct, and then proceed to find "optimal" or "sub-optimal" solutions using those data. However, this assumption of correct data does not always hold in practice, especially in the context of online learning systems, where the objective is to learn appropriate feature weights given some training samples. Such scenarios necessitate the study of inverse optimization problems, where one is given an input instance as well as a desired output, and the task is to adjust the input data so that the given output is indeed optimal. Motivated by learning structured prediction models, in this paper we consider inverse optimization with a margin, i.e., we require the given output to be better than all other feasible outputs by a desired margin. We consider such inverse optimization problems for maximum-weight matroid basis, matroid intersection, perfect matchings, minimum-cost maximum flows, and shortest paths, and derive the first known results for such problems with a non-zero margin. The effectiveness of these algorithmic approaches to online learning for structured prediction is also discussed.
    Comment: Conference version to appear in FSTTCS, 2015
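    As a concrete toy of the margin requirement for the shortest-path case: given a desired s-t path, adjust edge weights until that path beats every other simple path by the margin. The uniform-decrease loop below is a naive illustration of the feasibility condition, not the paper's algorithm (which seeks optimal corrections and also covers matroid bases, matchings, and flows); the graph and names are made up.

```python
def simple_paths(graph, s, t, path=None):
    """Enumerate simple s-t paths in a dict-of-dicts weighted digraph."""
    path = (path or []) + [s]
    if s == t:
        yield path
        return
    for v in graph.get(s, {}):
        if v not in path:
            yield from simple_paths(graph, v, t, path)

def path_cost(graph, path):
    return sum(graph[u][v] for u, v in zip(path, path[1:]))

def correct_weights(graph, desired, margin, step=0.1, max_iter=1000):
    """Nudge the desired path's edge weights down until it is cheaper than
    every rival path by `margin` (a crude correction, not a minimal one)."""
    s, t = desired[0], desired[-1]
    for _ in range(max_iter):
        rivals = [p for p in simple_paths(graph, s, t) if p != desired]
        gap = min(path_cost(graph, p) for p in rivals) - path_cost(graph, desired)
        if gap >= margin:
            return graph
        for u, v in zip(desired, desired[1:]):
            graph[u][v] -= step
    raise RuntimeError("margin not reached")

G = {"s": {"a": 2.0, "b": 1.0}, "a": {"t": 2.0}, "b": {"t": 4.0}}
print(correct_weights(G, ["s", "a", "t"], margin=2.0))
```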

    Novel modeling of task versus rest brain state predictability using a dynamic time warping spectrum: comparisons and contrasts with other standard measures of brain dynamics

    Dynamic time warping, or DTW, is a powerful and domain-general sequence alignment method for computing a similarity measure. Dynamic programming-based techniques like DTW are now the backbone and driver of most bioinformatics methods and discoveries. In neuroscience DTW has had far less use, though this has begun to change. We wanted to explore new ways of applying DTW: not simply as a measure with which to cluster or compare similarity between features, but in a conceptually different way. We have used DTW to provide a more interpretable spectral description of the data than standard approaches such as the Fourier and related transforms. The DTW approach and the standard discrete Fourier transform (DFT) are assessed against benchmark measures of neural dynamics, namely EEG microstates, EEG avalanches, the sum squared error (SSE) from a multilayer perceptron (MLP) prediction of the EEG time series, and the simultaneously acquired fMRI BOLD signal. We explored the relationships between these variables of interest in an EEG-fMRI dataset acquired during a standard cognitive task, which allowed us to examine how DTW performs differently across task settings. We found that, despite strong correlations between the DTW and DFT spectra, DTW was a better predictor of almost every measure of brain dynamics. Using these DTW measures, we show that predictability is almost always higher in task than in rest states, which is consistent with other theoretical and empirical findings and provides additional evidence for the utility of the DTW approach.
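    The abstract only sketches how a DTW-based spectrum is built, so the code below is a guess at the flavor of the approach rather than the paper's recipe: textbook dynamic-programming DTW, then a "spectrum" obtained by scoring a signal against sinusoid templates at candidate frequencies, with a DFT magnitude spectrum computed alongside for comparison. The sampling rate, frequencies, and template choice are illustrative assumptions.

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping distance via the standard O(nm) DP recurrence."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dtw_spectrum(signal, freqs, fs):
    """Score `signal` against unit sinusoids at each frequency (lower = more similar)."""
    t = np.arange(len(signal)) / fs
    return np.array([dtw_distance(signal, np.sin(2 * np.pi * f * t))
                     for f in freqs])

fs = 100.0                                   # assumed sampling rate, Hz
t = np.arange(256) / fs
sig = np.sin(2 * np.pi * 10 * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)

print(dtw_spectrum(sig, np.array([5.0, 10.0, 20.0]), fs))  # minimum expected at 10 Hz
print(np.abs(np.fft.rfft(sig))[:30])                       # DFT magnitudes for comparison
```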