Domain adaptation for statistical machine translation of corporate and user-generated content
The growing popularity of Statistical Machine Translation (SMT) techniques in recent years has led to the development of multiple domain-specific resources and adaptation scenarios. In this thesis we address two important and industrially relevant adaptation scenarios, each suited to different kinds of content.
Initially focussing on professionally edited 'enterprise-quality' corporate content, we address a specific scenario: translating data drawn from a mixture of domains, where domain-specific data is available for each domain. We use an automatic classifier to combine multiple domain-specific models and show empirically that this configuration yields better translation quality than both traditional and state-of-the-art techniques for mixed-domain translation.
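The classifier-based combination described above can be illustrated with a minimal sketch, assuming a multinomial naive Bayes domain classifier with add-one smoothing and a hard routing rule in which each sentence goes to the single model of its predicted domain. The abstract does not specify the classifier or the combination scheme, so `DomainRouter` and the toy translators below are hypothetical stand-ins for real domain-specific SMT systems.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


class DomainRouter:
    """Hypothetical sketch: route each sentence to a domain-specific translator.

    A multinomial naive Bayes classifier (add-one smoothing, whitespace
    tokenisation) predicts the domain; the matching model then translates.
    """

    def __init__(self, training: Dict[str, List[str]],
                 translators: Dict[str, Callable[[str], str]]):
        self.translators = translators
        # Word counts per domain from the domain-labelled training sentences.
        self.counts = {d: Counter(w for s in sents for w in s.lower().split())
                       for d, sents in training.items()}
        self.totals = {d: sum(c.values()) for d, c in self.counts.items()}
        self.vocab = set().union(*self.counts.values())
        n_sents = sum(len(s) for s in training.values())
        # Log prior of each domain = its share of training sentences.
        self.log_prior = {d: math.log(len(s) / n_sents) for d, s in training.items()}

    def classify(self, sentence: str) -> str:
        words = sentence.lower().split()
        v = len(self.vocab)

        def score(domain: str) -> float:
            # Unnormalised log posterior under the naive Bayes assumption.
            return self.log_prior[domain] + sum(
                math.log((self.counts[domain][w] + 1) / (self.totals[domain] + v))
                for w in words)

        return max(self.counts, key=score)

    def translate(self, sentence: str) -> str:
        # Hard routing: only the predicted domain's model is used.
        return self.translators[self.classify(sentence)](sentence)
```

A soft variant would weight all domain models by the classifier's posterior instead of picking one; hard routing is shown only because it is the simplest configuration to read.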
In the second phase of our research, we shift our focus to translating possibly 'noisy' user-generated content from web forums built around the products and services of a multinational company. Training on professionally edited translation memory (TM) data, we apply different normalisation and data selection techniques to adapt SMT models to noisy forum content. In this scenario, we also study the effect of mixture adaptation, combining in-domain and out-of-domain data at different component levels of an SMT system. Finally, we address the selection of optimal supplementary training data from out-of-domain corpora, using a novel incremental model merging mechanism to adapt TM-based models and improve the translation quality of forum content.
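As an illustration of data selection in this scenario, the following sketch ranks out-of-domain sentences by cross-entropy difference (Moore and Lewis, 2010). This is a standard selection criterion swapped in for illustration; it is not the thesis's novel incremental model merging mechanism. The sketch assumes whitespace tokenisation and add-one-smoothed unigram language models, and for brevity it trains the out-of-domain model on the whole pool rather than on a size-matched random sample.

```python
import math
from collections import Counter
from typing import List, Tuple

UnigramLM = Tuple[Counter, int]


def unigram_lm(corpus: List[str]) -> UnigramLM:
    """Word counts and total token count for an add-one-smoothed unigram LM."""
    counts = Counter(w for s in corpus for w in s.lower().split())
    return counts, sum(counts.values())


def cross_entropy(sentence: str, lm: UnigramLM, vocab_size: int) -> float:
    """Per-word cross-entropy (nats) of a sentence under a smoothed unigram LM."""
    counts, total = lm
    words = sentence.lower().split()
    log_p = sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)
    return -log_p / max(len(words), 1)


def select_by_ce_difference(in_domain: List[str], out_domain: List[str],
                            k: int) -> List[str]:
    """Return the k out-of-domain sentences with the lowest H_in(s) - H_out(s)."""
    vocab = {w for s in in_domain + out_domain for w in s.lower().split()}
    v = len(vocab) + 1  # +1 reserves probability mass for unseen words
    lm_in, lm_out = unigram_lm(in_domain), unigram_lm(out_domain)
    ranked = sorted(out_domain,
                    key=lambda s: cross_entropy(s, lm_in, v) - cross_entropy(s, lm_out, v))
    return ranked[:k]
```

A low score means a sentence looks more like the in-domain data than like typical out-of-domain data, so the top-ranked sentences are the candidates for supplementary training.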
High Resolution Zero-Shot Domain Adaptation of Synthetically Rendered Face Images
Generating photorealistic images of human faces at scale remains prohibitively difficult with computer graphics approaches, because photorealism requires simulating light, which in turn requires physically accurate modelling of the geometry, materials, and light sources of both the head and the surrounding scene. Non-photorealistic renders, however, are increasingly easy to produce. In contrast to computer graphics approaches, generative models learned from more readily available 2D image data have been shown to produce samples of human faces that are hard to distinguish from real data. Yet learning usually comes at the cost of control over the shape and appearance of the generated images. For instance, even simple disentangling tasks, such as modifying the hair independently of the face, which is trivial in a computer graphics approach, remain an open research question. In this work, we propose an algorithm that matches a non-photorealistic, synthetically generated image to a latent vector of a pretrained StyleGAN2 model, which in turn maps the vector to a photorealistic image of a person with the same pose, expression, hair, and lighting. In contrast to most previous work, we require no synthetic training data. To the best of our knowledge, this is the first algorithm of its kind to work at a resolution of 1K, and it represents a significant leap forward in visual realism.
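The abstract does not detail how the match is computed. As a hedged illustration of the underlying idea only, the sketch below performs generic optimization-based latent inversion: gradient descent on a latent vector w that minimizes the squared error between a generator output G(w) and a target. A toy linear generator (a hypothetical matrix `A`) stands in for the pretrained StyleGAN2 network, and the sketch does not address the synthetic-to-photorealistic gap that the work targets.

```python
from typing import List

Matrix = List[List[float]]


def generate(A: Matrix, w: List[float]) -> List[float]:
    """Toy linear 'generator' G(w) = A w, standing in for a pretrained network."""
    return [sum(a * wi for a, wi in zip(row, w)) for row in A]


def invert(A: Matrix, target: List[float], lr: float = 0.05,
           steps: int = 2000) -> List[float]:
    """Find w minimizing ||G(w) - target||^2 by plain gradient descent from w = 0."""
    w = [0.0] * len(A[0])
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(A, w), target)]
        # Gradient of the squared error: 2 * A^T (A w - target).
        grad = [2 * sum(A[i][j] * residual[i] for i in range(len(A)))
                for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w
```

Real projections into StyleGAN2 typically add a perceptual loss and rely on automatic differentiation through the network; the linear case simply keeps the gradient explicit.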
Detecting single-trial EEG evoked potential using a wavelet domain linear mixed model: application to error potentials classification
Objective. The main goal of this work is to develop a model for multi-sensor signals, such as MEG or EEG signals, that accounts for inter-trial variability and is suitable for the corresponding binary classification problems. An important constraint is that the model be simple enough to handle small and unbalanced datasets, as often encountered in BCI-type experiments.
Approach. The method combines a linear mixed-effects statistical model, the wavelet transform, and spatial filtering, and aims to characterize localized discriminant features in multi-sensor signals. After a discrete wavelet transform and spatial filtering, the signals are projected onto the subspaces of the relevant wavelet and spatial channels to reduce dimension. The projected signals are then decomposed as the sum of a signal of interest (i.e. the discriminant part) and background noise, using a very simple Gaussian linear mixed model.
Main results. Thanks to the simplicity of the model, the corresponding parameter estimation problem is simplified. Robust estimates of class-covariance matrices are obtained from small sample sizes, and an effective Bayes plug-in classifier is derived. The approach is applied to the detection of error potentials in multichannel EEG data in a very unbalanced situation (detection of rare events). Classification results demonstrate the relevance of the proposed approach in such a context.
Significance. To the best of our knowledge, combining a linear mixed model, the wavelet transform, and spatial filtering for EEG classification is an original approach, and it is shown to be effective. This paper improves on earlier results on similar problems, and all three main ingredients play an important role.
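Two of the pipeline's ingredients can be sketched in plain Python under simplifying assumptions: an orthonormal Haar transform as the discrete wavelet transform, and a Gaussian Bayes plug-in classifier whose estimated class priors account for the unbalanced class proportions. The diagonal, floored class covariances are a crude stand-in for the structured covariance estimates the linear mixed model provides, spatial filtering is omitted, and `haar_dwt` and `GaussianPlugIn` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def haar_dwt(signal: Sequence[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Multi-level orthonormal Haar DWT: returns (approximation, details per level)."""
    s = 1 / math.sqrt(2)
    approx, details = list(signal), []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) * s for a, b in pairs])  # high-pass coefficients
        approx = [(a + b) * s for a, b in pairs]          # low-pass coefficients
    return approx, details


class GaussianPlugIn:
    """Bayes plug-in classifier with Gaussian class densities (diagonal covariances)."""

    def __init__(self, var_floor: float = 1e-6):
        self.var_floor = var_floor  # guards against zero variance with tiny classes

    def fit(self, X: List[List[float]], y: List[int]) -> "GaussianPlugIn":
        self.params = {}
        for c in sorted(set(y)):
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            mean = [sum(col) / len(rows) for col in cols]
            var = [max(sum((v - mu) ** 2 for v in col) / len(rows), self.var_floor)
                   for col, mu in zip(cols, mean)]
            # Plug-in estimates: class prior, mean vector, per-feature variance.
            self.params[c] = (math.log(len(rows) / len(y)), mean, var)
        return self

    def log_posterior(self, x: List[float], c: int) -> float:
        """Unnormalised log posterior of class c for feature vector x."""
        log_prior, mean, var = self.params[c]
        return log_prior - 0.5 * sum(math.log(2 * math.pi * v) + (xi - mu) ** 2 / v
                                     for xi, mu, v in zip(x, mean, var))

    def predict(self, x: List[float]) -> int:
        return max(self.params, key=lambda c: self.log_posterior(x, c))
```

In use, each trial would be transformed with `haar_dwt`, the coefficients from the levels judged relevant concatenated into a feature vector, and the classifier fitted on labelled trials.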