50,868 research outputs found
Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
The goal of this work is to train discriminative cross-modal embeddings
without access to manually annotated data. Recent advances in self-supervised
learning have shown that effective representations can be learnt from natural
cross-modal synchrony. We build on earlier work to train embeddings that are
more discriminative for uni-modal downstream tasks. To this end, we propose a
novel training strategy that not only optimises metrics across modalities, but
also enforces intra-class feature separation within each of the modalities. The
effectiveness of the method is demonstrated on two downstream tasks: lip
reading using the features trained on audio-visual synchronisation, and speaker
recognition using the features trained for cross-modal biometric matching. The
proposed method outperforms state-of-the-art self-supervised baselines by a
signficant margin.Comment: Under submission as a conference pape
Acquiring Correct Knowledge for Natural Language Generation
Natural language generation (NLG) systems are computer software systems that
produce texts in English and other human languages, often from non-linguistic
input data. NLG systems, like most AI systems, need substantial amounts of
knowledge. However, our experience in two NLG projects suggests that it is
difficult to acquire correct knowledge for NLG systems; indeed, every knowledge
acquisition (KA) technique we tried had significant problems. In general terms,
these problems were due to the complexity, novelty, and poorly understood
nature of the tasks our systems attempted, and were worsened by the fact that
people write so differently. This meant in particular that corpus-based KA
approaches suffered because it was impossible to assemble a sizable corpus of
high-quality consistent manually written texts in our domains; and structured
expert-oriented KA techniques suffered because experts disagreed and because we
could not get enough information about special and unusual cases to build
robust systems. We believe that such problems are likely to affect many other
NLG systems as well. In the long term, we hope that new KA techniques may
emerge to help NLG system builders. In the shorter term, we believe that
understanding how individual KA techniques can fail, and using a mixture of
different KA techniques with different strengths and weaknesses, can help
developers acquire NLG knowledge that is mostly correct
TectoMT – a deep-Âlinguistic core of the combined Chimera MT system
Chimera is a machine translation system that combines the TectoMT deep-linguistic core with phrase-based MT system Moses. For English–Czech pair it also uses the Depfix post-correction system. All the components run on Unix/Linux platform and are open source (available from Perl repository CPAN and the LINDAT/CLARIN repository). The main website is https://ufal.mff.cuni.cz/tectomt. The development is currently supported by the QTLeap 7th FP project (http://qtleap.eu)
- …