24,343 research outputs found

Compositional Morphology for Word Representations and Language Modelling

Author: Blunsom Phil
Botha Jan A.
Publication date: 01/01/2014

This paper presents a scalable method for integrating compositional morphological representations into a vector-based probabilistic language model. Our approach is evaluated in the context of log-bilinear language models, rendered suitably efficient for implementation inside a machine translation decoder by factoring the vocabulary. We perform both intrinsic and extrinsic evaluations, presenting results on a range of languages which demonstrate that our model learns morphological representations that both perform well on word similarity tasks and lead to substantial reductions in perplexity. When used for translation into morphologically rich languages with large vocabularies, our models obtain improvements of up to 1.2 BLEU points relative to a baseline system using back-off n-gram models.

Comment: Proceedings of the 31st International Conference on Machine Learning (ICML

arXiv.org e-Print Archive

CiteSeerX

Oxford University Research Archive

Language Modeling with Power Low Rank Ensembles

Author: Dyer Chris
Parikh Ankur P.
Saluja Avneesh
Xing Eric P.
Publication date: 01/01/2014

We present power low rank ensembles (PLRE), a flexible framework for n-gram language modeling where ensembles of low rank matrices and tensors are used to obtain smoothed probability estimates of words in context. Our method can be understood as a generalization of n-gram modeling to non-integer n, and includes standard techniques such as absolute discounting and Kneser-Ney smoothing as special cases. PLRE training is efficient and our approach outperforms state-of-the-art modified Kneser-Ney baselines in terms of perplexity on large corpora as well as on BLEU score in a downstream machine translation task.

arXiv.org e-Print Archive

CiteSeerX

Crossref

MultiFarm: A benchmark for multilingual ontology matching

Author: Andrei Tamilin
Christian Meilicke
Cássia Trojahn
Elena Montiel-Ponsoda
Euzenat
Fred Freitas
Fu
García-Castro
Giunchiglia
Heiner Stuckenschmidt
Jung
Neches
Niepert
Ondřej Šváb-Zamazal
Raúl García-Castro
Ryan Ribeiro de Azevedo
Shenghui Wang
Vojtěch Svátek
Wang
Willem Robert van Hage
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2012

In this paper we present the MultiFarm dataset, which has been designed as a benchmark for multilingual ontology matching. The MultiFarm dataset is composed of a set of ontologies translated into different languages and the corresponding alignments between these ontologies. It is based on the OntoFarm dataset, which has been used successfully for several years in the Ontology Alignment Evaluation Initiative (OAEI). By translating the ontologies of the OntoFarm dataset into eight different languages – Chinese, Czech, Dutch, French, German, Portuguese, Russian, and Spanish – we created a comprehensive set of realistic test cases. Based on these test cases, it is possible to evaluate and compare the performance of matching approaches with a special focus on multilingualism.

VU Research Portal

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

MAnnheim DOCument Server

Archivo Digital UPM

Speech rhythm: a metaphor?

Author: Abercrombie D
Barry WJ
Carter PM
Couper-Kuhlen E
Cummins F
Dankovičová J
Dellwo V
Eriksson A
Fletcher J
Francis Nolan
Gibbon D
Grabe E
Hae-Sung Jeon
Kim J-M
Koreman J
Lehiste I
Lin H
Lloyd James A
Mok P
Nazzi T
O'Dell M
O'Rourke E
Patel AD
Pike KL
Platt JT
Ramus F
Shattuck-Hufnagel S
Tongue RK
White L
Windmann A
Zvonik E
Publication venue: The Royal Society
Publication date: 11/11/2014

Is speech rhythmic? In the absence of evidence for a traditional view that languages strive to coordinate either syllables or stress-feet with regular time intervals, we consider the alternative that languages exhibit contrastive rhythm subsisting merely in the alternation of stronger and weaker elements. This is initially plausible, particularly for languages with a steep ‘prominence gradient’, i.e. a large disparity between stronger and weaker elements; but we point out that alternation is poorly achieved even by a ‘stress-timed’ language such as English, and, historically, languages have conspicuously failed to adopt simple phonological remedies that would ensure alternation. Languages seem more concerned to allow ‘syntagmatic contrast’ between successive units and to use durational effects to support linguistic functions than to facilitate rhythm. Furthermore, some languages (e.g. Tamil, Korean) lack the lexical prominence which would most straightforwardly underpin prominence alternation. We conclude that speech is not incontestably rhythmic, and may even be antirhythmic. However, its linguistic structure and patterning allow the metaphorical extension of rhythm in varying degrees and in different ways depending on the language, and it is this analogical process that allows speech to be matched to external rhythms.

CLoK

Crossref

PubMed Central