
    Cross-Lingual Induction and Transfer of Verb Classes Based on Word Vector Space Specialisation

    Existing approaches to automatic VerbNet-style verb classification are heavily dependent on feature engineering and therefore limited to languages with mature NLP pipelines. In this work, we propose a novel cross-lingual transfer method for inducing VerbNets for multiple languages. To the best of our knowledge, this is the first study to demonstrate how the architectures for learning word embeddings can be applied to this challenging syntactic-semantic task. Our method uses cross-lingual translation pairs to tie each of the six target languages into a bilingual vector space with English, jointly specialising the representations to encode the relational information from English VerbNet. A standard clustering algorithm is then run on top of the VerbNet-specialised representations, using vector dimensions as features for learning verb classes. Our results show that the proposed cross-lingual transfer approach sets new state-of-the-art verb classification performance across all six target languages explored in this work.
    Comment: EMNLP 2017 (long paper)
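    As a rough illustration of the final step only, here is a minimal sketch of clustering specialised verb vectors with scikit-learn's KMeans. The verbs and random vectors below are placeholders, and the cross-lingual specialisation step itself is not shown; this is not the authors' actual pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical inputs: target-language verbs and their VerbNet-specialised
# embeddings (random vectors stand in for the specialised representations).
verbs = ["manger", "boire", "courir", "marcher"]
vectors = np.random.rand(len(verbs), 300)

# Cluster the specialised representations into verb classes, using the
# raw vector dimensions as features, as the abstract describes.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for verb, label in zip(verbs, labels):
    print(f"{verb}\tclass {label}")
```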

    A Correlational Encoder Decoder Architecture for Pivot Based Sequence Generation

    Interlingua-based Machine Translation (MT) aims to encode multiple languages into a common linguistic representation and then decode sentences in multiple target languages from this representation. In this work, we explore this idea in the context of neural encoder-decoder architectures, albeit on a smaller scale and without MT as the end goal. Specifically, we consider the case of three languages or modalities X, Z, and Y, wherein we are interested in generating sequences in Y starting from information available in X. However, no parallel training data is available between X and Y, but training data is available between X & Z and between Z & Y (as is often the case in many real-world applications). Z thus acts as a pivot/bridge. An obvious solution, which is perhaps less elegant but works very well in practice, is to train a two-stage model which first converts from X to Z and then from Z to Y. Instead, we explore an interlingua-inspired solution which jointly learns to (i) encode X and Z to a common representation and (ii) decode Y from this common representation. We evaluate our model on two tasks: (i) bridge transliteration and (ii) bridge captioning. We report promising results in both these applications and believe that this is a step in the right direction towards truly interlingua-inspired encoder-decoder architectures.
    Comment: 10 pages
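    A minimal PyTorch sketch of the pivot idea, under stated assumptions: two encoders map X and Z into a shared space, a single decoder generates Y from that space, and a simple MSE alignment term stands in for the paper's correlational objective. All dimensions, vocabularies, and tensors are illustrative, not the paper's actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Encodes a token sequence into a fixed-size shared representation."""
    def __init__(self, vocab, emb=64, hid=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.gru = nn.GRU(emb, hid, batch_first=True)

    def forward(self, x):                      # x: (batch, seq)
        _, h = self.gru(self.embed(x))
        return h.squeeze(0)                    # (batch, hid)

class Decoder(nn.Module):
    """Generates target tokens conditioned on the shared representation."""
    def __init__(self, vocab, emb=64, hid=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.gru = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, y_in, h0):               # teacher forcing
        o, _ = self.gru(self.embed(y_in), h0.unsqueeze(0))
        return self.out(o)                     # (batch, seq, vocab)

V = 1000
enc_x, enc_z, dec_y = Encoder(V), Encoder(V), Decoder(V)

# Toy batches standing in for X-Z and Z-Y parallel data.
x = torch.randint(0, V, (4, 7))
z = torch.randint(0, V, (4, 9))
y = torch.randint(0, V, (4, 8))

hx, hz = enc_x(x), enc_z(z)

# Decode Y from Z's encoding (uses Z-Y data) ...
logits = dec_y(y[:, :-1], hz)
recon = F.cross_entropy(logits.reshape(-1, V), y[:, 1:].reshape(-1))
# ... while pulling the X and Z encodings together (uses X-Z data).
align = F.mse_loss(hx, hz)  # stand-in for the paper's correlation term
(recon + align).backward()
```

    At test time, only enc_x and dec_y are needed: X is encoded into the shared space and Y is decoded directly, with no explicit pass through Z.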

    Reanalyzing language expectations: Native language knowledge modulates the sensitivity to intervening cues during anticipatory processing

    We investigated how native language experience shapes anticipatory language processing. Two groups of bilinguals (either Spanish or Basque natives) performed a word matching task (WordMT) and a picture matching task (PictureMT). They indicated whether the stimuli they visually perceived matched the noun they heard. Spanish noun endings were either diagnostic of the gender (transparent) or ambiguous (opaque). ERPs were time-locked to an intervening gender-marked determiner preceding the predicted noun. The determiner always agreed in gender with the following noun but could also introduce a mismatching noun, so it was not fully task-diagnostic. Evoked brain activity time-locked to the determiner was taken to reflect updating/reanalysis of the task-relevant preactivated representation. We focused on the timing of this effect by comparing gender-congruent and gender-incongruent determiners. In the WordMT, both groups showed a late N400 effect. Crucially, only Basque natives displayed an earlier P200 effect for determiners preceding transparent nouns. In the PictureMT, both groups showed an early P200 effect for determiners preceding opaque nouns. The determiners of transparent nouns triggered a negative effect at ~430 ms in Spanish natives, but at ~550 ms in Basque natives. This pattern of results supports a "retracing hypothesis" according to which the neurocognitive system navigates through the intermediate (sublexical and lexical) linguistic representations available from previous processing to evaluate the need for an update in the linguistic expectation concerning a target lexical item.
    Funding: Spanish Ministry of Economy and Competitiveness (MINECO), Agencia Estatal de Investigación (AEI), Fondo Europeo de Desarrollo Regional (FEDER) (grant PSI2015‐65694‐P to N. M.); Spanish Ministry of Economy and Competitiveness "Severo Ochoa" Programme for Centres/Units of Excellence in R&D (grant SEV‐2015‐490)
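    For readers unfamiliar with the method, here is a minimal MNE-Python sketch of the general kind of analysis described (not the study's actual pipeline): epoch EEG around determiner onset and contrast gender-congruent vs. gender-incongruent trials. The file name and event codes are assumptions for illustration.

```python
import mne

# Load a recording and locate stimulus triggers (placeholder file name).
raw = mne.io.read_raw_fif("subject01_raw.fif", preload=True)
events = mne.find_events(raw)
event_id = {"det/congruent": 1, "det/incongruent": 2}  # assumed codes

# Epoch around determiner onset, with a pre-stimulus baseline.
epochs = mne.Epochs(raw, events, event_id, tmin=-0.2, tmax=0.8,
                    baseline=(None, 0), preload=True)

# Condition averages; their difference wave carries effects like the
# P200 and N400 modulations discussed in the abstract.
evoked_con = epochs["det/congruent"].average()
evoked_inc = epochs["det/incongruent"].average()
diff = mne.combine_evoked([evoked_con, evoked_inc], weights=[1, -1])
diff.plot()
```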

    Set the controls for the heart of the alternation: Dahl’s Law in Kitharaka

    This paper examines Dahl’s Law, a voicing dissimilation process found in a number of Bantu languages, as it manifests in Kitharaka, and argues that it is best analysed within a framework of minimal (contrastive) feature specifications. We show that the standard account of [±voice] dissimilation runs into a number of problems in Kitharaka and propose a new analysis, couched within the framework of the Parallel Structures Model of Feature Geometry (Morén 2003; 2006) and Optimality Theory, thereby also addressing the question of the division of labour between constraints and representations. The analysis shows that it is crucial to look at the whole system of phonological oppositions and natural classes in Kitharaka to understand how the process works, ultimately also using loanwords to glean crucial insight into how the phoneme system of Kitharaka is organised.
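    To make the process concrete, here is a toy Python sketch of the textbook [±voice] dissimilation account of Dahl’s Law (the very account the paper argues is problematic for Kitharaka): a voiceless stop voices when the next syllable also begins with a voiceless stop. All forms are invented for illustration.

```python
# Map each voiceless stop to its voiced counterpart.
VOICELESS = {"p": "b", "t": "d", "k": "g"}

def dahls_law(syllables):
    """Apply [±voice] dissimilation across a list of CV syllables."""
    out = []
    for i, syl in enumerate(syllables):
        onset = syl[0]
        next_onset = syllables[i + 1][0] if i + 1 < len(syllables) else ""
        # Dissimilate: voice the onset if the next onset is also voiceless.
        if onset in VOICELESS and next_onset in VOICELESS:
            syl = VOICELESS[onset] + syl[1:]
        out.append(syl)
    return "".join(out)

print(dahls_law(["ka", "te"]))   # -> "gate": /k/ voices before voiceless /t/
print(dahls_law(["ka", "me"]))   # -> "kame": no voiceless trigger, no change
```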