Search CORE

2,877 research outputs found

Neural Networks Revisited for Proper Name Retrieval from Diachronic Documents

Author: Fohr Dominique
Illina Irina
Publication venue: HAL CCSD
Publication date: 27/11/2015
Field of study

International audienceDeveloping high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. To increase the vocabulary coverage, a huge amount of text data should be used. In this paper, we extend the previously proposed neural networks for word embedding models: word vector representation proposed by Mikolov is enriched by an additional non-linear transformation. This model allows to better take into account lexical and semantic word relationships. In the context of broadcast news transcription and in terms of recall, experimental results show a good ability of the proposed model to select new relevant proper names

INRIA a CCSD electronic archive server

Improved Neural Bag-of-Words Model to Retrieve Out-of-Vocabulary Words in Speech Recognition

Author: Fohr Dominique
Illina Irina
Linares Georges
Sheikh Imran
Publication venue: 'International Speech Communication Association'
Publication date: 08/09/2016
Field of study

International audienceMany Proper Names (PNs) are Out-Of-Vocabulary (OOV) words for speech recognition systems used to process di-achronic audio data. To enable recovery of the PNs missed by the system, relevant OOV PNs can be retrieved by exploiting the semantic context of the spoken content. In this paper, we explore the Neural Bag-of-Words (NBOW) model, proposed previously for text classification, to retrieve relevant OOV PNs. We propose a Neural Bag-of-Weighted-Words (NBOW2) model in which the input embedding layer is augmented with a context anchor layer. This layer learns to assign importance to input words and has the ability to capture (task specific) keywords in a NBOW model. With experiments on French broadcast news videos we show that the NBOW and NBOW2 models outper-form earlier methods based on raw embeddings from LDA and Skip-gram. Combining NBOW with NBOW2 gives faster convergence during training

INRIA a CCSD electronic archive server

Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition

Author: Fohr Dominique
Illina Irina
Linares Georges
Sheikh Imran,
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/01/2017
Field of study

International audienceThe diachronic nature of broadcast news data leads to the problem of Out-Of-Vocabulary (OOV) words in Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Analysis of OOV words reveals that a majority of them are Proper Names (PNs). However PNs are important for automatic indexing of audio-video content and for obtaining reliable automatic transcriptions. In this paper, we focus on the problem of OOV PNs in diachronic audio documents. To enable recovery of the PNs missed by the LVCSR system, relevant OOV PNs are retrieved by exploiting the semantic context of the LVCSR transcriptions. For retrieval of OOV PNs, we explore topic and semantic context derived from Latent Dirichlet Allocation (LDA) topic models, continuous word vector representations and the Neural Bag-of-Words (NBOW) model which is capable of learning task specific word and context representations. We propose a Neural Bag-of-Weighted Words (NBOW2) model which learns to assign higher weights to words that are important for retrieval of an OOV PN. With experiments on French broadcast news videos we show that the NBOW and NBOW2 models outperform the methods based on raw embeddings from LDA and Skip-gram models. Combining the NBOW and NBOW2 models gives a faster convergence during training. Second pass speech recognition experiments, in which the LVCSR vocabulary and language model are updated with the retrieved OOV PNs, demonstrate the effectiveness of the proposed context models

INRIA a CCSD electronic archive server

Computational approaches to semantic change

Author: Batista-Navarro Riza
Boons Frank
Borin Lars
Ciobanu Alina Maria
Dinu Liviu P.
Duan Yijun
Dubossarsky Haim
Grewal Karan
Handl Julia
Haslam Nick
Hengchen Simon
Jatowt Adam
Mahanty Sampriti
McGillivray Barbara
Palma Marco
Perrone Valerio
Peterson Stellan
Schlechtweg Dominik
Sköldberg Emma
Smith Jim Q.
Tahmasebi Nina
Uban Ana-Sabina
Vatri Alessandro
Vylomova Ekaterina
Xu Yang
Yoshikawa Masatoshi
Zhang Zheng-sheng
Publication venue: Language Science Press
Publication date: 26/02/2021
Field of study

Semantic change — how the meanings of words change over time — has preoccupied scholars since well before modern linguistics emerged in the late 19th and early 20th century, ushering in a new methodological turn in the study of language change. Compared to changes in sound and grammar, semantic change is the least  understood. Ever since, the study of semantic change has progressed steadily, accumulating a vast store of knowledge for over a century, encompassing many languages and language families. Historical linguists also early on realized the potential of computers as research tools, with papers at the very first international conferences in computational linguistics in the 1960s. Such computational studies still tended to be small-scale, method-oriented, and qualitative. However, recent years have witnessed a sea-change in this regard. Big-data empirical quantitative investigations are now coming to the forefront, enabled by enormous advances in storage capability and processing power. Diachronic corpora have grown beyond imagination, defying exploration by traditional manual qualitative methods, and language technology has become increasingly data-driven and semantics-oriented. These developments present a golden opportunity for the empirical study of semantic change over both long and short time spans. A major challenge presently is to integrate the hard-earned  knowledge and expertise of traditional historical linguistics with  cutting-edge methodology explored primarily in computational linguistics. The idea for the present volume came out of a concrete response to this challenge.  The 1st International Workshop on Computational Approaches to Historical Language Change (LChange'19), at ACL 2019, brought together scholars from both fields. This volume offers a survey of this exciting new direction in the study of semantic change, a discussion of the many remaining challenges that we face in pursuing it, and considerably updated and extended versions of a selection of the contributions to the LChange'19 workshop, addressing both more theoretical problems —  e.g., discovery of "laws of semantic change" — and practical applications, such as information retrieval in longitudinal text archives

Language Science Press