On the use of probabilistic grammars in speech annotation and segmentation tasks

Author: Nesterenko Irina
Rauzy Stéphane
Publication venue: HAL CCSD
Publication date: 01/10/2007

The present paper explores the issue of corpus prosodic parsing in terms of prosodic words. This question is of importance in both speech processing and corpus annotation studies. We propose a method grounded on both statistical and symbolic (phonological) representations of tonal phenomena, using probabilistic grammars within which we implement a minimal prosodic hierarchical structure. Both stages, building the probabilistic grammar and testing it in prediction, are explored and evaluated quantitatively and qualitatively.

HAL AMU

Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

Author: Berzak Yevgeni
Korhonen Anna
O'Horan Helen
Poibeau Thierry
Ponti Edoardo Maria
Reichart Roi
Shutova Ekaterina
Vulić Ivan
Publication date: 27/02/2019

Linguistic typology aims to capture structural and semantic variation across the world's languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human-labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-employment of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.

arXiv.org e-Print Archive

Edinburgh Research Explorer

Apollo (Cambridge)

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Introduction to the special issue on cross-language algorithms and applications

Author: Bangalore Srinivas
Lambert Patrik
Montiel-Ponsoda Elena
Màrquez Lluís
Ruiz Costa-Jussà Marta
Publication date: 01/01/2016

With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

The pervasiveness of language contact: Evidence from negative existentials in Romeyka/Turkish code-switching

Author: Kaya Zeyneb
Publication venue: 'Linguistic Society of America'
Publication date: 27/04/2023

This paper investigates the morpho-syntactic features of language contact between the endangered Greek dialect Romeyka and Turkish. We analyze the use of the borrowed negative existential jok to (a) determine its role in Romeyka’s negation patterns, (b) examine the effects of contact in Romeyka through cross-linguistic comparisons of jok with Turkish and with forms of the dialect as spoken in Greece, and (c) apply the identified grammatical patterns of jok to Myers-Scotton’s linguistic explanations for code-switching phenomena in the Matrix Language Turnover Hypothesis. The analysis demonstrates the pervasive influence of Turkish on the morpho-syntax of Romeyka through the incorporation of Turkish grammatical structures. We observe changes in the fundamental predicate grammar that are aligned with Turkish and that are inconsistent with Pontic’s existential constructions, where the verb indicating existence is used. The patterns of contact confirm the Matrix Language hypothesis and provide evidence indicating that Romeyka may be undergoing language turnover. Our findings are relevant to further understanding code-switching among speakers of minority languages and to assessing the vitality of Romeyka in Turkey.

Proceedings Published by the LSA (Linguistic Society of America)

DIRECTIONAL HARMONIC SERIALISM

Author: Lamont Andrew
Publication venue: ScholarWorks@UMass Amherst
Publication date: 26/10/2022

This dissertation proposes a novel phonological framework, directional Harmonic Serialism, that synthesizes constraint-based, rule-based, and formal language theoretic approaches to phonology. I illustrate its advantages in the domains of feature spreading, quantity-insensitive footing, and autosegmental phonology. Specifically, I demonstrate that across these disparate domains, directional Harmonic Serialism makes empirical predictions that more tightly model natural language phonology than alternative theories and that it does so using fewer theoretical mechanisms. At a high level, the theory outperforms alternatives using a simpler, more restricted toolkit.

ScholarWorks@UMass Amherst

Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

Author: Berzak Y.
Korhonen A.
O'Horan H.
Poibeau T.
Ponti E.M.
Reichart R.
Shutova E.
Vulić I.
Publication venue: 'MIT Press - Journals'
Publication date: 01/09/2019

International Migration, Integration and Social Cohesion online publications

Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

Author: Berzak Yevgeni
Korhonen Anna
O'Horan Helen
Poibeau Thierry
Ponti Edoardo Maria
Reichart Roi
Shutova Ekaterina
Vulic Ivan
Publication venue: COMPUTATIONAL LINGUISTICS
Publication date: 09/08/2018

Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-utilization of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.

arXiv.org e-Print Archive

Edinburgh Research Explorer

Apollo (Cambridge)

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Normalization of Dutch user-generated content

Author: De Clercq Orphée
Desmet Bart
Hoste Veronique
Lefever Els
Schulz Sarah
Publication venue: INCOMA
Publication date: 01/01/2013

This paper describes a phrase-based machine translation approach to normalize Dutch user-generated content (UGC). We compiled a corpus of three different social media genres (text messages, message board posts and tweets) to have a sample of this recent domain. We describe the various characteristics of this noisy text material and explain how it has been manually normalized using newly developed guidelines. For the automatic normalization task we focus on text messages, and find that a cascaded SMT system, where a token-based module is followed by a translation at the character level, gives the best word error rate reduction. After these initial experiments, we investigate the system's robustness on the complete domain of UGC by testing it on the other two social media genres, and find that the cascaded approach performs best on these genres as well. To our knowledge, we deliver the first proof-of-concept system for Dutch UGC normalization, which can serve as a baseline for future work.

CiteSeerX

Ghent University Academic Bibliography

Archivsystem Ask23