
    Probabilistic Models for Alignment of Etymological Data

    Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), 246-253. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16955

    The influence of film music on moral judgments of movie scenes and felt emotions

    This publication is freely accessible with the permission of the rights owner under an Alliance licence or a national licence (funded by the DFG, German Research Foundation). Music can modulate perceptions, actions, and judgments in everyday situations. The aim of this study was to investigate a potential influence of music on moral judgments in the context of film reception. In the course of an online experiment, 252 participants were assigned to three different experimental conditions (no, positive, or negative music). Participants were requested to assess actions shown in two 2–3-minute audio-visual film excerpts with regard to their perceived moral rightness and to report induced emotions after watching the film clips. Afterwards, they were asked to complete the MFQ-30 questionnaire measuring the foundations of their moral judgments. Results revealed that in one of four cases (i.e. happiness in film excerpt 1), music had a significant effect on recipients’ emotions and also indirectly influenced their moral judgment. In three of four cases, however, the intended emotion induction through film music did not succeed, and thus a significant indirect influence of music on moral judgment was not found. Furthermore, associations between moral foundations, perceived rightness of action, and induced emotions were observed. Future lab studies are needed to investigate potential moderating influences of the experimental environment on emotion induction through film music

    Computational Approaches to Historical Language Comparison

    The chapter discusses recently developed computational techniques providing concrete help in addressing various tasks in historical language comparison, focusing specifically on those tasks which are typically subsumed under the framework of the comparative method. These include the proof of relationship, cognate and correspondence detection, phonological reconstruction and sound law induction, and the reconstruction of evolutionary scenarios

    Character-level and syntax-level models for low-resource and multilingual natural language processing

    There are more than 7000 languages in the world, but only a small portion of them benefit from Natural Language Processing resources and models. Although languages generally present different characteristics, “cross-lingual bridges” can be exploited, such as transliteration signals and word alignment links. Such information, together with the availability of multiparallel corpora and the urge to overcome language barriers, motivates us to build models that represent more of the world’s languages. This thesis investigates cross-lingual links for improving the processing of low-resource languages with language-agnostic models at the character and syntax level. Specifically, we propose to (i) use orthographic similarities and transliteration between Named Entities and rare words in different languages to improve the construction of Bilingual Word Embeddings (BWEs) and named entity resources, and (ii) exploit multiparallel corpora for projecting labels from high- to low-resource languages, thereby gaining access to weakly supervised processing methods for the latter. In the first publication, we describe our approach for improving the translation of rare words and named entities for the Bilingual Dictionary Induction (BDI) task, using orthography and transliteration information. In our second work, we tackle BDI by enriching BWEs with orthography embeddings and a number of other features, using our classification-based system to overcome script differences among languages. The third publication describes cheap cross-lingual signals that should be considered when building mapping approaches for BWEs since they are simple to extract, effective for bootstrapping the mapping of BWEs, and overcome the failure of unsupervised methods. The fourth paper shows our approach for extracting a named entity resource for 1340 languages, including very low-resource languages from all major areas of linguistic diversity. 
We exploit parallel corpus statistics and transliteration models and obtain improved performance over prior work. Lastly, the fifth work models annotation projection as a graph-based label propagation problem for the part of speech tagging task. Part of speech models trained on our labeled sets outperform prior work for low-resource languages like Bambara (an African language spoken in Mali), Erzya (a Uralic language spoken in Russia’s Republic of Mordovia), Manx (the Celtic language of the Isle of Man), and Yoruba (a Niger-Congo language spoken in Nigeria and surrounding countries)

    A quantitative approach to social and geographical dialect variation

    Automated methods for the investigation of language contact, with a focus on lexical borrowing

    While language contact has so far been predominantly studied on the basis of detailed case studies, the emergence of methods for phylogenetic reconstruction and automated word comparison – as a result of the recent quantitative turn in historical linguistics – has also resulted in new proposals to study language contact situations by means of automated approaches. This study provides a concise introduction to the most important approaches which have been proposed in the past, presenting methods that use (A) phylogenetic networks to detect reticulation events during language history, (B) sequence comparison methods in order to identify borrowings in multilingual datasets, and (C) arguments for the borrowability of shared traits to decide if traits have been borrowed or inherited. While the overview focuses on approaches dealing with lexical borrowing, questions of general contact inference will also be discussed where applicable

    Information-theoretic causal inference of lexical flow

    This volume seeks to infer large phylogenetic networks from phonetically encoded lexical data and contribute in this way to the historical study of language varieties. The technical step that enables progress in this case is the use of causal inference algorithms. Sample sets of words from language varieties are preprocessed into automatically inferred cognate sets, and then modeled as information-theoretic variables based on an intuitive measure of cognate overlap. Causal inference is then applied to these variables in order to determine the existence and direction of influence among the varieties. The directed arcs in the resulting graph structures can be interpreted as reflecting the existence and directionality of lexical flow, a unified model which subsumes inheritance and borrowing as the two main ways of transmission that shape the basic lexicon of languages. A flow-based separation criterion and domain-specific directionality detection criteria are developed to make existing causal inference algorithms more robust against imperfect cognacy data, giving rise to two new algorithms. The Phylogenetic Lexical Flow Inference (PLFI) algorithm requires lexical features of proto-languages to be reconstructed in advance, but yields fully general phylogenetic networks, whereas the more complex Contact Lexical Flow Inference (CLFI) algorithm treats proto-languages as hidden common causes, and only returns hypotheses of historical contact situations between attested languages. The algorithms are evaluated both against a large lexical database of Northern Eurasia spanning many language families, and against simulated data generated by a new model of language contact that builds on the opening and closing of directional contact channels as primary evolutionary events. The algorithms are found to infer the existence of contacts very reliably, whereas the inference of directionality remains difficult. 
This currently limits the new algorithms to a role as exploratory tools for quickly detecting salient patterns in large lexical datasets, but it should soon be possible for the framework to be enhanced e.g. by confidence values for each directionality decision

    Approches Neuronales pour la Reconstruction de Mots Historiques (Neural Approaches to Historical Word Reconstruction)

    In historical linguistics, cognates are words that descend in direct line from a common ancestor, called their proto-form, and therefore are representative of their respective languages' evolution through time, as well as of the relations between these languages synchronically. As they reflect the phonetic history of the languages they belong to, they allow linguists to better determine all manner of synchronic and diachronic linguistic relations (etymology, phylogeny, sound correspondences). Cognates of related languages tend to be linked through systematic phonetic correspondence patterns, which neural networks could well learn to model, being especially good at learning latent patterns. In this dissertation, we seek to methodically study the applicability of machine-translation-inspired neural networks to historical word prediction, relying on the surface similarity of both tasks. We first create an artificial dataset inspired by the phonetic and phonotactic rules of Romance languages, which allows us to vary task complexity and data size in a controlled environment, thereby identifying whether and under which conditions neural networks are applicable. We then extend our work to real datasets (after having updated an etymological database to gather more data), study the transferability of our conclusions to real data, and then the applicability of a number of data augmentation techniques to the task, to try to mitigate low-resource situations. We finally investigate in more detail our best models, multilingual neural networks. We first confirm that, on the surface, they seem to capture language relatedness information and phonetic similarity, in line with prior work. We then discover, by probing them, that the information they store is actually more complex: our multilingual models encode a phonetic language model and learn enough latent historical information to allow decoders to reconstruct the (unseen) proto-form of the studied languages as well as or better than bilingual models trained specifically on the task. This latent information is likely the explanation for the success of multilingual methods in previous work
