191 research outputs found
Development of linguistic linked open data resources for collaborative data-intensive research in the language sciences
Making diverse data in linguistics and the language sciences open, distributed, and accessible: perspectives from language/language acquistiion researchers and technical LOD (linked open data) researchers. This volume examines the challenges inherent in making diverse data in linguistics and the language sciences open, distributed, integrated, and accessible, thus fostering wide data sharing and collaboration. It is unique in integrating the perspectives of language researchers and technical LOD (linked open data) researchers. Reporting on both active research needs in the field of language acquisition and technical advances in the development of data interoperability, the book demonstrates the advantages of an international infrastructure for scholarship in the field of language sciences. With contributions by researchers who produce complex data content and scholars involved in both the technology and the conceptual foundations of LLOD (linguistics linked open data), the book focuses on the area of language acquisition because it involves complex and diverse data sets, cross-linguistic analyses, and urgent collaborative research. The contributors discuss a variety of research methods, resources, and infrastructures. Contributors Isabelle Barrière, Nan Bernstein Ratner, Steven Bird, Maria Blume, Ted Caldwell, Christian Chiarcos, Cristina Dye, Suzanne Flynn, Claire Foley, Nancy Ide, Carissa Kang, D. Terence Langendoen, Barbara Lust, Brian MacWhinney, Jonathan Masci, Steven Moran, Antonio Pareja-Lora, Jim Reidy, Oya Y. Rieger, Gary F. Simons, Thorsten Trippel, Kara Warburton, Sue Ellen Wright, Claus Zin
Development of Linguistic Linked Open Data Resources for Collaborative Data-Intensive Research in the Language Sciences
This book is the product of an international workshop dedicated to addressing data accessibility in the linguistics field. It is therefore vital to the book’s mission that its content be open access. Linguistics as a field remains behind many others as far as data management and accessibility strategies. The problem is particularly acute in the subfield of language acquisition, where international linguistic sound files are needed for reference. Linguists' concerns are very much tied to amount of information accumulated by individual researchers over the years that remains fragmented and inaccessible to the larger community. These concerns are shared by other fields, but linguistics to date has seen few efforts at addressing them. This collection, undertaken by a range of leading experts in the field, represents a big step forward. Its international scope and interdisciplinary combination of scholars/librarians/data consultants will provide an important contribution to the field
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures adopted in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118
pages, 8 figures, 1 tabl
Language contact: Briding the gap between individual interactions and areal patterns
Contact linguistics is the overarching term for a highly diversified field with branches that connect to such widely divergent areas as historical linguistics, typology, sociolinguistics, psycholinguistics, and grammatical theory. Because of this diversification, there is a risk of fragmentation and lack of interaction between the different subbranches of contact linguistics. Nevertheless, the different approaches share the general goal of accounting for the results of interacting linguistic systems. This common goal opens up possibilities for active communication, cooperation, and coordination between the different branches of contact linguistics. This book, therefore, explores the extent to which contact linguistics can be viewed as a coherent field, and whether the advances achieved in a particular subfield can be translated to others. In this way our aim is to encourage a boundary-free discussion between different types of specialists of contact linguistics, and to stimulate cross-pollination between them
Bridging the gap between individual interactions and areal patterns
Synopsis:
Contact linguistics is the overarching term for a highly diversified field with branches that connect to such widely divergent areas as historical linguistics, typology, sociolinguistics, psycholinguistics, and grammatical theory. Because of this diversification, there is a risk of fragmentation and lack of interaction between the different subbranches of contact linguistics. Nevertheless, the different approaches share the general goal of accounting for the results of interacting linguistic systems. This common goal opens up possibilities for active communication, cooperation, and coordination between the different branches of contact linguistics. This book, therefore, explores the extent to which contact linguistics can be viewed as a coherent field, and whether the advances achieved in a particular subfield can be translated to others. In this way our aim is to encourage a boundary-free discussion between different types of specialists of contact linguistics, and to stimulate cross-pollination between them
Quantifying the psychological properties of words
This thesis explores the psychological properties of words – the idea that words carry links to additional information beyond their dictionary meaning. It does so by presenting three distinct publications and an applied project, the Macroscope. The published research respectively covers: the modelling of language networks to explain lexical growth; the use of high dimensional vector representations of words to discuss language learning; and the collection of a normative dataset of single word humour ratings. The first publication outlines the use of network science in psycholinguistics. The methodology is discussed, providing clear guidelines on the application of networks when answering psychologically motivated questions. A selection of psychological studies is presented as a demonstration of use cases for networks in cognitive psychology. The second publication uses referent feature norms to represent words in a high dimensional vector space. A correlative link between referent distinctiveness and age of acquisition is proposed. The shape bias literature (the idea that children only pay attention to the shape of objects early on) is evaluated in relation to the findings. The third publication collects and shares a normative dataset of single word humour ratings. Descriptive properties of the dataset are outlined and the potential future use in the field of humour is discussed. Finally, the thesis presents the Macroscope, a collaborative project put together with Li Ying. The Macroscope is an online platform, allowing for easy analysis of the psychological properties of target words. The platform is showcased, and its full functionality is presented, including visualisation examples. Overall, the thesis aims to give researchers all that’s necessary to start working with psychological properties of words – the understanding of network science in psycholinguistics, high dimensional vector spaces, normative datasets and the applied use of all the above through the Macroscope
Lexical Access in L1 Attrition—Competition versus Frequency: A Comparison of Turkish and Moroccan Attriters in the Netherlands
Lexical access and lexical diversity are often assumed to be vulnerable to first language (L1) attrition. They also differ between monolinguals and nonimmersed bilinguals. This raises the question whether lexical attrition can be ascribed to nonuse or to competition between the two languages. We compare two populations of late L2 learners of Dutch living in the Netherlands. One of them was largely monolingual prior to emigration (Turkish migrants), while the other comes from a highly multilingual society (Morocco). While both experimental populations should be affected by erosion due to nonuse, we expect competition effects to be more strongly pronounced when compared against a monolingual versus a multilingual baseline population. The results show that this is not the case with attrition effects being even stronger in the Moroccan group than in the Turkish group. Furthermore, there is no impact of individual measures of frequency of exposure or language attitudes among the attriters. We conclude that being immersed in an L2 environment leads to weakening of lexical access
Limitations and possibilities of machine translation: a case study
Dissertação (mestrado) - Universidade Federal de Santa Catarina, Centro de Comunicação e ExpressĂŁo. Programa de PĂłs-Graduação em Letras/InglĂŞs e Literatura CorrespondenteEste trabalho apresenta resultados de um estudo de caso sobre a tradução do pronome inglĂŞs it para o portuguĂŞs. Apresenta tambĂ©m um breve panorama geral do desenvolvimento da tradução de máquina desde seu inĂcio atĂ© a atualidade. Um corpus paralelo de aproximadamente quarenta e cinco mil palavras das lĂnguas de partida e chegada foi coletado. TambĂ©m foi utilizado um esquema de anotação especificamente desenvolvido para os propĂłsitos deste estudo, a fim de classificar as 305 ocorrĂŞncias do pronome it. Os elementos que compõem a anotação sĂŁo: função sintática, tipo de antecedente e estratĂ©gia de processamento, os quais sĂŁo discutidos nesta dissertação. Os resultados sĂŁo comparados a traduções de sistemas comerciais de tradução de máquina, tendo como parâmetro soluções apresentadas por tradutores humanos no corpus. Sugestões sĂŁo feitas quanto a possĂveis melhorias dos sistemas existentes com base em corpus. Alguns aspectos da abordagem de corpus sĂŁo comparados com os princĂpios das presentes abordagens de tradução de máquina, numa tentativa de enriquecer a discussĂŁo sobre as atuais tendĂŞncias nesta área
- …