191 research outputs found

    Development of linguistic linked open data resources for collaborative data-intensive research in the language sciences

    Making diverse data in linguistics and the language sciences open, distributed, and accessible: perspectives from language and language acquisition researchers and technical LOD (linked open data) researchers. This volume examines the challenges inherent in making diverse data in linguistics and the language sciences open, distributed, integrated, and accessible, thus fostering wide data sharing and collaboration. It is unique in integrating the perspectives of language researchers and technical LOD (linked open data) researchers. Reporting on both active research needs in the field of language acquisition and technical advances in the development of data interoperability, the book demonstrates the advantages of an international infrastructure for scholarship in the field of language sciences. With contributions by researchers who produce complex data content and scholars involved in both the technology and the conceptual foundations of LLOD (linguistic linked open data), the book focuses on the area of language acquisition because it involves complex and diverse data sets, cross-linguistic analyses, and urgent collaborative research. The contributors discuss a variety of research methods, resources, and infrastructures. Contributors: Isabelle Barrière, Nan Bernstein Ratner, Steven Bird, Maria Blume, Ted Caldwell, Christian Chiarcos, Cristina Dye, Suzanne Flynn, Claire Foley, Nancy Ide, Carissa Kang, D. Terence Langendoen, Barbara Lust, Brian MacWhinney, Jonathan Masci, Steven Moran, Antonio Pareja-Lora, Jim Reidy, Oya Y. Rieger, Gary F. Simons, Thorsten Trippel, Kara Warburton, Sue Ellen Wright, Claus Zin

    Development of Linguistic Linked Open Data Resources for Collaborative Data-Intensive Research in the Language Sciences

    This book is the product of an international workshop dedicated to addressing data accessibility in the linguistics field. It is therefore vital to the book’s mission that its content be open access. Linguistics as a field remains behind many others in its data management and accessibility strategies. The problem is particularly acute in the subfield of language acquisition, where international linguistic sound files are needed for reference. Linguists' concerns are very much tied to the amount of information accumulated by individual researchers over the years that remains fragmented and inaccessible to the larger community. These concerns are shared by other fields, but linguistics to date has seen few efforts at addressing them. This collection, undertaken by a range of leading experts in the field, represents a big step forward. Its international scope and interdisciplinary combination of scholars, librarians, and data consultants will provide an important contribution to the field.

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 tabl

    Language contact: Bridging the gap between individual interactions and areal patterns

    Contact linguistics is the overarching term for a highly diversified field with branches that connect to such widely divergent areas as historical linguistics, typology, sociolinguistics, psycholinguistics, and grammatical theory. Because of this diversification, there is a risk of fragmentation and lack of interaction between the different subbranches of contact linguistics. Nevertheless, the different approaches share the general goal of accounting for the results of interacting linguistic systems. This common goal opens up possibilities for active communication, cooperation, and coordination between the different branches of contact linguistics. This book, therefore, explores the extent to which contact linguistics can be viewed as a coherent field, and whether the advances achieved in a particular subfield can be translated to others. In this way our aim is to encourage a boundary-free discussion between different types of specialists of contact linguistics, and to stimulate cross-pollination between them

    Bridging the gap between individual interactions and areal patterns

    Synopsis: Contact linguistics is the overarching term for a highly diversified field with branches that connect to such widely divergent areas as historical linguistics, typology, sociolinguistics, psycholinguistics, and grammatical theory. Because of this diversification, there is a risk of fragmentation and lack of interaction between the different subbranches of contact linguistics. Nevertheless, the different approaches share the general goal of accounting for the results of interacting linguistic systems. This common goal opens up possibilities for active communication, cooperation, and coordination between the different branches of contact linguistics. This book, therefore, explores the extent to which contact linguistics can be viewed as a coherent field, and whether the advances achieved in a particular subfield can be translated to others. In this way our aim is to encourage a boundary-free discussion between different types of specialists of contact linguistics, and to stimulate cross-pollination between them

    Quantifying the psychological properties of words

    This thesis explores the psychological properties of words – the idea that words carry links to additional information beyond their dictionary meaning. It does so by presenting three distinct publications and an applied project, the Macroscope. The published research respectively covers: the modelling of language networks to explain lexical growth; the use of high dimensional vector representations of words to discuss language learning; and the collection of a normative dataset of single word humour ratings. The first publication outlines the use of network science in psycholinguistics. The methodology is discussed, providing clear guidelines on the application of networks when answering psychologically motivated questions. A selection of psychological studies is presented as a demonstration of use cases for networks in cognitive psychology. The second publication uses referent feature norms to represent words in a high dimensional vector space. A correlative link between referent distinctiveness and age of acquisition is proposed. The shape bias literature (the idea that children only pay attention to the shape of objects early on) is evaluated in relation to the findings. The third publication collects and shares a normative dataset of single word humour ratings. Descriptive properties of the dataset are outlined and the potential future use in the field of humour is discussed. Finally, the thesis presents the Macroscope, a collaborative project put together with Li Ying. The Macroscope is an online platform, allowing for easy analysis of the psychological properties of target words. The platform is showcased, and its full functionality is presented, including visualisation examples. 
    Overall, the thesis aims to give researchers all that’s necessary to start working with psychological properties of words – the understanding of network science in psycholinguistics, high dimensional vector spaces, normative datasets and the applied use of all the above through the Macroscope.

    Lexical Access in L1 Attrition—Competition versus Frequency: A Comparison of Turkish and Moroccan Attriters in the Netherlands

    Lexical access and lexical diversity are often assumed to be vulnerable to first language (L1) attrition. They also differ between monolinguals and nonimmersed bilinguals. This raises the question of whether lexical attrition can be ascribed to nonuse or to competition between the two languages. We compare two populations of late L2 learners of Dutch living in the Netherlands. One of them was largely monolingual prior to emigration (Turkish migrants), while the other comes from a highly multilingual society (Morocco). While both experimental populations should be affected by erosion due to nonuse, we expect competition effects to be more strongly pronounced when compared against a monolingual versus a multilingual baseline population. The results show that this is not the case, with attrition effects being even stronger in the Moroccan group than in the Turkish group. Furthermore, there is no impact of individual measures of frequency of exposure or language attitudes among the attriters. We conclude that being immersed in an L2 environment leads to weakening of lexical access.

    Limitations and possibilities of machine translation: a case study

    Dissertation (master's) - Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, Programa de Pós-Graduação em Letras/Inglês e Literatura Correspondente. This work presents the results of a case study on the translation of the English pronoun "it" into Portuguese. It also gives a brief overview of the development of machine translation from its beginnings to the present day. A parallel corpus of approximately forty-five thousand words in the source and target languages was collected. An annotation scheme developed specifically for this study was used to classify the 305 occurrences of the pronoun "it". The elements of the annotation are syntactic function, type of antecedent, and processing strategy, all of which are discussed in this dissertation. The results are compared with translations produced by commercial machine translation systems, taking as a benchmark the solutions offered by human translators in the corpus. Suggestions are made for possible corpus-based improvements to existing systems. Some aspects of the corpus approach are compared with the principles of current machine translation approaches, in an attempt to enrich the discussion of current trends in this area.