51 research outputs found

    Protocol to design a CEFR-linked proficiency rating scale for oral production and app implementation

    Since 2010, it has been compulsory for students at Andalusian public universities to certify a minimum level of proficiency in a foreign language by the end of their degree. The University of Jaén (UJA) started to develop its own language proficiency tests in 2011 to give its students the possibility of taking such tests. This doctoral dissertation describes the design (based on the Common European Framework of Reference) and validation (through logistic parametric Rasch models) of one analytic scale (or rubric) for the oral production part of those proficiency tests. This process (carried out with the collaboration of 8 evaluation experts from different Andalusian universities and 128 candidates) has also yielded a protocol for designing rubrics that can be applied in other contexts. Finally, the dissertation describes the design of a digital application for mobile devices to implement the aforementioned rubrics. Thesis, Univ. Jaén, Departamento de Filología Inglesa, defended on 29 November 201

    Unsupervised zero-shot classification of Finnish documents using pre-trained language models

    In modern Natural Language Processing, document categorisation tasks can achieve success rates of over 95% using fine-tuned neural network models. However, so-called "zero-shot" situations, where specific training data is not available, are researched much less frequently. The objective of this thesis is to investigate how pre-trained Finnish language models fare when classifying documents in a completely unsupervised way: by relying only on their general "knowledge of the world" obtained during training, without using any additional data. Two datasets are created expressly for this study, since labelled and openly available datasets in Finnish are very uncommon: one is built using around 5k news articles from Yle, the Finnish Broadcasting Company, and the other from 100 pieces of Finnish legislation obtained from the Semantic Finlex data service. Several language representation models are built, based on the vector space model, by combining modular elements: different kinds of textual representations for documents and category labels, different algorithms that transform these representations into vectors (TF-IDF, Annif, fastText, LASER, FinBERT, S-BERT), different similarity measures and post-processing techniques (such as SVD and ensemble models). This approach allows for a variety of models to be tested. The combination of Annif for extracting keywords and fastText for producing word embeddings out of them achieves F1 scores of 0.64 on the Finlex dataset and 0.73-0.74 on the Yle datasets. Model ensembles are able to raise these figures by up to three percentage points. SVD can bring these numbers to 0.7 and 0.74-0.75 respectively, but these gains are not necessarily reproducible on unseen data. These results are distant from the ones obtained from state-of-the-art supervised models, but this is a method that is flexible, can be quickly deployed and, most importantly, does not depend on labelled data, which can be slow and expensive to produce. A reliable way to set the input parameter for SVD would be an important next step for the work done in this thesis.
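The vector-space idea described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes plain TF-IDF vectors (one of the vectorisers the abstract lists) and cosine similarity as the similarity measure, and the label descriptions, documents, and function names are hypothetical toy examples in English rather than Finnish.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per text."""
    token_lists = [tokenize(t) for t in texts]
    n = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF, so terms that occur in every text keep a non-zero weight.
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({term: count * idf[term] for term, count in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def zero_shot_classify(documents, label_descriptions):
    """Assign each document the label whose description vector is most similar.

    No labelled training documents are used: the only supervision-free signal
    is the textual description attached to each category label.
    """
    labels = list(label_descriptions)
    texts = list(documents) + [label_descriptions[label] for label in labels]
    vectors = tfidf_vectors(texts)
    doc_vecs = vectors[:len(documents)]
    label_vecs = vectors[len(documents):]
    predictions = []
    for doc_vec in doc_vecs:
        scores = [cosine(doc_vec, label_vec) for label_vec in label_vecs]
        predictions.append(labels[scores.index(max(scores))])
    return predictions


# Hypothetical toy data, for illustration only.
label_descriptions = {
    "sports": "sports football match team player goal",
    "politics": "politics government parliament election minister law",
}
documents = [
    "The team scored a late goal and won the football match.",
    "Parliament passed the new law after the election.",
]
predictions = zero_shot_classify(documents, label_descriptions)
print(predictions)  # ['sports', 'politics']
```

Swapping `tfidf_vectors` for a function that returns dense embeddings would keep the rest of the pipeline unchanged, which is roughly the modularity the abstract describes.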

    Development of linguistic linked open data resources for collaborative data-intensive research in the language sciences

    Making diverse data in linguistics and the language sciences open, distributed, and accessible: perspectives from language and language acquisition researchers and technical LOD (linked open data) researchers. This volume examines the challenges inherent in making diverse data in linguistics and the language sciences open, distributed, integrated, and accessible, thus fostering wide data sharing and collaboration. It is unique in integrating the perspectives of language researchers and technical LOD (linked open data) researchers. Reporting on both active research needs in the field of language acquisition and technical advances in the development of data interoperability, the book demonstrates the advantages of an international infrastructure for scholarship in the field of language sciences. With contributions by researchers who produce complex data content and scholars involved in both the technology and the conceptual foundations of LLOD (linguistics linked open data), the book focuses on the area of language acquisition because it involves complex and diverse data sets, cross-linguistic analyses, and urgent collaborative research. The contributors discuss a variety of research methods, resources, and infrastructures. Contributors: Isabelle Barrière, Nan Bernstein Ratner, Steven Bird, Maria Blume, Ted Caldwell, Christian Chiarcos, Cristina Dye, Suzanne Flynn, Claire Foley, Nancy Ide, Carissa Kang, D. Terence Langendoen, Barbara Lust, Brian MacWhinney, Jonathan Masci, Steven Moran, Antonio Pareja-Lora, Jim Reidy, Oya Y. Rieger, Gary F. Simons, Thorsten Trippel, Kara Warburton, Sue Ellen Wright, Claus Zin

    Development of Linguistic Linked Open Data Resources for Collaborative Data-Intensive Research in the Language Sciences

    This book is the product of an international workshop dedicated to addressing data accessibility in the linguistics field. It is therefore vital to the book’s mission that its content be open access. Linguistics as a field lags behind many others in data management and accessibility strategies. The problem is particularly acute in the subfield of language acquisition, where international linguistic sound files are needed for reference. Linguists' concerns are very much tied to the amount of information accumulated by individual researchers over the years that remains fragmented and inaccessible to the larger community. These concerns are shared by other fields, but linguistics to date has seen few efforts at addressing them. This collection, undertaken by a range of leading experts in the field, represents a big step forward. Its international scope and interdisciplinary combination of scholars, librarians, and data consultants will provide an important contribution to the field.

    The OA Diamond Journals Study. Part 1: Findings

    From June 2020 to February 2021, a consortium of 10 organisations undertook a large-scale study of open access journals across the world that are free for both readers and authors, usually referred to as “OA diamond journals”. This study was commissioned by cOAlition S in order to gain a better understanding of the OA diamond landscape.