1,159 research outputs found

    Quantifying the Dialect Gap and its Correlates Across Languages

    Historically, researchers and consumers have noticed a decrease in quality when applying NLP tools to minority variants of languages (e.g. Puerto Rican Spanish or Swiss German), but studies exploring this have been limited to a select few languages. Additionally, past studies have mainly been conducted in a monolingual context, so cross-linguistic trends have not been identified and tied to external factors. In this work, we conduct a comprehensive evaluation of the most influential, state-of-the-art large language models (LLMs) across two high-use applications, machine translation and automatic speech recognition, to assess their functionality on the regional dialects of several high- and low-resource languages. We also analyze how the regional dialect gap is correlated with economic, social, and linguistic factors. The impact of training data, including related factors like dataset size and its construction procedure, is shown to be significant but not consistent across models or languages, meaning a one-size-fits-all approach cannot be taken in solving the dialect gap. This work lays a foundation for dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection. Comment: Accepted to EMNLP Findings 202

    Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition

    The Huqariq corpus is a multilingual collection of speech from native Peruvian languages. The transcribed corpus is intended for the research and development of speech technologies to preserve endangered languages in Peru. Huqariq is primarily designed for the development of automatic speech recognition, language identification and text-to-speech tools. To collect the corpus sustainably, we use crowdsourcing. Huqariq includes four native languages of Peru, and it is expected to reach up to 20 of the 48 native languages in Peru by the end of 2022. The corpus has 220 hours of transcribed audio recorded by more than 500 volunteers, making it the largest speech corpus for native languages in Peru. To verify the quality of the corpus, we present speech recognition experiments using 220 hours of fully transcribed audio. Comment: Language Resources and Evaluation Conference (LREC 2022)

    Social Justice Documentary: Designing for Impact

    Explores current methodologies for assessing social issue documentary films by combining strategic design and evaluation of multiplatform outreach and impact, including documentaries' role in network- and field-building. Includes six case studies.

    Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

    Peer reviewed

    The Andean Tribunal of Justice and Its Interlocutors: Understanding Preliminary Reference Patterns in the Andean Community

    In the European Union, national courts have been key intermediaries in helping to bolster and expand the authority of the European Court of Justice (ECJ) through its preliminary reference mechanism. This article analyzes the role of national judges in the Andean Community, a regional legal system whose judicial institution, the Andean Tribunal of Justice (ATJ), was modeled directly on its European predecessor. Our analysis is based on an original coding of every publicly available national court referral to the ATJ from 1987 to 2007 and interviews with over forty participants in the Andean legal system. We find that the relationship between the ATJ and national judges differs significantly from the relationship between the ECJ and its domestic judicial colleagues. As in Europe, references from national judges account for the vast majority of cases on the ATJ's docket. But unlike in Europe, national courts are mostly passive intermediaries. Our coding reveals that national judges do not pose provocative questions to the ATJ, and that there is significant cross-national variation in referral patterns. Interviews corroborate what the data suggest: national judges have a circumscribed understanding of what Andean law requires of them. More than 90% of references involve technical issues of Andean intellectual property (IP) law and the registration decisions of domestic IP administrative agencies. National judges have embraced the ATJ's active role in IP disputes because of the support of these agencies, which seek the Tribunal's guidance to interpret vague areas of Andean law. Outside the area of IP, national judges are far more reluctant, contributing to the limited penetration of Andean law into national legal orders. We conclude by comparing the role of national judges in Europe to their role in the Andean context, extracting broader insights about the role of national judges in building international rules of law.

    Pathways through early childhood education in Ethiopia, India and Peru: rights, equity and diversity

    The potential of quality early childhood and primary education to help break intergenerational cycles of poverty is widely recognized. My focus is on the extent to which this potential is being translated into reality through the positive implementation of early childhood policy principles in practice. The paper summarizes evidence from Young Lives research on early transitions, drawing on both survey data and in-depth qualitative research with 6,000 children from the Young Lives younger cohort in Ethiopia, Andhra Pradesh (India) and Peru. Primary education is still being consolidated in Ethiopia, and preschool education is a minority urban experience, offered mainly by the private sector. Peru offers a very different story, with a well-established government primary and preschool system but concerns about quality and coordination between sectors. Andhra Pradesh offers the most complex set of challenges, with a long-established government early childhood care and education system but a growing trend toward the use of private services, even among the poorest communities. The paper offers five general conclusions, on the importance of: ensuring quality and equity in early education; better coordinated preschool and school systems; targeting the most vulnerable and disadvantaged children; recognizing the full range of equity issues; and ensuring more effective governance, including governance of the private sector.