Identifying and Modeling Code-Switched Language
Code-switching is the phenomenon in which bilingual speakers alternate between two or more languages during written or spoken communication. Given the large populations that routinely code-switch, developing language technologies that can process code-switched language is extremely important. Current NLP and speech models break down on code-switched data, interrupting the language processing pipeline in back-end systems and forcing users to communicate in ways that are unnatural to them.
There are four main challenges that arise in building code-switched models: lack of code-switched data on which to train generative language models; lack of multilingual language annotations on code-switched examples which are needed to train supervised models; little understanding of how to leverage monolingual and parallel resources to build better code-switched models; and finally, how to use these models to learn why and when code-switching happens across language pairs. In this thesis, I look into different aspects of these four challenges.
The first part of this thesis focuses on how to obtain reliable corpora of code-switched language. We collected a large corpus of code-switched language from social media using a combination of sets of anchor words, which exist in only one of the languages, and sentence-level language taggers. The resulting corpus is superior to corpora collected with other strategies in terms of the amount and type of bilingualism it contains, and it also helps train better language tagging models. We also proposed a new annotation scheme to obtain part-of-speech tags for code-switched English-Spanish language. The scheme is composed of three subtasks: automatic labeling, word-specific question labeling, and question-tree word labeling. The part-of-speech labels obtained with it for the Miami Bangor corpus of English-Spanish conversational speech show very high agreement and accuracy.
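As a rough illustration of the anchor-word idea, the Python sketch below keeps a sentence only if it contains at least one anchor word from each language, assuming an anchor word is one that appears in just one of two vocabularies. The toy vocabularies and the helper names (`anchor_words`, `is_candidate`, `filter_corpus`) are hypothetical, and the sentence-level language-tagging stage that the thesis combines with the anchor words is not modeled here.

```python
# Hypothetical anchor-word filter for collecting candidate code-switched sentences.
# Assumption: an anchor word appears in only one language's vocabulary, and a
# sentence is kept only if it contains anchors from both languages.
# The sentence-level language tagger used in the thesis is NOT modeled here.

from typing import Iterable, List, Set, Tuple


def anchor_words(vocab_a: Set[str], vocab_b: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return the words unique to each vocabulary (the two anchor sets)."""
    return vocab_a - vocab_b, vocab_b - vocab_a


def is_candidate(sentence: str, anchors_a: Set[str], anchors_b: Set[str]) -> bool:
    """True if the sentence has at least one anchor word from each language."""
    tokens = {tok.strip(".,!?¿¡").lower() for tok in sentence.split()}
    return bool(tokens & anchors_a) and bool(tokens & anchors_b)


def filter_corpus(sentences: Iterable[str], vocab_a: Set[str], vocab_b: Set[str]) -> List[str]:
    """Keep only sentences that mix anchor words from both languages."""
    anchors_a, anchors_b = anchor_words(vocab_a, vocab_b)
    return [s for s in sentences if is_candidate(s, anchors_a, anchors_b)]


if __name__ == "__main__":
    # Toy vocabularies; "no" is shared, so it is not an anchor for either language.
    english = {"i", "want", "to", "go", "the", "beach", "no"}
    spanish = {"yo", "quiero", "ir", "a", "la", "playa", "no"}
    sentences = [
        "I want to go a la playa",
        "I want to go to the beach",
        "Yo quiero ir a la playa",
        "No",
    ]
    print(filter_corpus(sentences, english, spanish))
```

Only the mixed sentence survives: monolingual sentences and sentences made up solely of shared words are discarded, which is why a second, tagger-based stage is still useful in practice.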
The second part of this thesis focuses on the tasks of part-of-speech tagging and language modeling. For the first task, we proposed a state-of-the-art approach to part-of-speech tagging of code-switched English-Spanish data based on recurrent neural networks. Our models were tested on the Miami Bangor corpus on two tasks: POS tagging alone, for which we achieved 96.34% accuracy, and joint part-of-speech and language ID tagging, which achieved similar POS tagging accuracy (96.39%) and very high language ID accuracy (98.78%).
For the task of language modeling, we first conducted an exhaustive analysis of the relationship between cognate words and code-switching. We then proposed a set of cognate-based features that improved language modeling performance by 12% relative. Furthermore, we showed that these features can also be used across language pairs and still yield performance improvements.
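To make "12% relative" concrete, here is the usual definition of a relative improvement, assuming (the abstract does not say so) that language modeling performance is measured in perplexity, where lower is better; the subscripts "base" and "cog" for the baseline and cognate-feature models are notation introduced here:

```latex
\Delta_{\mathrm{rel}}
  = \frac{\mathrm{PPL}_{\mathrm{base}} - \mathrm{PPL}_{\mathrm{cog}}}{\mathrm{PPL}_{\mathrm{base}}}
  = 0.12
\quad\Longrightarrow\quad
\mathrm{PPL}_{\mathrm{cog}} = 0.88\,\mathrm{PPL}_{\mathrm{base}}
```

Under this assumption, a hypothetical baseline perplexity of 100 would fall to 88; the gain scales with the baseline rather than being a fixed 12-point drop.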
Finally, we tackled the question of how to use monolingual resources for code-switching models by pre-training state-of-the-art cross-lingual language models on large monolingual corpora and fine-tuning them on language modeling and word-level language tagging of code-switched data. We obtained state-of-the-art results on both tasks.
Whites' physiological and psychological reactions toward affirmative action programs
Discrimination has many effects on the individual or group being discriminated against, regardless of the reasons for the discrimination. Further exploration is needed of discrimination processes and their relationships to physiological and psychological outcomes, both of which may, over time, become problematic and affect the health and well-being of individuals.
PHOTOGRAMMETRIC SYSTEM FOR MOVEMENT ANALYSIS IN TEAM SPORTS
This work develops a system based on photogrammetric methods that makes it possible to analyse player movements in team sports such as soccer, indoor soccer, handball, and basketball. It can also be used in individual sports such as tennis, with the same aim of quantifying player movements. The system, called "RUNNER", quantifies player movements by obtaining several kinematic parameters that are useful for deducing the physical loads on players under real conditions. In this work we present some results obtained from soccer and indoor soccer studies.
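The kind of kinematic parameters mentioned above can be sketched in a few lines once player positions are known. The Python example below is not RUNNER's method; it assumes the photogrammetric step has already converted image coordinates into metres on the playing surface and that positions are sampled at a fixed frame rate. The function name `kinematics` and its output keys are hypothetical.

```python
# Sketch: simple kinematic parameters from tracked 2-D player positions.
# Assumptions (not from the RUNNER paper): positions are already in metres on
# the pitch plane and are sampled at a constant frame rate `fps`.

import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def kinematics(positions: List[Point], fps: float) -> Dict[str, float]:
    """Return distance covered (m), mean speed (m/s) and peak frame-to-frame speed (m/s)."""
    if len(positions) < 2:
        return {"distance_m": 0.0, "mean_speed_ms": 0.0, "peak_speed_ms": 0.0}
    dt = 1.0 / fps  # time between consecutive samples, in seconds
    # Straight-line distance between consecutive samples.
    steps = [math.dist(p, q) for p, q in zip(positions, positions[1:])]
    speeds = [d / dt for d in steps]
    distance = sum(steps)
    duration = dt * len(steps)
    return {
        "distance_m": distance,
        "mean_speed_ms": distance / duration,
        "peak_speed_ms": max(speeds),
    }
```

For example, `kinematics([(0, 0), (3, 4), (3, 4), (6, 8)], fps=1.0)` reports 10 m covered at a mean of about 3.33 m/s, with a peak of 5 m/s. Estimating physical load would need further quantities (for instance accelerations), which this sketch leaves out.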
Mechanisms for AAA and QoS Interaction
Proceedings of the Third IEEE Workshop on Applications and Services in Wireless Networks (ASWN 2003), Bern, Switzerland, July 2-4, 2003. The interaction between Authentication, Authorization and Accounting (AAA) systems and the Quality of Service (QoS) infrastructure is expected to become essential in the near future. This interaction will allow rich control and management of both users and networks. Diameter and DiffServ are likely to become the standards for AAA and QoS systems, respectively, but they were not designed to interact with each other. To address this, we propose a new Diameter-DiffServ interaction model and describe the Application Specific Module (ASM) implemented to enable this interaction. The ASM has been implemented and tested in a complete AAA-QoS IPv6 scenario.
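To suggest what a Diameter-DiffServ interaction involves, the Python sketch below maps a service class authorized by the AAA server to a DiffServ code point (DSCP) used to mark a user's traffic. It does not reproduce the paper's ASM: the service-class names, the policy table, and the `rule_for` helper are hypothetical. Only the DSCP values themselves are standard (EF is 46; AF class x with drop precedence y is 8x + 2y).

```python
# Hypothetical sketch of one translation an AAA-QoS module could perform:
# a service class authorized by the AAA (Diameter) server is mapped to the
# DSCP that an edge router would use to mark the user's packets.
# Service names and the policy table are illustrative assumptions.

from dataclasses import dataclass
from typing import Dict

# Standard code points: EF (RFC 3246), AF (RFC 2597), default/best effort.
DSCP: Dict[str, int] = {
    "EF": 46,    # Expedited Forwarding
    "AF41": 34,  # Assured Forwarding class 4, low drop precedence
    "AF21": 18,  # Assured Forwarding class 2, low drop precedence
    "BE": 0,     # Default / best effort
}

# Hypothetical policy: per-hop behaviour granted to each authorized service.
SERVICE_TO_PHB: Dict[str, str] = {
    "voice": "EF",
    "video": "AF41",
    "premium-data": "AF21",
}


@dataclass(frozen=True)
class MarkingRule:
    user_address: str  # IPv6 source address of the authorized user
    dscp: int          # code point to write into the packets' DS field


def rule_for(user_address: str, authorized_service: str) -> MarkingRule:
    """Build a marking rule; unknown services fall back to best effort."""
    phb = SERVICE_TO_PHB.get(authorized_service, "BE")
    return MarkingRule(user_address=user_address, dscp=DSCP[phb])
```

With this table, `rule_for("2001:db8::10", "voice").dscp` is 46 (EF), while an unrecognized service falls back to 0 (best effort). Defaulting to best effort keeps unauthorized traffic flowing without granting it priority.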
Editorial: Prion-Like transmission of pathogenic proteins in neurodegenerative diseases: structural and molecular bases
Analysing a license plate-based vehicle restriction policy with optional exemption charge: The case in Cali, Colombia
Several cities have restricted the use of private vehicles based on the last digit of a vehicle's license plate to reduce traffic congestion and pollution. However, the effectiveness of this measure has been questioned. In 2017, a hybrid scheme, License Plate Restriction Charging (LPRC), was implemented in Cali, Colombia. Under this scheme, drivers can pay a charge (monthly, quarterly, or yearly) to circumvent the restriction, and the revenue is used to subsidise the BRT system. Cali was the first city in Latin America to implement such a scheme, and Colombia's capital, Bogota, adopted a similar policy in 2020. This article analyses the evolution of the measure using official information. In addition, we conducted a stated preferences survey and estimated a choice model to evaluate how car owners respond to policy variables. Results show that the LPRC price is the most relevant attribute in decision-making. Increasing the number of days with traffic restrictions and extending the restricted hours of vehicle use both increase drivers' probability of paying for the LPRC. As currently implemented in Cali, the LPRC is a fixed cost that does not vary with the level of car use, which encourages users who pay for the exemption to use their car as much as possible to get the most out of the payment. Furthermore, the revenue from the charge contributes only marginally to financing the BRT. Finally, we propose several changes to improve the policy's efficiency, including introducing a daily payment and tightening the current driving restriction.
© 2023 The Author(s). APCs and transformative agreements 2023, Elsevie
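A binary logit is one common way to model the pay-or-comply decision described above; the abstract does not specify the model form, so this is an assumption. In the Python sketch below, all coefficient values, the attribute units, and the names `Coefficients` and `prob_pay` are hypothetical placeholders chosen only to match the signs the abstract reports (price lowers the probability of paying; more restricted days and hours raise it). They are not the Cali study's estimates.

```python
# Binary logit sketch of a car owner's choice to pay the LPRC exemption.
# Coefficients are HYPOTHETICAL placeholders that only reproduce the signs
# reported in the abstract; they are not the estimates from the Cali study.

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Coefficients:
    asc: float = -1.0              # alternative-specific constant for "pay"
    price: float = -0.004          # per unit of charge (units assumed)
    restricted_days: float = 0.3   # per additional restricted day (assumed)
    restricted_hours: float = 0.1  # per additional restricted hour (assumed)


def prob_pay(price: float, restricted_days: float, restricted_hours: float,
             b: Coefficients = Coefficients()) -> float:
    """Logit probability of paying the exemption rather than complying."""
    v_pay = (b.asc
             + b.price * price
             + b.restricted_days * restricted_days
             + b.restricted_hours * restricted_hours)
    # The utility of complying with the restriction is normalised to 0.
    return 1.0 / (1.0 + math.exp(-v_pay))
```

Because the utility of complying is normalised to zero, each coefficient's sign directly determines whether its attribute raises or lowers the probability of paying. The sketch also reflects the abstract's criticism: the charge enters only the one-off decision to pay, so once paid, each additional trip costs nothing extra.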
Solving with containing an arbitrary number of prime factors.
In this paper we prove new cases of the asymptotic Fermat equation with coefficients. This is done by solving some remarkable S-unit equations and applying a method of Frey-Kraus-Mazur.
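For readers unfamiliar with S-unit equations, the Python sketch below shows a toy version over the integers: find coprime positive a ≤ b with a + b = c, where a, b, and c have all their prime factors in a fixed finite set of primes S. The paper's S-unit equations are more general and are not solved by brute force; this search and its names (`is_s_unit`, `s_unit_solutions`) are only an illustration of what "solving" such an equation means.

```python
# Toy S-unit equation over the integers: find coprime positive a <= b with
# a + b = c <= bound, where a, b and c have all prime factors in `primes`.
# This brute-force search is illustrative only, not the paper's method.

from math import gcd
from typing import List, Tuple


def is_s_unit(n: int, primes: List[int]) -> bool:
    """True if n > 0 and every prime factor of n lies in `primes`."""
    if n <= 0:
        return False
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def s_unit_solutions(primes: List[int], bound: int) -> List[Tuple[int, int, int]]:
    """All coprime a <= b with a + b = c <= bound and a, b, c S-units."""
    smooth = [n for n in range(1, bound + 1) if is_s_unit(n, primes)]
    smooth_set = set(smooth)
    solutions = []
    for i, a in enumerate(smooth):
        for b in smooth[i:]:          # b >= a, in increasing order
            c = a + b
            if c > bound:
                break                 # larger b only makes c larger
            if gcd(a, b) == 1 and c in smooth_set:
                solutions.append((a, b, c))
    return solutions
```

For S = {2, 3}, `s_unit_solutions([2, 3], 1000)` returns `[(1, 1, 2), (1, 2, 3), (1, 3, 4), (1, 8, 9)]`. These are in fact the only solutions for this S: coprimality forces a = 1, which leaves only 1 + 2^x = 3^y and 1 + 3^y = 2^x.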
The sworn declaration of romantic partner and nepotism among public officials in Metropolitan Lima, 2020
This research, titled "The sworn declaration of romantic partner and nepotism among public officials in Metropolitan Lima, 2020", had the general objective of determining whether the sworn declaration of romantic partner affects nepotism among public officials in Metropolitan Lima in 2020. Methodologically, it used a qualitative, applied approach with a grounded-theory design and a descriptive and interpretive scope.
To obtain the results, interviews and documentary source analysis were used as techniques, and an interview guide and a document analysis guide were applied as instruments. Finally, it was determined that the romantic-partner declaration will have a positive effect with respect to nepotism, since this document can reveal the existence of nepotism: it is a document in which a person declares, under promise or oath, the truthfulness of the actions and data recorded in it, under the iuris tantum presumption that they are true.
- …