6 research outputs found
Unsupervised Morphological Segmentation and Part-of-Speech Tagging for Low-Resource Scenarios
With the high cost of manually labeling data and the increasing interest in low-resource languages, for which human annotators might not even be available, unsupervised approaches have become essential for processing a typologically diverse set of languages, whether high-resource or low-resource. In this work, we propose new fully unsupervised approaches for two tasks in morphology: unsupervised morphological segmentation and unsupervised cross-lingual part-of-speech (POS) tagging, which have been two essential subtasks for several downstream NLP applications, such as machine translation, speech recognition, information extraction and question answering.
We propose a new unsupervised morphological-segmentation approach that utilizes Adaptor Grammars (AGs), nonparametric Bayesian models that generalize probabilistic context-free grammars (PCFGs), where a PCFG models word structure in the task of morphological segmentation. We implement the approach as a publicly available morphological-segmentation framework, MorphAGram, that enables unsupervised morphological segmentation through the use of several proposed language-independent grammars. In addition, the framework allows for the use of scholar knowledge, when available, in the form of affixes that can be seeded into the grammars. The framework handles the cases when the scholar-seeded knowledge is either generated from language resources, possibly by someone who does not know the language, as weak linguistic priors, or generated by an expert in the underlying language as strong linguistic priors. Another form of linguistic priors is the design of a grammar that models language-dependent specifications. We also propose a fully unsupervised learning setting that approximates the effect of scholar-seeded knowledge through self-training. Moreover, since there is no single grammar that works best across all languages, we propose an approach that picks a nearly optimal configuration (a learning setting and a grammar) for an unseen language, a language that is not part of the development. Finally, we examine multilingual learning for unsupervised morphological segmentation in low-resource setups.
For unsupervised POS tagging, two cross-lingual approaches have been widely adopted: 1) annotation projection, where POS annotations are projected across an aligned parallel text from a source language for which a POS tagger is accessible to the target one prior to training a POS model; and 2) zero-shot model transfer, where a model of a source language is directly applied to texts in the target language. We propose an end-to-end architecture for unsupervised cross-lingual POS tagging via annotation projection in truly low-resource scenarios that do not assume access to parallel corpora that are large in size or represent a specific domain. We integrate and expand the best practices in alignment and projection and design a rich neural architecture that exploits non-contextualized and transformer-based contextualized word embeddings, affix embeddings and word-cluster embeddings. Additionally, since parallel data might be available between the target language and multiple source ones, as in the case of the Bible, we propose different approaches for learning from multiple sources. Finally, we combine our work on unsupervised morphological segmentation and unsupervised cross-lingual POS tagging by conducting unsupervised stem-based cross-lingual POS tagging via annotation projection, which relies on the stem as the core unit of abstraction for alignment and projection and is beneficial to low-resource morphologically complex languages. We also examine morpheme-based alignment and projection, the use of linguistic priors towards better POS models and the use of segmentation information as learning features in the neural architecture.
We conduct comprehensive evaluation and analysis to assess the performance of our approaches to unsupervised morphological segmentation and unsupervised POS tagging and show that they achieve state-of-the-art performance for the two morphology tasks when evaluated on a large set of languages of different typologies: analytic, fusional, agglutinative and synthetic/polysynthetic.
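The annotation-projection idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's architecture: the function name `project_pos_tags`, the majority-vote rule and the toy sentence pair are all assumptions made for illustration, and the word alignments are taken as given rather than learned.

```python
from collections import Counter, defaultdict


def project_pos_tags(source_tags, alignments, target_length):
    """Project POS tags from source tokens onto target tokens.

    source_tags   : list of POS tags, one per source token
    alignments    : iterable of (source_index, target_index) pairs
    target_length : number of tokens in the target sentence

    Hypothetical helper: each target token receives the tag that most of
    its aligned source tokens carry; unaligned target tokens get None.
    """
    votes = defaultdict(Counter)
    for src_idx, tgt_idx in alignments:
        votes[tgt_idx][source_tags[src_idx]] += 1

    projected = []
    for tgt_idx in range(target_length):
        if tgt_idx in votes:
            # Take the most frequent tag among the aligned source tokens.
            projected.append(votes[tgt_idx].most_common(1)[0][0])
        else:
            projected.append(None)
    return projected


# Toy example (assumed data): three tagged source tokens, four target tokens,
# with the third target token left unaligned.
source_tags = ["DET", "NOUN", "VERB"]
alignments = [(0, 0), (1, 1), (2, 3)]
projected = project_pos_tags(source_tags, alignments, target_length=4)
print(projected)
```

In a pipeline of this kind, the projected tags would serve as noisy training labels for a target-language tagger, and unaligned tokens would typically be left out of the training signal.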
Burnout among surgeons before and during the SARS-CoV-2 pandemic: an international survey
Background: The SARS-CoV-2 pandemic has had many significant impacts within the surgical realm, and surgeons have been obligated to reconsider almost every aspect of daily clinical practice. Methods: This is a cross-sectional study reported in compliance with the CHERRIES guidelines and conducted through an online platform from June 14th to July 15th, 2020. The primary outcome was the burden of burnout during the pandemic indicated by the validated Shirom-Melamed Burnout Measure. Results: Nine hundred fifty-four surgeons completed the survey. The median length of practice was 10 years; 78.2% of those included were male, with a median age of 37 years; 39.5% were consultants, 68.9% were general surgeons, and 55.7% were affiliated with an academic institution. Overall, there was a significant increase in the mean burnout score during the pandemic; longer years of practice and older age were significantly associated with less burnout. There were significant reductions in the median number of outpatient visits, operated cases, on-call hours, emergency visits, and research work; consequently, 48.2% of respondents felt that the training resources were insufficient. The majority (81.3%) of respondents reported that their hospitals were included in the management of COVID-19, 66.5% felt their roles had been minimized, 41% were asked to assist in non-surgical medical practices, and 37.6% of respondents were included in COVID-19 management. Conclusions: There was significant burnout among trainees. Almost all aspects of clinical and research activities were affected, with a significant reduction in the volume of research, outpatient clinic visits, surgical procedures, on-call hours, and emergency cases hindering the training. Trial registration: The study was registered on ClinicalTrials.gov "NCT04433286" on 16/06/2020.
Transliteration of Arabizi into Arabic Orthography: Developing a Parallel Annotated Arabizi-Arabic Script SMS/Chat Corpus
This paper describes the process of creating a novel resource, a parallel Arabizi-Arabic script corpus of SMS/Chat data. The language used in social media expresses many differences from other written genres: its vocabulary is informal with intentional deviations from standard orthography such as repeated letters for emphasis; typos and non-standard abbreviations are common; and non-linguistic content is written out, such as laughter, sound representations, and emoticons. This situation is exacerbated in the case of Arabic social media for two reasons. First, Arabic dialects, commonly used in social media, are quite different from Modern Standard Arabic phonologically, morphologically and lexically, and most importantly, they lack standard orthographies. Second, Arabic speakers in social media as well as discussion forums, SMS messaging and online chat often use a non-standard romanization called Arabizi. In the context of natural language processing of social media Arabic, transliterating from Arabizi of various dialects to Arabic script is a necessary step, since many of the existing state-of-the-art resources for Arabic dialect processing expect Arabic script input. The corpus described in this paper is expected to support Arabic NLP by providing this resource.
Effects of pre-operative isolation on postoperative pulmonary complications after elective surgery: an international prospective cohort study
We aimed to determine the impact of pre-operative isolation on postoperative pulmonary complications after elective surgery during the global SARS-CoV-2 pandemic. We performed an international prospective cohort study including patients undergoing elective surgery in October 2020. Isolation was defined as the period before surgery during which patients did not leave their house or receive visitors from outside their household. The primary outcome was postoperative pulmonary complications, adjusted in multivariable models for measured confounders. Pre-defined sub-group analyses were performed for the primary outcome. A total of 96,454 patients from 114 countries were included and overall, 26,948 (27.9%) patients isolated before surgery. Postoperative pulmonary complications were recorded in 1947 (2.0%) patients of which 227 (11.7%) were associated with SARS-CoV-2 infection. Patients who isolated pre-operatively were older, had more respiratory comorbidities and were more commonly from areas of high SARS-CoV-2 incidence and high-income countries. Although the overall rates of postoperative pulmonary complications were similar in those that isolated and those that did not (2.1% vs 2.0%, respectively), isolation was associated with higher rates of postoperative pulmonary complications after adjustment (adjusted OR 1.20, 95%CI 1.05–1.36, p = 0.005). Sensitivity analyses revealed no further differences when patients were categorised by: pre-operative testing; use of COVID-19-free pathways; or community SARS-CoV-2 prevalence. The rate of postoperative pulmonary complications increased with periods of isolation longer than 3 days, with an OR (95%CI) at 4–7 days or ≥ 8 days of 1.25 (1.04–1.48), p = 0.015 and 1.31 (1.11–1.55), p = 0.001, respectively. Isolation before elective surgery might be associated with a small but clinically important increased risk of postoperative pulmonary complications. 
Longer periods of isolation showed no reduction in the risk of postoperative pulmonary complications. These findings have significant implications for global provision of elective surgical care.
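As a quick arithmetic check, the percentages reported in this abstract can be recomputed from the counts it gives. The sketch below uses only those reported figures; the helper name `pct` and the variable names are illustrative assumptions, and the adjusted odds ratios are not reproduced because they depend on patient-level data that the abstract does not include.

```python
# Counts as reported in the abstract.
total_patients = 96_454       # patients included
isolated = 26_948             # patients who isolated before surgery
ppc = 1_947                   # postoperative pulmonary complications
ppc_with_sars_cov_2 = 227     # complications associated with SARS-CoV-2


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(isolated, total_patients))     # share of patients who isolated
print(pct(ppc, total_patients))          # overall complication rate
print(pct(ppc_with_sars_cov_2, ppc))     # share of complications with SARS-CoV-2
```

The three values agree with the 27.9%, 2.0% and 11.7% stated in the abstract.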