1,213 research outputs found

    Practical Natural Language Processing for Low-Resource Languages.

    As the Internet and World Wide Web have continued to gain widespread adoption, the linguistic diversity represented has also been growing. Simultaneously, the field of Linguistics is facing a crisis of the opposite sort. Languages are becoming extinct faster than ever before, and linguists now estimate that the world could lose more than half of its linguistic diversity by the year 2100. This is a special time for Computational Linguistics; this field has unprecedented access to a great number of low-resource languages, readily available to be studied, but needs to act quickly before political, social, and economic pressures cause these languages to disappear from the Web. Most work in Computational Linguistics and Natural Language Processing (NLP) focuses on English or other languages that have text corpora of hundreds of millions of words. In this work, we present methods for automatically building NLP tools for low-resource languages with minimal need for human annotation in these languages. We start with language identification, specifically focusing on word-level language identification, an understudied variant that is necessary for processing Web text, and develop highly accurate machine learning methods for this problem. From there we move on to the problems of part-of-speech tagging and dependency parsing. With both of these problems we extend the current state of the art in projected learning to make use of multiple high-resource source languages instead of just a single language. In both tasks, we are able to improve on the best current methods. All of these tools are practically realized in the "Minority Language Server," an online tool that brings these techniques together with low-resource language text on the Web. The Minority Language Server, starting with only a few words in a language, can automatically collect text in that language, identify its language, and tag its parts of speech. We hope that this system is able to provide a convincing proof of concept for the automatic collection and processing of low-resource language text from the Web, and one that can be realized before it is too late.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/113373/1/benking_1.pd

    Somatic mutations in facial skin from countries of contrasting skin cancer risk

    The incidence of keratinocyte cancer (basal cell and squamous cell carcinomas of the skin) is 17-fold lower in Singapore than the UK [1-3], despite Singapore receiving 2-3 times more ultraviolet (UV) radiation [4,5]. Aging skin contains somatic mutant clones from which such cancers develop [6,7]. We hypothesized that differences in keratinocyte cancer incidence may be reflected in the normal skin mutational landscape. Here we show that, compared to Singapore, aging facial skin from populations in the UK has a fourfold greater mutational burden, a predominant UV mutational signature, increased copy number aberrations and increased mutant TP53 selection. These features are shared by keratinocyte cancers from high-incidence and low-incidence populations [8-13]. In Singaporean skin, most mutations result from cell-intrinsic processes; mutant NOTCH1 and NOTCH2 are more strongly selected than in the UK. Aging skin in a high-incidence country has multiple features convergent with cancer that are not found in a low-risk country. These differences may reflect germline variation in UV-protective genes.

    The Interrelationships of Placental Mammals and the Limits of Phylogenetic Inference

    Placental mammals comprise three principal clades: Afrotheria (e.g., elephants and tenrecs), Xenarthra (e.g., armadillos and sloths), and Boreoeutheria (all other placental mammals), the relationships among which are the subject of controversy and a touchstone for debate on the limits of phylogenetic inference. Previous analyses have found support for all three hypotheses, leading some to conclude that this phylogenetic problem might be impossible to resolve due to the compounded effects of incomplete lineage sorting (ILS) and a rapid radiation. Here we show, using a genome-scale nucleotide data set, microRNAs, and the reanalysis of the three largest previously published amino acid data sets, that the root of Placentalia lies between Atlantogenata and Boreoeutheria. Although we found evidence for ILS in early placental evolution, we are able to reject previous conclusions that the placental root is a hard polytomy that cannot be resolved. Reanalyses of previous data sets recover Atlantogenata + Boreoeutheria and show that contradictory results are a consequence of poorly fitting evolutionary models; instead, when the evolutionary process is better modeled, all data sets converge on Atlantogenata. Our Bayesian molecular clock analysis estimates that marsupials diverged from placentals 157-170 Ma, crown Placentalia diverged 86-100 Ma, and crown Atlantogenata diverged 84-97 Ma. Our results are compatible with placental diversification being driven by dispersal rather than vicariance mechanisms, postdating early phases in the protracted opening of the Atlantic Ocean.

    Effectiveness of BNT162b2 and CoronaVac vaccinations against mortality and severe complications after SARS-CoV-2 Omicron BA.2 infection: a case–control study

    Data regarding protection against mortality and severe complications after Omicron BA.2 infection with CoronaVac and BNT162b2 vaccines remain limited. We conducted a case–control study to evaluate the risk of severe complications and mortality following 1–3 doses of CoronaVac and BNT162b2 using an electronic health records database. Cases were adults with their first COVID-19-related mortality or severe complications between 1 January and 31 March 2022, matched with up to 10 controls by age, sex, index date, and Charlson Comorbidity Index. Vaccine effectiveness against COVID-19-related mortality and severe complications by type and number of doses was estimated using conditional logistic regression adjusted for comorbidities and medications. Vaccine effectiveness (95% CI) against COVID-19-related mortality after two doses of BNT162b2 and CoronaVac was 90.7% (88.6–92.3) and 74.8% (72.5–76.9) in those aged ≥65, 87.6% (81.4–91.8) and 80.7% (72.8–86.3) in those aged 50–64, 86.6% (71.0–93.8) and 82.7% (56.5–93.1) in those aged 18–50. Vaccine effectiveness against severe complications after two doses of BNT162b2 and CoronaVac was 82.1% (74.6–87.3) and 58.9% (50.3–66.1) in those aged ≥65, 83.0% (69.6–90.5) and 67.1% (47.1–79.6) in those aged 50–64, 78.3% (60.8–88.0) and 77.8% (49.6–90.2) in those aged 18–50. Further risk reduction with the third dose was observed especially in those aged ≥65 years, with vaccine effectiveness of 98.0% (96.5–98.9) for BNT162b2 and 95.5% (93.7–96.8) for CoronaVac against mortality, 90.8% (83.4–94.9) and 88.0% (80.8–92.5) against severe complications. Both CoronaVac and BNT162b2 vaccination were effective against COVID-19-related mortality and severe complications amidst the Omicron BA.2 pandemic, and risks decreased further with the third dose.

    Effects of antiplatelet therapy on stroke risk by brain imaging features of intracerebral haemorrhage and cerebral small vessel diseases: subgroup analyses of the RESTART randomised, open-label trial

    Background: Findings from the RESTART trial suggest that starting antiplatelet therapy might reduce the risk of recurrent symptomatic intracerebral haemorrhage compared with avoiding antiplatelet therapy. Brain imaging features of intracerebral haemorrhage and cerebral small vessel diseases (such as cerebral microbleeds) are associated with greater risks of recurrent intracerebral haemorrhage. We did subgroup analyses of the RESTART trial to explore whether these brain imaging features modify the effects of antiplatelet therapy.

    International Nonregimes: A Research Agenda

    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/146934/1/j.1468-2486.2007.00672.x.pd

    Disorders of sex development: insights from targeted gene sequencing of a large international patient cohort

    Background: Disorders of sex development (DSD) are congenital conditions in which chromosomal, gonadal, or phenotypic sex is atypical. Clinical management of DSD is often difficult and currently only 13% of patients receive an accurate clinical genetic diagnosis. To address this, we have developed a massively parallel sequencing targeted DSD gene panel which allows us to sequence all 64 known diagnostic DSD genes and candidate genes simultaneously. Results: We analyzed DNA from the largest reported international cohort of patients with DSD (278 patients with 46, XY DSD and 48 with 46, XX DSD). Our targeted gene panel compares favorably with other sequencing platforms. We found a total of 28 diagnostic genes that are implicated in DSD, highlighting the genetic spectrum of this disorder. Sequencing revealed 93 previously unreported DSD gene variants. Overall, we identified a likely genetic diagnosis in 43% of patients with 46, XY DSD. In patients with 46, XY disorders of androgen synthesis and action the genetic diagnosis rate reached 60%. Surprisingly, little difference in diagnostic rate was observed between singletons and trios. In many cases our findings are informative as to the likely cause of the DSD, which will facilitate clinical management. Conclusions: Our massively parallel sequencing targeted DSD gene panel represents an economical means of improving the genetic diagnostic capability for patients affected by DSD. Implementation of this panel in a large cohort of patients has expanded our understanding of the underlying genetic etiology of DSD. The inclusion of research candidate genes also provides an invaluable resource for future identification of novel genes.