
    Diffusion of Lexical Change in Social Media

    Computer-mediated communication is driving fundamental changes in the nature of written language. We investigate these changes by statistical analysis of a dataset comprising 107 million Twitter messages (authored by 2.7 million unique user accounts). Using a latent vector autoregressive model to aggregate across thousands of words, we identify high-level patterns in diffusion of linguistic change over the United States. Our model is robust to unpredictable changes in Twitter's sampling rate, and provides a probabilistic characterization of the relationship of macro-scale linguistic influence to a set of demographic and geographic predictors. The results of this analysis offer support for prior arguments that focus on geographical proximity and population size. However, demographic similarity -- especially with regard to race -- plays an even more central role, as cities with similar racial demographics are far more likely to share linguistic influence. Rather than moving towards a single unified "netspeak" dialect, language evolution in computer-mediated communication reproduces existing fault lines in spoken American English. Comment: preprint of PLOS-ONE paper from November 2014; PLoS ONE 9(11) e11311

    Dual Language Models for Code Switched Speech Recognition

    In this work, we present a simple and elegant approach to language modeling for bilingual code-switched text. Since code-switching is a blend of two or more different languages, a standard bilingual language model can be improved upon by using structures of the monolingual language models. We propose a novel technique called dual language models, which involves building two complementary monolingual language models and combining them using a probabilistic model for switching between the two. We evaluate the efficacy of our approach using a conversational Mandarin-English speech corpus. We prove the robustness of our model by showing significant improvements in perplexity measures over the standard bilingual language model without the use of any external information. Similar consistent improvements are also reflected in automatic speech recognition error rates. Comment: Accepted at Interspeech 201
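
The idea of combining two monolingual language models through a switching probability can be illustrated with a toy sketch. This is not the authors' implementation: the `UnigramLM` class, the add-alpha smoothing, the single `p_switch` parameter, and the uniform choice of starting language are all assumptions made here for illustration only.

```python
import math
from collections import Counter


class UnigramLM:
    """Hypothetical add-alpha smoothed unigram model for one language."""

    def __init__(self, tokens, alpha=1.0):
        self.counts = Counter(tokens)
        self.total = len(tokens)
        # Reserve one extra vocabulary slot for any unseen word.
        self.vocab_size = len(self.counts) + 1
        self.alpha = alpha

    def prob(self, word):
        return (self.counts[word] + self.alpha) / (
            self.total + self.alpha * self.vocab_size
        )


def dual_lm_logprob(sentence, lm_a, lm_b, p_switch):
    """Log-probability of a language-tagged sentence.

    sentence: list of (token, lang) pairs, lang in {"a", "b"}.
    p_switch: assumed probability of changing language between tokens.
    """
    lms = {"a": lm_a, "b": lm_b}
    logp = 0.0
    prev_lang = None
    for token, lang in sentence:
        if prev_lang is None:
            logp += math.log(0.5)  # assumption: uniform starting language
        elif lang == prev_lang:
            logp += math.log(1.0 - p_switch)
        else:
            logp += math.log(p_switch)
        # The token itself is scored by the monolingual model of its language.
        logp += math.log(lms[lang].prob(token))
        prev_lang = lang
    return logp


if __name__ == "__main__":
    lm_en = UnigramLM(["i", "like", "tea", "i"])
    lm_zh = UnigramLM(["wo", "xihuan", "cha"])
    mixed = [("i", "a"), ("like", "a"), ("cha", "b")]
    print(dual_lm_logprob(mixed, lm_en, lm_zh, p_switch=0.2))
```

A real system would use higher-order n-gram or neural models for each language and estimate the switching behaviour from data; the sketch only shows how the switching term and the monolingual terms combine.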

    Massachusetts Health Reform in 2008: Who Are the Remaining Uninsured Adults?

    Profiles residents still uninsured after the individual mandate was implemented: young, single, urban, male racial/ethnic minorities and non-citizens with limited English proficiency. Outlines lessons on outreach to those eligible for public coverage.

    On the Linearity of Semantic Change: Investigating Meaning Variation via Dynamic Graph Models

    We consider two graph models of semantic change. The first is a time-series model that relates embedding vectors from one time period to embedding vectors of previous time periods. In the second, we construct one graph for each word: nodes in this graph correspond to time points and edge weights to the similarity of the word's meaning across two time points. We apply our two models to corpora across three different languages. We find that semantic change is linear in two senses. Firstly, today's embedding vectors (= meaning) of words can be derived as linear combinations of embedding vectors of their neighbors in previous time periods. Secondly, self-similarity of words decays linearly in time. We consider both findings as new laws/hypotheses of semantic change. Comment: Published at ACL 2016, Berlin (short papers)
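
The second finding, that a word's self-similarity decays linearly with the time gap, amounts to saying that a straight line fits similarity against gap well. The sketch below shows one way to check such a claim with ordinary least squares. It is not the paper's procedure, and the similarity values are made-up placeholders, not results from the paper; `fit_line` and `r_squared` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Placeholder data: time gap (in periods) vs. cosine self-similarity.
    gaps = [0, 1, 2, 3, 4]
    sims = [1.00, 0.91, 0.79, 0.72, 0.60]
    slope, intercept = fit_line(gaps, sims)
    print(slope, intercept, r_squared(gaps, sims, slope, intercept))
```

An R² close to 1 on real data would be consistent with linear decay; comparing against an alternative fit (for example, exponential) would be needed before drawing a stronger conclusion.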

    How do we evaluate the cost of nosocomial infection? The ECONI protocol: an incidence study with nested case-control evaluating cost and quality of life

    Introduction: Healthcare-associated or nosocomial infection (HAI) is distressing to patients and costly for the National Health Service (NHS). With increasing pressure to demonstrate cost-effectiveness of interventions to control HAI and notwithstanding the risk from antimicrobial-resistant infections, there is a need to understand the incidence rates of HAI and costs incurred by the health system and for patients themselves. Methods and analysis: The Evaluation of Cost of Nosocomial Infection study (ECONI) is an observational incidence survey with record linkage and a nested case-control study that will include postdischarge longitudinal follow-up and qualitative interviews. ECONI will be conducted in one large teaching hospital and one district general hospital in NHS Scotland. The case mix of these hospitals reflects the majority of overnight admissions within Scotland. An incidence survey will record all HAI cases using standard case definitions. Subsequent linkage to routine data sets will provide information on an admission cohort which will be grouped into HAI and non-HAI cases. The case-control study will recruit eligible patients who develop HAI and twice that number without HAI as controls. Patients will be asked to complete five questionnaires: the first during their stay, and four others during the year following discharge from their recruitment admission (1, 3, 6 and 12 months). Multiple data collection methods will include clinical case note review; patient-reported outcome; linkage to electronic health records and qualitative interviews. Outcomes collected encompass infection types; morbidity and mortality; length of stay; quality of life; healthcare utilisation; repeat admissions and postdischarge prescribing. Ethics and dissemination: The study has received a favourable ethical opinion from the Scotland A Research Ethics Committee (reference 16/SS/0199). All publications arising from this study will be published in open-access peer-reviewed journals. Lay-person summaries will be published on the ECONI website. Trial registration number: NCT03253640; Pre-results