Diffusion of Lexical Change in Social Media
Computer-mediated communication is driving fundamental changes in the nature
of written language. We investigate these changes by statistical analysis of a
dataset comprising 107 million Twitter messages (authored by 2.7 million unique
user accounts). Using a latent vector autoregressive model to aggregate across
thousands of words, we identify high-level patterns in diffusion of linguistic
change over the United States. Our model is robust to unpredictable changes in
Twitter's sampling rate, and provides a probabilistic characterization of the
relationship of macro-scale linguistic influence to a set of demographic and
geographic predictors. The results of this analysis offer support for prior
arguments that focus on geographical proximity and population size. However,
demographic similarity -- especially with regard to race -- plays an even more
central role, as cities with similar racial demographics are far more likely to
share linguistic influence. Rather than moving towards a single unified
"netspeak" dialect, language evolution in computer-mediated communication
reproduces existing fault lines in spoken American English. Comment: preprint of PLOS-ONE paper from November 2014; PLoS ONE 9(11) e11311
Dual Language Models for Code Switched Speech Recognition
In this work, we present a simple and elegant approach to language modeling
for bilingual code-switched text. Since code-switching is a blend of two or
more different languages, a standard bilingual language model can be improved
upon by using structures of the monolingual language models. We propose a novel
technique called dual language models, which involves building two
complementary monolingual language models and combining them using a
probabilistic model for switching between the two. We evaluate the efficacy of
our approach using a conversational Mandarin-English speech corpus. We prove
the robustness of our model by showing significant improvements in perplexity
measures over the standard bilingual language model without the use of any
external information. Similar consistent improvements are also reflected in
automatic speech recognition error rates. Comment: Accepted at Interspeech 201
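The abstract above describes combining two monolingual language models through a probabilistic switching model. A minimal sketch of that idea follows; it is not the paper's implementation. The toy bigram tables, the romanized tokens, the parameters `p_switch`, `p_start` and `floor`, and the choice to restart a segment from `<s>` after a switch are all assumptions made for illustration.

```python
import math

# Hypothetical toy bigram tables (made-up probabilities). "<s>" marks the start
# of a sentence or of a new same-language segment.
EN = {("<s>", "i"): 0.5, ("<s>", "like"): 0.5, ("i", "like"): 1.0, ("like", "tea"): 1.0}
ZH = {("<s>", "wo"): 0.5, ("<s>", "cha"): 0.5, ("wo", "xihuan"): 1.0, ("xihuan", "cha"): 1.0}
MODELS = {"en": EN, "zh": ZH}


def dual_lm_logprob(tokens, p_switch=0.2, p_start=0.5, floor=1e-6):
    """Log-probability of a list of (word, lang) pairs under the toy combined model.

    Each step multiplies a language gate (stay or switch) by the bigram
    probability from the monolingual model of the current word's language.
    """
    logp = 0.0
    prev_word, prev_lang = "<s>", None
    for word, lang in tokens:
        if prev_lang is None:
            gate, context = p_start, "<s>"            # pick the first language
        elif lang == prev_lang:
            gate, context = 1.0 - p_switch, prev_word  # stay in the same language
        else:
            gate, context = p_switch, "<s>"            # switch: restart the segment
        # Unseen bigrams fall back to a small floor value (an assumption, not smoothing).
        logp += math.log(gate) + math.log(MODELS[lang].get((context, word), floor))
        prev_word, prev_lang = word, lang
    return logp


def perplexity(tokens, **kwargs):
    """Per-token perplexity of the sequence under the toy combined model."""
    return math.exp(-dual_lm_logprob(tokens, **kwargs) / len(tokens))


sentence = [("i", "en"), ("like", "en"), ("cha", "zh")]
print(perplexity(sentence))
```

Lowering `p_switch` makes code-switched sentences less probable and monolingual runs more probable, which is the knob such a combined model would tune on held-out data.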
Massachusetts Health Reform in 2008: Who Are the Remaining Uninsured Adults?
Profiles residents still uninsured after the individual mandate was implemented: young, single, urban, male racial/ethnic minorities and non-citizens with limited English proficiency. Outlines lessons on outreach to those eligible for public coverage.
On the Linearity of Semantic Change: Investigating Meaning Variation via Dynamic Graph Models
We consider two graph models of semantic change. The first is a time-series
model that relates embedding vectors from one time period to embedding vectors
of previous time periods. In the second, we construct one graph for each word:
nodes in this graph correspond to time points and edge weights to the
similarity of the word's meaning across two time points. We apply our two
models to corpora across three different languages. We find that semantic
change is linear in two senses. Firstly, today's embedding vectors (= meaning)
of words can be derived as linear combinations of embedding vectors of their
neighbors in previous time periods. Secondly, self-similarity of words decays
linearly in time. We consider both findings as new laws/hypotheses of semantic
change. Comment: Published at ACL 2016, Berlin (short papers)
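The second finding above (self-similarity of a word decaying linearly in time) can be illustrated with a small sketch. It is not the paper's code. It assumes the word's embeddings for each time period already live in a shared, aligned space, uses cosine similarity as the edge weight between time points, and fits an ordinary least-squares line; the names `cosine` and `fit_decay` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def fit_decay(vectors_by_time):
    """Fit similarity = intercept + slope * gap over all pairs of time points.

    vectors_by_time maps a time index to the word's embedding at that time.
    Each pair of time points is one edge of the per-word graph: its weight is
    the cosine similarity, and its x-value is the time gap between the nodes.
    Returns (slope, intercept).
    """
    points = [
        (abs(t - s), cosine(vectors_by_time[s], vectors_by_time[t]))
        for s, t in combinations(sorted(vectors_by_time), 2)
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up 2-D "embeddings" that rotate slowly over five time periods.
toy = {t: (math.cos(0.1 * t), math.sin(0.1 * t)) for t in range(5)}
slope, intercept = fit_decay(toy)
print(slope, intercept)
```

A negative slope indicates that the word becomes less similar to its earlier self as the time gap grows; how well a straight line fits real data is the empirical question the abstract addresses.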
How do we evaluate the cost of nosocomial infection? The ECONI protocol: an incidence study with nested case-control evaluating cost and quality of life
Introduction: Healthcare-associated or nosocomial infection (HAI) is distressing to patients and costly for the National Health Service (NHS). With increasing pressure to demonstrate cost-effectiveness of interventions to control HAI and notwithstanding the risk from antimicrobial-resistant infections, there is a need to understand the incidence rates of HAI and costs incurred by the health system and for patients themselves.
Methods and analysis: The Evaluation of Cost of Nosocomial Infection study (ECONI) is an observational incidence survey with record linkage and a nested case-control study that will include postdischarge longitudinal follow-up and qualitative interviews. ECONI will be conducted in one large teaching hospital and one district general hospital in NHS Scotland. The case mix of these hospitals reflects the majority of overnight admissions within Scotland. An incidence survey will record all HAI cases using standard case definitions. Subsequent linkage to routine data sets will provide information on an admission cohort which will be grouped into HAI and non-HAI cases. The case-control study will recruit eligible patients who develop HAI and twice that number without HAI as controls. Patients will be asked to complete five questionnaires: the first during their stay, and four others during the year following discharge from their recruitment admission (1, 3, 6 and 12 months). Multiple data collection methods will include clinical case note review; patient-reported outcome; linkage to electronic health records and qualitative interviews. Outcomes collected encompass infection types; morbidity and mortality; length of stay; quality of life; healthcare utilisation; repeat admissions and postdischarge prescribing.
Ethics and dissemination: The study has received a favourable ethical opinion from the Scotland A Research Ethics Committee (reference 16/SS/0199). All publications arising from this study will be published in open-access peer-reviewed journals. Lay-person summaries will be published on the ECONI website.
Trial registration number: NCT03253640; Pre-results.