    Modeling Statistical Properties of Written Text

    Written text is one of the fundamental manifestations of human language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Among these regularities, only Zipf's law has been explored in depth. Other basic properties, such as the existence of bursts of rare words in specific documents, have only been studied independently of each other and mainly by descriptive models. As a consequence, there is a lack of understanding of linguistic processes as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on burstiness, Heaps' law (which describes the sublinear growth of vocabulary size with the length of a document), and the topicality of document collections, all of which encode correlations within and across documents that are absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts, and we identify dynamic word ranking and memory across documents as key mechanisms explaining the nontrivial organization of written text. Our research can have broad implications and practical applications in computer science, cognitive science and linguistics.

    Caffeine intake during pregnancy, late miscarriage and stillbirth

    Caffeine is a commonly consumed drug during pregnancy with the potential to affect the developing fetus. Findings from previous studies have been inconsistent. We recruited a cohort of 2,643 pregnant women, aged 18-45 years, attending two UK maternity units between 8 and 12 weeks gestation from September 2003 to June 2006. We used a validated tool to assess caffeine intake at different stages of pregnancy and related this to late miscarriage and stillbirth, adjusting for confounders, including salivary cotinine as a biomarker of smoking status. There was a strong association between caffeine intake in the first trimester and subsequent late miscarriage and stillbirth, adjusting for confounders. Women whose pregnancies resulted in late miscarriage or stillbirth had higher caffeine intakes (geometric mean = 145 mg/day; 95% CI: 85-249) than those with live births (103 mg/day; 95% CI: 98-108). Compared to those consuming <100 mg/day, odds ratios increased to 2.2 (95% CI: 0.7-7.1) for 100-199 mg/day, 1.7 (95% CI: 0.4-7.1) for 200-299 mg/day, and 5.1 (95% CI: 1.6-16.4) for 300+ mg/day (P for trend = 0.004). Greater caffeine intake is associated with increases in late miscarriage and stillbirth. Despite remaining uncertainty in the strength of association, our study strengthens the observational evidence base on which current guidance is founded.