13 research outputs found

    Identifying Virtues and Values Through Obituary Data-Mining

    Because obituaries are succinct and explicitly intended to summarize their subjects’ lives, they may be expected to include only the features that the author finds most salient, and also to signal to others in the community the socially recognized aspects of the deceased’s character. We begin by reviewing studies 1 and 2, in which obituaries were carefully read and labeled. We then report study 3, which extends these results with a semi-automated, large-scale semantic analysis of several thousand obituaries. Geography, gender, and elite status all turn out to be associated with the virtues and values attributed to the deceased.

    The Axiology of Necrologies: Using Natural Language Processing to Examine Values in Obituaries

    This dissertation is centrally concerned with exploring obituaries as repositories of values. Obituaries are a publicly available natural-language source, written for communities that range from wide (nation-level) to narrow (city-level, or specific groups therein). Because they are explicitly summative, limited in size, and written for consumption by a public audience, obituaries may be expected to express concisely the aspects of their subjects' lives that the authors (often family members living in the same communities) found most salient or worthy of featuring. A total of 140,599 obituaries nested in 832 newspapers from across the USA were scraped, with permission, from Legacy.com, an obituary publisher. Obituaries were coded for the age at death and gender (female/male) of the deceased using automated algorithms. For each publishing newspaper, county-level median income, educational achievement (operationalized as the percentage of the population with a Bachelor's degree or higher), and race and ethnicity were averaged across counties, weighting by population size. A Neo4j graph database was constructed from the WordNet and University of South Florida Free Association Norms datasets. Each word in each obituary in the corpus was lemmatized. The shortest path through the WordNet graph from each lemma to the 30 Schwartz value prototype words published by Bardi, Calogero, and Mullen (2008) was then recorded. From these path lengths, a new measure, "word-by-hop," was calculated for each Schwartz value to reflect the relative lexical distance between each obituary and that value. Of the Schwartz values, Power, Conformity, and Security were most indicated in the corpus, while Universalism, Hedonism, and Stimulation were least indicated. A series of nine two-level regression models suggested that, across Schwartz values, newspaper community accounted for the greatest share of word-by-hop variability in the corpus.
    The best-fitting model indicated a small, negative effect of female status across Schwartz values. Unexpectedly, Hedonism and Conformity, whose prototype words are conceptually opposed, were highly correlated, possibly indicating that obituary authors "compensate" for describing the deceased in hedonistic terms by concurrently emphasizing restraint. Future research could further develop word-by-hop and incorporate individual-level covariates that match the newspaper-level covariates used here.
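    The abstract does not spell out the word-by-hop formula, so the Python sketch below only illustrates the general idea under stated assumptions: a tiny hand-made graph stands in for the WordNet/USF graph stored in Neo4j, tokens are assumed to be already lemmatized, and the score (the hypothetical function `word_by_hop`) is taken to be the mean shortest hop count from each lemma to its nearest prototype word, skipping lemmas with no path. The prototype words used here are invented placeholders, not those of Bardi, Calogero, and Mullen (2008).

```python
from collections import deque

# Hypothetical, hand-made undirected lexical graph standing in for the
# WordNet/USF graph (illustration only; not real WordNet relations).
GRAPH = {
    "church": ["worship", "community"],
    "worship": ["church", "tradition"],
    "tradition": ["worship", "custom"],
    "custom": ["tradition"],
    "community": ["church", "neighbor"],
    "neighbor": ["community"],
    "fishing": ["hobby"],
    "hobby": ["fishing", "pleasure"],
    "pleasure": ["hobby", "enjoyment"],
    "enjoyment": ["pleasure"],
}


def hops_from(source, graph):
    """Breadth-first search: shortest hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def word_by_hop(lemmas, prototypes, graph):
    """Hypothetical score: mean hop count from each lemma to its nearest
    prototype word. Lemmas with no path to any prototype are skipped.
    Lower values mean the text sits lexically closer to the value."""
    distances = []
    for lemma in lemmas:
        reach = hops_from(lemma, graph)
        hops = [reach[p] for p in prototypes if p in reach]
        if hops:
            distances.append(min(hops))
    return sum(distances) / len(distances) if distances else float("inf")


obituary = ["church", "neighbor", "fishing"]  # assumed already lemmatized
print(word_by_hop(obituary, ["tradition", "custom"], GRAPH))     # → 3.0
print(word_by_hop(obituary, ["pleasure", "enjoyment"], GRAPH))   # → 2.0
```

    Here the obituary scores closer to the placeholder "pleasure" prototypes (2.0 hops) than to the "tradition" prototypes (3.0 hops). A production version would query the graph database instead of running BFS in memory and would handle unreachable lemmas according to whatever rule the dissertation actually uses.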

    Sci-Hub provides access to nearly all scholarly literature

    The website Sci-Hub enables users to download PDF versions of scholarly articles, including many articles that are paywalled at their journal's site. Sci-Hub has grown rapidly since its creation in 2011, but the extent of its coverage was unclear. Here we report that, as of March 2017, Sci-Hub's database contains 68.9% of the 81.6 million scholarly articles registered with Crossref and 85.1% of articles published in toll access journals. We find that coverage varies by discipline and publisher, and that Sci-Hub preferentially covers popular, paywalled content. For toll access articles, we find that Sci-Hub provides greater coverage than the University of Pennsylvania, a major research university in the United States. Green open access to toll access articles via licit services, on the other hand, remains quite limited. Our interactive browser at https://greenelab.github.io/scihub allows users to explore these findings in more detail. For the first time, nearly all scholarly literature is available gratis to anyone with an Internet connection, suggesting the toll access business model may become unsustainable.

    The Axiology of Necrologies: Using Natural Language Processing to Examine Values in Obituaries (Dissertation Code and Limited Data)

    This dissertation is centrally concerned with exploring obituaries as repositories of values. Obituaries are a publicly available natural-language source, written for communities that range from wide (nation-level) to narrow (city-level, or specific groups therein). Because they are explicitly summative, limited in size, and written for consumption by a public audience, obituaries may be expected to express concisely the aspects of their subjects’ lives that the authors (often family members living in the same communities) found most salient or worthy of featuring. A total of 140,599 obituaries nested in 832 newspapers from across the USA were scraped, with permission, from Legacy.com, an obituary publisher. Obituaries were coded for the age at death and gender (female/male) of the deceased using automated algorithms. For each publishing newspaper, county-level median income, educational achievement (operationalized as the percentage of the population with a Bachelor’s degree or higher), and race and ethnicity were averaged across counties, weighting by population size. A Neo4j graph database was constructed from the WordNet and University of South Florida Free Association Norms datasets. Each word in each obituary in the corpus was lemmatized. The shortest path through the WordNet graph from each lemma to the 30 Schwartz value prototype words published by Bardi, Calogero, and Mullen (2008) was then recorded. From these path lengths, a new measure, “word-by-hop,” was calculated for each Schwartz value to reflect the relative lexical distance between each obituary and that value. Of the Schwartz values, Power, Conformity, and Security were most indicated in the corpus, while Universalism, Hedonism, and Stimulation were least indicated. A series of seven two-level regression models suggested that, across Schwartz values, newspaper community accounted for the greatest share of word-by-hop variability in the corpus.
    The best-fitting model indicated a small, negative effect of female status across Schwartz values. Unexpectedly, Hedonism and Conformity, whose prototype words are conceptually opposed, were highly correlated, possibly indicating that obituary authors “compensate” for describing the deceased in hedonistic terms by concurrently emphasizing restraint. Future research could further develop word-by-hop and incorporate individual-level covariates that match the newspaper-level covariates used here.
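    The newspaper-level covariate step (county values averaged across counties, weighted by population) can be sketched in Python as follows. This is a minimal illustration with made-up numbers; the function name `population_weighted_mean` and the record layout are hypothetical and not taken from the dissertation code.

```python
def population_weighted_mean(counties, field):
    """Average a county-level covariate across a newspaper's counties,
    weighting each county by its population."""
    total_pop = sum(c["population"] for c in counties)
    if total_pop <= 0:
        raise ValueError("counties must have a positive total population")
    return sum(c[field] * c["population"] for c in counties) / total_pop


# Hypothetical coverage area for one newspaper (made-up numbers).
coverage = [
    {"population": 300_000, "median_income": 60_000, "pct_bachelors": 30.0},
    {"population": 100_000, "median_income": 40_000, "pct_bachelors": 20.0},
]
print(population_weighted_mean(coverage, "median_income"))  # → 55000.0
print(population_weighted_mean(coverage, "pct_bachelors"))  # → 27.5
```

    A population-weighted mean of county medians is a convenient summary, but it is not in general the median income of the combined population.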

    Mapping Human Values: Enhancing Social Marketing through Obituary Data-Mining

    Obituaries are an especially rich resource for identifying people’s values. Because obituaries are succinct and explicitly intended to summarize their subjects’ lives, they may be expected to include only the features that the author(s) find most salient, not only for themselves as relatives or friends of the deceased, but also to signal to others in the community the socially recognized aspects of the deceased’s character. We report three approaches to the scientific study of virtue and value through obituaries. We begin by reviewing studies 1 and 2, in which obituaries were carefully read and labeled. We then report study 3, which further develops these results with a semi-automated, large-scale semantic analysis of several thousand obituaries. Finally, we present the results of study 4, in which individuals were asked to write prospective obituaries. Geography, gender, and elite status all turn out to influence the virtues and values associated with the deceased.

    publicus/r-veccompare: version 0.1.0

    This version is (as of this writing) listed on CRAN (https://CRAN.R-project.org/package=veccompare) and can be installed from there with:

        install.packages('veccompare')

    or, to target this release specifically:

        # install.packages('devtools')
        devtools::install_version('veccompare', version = '0.1.0', repos = 'http://cran.us.r-project.org')

    It can also be installed directly from GitHub with:

        # install.packages('devtools')
        devtools::install_github("publicus/[email protected]")

    Using an anomaly detection approach for the segmentation of colorectal cancer tumors in whole slide images

    Colorectal cancer (CRC) is the second most commonly diagnosed cancer in the United States. Genetic testing is critical for the early detection of CRC and for selecting individualized treatment plans, which have been shown to improve the survival rate of CRC patients. The tissue slide review (TSR), a tumor tissue macro-dissection procedure, is a required pre-analytical step for genetic testing. Because the process is subjective, major discrepancies among pathologists in CRC diagnostics have been reported, and quality metrics are often only qualitative. Progressive context encoder anomaly detection (P-CEAD) is an anomaly detection approach that detects tumor tissue in whole slide images (WSIs), since tumor tissue is by its nature an anomaly. P-CEAD-based CRC tumor segmentation achieves 71% ± 26% sensitivity, 92% ± 7% specificity, and a 63% ± 23% F1 score. The proposed approach provides an automated CRC tumor segmentation pipeline whose quality is quantitatively reproducible, in contrast to the conventional manual tumor segmentation procedure.
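    To make the reported figures concrete, the Python sketch below shows how sensitivity, specificity, and F1 are commonly computed for binary segmentation masks and summarized as mean ± standard deviation across slides. It assumes pixel-level scoring on flattened masks and a sample standard deviation; the abstract does not state either, and the function name `segmentation_metrics` and the mask data are hypothetical.

```python
from statistics import mean, stdev


def segmentation_metrics(pred, truth):
    """Pixel-wise sensitivity, specificity, and F1 for two equal-length,
    flattened binary masks (1 = tumor, 0 = non-tumor).
    Returns 0.0 for any ratio whose denominator is zero."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return sensitivity, specificity, f1


# Hypothetical (predicted, ground-truth) masks for two slides (made-up data).
slides = [
    ([1, 1, 1, 0, 1, 0, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]),
    ([1, 0, 0, 0], [1, 1, 0, 0]),
]
per_slide = [segmentation_metrics(p, t) for p, t in slides]
print(per_slide[0])  # → (0.75, 0.8333333333333334, 0.75)

# Summarize each metric across slides as mean ± sample standard deviation.
for name, scores in zip(("sensitivity", "specificity", "F1"), zip(*per_slide)):
    print(f"{name}: {mean(scores):.1%} ± {stdev(scores):.1%}")
```

    Large standard deviations like those reported (for example, ± 26% on sensitivity) indicate that performance varies considerably from slide to slide.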

    A Practical Guide to Performing a Library User Data Risk Assessment in Library-Built Systems

    A report from a subgroup of DLF's Privacy and Ethics in Technology Working Group