Identifying Virtues and Values Through Obituary Data-Mining
Because obituaries are succinct and explicitly intended to summarize their subjects’ lives, they may be expected to include only the features that the author finds most salient, not only personally but also as a signal to others in the community of the socially recognized aspects of the deceased’s character. We begin by reviewing studies 1 and 2, in which obituaries were carefully read and labeled. We then report study 3, which further develops these results with a semi-automated, large-scale semantic analysis of several thousand obituaries. Geography, gender, and elite status all turn out to be associated with the virtues and values attributed to the deceased.
The Axiology of Necrologies: Using Natural Language Processing to Examine Values in Obituaries
This dissertation is centrally concerned with exploring obituaries as repositories of values. Obituaries are a publicly available natural language source that is variably written for members of communities that are wide (nation-level) and narrow (city-level, or at the level of specific groups therein). Because they are explicitly summative, limited in size, and written for consumption by a public audience, obituaries may be expected to express concisely the aspects of their subjects' lives that the authors (often family members living in the same communities) found most salient or worthy of featuring.
140,599 obituaries nested in 832 newspapers from across the USA were scraped with permission from *Legacy.com*, an obituaries publisher. Obituaries were coded for the age at death and gender (female/male) of the deceased using automated algorithms. For each publishing newspaper, county-level median income, educational achievement (operationalized as percent of the population with a Bachelor's degree or higher), and race and ethnicity were averaged across counties, weighting by population size.
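Averaging county covariates per newspaper, weighted by population, is a weighted mean: each county's value counts in proportion to its share of the total population. A minimal sketch, assuming each newspaper has already been mapped to the counties it serves; the county figures below are hypothetical, and only two covariates are shown (race and ethnicity shares would be averaged the same way).

```python
from dataclasses import dataclass


@dataclass
class County:
    population: int
    median_income: float
    pct_bachelors: float  # percent of population with a Bachelor's degree or higher


def weighted_covariates(counties):
    """Population-weighted mean of county covariates for one newspaper."""
    total = sum(c.population for c in counties)
    if total == 0:
        raise ValueError("counties must have a positive total population")
    income = sum(c.population * c.median_income for c in counties) / total
    bachelors = sum(c.population * c.pct_bachelors for c in counties) / total
    return {"median_income": income, "pct_bachelors": bachelors}


# Hypothetical newspaper whose coverage area spans two counties.
counties = [County(300_000, 60_000.0, 35.0), County(100_000, 40_000.0, 20.0)]
print(weighted_covariates(counties))
```

Here the larger county pulls the income figure toward its own value (55,000 rather than the unweighted 50,000).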
A Neo4j graph database was constructed using WordNet and the University of South Florida Free Association Norms datasets. Each word in each obituary in the corpus was lemmatized. The shortest path through the WordNet graph from each lemma to 30 Schwartz value prototype words published by Bardi, Calogero, and Mullen (2008) was then recorded. From these path lengths, a new measure, "word-by-hop," was calculated for each Schwartz value to reflect the relative lexical distance between each obituary and that Schwartz value.
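The abstract does not give the exact word-by-hop formula, so the following is only a sketch under stated assumptions. On an unweighted graph, a shortest path is the fewest-edges (hop) path, which breadth-first search finds directly. The aggregation here (the mean hop count over an obituary's reachable lemmas) is a hypothetical stand-in; the dissertation's "relative" measure may normalize differently, for example against the other values. The toy edges and prototype set are invented for illustration and are not WordNet, the USF norms, or the Bardi, Calogero, and Mullen (2008) word list.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from (word, word) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hops_to_targets(graph, start, targets):
    """Breadth-first search: fewest edges from `start` to any word in `targets`.
    Returns None when no target is reachable."""
    if start in targets:
        return 0
    if start not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


def word_by_hop(lemmas, graph, prototypes):
    """Hypothetical aggregation: mean hop distance from an obituary's
    (already lemmatized) words to one value's prototype words.
    Smaller scores mean the obituary is lexically closer to the value."""
    dists = []
    for lemma in lemmas:
        d = hops_to_targets(graph, lemma, prototypes)
        if d is not None:
            dists.append(d)
    return sum(dists) / len(dists) if dists else None


# Toy graph and prototype set (illustrative only).
graph = build_graph([("church", "faith"), ("faith", "tradition"),
                     ("tradition", "custom"), ("garden", "flower"),
                     ("flower", "beauty")])
prototypes = {"tradition", "custom"}
print(word_by_hop(["church", "faith", "garden"], graph, prototypes))  # 1.5
```

"garden" sits in a component with no prototype word, so it is skipped rather than counted as an infinite distance; how unreachable lemmas were treated in the study is not stated.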
Of the Schwartz values, Power, Conformity, and Security were most indicated in the corpus, while Universalism, Hedonism, and Stimulation were least indicated. A series of nine two-level regression models suggested that, across Schwartz values, newspaper community accounted for the greatest amount of word-by-hop variability in the corpus. The best-fitting model indicated a small, negative effect of female status across Schwartz values. Unexpectedly, Hedonism and Conformity, which had conceptually opposite prototype words, were highly correlated, possibly indicating that obituary authors "compensate" for describing the deceased in a hedonistic way by concurrently emphasizing restraint. Future research could usefully expand word-by-hop further and incorporate individual-level covariates that match the newspaper-level covariates used here.
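In a two-level design (obituaries nested in newspapers), the share of variability at the newspaper level is commonly summarized by the intraclass correlation, ICC = τ² / (τ² + σ²), where τ² is the between-newspaper variance and σ² the within-newspaper variance. The sketch below uses the simple one-way ANOVA estimator of ICC(1) rather than a fitted mixed model with covariates, so it illustrates the quantity, not the dissertation's models; the scores are hypothetical.

```python
def icc_oneway(groups):
    """ANOVA estimator of ICC(1) for scores nested in groups.
    groups: list of lists, one inner list of scores per newspaper."""
    groups = [g for g in groups if g]
    J = len(groups)
    N = sum(len(g) for g in groups)
    if J < 2 or N <= J:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (J - 1)  # between-group mean square
    msw = ssw / (N - J)  # within-group mean square (estimates sigma^2)
    # Effective group size, adjusted for unequal numbers of obituaries per paper
    n0 = (N - sum(len(g) ** 2 for g in groups) / N) / (J - 1)
    tau2 = max((msb - msw) / n0, 0.0)  # between-newspaper variance component
    total = tau2 + msw
    return tau2 / total if total > 0 else 0.0


# Hypothetical word-by-hop scores for obituaries in two newspapers.
scores = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(round(icc_oneway(scores), 3))  # 0.806
```

An ICC near 1 means most of the variation lies between newspapers rather than between obituaries within the same paper.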
Sci-Hub provides access to nearly all scholarly literature
The website Sci-Hub enables users to download PDF versions of scholarly articles, including many articles that are paywalled at their journal's site. Sci-Hub has grown rapidly since its creation in 2011, but the extent of its coverage was unclear. Here we report that, as of March 2017, Sci-Hub's database contains 68.9% of the 81.6 million scholarly articles registered with Crossref and 85.1% of articles published in toll access journals. We find that coverage varies by discipline and publisher, and that Sci-Hub preferentially covers popular, paywalled content. For toll access articles, we find that Sci-Hub provides greater coverage than the University of Pennsylvania, a major research university in the United States. Green open access to toll access articles via licit services, on the other hand, remains quite limited. Our interactive browser at https://greenelab.github.io/scihub allows users to explore these findings in more detail. For the first time, nearly all scholarly literature is available gratis to anyone with an Internet connection, suggesting the toll access business model may become unsustainable.
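Coverage as reported here is a set-membership ratio: the share of catalog (Crossref) DOIs that also appear in Sci-Hub's database, overall and broken down by a grouping such as publisher or discipline. For scale, 68.9% of 81.6 million is roughly 56.2 million articles. A minimal sketch, not the authors' pipeline, assuming DOIs have already been normalized (for example, lowercased) on both sides; the DOIs and publisher labels are made up.

```python
from collections import defaultdict


def coverage(catalog, available):
    """Share of catalog articles found in `available`, overall and per group.
    catalog: dict mapping DOI -> group label (e.g. publisher or discipline).
    available: set of DOIs present in the repository being assessed."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for doi, group in catalog.items():
        totals[group] += 1
        if doi in available:
            hits[group] += 1
    overall = sum(hits.values()) / len(catalog) if catalog else 0.0
    by_group = {g: hits[g] / totals[g] for g in totals}
    return overall, by_group


# Hypothetical catalog of four articles from two publishers.
catalog = {"10.9999/a": "PubX", "10.9999/b": "PubX",
           "10.9999/c": "PubY", "10.9999/d": "PubY"}
available = {"10.9999/a", "10.9999/b", "10.9999/c"}
print(coverage(catalog, available))
```

Reporting per-group ratios alongside the overall figure is what makes statements like "coverage varies by discipline and publisher" checkable.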
VIVO: a system for research discovery
This paper describes VIVO, which is both member-supported, enterprise open-source software and an ontology for representing scholarship.
The Axiology of Necrologies: Using Natural Language Processing to Examine Values in Obituaries (Dissertation Code and Limited Data)
This dissertation is centrally concerned with exploring obituaries as repositories of values. Obituaries are a publicly available natural language source that is variably written for members of communities that are wide (nation-level) and narrow (city-level, or at the level of specific groups therein). Because they are explicitly summative, limited in size, and written for consumption by a public audience, obituaries may be expected to express concisely the aspects of their subjects’ lives that the authors (often family members living in the same communities) found most salient or worthy of featuring.
140,599 obituaries nested in 832 newspapers from across the USA were scraped with permission from Legacy.com, an obituaries publisher. Obituaries were coded for the age at death and gender (female/male) of the deceased using automated algorithms. For each publishing newspaper, county-level median income, educational achievement (operationalized as percent of the population with a Bachelor’s degree or higher), and race and ethnicity were averaged across counties, weighting by population size.
A Neo4j graph database was constructed using WordNet and the University of South Florida Free Association Norms datasets. Each word in each obituary in the corpus was lemmatized. The shortest path through the WordNet graph from each lemma to 30 Schwartz value prototype words published by Bardi, Calogero, and Mullen (2008) was then recorded. From these path lengths, a new measure, “word-by-hop,” was calculated for each Schwartz value to reflect the relative lexical distance between each obituary and that Schwartz value.
Of the Schwartz values, Power, Conformity, and Security were most indicated in the corpus, while Universalism, Hedonism, and Stimulation were least indicated. A series of seven two-level regression models suggested that, across Schwartz values, newspaper community accounted for the greatest amount of word-by-hop variability in the corpus. The best-fitting model indicated a small, negative effect of female status across Schwartz values. Unexpectedly, Hedonism and Conformity, which had conceptually opposite prototype words, were highly correlated, possibly indicating that obituary authors “compensate” for describing the deceased in a hedonistic way by concurrently emphasizing restraint. Future research could usefully expand word-by-hop further and incorporate individual-level covariates that match the newspaper-level covariates used here.
Mapping Human Values: Enhancing Social Marketing through Obituary Data-Mining
Obituaries are an especially rich resource for identifying people’s values. Because obituaries are succinct and explicitly intended to summarize their subjects’ lives, they may be expected to include only the features that the author(s) find most salient, not only for themselves as relatives or friends of the deceased, but also to signal to others in the community the socially recognized aspects of the deceased’s character. We report three approaches to the scientific study of virtue and value through obituaries. We begin by reviewing studies 1 and 2, in which obituaries were carefully read and labeled. We then report study 3, which further develops these results with a semi-automated, large-scale semantic analysis of several thousand obituaries. Finally, we present the results of study 4, in which individuals were asked to write prospective obituaries. Geography, gender, and elite status all turn out to influence the virtues and values associated with the deceased.
publicus/r-veccompare: version 0.1.0
<ul>
<li>
<p>This version is (as of this writing) <a href="https://CRAN.R-project.org/package=veccompare">listed on CRAN</a>, and can be installed from there with</p>
<code>install.packages('veccompare')
</code>
<p>or, to target this release specifically,</p>
<code># install.packages('devtools')
devtools::install_version('veccompare', version = '0.1.0', repos = 'http://cran.us.r-project.org')
</code>
</li>
<li>
<p>It can also be installed directly from GitHub with</p>
<code># install.packages('devtools')
devtools::install_github("publicus/[email protected]")
</code>
</li>
</ul>
Using an anomaly detection approach for the segmentation of colorectal cancer tumors in whole slide images
Colorectal cancer (CRC) is the second most commonly diagnosed cancer in the United States. Genetic testing is critical for the early detection of CRC and for selecting individualized treatment plans, which have been shown to improve the survival rate of CRC patients. The tissue slide review (TSR), a tumor tissue macro-dissection procedure, is a required pre-analytical step for genetic testing. Because the process is subjective, major discrepancies between pathologists in CRC diagnostics have been reported, and quality metrics are often only qualitative. Progressive context encoder anomaly detection (P-CEAD) is an anomaly detection approach for detecting tumor tissue in whole slide images (WSIs), since tumor tissue is, by its nature, an anomaly. P-CEAD-based CRC tumor segmentation achieves 71% ± 26% sensitivity, 92% ± 7% specificity, and a 63% ± 23% F1 score. The proposed approach provides an automated CRC tumor segmentation pipeline with quantitatively reproducible quality, compared with the conventional manual tumor segmentation procedure.
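Sensitivity, specificity, and F1 for a segmentation follow from the pixel-wise confusion counts between a predicted tumor mask and a reference mask: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and F1 = 2TP / (2TP + FP + FN). A minimal sketch, assuming flat binary masks and assuming the reported spread is a sample standard deviation across slides (the abstract does not say); the tiny four-pixel masks are hypothetical.

```python
from statistics import mean, stdev


def mask_metrics(pred, truth):
    """Pixel-wise sensitivity, specificity and F1 for binary tumor masks
    (1 = tumor, 0 = non-tumor), given as flat sequences of equal length.
    Undefined ratios are reported as 0.0 here (an arbitrary convention)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return sens, spec, f1


def summarize(per_slide):
    """Mean and sample standard deviation of each metric across slides."""
    return [(mean(col), stdev(col)) for col in zip(*per_slide)]


# Two hypothetical slides, each reduced to four pixels.
per_slide = [mask_metrics([1, 1, 0, 0], [1, 0, 0, 0]),
             mask_metrics([1, 0, 0, 0], [1, 1, 0, 0])]
for name, (m, s) in zip(("sensitivity", "specificity", "F1"), summarize(per_slide)):
    print(f"{name}: {m:.0%} ± {s:.0%}")
```

Large per-slide spreads, like the ± 26% on sensitivity above, arise when a method does well on some slides and poorly on others, which averaging alone would hide.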
A Practical Guide to Performing a Library User Data Risk Assessment in Library-Built Systems
A report from a subgroup of DLF's Privacy and Ethics in Technology Working Group