Multi-level computational methods for interdisciplinary research in the HathiTrust Digital Library
We show how faceted search using a combination of traditional classification systems and mixed-membership topic models can go beyond keyword search to inform resource discovery, hypothesis formulation, and argument extraction for interdisciplinary research. Our test domain is the history and philosophy of scientific work on animal mind and cognition. The methods can be generalized to other research areas and ultimately support a system for semi-automatic identification of argument structures. We provide a case study for the application of the methods to the problem of identifying and extracting arguments about anthropomorphism during a critical period in the development of comparative psychology. We show how a combination of classification systems and mixed-membership models trained over large digital libraries can inform resource discovery in this domain. Through a novel approach of “drill-down” topic modeling (simultaneously reducing both the size of the corpus and the unit of analysis), we are able to reduce a large collection of full-text volumes to a much smaller set of pages within six focal volumes containing arguments of interest to historians and philosophers of comparative psychology. The volumes identified in this way did not appear among the first ten results of the keyword search in the HathiTrust digital library and the pages bear the kind of “close reading” needed to generate original interpretations that is the heart of scholarly work in the humanities. Zooming back out, we provide a way to place the books onto a map of science originally constructed from very different data and for different purposes. The multilevel approach advances understanding of the intellectual and societal contexts in which writings are interpreted.
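The “drill-down” idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the topic and document distributions are hypothetical toy values, the function names `rank_topics` and `drill_down` are invented for illustration, and ranking topics by the probability mass they place on the query terms is a simple stand-in for whatever similarity measure the study actually used. The sketch only shows the two-stage shape of the method: select topics relevant to a query, keep the volumes dominated by those topics, then repeat the selection at the page level.

```python
# Toy illustration of drill-down selection; all data below are hypothetical.

def rank_topics(topic_word, query_terms):
    """Rank topic ids by the total probability they assign to the query terms."""
    scores = {
        topic: sum(dist.get(term, 0.0) for term in query_terms)
        for topic, dist in topic_word.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def drill_down(unit_topic, topics, threshold):
    """Keep units (volumes or pages) whose combined share of `topics` meets the threshold."""
    return [
        unit
        for unit, mix in unit_topic.items()
        if sum(mix.get(t, 0.0) for t in topics) >= threshold
    ]


# Assumed topic-word probabilities (truncated to a few words per topic).
topic_word = {
    10: {"animal": 0.05, "psychology": 0.04, "mind": 0.03},
    16: {"anthropomorphism": 0.02, "animal": 0.04, "instinct": 0.03},
    3: {"railway": 0.06, "engine": 0.05},
}
query = ["anthropomorphism", "animal", "psychology"]

# Step 1: rank topics against the query and keep the two most relevant.
ranked = rank_topics(topic_word, query)
selected = ranked[:2]

# Step 2: reduce the corpus to volumes dominated by the selected topics.
volume_topic = {
    "vol_a": {10: 0.4, 16: 0.3, 3: 0.3},
    "vol_b": {3: 0.9, 10: 0.1},
    "vol_c": {16: 0.6, 3: 0.4},
}
kept_volumes = drill_down(volume_topic, selected, threshold=0.5)

# Step 3: shrink the unit of analysis to pages, restricted to the kept volumes.
page_topic = {
    ("vol_a", 12): {16: 0.8, 3: 0.2},
    ("vol_a", 13): {3: 1.0},
    ("vol_b", 5): {10: 0.9, 3: 0.1},
    ("vol_c", 40): {10: 0.5, 16: 0.3, 3: 0.2},
}
pages_in_kept = {p: mix for p, mix in page_topic.items() if p[0] in kept_volumes}
kept_pages = drill_down(pages_in_kept, selected, threshold=0.6)

print(ranked, kept_volumes, kept_pages)
```

In a real setting the distributions would come from a topic model trained at each level (volumes first, then pages of the reduced corpus), and the thresholds would be chosen by inspecting the ranked topics rather than fixed in advance.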
Topics ranked by similarity to “anthropomorphism”, “animal”, and “psychology” in the <i>HT</i>1315 corpus.
<p>Topics 26, 16, and 10 (highlighted with bold text) were used to derive the <i>HT</i>86 corpus, as they were most relevant to the inquiry.</p>
Pages for which OVA+ argument maps were created, showing total number of pages analyzed and numbers of arguments identified on each of the passes described in the main text.
Topics ranked by similarity to “anthropomorphism” in the <i>HT</i>1315 corpus.
<p>Topic 16 (highlighted with bold text) is highly relevant to the inquiry.</p>
Topics ranked by similarity to “anthropomorphism” in the <i>HT</i>86 corpus, as modeled at the page level.
Corpus analysis sequence.
<p>Schematic rendering of the six-step process that sequentially drills down from macroscopic “distant reading” to microscopic “close reading” before zooming back out to the macroscopic scale at the final step. The approximate orders of magnitude of the datasets on either side of each processing step are shown below the icons as powers of 10 of book/full-text-sized units, and grey bars representing the data are scaled logarithmically.</p>
UCSD map of science with overlay of HathiTrust search results.
<p>This image shows topical coverage of humanities and life science data. The basemap of science shows each sub-discipline denoted by a circle colored or shaded according to the 13 core disciplines. Links indicate journal co-citations from the basemap. The 776 volumes of <i>HT</i>1315 with LCCN metadata are shown on the map as circles. Volumes also in <i>HT</i>86 are shown with thicker circles, and those in <i>HT</i>6 are shown in the thickest circles. An online, interactive version can be explored at <a href="http://inpho.cogs.indiana.edu/scimap/scits" target="_blank">http://inpho.cogs.indiana.edu/scimap/scits</a>.</p>
Topics ranked by similarity to “anthropomorphism”, “animal”, and “psychology” in the <i>HT</i>86 corpus.