168 research outputs found

    Multi-level computational methods for interdisciplinary research in the HathiTrust Digital Library

    We show how faceted search using a combination of traditional classification systems and mixed-membership topic models can go beyond keyword search to inform resource discovery, hypothesis formulation, and argument extraction for interdisciplinary research. Our test domain is the history and philosophy of scientific work on animal mind and cognition. The methods can be generalized to other research areas and ultimately support a system for semi-automatic identification of argument structures. We provide a case study applying the methods to the problem of identifying and extracting arguments about anthropomorphism during a critical period in the development of comparative psychology. We show how a combination of classification systems and mixed-membership models trained over large digital libraries can inform resource discovery in this domain. Through a novel approach of “drill-down” topic modeling—simultaneously reducing both the size of the corpus and the unit of analysis—we are able to reduce a large collection of full-text volumes to a much smaller set of pages within six focal volumes containing arguments of interest to historians and philosophers of comparative psychology. The volumes identified in this way did not appear among the first ten results of a keyword search in the HathiTrust Digital Library, and the pages reward the kind of “close reading” needed to generate original interpretations that is at the heart of scholarly work in the humanities. Zooming back out, we provide a way to place the books onto a map of science originally constructed from very different data and for different purposes. The multilevel approach advances understanding of the intellectual and societal contexts in which writings are interpreted.
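    The two-stage "drill-down" procedure described above can be illustrated with a minimal stdlib sketch. A trained mixed-membership topic model would supply per-document topic proportions; as a stand-in, this sketch scores documents by the proportion of their tokens drawn from a hypothetical topic term set. All names and data here are illustrative assumptions, not the authors' implementation.

    ```python
    from collections import Counter

    def topic_score(text, topic_terms):
        """Fraction of tokens in `text` that belong to the topic's term set
        (a crude proxy for a topic model's per-document topic proportion)."""
        tokens = text.lower().split()
        if not tokens:
            return 0.0
        counts = Counter(tokens)
        return sum(counts[t] for t in topic_terms) / len(tokens)

    def drill_down(volumes, topic_terms, n_volumes=2, n_pages=3):
        """Two-stage drill-down: first rank whole volumes by topic proportion,
        then rank individual pages within only the top-ranked (focal) volumes.
        `volumes` maps a volume id to its list of page texts."""
        # Stage 1: volume-level ranking (unit of analysis = full volume)
        ranked = sorted(volumes.items(),
                        key=lambda kv: topic_score(" ".join(kv[1]), topic_terms),
                        reverse=True)
        focal = ranked[:n_volumes]
        # Stage 2: page-level ranking, restricted to the focal volumes
        pages = [(vol, i, topic_score(page, topic_terms))
                 for vol, vol_pages in focal
                 for i, page in enumerate(vol_pages)]
        pages.sort(key=lambda p: p[2], reverse=True)
        return [(vol, i) for vol, i, _ in pages[:n_pages]]
    ```

    For example, with a toy corpus of three short "volumes" and the topic set `{"animal", "mind", "cognition", "anthropomorphism"}`, the function first discards the off-topic volume and then returns the most topic-dense pages from the remaining two.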

    Finding and Interpreting Arguments: An Important Challenge for Humanities Computing and Scholarly Practice

    Skillful identification and interpretation of arguments is a cornerstone of learning, scholarly activity, and thoughtful civic engagement. These are difficult skills for people to learn, and they are beyond the reach of current computational methods from artificial intelligence and machine learning, despite hype suggesting the contrary. In previous work, we have attempted to build systems that scaffold these skills in people. In this paper we reflect on the difficulties posed by this work, and we argue that it is a serious challenge which ought to be taken up within the digital humanities and related efforts to computationally support scholarly practice. Network analysis, bibliometrics, and stylometrics essentially leave out the fundamental humanistic skill of charitable argument interpretation because they touch very little on the meanings embedded in texts. We present a problematisation of the design space for potential tool development, as a result of insights about the nature and form of arguments in historical texts gained from our attempt to locate and map the arguments in one corner of the HathiTrust digital library.

    Workset Creation for Scholarly Analysis: Prototyping Project

    Scholars rely on library collections to support their scholarship. Out of these collections, scholars select, organize, and refine the worksets that will answer their particular research objectives. The requirements for those worksets are becoming increasingly sophisticated and complex, both as humanities scholarship has become more interdisciplinary and as it has become more digital. The HathiTrust is a repository that centrally collects image and text representations of library holdings digitized by the Google Books project and other mass-digitization efforts. The HathiTrust's computational infrastructure is being built to support large-scale manipulation and preservation of these representations, but it organizes them according to catalog records that were created to enable users to find books in a building or to make high-level generalizations about duplicate holdings across libraries, etc. These catalog records were never meant to support the granularity of sorting and selection of works that scholars now expect, much less page-level or chapter-level sorting and selection out of a corpus of billions of pages. The ability to slice through a massive corpus consisting of many different library collections, and out of that to construct the precise workset required for a particular scholarly investigation, is the “game changing” potential of the HathiTrust; understanding how to do that is a research problem, and one that is keenly of interest to the HathiTrust Research Center (HTRC), since we believe that scholarship begins with the selection of appropriate resources. Given the unprecedented size and scope of the HathiTrust corpus—in conjunction with the HTRC's unique computational access to copyrighted materials—we are proposing a project that will engage scholars in designing tools for exploration, location, and analytic grouping of materials so they can routinely conduct computational scholarship at scale, based on meaningful worksets.
“Workset Creation for Scholarly Analysis: Prototyping Project” (WCSA) seeks to address three sets of tightly intertwined research questions regarding 1) enriching the metadata in the HathiTrust corpus, 2) augmenting string-based metadata with URIs to leverage discovery and sharing through external services, and 3) formalizing the notion of collections and worksets in the context of the HathiTrust Research Center. Building upon the model of the Open Annotation Collaboration, the HTRC proposes to release an open, competitive Request for Proposals with the intent to fund four prototyping projects that will build tools for enriching and augmenting metadata for the HathiTrust corpus. Concurrently, the HTRC will work closely with the Center for Informatics Research in Science and Scholarship (CIRSS) to develop and instantiate a set of formal data models that will be used to capture and integrate the outputs of the funded prototyping projects with the larger HathiTrust corpus. Andrew W. Mellon Foundation, grant no. 21300666.

    Workset Creation for Scholarly Analysis and Data Capsules (WCSA+DC): Laying the foundations for secure computation with copyrighted data in the HathiTrust Research Center, Phase I

    The primary objective of the WCSA+DC project is the seamless integration of the workset model and tools with the Data Capsule framework to provide non-consumptive research access to HathiTrust's massive corpus of data objects, securely and at scale, regardless of copyright status. That is, we plan to surmount the copyright wall on behalf of scholars and their students. Notwithstanding the substantial preliminary work that has been done on both the WCSA and DC fronts, both are still best characterized as being in the prototyping stages. It is our intention that this proposed Phase I of the project devote an intense two-year burst of effort to moving the suite of WCSA and DC prototypes from the realm of proof-of-concept to that of a firmly integrated at-scale deployment. We plan to concentrate our requested resources on making sure our systems are as secure and robust at scale as possible. Phase I will engage four external research partners. Two of the external partners, Kevin Page (Oxford) and Annika Hinze (Waikato), were recipients of WCSA prototyping sub-awards. We are very glad to propose extending and refining aspects of their prototyping work in the context of WCSA+DC. Two other scholars, Ted Underwood (Illinois) and James Pustejovsky (Brandeis), will play critical roles in Phase I as active participants in the development and refinement of the tools and systems from their particular user-scholar perspectives: Underwood, Digital Humanities (DH); Pustejovsky, Computational Linguistics (CL). The four key outcomes and benefits of the WCSA+DC, Phase I project are: 1. The deployment of a new Workset Builder tool that enhances search and discovery across the entire HTDL by complementing traditional volume-level bibliographic metadata with new metadata derived from a variety of sources at various levels of granularity. 2. The creation of Linked Open Data resources to help scholars find, select, integrate, and disseminate a wider range of data as part of their scholarly analysis life-cycle. 3. A new Data Capsule framework that integrates worksets, runs at scale, and does both in a secure, non-consumptive manner. 4. A set of exemplar pre-built Data Capsules that incorporate tools commonly used by both the DH and CL communities that scholars can then customize to their specific needs. Andrew W. Mellon Foundation, grant no. 41500672.

    HathiTrust Research Center User Requirements Study White Paper

    This paper presents findings from an investigation into trends and practices in humanities and social sciences research that incorporates text data mining. As affiliates of the HathiTrust Research Center (HTRC), the purpose of our study was to illuminate researcher needs and expectations for text data, tools, and training for text mining in order to better understand our current and potential user community. Results of our study have informed and will continue to inform development of HTRC tools and services for computational text analysis.

    Scholarly Commons Digital Humanities Needs Assessment Study

    The members of the Digital Humanities Needs Assessment Working Group completed an analysis of current activities and future needs for digital humanities and digital scholarship-oriented research and teaching at the University of Illinois at Urbana-Champaign. This study originated as an investigation into the particular practices and work of digital humanities researchers, and how the University Library could support the needs for digital humanities research, particularly via the resources and expertise provided in the Scholarly Commons. This report delivers findings gathered via interviews and a follow-up survey, and analyzed by the Working Group. It identifies thematic Areas of Need and also proposes Recommendations for the Library.

    The Visual Page

    All printed texts convey meaning through both linguistic and graphic signs, but existing tools for computational text analysis focus only on the linguistic content. The Visual Page will develop a prototype application to identify and analyze visual features in digitized Victorian books of poetry, such as margin space, line indentation, and typeface attributes. This will enable scholars to compare documents, identify distinctive or typical books, and track historical changes and influence over very large sets of digitized texts. Current research into such questions is limited by our human capacity to view and compare only a fairly small number of texts at one time. Thus our understanding of their historical significance is based on limited information. Computer analysis can point to significant patterns and trends over a much larger set of texts, which will ultimately transform our understanding of Victorian print culture and the humanities at large.
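    One of the visual features named above, line indentation, can be sketched in a few lines. Real systems would read line coordinates from OCR layout output such as hOCR or ALTO XML; the `(x_left, text)` pairs and function name here are simplified assumptions for illustration, not The Visual Page's actual code.

    ```python
    def indentation_profile(lines, page_width):
        """Left-margin indentation of each OCR'd line, normalised to page width.
        `lines` is a list of (x_left, text) pairs; x_left is the horizontal
        pixel coordinate of the line's left edge on the scanned page."""
        left_edge = min(x for x, _ in lines)  # the page's effective left margin
        # Indentation relative to the leftmost line, as a fraction of page width
        return [round((x - left_edge) / page_width, 3) for x, _ in lines]
    ```

    A profile like `[0.0, 0.1, 0.0]` would mark the second line as indented by a tenth of the page width, the kind of signal that could distinguish verse forms across many digitized volumes.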

    When Michigan Changed the World

    Full text link
    http://deepblue.lib.umich.edu/bitstream/2027.42/168165/1/2020-Feb_When_UM_Changed_the_World.pd
