951 research outputs found

    A fragmentising interface to a large corpus of digitized text: (Post)humanism and non-consumptive reading via features

    While the idea of distant reading does not rule out the possibility of close reading of the individual components of the corpus of digitized text that is being distant-read, this ceases to be the case when parts of the corpus are, for reasons relating to intellectual property, not accessible for consumption through downloading followed by close reading. Copyright restrictions on material in collections of digitized text such as the HathiTrust Digital Library (HTDL) necessitate providing facilities for non-consumptive reading, one of the approaches to which consists of providing users with features from the text in the form of small fragments of text, instead of the text itself. We argue that, contrary to expectation, the fragmentary quality of the features generated by the reading interface does not necessarily imply that the mode of reading enabled and mediated by these features points in an anti-humanist direction. We pose the fragmentariness of the features as paradigmatic of the fragmentation with which digital techniques tend, more generally, to trouble the humanities. We then generalize our argument to put our work on feature-based non-consumptive reading in dialogue with contemporary debates in philosophy and in cultural theory and criticism about posthumanism and agency. While the locus of agency in such a non-consumptive practice of reading does not coincide with the customary figure of the singular human subject as reader, it is possible to accommodate this fragmentising practice within the terms of an ampler notion of agency imagined as dispersed across an entire technosocial ensemble. When grasped in this way, such a practice of reading may be considered posthumanist but not necessarily antihumanist.
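The feature-based approach this abstract describes can be illustrated with a minimal sketch (hypothetical code, not the HTRC's actual Extracted Features format): a page of text is reduced to token counts with word order discarded, so a researcher receives countable fragments rather than readable prose.

```python
from collections import Counter

def extract_page_features(page_text: str) -> dict:
    """Reduce a page of text to non-consumptive features:
    token counts with word order discarded, so the page
    cannot be read or reconstructed as prose."""
    tokens = page_text.lower().split()
    return {
        "tokenCount": len(tokens),
        "tokenFreq": dict(Counter(tokens)),  # bag-of-words only
    }

page = "the idea of distant reading does not rule out the idea of close reading"
features = extract_page_features(page)
# the full sentence is gone; only aggregate counts remain
assert features["tokenCount"] == 14
assert features["tokenFreq"]["the"] == 2
```

Counts of this kind still support distant-reading techniques (frequency analysis, topic modeling) while keeping the copyrighted prose itself out of the researcher's hands.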

    Stewardship of the evolving scholarly record: from the invisible hand to conscious coordination

    The scholarly record is increasingly digital and networked, while at the same time expanding in both the volume and diversity of the material it contains. The long-term future of the scholarly record cannot be effectively secured with traditional stewardship models developed for print materials. This report describes the key features of future stewardship models adapted to the characteristics of a digital, networked scholarly record, and discusses some practical implications of implementing these models. Key highlights include: As the scholarly record continues to evolve, conscious coordination will become an important organizing principle for stewardship models. Past stewardship models were built on an "invisible hand" approach that relied on the uncoordinated, institution-scale efforts of individual academic libraries acting autonomously to maintain local collections. Future stewardship of the evolving scholarly record requires conscious coordination of context, commitments, specialization, and reciprocity. With conscious coordination, local stewardship efforts leverage scale by collecting more of less. Keys to conscious coordination include right-scaling consolidation, cooperation, and community mix. Reducing transaction costs and building trust facilitate conscious coordination. Incentives to participate in cooperative stewardship activities should be linked to broader institutional priorities. The features of the evolving scholarly record suggest that traditional stewardship strategies, built on an "invisible hand" approach that relies on the uncoordinated efforts of individual academic libraries acting autonomously, are no longer suitable for collecting, organizing, making available, and preserving the outputs of scholarly inquiry.
Conscious coordination calls for stewardship strategies that incorporate a broader awareness of the system-wide stewardship context; declarations of explicit commitments around portions of the local collection; formal divisions of labor within cooperative arrangements; and robust networks for reciprocal access. Stewardship strategies based on conscious coordination involve an acceleration of an already perceptible transition away from relatively autonomous local collections to ones built on networks of cooperation across many organizations, within and outside the traditional cultural heritage community.

    Workset Creation for Scholarly Analysis: Prototyping Project

    Scholars rely on library collections to support their scholarship. Out of these collections, scholars select, organize, and refine the worksets that will answer to their particular research objectives. The requirements for those worksets are becoming increasingly sophisticated and complex, both as humanities scholarship has become more interdisciplinary and as it has become more digital. The HathiTrust is a repository that centrally collects image and text representations of library holdings digitized by the Google Books project and other mass-digitization efforts. The HathiTrust's computational infrastructure is being built to support large-scale manipulation and preservation of these representations, but it organizes them according to catalog records that were created to enable users to find books in a building or to make high-level generalizations about duplicate holdings across libraries, etc. These catalog records were never meant to support the granularity of sorting and selection of works that scholars now expect, much less page-level or chapter-level sorting and selection out of a corpus of billions of pages. The ability to slice through a massive corpus consisting of many different library collections, and out of that to construct the precise workset required for a particular scholarly investigation, is the "game changing" potential of the HathiTrust; understanding how to do that is a research problem, and one of keen interest to the HathiTrust Research Center (HTRC), since we believe that scholarship begins with the selection of appropriate resources. Given the unprecedented size and scope of the HathiTrust corpus—in conjunction with the HTRC's unique computational access to copyrighted materials—we are proposing a project that will engage scholars in designing tools for exploration, location, and analytic grouping of materials so they can routinely conduct computational scholarship at scale, based on meaningful worksets.
“Workset Creation for Scholarly Analysis: Prototyping Project” (WCSA) seeks to address three sets of tightly intertwined research questions regarding 1) enriching the metadata in the HathiTrust corpus, 2) augmenting string-based metadata with URIs to leverage discovery and sharing through external services, and 3) formalizing the notion of collections and worksets in the context of the HathiTrust Research Center. Building upon the model of the Open Annotation Collaboration, the HTRC proposes to release an open, competitive Request for Proposals with the intent to fund four prototyping projects that will build tools for enriching and augmenting metadata for the HathiTrust corpus. Concurrently, the HTRC will work closely with the Center for Informatics Research in Science and Scholarship (CIRSS) to develop and instantiate a set of formal data models that will be used to capture and integrate the outputs of the funded prototyping projects with the larger HathiTrust corpus. Andrew W. Mellon Foundation, grant no. 21300666.
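The notion of a workset the project formalizes can be pictured with a simple sketch (the class and identifiers below are illustrative assumptions, not the formal data model developed with CIRSS): a named, attributable grouping of volume identifiers that can be refined and combined.

```python
from dataclasses import dataclass, field

@dataclass
class Workset:
    """Hypothetical minimal workset: a named, citable grouping
    of volume identifiers selected out of a larger corpus."""
    title: str
    creator: str
    volume_ids: set = field(default_factory=set)

    def add(self, vol_id: str) -> None:
        self.volume_ids.add(vol_id)

    def intersect(self, other: "Workset") -> set:
        # worksets can be compared and combined to refine a selection
        return self.volume_ids & other.volume_ids

ws = Workset("19th-century novels", "scholar@example.edu")
ws.add("mdp.39015012345678")  # illustrative HathiTrust-style volume ID
```

The point of the formal model is that a selection like this becomes a first-class, shareable research object rather than an ad hoc list.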

    Piece by Piece Review of Digitize-and-Lend Projects Through the Lens of Copyright and Fair Use

    Digitize-and-lend library projects can benefit societies in multiple ways, from providing information to people in remote areas, to reducing duplication of effort in digitization, to providing access to people with disabilities. Such projects contemplate not just digitizing library titles for regular patron use, but also allowing the digitized versions to be used for interlibrary loan (ILL), sharing within consortia, and replacing print copies at other libraries. Many of these functions are already supported within the analog world (e.g., ILL), and the digitize-and-lend concept is largely a logical outgrowth of technology, much like the transitioning from manual hand duplication of books to printing presses. The purpose of each function is to facilitate user access to information. Technology can amplify that access, but in doing so, libraries must also be careful not to upset the long established balance in copyright, where authors’ rights sit on the other side of the scale from public benefit. This article seeks to provide a primer on the various components in a digitize-and-lend project, explore the core copyright issues in each, and explain how these projects maintain the balance of copyright even as libraries take advantage of newer technologies

    Collaborative Academic Library Digital Collections Post-Cambridge University Press, HathiTrust and Google Decisions on Fair Use

    Academic libraries face numerous stressors as they seek to meet the needs of their users through technological advances while adhering to copyright laws. This paper seeks to explore one specific proposal to balance these interests, the impact of recent decisions on its viability, and the copyright challenges that remain after these decisions

    Workset Creation for Scholarly Analysis: Recommendations and Prototyping Project Reports

    This document assembles and describes the outcomes of the four prototyping projects undertaken as part of the Workset Creation for Scholarly Analysis (WCSA) research project (2013–2015). Each prototyping project team provided its own final report. These reports are assembled together and included in this document. Based on the totality of results reported, the WCSA project team also provides a set of overarching recommendations for HTRC implementation and adoption of research conducted by the Prototyping Project teams. The work described here was made possible through the generous support of The Andrew W. Mellon Foundation (Grant Ref # 21300666).

    TextRWeb: Large-Scale Text Analytics with R on the Web

    As digital data sources grow in number and size, they pose an opportunity for computational investigation by means of text mining, NLP, and other text analysis techniques. R is a popular and powerful text analytics tool; however, it needs to run in parallel and requires special handling to protect copyrighted content against full access (consumption). The HathiTrust Research Center (HTRC) currently holds 11 million volumes (books), of which 7 million are copyrighted. In this paper we propose HTRC TextRWeb, an interactive R software environment which employs complexity-hiding interfaces and automatic code generation to allow large-scale text analytics in a non-consumptive manner. For our principal test case of copyrighted data in the HathiTrust Digital Library, TextRWeb permits us to code, edit, and submit text analytics methods empowered by a family of interactive web user interfaces. All these methods combine to reveal a new interactive paradigm for large-scale text analytics on the web.
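One way to picture the non-consumptive constraint a system like TextRWeb enforces (a hypothetical Python sketch; the actual system generates and runs R code) is a gatekeeper that executes user-supplied analysis code server-side, next to the protected corpus, and releases only aggregate results rather than the text itself.

```python
def run_non_consumptive(analysis, volumes):
    """Hypothetical sketch: run a user-supplied analysis function
    server-side against protected volumes, and release only small
    aggregate summaries, never the raw text."""
    result = analysis(volumes)
    # release policy (simplified): only numeric summaries may leave
    if not isinstance(result, (int, float, dict)):
        raise PermissionError("only aggregate results may be exported")
    return result

# the protected corpus never leaves the server
corpus = ["full text of volume one ...", "full text of volume two ..."]

# allowed: per-volume word counts (aggregate information)
word_counts = run_non_consumptive(
    lambda vols: {i: len(v.split()) for i, v in enumerate(vols)},
    corpus,
)

# blocked: returning the raw text of a volume
try:
    run_non_consumptive(lambda vols: vols[0], corpus)
except PermissionError:
    pass  # consumption of the text is refused
```

A real release policy is far more involved (rate limits, result-size checks, human review), but the division of labor is the same: code travels to the data, and only derived results travel back.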

    HathiTrust Research Center Data Capsule v1.0: An Overview of Functionality

    The first mode of access by the community of digital humanities and informatics researchers and educators to the copyrighted content of the HathiTrust digital repository will be through extracted statistical and aggregated information about the copyrighted texts. But can the HathiTrust Research Center support scientific research that allows a researcher to carry out their own analysis and extract their own information? This question is the focus of a 3-year, $606,000 grant from the Alfred P. Sloan Foundation (Plale, Prakash 2011-2014), which has resulted in a novel experimental framework that permits analytical investigation of a corpus but prohibits data from leaving the capsule. The HTRC Data Capsule is both a system architecture and a set of policies that enable computational investigation over the protected content of the HT digital repository that is carried out and controlled directly by a researcher. It leverages the foundational security principles of the Data Capsules of A. Prakash of the University of Michigan, which allow privileged access to sensitive data while also restricting the channels through which that data can be released. Ongoing work extends the HTRC Data Capsule to give researchers more compute power at their fingertips. The new thrust, HT-DC Cloud, extends existing security guarantees and features to allow researchers to carry out compute-heavy tasks, like LDA topic modeling, on large-scale compute resources. The HTRC Data Capsule works by giving a researcher their own virtual machine that runs within the HTRC domain. The researcher can configure the VM as they would their own desktop with their own tools. After they are done, the VM switches into a "secure" mode, where network and other data channels are restricted in exchange for access to the data being protected. Results are emailed to the user. In this talk we discuss the motivations for the HTRC Data Capsule, its successes and challenges. The HTRC Data Capsule runs at Indiana University.
See more at http://d2i.indiana.edu/non-consumptive-researc
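The two-mode policy the abstract describes can be sketched as a small state machine (a hypothetical API for illustration, not the real Data Capsule implementation): network access and protected-data access are mutually exclusive, so data can never flow out over an open channel.

```python
class DataCapsule:
    """Sketch of the Data Capsule's two-mode policy: in maintenance
    mode the VM has network access but no protected data; switching
    to secure mode grants data access and closes the network channel."""

    def __init__(self):
        self.mode = "maintenance"  # configure tools, install software

    def switch_to_secure(self):
        # one-way trade: gain data access, lose the network
        self.mode = "secure"

    def network_allowed(self) -> bool:
        return self.mode == "maintenance"

    def data_access_allowed(self) -> bool:
        return self.mode == "secure"

vm = DataCapsule()
assert vm.network_allowed() and not vm.data_access_allowed()
vm.switch_to_secure()
assert vm.data_access_allowed() and not vm.network_allowed()
```

The invariant worth noticing is that no state grants both privileges at once; the only sanctioned exit path for results is the reviewed channel (in the deployed system, email to the user).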

    HathiTrust Research Center: Computational Research on the HathiTrust Repository

    PIs (exec mgt team): Beth A. Plale, Indiana University; Marshall Scott Poole, University of Illinois Urbana-Champaign; Robert McDonald (IU); John Unsworth (UIUC). Senior investigators: Loretta Auvil (UIUC); Johan Bollen (IU); Randy Butler (UIUC); Dennis Cromwell (IU); Geoffrey Fox (IU); Eileen Julien (IU); Stacy Kowalczyk (IU); Danny Powell (UIUC); Beth Sandore (UIUC); Craig Stewart (IU); John Towns (UIUC); Carolyn Walters (IU); Michael Welge (UIUC); Eric Wernert (IU)

    Scholarly Needs for Text Analysis Resources: A User Assessment Study for the HathiTrust Research Center

    The HathiTrust Research Center (HTRC) is undertaking a study to better understand the needs of current and potential users of the center’s tools and services for computational text analysis. In this paper, we report on the results of the first phase of the study, which consisted of interviews with scholars, administrators, and librarians whose work involves text data mining. Our study reveals that text analysis workflows are specific to the individual research project and are often nonlinear. In spite of, and in some cases because of, the wealth of textual data available, scholars find it most difficult to locate, access, and curate textual data for their research. While the goals of the study directly relate to research and development for the HTRC, our results are useful for other large-scale data providers developing solutions for allowing computational access to their content