
    Defining Textual Entailment

    Textual entailment is a relationship that obtains between fragments of text when one fragment in some sense implies the other. The automation of textual entailment recognition supports a wide variety of text-based tasks, including information retrieval, information extraction, question answering, text summarization, and machine translation. Much ingenuity has been devoted to developing algorithms for identifying textual entailments, but relatively little to saying what textual entailment actually is. This article reviews the logical and philosophical issues involved in providing an adequate definition of textual entailment. We show that many natural definitions of textual entailment are refuted by counterexamples, including the most widely cited definition of Dagan et al. We then articulate and defend the following revised definition: T textually entails H =df typically, a human reading T would be justified in inferring the proposition expressed by H from the proposition expressed by T. We also show that textual entailment is context-sensitive, nontransitive, and nonmonotonic.

    Modeling Worksets in the HathiTrust Research Center

    Report formally defining the notion of workset both generally and specifically within the context of the HTRC. See the executive summary for full details. Mellon Reference Number 21300666.

    Exploring the Benefits for Users of Linked Open Data for Digitized Special Collections: Google Analytics data summary

    In addition to one-on-one user interactions and a planned focus group, two further assessment methods, site traffic data gathered from Google Analytics and test queries using Google’s search engine, were used to produce supplementary benchmark data. The sections below summarize the facts observed from these two data collections.

    To Map or Not to Map: Rethinking Crosswalk Agendas

    In the two decades since their publication, the Functional Requirements for Bibliographic Records and succeeding standards such as the Library Reference Model have had a marked impact on discourse concerning descriptive theory and practice. The BIBFRAME model, which began as an effort to replace MARC with a linked data-capable modeling format, offers an alternate view of the bibliographic universe with three principal entities rather than four. Differences between BIBFRAME and LRM are based on competing intuitions about the nature of creative works, and at first the two approaches appear to compete for the same intellectual space. BIBFRAME offers a less constrained model of bibliographic descriptions than the FRBR models, and if interoperability between BIBFRAME and WEMI-aligned standards like Resource Description and Access requires translation of RDA records both to and from BIBFRAME descriptions, then the latter’s flexibility poses problems for mapping between the models. Proposed solutions to those problems reveal as much about different modeling philosophies as they do about different views of creative works and their relationships to texts and copies. Linked data protocols are intended to support resources and scenarios far too diverse for either a single account of creative works or a subsumption-based taxonomy of models. But a need for descriptions flexible enough to include them all does not require us to retreat from modeling commitments to either reductionism or operationalism. BIBFRAME can instead be seen as reaching toward a descriptive domain that plays a role complementary to the IFLA standards.

    When conceptual models collide: aggregates in IFLA's Library Reference Model

    IFLA’s Library Reference Model defines manifestations as sets of carriers sharing relevant physical and intentional properties, and aggregates as manifestations that embody multiple expressions. Taken together, these accounts pose consistency problems for some manifestation-level properties, and for the constraint that an item exemplifies exactly one manifestation.

    Enhancing Cultural Heritage Collections by Supporting and Analyzing Participation in Flickr

    Cultural heritage institutions can enhance their collections by sharing content through popular web services. Drawing on current analyses from the Flickr Feasibility Study, we report on the pronounced increase in use of the IMLS DCC Flickr Photostream in the past year, trends in how users are engaging with the content, and data provider perspectives on participation in Flickr through the DCC. In addition to users providing comments and tags for images, they are increasingly integrating historical images from libraries and museums into new digital objects and special collections. Intermediary services can fill a key role in lowering the burden for institutions to engage in Web 2.0 initiatives and broadening public access to cultural heritage content. To extend the scope of the current DCC services, we propose a feedback framework for transferring user-generated information to institutional data providers. IMLS LG-06-07-0020. Published or submitted for publication; peer reviewed.

    Exploring the Benefits for Users of Linked Open Data for Digitized Special Collections: Benchmark case studies of two digital library websites

    This report presents the results from a pair of case studies conducted as part of the Exploring the benefits for users of Linked Open Data for digitized special collections project. Each case study was produced from a series of interviews with users of digital special collections. The case studies compare the Motley Collection of Theatre & Costume Design (Motley) to the Harvard Theatre Collection and the Kolb-Proust Archive for Research (KPA) to the Bovary Manuscript Archive, respectively. Each of the users was a volunteer and was asked to compare two digital collection websites to one another during the course of completing a series of user tasks, which included assessing the overall layout and utility of each digital collection’s interface, searching for a specific resource, and characterizing how they might employ the collections in their research. Andrew W. Mellon Foundation, Award No. 31500650.

    Disambiguating Descriptions: Mapping Digital Special Collections Metadata into Linked Open Data Formats

    In this poster we describe the Linked Open Data (LOD) for Digital Special Collections project at the University of Illinois at Urbana-Champaign and discuss some of the particular challenges that legacy metadata poses for representation in LOD formats. LOD formats are primarily based on the World Wide Web Consortium’s Resource Description Framework standard, which demands both that entities be named by opaque universal identifiers whenever possible and that metadata descriptions for entities be as unambiguous as possible. The challenges for disambiguating those descriptions are illustrated through examples drawn from digital special collections based at four different digital libraries.

    Conceptualizing worksets for non-consumptive research

    The HathiTrust (HT) digital library comprises 4 billion pages (composing 11 million volumes). The HathiTrust Research Center (HTRC) – a unique collaboration between University of Illinois and Indiana University – is developing tools to connect scholars to this large and diverse corpus. This poster discusses HTRC’s activities surrounding the discovery, formation, and optimization of useful analytic subsets of the HT corpus (i.e., workset creation and use). As part of this development we are prototyping an RDF-based triple store designed to record and serialize metadata describing worksets and the bibliographic entities that are collected within them. At the heart of this work is the construction of a formal conceptual model that captures sufficient descriptive information about worksets, including provenance, curatorial intent, and other useful metadata, so that digital humanities scholars can more easily select, group, and cite their research data collections based upon HT and external corpora. The prototype’s data model is being designed to be extensible and to fit well within the Linked Open Data community.
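The idea of recording workset membership, provenance, and curatorial intent as triples can be sketched in a few lines of plain Python. This is a minimal illustration of the triple-store pattern only; the namespace, predicate names, and URIs below are hypothetical and do not reflect the HTRC prototype's actual schema.

```python
# Hypothetical workset described as a set of (subject, predicate, object)
# triples, in the style of an RDF triple store.
WS = "http://example.org/worksets/ws1"  # hypothetical workset URI

triples = {
    (WS, "rdf:type", "htrc:Workset"),
    (WS, "dcterms:creator", "A. Scholar"),
    (WS, "dcterms:description", "Volumes selected for topic modeling"),
    # Provenance and curatorial intent ride along as ordinary triples.
    (WS, "prov:wasDerivedFrom", "http://example.org/ht/corpus"),
    (WS, "htrc:member", "http://example.org/ht/volumes/vol123"),
    (WS, "htrc:member", "http://example.org/ht/volumes/vol456"),
}

def members(graph, workset):
    """Return the bibliographic entities collected in a workset."""
    return sorted(o for s, p, o in graph if s == workset and p == "htrc:member")

print(members(triples, WS))
```

Because every statement is an independent triple, the model stays extensible: new provenance or descriptive predicates can be added without altering existing data, which is what makes the approach a natural fit for Linked Open Data.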

    Proposal for Persistent & Unique Entity Identifiers

    This proposal argues for the establishment of persistent and unique identifiers for page-level content. The page is a key conceptual entity within the HathiTrust Research Center (HTRC) framework: volumes are composed of pages, and pages are the units of data that the HTRC’s analytics modules consume and execute algorithms across. The need for infrastructure that supports persistent and unique identity for pages is best described by seven use cases:
    1. Persistent Citability: Scholars engaging in the analysis of HTRC resources have a clear need to cite those resources in a persistent manner independent of those resources’ relative positions within other entities.
    2. Point-in-time Citability: Scholars engaging in the analysis of HTRC resources have a clear need to cite resources in an unambiguous way that is persistent with respect to time.
    3. Reproducibility: Scholars need methods by which the resources that they cite can be shared so that their work conforms to the norms of peer review and reproducibility of results.
    4. Supporting “non-consumptive” Usage: Anonymizing page-level content by disassociating it from the volumes that it is conceptually a part of increases the difficulty of leveraging HTRC analytics modules for the direct reproduction of HathiTrust (HT) content.
    5. Improved Granularity: Since many features that scholars are interested in exist at the conceptual level of a page rather than at the level of a volume, unique page-level entities expand the types of methods by which worksets can be gathered and by which analytics modules can be constructed.
    6. Expanded Workset Membership: In the near future we would like to empower scholars with options for creating worksets from arbitrary resources at arbitrary levels of granularity, including constructing worksets from collections of arbitrary pages.
    7. Supporting Graph Representations: Unique identifiers for page-level content facilitate the creation of more conceptually accurate and functional graph representations of the HT corpus.
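One way such page-level identifiers could be minted is by hashing a (volume, page, version) triple into an opaque string. This is a sketch under stated assumptions only: the identifier scheme, the volume identifier format, and the version string below are hypothetical, not the HTRC's actual design.

```python
# Minimal sketch of minting opaque, persistent page-level identifiers.
# The scheme and the example volume ID are hypothetical illustrations.
import hashlib

def page_identifier(volume_id: str, page_seq: int, version: str) -> str:
    """Build an opaque, unique identifier for one page of one volume.

    Hashing the (volume, page, version) triple yields an identifier that
    is deterministic (the same inputs always map to the same identifier,
    supporting persistent and point-in-time citation) and opaque (the
    string itself reveals nothing about the page's position in a volume).
    """
    key = f"{volume_id}|{page_seq}|{version}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16]

pid = page_identifier("mdp.39015012345678", 42, "2013-06-01")
print(pid)
```

Including a version component in the hash is what gives point-in-time citability: if a page image is rescanned or its OCR is corrected, the new version mints a new identifier while the old one continues to denote the earlier state.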