1 research outputs found

    Author Gender Metadata Augmentation of HathiTrust Digital Library

    Get PDF
    ABSTRACT Bibliographic metadata is essential for digital library resource description. Especially as the size and number of bibliographic entities grows, high-quality metadata enables richer forms of digital library access, search, and use. Metadata records can be enriched through automated techniques. For example, a digital humanities scholar might use the gender of a set of authors during their literature analysis. In this study, we undertook to enrich the metadata description of a large-scale digital library, the HathiTrust (HT) digital library, specifically by determining the gender of authors of the public domain portion of the collection. The results are stored to a separate Solr index accessible through the HathiTrust Research Center services. This study, which successfully resolved in 78.9% of the cases the gender of authors in the HT public domain corpus, suggests future research directions in capturing and representing the provenance of the contributing sources to enhance trust, and in machine learning to resolve the remaining names
    corecore