16 research outputs found

    Clipping the Page – Automatic Article Detection and Marking Software in Production of Newspaper Clippings of a Digitized Historical Journalistic Collection

    This paper describes the use of article detection and extraction on the Finnish Digi (https://digi.kansalliskirjasto.fi/etusivu?set_language=en) newspaper material of the National Library of Finland (NLF), using data from one newspaper, Uusi Suometar 1869–1918. We use the PIVAJ software [1] to detect and mark articles in our collection. From the separated articles we can produce automatic clippings for the user. The user can collect clippings for their own use both as images and as OCRed text. Together these functionalities improve the usability of the digitized journalistic collection by providing structured access to the contents of a page. Peer reviewed.

    OCR Quality Affects Perceived Usefulness of Historical Newspaper Clippings. A User Study

    Publisher Copyright: © 2022 Copyright for this paper by its authors. Effects of Optical Character Recognition (OCR) quality on historical information retrieval have so far been studied in data-oriented scenarios regarding the effectiveness of retrieval results. Such studies have either focused on the effects of artificially degraded OCR quality (see, e.g., [1-2]) or utilized test collections containing texts based on authentic low-quality OCR data (see, e.g., [3]). In this paper the effects of OCR quality are studied in a user-oriented information retrieval setting. Thirty-two users subjectively evaluated the query results of six topics each (out of 30 topics) based on pre-formulated queries in a simulated work task setting. To the best of our knowledge, our simulated work task experiment is the first to show empirically that users' subjective relevance assessments of retrieved documents are affected by a change in the quality of optically read text. Users of historical newspaper collections have so far commented on the effects of OCR data quality mainly in impressionistic ways, and controlled user environments for studying the effects of OCR quality on users' relevance assessments of retrieval results have so far been missing. To remedy this, the National Library of Finland (NLF) set up an experimental query environment for the contents of one Finnish historical newspaper, Uusi Suometar 1869-1918, in order to compare users' evaluations of search results at two different OCR qualities for digitized newspaper articles. The query interface could present the same underlying document to the user in one of two versions, based on either the lower or the higher OCR quality, and the choice was randomized. The users did not know about the quality differences in the article texts they evaluated. 
The main result of the study is that improved optical character recognition quality significantly affects the perceived usefulness of historical newspaper articles. The mean average evaluation score for the improved OCR results was 7.94% higher than the mean average evaluation score of the old OCR results. Peer reviewed.

    Planning non-existent dictionaries

    In 2013, a conference entitled Planning non-existent dictionaries was held at the University of Lisbon. Scholars and lexicographers were invited to present and submit for discussion their research and practices, focusing on aspects that are traditionally perceived as shortcomings by dictionary makers and dictionary users. This book contains a collection of papers divided into three sections. The first section is devoted to heritage dictionaries, referring to lexicographic projects that aim to register all the documented words in a language, particularly those that can be described as early linguistic evidence. The second section is devoted to dictionaries for special purposes and gathers papers that describe innovative lexicographic projects. The last section in this volume provides an overview of contemporary e-lexicography projects. Published.

    Battle of the Brains: Election-Night Forecasting at the Dawn of the Computer Age

    This dissertation examines journalists' early encounters with computers as tools for news reporting, focusing on election-night forecasting in 1952. Although election night 1952 is frequently mentioned in histories of computing and journalism as a quirky but seminal episode, it has received little scholarly attention. This dissertation asks how and why election night and the nascent field of television news became points of entry for computers in news reporting. The dissertation argues that although computers were employed as pathbreaking "electronic brains" on election night 1952, they were used in ways consistent with a long tradition of election-night reporting. As central events in American culture, election nights had long served to showcase both news reporting and new technology, whether with 19th-century devices for displaying returns to waiting crowds or with 20th-century experiments in delivering news by radio. In 1952, key players - television news broadcasters, computer manufacturers, and critics - showed varied reactions to employing computers for election coverage. But this computer use in 1952 did not represent wholesale change. While live use of the new technology was a risk taken by broadcasters and computer makers in a quest for attention, the underlying methodology of forecasting from early returns did not represent a sharp break with pre-computer approaches. And while computers were touted in advance as key features of election-night broadcasts, the "electronic brains" did not replace "human brains" as primary sources of analysis on election night in 1952. This case study chronicles the circumstances under which a new technology was employed by a relatively new form of the news media. On election night 1952, the computer was deployed not so much to revolutionize news reporting as to capture public attention. It functioned in line with existing values and practices of election-night journalism. 
In this important instance, therefore, the new technology's technical features were less a driving force for adoption than its usefulness as a wonder and as a symbol to enhance the prestige of its adopters. This suggests that a new technology's capacity to provide both technical and symbolic social utility can be key to its chances for adoption by the news media.

    On a Mission to Scan: Visibility, Value(s), and Labor in Large-Scale Digitization

    As an often overlooked piece of internet infrastructure, print media digitization at scale is pervasive yet elusive; its output is widely accessible but its transformative processes are largely invisible. Easy access to scanned media objects thus obscures important questions about the work required for their creation. Through two qualitative research projects on large-scale book digitization efforts, Google Books and FamilySearch Books, this dissertation investigates the labor of digitization. Using an interdisciplinary theoretical framework from science and technology studies and infrastructure studies, the research draws on the concepts of information labor and a feminist ethics of care to center and reframe digitization work. This approach animates the institutional and cultural values, labor, and information systems through which physical materials, digital conversion processes, and human workers cohere to produce large-scale digitization. The first project reconstructs the confluence of technical and cultural values and priorities that shaped the Google Books project through an analysis of project documentation and public statements. A new term, algorithmic digitization, describes Google’s commitment not only to scale and speed but to standardization, automation, and iterative improvement of scanned images. The relative inaccessibility of Google Books, a closed system with limited available documentation, serves as both context and jumping-off point for the second project, which comprises the bulk of this dissertation research. The second project is an ethnography of FamilySearch Books, a book digitization project undertaken by the genealogy organization FamilySearch (the family history wing of the Church of Jesus Christ of Latter-day Saints) and public library partners. 
The research layers three project perspectives: institutional participants, social and technical divisions of labor in digitization roles and tasks, and the ways that digitization workers make sense of their work. FamilySearch Books constructs scanning as “meaningful” work that “anyone” can do; in practice, this means that the particulars of how “anyone” has been constructed shape what tasks are visible as “work.” The visibility of religious service often obscures skilled work undertaken by professional librarians, even as this work is also service-oriented. This includes coordination and support work, maintenance and repair work, work to connect users to digitized output, work to manage the evolving relationship between print and digital resources, and work to care for resources, patrons, and colleagues. The findings suggest that different configurations of work in large-scale digitization shape ideas about building, maintaining, or devaluing infrastructure. Lofty rhetoric about the democratizing power of digital access to print content overshadows the contingency, fragility, or often the proprietary characteristics of the infrastructure required to create and/or maintain this access. The dissertation foregrounds the latter so as to consider implications for long-term access provision and digital knowledge infrastructure development. By illuminating the mediating role played by workers who transform information from one medium to another, this work contributes to an emerging research literature on data, digital, or Internet labor. By expanding the definition of digitization work to include more actors and integrating an ethics of care, this research informs ongoing debates over the future of both public libraries and public librarianship. PhD, Information, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/153473/1/mechalms_1.pd

    Film Serials and the American Cinema, 1910-1940: Operational Detection

    Before the advent of television, cinema offered serialised films as a source of weekly entertainment. This book traces their history from the days of silent screen heroines to the sound era's daring adventure serials, unearthing a thriving film culture beyond the self-contained feature. Through extensive archival research, Ilka Brasch details the aesthetic appeals of film serials within their context of marketing and exhibition, and shows that they adapted the pleasures of a flourishing crime fiction culture to both serialised visual culture and the affordances of the media-modernity of the early 20th century. The study furthermore traces how film serials brought the broadcast model of radio and television to the big screen and thereby introduced models of serial storytelling that informed popular culture even beyond the serial's demise.

    A Holmes and Doyle Bibliography, Volume 9: All Formats—Combined Alphabetical Listing

    This bibliography is a work in progress. It attempts to update Ronald B. De Waal’s comprehensive bibliography, The Universal Sherlock Holmes, but does not claim to be exhaustive in content. New works are continually discovered and added to this bibliography. Readers and researchers are invited to suggest additional content. This volume contains all listings in all formats, arranged alphabetically by author or main entry. In other words, it combines the listings from Volume 1 (Monograph and Serial Titles), Volume 3 (Periodical Articles), and Volume 7 (Audio/Visual Materials) into a comprehensive bibliography. (There may be additional materials included in this list, e.g. duplicate items and items not yet fully edited.) As in the other volumes, coverage of this material begins around 1994, the final year covered by De Waal's bibliography, but may not yet be totally up-to-date (given the ongoing nature of this bibliography). It is hoped that other titles will be added at a later date. At present, this bibliography includes 12,594 items.