6 research outputs found

    Navigating through 200 years of historical newspapers

    This paper aims to describe and explain the processes behind the creation of a digital library composed of two Swiss newspapers, namely Gazette de Lausanne (1798-1998) and Journal de Genève (1826-1998), covering an almost two-century period. We developed a general purpose application giving access to this cultural heritage asset; a large variety of users (e.g. historians, journalists, linguists and the general public) can search through the content of around 4 million articles via an innovative interface. Moreover, users are offered different strategies to navigate through the collection: lexical and temporal lookup, n-gram viewer and named entities

    Explorer la presse numérisée : le projet Impresso

    "Impresso – Media Monitoring of the Past" is an interdisciplinary research project in which a team of historians, computational linguists and designers collaborates on turning a corpus of digitised newspaper archives into data. The project's main objectives are to improve information extraction tools for historical texts, to index historical newspapers semantically, and to integrate the resulting enrichments into historians' research practices by means of a newly developed interface

    Processamento e Navegação por Tópicos em Imagens de Páginas de Jornais Históricos

    This paper presents the architecture and operation of a Historical Newspaper Page Image Topic Navigation System designed to facilitate access to, and use of, a historical newspaper collection for social and historical research. The system consists of four modules: Text Subimage Segmentation, Text Extraction and Preprocessing, Topic Network Extraction, and Document Viewing and Retrieval Interface. The algorithmic and technological approaches of each module are described and the initial test results are presented

    Named Entity Recognition for early-modern textual sources: a review of capabilities and challenges with strategies for the future

    Purpose: By mapping out the capabilities, challenges and limitations of named-entity recognition (NER), this article aims to synthesise the state of the art of NER in the context of the early modern research field and to inform discussions about the kind of resources, methods and directions that may be pursued to enrich the application of the technique going forward. // Design/methodology/approach: Through an extensive literature review, this article maps out the current capabilities, challenges and limitations of NER and establishes the state of the art of the technique in the context of the early modern, digitally augmented research field. It also presents a new case study of NER research undertaken by Enlightenment Architectures: Sir Hans Sloane's Catalogues of his Collections (2016–2021), a Leverhulme-funded research project and collaboration between the British Museum and University College London, with contributing expertise from the British Library and the Natural History Museum. // Findings: Currently, it is not possible to benchmark the capabilities of NER as applied to documents of the early modern period. The authors also draw attention to the situated nature of authority files, and current conceptualisations of NER, leading them to the conclusion that more robust reporting and critical analysis of NER approaches and findings is required. // Research limitations/implications: This article examines NER as applied to early modern textual sources, which are mostly studied by Humanists. As addressed in this article, detailed reporting of NER processes and outcomes is not necessarily valued by the disciplines of the Humanities, with the result that it can be difficult to locate relevant data and metrics in project outputs. The authors have tried to mitigate this by contacting projects discussed in this paper directly, to further verify the details they report here.
// Practical implications: The authors suggest that a forum is needed where tools are evaluated according to community standards. Within the wider NER community, the MUC and CoNLL corpora are used for such experimental set-ups and are accompanied by a conference series, and may be seen as a useful model for this. The ultimate nature of such a forum must be discussed with the whole research community of the early modern domain. // Social implications: NER is an algorithmic intervention that transforms data according to certain rules, patterns or training data and ultimately affects how the authors interpret the results. The creation, use and promotion of algorithmic technologies like NER is not a neutral process, and neither is their output. This paper calls for a more critical understanding of the role and impact of NER on early modern documents and research, and for focalization of some of the data- and human-centric aspects of NER routines that are currently overlooked. // Originality/value: This article presents a state-of-the-art snapshot of NER, its applications and potential, in the context of early modern research. It also seeks to inform discussions about the kinds of resources, methods and directions that may be pursued to enrich the application of NER going forward. It draws attention to the situated nature of authority files, and current conceptualisations of NER, and concludes that more robust reporting of NER approaches and findings is urgently required. The Appendix sets out a comprehensive summary of digital tools and resources surveyed in this article

    The International Image Interoperability Framework (IIIF): raising awareness of the user benefits for scholarly editions

    The International Image Interoperability Framework (IIIF), an initiative born in 2011, defines a set of common application programming interfaces (APIs) to retrieve, display, manipulate, compare, and annotate digitised and born-digital images. Upon implementation, these technical specifications have offered institutions and end users alike new possibilities. In Switzerland, only a handful of organizations and projects have collaborated with the IIIF community. For instance, e-codices, the Virtual Manuscript Library, implemented the two core IIIF APIs (Image API and Presentation API) in December 2014. Since then, no other Swiss collection has fully complied with the IIIF specifications to make true interoperability possible. The NIE-INE project, overseen by the University of Basel and funded by Swissuniversities, has aimed to build a national platform for scientific editions. NIE-INE and IIIF share a rationale: both advocate a flexible and consistent technical architecture and a high-quality user experience (UX) in their content delivery. Remote and in-person usability tests were conducted on the Universal Viewer (UV) and Mirador, two IIIF-compliant image viewers deployed by many IIIF implementers, in order to assess user satisfaction, efficiency, and perceived usability. NIE-INE was the target audience of the usability testing, with a view to evaluating how scholarly research and the wider scientific community could benefit from leveraging IIIF-compliant technology. To conclude this bachelor’s thesis, a set of recommendations, based on the usability testing results and on observations made throughout this assignment, was drawn up for the development teams of both viewers, the IIIF community and the NIE-INE team members

    Navigating through 200 Years of Historical Newspapers: Paper - iPRES 2016 - Swiss National Library, Bern

    No full text
    This paper describes the processes which led to the creation of an innovative interface to access a digital archive composed of two Swiss newspapers, namely Gazette de Lausanne (1798–1998) and Journal de Genève (1826–1998). Based on several textual processing steps, including lexical indexation, n-gram computation and named entity recognition, a general purpose web-based application was designed and implemented; it allows a large variety of users (e.g. historians, journalists, linguists and the general public) to explore different facets of about 4 million press articles spanning an almost 200-year period