3 research outputs found

    Studying Linguistic Changes on 200 Years of Newspapers

    Large databases of scanned newspapers open new avenues for studying linguistic evolution. By studying a two-billion-word corpus corresponding to 200 years of newspapers, we compare several methods to assess how fast language is changing. After critically evaluating an initial set of methods for assessing textual distance between subsets corresponding to consecutive years, we introduce the notion of a lexical kernel, the set of unique words that maintain themselves over long periods of time. Focusing on linguistic stability instead of linguistic change allows building more robust measures of long-term phenomena such as word resilience. By systematically comparing the results obtained on two subsets of the corpus corresponding to two independent newspapers, we argue that the results do not depend on the specifics of the chosen corpus and are likely to reflect more general linguistic phenomena.
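
The lexical kernel idea can be illustrated with a minimal sketch. It assumes, for illustration only, that a word belongs to the kernel of a period when it appears in every year of that period; the abstract does not give the exact criterion, and the function names `yearly_vocabulary` and `lexical_kernel`, the whitespace tokenization, and the toy corpus are all hypothetical.

```python
from collections import Counter


def yearly_vocabulary(text, min_count=1):
    """Return the set of words occurring at least min_count times in one year's text."""
    counts = Counter(text.lower().split())  # naive whitespace tokenization (assumption)
    return {word for word, count in counts.items() if count >= min_count}


def lexical_kernel(texts_by_year, start, end, min_count=1):
    """Intersect the yearly vocabularies for all available years in [start, end]."""
    years = [year for year in sorted(texts_by_year) if start <= year <= end]
    if not years:
        return set()
    kernel = yearly_vocabulary(texts_by_year[years[0]], min_count)
    for year in years[1:]:
        kernel &= yearly_vocabulary(texts_by_year[year], min_count)
    return kernel


# Toy corpus, invented for illustration; not data from the paper.
corpus = {
    1900: "the city council met in the town hall",
    1901: "the council voted on the new telegraph line",
    1902: "the council opened the new railway line",
}
kernel = lexical_kernel(corpus, 1900, 1902)
print(sorted(kernel))
```

Under this assumed definition, widening the period can only shrink the kernel, which is one way to see why a stability-based measure may be less sensitive to year-to-year noise than a pairwise distance between consecutive years.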

    Layout analysis on newspaper archives

    The study of newspaper layout evolution through historical corpora has been addressed by diverse qualitative and quantitative methods in the past few years. The recent availability of large corpora of newspapers is now making the quantitative analysis of layout evolution ever more popular. This research investigates a method for the automatic detection of layout evolution on scanned images with a factorial analysis approach. The notion of eigenpages is defined by analogy with eigenfaces used in face recognition processes. The corpus of scanned newspapers that was used contains 4 million press articles, covering about 200 years of archives. This method can automatically detect layout changes of a given newspaper over time, rebuilding a part of its past publishing strategy and retracing major changes in its history in terms of layout. It also makes it possible to compare the layout changes of several newspapers at the same time, based only on scans of their issues.
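
The eigenpages analogy can be sketched as follows. Eigenfaces are conventionally obtained by principal component analysis of flattened images, so this sketch assumes the same treatment for page scans; the abstract only says "factorial analysis", so the exact method may differ. The function name `eigenpages` and the random arrays standing in for downsampled grayscale scans are hypothetical.

```python
import numpy as np


def eigenpages(pages, n_components):
    """PCA over flattened page images, by analogy with eigenfaces (assumed method).

    pages: array of shape (n_pages, height, width), grayscale.
    Returns the mean page, the top principal directions, and each page's coordinates.
    """
    X = np.asarray(pages, dtype=float).reshape(len(pages), -1)
    mean_page = X.mean(axis=0)
    centered = X - mean_page
    # Rows of vt are orthonormal principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    coords = centered @ components.T  # low-dimensional layout signature per page
    return mean_page, components, coords


# Random stand-ins for six downsampled 8x5 scans; not real newspaper data.
rng = np.random.default_rng(0)
pages = rng.random((6, 8, 5))
mean_page, components, coords = eigenpages(pages, n_components=2)
print(components.shape, coords.shape)
```

With pages ordered by publication date, an abrupt jump in `coords` between consecutive issues would be one possible signal of a layout change.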

    Navigating through 200 years of historical newspapers

    This paper describes and explains the processes behind the creation of a digital library composed of two Swiss newspapers, namely Gazette de Lausanne (1798-1998) and Journal de Genève (1826-1998), covering an almost two-century period. We developed a general-purpose application giving access to this cultural heritage asset; a large variety of users (e.g. historians, journalists, linguists and the general public) can search through the content of around 4 million articles via an innovative interface. Moreover, users are offered different strategies to navigate through the collection: lexical and temporal lookup, an n-gram viewer, and named entities.