
    Layout analysis on newspaper archives

    The evolution of newspaper layout across historical corpora has been studied with diverse qualitative and quantitative methods in the past few years. The recent availability of large newspaper corpora is making the quantitative analysis of layout evolution increasingly popular. This research investigates a method for automatically detecting layout evolution in scanned images using a factorial analysis approach. The notion of eigenpages is defined by analogy with the eigenfaces used in face recognition. The corpus of scanned newspapers contains 4 million press articles, covering about 200 years of archives. The method automatically detects the layout changes of a given newspaper over time, reconstructing part of its past publishing strategy and retracing major changes in its layout history. It also makes it possible to compare the layout changes of several newspapers at once, based only on scans of their issues.
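
    The eigenpages idea can be illustrated with a minimal sketch. It assumes that the factorial analysis is a plain principal component analysis (computed here by SVD, as is usual for eigenfaces) and that pages are equally sized grayscale arrays; the abstract confirms neither detail. The function names `make_page`, `eigenpages` and `largest_jump` are hypothetical, and the synthetic pages stand in for real scans.

```python
import numpy as np


def make_page(n_columns, rng, height=32, width=24):
    """Synthetic page (assumption, not real data): dark text, white gutters."""
    page = np.full((height, width), 0.3)
    for k in range(1, n_columns):
        page[:, k * width // n_columns] = 1.0  # vertical gutter between columns
    return page + rng.normal(0.0, 0.01, size=page.shape)  # mild scan noise


def eigenpages(pages, n_components):
    """PCA of flattened pages; returns mean page, eigenpages and page scores."""
    n_pages, height, width = pages.shape
    flat = pages.reshape(n_pages, height * width).astype(np.float64)
    mean_page = flat.mean(axis=0)
    centered = flat - mean_page
    # Rows of vt are the orthonormal principal axes of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T  # coordinates of each page in eigenpage space
    return (
        mean_page.reshape(height, width),
        components.reshape(n_components, height, width),
        scores,
    )


def largest_jump(scores):
    """Index of the page that differs most from its predecessor in eigenpage space."""
    steps = np.linalg.norm(np.diff(scores, axis=0), axis=1)
    return int(np.argmax(steps)) + 1


rng = np.random.default_rng(0)
# Ten two-column issues followed by ten three-column issues, in date order.
pages = np.stack(
    [make_page(2, rng) for _ in range(10)] + [make_page(3, rng) for _ in range(10)]
)
mean_page, components, scores = eigenpages(pages, n_components=2)
change_index = largest_jump(scores)
print("largest layout change at issue index", change_index)
```

    Projecting each issue onto a few eigenpages turns a scanned page into a short vector, so a layout redesign shows up as a jump between consecutive issues. Real scans would first need alignment and resizing to a common shape, which this sketch leaves out.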

    Logical segmentation for article extraction in digitized old newspapers

    Newspapers are documents made of news items and informative articles. They are not meant to be read iteratively: readers can pick items in any order they fancy. Ignoring this structural property, most digitized newspaper archives only offer access to their content by issue or, at best, by page. We have built a digitization workflow that automatically extracts newspaper articles from images, which allows indexing and retrieval of information at the article level. Our back-end system extracts the logical structure of the page to produce the informative units: the articles. Each image is labelled at the pixel level through a machine-learning-based method, and the logical structure of the page is then built up from there by detecting structuring entities such as horizontal and vertical separators, titles and text lines. This logical structure is stored in a METS wrapper associated with the ALTO file produced by the system, which includes the OCRed text. Our front-end system provides high-definition web visualisation of images, textual indexing and retrieval facilities, and searching and reading at the article level. Article transcriptions can be collaboratively corrected, which in turn allows for better indexing. We are currently testing our system on the archives of the Journal de Rouen, one of France's oldest local newspapers. These 250 years of publication amount to 300 000 pages of very variable image quality and layout complexity. Test year 1808 can be consulted at plair.univ-rouen.fr. Comment: ACM Document Engineering, France (2012)

    Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

    The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research work seeking to automatically process facsimiles and extract information from them is multiplying, with document layout analysis as a first essential step. Although the identification and categorization of segments of interest in document images has seen significant progress over the last years thanks to deep learning techniques, many challenges remain, including the use of finer-grained segmentation typologies and the handling of complex, heterogeneous documents such as historical newspapers. Besides, most approaches consider visual features only, ignoring the textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among other things, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to high material variance.

    Exploring the information behaviour of users of Welsh Newspapers Online through web log analysis

    Purpose – Webometric techniques have been applied to many websites and online resources, especially since the launch of Google Analytics (GA). To date, though, there has been little consideration of information behaviour in relation to digitised newspaper collections. The purpose of this paper is to address a perceived gap in the literature by providing an account of user behaviour in the newly launched Welsh Newspapers Online (WNO). Design/methodology/approach – The author collected webometric data for WNO using GA and web server content logs. These were analysed to identify patterns of engagement and user behaviour, which were then considered in relation to existing information behaviour. Findings – Use of WNO, while reminiscent of archival information seeking, can be understood as centring on the web interface rather than the digitised material. In comparison to general web browsing, users are much more deeply engaged with the resource. This engagement incorporates reading online, but users’ information seeking utilises website search and browsing functionality rather than filtering in newspaper material. Information seeking in digitised newspapers resembles the model of the “user” more closely than that of the “reader”, a value-laden distinction which needs further unpacking. Research limitations/implications – While the behaviour discussed in this paper is likely to be more widely representative, a larger longitudinal data set would increase the study’s significance. Additionally, the methodology of this paper can only tell us what users are doing, and further research is needed to identify the drivers for this behaviour. Originality/value – This study provides important insights into the underinvestigated area of digitised newspaper collections, and shows the importance of webometric methods in analysing online user behaviour

    Marxism's ‘Communicative Crisis’? Mapping Debates over Leninist Print-Media Practices in the 20th Century

    Despite the scholarly neglect of Marxism’s ‘communicative crisis’, it was a topic of concern that party leaders, intellectuals and activists addressed, debated and negotiated continuously throughout the 20th century. These concerns revolved around three areas: first, the primary means of print communication, the party paper; second, the specialization of production, particularly around the role of writers and journalists; and third, the search for a popular rhetoric and writing style that would appeal to the general public. This paper maps out the ‘communicative crisis’ of Marxism in the 20th century by examining key intersections of disputes over the correct approach to its practices of print communication, as a starting point for a historical analysis of the failures and successes of Marxist political praxis.

    Doing and Making: History as Digital Practice

    Freedom of Speech and the Press in the Information Age

    On June 26-27, 2008, more than 130 social studies teachers from across the United States, its territories, Cuba and even Iraq gathered at Georgetown University in Washington, D.C., for the James Madison Symposium conducted in partnership with the McCormick Freedom Museum. The symposium was titled Freedom of Speech and Press in the Information Age and explored four related topics under this thematic umbrella: free speech on the Internet and blogs, as well as in the traditional press; the Fairness Doctrine; press coverage during wartime; and the free speech implications of campaign finance reform. The two-day conference was organized around four separate panels based on the aforementioned subjects, and also included an evening banquet with a keynote address by C-SPAN President and CEO Brian Lamb, as well as a morning working session on lesson plans to address the four central topics. This report presents a summary of these deliberations in chapter form, with each chapter followed by a lesson plan rooted in the conference proceedings. The hope is that the summaries of the panel discussions help to contextualize the topics addressed and provide solid leads for further examination of these issues. They frame the embedded lesson plans, each designed for use in social studies classes at the secondary level.