Layout analysis on newspaper archives
The study of newspaper layout evolution through historical corpora has been addressed by diverse qualitative and quantitative methods in the past few years. The recent availability of large corpora of newspapers is now making the quantitative analysis of layout evolution ever more popular. This research investigates a method for the automatic detection of layout evolution on scanned images with a factorial analysis approach. The notion of eigenpages is defined by analogy with eigenfaces used in face recognition processes. The corpus of scanned newspapers that was used contains 4 million press articles, covering about 200 years of archives. This method can automatically detect layout changes of a given newspaper over time, rebuilding a part of its past publishing strategy and retracing major changes in its history in terms of layout. Beyond this, it makes it possible to compare the layout changes of several newspapers at the same time, based only on scans of their issues.
Logical segmentation for article extraction in digitized old newspapers
Newspapers are documents made of news items and informative articles. They are
not meant to be read sequentially: the reader can pick his items in any order he
fancies. Ignoring this structural property, most digitized newspaper archives
only offer access by issue or at best by page to their content. We have built a
digitization workflow that automatically extracts newspaper articles from
images, which allows indexing and retrieval of information at the article
level. Our back-end system extracts the logical structure of the page to
produce the informative units: the articles. Each image is labelled at the
pixel level by a machine-learning-based method, and the logical structure of
the page is then built up from the detection of structuring entities
such as horizontal and vertical separators, titles and text lines. This logical
structure is stored in a METS wrapper associated with the ALTO file that the
system produces, which includes the OCRed text. Our front-end system provides
high-definition web visualisation of images, textual indexing and retrieval
facilities, and searching and reading at the article level. Article
transcriptions can be collaboratively corrected, which in turn allows for
better indexing. We are currently testing our system on the archives of the
Journal de Rouen, one of France's oldest local newspapers. These 250 years of
publication amount to 300,000 pages of highly variable image quality and layout
complexity. Test year 1808 can be consulted at plair.univ-rouen.fr.
Comment: ACM Document Engineering, France (2012)
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
The massive amounts of digitized historical documents acquired over the last
decades naturally lend themselves to automatic processing and exploration.
Research work seeking to automatically process facsimiles and extract
information from them is multiplying, with document layout analysis as a first
essential step. Although the identification and categorization of segments of
interest in document images have seen significant progress over the last years
thanks to deep learning techniques, many challenges remain with, among others,
the use of finer-grained segmentation typologies and the consideration of
complex, heterogeneous documents such as historical newspapers. Besides, most
approaches consider visual features only, ignoring textual signal. In this
context, we introduce a multimodal approach for the semantic segmentation of
historical newspapers that combines visual and textual features. Based on a
series of experiments on diachronic Swiss and Luxembourgish newspapers, we
investigate, among others, the predictive power of visual and textual features
and their capacity to generalize across time and sources. Results show
consistent improvement of multimodal models in comparison to a strong visual
baseline, as well as better robustness to high material variance.
Exploring the information behaviour of users of Welsh Newspapers Online through web log analysis
Purpose – Webometric techniques have been applied to many websites and online resources,
especially since the launch of Google Analytics (GA). To date, though, there has been little
consideration of information behaviour in relation to digitised newspaper collections. The purpose of
this paper is to address a perceived gap in the literature by providing an account of user behaviour in
the newly launched Welsh Newspapers Online (WNO).
Design/methodology/approach – The author collected webometric data for WNO using GA and
web server content logs. These were analysed to identify patterns of engagement and user behaviour,
which were then considered in relation to existing information behaviour.
Findings – Use of WNO, while reminiscent of archival information seeking, can be understood as
centring on the web interface rather than the digitised material. In comparison to general web browsing,
users are much more deeply engaged with the resource. This engagement incorporates reading online,
but users’ information seeking utilises website search and browsing functionality rather than filtering in
newspaper material. Information seeking in digitised newspapers resembles the model of the ‘user’ more
closely than that of the ‘reader’, a value-laden distinction which needs further unpacking.
Research limitations/implications – While the behaviour discussed in this paper is likely to be
more widely representative, a larger longitudinal data set would increase the study’s significance.
Additionally, the methodology of this paper can only tell us what users are doing, and further research
is needed to identify the drivers for this behaviour.
Originality/value – This study provides important insights into the underinvestigated area of
digitised newspaper collections, and shows the importance of webometric methods in analysing online
user behaviour.
Marxism’s “Communicative Crisis”? Mapping Debates over Leninist Print-Media Practices in the 20th Century
Despite the scholarly neglect of Marxism’s “communicative crisis”, it was a topic of concern that was addressed, debated and negotiated by party leaders, intellectuals and activists on a continuous basis throughout the 20th century. These concerns revolved around three areas: first, the primary means of print communication, the party paper; second, the specialization of production, particularly around the role of writers and journalists; and third, the search for a popular rhetoric and writing style, which would appeal to the general public. This paper maps out the “communicative crisis” of Marxism in the 20th century through an examination of key intersections of disputes over the correct approach to its practices of print communication, as a starting point for an historical analysis of the failures and successes of Marxist political praxis.
Freedom of Speech and the Press in the Information Age
On June 26–27, 2008, more than 130 social studies teachers from across the United States, its territories, Cuba and even Iraq gathered at Georgetown University in Washington, D.C., for the James Madison Symposium conducted in partnership with the McCormick Freedom Museum. The symposium was titled Freedom of Speech and Press in the Information Age and explored four related topics under this thematic umbrella including free speech on the Internet and blogs, as well as in the traditional press; the Fairness Doctrine; press coverage during wartime; and the free speech implications of campaign finance reform. The two-day conference was organized around four separate panels based on the aforementioned subjects, and also included an evening banquet with a keynote address by C-SPAN President and CEO Brian Lamb, as well as a morning working session on lesson plans to address the four central topics. This report presents a summary of these deliberations in chapter form, with each chapter followed by a lesson plan rooted in the conference proceedings. The hope is that the summaries of the panel discussions help to contextualize the topics addressed and provide solid leads for further examination of these issues. They frame the embedded lesson plans, each designed for use in social studies classes at the secondary level.