
    Article Segmentation in Digitised Newspapers

    Digitisation projects preserve and make available vast quantities of historical text. Among these, newspapers are an invaluable resource for the study of human culture and history. Article segmentation identifies each region in a digitised newspaper page that contains an article. Digital humanities, information retrieval (IR), and natural language processing (NLP) applications over digitised archives improve access to text and allow automatic information extraction. The lack of article segmentation impedes these applications. We contribute a thorough review of the existing approaches to article segmentation. Our analysis reveals divergent interpretations of the task, and inconsistent and often ambiguously defined evaluation metrics, making comparisons between systems challenging. We solve these issues by contributing a detailed task definition that examines the nuances and intricacies of article segmentation that are not immediately apparent. We provide practical guidelines on handling borderline cases and devise a new evaluation framework that allows insightful comparison of existing and future approaches. Our review also reveals that the lack of large datasets hinders meaningful evaluation and limits machine learning approaches. We solve these problems by contributing a distant supervision method for generating large datasets for article segmentation. We manually annotate a portion of our dataset and show that our method produces article segmentations over characters nearly as well as costly human annotators. We reimplement the seminal textual approach to article segmentation (Aiello and Pegoretti, 2006) and show that it does not generalise well when evaluated on a large dataset. We contribute a framework for textual article segmentation that divides the task into two distinct phases: block representation and clustering. 
We propose several techniques for block representation and contribute a novel highly-compressed semantic representation called similarity embeddings. We evaluate and compare different clustering techniques, and innovatively apply label propagation (Zhu and Ghahramani, 2002) to spread headline labels to similar blocks. Our similarity embeddings and label propagation approach substantially outperforms Aiello and Pegoretti but still falls short of human performance. Exploring visual approaches to article segmentation, we reimplement and analyse the state-of-the-art Bansal et al. (2014) approach. We contribute an innovative 2D Markov model approach that captures reading order dependencies and reduces the structured labelling problem to a Markov chain that we decode with the Viterbi (1967) algorithm. Our approach substantially outperforms Bansal et al., achieves accuracy as good as human annotators, and establishes a new state of the art in article segmentation. Our task definition, evaluation framework, and distant supervision dataset will encourage progress in the task of article segmentation. Our state-of-the-art textual and visual approaches will allow sophisticated IR and NLP applications over digitised newspaper archives, supporting research in the digital humanities.

    Multimodal discourse on online newspaper home pages: A social-semiotic perspective

    In a short space of time, online newspapers have emerged to play an important role in the institutional construction of ‘news’ and the mass mediation of information. The home pages of online newspapers feature short verbal texts, and communicate using language, image, layout, colour, and other semiotic resources: they communicate multimodally. This thesis examines the multimodal discourse of three English-language online newspapers: the Bangkok Post (Thailand), the English-language edition (translated from Chinese) of the People’s Daily (China), and the Sydney Morning Herald (Australia). Between February, 2002 and April, 2006, three data collections were made (February-April, 2002; September-November, 2005; January-April, 2006) using a five-day ‘constructed week’ method. The main corpus was 15 home pages from each newspaper (five per collection per newspaper), but the total corpus (including other pages from each newspaper) was 603 web pages. Two senior editors (one each from the Bangkok Post and the Sydney Morning Herald) were interviewed. The multimodal discourse of the home pages was analysed using tools from Systemic Functional Multimodal Discourse Analysis (SF-MDA), and a ‘visual grammar’ of home pages building on the work of Kress & van Leeuwen (1996) was developed. In addition, a rank scale for online newspapers was proposed, and limitations of applying the tool of rank scale to this corpus were identified. An emerging genre - the headline-plus-lead-plus-hyperlink newsbite - was identified, and the design of newsbites on the home page of the Sydney Morning Herald and the evolution of their design over time was analysed. The use of images on the home pages in the corpus was analysed, and the increasing use of thumbnail images in the Sydney Morning Herald - particularly close-up thumbnails of faces - was investigated in further depth. 
The visual design of online newspaper home pages and the news texts appearing on them are an evolution of print news genres and their design practices. Newsbites and headline-only newsbits are verbally short, so the authors of newspaper home pages are forced to rely increasingly on visual communication in order to position stories and readers, and to communicate the values of the news institution on the home page as mediated by the screen. Thumbnail images are evolving as a new form of punctuation on some home pages, and this may be either a short-lived trend or an emerging historical trend in the development of punctuation, at least in online environments. Overall, online newspaper home pages are tending towards shorter texts, which communicate in novel ways. These short texts cannot communicate the values and ideology of news institutions in the way that extended verbal texts have done for centuries, yet this function of news texts remains important to the construction and maintenance of a readership, and therefore crucial to the home page of a newspaper. As a result, news institutions express values visually in their design of newspaper home pages. As readers become familiar with the meanings of online news design, they become adept at reading and understanding short stories within these multimodally-construed frames of reference. Ideology is increasingly fragmented on shorter timescales, but expressed over longer timescales in a hypermedia environment that affords and extends many of the pre-existing multimodal features of print newspaper discourse.

    Optimising Visual Layout for Training and Learning Technologies

    The layout and arrangement of information in electronic aids used for training can affect viewer comprehension and impressions. This paper explains existing layout guidance, and defines an integrated design model for applying these recommendations. To test the efficacy of this model, two similar presentations containing the same content were created. One applied the integrated design model to position the visual content; the other was a variant that flipped the layout so that it did not conform to this design approach. The experimental results demonstrated that developing layouts that bias the important visual material to the top and left positively influenced viewer impressions. These results will have design implications for predominantly text-based material (e.g. presentations, web pages, e-learning systems), particularly when the content is being delivered to people who typically read from left to right and top to bottom.

    Attitudes towards Italian wine of practitioners in the Chinese distribution

    China’s economy has grown at an impressive rate since its integration into the global trading system (WTO) in 2001, a major turning point in Chinese economic history. The opening policy has increased business opportunities for both local and foreign operators; however, in spite of the great appeal of such cooperation, many obstacles still exist: language, culture, education, business practices, and industrial development. The supply of food products and access to the market are dominated by a relatively small group of businessmen: international buyers, purchasing agents, retailers and representatives of large-scale distribution chains. The perception they have of a potential source country is a key factor for a successful market approach. The present study aims at understanding the attitudes of distribution practitioners in the Chinese market towards imported Italian quality wine, as well as the current communication, marketing, strategic and organizational advantages or deficiencies of Italian producers, compared to their other European counterparts. The primary data were collected through personal interviews with key informants in Shanghai, Beijing and Guangzhou. This information was supplemented with an analysis of the existing literature, meetings with sector operators, and talks and presentations by experts attending the “International Workshop on Chinese Wine Market”, held in Beijing on August 8-10, 2007. The interviews were administered as conversation-like dialogues, on the basis of a semi-structured interview outline, which also provided the framework for a qualitative content analysis. This paper aims to give an insight into the import and distribution of Italian wine in China, highlighting both positive and negative feedback on the effectiveness of the marketing strategies of Italian wine trading companies.
    Keywords: wine, international trade, distribution, China, "Made in Italy"

    Cultural Sensitivity in Visual Communication
