
    Predicting Text Quality: Metrics for Content, Organization and Reader Interest

    When people read articles (news, fiction or technical), they almost always form perceptions about their quality. Some articles are well written and others are poorly written. This thesis explores whether such judgements can be automated so that they can be incorporated into applications such as information retrieval and automatic summarization. Text quality is not a single aspect but a combination of numerous and diverse criteria, including spelling, grammar, organization, informativeness, creative and beautiful use of language, and page layout. In the education domain, comprehensive lists of such properties are outlined in the rubrics used for assessing writing. Computational methods for text quality, however, have addressed only a handful of these aspects, mainly spelling, grammar and organization. In addition, some aspects of text quality may be more relevant to one genre than to another, yet previous work has placed little focus on genre-specific metrics. This thesis proposes new insights and techniques to address these issues. We introduce metrics that score varied dimensions of quality such as content, organization and reader interest. For content, we present two measures: specificity and verbosity level. Specificity measures the amount of detail present in a text, while verbosity captures which details are essential to include. We measure organization quality by quantifying the regularity of the intentional structure in an article and by using the specificity levels of adjacent sentences in the text. Our reader interest metrics aim to identify engaging and interesting articles. These measures are developed using articles from three genres: academic writing, science journalism and automatically generated summaries. Proper presentation of content is critical in summarization because summaries have a word limit, so our specificity and verbosity metrics are developed with this genre as the focus. The argumentation structure of academic writing supports the idea of using intentional structure to model organization quality. Science journalism articles convey research findings in an engaging manner and are ideally suited to developing and evaluating measures of reader interest.

    Video summarisation: A conceptual framework and survey of the state of the art

    This is the post-print (final draft post-refereeing) version of the article. Copyright © 2007 Elsevier Inc. Video summaries provide condensed and succinct representations of the content of a video stream through a combination of still images, video segments, graphical representations and textual descriptors. This paper presents a conceptual framework for video summarisation, derived from the research literature and used as a means of surveying that literature. The framework distinguishes between video summarisation techniques (the methods used to process content from a source video stream to achieve a summarisation of that stream) and video summaries (the outputs of video summarisation techniques). Video summarisation techniques are considered within three broad categories: internal (analysing information sourced directly from the video stream), external (analysing information not sourced directly from the video stream) and hybrid (analysing a combination of internal and external information). Video summaries are considered as a function of the type of content they are derived from (object, event, perception or feature based) and of the functionality offered to the user for consuming them (interactive or static, personalised or generic). It is argued that video summarisation would benefit from greater incorporation of external information, particularly unobtrusively sourced user-based information, in order to overcome longstanding challenges such as the semantic gap and to provide video summaries that are more relevant to individual users.

    A corpus of science journalism for analyzing writing quality

    We introduce a corpus of science journalism articles, categorized into three levels of writing quality. The corpus fulfills a glaring need for realistic data on which applications concerned with predicting text quality can be developed and evaluated. In this article we describe how, guided by the judgements of renowned writers, we identified samples of extraordinarily well-written pieces and how these were expanded to a larger set of typical journalistic writing. We provide details about the corpus and the text quality evaluations it can support. We intend to further extend the corpus with annotations of phenomena that reveal quantifiable differences between levels of writing quality. Here we introduce two of the many types of sentence-level annotation that distinguish amazing from typical writing: text generality/specificity and communicative goal. We explore the feasibility of acquiring annotations automatically and verify that such features are indeed predictive of writing quality. We find that general/specific annotation at the sentence level can be performed reasonably accurately and fully automatically, while automatic annotation of communicative goal reveals salient characteristics of journalistic writing but does not align with the categories we wish to annotate in future work.