
    Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping

    We propose Quootstrap, a method for extracting quotations, as well as the names of the speakers who uttered them, from large news corpora. Whereas prior work has addressed this problem primarily with supervised machine learning, our approach follows a fully unsupervised bootstrapping paradigm. It leverages the redundancy present in large news corpora, more precisely, the fact that the same quotation often appears across multiple news articles in slightly different contexts. Starting from a few seed patterns, such as ["Q", said S.], our method extracts a set of quotation-speaker pairs (Q, S), which are in turn used for discovering new patterns expressing the same quotations; the process is then repeated with the larger pattern set. Our algorithm is highly scalable, which we demonstrate by running it on the large ICWSM 2011 Spinn3r corpus. Validating our results against a crowdsourced ground truth, we obtain 90% precision at 40% recall using a single seed pattern, with significantly higher recall values for more frequently reported (and thus likely more interesting) quotations. Finally, we showcase the usefulness of our algorithm's output for computational social science by analyzing the sentiment expressed in our extracted quotations. Comment: Accepted at the 12th International Conference on Web and Social Media (ICWSM), 201
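    The bootstrapping loop described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: the `<Q>`/`<S>` placeholder tokens, the function names, the naive capitalized-name speaker regex, and the three-sentence corpus are all assumptions made for illustration, and the abstract does not specify how patterns are represented, filtered, or scaled.

```python
import re

# Hypothetical placeholder tokens for the quotation and speaker slots.
Q, S = "<Q>", "<S>"
QUOTE = r'(?P<q>[^"]+)'                        # anything up to the next quote mark
SPEAKER = r"(?P<s>[A-Z][a-z]+(?: [A-Z][a-z]+)+)"  # naive "Firstname Lastname"

def compile_pattern(template):
    """Turn a template such as '"<Q>", said <S>.' into a compiled regex."""
    parts = re.split("(" + re.escape(Q) + "|" + re.escape(S) + ")", template)
    regex = "".join(
        QUOTE if p == Q else SPEAKER if p == S else re.escape(p)
        for p in parts
    )
    return re.compile(regex)

def extract_pairs(patterns, sentences):
    """Apply every known pattern and collect (quotation, speaker) pairs."""
    pairs = set()
    for template in patterns:
        rx = compile_pattern(template)
        for sent in sentences:
            for m in rx.finditer(sent):
                pairs.add((m.group("q"), m.group("s")))
    return pairs

def discover_patterns(pairs, sentences):
    """Find other contexts in which an already known pair co-occurs."""
    found = set()
    for quote, speaker in pairs:
        for sent in sentences:
            if quote in sent and speaker in sent:
                template = sent.replace(quote, Q).replace(speaker, S)
                # Keep only templates with exactly one slot of each kind.
                if template.count(Q) == 1 and template.count(S) == 1:
                    found.add(template)
    return found

def bootstrap(seed_patterns, sentences, rounds=2):
    """Alternate pair extraction and pattern discovery for a few rounds."""
    patterns, pairs = set(seed_patterns), set()
    for _ in range(rounds):
        pairs |= extract_pairs(patterns, sentences)
        patterns |= discover_patterns(pairs, sentences)
    return pairs, patterns

# Made-up corpus: the first quotation appears in two different contexts.
corpus = [
    '"We will win", said Jane Doe.',
    'Jane Doe told reporters: "We will win"',
    'John Roe told reporters: "Prices will fall"',
]
pairs, patterns = bootstrap(['"<Q>", said <S>.'], corpus)
```

    In this toy run the single seed pattern matches only the first sentence. Because the same quotation recurs in the second sentence, a second pattern is learned from it, and that pattern then recovers the quotation by the other speaker in the next round. A realistic system would also need to score or filter learned patterns so that overly specific or noisy ones are not kept.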

    KACST Arabic Text Classification Project: Overview and Preliminary Results

    Electronically formatted Arabic free text is now abundant on the World Wide Web, often linked to commercial enterprises and/or government organizations. Vast tracts of knowledge and relations lie hidden within these texts, knowledge that can be exploited once the right intelligent tools have been identified and applied. For example, text mining may help with text classification and categorization. Text classification aims to automatically assign a text to a predefined category based on identifiable linguistic features. This process has many useful applications, including email spam detection, web page content filtering, and automatic message routing. This paper gives an overview of the King Abdulaziz City for Science and Technology (KACST) Arabic Text Classification Project, along with some preliminary results. The project will contribute to a better understanding and elaboration of Arabic text classification techniques.

    Evaluation in media texts: a cross-cultural linguistic investigation

    A quantitative/interpretative approach to the comparative linguistic analysis of media texts is proposed and applied to a contrastive analysis of texts from the English-language China Daily and the UK Times to look for evidence of differences in what Labov calls “evaluation.” These differences are then correlated with differences in the roles played by the media in Britain and China in their respective societies. The aim is to demonstrate that, despite reservations related to the Chinese texts not being written in the journalists' native language, a direct linguistic comparison of British media texts with Chinese media texts written in English can yield valuable insights into the workings of the Chinese media that supplement nonlinguistic studies.

    Econometrics meets sentiment: an overview of methodology and applications

    The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, which is a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software.

    The role of metaphor in shaping the identity and agenda of the United Nations: the imagining of an international community and international threat

    This article examines the representation of the United Nations in speeches delivered by its Secretary General. It focuses on the role of metaphor in constructing a common ‘imagining’ of international diplomacy and legitimising an international organisational identity. The SG legitimises the organisation, in part, through the delegitimisation of agents/actions/events constructed as threatening to the international community and to the well-being of mankind. It is a desire to combat the forces of menace or evil which is argued to motivate and determine the organisational agenda. This is predicated upon an international ideology of humanity in which difference is silenced and ‘working towards the common good’ is emphasised. This is exploited to rouse emotions and legitimise institutional power. Polarisation and antithesis are achieved through the employment of metaphors designed to enhance positive and negative evaluations. The article further points to the constitutive, persuasive and edifying power of topic and situationally-motivated metaphors in speech-making.

    Ideology of objectivity in political journalism. Attitudes, values and beliefs around truth as a possible horizon?

    From a critical-discursive approach, automatic and reflexive contents around “objectivity” are analyzed, treating objectivity as a stylistic-normative code and cultural device with mythical contours, shared by journalists and audiences of political information. Based on interviews with professionals from different mass media in Córdoba, Argentina, conducted under an ethnographic approach between 2012 and 2014, the self-perception of their contemporary role and the conditions of their daily link with sources and events are discussed first. Given the inter-subjective nature of the phenomenon, the journalistic perspectives are then contrasted with the perceptions of local audiences, gathered in simultaneous experimental sessions. Through an analytical triangulation strategy, a significant circular link between professional definitions and consumption expectations is observed. Affiliation: Paz Garcia, Ana Pamela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina. Instituto de Investigaciones Psicológicas (IIPsi), CONICET - Facultad de Psicología, Universidad Nacional de Córdoba; Argentin