167 research outputs found

    Distinguishing between factual information and insulting or abusive messages bearing words or phrases in news articles

    This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2006. Cataloged from PDF version of thesis report. Includes bibliographical references (page 75). Since the Internet has become the leading source of information for users, flames or abusive messages have also become a prominent cause of wasted time when retrieving information. Moreover, a text can contain factual information as well as abusive or insulting content. This paper describes a new approach for an automated system that distinguishes between information and personal attacks containing insulting or abusive messages in a given document. In NLP, flames or abusive messages are treated as extreme subjective language, where subjective language refers to the expression of personal opinions or emotions, here in a news article. Insulting or abusive messages are viewed as an extreme subset of subjective language because of their extreme nature. We defined some rules to extract the semantic information of a given sentence from the general semantic structure of that sentence. Altaf Mahmud, Kazi Zubair Ahmed. B. Computer Science and Engineerin

    Recognizing and organizing opinions expressed in the world press

    Journal Article. Tomorrow's question answering systems will need to have the ability to process information about beliefs, opinions, and evaluations, that is, the perspective of an agent. Answers to many simple factual questions, even yes/no questions, are affected by the perspective of the information source. For example, a questioner asking question (1) might be interested to know that, in general, sources in European and North American governments tend to answer "no" to question (1), while sources in African governments tend to answer "yes:

    Detecting Misleading Headlines Through the Automatic Recognition of Contradiction in Spanish

    Misleading headlines are part of the disinformation problem. Headlines should give a concise summary of the news story, helping the reader to decide whether to read the body text of the article, which is why headline accuracy is a crucial element of a news story. This work focuses on detecting misleading headlines through the automatic identification of contradiction between the headline and body text of a news item. When a contradiction is detected, the reader is alerted to the lack of precision or trustworthiness of the headline in relation to the body text. To facilitate the automatic detection of misleading headlines, a new Spanish dataset (ES_Headline_Contradiction) is created for the purpose of identifying contradictory information between a headline and its body text. This dataset annotates the semantic relationship between headlines and body text by categorising the relation between texts as compatible, contradictory or unrelated. Another novel aspect of this dataset is that it distinguishes between different types of contradiction, thereby enabling a more fine-grained identification of them. The dataset was built via a novel semi-automatic methodology, which resulted in a more cost-efficient development process. The results of the experiments show that pre-trained language models can be fine-tuned with this dataset, producing very encouraging results for detecting incongruency or non-relation between headline and body text. This research work is funded by MCIN/AEI/ 10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union” or by the “European Union NextGenerationEU/PRTR” through the project TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP (PID2021-122263OB-C22) and the project SOCIALTRUST: Assessing trustworthiness in digital media (PDC2022-133146-C22). Also funded by Generalitat Valenciana through the project NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/21), and the grant ACIF/2020/177

    A CORPUS-BASED STUDY ON ‘REGRET’ AS A FACTIVE VERB AND ITS COMPLEMENTS

    Factive verbs are known to presuppose the truth of their complements, and regret is one such verb. This corpus-based study aims to investigate the complements, use, and frequency of occurrence of regret in COCA, and to shed light on the various uses of this verb. The scope of the study was limited to the bare form of regret, and the most common complements of regret were analyzed. The findings revealed that all the complements were presupposed to be true due to the factive verb regret, and that regret takes various complements such as regret + Ving, regret + to V1, and regret + that. Moreover, each of these complements assigns a different meaning to regret in context.

    Towards Computing Inferences from English News Headlines

    Newspapers are a popular form of written discourse, read by many people thanks to the novelty of the information they provide. A headline is the most widely read part of any newspaper due to its appearance in a bigger font and sometimes in colour print. In this paper, we suggest and implement a method for computing inferences from English news headlines, excluding the information from the context in which the headlines appear. This method attempts to generate the possible assumptions a reader forms upon reading a fresh headline. The generated inferences could be useful for assessing the impact of a news headline on readers, including children. How well the current state of social affairs is understood depends greatly on the assimilation of the headlines. As inferences that are independent of the context depend mainly on the syntax of the headline, this approach uses dependency trees of headlines to find their syntactic structure and to compute inferences from them. Comment: PACLING 2019 Long paper, 15 page

    Automatic Detection of Modality with ITGETARUNS

    In this paper we present a system for modality detection which is then used for Subjectivity and Factuality evaluation. The system was recently tested on a task for Subjectivity and Irony detection in Italian tweets, where it ranked 10th and 4th, respectively, out of 27 participants overall. We focus our paper on an internal evaluation in which we considered three national newspapers: Il Corriere, Repubblica and Libero. This task was prompted by a project on the evaluation of press stylistic features in political discourse. The project used newspaper articles from the same sources over a period of three months, thus including the 2013 governmental crisis. We intended to produce a similar experiment and evaluate results in comparison with the previous 2011 crisis. In this evaluation, we focused on Subjectivity, Polarity and Factuality, which include Modality evaluation. Final graphs at the end of the paper show results confirming our previous findings about differences in style, with Il Corriere emerging as the most atypical

    Annotating Subordinators in the Turkish Discourse Bank

    In this paper we explain how we annotated subordinators in the Turkish Discourse Bank (TDB), an effort that started in 2007 and is still continuing. We introduce the project and describe some of the issues that were important in annotating three subordinators, namely karşın, rağmen and halde, all of which encode the coherence relation Contrast-Concession. We also describe the annotation tool

    Crowdsourcing Question-Answer Meaning Representations

    We introduce Question-Answer Meaning Representations (QAMRs), which represent the predicate-argument structure of a sentence as a set of question-answer pairs. We also develop a crowdsourcing scheme to show that QAMRs can be labeled with very little training, and gather a dataset with over 5,000 sentences and 100,000 questions. A detailed qualitative analysis demonstrates that the crowd-generated question-answer pairs cover the vast majority of predicate-argument relationships in existing datasets (including PropBank, NomBank, QA-SRL, and AMR) along with many previously under-resourced ones, including implicit arguments and relations. The QAMR data and annotation code are made publicly available to enable future work on how best to model these complex phenomena. Comment: 8 pages, 6 figures, 2 table

    Attribution and its Annotation in the Penn Discourse TreeBank

    An emerging task in text understanding and generation is to categorize information as fact or opinion and to further attribute it to the appropriate source. Corpus annotation schemes aim to encode such distinctions for NLP applications concerned with such tasks, such as information extraction, question answering, summarization, and generation. We describe an annotation scheme for marking the attribution of abstract objects such as propositions, facts and eventualities associated with discourse relations and their arguments annotated in the Penn Discourse TreeBank. The scheme aims to capture the source and degrees of factuality of the abstract objects. Key aspects of the scheme are annotation of the text spans signalling the attribution, and annotation of features recording the source, type, scopal polarity, and determinacy of attribution.