6,859 research outputs found

    Learning to Identify Ambiguous and Misleading News Headlines

    Full text link
    Accuracy is one of the basic principles of journalism. However, it is increasingly hard to manage due to the diversity of news media. Some editors of online news tend to use catchy headlines which trick readers into clicking. These headlines are either ambiguous or misleading, degrading the reading experience of the audience. Thus, identifying inaccurate news headlines is a task worth studying. Previous work names these headlines "clickbaits" and mainly focus on the features extracted from the headlines, which limits the performance since the consistency between headlines and news bodies is underappreciated. In this paper, we clearly redefine the problem and identify ambiguous and misleading headlines separately. We utilize class sequential rules to exploit structure information when detecting ambiguous headlines. For the identification of misleading headlines, we extract features based on the congruence between headlines and bodies. To make use of the large unlabeled data set, we apply a co-training method and gain an increase in performance. The experiment results show the effectiveness of our methods. Then we use our classifiers to detect inaccurate headlines crawled from different sources and conduct a data analysis.Comment: Accepted by IJCAI 201

    Mining Comparison Opinions from Chinese Online Reviews for Restaurant Competitive Analysis

    Get PDF
    Comparison is widely used by consumers during the process of product evaluation in order to emphasize their preference, which can contribute to a proxy for product competitiveness analysis. This paper proposes a novel method for mining comparative sentences based on the achievements of linguistic study. The definition of comparative sentence subcategory is put forward and a mixed rule pool containing both artificial rules and CSR is set up. Besides, an entity dictionary is used to re-check the identification result which can ensure precise identification and classification of comparative sentences. Real online comments are collected from Dianping.com as experimental data. The result shows that the proposed method outperforms baseline methods in terms of identification precision. Based on the result, features and opinions of comparative sentences are mined. We then conducted sentiment analysis to calculate the sentimental score of comparison relations. Finally, a feature competitive network of restaurants is constructed

    A Review of Text Corpus-Based Tourism Big Data Mining

    Get PDF
    With the massive growth of the Internet, text data has become one of the main formats of tourism big data. As an effective expression means of tourists’ opinions, text mining of such data has big potential to inspire innovations for tourism practitioners. In the past decade, a variety of text mining techniques have been proposed and applied to tourism analysis to develop tourism value analysis models, build tourism recommendation systems, create tourist profiles, and make policies for supervising tourism markets. The successes of these techniques have been further boosted by the progress of natural language processing (NLP), machine learning, and deep learning. With the understanding of the complexity due to this diverse set of techniques and tourism text data sources, this work attempts to provide a detailed and up-to-date review of text mining techniques that have been, or have the potential to be, applied to modern tourism big data analysis. We summarize and discuss different text representation strategies, text-based NLP techniques for topic extraction, text classification, sentiment analysis, and text clustering in the context of tourism text mining, and their applications in tourist profiling, destination image analysis, market demand, etc. Our work also provides guidelines for constructing new tourism big data applications and outlines promising research areas in this field for incoming years

    A Review of Text Corpus-Based Tourism Big Data Mining

    Get PDF
    With the massive growth of the Internet, text data has become one of the main formats of tourism big data. As an effective expression means of tourists’ opinions, text mining of such data has big potential to inspire innovations for tourism practitioners. In the past decade, a variety of text mining techniques have been proposed and applied to tourism analysis to develop tourism value analysis models, build tourism recommendation systems, create tourist profiles, and make policies for supervising tourism markets. The successes of these techniques have been further boosted by the progress of natural language processing (NLP), machine learning, and deep learning. With the understanding of the complexity due to this diverse set of techniques and tourism text data sources, this work attempts to provide a detailed and up-to-date review of text mining techniques that have been, or have the potential to be, applied to modern tourism big data analysis. We summarize and discuss different text representation strategies, text-based NLP techniques for topic extraction, text classification, sentiment analysis, and text clustering in the context of tourism text mining, and their applications in tourist profiling, destination image analysis, market demand, etc. Our work also provides guidelines for constructing new tourism big data applications and outlines promising research areas in this field for incoming years
    • …
    corecore