
    Aspect Based Sentiment Analysis for Large Documents with Applications to US Presidential Elections 2016

    Aspect-based sentiment analysis (ABSA) deals with the fine-grained analysis of text to extract entities and aspects and to analyze the sentiments expressed towards them. Previous work in this area has mostly focused on short reviews of products, restaurants and services. We explore ABSA for human entities in the context of large documents such as news articles. We create the first-of-its-kind corpus containing multiple entities and aspects from US news articles, consisting of approximately 1000 annotated sentences in 300 articles. We develop a novel algorithm to mine entity-aspect pairs from large documents and perform sentiment analysis on them. We demonstrate the application of our algorithm to social and political factors by analyzing the campaign for the 2016 US presidential election. We analyze the frequency and intensity of newspaper coverage in cross-sectional data from various newspapers and find evidence of catering to partisan audiences and consumer preferences by focusing on selective aspects of presidential candidates in different demographics.
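The abstract does not describe how its entity-aspect pairs are mined. Purely as an illustrative baseline, and not the authors' algorithm, the sketch below pairs entity and aspect mentions that co-occur in the same sentence and scores each pair with a toy polarity lexicon. The entity list, aspect list, lexicon, sample article and function names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical vocabularies and a toy polarity lexicon (illustration only).
ENTITIES = {"clinton", "trump"}
ASPECTS = {"economy", "immigration", "healthcare"}
LEXICON = {"strong": 1, "praised": 1, "good": 1, "weak": -1, "criticized": -1, "bad": -1}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Lowercased alphabetic tokens only.
    return re.findall(r"[a-z]+", sentence.lower())

def mine_pairs(text):
    """Map each (entity, aspect) pair to the summed polarity of sentences mentioning both."""
    scores = defaultdict(int)
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        polarity = sum(LEXICON.get(t, 0) for t in tokens)
        token_set = set(tokens)
        for entity in ENTITIES & token_set:
            for aspect in ASPECTS & token_set:
                scores[(entity, aspect)] += polarity
    return dict(scores)

article = ("Trump praised his own plan for the economy. "
           "Critics said Clinton was weak on immigration. "
           "Clinton criticized Trump on healthcare.")
print(mine_pairs(article))
```

In the sample, the third sentence gives both ("clinton", "healthcare") and ("trump", "healthcare") a score of -1, although only one candidate is the target of the criticism. This co-occurrence shortcut shows why fine-grained ABSA on long, multi-entity documents needs more than sentence-level matching.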

    About the exploration of data mining techniques using structured features for information extraction

    The World Wide Web is a huge source of information, and the amount of information available on it grows every day. It is impossible to handle this amount of information by hand, so special techniques are needed to deliver smaller, manageable excerpts of it. Unfortunately, such techniques, search engines for instance, deliver only a partial view of the information's original form. The delivered information is present in various types of files such as websites, text documents, video clips and audio files. Extracting relevant and interesting pieces of information from these files is complex and time-consuming. This work analyzes techniques that allow interesting informational units to be extracted automatically. Such techniques are based on Machine Learning methods. In contrast to traditional Machine Learning tasks, processing text documents in this context requires specialized techniques: the structure of the natural language contained in text documents imposes constraints that the Machine Learning method should respect. These constraints, and the specially tuned methods that respect them, are another important aspect of this work. After defining the Machine Learning formalisms used in this work, I present multiple Machine Learning approaches applicable to the field of Information Extraction. I describe the historical development from the first approaches to Information Extraction, through Named Entity Recognition, to Relation Extraction. The possibilities of using linguistic resources to create feature sets for Information Extraction are presented. I give a formal definition of Relation Extraction and show which kinds of Machine Learning methods are used for it.
I focus on Relation Extraction techniques that benefit both from minimum optimization and from efficient data structures. Most of the experiments and implementations described in this work were done using RapidMiner, an open-source Data Mining framework. To apply this framework to Information Extraction tasks, I developed an extension called the Information Extraction Plugin, which is described in detail. Finally, I present applications that explicitly benefit from combining Data Mining and Information Extraction.
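The abstract frames Relation Extraction as a Machine Learning task over feature sets built from the text around candidate entity pairs. A minimal sketch of that framing, under stated assumptions, treats each (entity, entity) pair as an instance, describes it with simple lexical features, and trains a binary classifier. The feature names, the toy works_for data and the helper names are hypothetical, and a plain perceptron is swapped in for brevity; it is neither the thesis's method nor the RapidMiner Information Extraction Plugin.

```python
from collections import defaultdict

def pair_features(tokens, i, j):
    """Hypothetical lexical features for the candidate entity pair at token positions i < j."""
    assert i < j, "expects the first entity to precede the second"
    between = tokens[i + 1:j]
    feats = {f"between={w.lower()}" for w in between}
    feats.add(f"dist={min(len(between), 5)}")  # token distance, capped at 5
    feats.add(f"before={tokens[i - 1].lower() if i > 0 else '<s>'}")
    feats.add(f"after={tokens[j + 1].lower() if j + 1 < len(tokens) else '</s>'}")
    return feats

class Perceptron:
    """Binary perceptron over sparse feature sets: a positive score means 'relation holds'."""

    def __init__(self):
        self.w = defaultdict(float)

    def score(self, feats):
        return sum(self.w.get(f, 0.0) for f in feats)

    def predict(self, feats):
        return self.score(feats) > 0

    def train(self, data, epochs=10):
        for _ in range(epochs):
            for feats, label in data:
                if self.predict(feats) != label:
                    step = 1.0 if label else -1.0  # move weights toward the gold label
                    for f in feats:
                        self.w[f] += step

# Toy training pairs for a hypothetical works_for(PERSON, ORG) relation.
train = [
    (["Alice", "works", "for", "Acme", "."], 0, 3, True),
    (["Bob", "joined", "Initech", "last", "year"], 0, 2, True),
    (["Carol", "sued", "Globex", "yesterday"], 0, 2, False),
    (["Alice", "met", "Acme", "staff"], 0, 2, False),
]
model = Perceptron()
model.train([(pair_features(t, i, j), y) for t, i, j, y in train])
print(model.predict(pair_features(["Dave", "works", "for", "Umbrella", "Corp"], 0, 3)))  # True
```

Keeping features as sparse string sets means only the features that actually fire are stored, which is one simple way to keep the data structures small when the feature space grows with the vocabulary.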