7 research outputs found
Dynamics and tipping point of issue attention in newspapers: quantitative and qualitative content analysis at sentence level in a longitudinal study using supervised machine learning and big data
This study aims to provide a more sensitive understanding of the dynamics and tipping points of issue attention in news media by combining the strengths of quantitative and qualitative research. The topic of this 25-year longitudinal study is the volume and the content of newspaper articles about the emerging risk of gas drilling in The Netherlands. We applied supervised machine learning (SML) because this allowed us to study changes in the quantitative use of subtopics at the detailed sentence level in a large number of articles. The study shows that the actual risk of drilling-induced seismicity gradually increased and that the volume of newspaper attention for the issue also gradually increased for two decades. The sub-topics extracted from media articles during the low media attention period, covering factual information, can b
Semantics-Driven Aspect-Based Sentiment Analysis
People using the Web are constantly invited to share their opinions and preferences with the rest of the world, which has led to an explosion of opinionated blogs, reviews of products and services, and comments on virtually everything. This type of web-based content is increasingly recognized as a source of data that has added value for multiple application domains.
While the large number of available reviews almost ensures that all relevant parts of the entity under review are properly covered, manually reading each and every review is not feasible. Aspect-based sentiment analysis aims to solve this issue, as it is concerned with the development of algorithms that can automatically extract fine-grained sentiment information from a set of reviews, computing a separate sentiment value for the various aspects of the product or service being reviewed.
This dissertation focuses on which discriminants are useful when performing aspect-based sentiment analysis. What signals for sentiment can be extracted from the text itself and what is the effect of using extra-textual discriminants? We find that using semantic lexicons or ontologies, can greatly improve the quality of aspect-based sentiment analysis, especially with limited training data. Additionally, due to semantics driving the analysis, the algorithm is less of a black box and results are easier to explain
Implicit feature detection for sentiment analysis
Implicit feature detection is a promising research direction that has not seen much research yet. Based on previous work, where co-occurrences between notional words and ex- plicit features are used to find implicit features, this research critically reviews its underlying assumptions and proposes a revised algorithm, that directly uses the co-occurrences be- Tween implicit features and notional words. The revision is shown to perform better than the original method, but both methods are shown to fail in a more realistic scenario
Using linguistic graph similarity to search for sentences in news articles
With the volume of daily news growing to sizes too big to handle for any individual human, there is a clear need for effective search algorithms. Since traditional bag-of-words approaches are inherently limited since they ignore much of the information that is embedded in the structure of the text, we propose a linguistic approach to search called Destiny in this paper. With Destiny, sentences, both from news items and the user queries, are represented as graphs where the nodes represent the words in the sentence and the edges represent the grammatical relations between the words. The proposed algorithm is evaluated against a TF-IDF baseline using a custom corpus of user-rated sentences. Destiny significantly outperforms TF-IDF in terms of Mean Average Precision, normalized Discounted Cumulative Gain, and Spearman's Rho
Framing a Conflict! How Media Report on Earthquake Risks Caused by Gas Drilling: A Longitudinal Analysis Using Machine Learning Techniques of Media Reporting on Gas Drilling from 1990 to 2015
Using a new analytical tool, supervised machine learning (SML), a large number of newspaper articles is analysed to answer the question how newspapers frame the news of public risks, in this case of ea
Review-aggregated aspect-based sentiment analysis with ontology features
With all the information that is available on the World Wide Web, there is great demand for data mining techniques and sentiment analysis is a particularly popular domain, both in business and research. Sentiment analysis aims to determine the sentiment value, often on a positive–negative scale, for a given product or service based on a set of textual reviews. As fine-grained information is more useful than just a single overall score, modern aspect-based sentiment analysis techniques break down the sentiment and assign sentiment scores to various aspects of the product or service mentioned in the review. In this work, we focus on aspect-based sentim
Sentiment analysis of multiple implicit features per sentence in consumer review data
With the rise of e-commerce, online consumer reviews have become crucial for consumers' purchasing decisions. Most of the existing research focuses on the detection of explicit features and sentiments in such reviews, thereby ignoring all that is reviewed implicitly. This study builds, in extension of an existing implicit feature algorithm that can only assign one implicit feature to each sentence, a classifier that predicts the presence of multiple implicit features in sentences. The classifier makes its prediction based on a custom score function and a trained threshold. Only if this score exceeds the threshold, we allow for the detection of multiple implicit feature. In this way, we increase the recall while limiting the decrease in precision. In the more realistic scenario, the classifier-based approach improves the F1-score from 62.9% to 64.5% on a restaurant review data set. The precision of the computed sentiment associated with the detected features is 63.9%