Application of Common Sense Computing for the Development of a Novel Knowledge-Based Opinion Mining Engine
The ways people express their opinions and sentiments have radically changed in the past few years thanks to the advent of social networks, web communities, blogs, wikis and other online collaborative media. Distilling knowledge from this huge amount of unstructured information can be a key factor for marketers who want to create an image or identity for their product, brand or organisation in the minds of their customers. These online social data, however, remain largely inaccessible to computers, as they are meant for human consumption. The automatic analysis of online opinions requires a deep understanding of natural language text by machines, a goal that is still far from being reached.
Hitherto, online information retrieval has been mainly based on algorithms relying on the textual representation of web pages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling and counting their words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Existing approaches to opinion mining and sentiment analysis, in particular, can be grouped into three main categories: keyword spotting, in which text is classified into categories based on the presence of fairly unambiguous affect words; lexical affinity, which assigns arbitrary words a probabilistic affinity for a particular emotion; and statistical methods, which calculate the valence of affective keywords and word co-occurrence frequencies on the basis of a large training corpus. Early works aimed to classify entire documents as having an overall positive or negative polarity, or to predict the rating scores of reviews.
Such systems were mainly based on supervised approaches relying on manually labelled samples, such as movie or product reviews in which the reviewer's overall positive or negative attitude was explicitly indicated. However, opinions and sentiments do not occur only at document level, nor are they limited to a single valence or target. Contrary or complementary attitudes toward the same topic or multiple topics can be present across the span of a document. In more recent works, the granularity of text analysis has been brought down to the segment and sentence level, e.g., by using the presence of opinion-bearing lexical items (single words or n-grams) to detect subjective sentences, or by exploiting association rule mining for a feature-based analysis of product reviews. These approaches, however, are still far from being able to infer the cognitive and affective information associated with natural language, as they mainly rely on knowledge bases that are still too limited to process text efficiently at sentence level.
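To make the contrast between the first two families concrete, here is a minimal Python sketch that is not taken from the thesis: keyword spotting reports a category whenever an unambiguous affect word appears, while lexical affinity averages per-word emotion probabilities. The lexicons `AFFECT_WORDS` and `JOY_AFFINITY`, their scores and the neutral default of 0.5 are hypothetical; the statistical family, which would learn such scores from a large training corpus, is not shown.

```python
import re

# Hypothetical lexicon of "fairly unambiguous" affect words mapped to categories.
AFFECT_WORDS = {"happy": "joy", "delighted": "joy", "sad": "sadness", "furious": "anger"}

# Hypothetical probabilities that a word signals the emotion "joy"; a real system
# would estimate such affinities from a labelled corpus.
JOY_AFFINITY = {"accident": 0.05, "holiday": 0.80, "beach": 0.70, "happy": 0.95}


def tokenize(sentence: str) -> list[str]:
    """Lower-case the sentence and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", sentence.lower())


def keyword_spotting(sentence: str) -> set[str]:
    """Return every affect category whose keyword occurs in the sentence."""
    return {AFFECT_WORDS[t] for t in tokenize(sentence) if t in AFFECT_WORDS}


def lexical_affinity(sentence: str, affinity: dict[str, float], default: float = 0.5) -> float:
    """Average the per-word affinity scores; unknown words get a neutral default."""
    tokens = tokenize(sentence)
    if not tokens:
        return default
    return sum(affinity.get(t, default) for t in tokens) / len(tokens)
```

For instance, `keyword_spotting("I am not happy at all")` returns `{"joy"}`, which illustrates the weakness described above: word-level cues ignore negation and context, and sentence meaning is lost.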
In this thesis, common sense computing techniques are further developed and applied to bridge the semantic gap between word-level natural language data and the concept-level opinions they convey. In particular, graph mining and multi-dimensionality reduction techniques were applied in combination to two common sense knowledge bases to develop a novel intelligent engine for open-domain opinion mining and sentiment analysis. The proposed approach, termed sentic computing, performs a clause-level semantic analysis of text, which allows the inference of both the conceptual and emotional information associated with natural language opinions and, hence, a more efficient transition from (unstructured) textual information to (structured) machine-processable data.
The engine was tested on three different resources, namely a Twitter hashtag repository, a LiveJournal database and a PatientOpinion dataset, and its performance was compared both with results obtained using standard sentiment analysis techniques and with results obtained using different state-of-the-art knowledge bases such as Princeton’s WordNet, MIT’s ConceptNet and Microsoft’s Probase. Unlike most currently available opinion mining services, the developed engine does not base its analysis on a limited set of affect words and their co-occurrence frequencies, but rather on common sense concepts and the cognitive and affective valence they convey. This allows the engine to be domain-independent and, hence, to be embedded in any opinion mining system for the development of intelligent applications in multiple fields such as the Social Web, HCI and e-health. Looking ahead, the novel combined use of different knowledge bases and common sense reasoning techniques for opinion mining proposed in this work will eventually pave the way for the development of more bio-inspired approaches to the design of natural language processing systems capable of handling knowledge, retrieving it when necessary, making analogies and learning from experience.
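Dimensionality reduction over a common sense knowledge base can be illustrated with a truncated singular value decomposition (SVD) of a concept-by-feature matrix: after projection, concepts that share many features end up close together. The sketch below is a generic illustration under assumed data (the concepts, features and matrix entries are hypothetical, and `truncated_svd_embed` and `most_similar` are illustrative helpers); it is not the thesis's actual pipeline or its knowledge bases.

```python
import numpy as np

# Hypothetical common sense assertions: rows are concepts, columns are features,
# and an entry is 1.0 when the knowledge base links the concept to the feature.
concepts = ["cake", "birthday party", "funeral", "ice cream"]
features = ["IsA food", "UsedFor celebration", "Causes joy", "Causes sadness", "HasProperty sweet"]
A = np.array(
    [
        [1, 1, 1, 0, 1],  # cake
        [0, 1, 1, 0, 0],  # birthday party
        [0, 0, 0, 1, 0],  # funeral
        [1, 0, 1, 0, 1],  # ice cream
    ],
    dtype=float,
)


def truncated_svd_embed(matrix: np.ndarray, k: int) -> np.ndarray:
    """Embed each row in k dimensions: top-k left singular vectors scaled by singular values."""
    U, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] * s[:k]


def most_similar(vectors: np.ndarray, i: int) -> int:
    """Index of the row with the highest cosine similarity to row i, excluding i itself."""
    norms = np.linalg.norm(vectors, axis=1)
    denom = norms * norms[i]
    # Rows with zero norm get similarity 0 instead of a division-by-zero NaN.
    sims = np.divide(vectors @ vectors[i], denom, out=np.zeros(len(vectors)), where=denom > 0)
    sims[i] = -np.inf
    return int(np.argmax(sims))
```

With k = 3, `most_similar(truncated_svd_embed(A, 3), 0)` points from "cake" to "ice cream" rather than "birthday party", since those two concepts share the most features. Real systems work on far larger, sparse matrices, where a sparse truncated SVD would replace the dense `np.linalg.svd` call.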
Learning Contextualized Music Semantics from Tags via a Siamese Network
Music information retrieval faces a challenge in modeling contextualized musical concepts formulated by a set of co-occurring tags. In this paper, we investigate the suitability of our recently proposed approach, based on a Siamese neural network, for addressing this challenge. By means of tag features and probabilistic topic models, the network captures contextualized semantics from tags via unsupervised learning. This leads to a distributed semantics space and a potential solution to the out-of-vocabulary problem, which has yet to be sufficiently addressed. We explore the nature of the resultant music-based semantics and address computational needs. We conduct experiments on three public music tag collections (namely CAL500, MagTag5K and the Million Song Dataset) and compare our approach to a number of state-of-the-art semantics learning approaches. Comparative results suggest that this approach outperforms previous approaches in terms of semantic priming and music tag completion.
Comment: 20 pages. To appear in ACM TIST: Intelligent Music Systems and Application
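The defining trait of a Siamese network is that both inputs of a pair pass through one encoder with shared weights, and training shapes the distance between the two embeddings. The NumPy sketch below shows that structure with a single tanh layer and the classic contrastive loss. The layer, the margin of 1.0, the 6-dimensional input vectors and the given pair labels are assumptions for illustration; the sketch does not reproduce the paper's architecture or its unsupervised training signal derived from tag features and topic models.

```python
import numpy as np


def encode(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Shared encoder: the same weights W embed both members of every pair."""
    return np.tanh(x @ W)


def contrastive_loss(x1, x2, same, W, margin=1.0):
    """Contrastive loss over a batch of pairs.

    Pairs labelled same=1 are pulled together (squared distance); pairs labelled
    same=0 are pushed apart until their distance is at least `margin`.
    """
    d = np.linalg.norm(encode(x1, W) - encode(x2, W), axis=1)
    return float(np.mean(same * d**2 + (1 - same) * np.maximum(margin - d, 0.0) ** 2))


# Hypothetical setup: 6-dimensional tag-feature vectors embedded into 3 dimensions.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(6, 3))
x1, x2 = rng.normal(size=(4, 6)), rng.normal(size=(4, 6))
labels = np.array([1, 0, 1, 0])
loss = contrastive_loss(x1, x2, labels, W)
```

Because `W` is shared, any update that moves one embedding moves its partner consistently, which is what lets the learned space compare inputs never seen together during training.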
Employing Crowdsourcing for Enriching a Music Knowledge Base in Higher Education
This paper describes the methodology followed and the lessons learned from employing crowdsourcing techniques as part of a homework assignment involving higher education students of computer science. Using a platform that supports crowdsourcing in the cultural heritage domain, students were asked to enrich the metadata associated with a selection of music tracks. The results of the campaign were further analyzed and exploited by the students through the use of semantic web technologies. In total, 98 students participated in the campaign, contributing more than 6400 annotations concerning 854 tracks. The process also led to the creation of an openly available annotated dataset, which may be useful for training machine learning models for music tagging. The campaign's results and the comments gathered through an online survey enable us to draw some useful insights about the benefits and challenges of integrating crowdsourcing into computer science curricula and how this can enhance students' engagement in the learning process.
Comment: To be published in The 4th International Conference on Artificial Intelligence in Education Technology (AIET 2023), Berlin, Germany, 31 June-2 July 2023. For the GitHub code for the created music dataset, see https://github.com/vaslyb/MusicCro
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning
Captioning is a crucial and challenging task for video understanding. In videos that involve active agents such as humans, the agent's actions can bring about myriad changes in the scene. These changes can be observable, such as movements, manipulations and transformations of the objects in the scene; these are reflected in conventional video captioning. However, unlike in images, actions in videos are also inherently linked to social and commonsense aspects such as intentions (why the action is taking place), attributes (such as who is doing the action, on whom, where, using what, etc.) and effects (how the world changes due to the action, and the effect of the action on other agents). Thus, for video understanding, such as when captioning videos or answering questions about videos, one must have an understanding of these commonsense aspects. We present the first work on generating commonsense captions directly from videos, in order to describe latent aspects such as intentions, attributes and effects. We present a new dataset, "Video-to-Commonsense (V2C)", that contains 9k videos of human agents performing various actions, annotated with 3 types of commonsense descriptions. Additionally, we explore the use of open-ended video-based commonsense question answering (V2C-QA) as a way to enrich our captions. We fine-tune our commonsense generation models on the V2C-QA task, where we ask questions about the latent aspects in the video. Both the generation task and the QA task can be used to enrich video captions.
Semantic Indexing and Retrieval based on Formal Concept Analysis
Semantic indexing and retrieval has become an important research area, as the amount of information available on the Web keeps growing. In this paper, we introduce an original approach to semantic indexing and retrieval based on Formal Concept Analysis. The concept lattice is used as a semantic index, and we propose an original algorithm for traversing the lattice and answering user queries. This framework has been used and evaluated on song datasets.
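In Formal Concept Analysis, a formal context relates objects (here, songs) to attributes, and a formal concept is a pair (extent, intent) in which each set is exactly what the other derives. The Python sketch below, on a hypothetical four-song context, enumerates all concepts by brute force and answers a query with the concept generated by the query attributes. It uses only the standard derivation operators; the context, the helper names and the query strategy are assumptions, and it is not the paper's lattice-traversal algorithm.

```python
from itertools import combinations

# Hypothetical formal context: songs (objects) x descriptors (attributes).
CONTEXT: dict[str, set[str]] = {
    "song_a": {"rock", "guitar", "upbeat"},
    "song_b": {"rock", "guitar", "slow"},
    "song_c": {"jazz", "piano", "slow"},
    "song_d": {"rock", "piano", "upbeat"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def extent(attrs: set[str]) -> frozenset[str]:
    """Derivation A': the objects that have every attribute in attrs."""
    return frozenset(o for o, a in CONTEXT.items() if attrs <= a)


def intent(objs: frozenset[str]) -> frozenset[str]:
    """Derivation B': the attributes shared by every object in objs."""
    if not objs:
        return frozenset(ATTRIBUTES)
    return frozenset(set.intersection(*(CONTEXT[o] for o in objs)))


def all_concepts() -> set[tuple[frozenset[str], frozenset[str]]]:
    """Close every attribute subset into a concept (exponential: toy contexts only)."""
    found = set()
    attrs = sorted(ATTRIBUTES)
    for r in range(len(attrs) + 1):
        for subset in combinations(attrs, r):
            ext = extent(set(subset))
            found.add((ext, intent(ext)))
    return found


def answer(query: set[str]) -> tuple[frozenset[str], frozenset[str]]:
    """Concept generated by the query; its extent is the set of exact matches."""
    ext = extent(set(query))
    return ext, intent(ext)
```

For example, `answer({"guitar"})` returns the extent `{song_a, song_b}` with the intent `{rock, guitar}`: the closure reveals that every guitar song in this context is also tagged rock. Concepts adjacent in the lattice (ordered by extent inclusion) are natural candidates for relaxing or refining a query, which is the kind of navigation a lattice-based index can exploit.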
CASAM: Collaborative Human-machine Annotation of Multimedia
The CASAM multimedia annotation system implements a model of cooperative annotation between a human annotator and automated components. The aim is for them to work asynchronously but together. The system focuses on the areas where automated recognition and reasoning are most effective, while the user is able to work in the areas where their unique skills are required. The system’s reasoning is influenced by the annotations provided by the user and, similarly, the user can see the system’s work and modify and, implicitly, direct it. The CASAM system interacts with the user by providing a window onto the current state of annotation, and by generating requests for information that are important for the final annotation or that constrain its reasoning. The user can modify the annotation, respond to requests and also add their own annotations. The objective is that the human annotator’s time is used more effectively and that the result is an annotation that is both of higher quality and produced more quickly. This can be especially important in circumstances where the annotator has a very restricted amount of time in which to annotate the document. In this paper we describe our prototype system. We expand upon the techniques used for automatically analysing the multimedia document, for reasoning over the generated annotations and for generating an effective interaction with the end-user. We also present the results of evaluations undertaken with media professionals in order to validate the approach and gain feedback to drive further research.
- …