A large multilingual and multi-domain dataset for recommender systems
This paper presents a multi-domain interests dataset to train and test Recommender Systems, together with the methodology used to create the dataset from Twitter messages in English and Italian. The English dataset includes an average of 90 preferences per user on music, books, movies, celebrities, sport, politics and much more, for about half a million users. Preferences are either extracted from the messages of users who use Spotify, Goodreads and other similar content-sharing platforms, or induced from their "topical" friends, i.e., followees representing an interest rather than a social relation between peers. In addition, preferred items are matched with the Wikipedia articles describing them. This unique feature of our dataset provides a means to derive a semantic categorization of the preferred items, exploiting available semantic resources linked to Wikipedia such as the Wikipedia Category Graph, DBpedia, BabelNet and others.
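The extraction step the abstract describes can be illustrated with a minimal sketch: parse a Spotify-style "now playing" tweet into a (user, item) preference and attach the Wikipedia article describing the item. The tweet pattern and the tiny item-to-article table are illustrative assumptions, not the paper's actual pipeline.

```python
import re

# Hypothetical "now playing" tweet format; the real dataset is built from
# messages produced by Spotify, Goodreads and similar platforms.
NOW_PLAYING = re.compile(
    r"#NowPlaying\s+(?P<track>.+?)\s+by\s+(?P<artist>.+?)(?:\s+#|$)")

# Toy item -> Wikipedia article mapping; in the real dataset this is a
# large-scale entity-matching step against Wikipedia.
WIKIPEDIA_TITLES = {
    "David Bowie": "David Bowie",
    "Radiohead": "Radiohead",
}

def extract_preference(user, tweet):
    """Return (user, artist, wikipedia_title) or None if no match."""
    m = NOW_PLAYING.search(tweet)
    if not m:
        return None
    artist = m.group("artist").strip()
    return (user, artist, WIKIPEDIA_TITLES.get(artist))

pref = extract_preference("alice", "#NowPlaying Heroes by David Bowie #music")
```

Once an item is linked to a Wikipedia article, its semantic categorization can be read off resources such as the Wikipedia Category Graph or DBpedia.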
Extracting semantic entities and events from sports tweets
Large volumes of user-generated content on practically every major issue and event are being created on the microblogging site Twitter. This content can be combined and processed to detect events, entities and popular moods to feed various knowledge-intensive practical applications. On the downside, these content items are very noisy and highly informal, making it difficult to extract sense out of the stream. In this paper, we exploit various approaches to detect named entities and significant micro-events from users' tweets during a live sports event. We describe how combining linguistic features with background knowledge and the use of Twitter-specific features can achieve highly precise detection results (F-measure = 87%) on different datasets. A study was conducted on tweets from cricket matches in the ICC World Cup in order to augment the event-related non-textual media with collective intelligence.
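The combination the abstract describes can be sketched minimally: background knowledge (a gazetteer of known cricket entities) plus Twitter-specific features (hashtags and @mentions) used to propose entity candidates in a noisy tweet. The gazetteer entries and candidate types are illustrative assumptions; the paper's actual feature set is much richer.

```python
import re

# Toy background knowledge: known entities with their types.
GAZETTEER = {"sachin tendulkar": "Player", "icc world cup": "Tournament"}

def detect_entities(tweet):
    found = []
    low = tweet.lower()
    # Background knowledge: substring lookup against the gazetteer.
    for name, etype in GAZETTEER.items():
        if name in low:
            found.append((name, etype))
    # Twitter-specific features: hashtags and mentions as candidates.
    for tag in re.findall(r"#(\w+)", tweet):
        found.append((tag.lower(), "HashtagCandidate"))
    for user in re.findall(r"@(\w+)", tweet):
        found.append((user.lower(), "MentionCandidate"))
    return found

ents = detect_entities(
    "Sachin Tendulkar on fire at the ICC World Cup! #cricket @ESPNcricinfo")
```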
A Linked Open Data Approach for Sentiment Lexicon Adaptation
Social media platforms have recently become a gold mine for organisations to monitor their reputation by extracting and analysing the sentiment of the posts generated about them, their markets, and competitors. Among the approaches to analysing sentiment from social media, those based on sentiment lexicons (sets of words with associated sentiment scores) have gained popularity since they do not rely on training data, as opposed to Machine Learning approaches. However, sentiment lexicons assign a static sentiment score to each word without taking into consideration the different contexts in which the word is used (e.g., "great problem" vs. "great smile"). Additionally, new words constantly emerge from dynamic and rapidly changing social media environments that may not be covered by the lexicons. In this paper we propose a lexicon adaptation approach that makes use of semantic relations extracted from DBpedia to better understand the various contextual scenarios in which words are used. We evaluate our approach on three different Twitter datasets and show that using semantic information to adapt the lexicon improves sentiment computation by 3.7% in average accuracy, and by 2.6% in average F1 measure.
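The core intuition can be shown with a tiny sketch: instead of keeping one static score for a word like "great", blend its prior score with the average sentiment of the words it co-occurs with. The seed lexicon and the blending weight `alpha` are illustrative assumptions, not the paper's DBpedia-based method.

```python
# Toy prior lexicon: word -> sentiment score in [-1, 1].
lexicon = {"great": 0.8, "problem": -0.6, "smile": 0.7}

def adapt(word, context_words, alpha=0.5):
    """Blend a word's prior score with the mean score of its context."""
    ctx = [lexicon[w] for w in context_words if w in lexicon]
    if word not in lexicon or not ctx:
        return lexicon.get(word, 0.0)
    return (1 - alpha) * lexicon[word] + alpha * (sum(ctx) / len(ctx))

# "great problem": the negative context pulls "great" down ...
score_neg = adapt("great", ["problem"])
# ... while "great smile" keeps it positive.
score_pos = adapt("great", ["smile"])
```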
Extracting Semantics of Individual Places from Movement Data by Analyzing Temporal Patterns of Visits
Data reflecting movements of people, such as GPS or GSM tracks, can be a source of information about people's mobility behaviors and activities. Such information is required for various kinds of spatial planning in the public and business sectors. Movement data by themselves are semantically poor. Meaningful information can be derived by means of interactive visual analysis performed by a human expert; however, this is only possible for data about a small number of people. We suggest an approach that allows scaling to large datasets reflecting movements of numerous people. It includes extracting stops, clustering them to identify personal places of interest (POIs), and creating temporal signatures of the POIs characterizing the temporal distribution of the stops with respect to the daily and weekly time cycles and the time line. The analyst can give meanings to selected POIs based on their temporal signatures (i.e., classify them as home, work, etc.), and then POIs with similar signatures can be classified automatically. We demonstrate the possibilities for interactive visual semantic analysis using GSM, GPS, and Twitter data as examples. GPS data allow inferring richer semantic information, but temporal signatures alone may be insufficient for interpreting short stops. Twitter data are similar to GSM data but additionally contain message texts, which can help in place interpretation. We plan to develop an intelligent system that learns how to classify personal places and trips while a human analyst visually analyzes and semantically annotates selected subsets of movement data.
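A temporal signature in the sense above can be sketched as an hour-of-day histogram of the stops at one candidate POI, with a simple rule standing in for the analyst-seeded classification. The night/office hour ranges and thresholds are illustrative assumptions; the paper builds richer daily and weekly signatures.

```python
from collections import Counter
from datetime import datetime

def temporal_signature(stop_times):
    """Hour-of-day histogram for the datetime stops at one place."""
    return Counter(t.hour for t in stop_times)

def label_place(signature):
    """Toy rule: mostly night-time stops -> home, office hours -> work."""
    night = sum(signature[h] for h in list(range(0, 7)) + [22, 23])
    office = sum(signature[h] for h in range(9, 18))
    total = sum(signature.values()) or 1
    if night / total > 0.5:
        return "home"
    if office / total > 0.5:
        return "work"
    return "other"

# Five days of stops at 23:00, 01:00 and 02:00 at the same place.
stops = [datetime(2024, 1, d, h) for d in range(1, 6) for h in (23, 1, 2)]
sig = temporal_signature(stops)
place = label_place(sig)
```

In the approach described above, such rules would not be hard-coded: the analyst labels a few POIs from their signatures, and similar signatures are then classified automatically.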
Towards Deep Semantic Analysis Of Hashtags
Hashtags are semantico-syntactic constructs used across various social networking and microblogging platforms to enable users to start a topic-specific discussion or classify a post into a desired category. Segmenting and linking the entities present within hashtags could therefore help in better understanding and extraction of the information shared across social media. However, due to the lack of space delimiters in hashtags (e.g., #nsavssnowden), the segmentation of hashtags into their constituent entities ("NSA" and "Edward Snowden" in this case) is not a trivial task. Most current state-of-the-art social media analytics systems, such as those for Sentiment Analysis and Entity Linking, tend to either ignore hashtags or treat them as a single word. In this paper, we present a context-aware approach to segment hashtags and link their entities to a knowledge base (KB) entry, based on the context within the tweet. Our approach segments and links the entities in hashtags such that the coherence between the hashtag semantics and the tweet is maximized. To the best of our knowledge, no existing study addresses the issue of linking entities in hashtags for extracting semantic information. We evaluate our method on two different datasets, and demonstrate the effectiveness of our technique in improving overall entity linking in tweets via the additional semantic information provided by segmenting and linking entities in a hashtag.
Comment: To appear in the 37th European Conference on Information Retrieval.
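The segmentation half of the problem can be sketched with dictionary-based dynamic programming: split the hashtag body into known words, preferring fewer (longer) segments. The word list stands in for the knowledge base, and the fewest-segments criterion is a stand-in for the paper's tweet-coherence scoring, so both are illustrative assumptions.

```python
# Toy dictionary; the real system consults a knowledge base and scores
# candidate segmentations by coherence with the surrounding tweet.
WORDS = {"nsa", "vs", "snowden", "now", "playing"}

def segment_hashtag(tag):
    """Split a hashtag body (no '#') into dictionary words, or None."""
    tag = tag.lower()
    n = len(tag)
    # best[i] = fewest-segment segmentation of tag[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and tag[j:i] in WORDS:
                cand = best[j] + [tag[j:i]]
                if best[i] is None or len(cand) < len(best[i]):
                    best[i] = cand
    return best[n]

segs = segment_hashtag("nsavssnowden")
```

In the full approach, each recovered segment ("nsa", "snowden") would then be linked to a KB entry, using the tweet context to disambiguate.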
Sentiment Lexicon Adaptation with Context and Semantics for the Social Web
Sentiment analysis over social streams offers governments and organisations a fast and effective way to monitor the public's feelings towards policies, brands, business, etc. General-purpose sentiment lexicons have been used to compute sentiment from social streams, since they are simple and effective. They calculate the overall sentiment of texts by using a general collection of words with predetermined sentiment orientation and strength. However, a word's sentiment often varies with the context in which it appears, and new words might be encountered that are not covered by the lexicon, particularly in social media environments where content emerges and changes rapidly and constantly. In this paper, we propose a lexicon adaptation approach that uses contextual as well as semantic information extracted from DBpedia to update the words' weighted sentiment orientations and to add new words to the lexicon. We evaluate our approach on three different Twitter datasets, and show that enriching the lexicon with contextual and semantic information improves sentiment computation by 3.4% in average accuracy, and by 2.8% in average F1 measure.
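The "add new words" half of the adaptation can be sketched as follows: an out-of-lexicon word receives a score induced from the known words it co-occurs with in the stream. The seed lexicon and toy corpus are illustrative assumptions, not the paper's contextual/DBpedia method.

```python
# Toy seed lexicon and a tiny tokenised tweet stream.
seed = {"love": 0.9, "awful": -0.8, "nice": 0.6}

corpus = [
    ["love", "this", "fleek", "look"],
    ["fleek", "nice", "outfit"],
]

def induce_score(word, tweets, lexicon):
    """Average the scores of known words co-occurring with `word`."""
    ctx = [lexicon[w]
           for toks in tweets if word in toks
           for w in toks if w in lexicon]
    return sum(ctx) / len(ctx) if ctx else 0.0

# "fleek" is not in the seed lexicon; give it a score from its contexts.
seed["fleek"] = induce_score("fleek", corpus, seed)
```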
Discovering New Sentiments from the Social Web
A persistent challenge in Complex Systems (CS) research is the phenomenological reconstruction of systems from raw data. To face this problem, the use of sound features for reasoning about the system from data processing is a key step. In the specific case of complex societal systems, sentiment analysis makes it possible to mirror (part of) the affective dimension. However, it is not reasonable to think that individual sentiment categorization can encompass the new affective phenomena in digital social networks.
The present paper addresses the problem of isolating sentiment concepts which emerge in social networks. In an analogy to the Artificial Intelligence Singularity, we propose the study and analysis of these new complex sentiment structures and of how they are similar to or diverge from the classic conceptual structures associated with sentiment lexicons. The conjecture is that hypercomplex sentiment structures, not explained by human categorizations, are highly likely to emerge from highly dynamic social information networks. Roughly speaking, new sentiments can emerge from these new global nervous systems, just as they occur in humans.
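One very simplified way to operationalize the conjecture above is to look for groups of co-occurring out-of-lexicon terms, i.e. candidate "new sentiment" structures not covered by classic lexicon categories. The toy corpus and the pair-count grouping rule are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Toy lexicon of classic sentiment categories and a tiny token stream.
LEXICON = {"happy", "sad", "angry"}

corpus = [
    {"yolo", "fomo"},
    {"fomo", "yolo", "vibes"},
    {"happy", "smile"},
]

def emergent_groups(tweets, lexicon, min_cooc=2):
    """Pair out-of-lexicon terms that co-occur at least min_cooc times."""
    pairs = Counter()
    for toks in tweets:
        oov = sorted(t for t in toks if t not in lexicon)
        pairs.update(combinations(oov, 2))
    return {p for p, c in pairs.items() if c >= min_cooc}

groups = emergent_groups(corpus, LEXICON)
```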