8,088 research outputs found
Data mining and association rules to determine twitter trends
Opinion mining has been widely studied in the last decade due to its great interest in the field of research and countless real-world applications. This research proposes a system that combines association rules, generalization of rules, and sentiment analysis to catalog and discover opinion trends in Twitter [1]. The sentiment analysis is used to favor the generalization of the association rules. In this sense, an initial set of 1.6 million tweets captured in an undirected way is first summarized through text mining in an input set for the algorithms of rules and sentiment analysis of 158,354 tweets. On this last group, easily interpretable standard and generalized sets of rules are obtained about characters, which were revealed as an interesting result of the system
Non-Query-Based Pattern Mining and Sentiment Analysis for Massive Microblogging Online Texts
Pattern mining has been widely studied in the last decade given its great interest for research and its numerous applications in the real world. In this paper the definition of query and non-query based systems is proposed, highlighting the needs of non-query based systems in the era of Big Data. For this, we propose a new approach of a non-query based system that combines association rules, generalized rules and sentiment analysis in order to catalogue and discover opinion patterns in the social network Twitter. Association rules have been previously applied for sentiment analysis, but in most cases, they are used once the process of sentiment analysis is finished to see which tokens appear commonly related to a certain sentiment. On the other hand, they have also been used to discover patterns between sentiments. Our work differs from these in that it proposes a non-query based system which combines both techniques, in a mixed proposal of sentiment analysis and association rules to discover patterns and sentiment patterns in microblogging texts. The obtained rules generalize and summarize the sentiments obtained from a group of tweets about any character, brand or product mentioned in them. To study the performance of the proposed system, an initial set of 1.7 million tweets have been employed to analyse the most salient sentiments during the American pre-election campaign. The analysis of the obtained results supports the capability of the system of obtaining association rules and patterns with great descriptive value in this use case. Parallelisms can be established in these patterns that match perfectly with real life events.COPKIT Project, through the European Union's Horizon 2020 Research and Innovation Programme
786687Spanish Ministry for Economy and Competitiveness
TIN2015-64776-C3-1-RAndalusian Government, through Data Analysis in Medicine: from Medical Records to Big Data Project
P18-RT-2947Spanish Ministry of Education, Culture, and Sport
FPU18/00150University of Granad
Universal, Unsupervised (Rule-Based), Uncovered Sentiment Analysis
We present a novel unsupervised approach for multilingual sentiment analysis
driven by compositional syntax-based rules. On the one hand, we exploit some of
the main advantages of unsupervised algorithms: (1) the interpretability of
their output, in contrast with most supervised models, which behave as a black
box and (2) their robustness across different corpora and domains. On the other
hand, by introducing the concept of compositional operations and exploiting
syntactic information in the form of universal dependencies, we tackle one of
their main drawbacks: their rigidity on data that are structured differently
depending on the language concerned. Experiments show an improvement both over
existing unsupervised methods, and over state-of-the-art supervised models when
evaluating outside their corpus of origin. Experiments also show how the same
compositional operations can be shared across languages. The system is
available at http://www.grupolys.org/software/UUUSA/Comment: 19 pages, 5 Tables, 6 Figures. This is the authors version of a work
that was accepted for publication in Knowledge-Based System
Social media mining for identification and exploration of health-related information from pregnant women
Widespread use of social media has led to the generation of substantial
amounts of information about individuals, including health-related information.
Social media provides the opportunity to study health-related information about
selected population groups who may be of interest for a particular study. In
this paper, we explore the possibility of utilizing social media to perform
targeted data collection and analysis from a particular population group --
pregnant women. We hypothesize that we can use social media to identify cohorts
of pregnant women and follow them over time to analyze crucial health-related
information. To identify potentially pregnant women, we employ simple
rule-based searches that attempt to detect pregnancy announcements with
moderate precision. To further filter out false positives and noise, we employ
a supervised classifier using a small number of hand-annotated data. We then
collect their posts over time to create longitudinal health timelines and
attempt to divide the timelines into different pregnancy trimesters. Finally,
we assess the usefulness of the timelines by performing a preliminary analysis
to estimate drug intake patterns of our cohort at different trimesters. Our
rule-based cohort identification technique collected 53,820 users over thirty
months from Twitter. Our pregnancy announcement classification technique
achieved an F-measure of 0.81 for the pregnancy class, resulting in 34,895 user
timelines. Analysis of the timelines revealed that pertinent health-related
information, such as drug-intake and adverse reactions can be mined from the
data. Our approach to using user timelines in this fashion has produced very
encouraging results and can be employed for other important tasks where
cohorts, for which health-related information may not be available from other
sources, are required to be followed over time to derive population-based
estimates.Comment: 9 page
- âŠ