A study on text-score disagreement in online reviews
In this paper, we focus on online reviews and employ artificial intelligence
tools, taken from the cognitive computing field, to help understand the
relationships between the textual part of the review and the assigned numerical
score. We start from the intuitions that 1) a set of textual reviews expressing
different sentiments may feature the same score (and vice-versa); and 2)
detecting and analyzing the mismatches between the review content and the
actual score may benefit both service providers and consumers, by highlighting
specific factors of satisfaction (and dissatisfaction) in texts.
To prove the intuitions, we adopt sentiment analysis techniques and we
concentrate on hotel reviews, to find polarity mismatches therein. In
particular, we first train a text classifier with a set of annotated hotel
reviews, taken from the Booking website. Then, we analyze a large dataset, with
around 160k hotel reviews collected from Tripadvisor, with the aim of detecting
a polarity mismatch, indicating if the textual content of the review is in
line, or not, with the associated score.
Using well-established artificial intelligence techniques and analyzing in
depth the reviews featuring a mismatch between the text polarity and the score,
we find that, on a scale of five stars, the reviews ranked with middle scores
include a mixture of positive and negative aspects.
The approach proposed here, besides acting as a polarity detector, provides an
effective selection of reviews from an initially very large dataset, which may
allow both consumers and providers to focus directly on the review subset
featuring a text/score disagreement, which conveniently conveys to the user a
summary of positive and negative features of the review target.
Comment: This is the accepted version of the paper. The final version will be
published in the Journal of Cognitive Computation, available at Springer via
http://dx.doi.org/10.1007/s12559-017-9496-
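The mismatch detection described in this abstract (train a polarity classifier on annotated reviews, then flag reviews whose text polarity disagrees with the star score) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the classifier, so a tiny Naive Bayes model stands in for it; the toy training sentences, the star thresholds (4-5 positive, 1-2 negative, 3 skipped), and the names `TinyNaiveBayes`, `score_polarity`, and `find_mismatches` are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing; a stand-in classifier."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[c][word] + 1
                    score += math.log(count / (total + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


def score_polarity(stars):
    """Assumed mapping from a 1-5 star score to a polarity label."""
    if stars >= 4:
        return "pos"
    if stars <= 2:
        return "neg"
    return None  # middle scores carry no clear polarity in this sketch


def find_mismatches(classifier, reviews):
    """Return (text, stars, predicted) where text and score polarity differ."""
    mismatches = []
    for text, stars in reviews:
        expected = score_polarity(stars)
        predicted = classifier.predict(text)
        if expected is not None and predicted != expected:
            mismatches.append((text, stars, predicted))
    return mismatches


# Toy annotated training set (invented for illustration).
train_texts = [
    "great room and friendly staff",
    "clean comfortable and quiet",
    "dirty room and rude staff",
    "noisy broken and overpriced",
]
train_labels = ["pos", "pos", "neg", "neg"]
clf = TinyNaiveBayes().fit(train_texts, train_labels)

# Toy scored reviews (invented for illustration).
reviews = [
    ("friendly staff and clean room", 5),
    ("rude staff and dirty noisy room", 4),
    ("great quiet room", 1),
    ("broken and overpriced", 2),
]
mismatches = find_mismatches(clf, reviews)
for text, stars, predicted in mismatches:
    print(f"{stars} stars but text looks {predicted}: {text}")
```

In a realistic setting the classifier would be trained on a much larger annotated corpus, and the flagged subset would then be inspected by hand, as the abstract suggests.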
Implicit location sharing detection in social media Turkish text messaging
2nd International Workshop on Machine Learning, Optimization and Big Data (2016 : Volterra; Italy)
Social media have become a significant venue for sharing live updates. Users of social media produce and share large amounts of personal data as part of these updates. A significant share of this data contains location information that can be used by other people for many purposes. Some social media users deliberately share their own location with other users. However, a large number of users share their location blindly or implicitly, without noticing it or its possible consequences. The current paper investigates implicit location sharing. We perform a large-scale study on implicit location sharing detection for one of the most popular social media platforms, namely Twitter. After a careful study, we prepared a training data set of Turkish tweets and manually labelled them. Using machine learning techniques, we induced classifiers that are able to classify whether a given tweet contains implicit location sharing or not. The classifiers are shown to be very accurate and efficient as well. Moreover, the best classifier is employed in a browser add-on tool that warns the user whenever implicit location sharing is predicted from a tweet that is about to be released. The paper presents the methodology followed and the technical analysis as well. Furthermore, it discusses how these techniques can be extended to different social network services and also to different languages. © Springer International Publishing AG 2016
Accelerating Infinite Ensemble of Clustering by Pivot Features
Infinite ensemble clustering (IEC) incorporates both ensemble clustering and representation learning by fusing infinite basic partitions, and shows appealing performance in the unsupervised context. However, it needs to solve a linear equation system with a high time complexity of O(d^3), where d is the concatenated dimension of many clustering results. Inspired by the cognitive characteristic of human memory, which can pay attention to the pivot features in a more compressed data space, we propose an accelerated version of IEC (AIEC) that extracts the pivot features and learns multiple mappings to reconstruct them, so that the linear equation system can be solved with time complexity O(dr^2) (r ≪ d). Experimental results on standard datasets, including image and text ones, show that our algorithm AIEC greatly improves the running time of IEC while achieving comparable clustering performance.
Semi-supervised echo state networks for audio classification
Echo state networks (ESNs), belonging to the wider family of reservoir computing methods, are a powerful tool for the analysis of dynamic data. In an ESN, the input signal is fed to a fixed (possibly large) pool of interconnected neurons, whose state is then read by an adaptable layer to provide the output. This last layer is generally trained via a regularized linear least-squares procedure. In this paper, we consider the more complex problem of training an ESN for classification problems in a semi-supervised setting, wherein only a part of the input sequences are effectively labeled with the desired response. To solve the problem, we combine the standard ESN with a semi-supervised support vector machine (S3VM) for training its adaptable connections. Additionally, we propose a novel algorithm for solving the resulting non-convex optimization problem, hinging on a series of successive approximations of the original problem. The resulting procedure is highly customizable and also admits a principled way of parallelizing training over multiple processors/computers. An extensive set of experimental evaluations on audio classification tasks supports the presented semi-supervised ESN as a practical tool for dynamic problems requiring the analysis of partially labeled data
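
The standard ESN setup that this abstract starts from (a fixed random reservoir whose states are read by a linear layer trained with regularized least squares) can be sketched as below. This is a minimal illustration only and does not cover the semi-supervised S3VM training or the successive-approximation algorithm that the paper proposes. The reservoir size, weight ranges, spectral radius, ridge value, sine-wave task, and the function names `make_reservoir`, `run_reservoir`, and `train_readout` are all assumptions chosen for the example.

```python
import numpy as np


def make_reservoir(n_inputs, n_units, spectral_radius=0.9, seed=0):
    """Create fixed random input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def run_reservoir(w_in, w, inputs):
    """Feed a (T, n_inputs) sequence through the reservoir; return (T, n_units)."""
    state = np.zeros(w.shape[0])
    states = np.empty((len(inputs), w.shape[0]))
    for t, u in enumerate(inputs):
        state = np.tanh(w_in @ u + w @ state)
        states[t] = state
    return states


def train_readout(states, targets, ridge=1e-6):
    """Ridge regression readout: solve (S^T S + ridge * I) W = S^T Y."""
    gram = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.T @ targets)


# Toy task (assumed for illustration): predict the next sample of a sine wave.
t = np.arange(400)
signal = np.sin(0.2 * t)
inputs = signal[:-1, None]   # shape (399, 1)
targets = signal[1:, None]   # shape (399, 1)

w_in, w = make_reservoir(n_inputs=1, n_units=50)
states = run_reservoir(w_in, w, inputs)

washout = 50  # discard early states that still depend on the zero initial state
w_out = train_readout(states[washout:], targets[washout:])
pred = states[washout:] @ w_out
mse = float(np.mean((pred - targets[washout:]) ** 2))
print("training MSE:", mse)
```

Only `w_out` is learned here; the reservoir weights stay fixed, which is what keeps the training step a linear problem.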