18,100 research outputs found
A study on text-score disagreement in online reviews
In this paper, we focus on online reviews and employ artificial intelligence
tools, taken from the cognitive computing field, to help understanding the
relationships between the textual part of the review and the assigned numerical
score. We move from the intuitions that 1) a set of textual reviews expressing
different sentiments may feature the same score (and vice-versa); and 2)
detecting and analyzing the mismatches between the review content and the
actual score may benefit both service providers and consumers, by highlighting
specific factors of satisfaction (and dissatisfaction) in texts.
To prove the intuitions, we adopt sentiment analysis techniques and we
concentrate on hotel reviews, to find polarity mismatches therein. In
particular, we first train a text classifier with a set of annotated hotel
reviews, taken from the Booking website. Then, we analyze a large dataset, with
around 160k hotel reviews collected from Tripadvisor, with the aim of detecting
a polarity mismatch, indicating if the textual content of the review is in
line, or not, with the associated score.
Using well established artificial intelligence techniques and analyzing in
depth the reviews featuring a mismatch between the text polarity and the score,
we find that -on a scale of five stars- those reviews ranked with middle scores
include a mixture of positive and negative aspects.
The approach proposed here, beside acting as a polarity detector, provides an
effective selection of reviews -on an initial very large dataset- that may
allow both consumers and providers to focus directly on the review subset
featuring a text/score disagreement, which conveniently convey to the user a
summary of positive and negative features of the review target.Comment: This is the accepted version of the paper. The final version will be
published in the Journal of Cognitive Computation, available at Springer via
http://dx.doi.org/10.1007/s12559-017-9496-
BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform
Harnessing data flow and modelling potentials for sustainable development
Tackling some of the global challenges relating to health, poverty, business and the environment is known to be heavily dependent on the flow and utilisation of data. However, while enhancements in data generation, storage, modelling, dissemination and the related integration of global economies and societies are fast transforming the way we live and interact, the resulting dynamic, globalised and information society remains digitally divided. On the African continent, in particular, the division has resulted into a gap between knowledge generation and its transformation into tangible products and services which Kirsop and Chan (2005) attribute to a broken information flow. This paper proposes some fundamental approaches for a sustainable transformation of data into knowledge for the purpose of improving the peoples' quality of life. Its main strategy is based on a generic data sharing model providing access to data utilising and generating entities in a multi disciplinary environment. It highlights the great potentials in using unsupervised and supervised modelling in tackling the typically predictive-in-nature challenges we face. Using both simulated and real data, the paper demonstrates how some of the key parameters may be generated and embedded in models to enhance their predictive power and reliability.
Its main outcomes include a proposed implementation framework setting the scene for the creation of decision support systems capable of addressing the key issues in society. It is expected that a sustainable data flow will forge synergies between the private sector, academic and research institutions within and between countries. It is also expected that the paper's findings will help in the design and development of knowledge extraction from data in the wake of cloud computing and, hence, contribute towards the improvement in the peoples' overall quality of life. To void running high implementation costs, selected open source tools are recommended for developing and sustaining the system.
Key words: Cloud Computing, Data Mining, Digital Divide, Globalisation, Grid Computing, Information Society, KTP, Predictive Modelling and STI
DeepBrain: Functional Representation of Neural In-Situ Hybridization Images for Gene Ontology Classification Using Deep Convolutional Autoencoders
This paper presents a novel deep learning-based method for learning a
functional representation of mammalian neural images. The method uses a deep
convolutional denoising autoencoder (CDAE) for generating an invariant, compact
representation of in situ hybridization (ISH) images. While most existing
methods for bio-imaging analysis were not developed to handle images with
highly complex anatomical structures, the results presented in this paper show
that functional representation extracted by CDAE can help learn features of
functional gene ontology categories for their classification in a highly
accurate manner. Using this CDAE representation, our method outperforms the
previous state-of-the-art classification rate, by improving the average AUC
from 0.92 to 0.98, i.e., achieving 75% reduction in error. The method operates
on input images that were downsampled significantly with respect to the
original ones to make it computationally feasible
- âŠ