Predicting customer satisfaction with product reviews: A comparative study of some machine learning approaches.
Over the past two decades, e-commerce platforms have grown exponentially, and this growth has brought several challenges due to the vast amount of information involved. Customers not only buy products online but also use online platforms to get valuable information about products they intend to buy. Customers share their experiences by providing feedback, which creates a pool of textual information that grows every day. This feedback contains both subjective and objective text that is rich in information about customers' behaviour, their likes and dislikes regarding a product, and their sentiments. Moreover, this information can help customers who have yet to buy or who are still deciding. This thesis compares four supervised machine learning approaches for predicting customer satisfaction: Naïve Bayes, Support Vector Machines (SVM), Logistic Regression (LR), and Decision Tree (DT). The models use term frequency-inverse document frequency (TF-IDF) vectorization for the training and test sets. The models are applied after basic pre-processing of the text data: lowercasing, lemmatization, and removal of stop words, smileys, and digits. We compare the performance of the models using accuracy, precision, recall, and F1-score. SVM outperforms the other models with an accuracy of 83%, while Naïve Bayes, LR, and DT achieve accuracies of 82%, 78%, and 76%, respectively. Moreover, we evaluate the performance of the classifiers using confusion matrices.
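As a rough illustration of the pre-processing and TF-IDF step described in this abstract, the following is a minimal sketch in plain Python. It is not the thesis's implementation: the stop-word list, the example reviews, and the function names `preprocess` and `tfidf` are assumptions made for illustration, lemmatization is omitted, and the weighting uses the plain `tf * log(N / df)` form, which may differ from the variant the authors used.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the abstract does not specify one.
STOP_WORDS = {"the", "a", "is", "it", "and", "this", "was", "i"}

def preprocess(text):
    """Lowercase, drop digits and punctuation, and remove stop words."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)       # digit removal
    tokens = re.findall(r"[a-z]+", text)   # keep alphabetic tokens only
    return [t for t in tokens if t not in STOP_WORDS]

def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Made-up reviews, not data from the thesis.
reviews = [
    "This phone is great, 5 stars!",
    "The battery was bad and the phone died.",
    "Great battery, great screen.",
]
vectors = tfidf(reviews)
```

In this toy corpus, a term that occurs in only one review (such as "stars") receives a higher weight than a term shared across reviews (such as "phone"), which is the property that makes TF-IDF vectors useful as classifier input.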
Development of service industry evaluation indicators based on data/text mining
National University Corporation Nagaoka University of Technology
Application of pre-training and fine-tuning AI models to machine translation: a case study of multilingual text classification in Baidu
With the development of international information technology, we are producing
a huge amount of information all the time. The scarce resource is gradually becoming
not information itself but the ability to process information in various languages.
How to obtain the most useful information from such a large and complex body of
multilingual textual information is a major goal of multilingual information
processing.
Multilingual text classification helps users to break the language barrier and
accurately locate the required information and triage information. At the same time,
the rapid development of the Internet has accelerated the communication among users
of various languages, giving rise to a large number of multilingual texts, such as book
and movie reviews, online chats, product introductions and other forms, which
contain a large amount of valuable implicit information and urgently need automated
tools to categorize and process those multilingual texts.
This work describes the Natural Language Processing (NLP) sub-task known as
Multilingual Text Classification (MTC), performed within the context of Baidu, a
leading Chinese AI company with a strong Internet base, whose NLP division led the
industry in bringing deep learning technology online in Machine Translation (MT) and
search. Multilingual text classification is an important module in NLP machine
translation and a basic module in NLP tasks. It can be applied to many fields, such as
fake review detection, news headline classification, analysis of
positive and negative reviews, and so on.
In the following work, we first define the AI model paradigm of
'pre-training and fine-tuning' in deep learning in the Baidu NLP department. We then
investigate the application scenarios of multilingual text classification. Most of the
text classification systems currently available in the Chinese market are designed for a
single language, such as Alibaba's text classification system. If users need to classify
texts of the same category in multiple languages, they need to train multiple
single-language text classification systems and then classify the texts one by one.
However, many internationalized products do not have a single text language,
such as the AliExpress cross-border e-commerce business, the Airbnb B&B business, etc.
Industry needs to understand and classify users' reviews in various languages in order
to carry out in-depth statistics and marketing strategy development, so
multilingual text classification is particularly important in this scenario.
Therefore, we focus on interpreting the methodology of the multilingual text
classification model of machine translation in the Baidu NLP department. We collect
sets of multilingual data (reviews, news headlines and other data) for manual
classification and labeling, use the labeling results to fine-tune the multilingual text
classification model, and report the quality evaluation data of the Baidu multilingual text
classification model after fine-tuning. We discuss whether the pre-training and
fine-tuning of the large model can substantially improve the quality and performance
of multilingual text classification.
Finally, based on the machine translation-multilingual text classification model,
we derive the application method of the pre-training and fine-tuning paradigm in
current cutting-edge deep learning AI models under the NLP system and verify the
generality and state-of-the-art character of the pre-training and fine-tuning paradigm in the deep
learning-intelligent search field.
ANALYZING CUSTOMER REVIEWS IN TURKISH USING MACHINE LEARNING AND DATA SCIENCE METHODOLOGIES
Digital life, especially after the introduction of Web 2.0, has significantly altered
human relations, giving everyone the “right of public speech”. Ideas, emotions,
and opinions on many topics are generously shared in virtual environments. A new-age,
global, digital word of mouth is shaping a society in which knowledge is the most
influential power. Because it is fed by social media data that is highly dynamic in both
volume and form, automatic processing is indispensable.
Natural Language Processing, in cooperation with Machine Learning techniques, plays
an important role in analyzing written textual data. Traditional techniques from
the literature become more powerful when hybrid ones are applied, in accordance with the
characteristic properties of the language used and the domain-specific data. Although
all the steps of the text classification chain are important, adequate feature
selection has a notably large impact on classification accuracy.
In this study, a simple document-level classification of the sentiment polarity of
comments is performed on subjective texts in Turkish. The domains include customer
reviews of company products, movies, and healthcare services, where the task is to decide
whether a comment is positive or negative. Another domain consists of doctors’ notes on
patients’ symptoms, where the aim is to predict, and thus recommend, some of the most often used
medical tests according to general doctors’ procedures.
The features used included some or all of the distinct word roots, together with their
binary or frequency information. Linear or vector analysis of the feature sets was carried out
using Machine Learning algorithms provided by the Weka tool. A hybrid feature
set was proposed and found to be more efficient: it combines binary vectors with frequency
meta-features from the nodes and leaves of the J48 tree classifier, for all features or for a
correlation-based selected subset, improving both prediction accuracy and classification
performance.
Personalizing online reviews for better customer decision making
Online consumer reviews have become an important source of information for understanding
markets and customer preferences. When making purchase decisions, customers
increasingly rely on user-generated online reviews; some even consider the information
in online reviews more credible and trustworthy than information provided
by vendors. Many studies have revealed that online reviews influence demand and
sales. Others have shown the possibility of identifying customer interest in product
attributes. However, little work has been done to address customer and review diversity
in the process of examining reviews. This research intends to answer the research
question: how can we solve the problem of customer and review diversity in the context
of online reviews to recommend useful reviews based on customer preferences and
improve product recommendation? Our approach to the question is through personalization.
Similar to other personalization research, we use an attribute-based model
to represent products and customer preferences. Unlike existing personalization research
that uses a set of pre-defined product attributes, we explore the possibility of a
data-driven approach for identifying more comprehensive product attributes from online
reviews to model products and customer preferences. Specifically, we introduce
a new topic model for product attribute identification and sentiment analysis. By
differentiating word co-occurrences at the sentence level from those at the document level,
the model better identifies interpretable topics. The use of an inference network with
shared structure enables the model to predict product attribute ratings accurately.
Based on this topic model, we develop attribute-based representations of products,
reviews and customer preferences and use them to construct the personalization of
online reviews. We examine personalization through the lens of consumer search theory
and human information processing theory and test the hypotheses with an experiment.
The personalization of online reviews can 1) recommend products matching
customers' preferences; 2) improve customers' intention towards recommended products;
3) best distinguish recommended products from products that do not match
customers' preferences; and 4) reduce decision effort.
Integrating Terminology Extraction and Word Embedding for Unsupervised Aspect Based Sentiment Analysis
In this paper we explore the advantages that unsupervised terminology extraction can bring to unsupervised Aspect Based Sentiment Analysis methods based on word embedding expansion techniques. We show that the gain in terms of F-measure is on the order of 3% on the French dataset of SemEval 2016.
Aspect-based sentiment analysis: a scalable system, a condition miner, and an evaluation dataset.
Aspect-based sentiment analysis systems are a kind of text-mining system
that specialises in summarising the sentiment that a collection of reviews
conveys regarding some aspects of an item. There are many cases in which
users write their reviews using conditional sentences; in such cases, mining
the conditions so that they can be analysed is very important to improve
the interpretation of the corresponding sentiment summaries. Unfortunately,
current commercial and research systems neglect conditions; current
frameworks and toolkits do not provide any components to mine them;
furthermore, the proposals in the literature are insufficient because they
are based either on hand-crafted patterns that fall short regarding recall or on
machine learning models that are tightly bound to a specific language and
require too much configuration.
In this dissertation, we introduce Torii, which is an aspect-based sentiment
analysis system whose most salient feature is that it can mine conditions; we
also introduce Kami, which provides two deep learning proposals to mine
conditions; and we also present Norito, which is the first publicly available
dataset of conditions. Our experimental results show that our proposals to mine
conditions are similar to the state of the art in terms of precision, but improve
recall enough to beat it in terms of F1 score. Finally, it is worth mentioning
that this dissertation would not have been possible without the collaboration
of Opileak, which backs up the industrial applicability of our work.
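The claim that similar precision combined with better recall yields a higher F1 score follows from F1 being the harmonic mean of precision and recall. The sketch below illustrates this in plain Python; the function names and the numeric values are assumptions chosen for illustration and are not results from the dissertation.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative values only: same precision, different recall.
baseline = f1_score(precision=0.80, recall=0.50)
improved = f1_score(precision=0.80, recall=0.70)
```

With precision held fixed, the second system's higher recall gives it the higher F1 score.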
Neural Approaches to Relational Aspect-Based Sentiment Analysis. Exploring generalizations across words and languages
Jebbara S. Neural Approaches to Relational Aspect-Based Sentiment Analysis. Exploring generalizations across words and languages. Bielefeld: Universität Bielefeld; 2020.
Every day, vast amounts of unstructured, textual data are shared online in digital form.
Websites such as forums, social media sites, review sites, blogs, and comment sections offer platforms to express and discuss opinions and experiences. Understanding the opinions in these resources is valuable both for businesses, e.g. to support market research and customer service, and for individuals, who can benefit from the experiences and expertise of others.
In this thesis, we approach the topic of opinion extraction and classification with neural network models. We regard this area of sentiment analysis as a relation extraction problem in which the sentiment of some opinion holder towards a certain aspect of a product, theme, or event needs to be extracted. In accordance with this framework, our main contributions are the following:
1. We propose a full system addressing all subtasks of relational sentiment analysis.
2. We investigate how semantic web resources can be leveraged in a neural-network-based model for the extraction of opinion targets and the classification of sentiment labels. Specifically, we experiment with enhancing pretrained word embeddings using the lexical resource WordNet. Furthermore, we enrich a purely text-based model with SenticNet concepts and observe an improvement for sentiment classification.
3. We examine how opinion targets can be automatically identified in noisy texts. Customer reviews, for instance, are prone to contain misspelled words and are difficult to process due to their domain-specific language. We integrate information about the character structure of a word into a sequence labeling system using character-level word embeddings and show their positive impact on the system's performance. We reveal encoded character patterns of the learned embeddings and give a nuanced view of the obtained performance differences.
4. Opinion target extraction usually relies on supervised learning approaches. We address the lack of available annotated data for specific languages by proposing a zero-shot cross-lingual approach for the extraction of opinion target expressions. We leverage multilingual word embeddings that share a common vector space across various languages and incorporate these into a convolutional neural network architecture. Our experiments with 5 languages give promising results: We can successfully train a model on annotated data of a source language and perform accurate prediction on a target language without ever using any annotated samples in that target language
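The zero-shot cross-lingual idea in item 4 can be illustrated with a deliberately simplified sketch. It assumes a tiny hand-made table of 2-D "multilingual" embeddings in which translations lie close together, and it swaps the thesis's convolutional neural network for a nearest-centroid classifier to keep the example short. All vectors, words, labels, and function names here are invented for illustration.

```python
import math

# Toy 2-D embeddings in a shared space (invented values): translations are
# placed close together, as aligned multilingual embeddings aim to do.
EMBEDDINGS = {
    "en": {"battery": (0.90, 0.10), "screen": (0.80, 0.20),
           "is": (0.10, 0.90), "very": (0.20, 0.80)},
    "de": {"akku": (0.88, 0.12), "bildschirm": (0.79, 0.22),
           "ist": (0.12, 0.90), "sehr": (0.20, 0.82)},
}

def centroid(vectors):
    """Component-wise mean of a list of 2-D vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(2))

def train(labeled_tokens, lang):
    """Compute one centroid per label from annotated source-language tokens."""
    by_label = {}
    for token, label in labeled_tokens:
        by_label.setdefault(label, []).append(EMBEDDINGS[lang][token])
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(model, token, lang):
    """Assign the label whose centroid is nearest in the shared space."""
    vec = EMBEDDINGS[lang][token]
    return min(model, key=lambda label: math.dist(vec, model[label]))

# Train on English annotations only.
english_train = [("battery", "TARGET"), ("screen", "TARGET"),
                 ("is", "OTHER"), ("very", "OTHER")]
model = train(english_train, "en")

# Predict on German tokens without using any German annotations.
german_predictions = {tok: predict(model, tok, "de") for tok in EMBEDDINGS["de"]}
```

Because the model only ever sees vectors from the shared space, nothing in `predict` depends on the language of the token, which is the property that makes training on one language and predicting on another possible.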
Information and Communication Technologies in Tourism 2022
This open access book presents the proceedings of the International Federation for IT and Travel & Tourism (IFITT)’s 29th Annual International eTourism Conference, which assembles the latest research presented at the ENTER2022 conference, which will be held on January 11–14, 2022. The book provides an extensive overview of how information and communication technologies can be used to develop tourism and hospitality. It covers the latest research on various topics within the field, including augmented and virtual reality, website development, social media use, e-learning, big data, analytics, and recommendation systems. The readers will gain insights and ideas on how information and communication technologies can be used in tourism and hospitality. Academics working in the eTourism field, as well as students and practitioners, will find up-to-date information on the status of research