484 research outputs found

    Predicting customer satisfaction with product reviews: A comparative study of some machine learning approaches.

    Over the past two decades, e-commerce platforms have grown exponentially, and this growth has brought several challenges stemming from the vast amount of information generated. Customers not only buy products online but also use online platforms to gather information about products they intend to buy. They share their experiences as feedback, which creates a pool of textual information that grows every day. This feedback contains both subjective and objective text that carries rich information about customers' behaviour, likes and dislikes towards a product, and sentiments. It can also help customers who have yet to buy or who are still deciding. This thesis compares four supervised machine learning approaches for predicting customer satisfaction: Naïve Bayes, Support Vector Machines (SVM), Logistic Regression (LR), and Decision Tree (DT). The models use term frequency-inverse document frequency (TF-IDF) vectorization for the training and test sets. They are applied after basic pre-processing of the text data: lowercasing, lemmatization, and removal of stop words, smileys, and digits. We compare the models using accuracy, precision, recall, and F1 score. SVM outperforms the other models with an accuracy of 83%, while Naïve Bayes, LR, and DT reach 82%, 78%, and 76%, respectively. We also evaluate the classifiers using confusion matrices.
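    The pre-processing and TF-IDF steps named in this abstract can be sketched roughly as follows. This is a minimal, standard-library illustration rather than the thesis's actual pipeline: the stop-word list, the smiley pattern, and the sample reviews are assumptions made up for the example, lemmatization is omitted, and the weighting tf × log(N / df) is only one of several common TF-IDF variants.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the thesis does not list its own).
STOP_WORDS = {"the", "a", "an", "is", "it", "this", "and", "was", "i"}


def preprocess(text):
    """Lowercase, strip simple ASCII smileys and digits, then drop stop words."""
    text = text.lower()
    text = re.sub(r"[:;=][-']?[()dp]", " ", text)  # smileys such as :) ;-( :d
    text = re.sub(r"\d+", " ", text)               # digits
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical reviews, invented for illustration only.
reviews = [
    "This phone is great :) 10/10",
    "The battery was terrible",
    "Great battery and a great screen",
]
vectors = tf_idf(reviews)
```

    In this sketch a term that appears in fewer reviews (such as "phone") receives a higher weight than an equally frequent term shared across reviews (such as "great"); the resulting vectors are what a classifier such as SVM or Naïve Bayes would be trained on.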

    Development of Service Industry Evaluation Indicators Based on Data/Text Mining

    National University Corporation Nagaoka University of Technology

    Application of pre-training and fine-tuning AI models to machine translation: a case study of multilingual text classification in Baidu

    With the development of international information technology, we produce a huge amount of information all the time, and the scarce resource is no longer information itself but the ability to process information in each language. Most multilingual information is expressed as text. Extracting the most useful information from such a large and complex body of multilingual text is a major goal of multilingual information processing. Multilingual text classification helps users break the language barrier, accurately locate the information they need, and triage it. At the same time, the rapid development of the Internet has accelerated communication among users of various languages, giving rise to a large number of multilingual texts, such as book and movie reviews, online chats, and product introductions. These texts contain a large amount of valuable implicit information and urgently need automated tools to categorize and process them. This work describes the Natural Language Processing (NLP) sub-task known as Multilingual Text Classification (MTC) as performed at Baidu, a leading Chinese AI company with a strong Internet base, whose NLP division led the industry in bringing deep learning technology online in Machine Translation (MT) and search. Multilingual text classification is an important module in NLP machine translation and a basic module in NLP tasks. It can be applied to many fields, such as fake review detection, news headline categorization, and analysis of positive and negative reviews. We first define the 'pre-training and fine-tuning' paradigm for deep learning AI models in the Baidu NLP department. We then investigate the application scenarios of multilingual text classification. Most of the text classification systems currently available in the Chinese market are designed for a single language, such as Alibaba's text classification system. Users who need to classify texts of the same category in multiple languages must train multiple single-language text classification systems and then classify the texts one by one. However, many internationalized products do not have a single text language, such as AliExpress's cross-border e-commerce business and Airbnb's B&B business. Industry needs to understand and classify users' reviews in various languages in order to conduct in-depth statistics and develop marketing strategies, so multilingual text classification is particularly important in this scenario. We therefore focus on interpreting the methodology of the multilingual text classification model for machine translation in the Baidu NLP department. We collect multilingual data sets of reviews, news headlines, and other data for manual classification and labeling, use the labeling results to fine-tune the multilingual text classification model, and report quality evaluation data for the Baidu multilingual text classification model after fine-tuning. We discuss whether pre-training and fine-tuning of the large model can substantially improve the quality and performance of multilingual text classification. Finally, based on the machine translation multilingual text classification model, we derive how the pre-training and fine-tuning paradigm applies to current cutting-edge deep learning AI models in the NLP system, and we verify the generality and state-of-the-art character of the paradigm in the deep learning intelligent search field.

    ANALYZING CUSTOMER REVIEWS IN TURKISH USING MACHINE LEARNING AND DATA SCIENCE METHODOLOGIES

    Digital life, especially since the introduction of Web 2.0, has significantly altered human relations, giving everyone the “right of public speech”. Ideas, emotions, and opinions on many topics are generously shared in virtual environments. A new, global, and digital word of mouth is shaping a society in which knowledge is the most influential power. Because social media data are highly dynamic in both volume and form, automatic handling is indispensable. Natural Language Processing, together with Machine Learning techniques, plays an important role in analyzing written text. Traditional techniques used in the literature become more powerful when combined into hybrid ones that also take into account the characteristic properties of the language and the domain-specific data. Although every step of the text classification chain matters, adequate feature selection has a notably large impact on classification accuracy. In this study, we perform a simple document-level classification of the sentiment polarity of subjective comments in Turkish. The domains include customer reviews of company products, movies, and healthcare services, where the task is to decide whether a comment is positive or negative. Another domain consists of doctors' notes on patients' symptoms, where the aim is to predict, and thus recommend, some of the most frequently used medical tests according to doctors' general procedures. The features comprise some or all of the distinct word roots, together with their binary or frequency information. Linear or vector analysis of the feature sets was carried out with Machine Learning algorithms provided by the Weka tool. A hybrid feature set is proposed that combines binary vectors with frequency meta-features taken from the nodes and leaves of a J48 tree classifier, for all features or for a subset chosen by correlation-based selection; it was found to be more efficient, improving both prediction accuracy and classification performance.

    Personalizing online reviews for better customer decision making

    Online consumer reviews have become an important source of information for understanding markets and customer preferences. When making purchase decisions, customers increasingly rely on user-generated online reviews; some even consider the information in online reviews more credible and trustworthy than information provided by vendors. Many studies have revealed that online reviews influence demand and sales. Others have shown that customer interest in product attributes can be identified from reviews. However, little work has been done to address customer and review diversity in the process of examining reviews. This research addresses the question: how can we handle customer and review diversity in the context of online reviews so as to recommend useful reviews based on customer preferences and improve product recommendation? Our approach to the question is personalization. As in other personalization research, we use an attribute-based model to represent products and customer preferences. Unlike existing personalization research, which uses a set of pre-defined product attributes, we explore a data-driven approach that identifies more comprehensive product attributes from online reviews to model products and customer preferences. Specifically, we introduce a new topic model for product attribute identification and sentiment analysis. By differentiating word co-occurrences at the sentence level from those at the document level, the model better identifies interpretable topics. The use of an inference network with shared structure enables the model to predict product attribute ratings accurately. Based on this topic model, we develop attribute-based representations of products, reviews, and customer preferences and use them to personalize online reviews. We examine personalization through the lens of consumer search theory and human information processing theory and test the hypotheses with an experiment. The personalization of online reviews can 1) recommend products matching customers' preferences; 2) improve customers' intention towards recommended products; 3) best distinguish recommended products from products that do not match customers' preferences; and 4) reduce decision effort.

    Integrating Terminology Extraction and Word Embedding for Unsupervised Aspect Based Sentiment Analysis

    In this paper we explore the advantages that unsupervised terminology extraction can bring to unsupervised Aspect Based Sentiment Analysis methods based on word embedding expansion techniques. We show that integrating terminologies yields a gain in F-measure on the order of 3% on the French SemEval 2016 dataset.

    Aspect-based sentiment analysis: a scalable system, a condition miner, and an evaluation dataset.

    Aspect-based sentiment analysis systems are text-mining systems that specialise in summarising the sentiment that a collection of reviews conveys regarding some aspects of an item. Users often write their reviews using conditional sentences; in such cases, mining the conditions so that they can be analysed is very important for improving the interpretation of the corresponding sentiment summaries. Unfortunately, current commercial and research systems neglect conditions; current frameworks and toolkits do not provide any components to mine them; and the proposals in the literature are insufficient because they rely either on hand-crafted patterns that fall short on recall or on machine learning models that are tightly bound to a specific language and require too much configuration. In this dissertation, we introduce Torii, an aspect-based sentiment analysis system whose most salient feature is that it can mine conditions; Kami, which provides two deep learning proposals to mine conditions; and Norito, the first publicly available dataset of conditions. Our experimental results show that our proposals to mine conditions are similar to the state of the art in terms of precision, but improve recall enough to beat it in terms of F1 score. Finally, it is worth mentioning that this dissertation would not have been possible without the collaboration of Opileak, which backs up the industrial applicability of our work.
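    The claim that similar precision plus better recall is enough to win on F1 follows from F1 being the harmonic mean of the two. The sketch below works through that arithmetic; the counts are hypothetical numbers chosen for illustration and are not results from the dissertation.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts over 100 gold conditions (illustrative only).
baseline = precision_recall_f1(tp=60, fp=15, fn=40)  # same precision, lower recall
proposal = precision_recall_f1(tp=80, fp=20, fn=20)  # same precision, higher recall

for name, (p, r, f1) in (("baseline", baseline), ("proposal", proposal)):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

    With precision held equal, the higher-recall system necessarily has the higher F1, which is the pattern the abstract describes.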

    Neural Approaches to Relational Aspect-Based Sentiment Analysis. Exploring generalizations across words and languages

    Jebbara S. Neural Approaches to Relational Aspect-Based Sentiment Analysis. Exploring generalizations across words and languages. Bielefeld: Universität Bielefeld; 2020. Every day, vast amounts of unstructured textual data are shared online in digital form. Websites such as forums, social media sites, review sites, blogs, and comment sections offer platforms to express and discuss opinions and experiences. Understanding the opinions in these resources is valuable both for businesses, for example to support market research and customer service, and for individuals, who can benefit from the experiences and expertise of others. In this thesis, we approach the topic of opinion extraction and classification with neural network models. We regard this area of sentiment analysis as a relation extraction problem in which the sentiment of some opinion holder towards a certain aspect of a product, theme, or event needs to be extracted. In accordance with this framework, our main contributions are the following: 1. We propose a full system addressing all subtasks of relational sentiment analysis. 2. We investigate how semantic web resources can be leveraged in a neural-network-based model for the extraction of opinion targets and the classification of sentiment labels. Specifically, we experiment with enhancing pretrained word embeddings using the lexical resource WordNet. Furthermore, we enrich a purely text-based model with SenticNet concepts and observe an improvement for sentiment classification. 3. We examine how opinion targets can be automatically identified in noisy texts. Customer reviews, for instance, are prone to contain misspelled words and are difficult to process due to their domain-specific language. We integrate information about the character structure of a word into a sequence labeling system using character-level word embeddings and show their positive impact on the system's performance. We reveal encoded character patterns of the learned embeddings and give a nuanced view of the obtained performance differences. 4. Opinion target extraction usually relies on supervised learning approaches. We address the lack of available annotated data for specific languages by proposing a zero-shot cross-lingual approach for the extraction of opinion target expressions. We leverage multilingual word embeddings that share a common vector space across various languages and incorporate these into a convolutional neural network architecture. Our experiments with 5 languages give promising results: we can successfully train a model on annotated data of a source language and perform accurate prediction on a target language without ever using any annotated samples in that target language.
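    The zero-shot cross-lingual idea in contribution 4 rests on source-language and target-language words sharing one vector space. The toy sketch below illustrates only that idea: it swaps the thesis's convolutional neural network for a nearest-centroid classifier, and the 2-D embeddings, words, and labels are all invented for the example.

```python
import math

# Made-up 2-D "multilingual" embeddings in one shared space (an assumption for
# illustration; real multilingual embeddings have hundreds of dimensions).
EMBEDDINGS = {
    # English (source language, annotated)
    "pizza": (0.90, 0.10), "service": (0.80, 0.20),
    "great": (0.10, 0.90), "slow": (0.20, 0.80),
    # German (target language, never annotated)
    "Essen": (0.88, 0.15), "Bedienung": (0.82, 0.18),
    "toll": (0.12, 0.88), "langsam": (0.20, 0.82),
}


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def train(labelled_tokens):
    """Build one centroid per label from (token, label) pairs."""
    by_label = {}
    for token, label in labelled_tokens:
        by_label.setdefault(label, []).append(EMBEDDINGS[token])
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(model, token):
    """Assign the label whose centroid is closest to the token's embedding."""
    vec = EMBEDDINGS[token]
    return min(model, key=lambda label: math.dist(vec, model[label]))


# Train on English annotations only.
model = train([
    ("pizza", "TARGET"), ("service", "TARGET"),
    ("great", "OTHER"), ("slow", "OTHER"),
])

# Predict on German tokens without any German annotations.
predictions = {w: predict(model, w) for w in ("Essen", "Bedienung", "toll", "langsam")}
```

    Because the German words sit near their English counterparts in the shared space, a model fitted only on English labels can still tag them, which is the property the thesis exploits with a far more capable sequence model.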

    Information and Communication Technologies in Tourism 2022

    This open access book presents the proceedings of the 29th Annual International eTourism Conference of the International Federation for IT and Travel & Tourism (IFITT). It assembles the latest research presented at the ENTER2022 conference, which will be held on January 11–14, 2022. The book provides an extensive overview of how information and communication technologies can be used to develop tourism and hospitality. It covers the latest research on various topics within the field, including augmented and virtual reality, website development, social media use, e-learning, big data, analytics, and recommendation systems. Readers will gain insights and ideas on how information and communication technologies can be used in tourism and hospitality. Academics working in the eTourism field, as well as students and practitioners, will find up-to-date information on the status of research.