230 research outputs found

    Characterizing and Predicting Early Reviewers for Effective Product Marketing on E-Commerce Websites

    Get PDF
    Online reviews have become an important source of information for users before making an informed purchase decision. Early reviews of a product tend to have a high impact on subsequent product sales. In this paper, we take the initiative to study the behavior characteristics of early reviewers through their posted reviews on two real-world large e-commerce platforms, Amazon and Yelp. Specifically, we divide a product's lifetime into three consecutive stages: early, majority, and laggards. A user who has posted a review in the early stage is considered an early reviewer. We quantitatively characterize early reviewers based on their rating behaviors, the helpfulness scores their reviews receive from others, and the correlation of their reviews with product popularity. We find that (1) an early reviewer tends to assign a higher average rating score; and (2) an early reviewer tends to post more helpful reviews. Our analysis of product reviews also indicates that early reviewers' ratings and the helpfulness scores they receive are likely to influence product popularity. By viewing the review-posting process as a multiplayer competition game, we propose a novel margin-based embedding model for early reviewer prediction. Extensive experiments on two different e-commerce datasets show that our proposed approach outperforms a number of competitive baselines.
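    As a rough illustration of the competition framing, the sketch below scores (user, product) pairs with a dot product of learned embeddings and trains with a margin ranking loss so that an observed early reviewer outranks a later reviewer of the same product. The module name, dimensions, and scorer are illustrative assumptions, not the authors' exact formulation.

        # Minimal sketch of a margin-based embedding model for early-reviewer
        # prediction; hypothetical names and a dot-product scorer are assumed.
        import torch
        import torch.nn as nn

        class MarginEmbeddingModel(nn.Module):
            def __init__(self, n_users, n_products, dim=64):
                super().__init__()
                self.user = nn.Embedding(n_users, dim)
                self.product = nn.Embedding(n_products, dim)

            def score(self, u, p):
                # Higher score = more likely to review the product early.
                return (self.user(u) * self.product(p)).sum(-1)

        model = MarginEmbeddingModel(n_users=10_000, n_products=2_000)
        loss_fn = nn.MarginRankingLoss(margin=1.0)

        # Toy batch of (early reviewer, later reviewer, product) triples.
        early = torch.tensor([3, 7])
        later = torch.tensor([11, 42])
        prod = torch.tensor([5, 9])
        target = torch.ones(2)  # require score(early, p) > score(later, p) by the margin
        loss = loss_fn(model.score(early, prod), model.score(later, prod), target)
        loss.backward()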

    Three Essays on the Role of Unstructured Data in Marketing Research

    Get PDF
    This thesis studies the use of firm- and user-generated unstructured data (e.g., text and videos) to improve market research, combining advances in text, audio, and video processing with traditional economic modeling. The first chapter is joint work with K. Sudhir and Minkyung Kim. It addresses two significant challenges in using online text reviews to obtain fine-grained attribute-level sentiment ratings. First, we develop a deep learning convolutional-LSTM hybrid model to account for language structure, in contrast to methods that rely on word frequency. The convolutional layer accounts for the spatial structure of language (adjacent word groups or phrases) and the LSTM accounts for its sequential structure (sentiment distributed and modified across non-adjacent phrases). Second, we address the problem of missing attributes in text when constructing attribute sentiment scores, as reviewers write only about a subset of attributes and remain silent on others. We develop a model-based imputation strategy using a structural model of heterogeneous rating behavior. Using Yelp restaurant review data, we show that our model achieves superior accuracy in converting text to numerical attribute sentiment scores. The structural model finds three reviewer segments with different motivations: status seeking, altruism/want voice, and need to vent/praise. Interestingly, our results show that reviewers write to inform and to vent/praise, but not based on attribute importance. Our heterogeneous model-based imputation performs better than other common imputations and, importantly, leads to managerially significant corrections in restaurant attribute ratings. The second essay, joint work with Aniko Oery and Joyee Deb, develops an information-theoretic model to study what causes selection on valence in user-generated reviews. The propensity of consumers to engage in word-of-mouth (WOM) differs after good versus bad experiences, which can result in positive or negative selection of user-generated reviews. We show how the strength of brand image (the dispersion of consumer beliefs about quality) and the informativeness of good and bad experiences affect the selection of WOM in equilibrium. WOM is costly: early adopters talk only if they can affect the receiver's purchase. If the brand image is strong (consumer beliefs are homogeneous), only negative WOM can arise. With a weak brand image or heterogeneous beliefs, positive WOM can occur if positive experiences are sufficiently informative. Using data from Yelp.com, we show that strong brands (chain restaurants) systematically receive lower evaluations, controlling for several restaurant and reviewer characteristics. The third essay, joint work with K. Sudhir and Khai Chiong, studies success factors of persuasive sales pitches using a multi-modal video dataset of buyer-seller interactions. A successful sales pitch is an outcome of both the content of the message and the style of its delivery. Moreover, unlike one-way interactions such as speeches, sales pitches are a two-way process, so interactivity and matching the wavelength of the buyer are also critical to the success of the pitch. We extract four groups of features, covering content, style, interactivity, and similarity, to build a predictive model of sales pitch effectiveness.
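    The convolutional-LSTM hybrid described in the first chapter can be sketched roughly as below: a convolution over word embeddings captures adjacent phrases, and an LSTM over the convolved features captures sequential structure before a linear layer emits sentiment logits. Layer sizes and names are illustrative assumptions, not the chapter's exact architecture.

        # Sketch of a convolutional-LSTM sentiment model under assumed sizes.
        import torch
        import torch.nn as nn

        class ConvLSTMSentiment(nn.Module):
            def __init__(self, vocab=20_000, emb=100, channels=64, hidden=64, classes=5):
                super().__init__()
                self.embed = nn.Embedding(vocab, emb)
                self.conv = nn.Conv1d(emb, channels, kernel_size=3, padding=1)
                self.lstm = nn.LSTM(channels, hidden, batch_first=True)
                self.out = nn.Linear(hidden, classes)

            def forward(self, tokens):                        # tokens: (batch, seq_len)
                x = self.embed(tokens).transpose(1, 2)        # (batch, emb, seq_len)
                x = torch.relu(self.conv(x)).transpose(1, 2)  # phrase features per position
                _, (h, _) = self.lstm(x)                      # final hidden state
                return self.out(h[-1])                        # (batch, classes) logits

        logits = ConvLSTMSentiment()(torch.randint(0, 20_000, (4, 30)))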

    Leveraging distant supervision for improved named entity recognition

    Full text link
    Recent years have seen a leap in deep learning techniques that has greatly changed the way Natural Language Processing (NLP) tasks are tackled. Within a few years, neural networks and word embeddings became central components of the field. Distant supervision (DS) is a well-established technique in NLP for producing labeled data from partially annotated examples. Traditionally, such data were used for training in the absence of manual annotations, or as additional training data to improve generalization. In this thesis, we study how distant supervision can be employed within a modern, deep learning-based NLP framework. Since deep learning algorithms improve as more data is provided (especially for representation learning), we revisit the task of generating distant supervision data from Wikipedia. We apply post-processing treatments to the original dump to further increase the quantity of labeled examples while introducing a reasonable amount of noise. We then explore different methods for using distant supervision data for representation learning, mainly to learn classic and contextualized word representations. Because of its importance as a basic component of many NLP applications, we choose named-entity recognition (NER) as our main task. We experiment on standard NER benchmarks and obtain state-of-the-art performance. In doing so, we investigate a more interesting setting: improving cross-domain (generalization) performance.
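    A toy sketch of the distant-supervision idea follows: raw sentences are BIO-tagged automatically by longest-match lookup in an entity dictionary. The thesis derives its labels from Wikipedia structure; the tiny hand-made gazetteer here is only a stand-in for that source.

        # Distant supervision for NER as dictionary lookup (illustrative gazetteer).
        gazetteer = {("new", "york"): "LOC", ("montreal",): "LOC", ("yann", "lecun"): "PER"}

        def distant_labels(tokens):
            tags = ["O"] * len(tokens)
            lowered = [t.lower() for t in tokens]
            i = 0
            while i < len(tokens):
                for span in sorted(gazetteer, key=len, reverse=True):  # longest match first
                    if tuple(lowered[i:i + len(span)]) == span:
                        etype = gazetteer[span]
                        tags[i] = "B-" + etype
                        for j in range(i + 1, i + len(span)):
                            tags[j] = "I-" + etype
                        i += len(span) - 1
                        break
                i += 1
            return tags

        print(distant_labels("Yann LeCun moved to New York".split()))
        # ['B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'I-LOC']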

    Unsupervised Pretraining of Neural Networks with Multiple Targets using Siamese Architectures

    Get PDF
    A model's response to a given input pattern depends on the patterns seen in the training data. The larger the amount of training data, the more likely it is that edge cases are covered during training. However, the more complex the input patterns are, the larger the model has to be. For very simple use cases, a relatively small model can achieve very high test accuracy in a matter of minutes. A large model, on the other hand, has to be trained for multiple days. The actual time to develop a model of that size is even greater, since many different architecture types and hyper-parameter configurations usually have to be tried. An extreme case of a large model is the recently released GPT-3. This model consists of 175 billion parameters and was trained on 45 terabytes of text data. It was trained to generate text and is able to write news articles and source code based only on a rough description. However, a model like this can only be created by researchers with access to special hardware or immense amounts of data. It is therefore desirable to find less resource-intensive training approaches that enable other researchers to create well-performing models. This thesis investigates the use of pre-trained models. If a model has been trained on one dataset and is then trained on other, similar data, it learns to adjust to similar patterns faster than a model that has not yet seen any of the task's patterns: the lessons learned from one training are transferred to another task. During pre-training, the model is trained to solve a specific task, such as predicting the next word in a sequence or first encoding an input image before decoding it. Such models contain an encoder part and a decoder part. When transferring such a model to another task, some of its layers are removed. Discarding fewer weights results in faster training, since less time has to be spent training parts of the model that are only needed to solve an auxiliary task. Throughout this thesis, the concept of siamese architectures is discussed, because with this architecture no parameters have to be discarded when a model trained with it is transferred to another task. The siamese pre-training approach thus reduces the need for resources such as time and energy, driving the development of new models in the direction of Green AI. The models trained with this approach are evaluated by comparing them to models trained with other pre-training approaches, as well as to large existing models. It is shown that, given the right choice of data and training targets, the models trained for the tasks in this thesis perform as well as externally pre-trained models, and that the number and type of training targets during pre-training affect a model's performance on transfer learning tasks. The use cases presented in this thesis cover data from different domains to show that the siamese training approach is widely applicable. Consequently, researchers are encouraged to create their own pre-trained models for data domains for which no pre-trained models exist yet.

    A model's prediction depends on which patterns are present in the data used during training. The larger the amount of training data, the more likely it is that edge cases occur in the data. The larger the number of patterns to be learned, however, the larger the model has to be. For simple use cases it is possible to train a small model within a few minutes and already obtain good results on test data. For complex use cases, a correspondingly large model may need up to several days of training to be good enough. An extreme case of a large model is the recently released model named GPT-3, which consists of 175 billion parameters and was trained on training data on the order of 45 terabytes. The model was trained to generate text and is able to generate news articles based on a rough initial description. Only researchers with access to the corresponding hardware and amounts of data can develop such a model. It is therefore of interest to improve training procedures so that models for complex use cases can be trained even with few available resources. This thesis deals with the pre-training of neural networks. If a neural network has been trained on one dataset and is then further trained on a second dataset, it learns the characteristics of the second dataset faster, since it does not have to learn patterns from scratch but can draw on what it has already learned; the knowledge is said to be transferred. During pre-training, a model is often given a task such as, in the case of image data, first compressing the training data and then reconstructing it. With text data, a model could be pre-trained by receiving a sentence as input and having to predict the next sentence from the source document. Such models accordingly consist of an encoder and a decoder. The drawback of this approach is that the decoder is needed only for pre-training, while the later application requires only the encoder. A central part of this work is therefore the investigation of the advantages and disadvantages of the siamese model architecture. This architecture consists only of an encoder, which makes pre-training cheaper, since fewer weights have to be trained. The main scientific contribution lies in an extensive comparison of the siamese architecture with comparable approaches. Certain drawbacks are identified, for example that the choice of a similarity function or the composition of the training data has a large effect on model training. The thesis works out which similarity function is recommended in which contexts, and how other drawbacks of the siamese architecture can be compensated by adapting the training targets. The corresponding experiments are carried out on data from different domains to show that the approach is universally applicable. Results from concrete use cases also show that the models developed within this work perform about as well as externally available models that were trained with a large expenditure of resources. This demonstrates that carefully designed architectures can reduce the resources required.
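    The siamese setup at the heart of the thesis can be sketched as a single shared encoder trained so that similar inputs land close together in embedding space; since there is no decoder, the whole trained network is reused on the downstream task. The encoder size, cosine similarity, and contrastive loss below are illustrative choices, not the thesis's exact configuration.

        # Minimal siamese sketch: one shared encoder, no decoder to discard later.
        import torch
        import torch.nn as nn

        encoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))

        def siamese_loss(x1, x2, same, margin=0.5):
            # Pull pairs labeled "same" together; push different pairs below the margin.
            sim = nn.functional.cosine_similarity(encoder(x1), encoder(x2))
            return torch.where(same, 1 - sim, (sim - margin).clamp(min=0)).mean()

        x1, x2 = torch.randn(8, 128), torch.randn(8, 128)
        same = torch.rand(8) > 0.5
        siamese_loss(x1, x2, same).backward()
        # After pre-training, `encoder` transfers to the target task unchanged.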

    A Systematic Comparison of English Noun Compound Representations

    Full text link
    Building meaningful representations of noun compounds is not trivial, since many of them appear only rarely in the corpus. To that end, composition functions approximate the distributional representation of a noun compound by combining its constituents' distributional vectors. In the more general case, phrase embeddings have been trained by minimizing the distance between the vectors representing paraphrases. We compare various types of noun compound representations, including distributional, compositional, and paraphrase-based representations, through a series of tasks and analyses, and with an extensive number of underlying word embeddings. We find that indeed, in most cases, composition functions produce higher-quality representations than distributional ones, and they improve with computational power. No single function performs best in all scenarios, suggesting that a joint training objective may produce improved representations. (Comment: MWE workshop @ ACL 201)
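    For concreteness, the sketch below shows two standard composition functions of the kind compared: parameter-free vector addition, and a learned full-additive composition f(u, v) = Au + Bv trained to approximate the compound's observed distributional vector. The dimensions and training target are assumptions for illustration.

        # Two baseline composition functions for a noun compound (w1, w2).
        import torch
        import torch.nn as nn

        dim = 50
        u, v = torch.randn(dim), torch.randn(dim)   # e.g. vectors for "olive" and "oil"

        additive = u + v                            # parameter-free baseline

        full_additive = nn.Linear(2 * dim, dim, bias=False)  # [A; B] as one matrix
        composed = full_additive(torch.cat([u, v]))
        # The learned variant would be trained to minimize the distance between
        # `composed` and the observed distributional vector of "olive_oil",
        # when the compound is frequent enough to have one.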

    A study of deep learning and its applications to face recognition techniques

    Get PDF
    The following work is the result of Fernando Suzacq's master's thesis. The thesis centered on research into 3D face recognition without depth reconstruction or the use of generic 3D models. This research resulted in the writing of a paper and its subsequent publication in IEEE Transactions on Pattern Analysis and Machine Intelligence. Through the use of active illumination, 2D face recognition is improved and made more robust to low-light conditions and identity-spoofing attacks. The central idea of the work is the projection of a high-frequency light pattern onto the test face. From the captured image, it is possible to recover real 3D information, derived from the deformations of this pattern, together with a 2D image of the test face. This process avoids having to deal with the difficult task of 3D reconstruction. The work presents the theory underlying this process, explains its construction, and provides the results of various experiments that support its validity and usefulness. Carrying out this research required studying the existing theory and reviewing the state of the art for this particular problem. Part of that material is also presented in this document, as the theoretical framework for the publication.
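    A heavily simplified toy of the underlying signal idea, under assumed numbers: a high-frequency fringe pattern multiplies the face texture, depth variation phase-shifts the fringes, and band-splitting in the Fourier domain separates a texture estimate (baseband) from the pattern band that carries the 3D cue. This is only a plausibility sketch, not the published method.

        # Toy fringe-projection capture and Fourier band split (illustrative only).
        import numpy as np

        h, w = 128, 128
        yy, xx = np.mgrid[0:h, 0:w]
        texture = np.random.rand(h, w)                        # stand-in 2D face image
        depth = np.sin(yy / 20.0)                             # stand-in surface shape
        fringes = 0.5 * (1 + np.cos(0.8 * xx + 3.0 * depth))  # pattern, phase-shifted by depth
        captured = texture * fringes                          # single camera capture

        # Baseband holds a scaled copy of the texture; the carrier band holds
        # the depth-modulated pattern.
        spec = np.fft.fftshift(np.fft.fft2(captured))
        fx = np.fft.fftshift(np.fft.fftfreq(w)) * 2 * np.pi
        lowpass = (np.abs(fx) < 0.4)[None, :]
        texture_est = np.abs(np.fft.ifft2(np.fft.ifftshift(spec * lowpass)))
        pattern_band = np.abs(np.fft.ifft2(np.fft.ifftshift(spec * ~lowpass)))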

    Integrating artificial intelligence into an ophthalmologist’s workflow: obstacles and opportunities

    Get PDF
    Introduction: Demand for clinical services in the field of ophthalmology is predicted to rise over the coming years. Artificial intelligence, and in particular machine learning-based systems, has demonstrated significant potential for optimizing medical diagnostics, predictive analysis, and the management of clinical conditions. Ophthalmology has been at the forefront of this digital revolution, setting precedents for the integration of these systems into clinical workflows. Areas covered: This review discusses the integration of machine learning tools into ophthalmology clinical practice. We discuss key issues around ethical considerations, regulation, and clinical governance. We also highlight challenges associated with clinical adoption and sustainability, and discuss the importance of interoperability. Expert opinion: Clinical integration is considered one of the most challenging stages of the implementation process. Successful integration necessitates a collaborative approach from multiple stakeholders around a structured governance framework, with an emphasis on standardization across healthcare providers and across equipment and software developers.

    Decoding the Real World: Tackling Virtual Ethnographic Challenges through Data-Driven Methods

    Get PDF

    Review on recent advances in information mining from big consumer opinion data for product design

    Get PDF
    In this paper, based on more than ten years of studies in this dedicated research thrust, a comprehensive review of information mining from big consumer opinion data to assist product design is presented. First, the research background and the essential terminology regarding online consumer opinion data are introduced. Next, studies concerning information extraction from, and information utilization of, big consumer opinion data for product design are reviewed. Studies on information extraction are explained from various perspectives, including data acquisition, opinion target recognition, feature identification and sentiment analysis, opinion summarization and sampling, etc. Reviews on information utilization for product design are explored in terms of how to extract critical customer needs from big consumer opinion data, how to connect the voice of the customer with product design, how to make effective comparisons and reasonable rankings of similar products, how to identify ever-evolving customer concerns efficiently, and so on. Furthermore, significant and practical research trends are highlighted for future studies. This survey will help researchers and practitioners understand the latest developments in studies and applications centered on how big consumer opinion data can be processed, analyzed, and exploited to aid product design.
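    As a minimal illustration of the aspect-level opinion mining the review surveys, the sketch below matches product features mentioned in a review and scores each with nearby opinion words from a small lexicon; the aspect list, lexicon, and windowing are illustrative assumptions.

        # Lexicon-based aspect sentiment scoring (toy aspects and lexicon).
        aspects = {"battery", "screen", "camera"}
        lexicon = {"great": 1, "sharp": 1, "poor": -1, "dim": -1}

        def aspect_sentiment(review):
            tokens = review.lower().replace(".", " ").split()
            scores = {}
            for i, tok in enumerate(tokens):
                if tok in aspects:
                    # Naive window: score the aspect by nearby opinion words.
                    window = tokens[max(0, i - 3): i + 4]
                    scores[tok] = sum(lexicon.get(w, 0) for w in window)
            return scores

        print(aspect_sentiment("Great battery life but the screen is dim."))
        # {'battery': 1, 'screen': -1}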