Exploring Latent Semantic Factors to Find Useful Product Reviews
Online reviews provided by consumers are a valuable asset for e-Commerce
platforms, influencing potential consumers in making purchasing decisions.
However, these reviews are of varying quality, with the useful ones buried deep
within a heap of non-informative reviews. In this work, we attempt to
automatically identify review quality in terms of its helpfulness to the end
consumers. In contrast to previous works in this domain, which exploit a
variety of syntactic and community-level features, we delve deep into the
semantics of reviews to understand what makes them useful, and provide
interpretable explanations. We identify a set of consistency and semantic
factors, all derived from the text, ratings, and timestamps of user-generated
reviews, making our approach
generalizable across all communities and domains. We explore review semantics
in terms of several latent factors, such as the author's expertise, their
judgment about the fine-grained facets of the underlying product, and their
writing style. These are cast into a Hidden Markov Model -- Latent Dirichlet
Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii)
item facets, and (iii) review helpfulness. Large-scale experiments on five
real-world datasets from Amazon show significant improvement over
state-of-the-art baselines in predicting and ranking useful reviews.
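The joint generative story above can be sketched as follows. Everything here is an illustrative assumption -- the number of expertise levels and facets, the transition matrix, and the toy vocabulary are invented and do not reproduce the paper's actual HMM-LDA parameterization; the point is only how expertise evolves as an HMM state while facets and words follow LDA-style draws conditioned on it.

```python
# Toy generative sketch of the HMM-LDA story: expertise is an HMM state,
# facets and words are LDA-style draws conditioned on expertise.
# All distributions and dimensions below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_expertise = 3      # latent expertise levels (HMM states)
n_facets = 4         # latent item facets (LDA topics)
vocab = ["battery", "screen", "price", "shipping"]

# HMM: expertise evolves over a reviewer's successive reviews.
transition = np.array([[0.70, 0.20, 0.10],
                       [0.10, 0.70, 0.20],
                       [0.05, 0.15, 0.80]])

# LDA-style facet distribution conditioned on expertise level.
facet_given_expertise = rng.dirichlet(np.ones(n_facets), size=n_expertise)
word_given_facet = rng.dirichlet(np.ones(len(vocab)), size=n_facets)

def generate_review(expertise, n_words=5):
    """Draw facets and words for one review at a given expertise level."""
    words = []
    for _ in range(n_words):
        facet = rng.choice(n_facets, p=facet_given_expertise[expertise])
        words.append(vocab[rng.choice(len(vocab), p=word_given_facet[facet])])
    return words

# Simulate one reviewer's trajectory: expertise follows the Markov chain.
expertise = 0
trajectory = []
for _ in range(3):
    trajectory.append((expertise, generate_review(expertise)))
    expertise = rng.choice(n_expertise, p=transition[expertise])

for level, words in trajectory:
    print(level, words)
```

In the actual model, inference would run in the opposite direction, recovering expertise, facets, and helpfulness jointly from observed reviews rather than simulating them.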
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities
One of the major hurdles preventing the full exploitation of information from
online communities is the widespread concern regarding the quality and
credibility of user-contributed content. Prior works in this domain operate on
a static snapshot of the community, make strong assumptions about the
structure of the data (e.g., relational tables), or consider only shallow
features for text classification.
To address the above limitations, we propose probabilistic graphical models
that can leverage the joint interplay between multiple factors in online
communities --- like user interactions, community dynamics, and textual content
--- to automatically assess the credibility of user-contributed online content,
as well as the expertise of users and its evolution, with user-interpretable
explanations. To this end, we devise new models based on Conditional Random
Fields for different settings, such as incorporating partial expert knowledge
for semi-supervised learning, and handling discrete labels as well as numeric
ratings for fine-grained analysis. This enables applications such as extracting
reliable side-effects of drugs from user-contributed posts in health forums, and
identifying credible content in news communities.
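As a rough illustration of the CRF idea above -- coupling per-post text features with interactions between neighboring posts -- the toy example below scores label sequences for three posts and computes the exact posterior of the best labeling by brute force. The emission and transition scores are fabricated numbers, not learned weights from the thesis models.

```python
# Toy linear-chain CRF over three posts with two credibility labels.
# Emission and transition scores are made up for illustration only.
import itertools
import math
import numpy as np

labels = ["credible", "non-credible"]

# Emission scores: compatibility of each post's features with each label.
emission = np.array([[2.0, 0.5],
                     [0.3, 1.5],
                     [1.8, 0.4]])

# Transition scores between labels of consecutive posts.
transition = np.array([[1.0, 0.2],
                       [0.2, 1.0]])

def sequence_score(seq):
    """Unnormalized log-score of one label sequence."""
    s = emission[np.arange(len(seq)), seq].sum()
    s += sum(transition[a, b] for a, b in zip(seq, seq[1:]))
    return s

# Brute-force partition function and MAP labeling (fine for 3 posts).
all_seqs = list(itertools.product(range(2), repeat=3))
Z = sum(math.exp(sequence_score(s)) for s in all_seqs)
best = max(all_seqs, key=sequence_score)
prob_best = math.exp(sequence_score(best)) / Z
print([labels[i] for i in best], round(prob_best, 3))
```

Real CRF inference replaces the brute-force enumeration with dynamic programming (forward-backward, Viterbi), which is what makes such models tractable at community scale.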
Online communities are dynamic, as users join and leave, adapt to evolving
trends, and mature over time. To capture these dynamics, we propose generative
models based on Hidden Markov Models, Latent Dirichlet Allocation, and Brownian
motion to trace the continuous evolution of users' expertise and language
models over time. This allows us to identify expert users and credible content
jointly over time, improving state-of-the-art recommender systems by explicitly
considering the maturity of users. This also enables applications such as
identifying helpful product reviews, and detecting fake and anomalous reviews
with limited information.
Comment: PhD thesis, Mar 201
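A minimal sketch of the Brownian-motion ingredient above: a user's latent expertise drifts between reviews, with the variance of each increment growing linearly in the elapsed time between them. The timestamps, noise scale, and starting value are illustrative assumptions, not values from the thesis.

```python
# Expertise as Brownian motion over irregularly spaced review timestamps:
# increment variance grows linearly with the time gap between reviews.
import numpy as np

rng = np.random.default_rng(1)

def expertise_trajectory(timestamps, sigma=0.3, start=0.0):
    """Sample an expertise trajectory at the given (sorted) timestamps."""
    ts = np.asarray(timestamps, dtype=float)
    gaps = np.diff(ts)
    # Brownian increments: std grows with the square root of the gap.
    increments = rng.normal(0.0, sigma * np.sqrt(gaps))
    return start + np.concatenate([[0.0], np.cumsum(increments)])

# Irregularly spaced review timestamps (in days) for one user.
times = [0, 3, 10, 30, 31]
traj = expertise_trajectory(times)
print(np.round(traj, 3))
```

This captures the intuition that a user who returns after a long absence may have changed more than one who reviews daily, which is what lets the model reason about maturity over time.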
Context-aware Helpfulness Prediction for Online Product Reviews
Modeling and predicting review helpfulness has become increasingly important
due to the proliferation of e-commerce websites and online shops. Since the
functionality of a product cannot be tested before buying, people often rely on
different kinds of user reviews to decide whether or not to buy a product.
However, quality reviews might be buried deep in a large heap of reviews.
Therefore, recommending reviews to customers based on review quality is
essential. Since there is no direct indication of review quality, most
approaches use the information that "X out of Y" users found a review helpful
as a proxy for its quality. However, this approach undermines helpfulness
prediction because not all reviews have statistically abundant votes. In this
paper, we propose a deep neural network model that
predicts the helpfulness score of a review. This model is based on
convolutional neural network (CNN) and a context-aware encoding mechanism which
can directly capture relationships between words irrespective of their distance
in a long sequence. We validated our model on a human-annotated dataset, and
the results show that our model significantly outperforms existing models for
helpfulness prediction.
Comment: Published as a proceedings paper in AIRS 201
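The two ingredients named above -- a CNN and a context-aware encoding that relates every pair of words regardless of distance -- can be sketched as follows. All shapes and weights are random placeholders for illustration; this is not the paper's architecture.

```python
# Sketch: scaled dot-product attention (context-aware encoding) followed
# by a width-3 convolution over the enriched sequence. Weights are random
# placeholders, not a trained model.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 6, 8
x = rng.normal(size=(seq_len, d))          # word embeddings for one review

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Context-aware encoding: attention weights connect all word pairs,
# so distant words interact directly.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
attn = softmax((x @ Wq) @ (x @ Wk).T / np.sqrt(d))
context = attn @ (x @ Wv)                  # (seq_len, d)

# CNN: width-3 convolution over the context-enriched sequence.
filters = rng.normal(size=(3 * d, d))
padded = np.pad(context, ((1, 1), (0, 0)))
windows = np.stack([padded[i:i + 3].ravel() for i in range(seq_len)])
conv = np.maximum(windows @ filters, 0.0)  # ReLU feature maps

# Max-pool over time, then a linear layer gives a helpfulness score.
pooled = conv.max(axis=0)
score = float(pooled @ rng.normal(size=d))
print(round(score, 3))
```

The attention step is what the abstract calls "directly capturing relationships between words irrespective of their distance": every row of `attn` weights all positions in the sequence at once.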
Credibility analysis of textual claims with explainable evidence
Despite being a vast resource of valuable information, the Web has been polluted by the spread of false claims. Increasing hoaxes, fake news, and misleading information on the Web have given rise to many fact-checking websites that manually assess these doubtful claims. However, the rapid speed and large scale of misinformation spread have become the bottleneck for manual verification. This calls for credibility assessment tools that can automate this verification process. Prior works in this domain make strong assumptions about the structure of the claims and the communities where they are made. Most importantly, black-box techniques proposed in prior works lack the ability to explain why a certain statement is deemed credible or not. To address these limitations, this dissertation proposes a general framework for automated credibility assessment that does not make any assumption about the structure or origin of the claims. Specifically, we propose a feature-based model, which automatically retrieves relevant articles about the given claim and assesses its credibility by capturing the mutual interaction between the language style of the relevant articles, their stance towards the claim, and the trustworthiness of the underlying web sources. We further enhance our credibility assessment approach and propose a neural-network-based model. Unlike the feature-based model, this model does not rely on feature engineering and external lexicons. Both our models make their assessments interpretable by extracting explainable evidence from judiciously selected web sources.
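The mutual interaction described above -- a claim's credibility aggregated from the stance of reporting articles weighted by source trust, with source trust in turn updated by agreement with the consensus -- can be sketched as a simple fixed-point iteration. All scores below are fabricated for illustration and the update rules are a simplification, not the dissertation's actual model.

```python
# Toy mutual-reinforcement loop between claim credibility and source
# trustworthiness. Stance and trust values are invented for illustration.
import numpy as np

# stance[i] in [-1, 1]: article i's stance toward the claim.
stance = np.array([0.9, 0.7, -0.4])
source = np.array([0, 1, 1])          # which source wrote each article
trust = np.array([0.5, 0.5])          # initial per-source trust

for _ in range(10):
    # Credibility proxy: trust-weighted average stance (in [-1, 1]).
    w = trust[source]
    credibility = (w @ stance) / w.sum()
    # Sources whose articles agree with the consensus gain trust.
    agreement = 1.0 - np.abs(stance - credibility) / 2.0
    for s in range(len(trust)):
        trust[s] = agreement[source == s].mean()

print(round(float(credibility), 3), np.round(trust, 3))
```

The articles themselves would serve as the explainable evidence: the assessment can point at the retrieved texts and their stances rather than a black-box score.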
We utilize our models to develop a Web interface, CredEye, which enables users to automatically assess the credibility of a textual claim and inspect the assessment by browsing through judiciously and automatically selected evidence snippets. In addition, we study the problem of stance classification and propose a neural-network-based model for predicting the stance of diverse user perspectives on controversial claims. Given a controversial claim and a user comment, our stance classification model predicts whether the user comment supports or opposes the claim.
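To make the stance task's input/output contract concrete, here is a deliberately simple, hypothetical cue-word baseline. The dissertation's actual model is neural; the cue lists and the decision rule below are invented for illustration only.

```python
# Hypothetical lexicon baseline for stance classification:
# claim + comment -> "support" or "oppose". Not the dissertation's model.
SUPPORT_CUES = {"agree", "true", "correct", "indeed", "right"}
OPPOSE_CUES = {"disagree", "false", "wrong", "hoax", "misleading"}

def predict_stance(claim: str, comment: str) -> str:
    """Count supporting vs. opposing cue words in the comment."""
    tokens = comment.lower().split()
    support = sum(t in SUPPORT_CUES for t in tokens)
    oppose = sum(t in OPPOSE_CUES for t in tokens)
    return "support" if support >= oppose else "oppose"

print(predict_stance("The claim under review", "that is false and misleading"))
```

A neural model replaces the hand-written cues with learned representations of both claim and comment, but the prediction target is the same binary stance.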
Deep Learning based Models for Classification from Natural Language Processing to Computer Vision
With the availability of large-scale data sets, researchers in many different areas, such as natural language processing, computer vision, and recommender systems, have started making use of deep learning models and have achieved great progress in recent years. In this dissertation, we study three important classification problems based on deep learning models.
First, with the fast growth of e-commerce, more people choose to purchase products online and browse reviews before making decisions. It is essential to build a model that identifies helpful reviews automatically. Our work is inspired by the observation that a customer's expectation of a review can be greatly affected by review sentiment and the degree to which the customer is aware of pertinent product information. To model such customer expectation and capture important information from a review text, we propose a novel neural network which encodes the sentiment of a review through an attention module, and introduces a product attention layer that fuses information from both the target product and related products. Our experimental results for the task of identifying whether a review is helpful or not show an AUC improvement of 5.4% and 1.5% over the previous state-of-the-art model on Amazon and Yelp data sets, respectively. We further validate the effectiveness of each attention layer of our model in two application scenarios. The results demonstrate that both attention layers contribute to the model performance, and that their combination has a synergistic effect. We also evaluate our model's performance as a recommender system using three commonly used metrics: NDCG@10, Precision@10 and Recall@10. Our model outperforms PRH-Net, a state-of-the-art model, on all three of these metrics.
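The product attention layer described above can be sketched as attention pooling of review-word representations against the target product and against related products, with the two views fused into one helpfulness logit. Dimensions, weights, and the concatenation-based fusion are invented for illustration; they are not the paper's architecture.

```python
# Sketch of a product attention layer: the review attends separately to
# the target product and to related products, and the views are fused.
# All embeddings and weights are random placeholders.
import numpy as np

rng = np.random.default_rng(2)
d = 8
review = rng.normal(size=(5, d))        # encoded review words
target = rng.normal(size=d)             # target product embedding
related = rng.normal(size=(3, d))       # related products' embeddings

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def attend(words, query):
    """Attention-pool word vectors against a product query vector."""
    weights = softmax(words @ query)
    return weights @ words

target_view = attend(review, target)
related_view = attend(review, related.mean(axis=0))

# Fusion: concatenate the two views, project to a helpfulness logit.
fused = np.concatenate([target_view, related_view])
logit = float(fused @ rng.normal(size=2 * d))
helpful_prob = 1.0 / (1.0 + np.exp(-logit))
print(round(helpful_prob, 3))
```

Keeping the two views separate before fusion is what lets an ablation attribute performance to each attention layer individually, as the experiments above do.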
Second, real-time bidding (RTB), which features per-impression-level real-time ad auctions, has become a popular practice in today's digital advertising industry. In RTB, click-through rate (CTR) prediction is a fundamental problem to ensure the success of an ad campaign and boost revenue. We present a dynamic CTR prediction model designed for the Samsung demand-side platform (DSP). We identify two key technical challenges that have not been fully addressed by existing solutions: the dynamic nature of RTB and user information scarcity. Our model effectively captures the dynamic evolution of both users and ads and integrates auxiliary data sources (e.g., installed apps) to better model users' preferences. We put forward a novel interaction layer that fuses both explicit user responses (e.g., clicks on ads) and auxiliary data sources to generate consolidated user preference representations. We evaluate our model using a large amount of data collected from the Samsung advertising platform and compare our method against several state-of-the-art methods that are likely suitable for real-world deployment. The evaluation results demonstrate the effectiveness of our method and its potential for production.
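One way to picture the interaction layer just described is as two attention-pooled behavior views -- clicked ads and installed apps -- combined into a consolidated preference vector that is scored against the candidate ad. The embeddings, the fixed gate, and the sigmoid scoring below are illustrative assumptions, not the deployed architecture.

```python
# Sketch of an interaction layer fusing explicit ad responses with
# auxiliary signals (installed apps) into one user preference vector.
# Embeddings, gate value, and scoring are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
d = 6
clicked_ads = rng.normal(size=(4, d))    # ads the user clicked
installed_apps = rng.normal(size=(3, d))  # auxiliary behavior signals

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def pool(vectors, query):
    """Attention-pool a set of behavior embeddings against a query ad."""
    w = softmax(vectors @ query)
    return w @ vectors

candidate_ad = rng.normal(size=d)
click_view = pool(clicked_ads, candidate_ad)
app_view = pool(installed_apps, candidate_ad)

# Gated fusion: the gate would be learned in practice, fixed here.
gate = 0.7
user_pref = gate * click_view + (1 - gate) * app_view
ctr = 1.0 / (1.0 + np.exp(-(user_pref @ candidate_ad)))
print(round(float(ctr), 3))
```

The auxiliary view matters most when a user has few or no recorded clicks, which is exactly the information-scarcity challenge the abstract names.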
Third, for Highway Performance Monitoring System (HPMS) purposes, the South Carolina Department of Transportation (SCDOT) must provide the Federal Highway Administration (FHWA) with a classification of vehicles. However, due to limited lighting conditions at nighttime, classifying vehicles at night is quite challenging. To solve this problem, we designed three CNN models with different architectures that operate on thermal images. Of these, model 2 achieves the best performance. Building on model 2, to avoid over-fitting and further improve performance, we propose two training-test methods based on data augmentation techniques. The experimental results demonstrate that the second training-test method further improves the performance of model 2 with regard to both accuracy and F1-score.
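A minimal sketch of the data-augmentation idea: expand the training set with flipped and brightness-shifted copies of each thermal image. The specific transforms and the random arrays standing in for real thermal frames are illustrative assumptions; the abstract does not specify which augmentations the two training-test methods use.

```python
# Simple augmentation sketch for single-channel thermal images:
# each image yields itself plus flipped and brightness-shifted copies.
import numpy as np

rng = np.random.default_rng(4)

def augment(image):
    """Return simple augmented variants of one single-channel image."""
    return [image,
            np.fliplr(image),                # horizontal flip
            np.clip(image + 0.1, 0.0, 1.0),  # brighter
            np.clip(image - 0.1, 0.0, 1.0)]  # darker

# Random arrays stand in for real 32x32 thermal frames.
images = [rng.random((32, 32)) for _ in range(5)]
augmented = [v for img in images for v in augment(img)]
print(len(images), "->", len(augmented))
```

Multiplying the effective training-set size this way is a standard defense against over-fitting when, as here, labeled nighttime imagery is scarce.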