Enhancing Usability and Explainability of Data Systems
The recent growth of data science has expanded its reach to an ever-growing user base of nonexperts, increasing the need for usability, understandability, and explainability in these systems. Enhancing usability makes data systems accessible to people of different skills and backgrounds alike, leading to the democratization of data systems. Furthermore, a proper understanding of data and data-driven systems is necessary for users to trust the function of systems that learn from data. Finally, data systems should be transparent: when a data system behaves unexpectedly or malfunctions, users deserve a proper explanation of what caused the observed incident. Unfortunately, most existing data systems offer limited usability and support for explanations: these systems are usable only by experts with sound technical skills, and even expert users are hindered by the lack of transparency into the systems' inner workings and functions. The aim of my thesis is to bridge the usability gap between nonexpert users and complex data systems, aid all sorts of users, including expert ones, in data and system understanding, and provide explanations that help reason about unexpected outcomes involving data systems. Specifically, my thesis has the following three goals: (1) enhancing the usability of data systems for nonexperts, (2) enabling data understanding that can assist users in a variety of tasks, such as building trust in data-driven machine learning and cleaning data, and (3) explaining the causes of unexpected outcomes involving data and data systems.
For enhancing usability, we focus on example-driven user intent discovery. We develop systems based on example-driven interactions in two different settings: querying relational databases and personalized document summarization. Towards data understanding, we develop a new data-profiling primitive that can characterize tuples for which a machine-learned model is likely to produce untrustworthy predictions, along with an explanation framework that explains the causes of such untrustworthy predictions. This new data-profiling primitive also enables interactive data cleaning. Finally, we develop two explanation frameworks tailored to debugging data system components, including the data itself: one explains the root cause of a concurrent application's intermittent failure, and the other exposes issues in the data that cause a data-driven system to malfunction.
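To make the example-driven interaction idea concrete, here is a minimal, hypothetical sketch of query-by-example over relational tuples; the toy data, the `infer_predicate` helper, and the attribute-agreement heuristic are illustrative assumptions, not the thesis's actual system.

```python
# Toy query-by-example: infer a conjunctive selection predicate that covers
# the user's example tuples (a simplified illustration of example-driven
# intent discovery).
rows = [
    {"name": "ada", "dept": "cs", "year": 3},
    {"name": "bo",  "dept": "cs", "year": 4},
    {"name": "cy",  "dept": "ee", "year": 3},
]
examples = [rows[0], rows[1]]   # the user highlights two tuples of interest

def infer_predicate(examples):
    """Keep every attribute on which all example tuples agree."""
    pred = dict(examples[0])
    for t in examples[1:]:
        pred = {k: v for k, v in pred.items() if t.get(k) == v}
    return pred

pred = infer_predicate(examples)
matches = [r for r in rows if all(r[k] == v for k, v in pred.items())]
print(pred, [r["name"] for r in matches])  # {'dept': 'cs'} ['ada', 'bo']
```

The inferred predicate generalizes the examples to the rest of the relation: both highlighted tuples agree only on `dept`, so the system retrieves every `cs` tuple.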
Sentiment Analysis of Oppenheimer Movie Reviews on the IMDb Site Using the Naive Bayes Method
This research analyzes the sentiment of the film Oppenheimer based on audience reviews written on the Internet Movie Database (IMDb) website, using the Naive Bayes method. Audience reviews on the IMDb site are a valuable source of information for understanding audience opinions and responses to a film. In this research, the researchers apply the Naive Bayes classification algorithm to classify reviews as expressing positive or negative sentiment. Movie review data from IMDb were collected and passed through a pre-processing stage, and relevant features were then extracted to train the Naive Bayes model. The evaluation results show that the Naive Bayes method can recognize sentiment in Oppenheimer reviews with a significant level of accuracy. These findings provide valuable insight for the film industry in understanding audience responses, and the sentiment information obtained can serve as a basis for better decision-making in film development and marketing. However, the researchers acknowledge limitations, especially in classification accuracy for reviews that use ambiguous or unclear language. Future research could therefore involve other methods, or combine several methods, to improve the accuracy and reliability of sentiment analysis of film reviews.
Key Words: IMDb, Movie Reviews, Naive Bayes, Sentiment Analysis
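As an illustration of the classification method described above, a minimal multinomial Naive Bayes sentiment classifier with add-one (Laplace) smoothing can be sketched as follows; the tiny training set is invented for the example and is not the IMDb data used in the study.

```python
import math
from collections import Counter

# Hypothetical review snippets labeled by sentiment (not the study's data).
train = [
    ("a brilliant and moving film", "pos"),
    ("stunning visuals and a great cast", "pos"),
    ("boring plot and a terrible script", "neg"),
    ("a dull and disappointing film", "neg"),
]

# Count word occurrences per class.
word_counts = {"pos": Counter(), "neg": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def predict(text):
    """Pick the class maximizing log P(class) + sum of log P(word | class),
    with add-one smoothing so unseen words do not zero out a class."""
    scores = {}
    for label in ("pos", "neg"):
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / len(train))
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("a great and moving cast"))     # pos
print(predict("a terrible and boring plot"))  # neg
```

Pre-processing (case folding, stop-word removal, stemming) would normally precede the counting step; it is omitted here to keep the sketch short.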
Peeking into the other half of the glass: handling polarization in recommender systems
This dissertation is about filtering and discovering information online while using recommender systems. In the first part of our research, we study the phenomenon of polarization and its impact on filtering and discovering information. Polarization is a social phenomenon with serious real-life consequences, particularly on social media, so it is important to understand how machine learning algorithms, especially recommender systems, behave in polarized environments. We study polarization within the context of users' interactions with a space of items and how this affects recommender systems. We first formalize the concept of polarization based on item ratings and then relate it to item reviews, when available. We then propose a domain-independent data science pipeline to automatically detect polarization using ratings rather than the properties typically used to detect polarization, such as an item's content or social-network topology. We perform an extensive comparison of polarization measures on several benchmark data sets and show that our polarization detection framework can detect different degrees of polarization and outperforms existing measures in capturing an intuitive notion of polarization. We also uncover peculiar patterns that are characteristic of environments where polarization emerges: a machine learning algorithm finds it easier to learn discriminating models in polarized environments, because the models quickly learn to keep each user in the safety of their preferred viewpoint, essentially giving rise to filter bubbles and making them easier to learn. After quantifying the extent of polarization in current recommender system benchmark data, we propose new counter-polarization approaches for existing collaborative filtering recommender systems, focusing particularly on state-of-the-art models based on matrix factorization.
Our work represents an essential step toward a new research area concerned with quantifying, detecting, and counteracting polarization in human-generated data and machine learning algorithms. We also present a theoretical analysis of how polarization affects the learning of latent factor models, and how counter-polarization affects these models. In the second part of our dissertation, we investigate the problem of discovering related information through tag recommendation on social media micro-blogging platforms. Real-time micro-blogging services such as Twitter have witnessed exponential growth, with millions of active web users who generate billions of micro-posts daily to share information, opinions, and personal viewpoints. However, these posts are inherently noisy and unstructured because they can be in any format, which makes them difficult to organize for the retrieval of relevant information. One way to address this problem is hashtags, which are quickly becoming the standard approach for annotating information on social media, so that varied posts about the same or related topics are annotated with the same hashtag. However, hashtags are not used consistently and, most importantly, are completely optional, which makes them unreliable as the sole mechanism for finding relevant information. We investigate mechanisms for consolidating the hashtag space using recommender systems. Our methods are general enough to be used for hashtag annotation in various social media services such as Twitter, as well as for general item recommendation in systems that rely on implicit user interest data, such as e-learning and news sites, or explicit user ratings, such as e-commerce and online entertainment sites. To conclude, we propose a methodology to extract stories based on two types of hashtag co-occurrence graphs.
Our research in hashtag recommendation exploited the textual content available as part of user messages or posts, resulting in hybrid recommendation strategies. Using content within this context can bridge polarization boundaries. However, when content is unavailable, missing, or unreliable, as on platforms rich in multimedia and multilingual posts, the content option becomes less powerful, and pure collaborative filtering regains its important role, along with the challenges of polarization.
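A toy sketch of rating-based polarization detection, assuming a 1-5 rating scale; the extremity-fraction measure below is a simplified, hypothetical stand-in for the dissertation's actual formalization, meant only to show how polarization can be read off ratings alone, without item content or network topology.

```python
import numpy as np

def polarization_score(ratings):
    """Fraction of ratings at the extremes of a 1-5 scale (illustrative
    measure, not the dissertation's). Values near 1 indicate a bimodal,
    polarized rating distribution; values near 0 indicate consensus
    around the middle of the scale."""
    r = np.asarray(ratings)
    return float(np.mean((r == 1) | (r == 5)))

consensus = [3, 4, 3, 3, 4, 3, 4, 3]   # mild agreement on a middling item
polarized = [1, 5, 1, 5, 1, 1, 5, 5]   # two camps at opposite extremes

print(polarization_score(consensus))   # 0.0
print(polarization_score(polarized))   # 1.0
```

Both distributions can have similar mean ratings, which is precisely why averages hide polarization and a dedicated measure over the rating distribution is needed.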
Text-based Sentiment Analysis and Music Emotion Recognition
Nowadays, with the expansion of social media, large amounts of user-generated texts like tweets, blog posts, or product reviews are shared online. Sentiment polarity analysis of such texts has become highly attractive and is utilized in recommender systems, market prediction, business intelligence, and more. We also witness deep learning techniques becoming top performers on those types of tasks. There are, however, several problems that need to be solved for efficient use of deep neural networks in text mining and text polarity analysis.
First of all, deep neural networks are data-hungry: they need to be fed with datasets that are big in size, cleaned and preprocessed, as well as properly labeled. Second, the modern natural language processing concept of word embeddings, as a dense and distributed text feature representation, solves the sparsity and dimensionality problems of the traditional bag-of-words model. Still, there are various uncertainties regarding the use of word vectors: should they be generated from the same dataset that is used to train the model, or is it better to source them from big and popular collections that work as generic text feature representations? Third, it is not easy for practitioners to find a simple and highly effective deep learning setup for various document lengths and types. Recurrent neural networks are weak with longer texts, and optimal convolution-pooling combinations are not easily conceived. It is thus convenient to have generic neural network architectures that are effective, can adapt to various texts, and encapsulate much of the design complexity.
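The sparsity and dimensionality contrast between bag-of-words and dense embeddings can be sketched numerically as follows; the random vectors merely stand in for embeddings trained on one's own dataset or sourced from a generic pretrained collection such as GloVe.

```python
import numpy as np

corpus = ["the movie was great", "the plot was dull",
          "great acting and great music"]
vocab = sorted({w for doc in corpus for w in doc.split()})
index = {w: i for i, w in enumerate(vocab)}

# Bag-of-words: one dimension per vocabulary word, so the representation
# is sparse and grows with the vocabulary.
def bow_vector(doc):
    v = np.zeros(len(vocab))
    for w in doc.split():
        v[index[w]] += 1
    return v

# Dense embeddings: a fixed low dimension regardless of vocabulary size.
# Random vectors here stand in for trained or pretrained embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 50))

def embedding_vector(doc):
    return np.mean([embeddings[index[w]] for w in doc.split()], axis=0)

bow = bow_vector(corpus[2])
emb = embedding_vector(corpus[2])
print(bow.shape, emb.shape)        # BoW dimension = vocab size; embedding = 50
print(int(bow[index["great"]]))    # "great" occurs twice in the third document
```

On a realistic corpus the vocabulary runs into the tens of thousands, so the bag-of-words vector explodes while the embedding stays at a fixed, small dimension, which is the sparsity/dimensionality argument made above.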
This thesis addresses the above problems to provide methodological and practical insights for utilizing neural networks in sentiment analysis of texts and achieving state-of-the-art results. Regarding the first problem, the effectiveness of various crowdsourcing alternatives is explored, and two medium-sized, emotion-labeled song datasets are created utilizing social tags. One of the research interests of Telecom Italia was the exploration of relations between music emotional stimulation and driving style; consequently, a context-aware music recommender system that aims to enhance driving comfort and safety was also designed. To address the second problem, a series of experiments with large text collections of various contents and domains was conducted. Word embeddings of different parameters were evaluated, and the results revealed that their quality is influenced (mostly, but not only) by the size of the texts they were created from. When working with small text datasets, it is thus important to source word features from popular and generic word embedding collections. Regarding the third problem, a series of experiments involving convolutional and max-pooling neural layers was conducted, and various patterns relating text properties and network parameters with optimal classification accuracy were observed. Combining convolutions of words, bigrams, and trigrams with regional max-pooling layers in a couple of stacks produced the best results. The derived architecture achieves competitive performance on sentiment polarity analysis of movie, business, and product reviews.
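A minimal numerical sketch of the word/bigram/trigram convolution and max-pooling combination described above, written in plain NumPy rather than the thesis's actual deep learning stack; the sequence length, embedding size, and filter counts are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
emb_dim, n_filters = 8, 4
x = rng.normal(size=(12, emb_dim))   # one embedded document: 12 tokens

def conv_max(x, kernel_size):
    """1-D convolution over token windows followed by max-pooling over time.

    Each filter spans `kernel_size` consecutive token embeddings, mirroring
    word (k=1), bigram (k=2), and trigram (k=3) convolutions."""
    W = rng.normal(size=(n_filters, kernel_size * emb_dim))
    windows = np.stack([x[i:i + kernel_size].ravel()
                        for i in range(x.shape[0] - kernel_size + 1)])
    feature_maps = np.maximum(windows @ W.T, 0.0)   # ReLU activations
    return feature_maps.max(axis=0)                 # max over all positions

# Concatenate pooled features from word, bigram, and trigram convolutions,
# yielding one fixed-size feature vector per document.
features = np.concatenate([conv_max(x, k) for k in (1, 2, 3)])
print(features.shape)   # 3 kernel sizes x 4 filters each = 12 features
```

Max-pooling over positions is what makes the output size independent of document length, which is why such architectures adapt to texts of varying lengths; a classifier layer over `features` would complete the model.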
Given that labeled data are becoming the bottleneck of current deep learning systems, a future research direction could be the exploration of various data programming possibilities for constructing even bigger labeled datasets. Investigation of feature-level or decision-level ensemble techniques in the context of deep neural networks could also be fruitful. Different feature types usually represent complementary characteristics of the data: combining word embeddings with traditional text features, or utilizing recurrent networks on document splits and then aggregating the predictions, could further increase the prediction accuracy of such models.
CHESTNUT: Improving serendipity in movie recommendation with an Information Theory-based collaborative filtering approach
The term serendipity has been understood narrowly in recommender systems. Applying a user-centered approach, user-friendly serendipitous recommender systems are expected to be developed based on a good understanding of serendipity. In this paper, we introduce CHESTNUT, a memory-based movie collaborative filtering system that improves serendipity performance. Relying on a proposed Information Theory-based algorithm and a previous study, we demonstrate a method of successfully injecting insight, unexpectedness, and usefulness, which are key metrics for a more comprehensive understanding of serendipity, into a practical serendipitous runtime system. With lightweight experiments, we have revealed a few runtime issues and further optimized them. We have evaluated CHESTNUT in terms of both practicability and effectiveness, and the results show that it is fast, scalable, and improves serendipity performance significantly compared with mainstream memory-based collaborative filtering. The source code of CHESTNUT is available at https://github.com/unnc-idl-ucc/CHESTNUT/
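One way to operationalize unexpectedness, in the spirit of (but not identical to) CHESTNUT's Information Theory-based algorithm, is item surprisal: the self-information of interacting with an item given its popularity. The interaction counts and the relevance-blending rule below are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical interaction counts: item -> number of users who rated it.
item_counts = {"Inception": 900, "Oppenheimer": 850, "Primer": 30, "Cube": 20}
n_users = 1000

def surprisal(item):
    """Self-information -log2(p) of observing an interaction with `item`:
    rare, long-tail items carry more information, hence more unexpectedness."""
    p = item_counts[item] / n_users
    return -math.log2(p)

def serendipity_score(item, predicted_relevance):
    """Blend usefulness (a predicted relevance in [0, 1]) with unexpectedness,
    so that only items that are both relevant and surprising score highly."""
    return predicted_relevance * surprisal(item)

print(round(surprisal("Inception"), 3))  # popular item -> low surprisal
print(round(surprisal("Primer"), 3))     # long-tail item -> high surprisal
```

Pure accuracy-oriented collaborative filtering would rank the popular items first; weighting by surprisal is one simple lever for surfacing unexpected-yet-useful items.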
Content Discovery in Online Services: A Case Study on a Video on Demand System
Video-on-demand services have gained popularity in recent years for the large catalogue of content they offer and the ability to watch it at any desired time. Having many options to choose from may be overwhelming for users and negatively affect the overall experience. The use of recommender systems has been proven to help users discover relevant content faster. However, content discovery is affected not only by the number of choices, but also by the way the content is displayed to the user. Moreover, the development of recommender systems has commonly focused on increasing prediction accuracy rather than usefulness and user experience.
This work takes a user-centric approach to designing an efficient content discovery experience. The main contribution of this research is a set of guidelines for designing the user interface and recommender system for this purpose, formulated based on a user study and existing research. The guidelines were additionally translated into interface designs, which were then evaluated with users. The results showed that users were satisfied with the proposed design and that the goal of providing a better content discovery experience was achieved. Moreover, the guidelines were found feasible by the company in which the research was conducted, and thus have high potential to work in a real product.
With this research, I aim to highlight the importance of improving the content discovery process from the perspective of both the user interface and the recommender system, and to encourage researchers to consider the user experience in those aspects.
Using Meta-data from Free-text User-generated Content to Improve Personalized Recommendation by Reducing Sparsity
Ph.D. (Doctor of Philosophy)
Recommendations based on social links
The goal of this chapter is to give an overview of recent work on the development of social link-based recommender systems and to offer insights on related issues, as well as future directions for research. Among several kinds of social recommendations, this chapter focuses on recommendations that are based on users' self-defined (i.e., explicit) social links and that suggest items, rather than people, of interest. The chapter starts by reviewing the need for social link-based recommendations and studies that explain the viability of social networks as useful information sources. Following that, the core part of the chapter dissects and examines modern research on social link-based recommendations along several dimensions. It concludes with a discussion of several important issues and future directions for social link-based recommendation research.
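A minimal sketch of item recommendation from explicit social links, the setting this chapter surveys: score an item for a user by averaging the ratings of the user's self-defined friends. The toy data and the simple averaging rule are illustrative assumptions, not drawn from any specific system covered in the chapter.

```python
from statistics import mean

# Hypothetical explicit (self-defined) social links and item ratings.
friends = {"ana": ["bob", "eva"], "bob": ["ana"], "eva": ["ana", "bob"]}
ratings = {"bob": {"movie_x": 5, "movie_y": 2}, "eva": {"movie_x": 4}}

def social_score(user, item):
    """Predict interest in `item` as the mean rating among the user's
    friends who rated it; None when no friend has an opinion."""
    friend_ratings = [ratings[f][item] for f in friends[user]
                      if f in ratings and item in ratings[f]]
    return mean(friend_ratings) if friend_ratings else None

print(social_score("ana", "movie_x"))  # 4.5 -> mean of bob (5) and eva (4)
print(social_score("ana", "movie_z"))  # None -> no friend rated it
```

Real systems refine this baseline by weighting friends by tie strength or trust, which is one of the dimensions along which the chapter compares modern approaches.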