Diverse personalized recommendations with uncertainty from implicit preference data with the Bayesian Mallows Model
Clicking data, which exists in abundance and contains objective user
preference information, is widely used to produce personalized recommendations
in web-based applications. Current popular recommendation algorithms, typically
based on matrix factorizations, often have high accuracy and achieve good
clickthrough rates. However, diversity of the recommended items, which can
greatly enhance user experiences, is often overlooked. Moreover, most
algorithms do not produce interpretable uncertainty quantifications of the
recommendations. In this work, we propose the Bayesian Mallows for Clicking
Data (BMCD) method, which augments clicking data into compatible full ranking
vectors by enforcing all the clicked items to be top-ranked. User preferences
are learned using a Mallows ranking model. Bayesian inference leads to
interpretable uncertainties of each individual recommendation, and we also
propose a method to make personalized recommendations based on such
uncertainties. With a simulation study and a real-life data example, we
demonstrate that, compared to state-of-the-art matrix factorization, BMCD makes
personalized recommendations with similar accuracy while achieving a much higher
level of diversity and producing interpretable and actionable uncertainty
estimates.
Comment: 27 pages
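The augmentation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform random ordering within the clicked and unclicked groups, and the choice of the footrule distance are all assumptions made for illustration. The abstract only states that clicked items are forced to be top-ranked and that preferences follow a Mallows ranking model.

```python
import random


def augment_clicks_to_ranking(clicks, rng):
    """Turn a 0/1 click vector into one compatible full ranking.

    Returns a list `ranks` where ranks[i] is the rank of item i (1 = best).
    Every clicked item is ranked above every unclicked item; the order
    inside each group is drawn at random (an assumption of this sketch).
    """
    clicked = [i for i, c in enumerate(clicks) if c == 1]
    unclicked = [i for i, c in enumerate(clicks) if c == 0]
    rng.shuffle(clicked)
    rng.shuffle(unclicked)
    ranks = [0] * len(clicks)
    for position, item in enumerate(clicked + unclicked, start=1):
        ranks[item] = position
    return ranks


def footrule_distance(ranks, consensus):
    """Sum of absolute rank differences between two rankings."""
    return sum(abs(a - b) for a, b in zip(ranks, consensus))


def mallows_log_density_unnormalized(ranks, consensus, alpha):
    """Unnormalized Mallows log-density: -(alpha / n) * d(ranks, consensus).

    The footrule distance is one common choice of d; it is assumed here.
    """
    n = len(ranks)
    return -(alpha / n) * footrule_distance(ranks, consensus)


if __name__ == "__main__":
    rng = random.Random(0)
    clicks = [0, 1, 0, 1, 0]  # hypothetical user who clicked items 1 and 3
    ranks = augment_clicks_to_ranking(clicks, rng)
    print("augmented ranking:", ranks)
    print("log-density vs. itself:", mallows_log_density_unnormalized(ranks, ranks, alpha=2.0))
```

In a full Bayesian treatment the augmented rankings would be resampled repeatedly (for example inside an MCMC loop) rather than drawn once; the sketch only shows what "compatible with the clicks" means for a single draw.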
ELVIS: Entertainment-led video summaries
© ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Multimedia Computing, Communications, and Applications, 6(3): Article no. 17 (2010) http://doi.acm.org/10.1145/1823746.1823751
Video summaries present the user with a condensed and succinct representation of the content of a video stream. Usually this is achieved by attaching degrees of importance to low-level image, audio and text features. However, video content elicits strong and measurable physiological responses in the user, which are potentially rich indicators of what video content is memorable to or emotionally engaging for an individual user. This article proposes a technique that exploits such physiological responses to a given video stream by a given user to produce Entertainment-Led VIdeo Summaries (ELVIS). ELVIS is made up of five analysis phases which correspond to the analyses of five physiological response measures: electro-dermal response (EDR), heart rate (HR), blood volume pulse (BVP), respiration rate (RR), and respiration amplitude (RA). Through these analyses, the temporal locations of the most entertaining video subsegments, as they occur within the video stream as a whole, are automatically identified. The effectiveness of the ELVIS technique is verified through a statistical analysis of data collected during a set of user trials. Our results show that ELVIS is more consistent than RANDOM, EDR, HR, BVP, RR and RA selections in identifying the most entertaining video subsegments for content in the comedy, horror/comedy, and horror genres. Subjective user reports also reveal that ELVIS video summaries are comparatively easy to understand, enjoyable, and informative.
Context-awareness for mobile sensing: a survey and future directions
The evolution of smartphones, together with increasing computational power, has empowered developers to create innovative context-aware applications that recognize users' social and cognitive activities in any situation and at any location. Awareness of context gives an application the ability to be conscious of the physical environments or situations around mobile device users. This allows network services to respond proactively and intelligently based on such awareness. The key idea behind context-aware applications is to encourage users to collect, analyze, and share local sensory knowledge for large-scale community use by creating a smart network. The desired network is capable of making autonomous logical decisions to actuate environmental objects and to assist individuals. However, many open challenges remain, most of which arise because the middleware services provided on mobile devices have limited resources in terms of power, memory, and bandwidth. It is therefore critically important to study how these drawbacks can be characterized and resolved, and at the same time to better understand the opportunities for the research community to contribute to context-awareness. To this end, this paper surveys the literature over the period 1991-2014, from the emerging concepts to applications of context-awareness on mobile platforms, covering up-to-date research and future research directions. Moreover, it points out the challenges faced in this regard and addresses them by proposing possible solutions.
Scalable data analytics using spark
A printed copy of the thesis is held in the İstanbul Şehir University Library.
This thesis presents our experience in designing a scalable data analytics platform
built primarily on Apache Spark and, to a lesser extent, on Apache Hadoop. We worked
on three representative applications: (1) Sentiment Analysis, (2) Collaborative
Filtering, and (3) Topic Modeling. We demonstrated how to scale these applications
on a cluster of 8 workers. Each worker contributes 4 cores, 8 GB of RAM, and 100 GB
of disk space to the compute pool. Our conclusion is that Apache Spark is mature
enough to be deployed in production comfortably.
Table of contents:
Abstract
Öz (abstract in Turkish)
Acknowledgments
List of Figures
List of Tables
1 Introduction
2 Sentiment Analytics on Spark
  2.1 Related Work
  2.2 Methodology
    2.2.1 Preprocessing on the data
    2.2.2 Naive Bayes Classifier
  2.3 Experimental Setup
    2.3.1 Resilient Distributed Datasets (RDD)
    2.3.2 Broadcast Variables
    2.3.3 The Movie Reviews Dataset
    2.3.4 Cluster Configuration
    2.3.5 Model Building
  2.4 Experimental Evaluation
    2.4.1 Apache Hadoop
    2.4.2 Apache Mahout
    2.4.3 Empirical Results
      2.4.3.1 Broadcasting vs. Not-broadcasting
      2.4.3.2 Time required for training
      2.4.3.3 Time required for testing
3 Collaborative Filtering on Spark
  3.1 MLBase
  3.2 System Architecture
  3.3 Online Recommendation System
4 Topic Modeling on Hadoop
  4.1 Introduction
  4.2 Related Work
  4.3 LDA in MapReduce
  4.4 Experimental Results
    4.4.1 The Dataset
    4.4.2 Cluster Configuration
    4.4.3 Test Results
5 Conclusions
Bibliography
- …