
    Diverse personalized recommendations with uncertainty from implicit preference data with the Bayesian Mallows Model

    Clicking data, which exists in abundance and contains objective user preference information, is widely used to produce personalized recommendations in web-based applications. Current popular recommendation algorithms, typically based on matrix factorization, often have high accuracy and achieve good clickthrough rates. However, the diversity of the recommended items, which can greatly enhance user experience, is often overlooked. Moreover, most algorithms do not produce interpretable uncertainty quantifications of their recommendations. In this work, we propose the Bayesian Mallows for Clicking Data (BMCD) method, which augments clicking data into compatible full ranking vectors by enforcing all the clicked items to be top-ranked. User preferences are learned using a Mallows ranking model. Bayesian inference leads to interpretable uncertainties for each individual recommendation, and we also propose a method to make personalized recommendations based on such uncertainties. With a simulation study and a real-life data example, we demonstrate that, compared to state-of-the-art matrix factorization, BMCD makes personalized recommendations with similar accuracy while achieving a much higher level of diversity and producing interpretable and actionable uncertainty estimates. Comment: 27 pages
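
The augmentation step in the abstract above (clicked items forced to the top of a full ranking) can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `augment_clicks`, the 0-based item indices, and the uniform random ordering within the clicked and unclicked groups are all assumptions made for illustration. In the paper the augmented rankings are part of Bayesian inference under the Mallows model, which is not shown here.

```python
import random

def augment_clicks(n_items, clicked, rng):
    """Build one full ranking compatible with a user's clicks.

    Hypothetical helper: clicked items receive the best ranks
    (1..len(clicked)) and unclicked items the remaining ranks.
    The order inside each group is drawn uniformly at random,
    which is an assumption of this sketch.
    """
    clicked_set = set(clicked)
    top = sorted(clicked_set)
    rest = [i for i in range(n_items) if i not in clicked_set]
    rng.shuffle(top)
    rng.shuffle(rest)
    order = top + rest
    # Map each item to its rank, where rank 1 is the most preferred.
    return {item: rank for rank, item in enumerate(order, start=1)}

rng = random.Random(0)
ranking = augment_clicks(6, {1, 4}, rng)
```

Whatever the random draw, the two clicked items (1 and 4) hold ranks 1 and 2, and the four unclicked items share ranks 3 to 6.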

    ELVIS: Entertainment-led video summaries

    © ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Multimedia Computing, Communications, and Applications, 6(3): Article no. 17 (2010) http://doi.acm.org/10.1145/1823746.1823751
    Video summaries present the user with a condensed and succinct representation of the content of a video stream. Usually this is achieved by attaching degrees of importance to low-level image, audio and text features. However, video content elicits strong and measurable physiological responses in the user, which are potentially rich indicators of what video content is memorable to or emotionally engaging for an individual user. This article proposes a technique that exploits such physiological responses to a given video stream by a given user to produce Entertainment-Led VIdeo Summaries (ELVIS). ELVIS is made up of five analysis phases which correspond to the analyses of five physiological response measures: electro-dermal response (EDR), heart rate (HR), blood volume pulse (BVP), respiration rate (RR), and respiration amplitude (RA). Through these analyses, the temporal locations of the most entertaining video subsegments, as they occur within the video stream as a whole, are automatically identified. The effectiveness of the ELVIS technique is verified through a statistical analysis of data collected during a set of user trials. Our results show that ELVIS is more consistent than RANDOM, EDR, HR, BVP, RR and RA selections in identifying the most entertaining video subsegments for content in the comedy, horror/comedy, and horror genres. Subjective user reports also reveal that ELVIS video summaries are comparatively easy to understand, enjoyable, and informative.

    Scalable data analytics using spark

    The printed copy of the thesis is held at the İstanbul Şehir University Library.
    This thesis presents our experience in designing a scalable data analytics platform on top of Apache Spark (major) and Apache Hadoop (minor). We worked on three representative applications: (1) Sentiment Analysis, (2) Collaborative Filtering and (3) Topic Modeling. We demonstrated how to scale these applications on a cluster of 8 workers. Each worker contributes 4 cores, 8 GB RAM, and 100 GB of disk space to the compute pool. Our conclusion is that Apache Spark is mature enough to be deployed in production comfortably.
    Table of contents:
    Abstract
    Öz
    Acknowledgments
    List of Figures
    List of Tables
    1 Introduction
    2 Sentiment Analytics on Spark
      2.1 Related Work
      2.2 Methodology
        2.2.1 Preprocessing on the data
        2.2.2 Naive Bayes Classifier
      2.3 Experimental Setup
        2.3.1 Resilient Distributed Datasets (RDD)
        2.3.2 Broadcast Variables
        2.3.3 The Movie Reviews Dataset
        2.3.4 Cluster Configuration
        2.3.5 Model Building
      2.4 Experimental Evaluation
        2.4.1 Apache Hadoop
        2.4.2 Apache Mahout
        2.4.3 Empirical Results
          2.4.3.1 Broadcasting vs. Not-broadcasting
          2.4.3.2 Time required for training
          2.4.3.3 Time required for testing
    3 Collaborative Filtering on Spark
      3.1 MLBase
      3.2 System Architecture
      3.3 Online Recommendation System
    4 Topic Modeling on Hadoop
      4.1 Introduction
      4.2 Related Work
      4.3 LDA in MapReduce
      4.4 Experimental Results
        4.4.1 The Dataset
        4.4.2 Cluster Configuration
        4.4.3 Test Results
    5 Conclusions
    Bibliography