    Deep Learning based Recommender System: A Survey and New Perspectives

    With the ever-growing volume of online information, recommender systems have been an effective strategy to overcome such information overload. The utility of recommender systems cannot be overstated, given their widespread adoption in many web applications, along with their potential to ameliorate many problems related to over-choice. In recent years, deep learning has garnered considerable interest in many research fields such as computer vision and natural language processing, owing not only to stellar performance but also to the attractive property of learning feature representations from scratch. The influence of deep learning is also pervasive, recently demonstrating its effectiveness when applied to information retrieval and recommender systems research. Evidently, the field of deep learning in recommender systems is flourishing. This article aims to provide a comprehensive review of recent research efforts on deep learning based recommender systems. More concretely, we devise a taxonomy of deep learning based recommendation models and provide a comprehensive summary of the state of the art. Finally, we expand on current trends and provide new perspectives pertaining to this exciting new development of the field. Comment: The paper has been accepted by ACM Computing Surveys. https://doi.acm.org/10.1145/328502

    Movies and meaning: from low-level features to mind reading

    When dealing with movies, closing the tremendous gap between low-level features and the richness of semantics in the viewers' cognitive processes requires a variety of approaches and different perspectives. For instance, when attempting to relate movie content to users' affective responses, previous work suggests that a direct mapping of audio-visual properties into elicited emotions is difficult, due to the high variability of individual reactions. To reduce the gap between the objective level of features and the subjective sphere of emotions, we exploit the intermediate representation of the connotative properties of movies: the set of shooting and editing conventions that help in transmitting meaning to the audience. One of these stylistic features, the shot scale, i.e. the distance of the camera from the subject, effectively regulates theory of mind, indicating that increasing spatial proximity to the character triggers a higher occurrence of mental state references in viewers' story descriptions. Movies are also becoming important stimuli in neural decoding, an ambitious line of research within contemporary neuroscience aiming at "mindreading". In this field we address the challenge of producing decoding models for the reconstruction of perceptual contents by combining fMRI data and deep features in a hybrid model able to predict specific video object classes.

    Understanding user experience of mobile video: Framework, measurement, and optimization

    Since users have become the focus of product/service design in the last decade, the term User eXperience (UX) has been frequently used in the field of Human-Computer Interaction (HCI). Research on UX facilitates a better understanding of the various aspects of the user’s interaction with the product or service. Mobile video, as a new and promising service and research field, has attracted great attention. Due to the significance of UX in the success of mobile video (Jordan, 2002), many researchers have centered on this area, examining users’ expectations, motivations, requirements, and usage context. As a result, many influencing factors have been explored (Buchinger, Kriglstein, Brandt & Hlavacs, 2011; Buchinger, Kriglstein & Hlavacs, 2009). However, a general framework specific to mobile video service, which could structure such a great number of factors, is lacking. To measure user experience of multimedia services such as mobile video, quality of experience (QoE) has recently become a prominent concept. In contrast to the traditionally used concept of quality of service (QoS), QoE not only involves objectively measuring the delivered service but also takes into account the user’s needs and desires when using the service, emphasizing the user’s overall acceptance of the service. Many QoE metrics are able to estimate the user-perceived quality or acceptability of mobile video, but may not be accurate enough for overall UX prediction due to the complexity of UX. Only a few QoE frameworks have addressed more aspects of UX for mobile multimedia applications, and these still need to be transformed into practical measures. The challenge of optimizing UX remains adapting to resource constraints (e.g., network conditions, mobile device capabilities, and heterogeneous usage contexts) while meeting complicated user requirements (e.g., usage purposes and personal preferences).
    In this chapter, we investigate the important existing UX frameworks, compare their similarities, and discuss some important features that fit the mobile video service. Based on the previous research, we propose a simple UX framework for mobile video applications by mapping a variety of influencing factors of UX onto a typical mobile video delivery system. Each component and its factors are explored with comprehensive literature reviews. The proposed framework may benefit user-centred design of mobile video by taking complete account of UX influences, and may help improve mobile video service quality by adjusting the values of certain factors to produce a positive user experience. It may also facilitate related research by locating important issues to study, clarifying research scopes, and setting up proper study procedures. We then review a great deal of research on UX measurement, including QoE metrics and QoE frameworks of mobile multimedia. Finally, we discuss how to achieve an optimal quality of user experience by focusing on the issues of various aspects of UX of mobile video. In the conclusion, we suggest some open issues for future study.

    Enhancing Video Recommendation Using Multimedia Content

    Video recordings are complex media types. When we watch a movie, we can effortlessly register a lot of details conveyed to us (by the author) through different multimedia channels, in particular, the audio and visual modalities. To date, the majority of movie recommender systems use collaborative filtering (CF) models or content-based filtering (CBF) relying on metadata (e.g., editorial such as genre or wisdom of the crowd such as user-generated tags) at their core, since such metadata are human-generated and are assumed to cover the 'content semantics' of movies to a great degree. The information obtained from multimedia content and learning from multi-modal sources (e.g., audio, visual and metadata), on the other hand, offers the possibility of uncovering relationships between modalities and obtaining an in-depth understanding of natural phenomena occurring in a video. These discerning characteristics of heterogeneous feature sets meet users' differing information needs. In the context of this Ph.D. thesis [9], which is briefly summarized in the current extended abstract, approaches to automated extraction of multimedia information from videos and their integration with video recommender systems have been elaborated, implemented, and analyzed. A variety of tasks related to movie recommendation using multimedia content have been studied. The results of this thesis suggest that recommender system research can benefit from knowledge in multimedia signal processing and machine learning established over the last decades for solving various recommendation tasks.

    Adversarial Training Towards Robust Multimedia Recommender System

    With the prevalence of multimedia content on the Web, developing recommender solutions that can effectively leverage the rich signal in multimedia data is urgently needed. Owing to the success of deep neural networks in representation learning, recent advances in multimedia recommendation have largely focused on exploring deep learning methods to improve recommendation accuracy. To date, however, there has been little effort to investigate the robustness of multimedia representation and its impact on the performance of multimedia recommendation. In this paper, we shed light on the robustness of multimedia recommender systems. Using a state-of-the-art recommendation framework and deep image features, we demonstrate that the overall system is not robust, such that a small (but purposeful) perturbation on the input image will severely decrease the recommendation accuracy. This implies a possible weakness of multimedia recommender systems in predicting user preference and, more importantly, the potential for improvement by enhancing their robustness. To this end, we propose a novel solution named Adversarial Multimedia Recommendation (AMR), which can lead to a more robust multimedia recommender model by using adversarial learning. The idea is to train the model to defend against an adversary, which adds perturbations to the target image with the purpose of decreasing the model's accuracy. We conduct experiments on two representative multimedia recommendation tasks, namely, image recommendation and visually-aware product recommendation. Extensive results verify the positive effect of adversarial learning and demonstrate the effectiveness of our AMR method. Source code is available at https://github.com/duxy-me/AMR. Comment: TKD

    A multi-objective optimization for video orchestration

    In this work, the problem of video orchestration performed by combining information extracted from multiple video sequences is considered. The novelty of the proposed approach lies in the use of aesthetic features and of cinematographic composition rules for automatically aggregating the inputs from different cameras into a single video. While prior methodologies have separately addressed the issues of aesthetic feature extraction from videos and video orchestration, in this work we exploit a set of features of a scene for automatically selecting the shots characterized by the best aesthetic score. In order to evaluate the effectiveness of the proposed method, a preliminary subjective experiment has been carried out with experts from the audiovisual field. The achieved results are encouraging and show that there is room for improving performance.

    Novel Methods Using Human Emotion and Visual Features for Recommending Movies

    Postponed access: the file will be accessible after 2022-06-01. This master thesis investigates novel methods using human emotion as contextual information to estimate and elicit ratings when watching movie trailers. The aim is to acquire user preferences without the intrusive and time-consuming behavior of Explicit Feedback strategies, and to generate quality recommendations. The proposed preference-elicitation technique is implemented as an Emotion-based Filtering (EF) technique to generate recommendations, and is evaluated against two other recommendation techniques: a Visual-based Filtering (VF) technique, using low-level visual features of movies, and a Collaborative Filtering (CF) technique using explicit ratings. In terms of Accuracy, we found the Emotion-based Filtering technique (EF) to perform better than the two other filtering techniques. In terms of Diversity, the Visual-based Filtering (VF) performed best. We further analyse the obtained data to see if movie genres tend to induce specific emotions, and examine the potential correlation between emotional responses of users and visual features of movie trailers. When investigating emotional responses, we found that joy and disgust tend to be more prominent in movie genres than other emotions. Our findings also suggest potential correlations on a per-movie level. The proposed Visual-based Filtering technique can be adopted as an Implicit Feedback strategy to obtain user preferences. For future work, we will extend the experiment with more participants and build stronger affective profiles to be studied when recommending movies. Master's thesis in information science. INFO390, MASV-INF

    Exploiting visual saliency for assessing the impact of car commercials upon viewers

    Content based video indexing and retrieval (CBVIR) is a lively area of research which focuses on automating the indexing, retrieval and management of videos. This area has a wide spectrum of promising applications, among which assessing the impact of audiovisual productions emerges as a particularly interesting and motivating one. In this paper we present a computational model capable of predicting the impact (i.e. positive or negative) of car advertisement videos upon viewers by using a set of visual saliency descriptors. Visual saliency provides information about the parts of the image perceived as most important, which are instinctively targeted by humans when looking at a picture or watching a video. For this reason we propose to exploit visual saliency information, introducing it as a new feature which reflects high-level semantics objectively, to improve the video impact categorization results. The suggested saliency descriptors are inspired by the mechanisms that underlie the attentional abilities of the human visual system and are organized into seven distinct families according to different measurements over the identified salient areas in the video frames, namely population, size, location, geometry, orientation, movement and photographic composition. The proposed approach starts by computing saliency maps for all the video frames, where two different visual saliency detection frameworks have been considered and evaluated: the popular graph based visual saliency (GBVS) algorithm, and a state-of-the-art DNN-based approach. This work has been partially supported by the National Grants RTC-2016-5305-7 and TEC2014-53390-P of the Spanish Ministry of Economy and Competitiveness.