813 research outputs found

    Multimodal Content Analysis for Effective Advertisements on YouTube

    Full text link
    The rapid advances in e-commerce and Web 2.0 technologies have greatly increased the impact of commercial advertisements on the general public. As a key enabling technology, a multitude of recommender systems exist that analyze user features and browsing patterns to recommend appealing advertisements. In this work, we study the attributes that characterize an effective advertisement and recommend a useful set of features to aid the design and production of commercial advertisements. We analyze the temporal patterns in the multimedia content of advertisement videos, including their auditory, visual and textual components, and study their individual roles and synergies in the success of an advertisement. The objective of this work is to measure the effectiveness of an advertisement and to recommend a useful set of features that help advertisement designers make it more successful and approachable to users. Our proposed framework employs cross-modality feature learning, in which data streams from the different components are used to train separate neural network models that are then fused to learn a shared representation. A neural network model trained on this joint feature embedding is subsequently used as a classifier to predict advertisement effectiveness. We validate our approach using subjective ratings from a dedicated user study, the sentiment strength of online viewer comments, and a viewer opinion metric based on the ratio of Likes to Views received by each advertisement on an online platform. Comment: 11 pages, 5 figures, ICDM 201
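
    A minimal sketch of the cross-modality fusion idea described above, written here as illustrative PyTorch rather than the authors' implementation: each modality stream is encoded by its own small network, the embeddings are concatenated into a shared representation, and a classifier predicts effectiveness. The feature dimensions, layer sizes and class names are assumptions.

    import torch
    import torch.nn as nn

    class ModalityEncoder(nn.Module):
        """Encodes one modality's summary features into a fixed-size embedding."""
        def __init__(self, in_dim: int, emb_dim: int = 64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, emb_dim), nn.ReLU())

        def forward(self, x):
            return self.net(x)

    class FusionClassifier(nn.Module):
        """Joint embedding over audio/visual/text encoders, then an effectiveness logit."""
        def __init__(self, audio_dim=40, visual_dim=512, text_dim=300, emb_dim=64):
            super().__init__()
            self.audio = ModalityEncoder(audio_dim, emb_dim)
            self.visual = ModalityEncoder(visual_dim, emb_dim)
            self.text = ModalityEncoder(text_dim, emb_dim)
            self.head = nn.Sequential(nn.Linear(3 * emb_dim, 64), nn.ReLU(),
                                      nn.Linear(64, 1))  # logit: effective vs. not

        def forward(self, a, v, t):
            joint = torch.cat([self.audio(a), self.visual(v), self.text(t)], dim=-1)
            return self.head(joint)

    # Example: a batch of 8 ads with random per-modality summary features.
    model = FusionClassifier()
    logits = model(torch.randn(8, 40), torch.randn(8, 512), torch.randn(8, 300))
    print(logits.shape)  # torch.Size([8, 1])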

    Multimodal Content Delivery for Geo-services

    Get PDF
    This thesis describes a body of work carried out over several research projects in the area of multimodal interaction for location-based services. Research in this area has progressed from using simulated mobile environments to demonstrate the visual modality, to the ubiquitous delivery of rich media using multimodal interfaces (geo-services). To deliver these services effectively, the research focused on innovative solutions to real-world problems in a number of disciplines, including geo-location, mobile spatial interaction, location-based services, rich media interfaces and auditory user interfaces. My original contributions to knowledge are made in the areas of multimodal interaction, underpinned by advances in geo-location technology and supported by the proliferation of mobile devices into modern life. Accurate positioning is a known problem for location-based services; contributions in the area of mobile positioning demonstrate a hybrid positioning technology for mobile devices that uses terrestrial beacons to trilaterate position. Information overload is an active concern for location-based applications that struggle to manage large amounts of data; contributions in the area of egocentric visibility, which filters data based on field-of-view, demonstrate novel forms of multimodal input. One of the more pertinent characteristics of these applications is the delivery or output modality employed (auditory, visual or tactile). Further contributions are made in the area of multimodal content delivery, where multiple modalities are used to deliver information using graphical user interfaces, tactile interfaces and, more notably, auditory user interfaces. It is demonstrated how a combination of these interfaces can be used to synergistically deliver context-sensitive rich media to users, in a responsive way, based on usage scenarios that consider the affordance of the device, the geographical position and bearing of the device, and also the location of the device.
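
    As an illustration of the positioning contribution, the sketch below applies the standard least-squares trilateration formulation to ranges from terrestrial beacons: the circle equations are linearised against the first beacon and solved for (x, y). This is a generic textbook method, not code from the thesis, and the beacon coordinates and ranges are made up for the example.

    import numpy as np

    def trilaterate(beacons: np.ndarray, ranges: np.ndarray) -> np.ndarray:
        """Estimate (x, y) from three or more beacons with measured ranges."""
        x0, y0 = beacons[0]
        d0 = ranges[0]
        A, b = [], []
        for (xi, yi), di in zip(beacons[1:], ranges[1:]):
            A.append([2 * (xi - x0), 2 * (yi - y0)])
            b.append(xi**2 - x0**2 + yi**2 - y0**2 - di**2 + d0**2)
        solution, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
        return solution

    beacons = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
    true_pos = np.array([30.0, 40.0])
    ranges = np.linalg.norm(beacons - true_pos, axis=1)  # noise-free for clarity
    print(trilaterate(beacons, ranges))  # ~[30. 40.]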

    Multimodal information presentation for high-load human computer interaction

    Get PDF
    This dissertation addresses the question: given an application and an interaction context, how can interfaces present information to users in a way that improves the quality of interaction (e.g. better user performance, lower cognitive demand and greater user satisfaction)? Information presentation is critical to the quality of interaction because it guides, constrains and even determines cognitive behavior. A good presentation is particularly desirable in high-load human-computer interactions, such as when users are under time pressure, under stress, or multi-tasking. Under a high mental workload, users may not have the spare cognitive capacity to cope with the unnecessary workload induced by a bad presentation. In this dissertation work, the major presentation factor of interest is modality. We conducted theoretical studies in the cognitive psychology domain in order to understand the role of presentation modality in different stages of human information processing. Based on this theoretical guidance, we conducted a series of user studies investigating the effect of information presentation (modality and other factors) in several high-load task settings. The two task domains are crisis management and driving. Using crisis scenarios, we investigated how to present information to facilitate time-limited visual search and time-limited decision making. In the driving domain, we investigated how to present highly urgent danger warnings and how to present informative cues that help drivers manage their attention across multiple tasks. The outcomes of this dissertation work have useful implications for the design of cognitively compatible user interfaces, and are not limited to high-load applications.
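
    A hedged illustration of the underlying design principle rather than material from the dissertation: when one perceptual channel is already loaded, route a message to a less-loaded modality, and present highly urgent warnings redundantly across modalities. The workload scale and thresholds below are assumptions.

    def choose_output_modality(visual_load: float, auditory_load: float,
                               urgency: float) -> str:
        """Pick 'visual', 'auditory' or 'multimodal' output for a message (loads in 0..1)."""
        if urgency > 0.8:
            return "multimodal"      # redundant presentation for danger warnings
        if visual_load > 0.6 and auditory_load <= 0.6:
            return "auditory"        # e.g. while the driver's eyes are on the road
        if auditory_load > 0.6 and visual_load <= 0.6:
            return "visual"
        return "visual" if visual_load <= auditory_load else "auditory"

    print(choose_output_modality(visual_load=0.9, auditory_load=0.2, urgency=0.3))
    # -> auditory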

    Experiences of aiding autobiographical memory using the SenseCam

    Get PDF
    Human memory is a dynamic system that makes certain memories of events accessible based on a hierarchy of information, arguably driven by personal significance. Not all events are remembered, but those that are tend to be more psychologically relevant. In contrast, lifelogging is the process of automatically recording aspects of one's life in digital form without loss of information. In this article we share our experiences in designing computer-based solutions that help people review their visual lifelogs and address this contrast. The technical basis for our work is automatically segmenting visual lifelogs into events, allowing event similarity and event importance to be computed, ideas that are motivated by cognitive science considerations of how human memory works and can be assisted. Our work has been based on visual lifelogs gathered by dozens of people, some with collections spanning multiple years. In this review article we summarize a series of studies that have led to the development of a browser based on human memory systems, and discuss the inherent tension between storing large amounts of data and making the most relevant material the most accessible.
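
    The event processing described above can be illustrated with a small sketch, under assumed feature representations rather than the authors' pipeline: segment a stream of image features where consecutive frames diverge, then score pairwise event similarity and a simple novelty-style importance. The threshold and the synthetic data are assumptions.

    import numpy as np

    def segment_events(features: np.ndarray, threshold: float = 0.35) -> list:
        """Split a (n_images, dim) feature stream where cosine distance jumps."""
        unit = features / np.linalg.norm(features, axis=1, keepdims=True)
        events, start = [], 0
        for i in range(1, len(unit)):
            if 1.0 - float(unit[i] @ unit[i - 1]) > threshold:
                events.append(features[start:i])
                start = i
        events.append(features[start:])
        return events

    def event_similarity(e1: np.ndarray, e2: np.ndarray) -> float:
        """Cosine similarity between the mean feature vectors of two events."""
        m1, m2 = e1.mean(axis=0), e2.mean(axis=0)
        return float(m1 @ m2 / (np.linalg.norm(m1) * np.linalg.norm(m2)))

    def event_importance(events: list) -> list:
        """Treat novelty (low average similarity to the other events) as importance."""
        return [1.0 - np.mean([event_similarity(e, other)
                               for j, other in enumerate(events) if j != i])
                for i, e in enumerate(events)]

    # Three well-separated synthetic "scenes" of 20 images each.
    rng = np.random.default_rng(0)
    centres = 5.0 * np.eye(64)[:3]
    stream = np.vstack([c + 0.1 * rng.random((20, 64)) for c in centres])
    events = segment_events(stream)
    print(len(events))                                   # -> 3
    print([round(score, 2) for score in event_importance(events)])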

    Crossmodal displays: coordinated crossmodal cues for information provision in public spaces

    Get PDF
    This thesis explores the design of crossmodal displays, a new kind of display-based interface that aims to help prevent information overload and support information presentation for multiple people who simultaneously share a physical space or situated interface but have different information needs and privacy concerns. By exploiting human multimodal perception and utilizing the synergy of existing public displays and personal displays, crossmodal displays avoid numerous drawbacks of previous approaches, including a reliance on tracking technologies, weak protection of users' privacy, small user capacity and high cognitive load demands. The review of human multimodal perception in this thesis, especially multimodal integration and crossmodal interaction, has many implications for the design of crossmodal displays and constitutes the foundation for our proposed conceptual model. Two types of crossmodal display prototype applications were developed: CROSSFLOW for indoor navigation and CROSSBOARD for information retrieval on high-density information displays; both utilize coordinated crossmodal cues to guide multiple simultaneous users' attention, in a timely manner, to publicly visible information relevant to each user. Most of the results of the single-user and multi-user lab studies on the prototype systems developed in this research demonstrate the effectiveness and efficiency of crossmodal displays and validate several significant advantages over previous solutions. However, the results also reveal that the detailed usability and user experience of crossmodal displays, as well as human perception of crossmodal cues, should be investigated and improved further. This thesis is the first exploration of the design of crossmodal displays. A set of design suggestions and a lifecycle model of crossmodal display development have been produced, which can be used by designers or other researchers who wish to develop crossmodal displays for their applications or integrate crossmodal cues into their interfaces.
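
    The coordination idea behind CROSSFLOW and CROSSBOARD can be sketched as a simple scheduler, assuming nothing about the systems' actual implementation: publicly visible items are highlighted in repeating time slots, and each user's personal device fires a private cue (for example a vibration pattern) in the slot that carries that user's information, so the shared display never reveals which item belongs to whom. The slot length, cue names and example data are assumptions.

    from dataclasses import dataclass
    from typing import Dict, List

    SLOT_SECONDS = 1.5   # assumed duration of one public highlight slot

    @dataclass
    class CueSchedule:
        user: str
        slot_index: int      # when this user's item is highlighted publicly
        private_cue: str     # what the personal device plays at that moment

    def build_schedule(user_items: Dict[str, str], cues: List[str]) -> List[CueSchedule]:
        """Assign each user's item a highlight slot paired with a private crossmodal cue."""
        return [CueSchedule(user=u, slot_index=i, private_cue=cues[i % len(cues)])
                for i, u in enumerate(user_items)]

    def cue_time(schedule: CueSchedule, cycle_start: float) -> float:
        """Absolute time at which the personal device should fire its private cue."""
        return cycle_start + schedule.slot_index * SLOT_SECONDS

    user_items = {"alice": "gate B4", "bob": "platform 2", "carol": "meeting room 7"}
    for s in build_schedule(user_items, cues=["short vibration", "double vibration"]):
        print(s.user, "-> slot", s.slot_index, "| cue:", s.private_cue,
              "| at t =", cue_time(s, cycle_start=0.0))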
