
    Multimodal Content Analysis for Effective Advertisements on YouTube

    The rapid advances in e-commerce and Web 2.0 technologies have greatly increased the impact of commercial advertisements on the general public. As a key enabling technology, a multitude of recommender systems exist that analyze user features and browsing patterns to recommend appealing advertisements to users. In this work, we study the attributes that characterize an effective advertisement and recommend a useful set of features to aid the design and production of commercial advertisements. We analyze the temporal patterns in the multimedia content of advertisement videos, including auditory, visual and textual components, and study their individual roles and synergies in the success of an advertisement. The objective of this work is to measure the effectiveness of an advertisement and to recommend a useful set of features to advertisement designers to make it more successful and approachable to users. Our proposed framework employs the signal processing technique of cross-modality feature learning, in which data streams from the different components are used to train separate neural network models that are then fused to learn a shared representation. Subsequently, a neural network model trained on this joint feature embedding is used as a classifier to predict advertisement effectiveness. We validate our approach using subjective ratings from a dedicated user study, the sentiment strength of online viewer comments, and a viewer opinion metric based on the ratio of Likes to Views received by each advertisement on an online platform.
    Comment: 11 pages, 5 figures, ICDM 201
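    The fusion scheme described in this abstract can be sketched in miniature. The numpy code below is an illustrative stand-in, not the paper's actual model: each modality stream passes through its own small (here linear, randomly initialized) encoder, the embeddings are concatenated into a shared representation, and a sigmoid head scores effectiveness. All dimensions and weights are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """Per-modality encoder: linear projection + tanh nonlinearity."""
    return np.tanh(x @ w)

# Toy feature streams for one advertisement (audio, visual, text);
# the feature dimensions are arbitrary choices for illustration.
audio  = rng.normal(size=(1, 40))   # e.g. spectral statistics
visual = rng.normal(size=(1, 64))   # e.g. frame-level descriptors
text   = rng.normal(size=(1, 30))   # e.g. caption embeddings

# Independent encoders, one per modality, fused by concatenation
# into a shared joint embedding.
w_a, w_v, w_t = (rng.normal(size=(d, 16)) for d in (40, 64, 30))
fused = np.concatenate(
    [encode(audio, w_a), encode(visual, w_v), encode(text, w_t)], axis=1
)  # shape (1, 48)

# Classifier head on the joint embedding (weights again placeholders).
w_clf = rng.normal(size=(48, 1))
effectiveness = 1.0 / (1.0 + np.exp(-(fused @ w_clf)))  # sigmoid score in (0, 1)
print(fused.shape, round(float(effectiveness[0, 0]), 3))
```

    In the paper the per-modality encoders are trained neural networks and the classifier is trained on the joint embedding; the sketch only shows the data flow of the fusion step.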

    Transient Analysis for Music and Moving Images: Consideration for Television Advertising

    In audiovisual composition, coupling montage moving images with music is common practice. The effect of this coupling on an audioviewer's interpretation of the composition has so far been treated discursively and remains unquantified. A methodology for evaluating audiovisual multimodal interactivity is proposed, developing an analysis procedure via the study of modality-interdependent transient structures, explained as forming the foundation of perception via the concept of a Basic Exposure response to the stimulus. The research has implications for the analysis of all audiovisual media, with practical implications in television advertising as a discrete typology of target-driven audiovisual presentation. Examples from contemporary advertising are used to explore typical transient interaction patterns, the consequences of which are discussed from the practical viewpoint of the audiovisual composer.
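    The abstract does not specify its transient-analysis procedure at the implementation level. As a general illustration of how transients in the audio stream can be located, the sketch below uses half-wave-rectified spectral flux, a standard onset-detection measure; the frame size, hop size and test signal are arbitrary assumptions, not the paper's method.

```python
import numpy as np

def spectral_flux(signal, frame=256, hop=128):
    """Rectified frame-to-frame increase in magnitude-spectrum energy."""
    frames = [signal[i:i + frame] * np.hanning(frame)
              for i in range(0, len(signal) - frame, hop)]
    mags = np.abs(np.fft.rfft(np.array(frames), axis=1))
    diff = np.diff(mags, axis=0)
    return np.maximum(diff, 0.0).sum(axis=1)  # flux per frame transition

# Toy signal: silence with a sudden noise burst (a "transient") at sample 4000.
sr = 8000
sig = np.zeros(sr)
sig[4000:4200] = np.random.default_rng(1).normal(size=200)

flux = spectral_flux(sig)
onset_frame = int(np.argmax(flux))
print(onset_frame)  # frame index near the burst (hop of 128 samples)
```

    A peak in the flux curve marks a transient; comparing such peaks against cut points or motion peaks in the image stream is one plausible way to study the cross-modal transient interactions the paper discusses.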

    Lip2AudSpec: Speech reconstruction from silent lip movements video

    In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip-movement videos. We use the auditory spectrogram as the spectral representation of speech, together with its corresponding sound generation method, resulting in more natural-sounding reconstructed speech. Our proposed network consists of an autoencoder that extracts bottleneck features from the auditory spectrogram, which are then used as the target for our main lip-reading network, comprising CNN, LSTM and fully connected layers. Our experiments show that the autoencoder is able to reconstruct the original auditory spectrogram with a 98% correlation and also improves the quality of the speech reconstructed by the main lip-reading network. Our model, trained jointly on different speakers, is able to extract individual speaker characteristics and gives promising results in reconstructing intelligible speech with superior word recognition accuracy.
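    The bottleneck idea can be illustrated in closed form. The paper's autoencoder is a trained neural network; the sketch below substitutes a linear autoencoder fitted via truncated SVD (the optimal rank-k linear bottleneck), purely to show how a spectrogram is compressed to low-dimensional codes that a lip-reading network would then predict. The spectrogram is synthetic and the bottleneck size is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
spec = rng.normal(size=(200, 128))   # toy "spectrogram": 200 frames x 128 bins

# Closed-form linear autoencoder: the top-k right singular vectors give
# the best rank-k linear encode/decode pair in the least-squares sense.
k = 32
_, _, vt = np.linalg.svd(spec, full_matrices=False)
enc = vt[:k].T                       # 128 -> 32 bottleneck projection
code = spec @ enc                    # bottleneck features, one row per frame
recon = code @ enc.T                 # decode back to 128 bins

corr = float(np.corrcoef(spec.ravel(), recon.ravel())[0, 1])
print(code.shape, round(corr, 3))    # reconstruction correlation
```

    On real, highly structured auditory spectrograms a nonlinear autoencoder can reach the high reconstruction correlation the paper reports; on the unstructured random data here the correlation is necessarily lower.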

    Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

    This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken-interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system's robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations, in keeping with cognitive development; instead, it uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method using a large multi-person face-to-face interaction dataset. The results show good performance in a speaker-dependent setting; however, in a speaker-independent setting the proposed method yields significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions.
    Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
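    The self-supervision principle, audio supervising vision with no human labels, can be sketched as follows. Everything here is an illustrative assumption rather than the paper's pipeline: frames with high audio energy become positive "speaking" pseudo-labels, and a toy logistic classifier on synthetic visual features is trained against them.

```python
import numpy as np

rng = np.random.default_rng(3)
n_frames = 500
audio_energy = rng.gamma(shape=2.0, scale=1.0, size=n_frames)

# Pseudo-labels from the auditory modality: top-energy frames count as
# speech. The 70% quantile threshold is an arbitrary illustrative choice.
threshold = np.quantile(audio_energy, 0.7)
pseudo_labels = (audio_energy > threshold).astype(float)

# Toy "visual features" correlated with speaking (e.g. mouth motion):
# positives are shifted relative to negatives, plus noise.
visual = pseudo_labels[:, None] * 1.5 + rng.normal(size=(n_frames, 4))

# Plain logistic regression trained on the audio-derived pseudo-labels,
# i.e. the visual model never sees a human annotation.
w, b = np.zeros(4), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(visual @ w + b)))
    w -= 0.5 * (visual.T @ (p - pseudo_labels) / n_frames)
    b -= 0.5 * float(np.mean(p - pseudo_labels))

pred = (visual @ w + b) > 0.0
acc = float(np.mean(pred == pseudo_labels.astype(bool)))
print(round(acc, 3))
```

    At test time only the visual classifier is needed, which is what makes the approach useful when the acoustic channel is noisy.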

    Post-training load-related changes of auditory working memory: An EEG study

    Working memory (WM) refers to the temporary retention and manipulation of information, and its capacity is highly susceptible to training. Yet the neural mechanisms that allow for increased performance under demanding conditions are not fully understood. We expected that post-training efficiency in WM performance modulates neural processing during high-load tasks. We tested this hypothesis using electroencephalography (EEG) (N = 39) by comparing source-space spectral power of healthy adults performing low- and high-load auditory WM tasks. Prior to the assessment, participants underwent either modality-specific auditory WM training, modality-irrelevant tactile WM training, or no training (active control). After modality-specific training, participants showed higher behavioral performance compared to the control group. EEG data analysis revealed general effects of WM load, across all training groups, in the theta-, alpha-, and beta-frequency bands. With increased load, theta-band power increased over frontal areas and decreased over parietal areas. Centro-parietal alpha-band power and central beta-band power decreased with load. Interestingly, in the high-load condition a tendency toward reduced beta-band power in the right medial temporal lobe was observed in the modality-specific WM training group compared to the modality-irrelevant and active control groups. Our finding that WM processing during the high-load condition changed after modality-specific WM training, showing reduced beta-band activity in voice-selective regions, possibly indicates a more efficient maintenance of task-relevant stimuli. The general load effects suggest that WM performance under high load demands involves complementary mechanisms, combining a strengthening of task-relevant and a suppression of task-irrelevant processing.
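    The band-power measure underlying these load effects can be sketched in a few lines. The study's actual pipeline works in source space and is far more involved; the code below only shows the basic step of integrating a power spectral density over canonical theta (4-8 Hz), alpha (8-13 Hz) and beta (13-30 Hz) bands, using a synthetic single-channel signal and conventional band edges as assumptions.

```python
import numpy as np

fs = 250                                  # sampling rate in Hz
t = np.arange(0, 10, 1 / fs)              # 10 s of data
rng = np.random.default_rng(4)

# Synthetic "EEG": a strong 10 Hz alpha rhythm plus broadband noise.
eeg = 2.0 * np.sin(2 * np.pi * 10 * t) + rng.normal(scale=0.5, size=t.size)

# One-sided periodogram via the FFT.
psd = np.abs(np.fft.rfft(eeg)) ** 2 / (fs * t.size)
freqs = np.fft.rfftfreq(t.size, 1 / fs)
df = freqs[1] - freqs[0]

def band_power(lo, hi):
    """Total PSD within [lo, hi) Hz."""
    mask = (freqs >= lo) & (freqs < hi)
    return float(psd[mask].sum() * df)

theta, alpha, beta = band_power(4, 8), band_power(8, 13), band_power(13, 30)
print(round(theta, 4), round(alpha, 4), round(beta, 4))
```

    Comparing such band powers between low- and high-load conditions, per region and per training group, is the kind of contrast the reported theta-, alpha- and beta-band effects rest on.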