Search CORE

40,533 research outputs found

Multimodal Content Analysis for Effective Advertisements on YouTube

Author: Gupta Harsh
Johnson Joseph
Lee Hyunhwan
Ogihara Mitsunori
Parthasarathy Srinivasan
Ren Gang
Sun Wei
Vedula Nikhita
Publication venue
Publication date: 12/09/2017
Field of study

The rapid advances in e-commerce and Web 2.0 technologies have greatly increased the impact of commercial advertisements on the general public. As a key enabling technology, a multitude of recommender systems exists which analyzes user features and browsing patterns to recommend appealing advertisements to users. In this work, we seek to study the characteristics or attributes that characterize an effective advertisement and recommend a useful set of features to aid the designing and production processes of commercial advertisements. We analyze the temporal patterns from multimedia content of advertisement videos including auditory, visual and textual components, and study their individual roles and synergies in the success of an advertisement. The objective of this work is then to measure the effectiveness of an advertisement, and to recommend a useful set of features to advertisement designers to make it more successful and approachable to users. Our proposed framework employs the signal processing technique of cross modality feature learning where data streams from different components are employed to train separate neural network models and are then fused together to learn a shared representation. Subsequently, a neural network model trained on this joint feature embedding representation is utilized as a classifier to predict advertisement effectiveness. We validate our approach using subjective ratings from a dedicated user study, the sentiment strength of online viewer comments, and a viewer opinion metric of the ratio of the Likes and Views received by each advertisement from an online platform.Comment: 11 pages, 5 figures, ICDM 201

arXiv.org e-Print Archive

Crossref

Transient Analysis for Music and Moving Images: Consideration for Television Advertising

Author: Gibson Ian
Rogers Andrew
Publication venue: ICMC/SMC
Publication date: 01/01/2014
Field of study

In audiovisual composition, coupling montage moving images with music is common practice. Interpretation of the effect on an audioviewer's consequent interpretation of the composition is discursive and unquantified. Meth-odology for evaluating the audiovisual multimodal inter-activity is proposed, developing an analysis procedure via the study of modality interdependent transient structures, explained as forming the foundation of perception via the concept of Basic Exposure response to the stimulus. The research has implications for analysis of all audiovisual media, with practical implications in television advertis-ing as a discrete typology of target driven audiovisual presentation. Examples from contemporary advertising are used to explore typical transient interaction patterns and the consequences of which are discussed from the practical viewpoint of the audiovisual composer

University of Michigan Library Repository

University of Huddersfield Repository

Huddersfield Research Portal

BSA Practice guidance: an overview of current management of auditory processing disorder (APD)

Author: Alles R.
Bamiou D.
Batchelor L.
Campbell N.G. (Lead Author)
Canning D.
Grant P.
Luxon L.
Moore D.
Murray P.
Nairn S.
Rosen S.
Sirimanna T.
Treharne D.
Wakeham K.
Publication venue: British Society of Audiology
Publication date: 17/10/2011
Field of study

Southampton (e-Prints Soton)

Mental capital and wellbeing : making the most of ourselves in the 21st century : learning difficulties : future challenges

Author: Goswami Usha
Publication venue: Government Office for Science
Publication date: 01/01/2008
Field of study

Digital Education Resource Archive

Lip2AudSpec: Speech reconstruction from silent lip movements video

Author: Akbari Hassan
Arora Himani
Cao Liangliang
Mesgarani Nima
Publication venue
Publication date: 26/10/2017
Field of study

In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos. We use auditory spectrogram as spectral representation of speech and its corresponding sound generation method resulting in a more natural sounding reconstructed speech. Our proposed network consists of an autoencoder to extract bottleneck features from the auditory spectrogram which is then used as target to our main lip reading network comprising of CNN, LSTM and fully connected layers. Our experiments show that the autoencoder is able to reconstruct the original auditory spectrogram with a 98% correlation and also improves the quality of reconstructed speech from the main lip reading network. Our model, trained jointly on different speakers is able to extract individual speaker characteristics and gives promising results of reconstructing intelligible speech with superior word recognition accuracy

arXiv.org e-Print Archive

Crossref

Effects of familiarity and presentation mode on auditory-visual speech recognition in adults with aphasia

Author: Arkenberg Rachel Hahn
Gospel Mary
Publication venue: Digital Commons @ Butler University
Publication date: 01/01/2016
Field of study

Digital Commons @ Butler University

Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

Author: Beskow Jonas
Salvi Giampiero
Stefanov Kalin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their face. Furthermore, the method does not rely on external annotations, thus complying with cognitive development. Instead, the method uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method using a large multi-person face-to-face interaction dataset. The results show good performance in a speaker dependent setting. However, in a speaker independent setting the proposed method yields a significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions.Comment: 10 pages, IEEE Transactions on Cognitive and Developmental System

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

NORA - Norwegian Open Research Archives

Post-training load-related changes of auditory working memory: An EEG study

Author: Bruns P.
Donner T.
Engel A.
Gudi-Mindermann H.
Kloosterman N.
Rimmele J.
Röder B.
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2020
Field of study

Working memory (WM) refers to the temporary retention and manipulation of information, and its capacity is highly susceptible to training. Yet, the neural mechanisms that allow for increased performance under demanding conditions are not fully understood. We expected that post-training efficiency in WM performance modulates neural processing during high load tasks. We tested this hypothesis, using electroencephalography (EEG) (N = 39), by comparing source space spectral power of healthy adults performing low and high load auditory WM tasks. Prior to the assessment, participants either underwent a modality-specific auditory WM training, or a modality-irrelevant tactile WM training, or were not trained (active control). After a modality-specific training participants showed higher behavioral performance, compared to the control. EEG data analysis revealed general effects of WM load, across all training groups, in the theta-, alpha-, and beta-frequency bands. With increased load theta-band power increased over frontal, and decreased over parietal areas. Centro-parietal alpha-band power and central beta-band power decreased with load. Interestingly, in the high load condition a tendency toward reduced beta-band power in the right medial temporal lobe was observed in the modality-specific WM training group compared to the modality-irrelevant and active control groups. Our finding that WM processing during the high load condition changed after modality-specific WM training, showing reduced beta-band activity in voice-selective regions, possibly indicates a more efficient maintenance of task-relevant stimuli. The general load effects suggest that WM performance at high load demands involves complementary mechanisms, combining a strengthening of task-relevant and a suppression of task-irrelevant processing

MPG.PuRe