94,801 research outputs found

    "Sitting too close to the screen can be bad for your ears": A study of audio-visual location discrepancy detection under different visual projections

    Get PDF
    In this work, we look at the perception of event locality under conditions of disparate audio and visual cues. We address an aspect of the so called “ventriloquism effect” relevant for multi-media designers; namely, how auditory perception of event locality is influenced by the size and scale of the accompanying visual projection of those events. We observed that recalibration of the visual axes of an audio-visual animation (by resizing and zooming) exerts a recalibrating influence on the auditory space perception. In particular, sensitivity to audio-visual discrepancies (between a centrally located visual stimuli and laterally displaced audio cue) increases near the edge of the screen on which the visual cue is displayed. In other words,discrepancy detection thresholds are not fixed for a particular pair of stimuli, but are influenced by the size of the display space. Moreover, the discrepancy thresholds are influenced by scale as well as size. That is, the boundary of auditory space perception is not rigidly fixed on the boundaries of the screen; it also depends on the spatial relationship depicted. For example,the ventriloquism effect will break down within the boundaries of a large screen if zooming is used to exaggerate the proximity of the audience to the events. The latter effect appears to be much weaker than the former

    SMART-I²: A Spatial Multi-users Audio-visual Real Time Interactive Interface

    No full text
    International audienceThe SMART-I2 aims at creating a precise and coherent virtual environment by providing users with both audio and visual accurate localization cues. It is known that for audio rendering, Wave Field Synthesis, and for visual rendering, Tracked Stereoscopy, individually permit high quality spatial immersion within an extended space. The proposed system combines these two rendering approaches through the use of a large Multi-Actuator Panel used as both a loudspeaker array and as a projection screen, considerably reducing audio-visual incoherencies. The system performance has been confirmed by an objective validation of the audio interface and a perceptual evaluation of the audio-visual rendering

    Musical instrument classification using non-negative matrix factorization algorithms

    No full text
    In this paper, a class of algorithms for automatic classification of individual musical instrument sounds is presented. Several perceptual features used in general sound classification applications were measured for 300 sound recordings consisting of 6 different musical instrument classes (piano, violin, cello, flute, bassoon and soprano saxophone). In addition, MPEG-7 basic spectral and spectral basis descriptors were considered, providing an effective combination for accurately describing the spectral and timbrai audio characteristics. The audio flies were split using 70% of the available data for training and the remaining 30% for testing. A classifier was developed based on non-negative matrix factorization (NMF) techniques, thus introducing a novel application of NMF. The standard NMF method was examined, as well as its modifications: the local, the sparse, and the discriminant NMF. Experimental results are presented to compare MPEG-7 spectral basis representations with MPEG-7 basic spectral features alongside the various NMF algorithms. The results indicate that the use of the spectrum projection coefficients for feature extraction and the standard NMF classifier yields an accuracy exceeding 95%. ©2006 IEEE

    A Deep Representation for Invariance And Music Classification

    Get PDF
    Representations in the auditory cortex might be based on mechanisms similar to the visual ventral stream; modules for building invariance to transformations and multiple layers for compositionality and selectivity. In this paper we propose the use of such computational modules for extracting invariant and discriminative audio representations. Building on a theory of invariance in hierarchical architectures, we propose a novel, mid-level representation for acoustical signals, using the empirical distributions of projections on a set of templates and their transformations. Under the assumption that, by construction, this dictionary of templates is composed from similar classes, and samples the orbit of variance-inducing signal transformations (such as shift and scale), the resulting signature is theoretically guaranteed to be unique, invariant to transformations and stable to deformations. Modules of projection and pooling can then constitute layers of deep networks, for learning composite representations. We present the main theoretical and computational aspects of a framework for unsupervised learning of invariant audio representations, empirically evaluated on music genre classification.Comment: 5 pages, CBMM Memo No. 002, (to appear) IEEE 2014 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014

    Projection Mapping Motif Melayu

    Get PDF
    Teknik Projection mapping sebagai media baru menjadi peluang pemanfaatan teknologi multimedia yang mulai ramai diaplikasikan. Penelitian ini memanfaatkan teknologi yang biasa disebut dengan Augmented Reality Spatial sebagai pengembangan dan penyampaian informasi dengan mengangkat konten video motion graphic yang bertema motif melayu ditampilkan melalui proyektor dan ditembakkan pada gedung di Politeknik Negeri Batam. Metode penelitian dibagi pada beberapa tahapan dimulai dari observasi objek dan bidang, menentukan jumlah dan mempersiapkan proyektor, membuat replika virtual bidang projection, perancangan konten multimedia, mapping test, hingga pertunjukan projection mapping. Dari pengumpulan data responden yang melihat pertunjukan projection mapping didapatkan penilaian pada beberapa aspek kelayakan yaitu, kelayakan teknik projection mapping, kelayakan teknik audio dan kelayakan tema didominasi dengan hasil penilaian layak dan sangat layak. Pada aspek kelayakan visual dan kelayakan durasi harus dievaluasi kembali karena mendapatkan beberapa penilaian kurang layak

    Zero-Shot Audio Classification with Factored Linear and Nonlinear Acoustic-Semantic Projections

    Get PDF
    In this paper, we study zero-shot learning in audio classification through factored linear and nonlinear acoustic-semantic projections between audio instances and sound classes. Zero-shot learning in audio classification refers to classification problems that aim at recognizing audio instances of sound classes, which have no available training data but only semantic side information. In this paper, we address zero-shot learning by employing factored linear and nonlinear acoustic-semantic projections. We develop factored linear projections by applying rank decomposition to a bilinear model, and use nonlinear activation functions, such as tanh, to model the non-linearity between acoustic embeddings and semantic embeddings. Compared with the prior bilinear model, experimental results show that the proposed projection methods are effective for improving classification performance of zero-shot learning in audio classification.Comment: Accepted by ICASSP 202

    The SMART-I²: A new approach for the design of immersive audio-visual environments

    No full text
    International audienceThe SMART-I² aims at creating a precise and coherent virtual environment by providing users with both audio and visual accurate localization cues. Wave field synthesis, for audio rendering, and tracked passive stereoscopy, for visual rendering, individually permit high quality spatial immersion within an extended space. The proposed system combines these two rendering approaches through the use of large multi-actuator panels used as both loudspeaker arrays and as projection screens, providing a more perceptually consistent rendering

    End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss

    Full text link
    Cross-modality retrieval encompasses retrieval tasks where the fetched items are of a different type than the search query, e.g., retrieving pictures relevant to a given text query. The state-of-the-art approach to cross-modality retrieval relies on learning a joint embedding space of the two modalities, where items from either modality are retrieved using nearest-neighbor search. In this work, we introduce a neural network layer based on Canonical Correlation Analysis (CCA) that learns better embedding spaces by analytically computing projections that maximize correlation. In contrast to previous approaches, the CCA Layer (CCAL) allows us to combine existing objectives for embedding space learning, such as pairwise ranking losses, with the optimal projections of CCA. We show the effectiveness of our approach for cross-modality retrieval on three different scenarios (text-to-image, audio-sheet-music and zero-shot retrieval), surpassing both Deep CCA and a multi-view network using freely learned projections optimized by a pairwise ranking loss, especially when little training data is available (the code for all three methods is released at: https://github.com/CPJKU/cca_layer).Comment: Preliminary version of a paper published in the International Journal of Multimedia Information Retrieva
    corecore