
    Backward Compatible Spatialized Teleconferencing based on Squeezed Recordings

    Commercial teleconferencing systems currently available, although offering sophisticated video stimulus of the remote participants, commonly employ only mono or stereo audio playback for the user. However, in teleconferencing applications where there are multiple participants at multiple sites, spatializing the audio reproduced at each site (using headphones or loudspeakers) to help listeners distinguish between participating speakers can significantly improve the meeting experience (Baldis, 2001; Evans et al., 2000; Ward & Elko, 1999; Kilgore et al., 2003; Wrigley et al., 2009; James & Hawksford, 2008). An example is Vocal Village (Kilgore et al., 2003), which uses online avatars to co-locate remote participants over the Internet in virtual space, with audio spatialized over headphones. This system adds speaker location cues to monaural speech to create a user-manipulable soundfield that matches each avatar's position in the virtual space. Giving participants the freedom to manipulate the acoustic location of other participants in the rendered sound scene has been shown to improve multitasking performance (Wrigley et al., 2009). A system for multiparty teleconferencing first requires a stage for recording speech from multiple participants at each site. These signals then need to be compressed to allow for efficient transmission of the spatial speech. One approach is to use close-talking microphones to record each participant (e.g. lapel microphones) and then encode each speech signal separately prior to transmission (James & Hawksford, 2008).
Alternatively, for increased flexibility, a microphone array located at a central point on, say, a meeting table can be used to generate a multichannel recording of the meeting speech. A microphone array approach is adopted in this work; it allows the recordings to be processed to identify the relative spatial locations of the sources, and permits multichannel speech enhancement techniques to improve recording quality in noisy environments. For efficient transmission of the recorded signals, the approach also requires a multichannel compression technique suited to spatially recorded speech signals.
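The spatialization idea above, adding location cues to monaural speech so listeners can separate talkers, can be sketched with a toy interaural time/level difference panner. All names here are illustrative assumptions (as is the 0.7 ms maximum ITD, a common head-model figure); deployed spatializers typically use HRTF filtering instead.

```python
import math

def spatialize_mono(samples, azimuth_deg, fs=16000, max_itd_s=0.0007):
    """Toy stereo spatializer: add interaural time and level
    differences (ITD/ILD) to a mono speech signal so a headphone
    listener hears the talker at roughly azimuth_deg degrees
    (0 = front, +90 = hard right)."""
    az = math.radians(azimuth_deg)
    # Far-ear delay in samples, at most max_itd_s seconds.
    delay = abs(int(round(max_itd_s * fs * math.sin(az))))
    # Simple sine/cosine level-panning law.
    gain_l = math.cos((az + math.pi / 2) / 2)
    gain_r = math.sin((az + math.pi / 2) / 2)

    def delayed(n):
        return samples[n - delay] if n - delay >= 0 else 0.0

    left, right = [], []
    for n, x in enumerate(samples):
        if az >= 0:  # source on the right: left ear hears it later
            left.append(gain_l * delayed(n))
            right.append(gain_r * x)
        else:        # source on the left: right ear hears it later
            left.append(gain_l * x)
            right.append(gain_r * delayed(n))
    return left, right
```

Each talker identified by the array processing could be rendered at a distinct azimuth before mixing, which is the cue that helps listeners segregate the voices.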

    Reviews on Technology and Standard of Spatial Audio Coding

    Market demands for more immersive entertainment media have motivated the delivery of three-dimensional (3D) audio content to home consumers through Ultra High Definition TV (UHDTV), the next generation of TV broadcasting, in which spatial audio coding plays a fundamental role. This paper reviews the fundamental concepts of spatial audio coding, including its technology, standards, and applications. The basic principle of object-based audio reproduction systems is also elaborated and compared to the traditional channel-based approach, to provide a good understanding of this popular interactive audio reproduction system, which gives end users the flexibility to render their own preferred audio composition. Keywords: spatial audio, audio coding, multi-channel audio signals, MPEG standard, object-based audio
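The object-based principle the review elaborates, audio essence carried alongside position metadata and rendered at the receiver, can be illustrated with a minimal sketch. The stereo layout, dictionary format, and cosine panning law below are assumptions for illustration, not the renderer of any MPEG standard.

```python
import math

# Hypothetical loudspeaker layout: azimuths (degrees) of a stereo pair.
SPEAKERS = [-30.0, 30.0]

def pan_gains(azimuth_deg, spread=60.0):
    """Toy amplitude panning between the two speakers."""
    # Map azimuth in [-30, +30] degrees to a pan position p in [0, 1].
    p = min(max((azimuth_deg - SPEAKERS[0]) / spread, 0.0), 1.0)
    return [math.cos(p * math.pi / 2), math.sin(p * math.pi / 2)]

def render_objects(objects, n_samples):
    """Mix audio objects (samples plus azimuth metadata) into channels.
    Because rendering happens at the receiver, an end user may change
    any object's 'azimuth' (or gain) before this call -- the
    flexibility that channel-based delivery lacks."""
    out = [[0.0] * n_samples for _ in SPEAKERS]
    for obj in objects:
        gains = pan_gains(obj["azimuth"])
        for ch, g in enumerate(gains):
            for n, x in enumerate(obj["samples"][:n_samples]):
                out[ch][n] += g * x
    return out
```

A channel-based system would instead transmit the already-mixed `out` arrays, fixing every source position at production time.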

    Ambisonics

    This open access book provides a concise explanation of the fundamentals and background of the surround sound recording and playback technology Ambisonics. It equips readers with the psychoacoustical, signal processing, acoustical, and mathematical knowledge needed to understand the inner workings of modern processing utilities and of special equipment for recording, manipulation, and reproduction in the higher-order Ambisonic format. The book comes with various practical examples based on free software tools and open scientific data for reproducible research. The book’s introductory section offers a perspective on Ambisonics spanning from the origins of coincident recordings in the 1930s to the Ambisonic concepts of the 1970s, as well as classical ways of applying Ambisonics in first-order coincident sound scene recording and reproduction that have been practiced since the 1980s. As the underlying mathematics becomes quite involved at times, the book includes an extensive mathematical appendix so that the main text remains readable without sacrificing rigor. The book offers readers a deeper understanding of Ambisonic technologies, and will especially benefit scientists, audio-system engineers, and audio-recording engineers. In the advanced sections of the book, fundamentals and modern techniques such as higher-order Ambisonic decoding, 3D audio effects, and higher-order recording are explained. Those techniques are shown to be suitable for audience areas ranging from studio-sized rooms to venues holding hundreds of listeners, as well as headphone-based playback, regardless of whether the 3D audio material is live, interactive, or studio-produced.
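The first-order (B-format) encode/decode that the book's higher-order treatment generalizes can be sketched in a few lines. The sqrt(2) weighting on W follows the traditional B-format convention, and the projection decoder is the simplest textbook form, not a production decoder (which would add max-rE weighting, dual-band decoding, and so on).

```python
import math

def encode_fo_ambisonics(s, azimuth, elevation=0.0):
    """Encode a mono sample into traditional first-order B-format
    (W, X, Y, Z), with the conventional -3 dB weight on W."""
    w = s / math.sqrt(2.0)
    x = s * math.cos(azimuth) * math.cos(elevation)
    y = s * math.sin(azimuth) * math.cos(elevation)
    z = s * math.sin(elevation)
    return w, x, y, z

def decode_basic(b, speaker_azimuths):
    """Basic (projection) decode of horizontal B-format to a ring of
    loudspeakers: each feed is the soundfield sampled in that
    speaker's direction, scaled by 1/N."""
    w, x, y, _ = b
    n = len(speaker_azimuths)
    return [(math.sqrt(2.0) * w + x * math.cos(a) + y * math.sin(a)) / n
            for a in speaker_azimuths]
```

A source encoded straight ahead and decoded to a square ring puts the largest feed on the front speaker and zero on the rear one, which is the panning behavior first-order Ambisonics provides.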

    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.
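The parametric idea, storing a sound as a small set of pitch and envelope parameters and resynthesizing from them rather than from concatenated frames, can be sketched as follows. Piecewise-linear envelopes stand in for the model's spline curves, and the fixed 1/k² harmonic rolloff and all names are illustrative assumptions, not the PSOS parameter set.

```python
import math

def interp_envelope(breakpoints, t):
    """Evaluate an envelope given as (time, value) breakpoints by
    piecewise-linear interpolation, holding the end values."""
    t0, v0 = breakpoints[0]
    if t <= t0:
        return v0
    for t1, v1 in breakpoints[1:]:
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        t0, v0 = t1, v1
    return v0

def synthesize(f0, amp_env, n_harmonics=5, dur=0.5, fs=8000):
    """Resynthesize a harmonic sound object from parameters alone:
    a fundamental frequency, an amplitude envelope, and a fixed
    1/k^2 rolloff across harmonics (additive synthesis)."""
    out = []
    for n in range(int(dur * fs)):
        t = n / fs
        a = interp_envelope(amp_env, t)
        s = sum((a / (k * k)) * math.sin(2 * math.pi * k * f0 * t)
                for k in range(1, n_harmonics + 1))
        out.append(s)
    return out
```

Because the whole sound is a handful of breakpoint lists rather than thousands of frames, operations like clustering or principal component analysis can act directly on the compact parameter vectors.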

    Backwards is the way forward: feedback in the cortical hierarchy predicts the expected future

    Clark offers a powerful description of the brain as a prediction machine, one that makes progress on two distinct levels. First, on an abstract conceptual level, it provides a unifying framework for perception, action, and cognition (including subdivisions such as attention, expectation, and imagination). Second, hierarchical prediction offers progress on a concrete descriptive level for testing and constraining conceptual elements and mechanisms of predictive coding models (estimation of predictions, prediction errors, and internal models).

    Attention is more than prediction precision [Commentary on target article]

    A cornerstone of the target article is that, in a predictive coding framework, attention can be modelled by weighting prediction error with a measure of precision. We argue that this is not a complete explanation, especially in the light of ERP (event-related potential) data showing large evoked responses for frequently presented target stimuli, which are thus predicted.
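The precision-weighting account the commentary targets can be reduced to one line: attention scales the gain on prediction error. A toy single-level sketch (illustrative names, not a full hierarchical model):

```python
def precision_weighted_update(belief, observation, precision, lr=0.1):
    """One predictive-coding style update step. The prediction error
    (observation minus current belief/prediction) is weighted by
    precision -- the inverse-variance gain that the target article
    equates with attention -- before it revises the belief."""
    error = observation - belief
    return belief + lr * precision * error
```

Under this scheme a well-predicted stimulus yields a small error and hence a small precision-weighted response, which is exactly why large evoked responses to frequent, well-predicted target stimuli are awkward for the pure precision account.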

    Early Somatosensory Processing and Crossmodal Influences

    Sensory stimuli from distinct modalities are continuously linked together by the brain to create a cohesive percept of the surrounding environment—a process known as multisensory integration. Furthermore, sensory information from one modality has been shown to alter the processing of another modality. This phenomenon, now referred to as crossmodal sensory integration, has led to an abundance of research, with many studies reporting enhanced cortical responses when stimuli from different modalities (e.g., visual) occur in close temporal proximity to the onset of a tactile stimulus. Due to the current COVID-19 pandemic, a time-frequency analysis (event-related spectral perturbation) of two related datasets (Faerman & Staines, 2019; Popovich & Staines, 2014) was performed in the current work. In both studies, participants were asked to attend only to crossmodal stimuli and to determine the amplitude of both the visually presented horizontal bars and vibrotactile stimuli, while electroencephalography (EEG) was recorded. Conditions involved several blocks of randomized trials with different temporal latencies between the onset of visual and tactile stimuli (i.e., 0-100ms, 100-200ms, 200-300ms). In addition, participants applied a graded motor response using a pressure-sensitive bulb, meant to represent the summation of both stimulus amplitudes. Researchers found that P50 amplitude was greatest in conditions where visual stimuli preceded tactile stimuli with later latencies of onset (0-100ms for Popovich & Staines (2014); 200-300ms for Faerman & Staines (2019)). Given the P50 modulation reported in the studies above, the objective of the current work was to examine excitability changes of parietal cortex using (de)synchronizations mainly in the beta, alpha, and theta frequency bands, believed to occur in response to a task where both the timing and relevance of crossmodal (visual-tactile) events were manipulated.
The rationale for this approach is supported by past studies that have demonstrated links between beta, alpha, and theta (de)synchronizations and a role in both sensorimotor integration and certain attentional processes (Barutchu et al., 2013; Lalo, Gilbertson, & Doyle, 2007; Siegel, Warden, & Miller). (De)synchronizations of neuronal activity are connected to the coupling and uncoupling of functional networks in the brain. It is therefore believed that repetitive, synchronous neuronal firing promotes the activation of functional networks, because it increases the chances that neurons entrain each other in synchronous firing, and vice versa (Bastiaansen, Mazaheri, & Jensen, 2012). With this background in mind, the general hypotheses were that beta band (13-30Hz) synchronization would be greatest when a visual stimulus preceded a tactile stimulus by 100ms compared to when a tactile stimulus preceded a visual stimulus by 100ms, and that both theta and alpha synchronization would be influenced by the interaction of attention and top-down/bottom-up influences, represented by the attentional demand and the temporal relationships of the sensory stimuli. A one-way repeated measures analysis of variance (RM-ANOVA) confirmed a strong effect of stimulus for the theta frequency at frontal sites, with Tukey’s post-hoc tests revealing a significant difference between the experimental condition where visual and tactile stimuli were presented simultaneously and the condition where tactile stimuli preceded visual stimuli by 100ms. A main effect of stimulus was also found for the alpha frequency range at central-parietal sites, with Tukey’s post-hoc tests revealing a significant difference when visual information preceded tactile stimuli by 100-200ms and 200-300ms.
It is quite possible that the crossmodal nature of the task used in both experiments is driving, at least in part, the alpha-theta synchronizations discussed, perhaps in a similar manner to the modulations of specific ERP components (i.e., P50, P100) reported in previous studies; however, further research must be conducted to provide clarity.
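The event-related spectral perturbation analysis described above, per-trial band power expressed in dB relative to a pre-stimulus baseline and averaged over trials, can be sketched as follows. A single-frequency DFT bin stands in for the wavelet or multitaper decompositions typically used for EEG; the function names are illustrative, not EEGLAB's API.

```python
import math

def band_power(x, fs, f):
    """Power of a signal window x (sampled at fs) at frequency f,
    via a direct DFT projection onto that bin."""
    n = len(x)
    re = sum(v * math.cos(2 * math.pi * f * i / fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * f * i / fs) for i, v in enumerate(x))
    return (re * re + im * im) / n

def ersp(trials, fs, f, win, baseline_wins):
    """Event-related spectral perturbation at one frequency:
    compute windowed power per trial, express it in dB relative to
    the mean power of the first baseline_wins (pre-stimulus) windows,
    then average the dB curves over trials."""
    curves = []
    for tr in trials:
        powers = [band_power(tr[i:i + win], fs, f)
                  for i in range(0, len(tr) - win + 1, win)]
        base = sum(powers[:baseline_wins]) / baseline_wins
        curves.append([10 * math.log10(p / base + 1e-12) for p in powers])
    n = len(curves)
    return [sum(c[k] for c in curves) / n for k in range(len(curves[0]))]
```

Positive values then index event-related synchronization in that band (e.g., beta, alpha, or theta) and negative values desynchronization, relative to the pre-stimulus state.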

    Engineering Data Compendium. Human Perception and Performance, Volume 1

    The Engineering Data Compendium was the product of an R&D program (the Integrated Perceptual Information for Designers project) aimed at facilitating the application of basic research findings in human performance to the design of military crew systems. The principal objective was to develop a workable strategy for: (1) identifying and distilling information of potential value to system design from the existing research literature, and (2) presenting this technical information in a way that would aid its accessibility, interpretability, and applicability for system designers. The present four volumes of the Engineering Data Compendium represent the first implementation of this strategy. This is Volume 1, which contains sections on Visual Acquisition of Information, Auditory Acquisition of Information, and Acquisition of Information by Other Senses.

    Activity in area V3A predicts positions of moving objects

    No description supplied