
    Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks

    The propagation of sound in a shallow water environment is characterized by boundary reflections from the sea surface and sea floor. These reflections result in multiple (indirect) sound propagation paths, which can degrade the performance of passive sound source localization methods. This paper proposes the use of convolutional neural networks (CNNs) for the localization of sources of broadband acoustic radiated noise (such as motor vessels) in shallow water multipath environments. It is shown that CNNs operating on cepstrogram and generalized cross-correlogram inputs are able to more reliably estimate the instantaneous range and bearing of transiting motor vessels when the source localization performance of conventional passive ranging methods is degraded. The ensuing improvement in source localization performance is demonstrated using real data collected during an at-sea experiment.
    Comment: 5 pages, 5 figures. Final draft of paper submitted to 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 15-20 April 2018 in Calgary, Alberta, Canada. arXiv admin note: text overlap with arXiv:1612.0350
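The generalized cross-correlogram mentioned in this abstract is built from generalized cross-correlation (GCC) between sensor pairs. The following is a minimal sketch of one GCC frame, assuming the common PHAT weighting; the abstract does not say which weighting the authors use, and the function name `gcc_phat` and its parameters are illustrative only.

```python
import numpy as np

def gcc_phat(x, y, fs):
    """Estimate the delay (in seconds) of y relative to x with GCC-PHAT.

    Assumption: PHAT weighting is used here as a common choice; the
    paper's actual weighting is not stated in the abstract.
    """
    n = len(x) + len(y)                  # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)        # generalized cross-correlation
    max_shift = n // 2
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

# Usage: a white-noise signal delayed by 7 samples.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
delay_samples = 7
y = np.concatenate((np.zeros(delay_samples), x[:-delay_samples]))
tau = gcc_phat(x, y, fs=1000.0)
```

Stacking such correlation vectors over successive time frames would give a correlogram image of the kind a CNN could take as input.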

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    RRL: A Rich Representation Language for the Description of Agent Behaviour in NECA

    In this paper, we describe the Rich Representation Language (RRL), which is used in the NECA system. The NECA system generates interactions between two or more animated characters. The RRL is a formal framework for representing the information that is exchanged at the interfaces between the various NECA system modules.

    Acoustic simultaneous localization and mapping (A-SLAM) of a moving microphone array and its surrounding speakers

    Acoustic scene mapping creates a representation of positions of audio sources such as talkers within the surrounding environment of a microphone array. By allowing the array to move, the acoustic scene can be explored in order to improve the map. Furthermore, the spatial diversity of the kinematic array allows for estimation of the source-sensor distance in scenarios where source directions of arrival are measured. As sound source localization is performed relative to the array position, mapping of acoustic sources requires knowledge of the absolute position of the microphone array in the room. If the array is moving, its absolute position is unknown in practice. Hence, Simultaneous Localization and Mapping (SLAM) is required in order to localize the microphone array position and map the surrounding sound sources. In realistic environments, microphone arrays receive a convolutive mixture of direct-path speech signals, noise and reflections due to reverberation. A key challenge of Acoustic SLAM (A-SLAM) is robustness against reverberant clutter measurements and missing source detections. This paper proposes a novel bearing-only A-SLAM approach using a Single-Cluster Probability Hypothesis Density filter. Results demonstrate convergence to accurate estimates of the array trajectory and source positions.

    Source bearing and steering-vector estimation using partially calibrated arrays

    The problem of source direction-of-arrival (DOA) estimation using a sensor array is addressed, where some of the sensors are perfectly calibrated, while others are uncalibrated. An algorithm is proposed for estimating the source directions in addition to the estimation of unknown array parameters such as sensor gains and phases, as a way of performing array self-calibration. The cost function is an extension of the maximum likelihood (ML) criteria that were originally developed for DOA estimation with a perfectly calibrated array. A particle swarm optimization (PSO) algorithm is used to explore the high-dimensional problem space and find the global minimum of the cost function. The design of the PSO is a combination of the problem-independent kernel and some newly introduced problem-specific features such as search space mapping, particle velocity control, and particle position clipping. This architecture plus properly selected parameters make the PSO highly flexible and reusable, while being sufficiently specific and effective in the current application. Simulation results demonstrate that the proposed technique may produce more accurate estimates of the source bearings and unknown array parameters more cheaply than other popular methods, with the root-mean-squared error (RMSE) approaching and asymptotically attaining the Cramér-Rao bound (CRB) even in unfavorable conditions.
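The PSO features named in this abstract (velocity control and position clipping) can be illustrated with a generic sketch. This is not the authors' design: the function `pso_minimize`, its parameter values, and the toy quadratic cost are all assumptions chosen for illustration, and the ML cost function and search space mapping from the paper are not reproduced.

```python
import numpy as np

def pso_minimize(cost, lower, upper, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, v_frac=0.2, seed=0):
    """Generic PSO with velocity clamping and position clipping.

    All parameter values are illustrative assumptions, not taken from
    the paper.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    span = upper - lower
    v_max = v_frac * span                        # velocity control limit
    pos = lower + rng.random((n_particles, lower.size)) * span
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                        # per-particle best positions
    best_val = np.array([cost(p) for p in pos])
    g = int(np.argmin(best_val))
    g_pos, g_val = best_pos[g].copy(), best_val[g]
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g_pos - pos)
        vel = np.clip(vel, -v_max, v_max)        # particle velocity control
        pos = np.clip(pos + vel, lower, upper)   # particle position clipping
        vals = np.array([cost(p) for p in pos])
        improved = vals < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = vals[improved]
        g = int(np.argmin(best_val))
        if best_val[g] < g_val:
            g_pos, g_val = best_pos[g].copy(), best_val[g]
    return g_pos, g_val

# Usage: a toy quadratic cost standing in for the (unspecified) ML criterion.
target = np.array([1.0, -2.0])
est, val = pso_minimize(lambda p: float(np.sum((p - target) ** 2)),
                        lower=[-5.0, -5.0], upper=[5.0, 5.0])
```

Clipping positions to the bounds keeps every candidate inside the valid parameter range, while the velocity limit stops particles from overshooting the region around the current best estimate.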

    Activity theory: A framework for analysing intercultural academic activity

    This article suggests that Activity Theory (AT) can be applied as a holistic framework to analyse the complex sociocultural issues that arise when academics wish to engage in collaborative activity across institutional and cultural boundaries. Attention will initially focus on how Activity Theory, first formulated in the 1930s by Leont’ev (1978), and subsequently developed into a second generation by Engeström (1987), can help to analyse and illuminate the inherent complexity within any one community of practice. A more elaborate model of AT (Engeström, 2001) is currently being developed and applied to analyse and illuminate collaborative activity across institutional boundaries, so as to transform discourse communities into speech communities of practice through expansive learning. It is suggested that this ‘third generation’ model can be further refined to analyse specific contact zones, within and between activity systems, as a precursor to undertaking collaborative activity. It is suggested that, when discourse communities deriving from different culturally diverse traditions seek to work together, such an a priori analysis would enable potential areas for miscommunication and misconstrual to be identified and possibly resolved before collaborative activity actually commences.

    A user perspective of quality of service in m-commerce

    This is the post-print version of the Article. The official published version can be accessed from the link below - Copyright @ 2004 Springer Verlag.
    In an m-commerce setting, the underlying communication system will have to provide a Quality of Service (QoS) in the presence of two competing factors: network bandwidth and, as the pressure to add value to the business-to-consumer (B2C) shopping experience by integrating multimedia applications grows, increasing data sizes. In this paper, developments in the area of QoS-dependent multimedia perceptual quality are reviewed and are integrated with recent work focusing on QoS for e-commerce. Based on previously identified user perceptual tolerance to varying multimedia QoS, we show that enhancing the m-commerce B2C user experience with multimedia, far from being an idealised scenario, is in fact feasible if perceptual considerations are employed.

    Convexity in source separation: Models, geometry, and algorithms

    Source separation or demixing is the process of extracting multiple components entangled within a signal. Contemporary signal processing presents a host of difficult source separation problems, from interference cancellation to background subtraction, blind deconvolution, and even dictionary learning. Despite the recent progress in each of these applications, advances in high-throughput sensor technology place demixing algorithms under pressure to accommodate extremely high-dimensional signals, separate an ever larger number of sources, and cope with more sophisticated signal and mixing models. These difficulties are exacerbated by the need for real-time action in automated decision-making systems. Recent advances in convex optimization provide a simple framework for efficiently solving numerous difficult demixing problems. This article provides an overview of the emerging field, explains the theory that governs the underlying procedures, and surveys algorithms that solve them efficiently. We aim to equip practitioners with a toolkit for constructing their own demixing algorithms that work, as well as concrete intuition for why they work.