
    Self-localization in Ad Hoc Indoor Acoustic Networks

    The increasing use of mobile technology in everyday life has aroused interest in developing new ways of utilizing the data collected by devices such as mobile phones and wearable devices. Acoustic sensors can be used to localize sound sources if the positions of the spatially separated sensors are known or can be determined. However, determining the 3D coordinates by manual measurement is tedious, especially as the number of sensors increases, so the localization process has to be automated. Satellite-based positioning is imprecise for many applications and requires a line of sight to the sky. This thesis studies localization methods for wireless acoustic sensor networks; the process is called self-localization.

    This thesis focuses on self-localization from sound, and therefore the term acoustic is used. Furthermore, the methods are developed for ad hoc sensor networks, meaning that the sensors are not necessarily installed in premises such as meeting rooms and other purpose-built spaces, which often have dedicated audio hardware for spatial audio applications. Instead of relying on such spaces and equipment, mobile devices are combined to form sensor networks.

    For instance, a few mobile phones laid on a table can form a sensor network built for a single event and dismantled once the event is over, which explains the term ad hoc. Once the positions of the devices are estimated, the network can be used for spatial applications such as sound source localization and audio enhancement via spatial filtering. The main purpose of this thesis is to present methods for the self-localization of such ad hoc acoustic sensor networks. Using off-the-shelf devices to establish ad hoc sensor networks enables the implementation of many spatial algorithms in virtually any environment.

    Several acoustic self-localization methods have been introduced over the years, but they often rely on specialized hardware and calibration signals. This thesis presents passive methods that utilize environmental sounds such as speech, from which the spatial information of the sensor network can be determined by time delay estimation. Many previous self-localization methods assume that the audio captured by the sensors is synchronized. This assumption cannot be made in an ad hoc sensor network, since the sensors are unaware of each other without specific signaling, which is not available without special arrangement.

    The methods developed in this thesis are evaluated with simulations and real data recordings. Scenarios in which the targets of positioning are stationary and in motion are studied. The real-world recordings are made in closed spaces such as meeting rooms, with the targets approximately 1–5 meters apart. The positioning accuracy is approximately five centimeters in the stationary scenario and ten centimeters in the moving-target scenario on average. The most important result of this thesis is the first self-localization method that uses environmental sounds and off-the-shelf unsynchronized devices while allowing the targets of self-localization to move.
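
    The abstract names time delay estimation between unsynchronized devices as the core measurement but gives no implementation detail. A common way to obtain such delays from environmental sound is generalized cross-correlation with phase transform (GCC-PHAT); the sketch below is a minimal illustration under that assumption, and the sample rate, signal names, and synthetic test signal are invented for the example rather than taken from the thesis.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` via GCC-PHAT (seconds)."""
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    # Cross-power spectrum whitened by its magnitude (the PHAT weighting).
    R = SIG * np.conj(REF)
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so lags run from -max_shift to +max_shift, then pick the peak.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs  # positive: `sig` lags `ref`

# Hypothetical usage with two captures of the same environmental sound.
fs = 16000
src = np.random.randn(fs)            # stand-in for a speech burst
mic_a = src
mic_b = np.roll(src, 40)             # 40-sample delay, i.e. 2.5 ms
print(gcc_phat(mic_b, mic_a, fs))    # prints approximately 0.0025
```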

    Acoustic Sensor Localization Techniques Using Artificial Acoustic Signals in Reverberant Environments

    Doctoral dissertation, Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, August 2017. Nam Soo Kim. Widespread use of smart devices has brought about a growth in user-customized services. In particular, localization techniques have been gaining attention due to the increase in location-based services (LBS). Most LBS, such as navigation systems, traffic alerts, or augmented reality (AR) services, depend on GPS for accuracy and speed; however, GPS operation is limited to outdoor environments. Demand for indoor LBS is growing rapidly with the spread of home automation and IoT technology. There have been studies using WiFi, Bluetooth, or RFID, but their performance has been unsatisfactory owing to limitations such as the need for additional equipment or a guaranteed line of sight. Among the various sensors used for indoor localization, we focus on acoustic sensors, i.e. microphones. There are several advantages to using acoustic signals for indoor localization. No additional apparatus is needed, since loudspeakers are pre-installed in most buildings for announcements or background music, and mobile devices such as cellphones or tablets are equipped with microphones and loudspeakers. The prevailing popularity of IoT services further improves the accessibility of acoustic sensors and loudspeakers. In addition, unlike cameras or RFID, acoustic signals can be detected through obstacles.

    In this thesis, we propose a position estimation system using acoustic signals to maximize these advantages. We aim to estimate the position of a target user carrying an acoustic sensor, based on recordings of signals from fixed loudspeakers installed around the room. The goal is to estimate the position of the acoustic sensor with high accuracy and low complexity in a large space with high reverberation. In particular, we avoid affecting human hearing by using inaudible frequency bands. In order to estimate the position, it is important to estimate the direct-path signal rather than signals due to reverberation or reflection. To this end, we present the following localization techniques.

    First, we propose a source data structure that operates in large reverberant environments. In a large space, the near-far effect must be considered: when the desired signal source is far away, it is difficult to receive the desired signal because of interference from closer unwanted signals. In wireless communications this can be handled by interaction between transmitter and receiver via feedback of channel information; in an acoustic system this is difficult, since there is no feedback between transmitter and receiver. We borrow the OFDMA-CDM structure and modify it to deal with the near-far effect. In reverberant environments the amplitude of the reverberation is often larger than that of the direct-path signal, so we also propose a technique to estimate the direct-path signal.

    Second, we propose a method for accurate location estimation in highly reverberant environments. Under high reverberation, more spurious reflections occur, which makes it difficult to estimate the time delay of the direct-path signal. If the time delay estimate is wrong, the position estimate is likely not to converge. In the proposed method, position candidates are obtained from most of the received signals, including even signals from spurious reflections.
Unreliable candidates are filtered out by an agreement test, and the remaining candidates are ranked by reliability to find an accurate target position. Using the proposed method, we can estimate the receiver's position even under an attenuated direct-path signal or high reverberation.

    Third, we propose a low-complexity localization method that works in highly reverberant environments. This method is based on a particle filter that estimates the position from weighted particles whose weights are computed from a likelihood function. We design a likelihood function that is evaluated efficiently in the region containing the direct-path signal, so that a more reliable position can be obtained. The proposed method enables high-precision location estimation with a relatively small amount of computation under severe reverberation.

    The proposed methods are evaluated in simulated environments with different reverberation times. Their performance is verified over different parameters and compared with other localization methods. In addition, performance is evaluated in a real, large, reverberant space. A series of experiments demonstrates the superiority of the proposed methods and their suitability for real environments.

    Contents: 1 Introduction; 2 Acoustic Receiver Localization System (2.1 Source data structure; 2.2 Localization from the received signal; 2.3 TDE in reverberant environments; 2.4 Near-far effect); 3 Indoor Localization using Inaudible Acoustic Signals (3.1 Introduction; 3.2 Acoustic source design and synchronization: 3.2.1 Reverberation in multipath environments, 3.2.2 Source data structure for ARL, 3.2.3 Signal presence detection, 3.2.4 Direct path detection; 3.3 Performance evaluation: 3.3.1 Experimental setup and system configuration, 3.3.2 Evaluation of acoustic data structure, 3.3.3 Performance of the direct path detection algorithm, 3.3.4 Performance in a real room; 3.4 Summary); 4 Robust Time Delay Estimation for Acoustic Indoor Localization in Reverberant Environments (4.1 Introduction; 4.2 Robust TDE; 4.3 Performance evaluation: 4.3.1 Performance evaluation in a real room, 4.3.2 Performance evaluation in simulated reverberant conditions; 4.4 Summary); 5 Indoor Localization Based on Particle Filtering (5.1 Introduction; 5.2 A framework of positioning method using particle filter: 5.2.1 State and dynamic models, 5.2.2 Bayesian framework using particle filter, 5.2.3 Likelihood function; 5.3 ARL in reverberant environment: 5.3.1 Peak quality, 5.3.2 Efficient calculation of the likelihood function, 5.3.3 Finding the direct path region; 5.4 Performance evaluation: 5.4.1 Performance in a simulated environment, 5.4.2 Performance in the actual environment; 5.5 Summary); 6 Conclusions; Bibliography; Abstract in Korean (요약).
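
    The particle-filter localization of Chapter 5 is only summarized in the abstract above. The sketch below is a minimal bootstrap particle filter for a static receiver, assuming known loudspeaker positions and a plain Gaussian likelihood over ranges derived from time-delay estimates, in place of the thesis's direct-path likelihood; the room size, speaker layout, and noise levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: four fixed loudspeakers and noisy range estimates
# to the receiver obtained from time-delay estimation.
speakers = np.array([[0.0, 0.0], [8.0, 0.0], [8.0, 6.0], [0.0, 6.0]])
true_pos = np.array([3.0, 2.0])
ranges = np.linalg.norm(speakers - true_pos, axis=1) + rng.normal(0, 0.05, 4)

n_particles = 2000
particles = rng.uniform([0.0, 0.0], [8.0, 6.0], size=(n_particles, 2))
sigma = 0.1  # assumed range noise (meters), illustrative only

for _ in range(20):                               # repeat as estimates arrive
    # Random-walk dynamic model for a (nearly) static receiver.
    particles += rng.normal(0, 0.05, particles.shape)
    # Gaussian likelihood of each particle given the measured ranges.
    pred = np.linalg.norm(particles[:, None, :] - speakers[None, :, :], axis=2)
    log_w = -0.5 * np.sum((pred - ranges) ** 2, axis=1) / sigma ** 2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Resampling concentrates particles on likely positions.
    particles = particles[rng.choice(n_particles, size=n_particles, p=w)]

print(particles.mean(axis=0))  # close to true_pos
```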

    Multi-Modal Target Tracking Using Heterogeneous Sensor Networks

    Abstract—The paper describes a target tracking system running on a Heterogeneous Sensor Network (HSN) and presents results gathered from a realistic deployment. The system fuses audio direction-of-arrival data from mote-class devices and object detection measurements from embedded PCs equipped with cameras. The acoustic sensor nodes perform beamforming and measure the energy as a function of the angle. The camera nodes detect moving objects and estimate their angle. The sensor detections are sent to a centralized sensor fusion node via a combination of two wireless networks. The novelty of our system is the unique combination of target tracking methods customized for the application at hand and their implementation on an actual HSN platform.
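
    The acoustic nodes' "energy as a function of the angle" measurement is described only at a high level in this abstract. One conventional way to produce such a curve is a frequency-domain delay-and-sum beamformer scanned over candidate angles; the sketch below assumes far-field sources and a small planar microphone array, and is not the authors' implementation.

```python
import numpy as np

def doa_energy(signals, mic_xy, fs, angles_deg, c=343.0):
    """Delay-and-sum beamformer: steered-response energy per candidate angle.

    signals: (n_mics, n_samples) synchronous capture from one sensor node
    mic_xy:  (n_mics, 2) microphone positions in meters
    """
    n_samples = signals.shape[1]
    spec = np.fft.rfft(signals, axis=1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    energies = []
    for ang in np.deg2rad(angles_deg):
        look = np.array([np.cos(ang), np.sin(ang)])   # candidate look direction
        delays = mic_xy @ look / c                    # far-field plane-wave delays
        # Compensate each channel's delay toward the look direction and sum.
        steer = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
        beam = np.sum(spec * steer, axis=0)
        energies.append(np.sum(np.abs(beam) ** 2))
    return np.array(energies)

# With a hypothetical 4-mic array and a source near 60 degrees, the returned
# curve peaks around 60; np.argmax over the angle grid gives the DOA estimate.
```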

    Audio-based localization for ubiquitous sensor networks

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2005. Includes bibliographical references (p. 97-101).

    This research presents novel techniques for acoustic-source location for both actively triggered and passively detected signals using pervasive, distributed networks of devices, and investigates the combination of existing resources available in personal electronics to build a digital sensing 'commons'. By connecting personal resources with those of the people nearby, tasks can be achieved, through distributed placement and statistical improvement, that a single device could not accomplish alone. The utility and benefits of spatio-temporal acoustic sensing are presented in the context of ubiquitous computing and machine listening history. An active audio self-localisation algorithm is described which is effective in distributed sensor networks even if only coarse temporal synchronisation can be established. Pseudo-noise 'chirps' are emitted and recorded at each of the nodes. Pair-wise distances are calculated by comparing the difference in the audio delays between the peaks measured in each recording. By removing dependence on fine-grained temporal synchronisation, it is hoped that this technique can be used concurrently across a wide range of devices to better leverage the existing audio sensing resources that surround us.

    A passive acoustic source location estimation method is then derived which is suited to the microphone resources of network-connected heterogeneous devices containing asynchronous processors and uncalibrated sensors. Under these constraints, position coordinates must be simultaneously determined for pairs of sounds recorded at each microphone to form a chain of acoustic events. It is shown that an iterative, numerical least-squares estimator can be used. Initial position estimates of the source pair can first be found from the previous estimate in the chain and a closed-form least-squares approach, improving the convergence rate of the second step. Implementations of these methods using the Smart Architectural Surfaces development platform are described and assessed. The viability of the active ranging technique is further demonstrated in a mixed-device ad-hoc sensor network case using existing off-the-shelf technology. Finally, drawing on human-centric onset detection as a means of discovering suitable sound features to be passed between nodes for comparison, the extension of the source location algorithm beyond pseudo-noise test sounds, to enable the location of extraneous noises and acoustic streams, is discussed for further study. Benjamin Christopher Dalton. S.M.
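
    The chirp-based pairwise ranging described in this abstract matches a two-way differential-delay scheme: each device emits its own chirp, records both chirps, and averaging the per-device spacing between the two detected peaks cancels the unknown clock offset, which mirrors the claim that only coarse synchronisation is needed. The following is a minimal sketch of that reading; the matched-filter peak picking and the neglect of each device's own speaker-to-microphone distance are simplifying assumptions.

```python
import numpy as np

def pairwise_distance(rec_a, rec_b, chirp_a, chirp_b, fs, c=343.0):
    """Two-way acoustic ranging between devices A and B.

    Each device emits its own pseudo-noise chirp and records both chirps;
    averaging the per-device peak spacings removes the clock offset.
    """
    def onset(recording, chirp):
        # Matched filter: sample index of the strongest correlation peak.
        return np.argmax(np.correlate(recording, chirp, mode="valid"))

    delta_a = onset(rec_a, chirp_b) - onset(rec_a, chirp_a)   # samples, at A
    delta_b = onset(rec_b, chirp_a) - onset(rec_b, chirp_b)   # samples, at B
    return c * (delta_a + delta_b) / (2.0 * fs)               # meters
```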

    The influence of channel and source degradations on intelligibility and physiological measurements of effort

    Despite the fact that everyday listening is compromised by acoustic degradations, individuals show a remarkable ability to understand degraded speech. However, recent trends in speech perception research emphasise the cognitive load imposed by degraded speech on both normal-hearing and hearing-impaired listeners. The perception of degraded speech is often studied through channel degradations such as background noise. However, source degradations determined by talkers’ acoustic-phonetic characteristics have been studied to a lesser extent, especially in the context of listening effort models. Similarly, little attention has been given to speaking effort, i.e., the effort experienced by talkers when producing speech under channel degradations. This thesis aims to provide a holistic understanding of communication effort, i.e., taking into account both listener and talker factors. Three pupillometry studies are presented. In the first study, speech was recorded from 16 Southern British English speakers and presented to normal-hearing listeners in quiet and in combination with three degradations: noise-vocoding, masking and time-compression. Results showed that acoustic-phonetic talker characteristics predicted the intelligibility of degraded speech, but not listening effort, as likely indexed by pupil dilation. In the second study, older hearing-impaired listeners were presented with fast time-compressed speech under simulated room acoustics. Intelligibility was kept at high levels. Results showed that both fast speech and reverberant speech were associated with higher listening effort, as suggested by pupillometry. Discrepancies between pupillometry and perceived effort ratings suggest that both methods should be employed in speech perception research to pinpoint processing effort. While findings from the first two studies support models of degraded speech perception, emphasising the relevance of source degradations, they also have methodological implications for pupillometry paradigms. In the third study, pupillometry was combined with a speech production task, aiming to establish an equivalent to listening effort for talkers: speaking effort. Normal-hearing participants were asked to read and produce speech in quiet or in the presence of different types of masking: stationary and modulated speech-shaped noise, and competing-talker masking. Results indicated that while talkers acoustically enhance their speech more under stationary masking, the larger pupil dilation associated with competing-talker masking reflected higher speaking effort. Results from all three studies are discussed in conjunction with models of degraded speech perception and production. Listening effort models are revisited to incorporate pupillometry results from speech production paradigms. Given the new approach of investigating source factors using pupillometry, methodological issues are discussed as well. The main insight provided by this thesis, i.e., the feasibility of applying pupillometry to situations involving listener and talker factors, is suggested to guide future research employing naturalistic conversations.

    Enabling technologies for audio augmented reality systems

    Audio augmented reality (AAR) refers to technology that embeds computer-generated auditory content into a user's real acoustic environment. An AAR system has specific requirements that set it apart from regular human-computer interfaces: an audio playback system to allow the simultaneous perception of real and virtual sounds; motion tracking to enable interactivity and location-awareness; the design and implementation of auditory display to deliver AAR content; and spatial rendering to display spatialised AAR content. This thesis presents a series of studies on enabling technologies to meet these requirements. A binaural headset with integrated microphones is assumed as the audio playback system, as it allows mobility and precise control over the ear input signals. Here, user position and orientation tracking methods are proposed that rely on speech signals recorded at the binaural headset microphones. To evaluate the proposed methods, the head orientations and positions of three conferees engaged in a discussion were tracked. The binaural microphones improved tracking performance substantially. The proposed methods are applicable to acoustic tracking with other forms of user-worn microphones. Results from a listening test investigating the effect of auditory display parameters on user performance are reported. The parameters studied were derived from the design choices to be made when implementing auditory display. The results indicate that users are able to detect a sound sample among distractors and estimate sample numerosity accurately with both speech and non-speech audio, if the samples are presented with adequate temporal separation. Whether or not samples were separated spatially had no effect on user performance. However, with spatially separated samples, users were able to detect a sample among distractors and simultaneously localise it. The results of this study are applicable to a variety of AAR applications that require conveying sample presence or numerosity. Spatial rendering is commonly implemented by convolving virtual sounds with head-related transfer functions (HRTFs). Here, a framework is proposed that interpolates HRTFs measured at arbitrary directions and distances. The framework employs Delaunay triangulation to group HRTFs into subsets suitable for interpolation and barycentric coordinates as interpolation weights. The proposed interpolation framework allows the real-time rendering of virtual sources in the near-field via HRTFs measured at various distances.
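
    The HRTF interpolation framework above is described in terms of Delaunay triangulation and barycentric weights. The sketch below illustrates that idea under the assumption of a set of measured head-related impulse responses indexed by arbitrary coordinates; directly blending raw impulse responses ignores the time-of-arrival alignment that a complete renderer would also need.

```python
import numpy as np
from scipy.spatial import Delaunay

def interpolate_hrir(points, hrirs, query):
    """Barycentric HRIR interpolation over a Delaunay triangulation.

    points: (n, d) measurement coordinates (e.g. azimuth/elevation, or
            Cartesian source positions for distance-dependent sets)
    hrirs:  (n, taps) impulse responses measured at those coordinates
    query:  (d,) coordinate at which to render
    """
    tri = Delaunay(points)
    simplex = int(tri.find_simplex(query))
    if simplex < 0:
        raise ValueError("query lies outside the measured region")
    vertices = tri.simplices[simplex]
    # Barycentric coordinates of the query within the enclosing simplex.
    T = tri.transform[simplex]
    bary = T[:-1] @ (query - T[-1])
    weights = np.append(bary, 1.0 - bary.sum())
    # Weighted blend of the impulse responses at the simplex vertices.
    return weights @ hrirs[vertices]
```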