
    Energy-aware Scheduling of Surveillance in Wireless Multimedia Sensor Networks

    Wireless sensor networks involve a large number of sensor nodes with a limited energy supply, which constrains the behavior of their applications. In wireless multimedia sensor networks, sensor nodes are equipped with audio and visual information collection modules, and multimedia content is ubiquitously retrieved in surveillance applications. To address the energy problems of target surveillance with wireless multimedia sensor networks, an energy-aware sensor scheduling method is proposed in this paper. Sensor nodes that acquire acoustic signals are deployed randomly in the sensing field. Target localization is based on the signal energy features provided by multiple sensor nodes and employs particle swarm optimization (PSO). During target surveillance, sensor nodes are adaptively grouped in a fully distributed manner. Specifically, target motion information is extracted by a forecasting algorithm based on the hidden Markov model (HMM), and the forecasting results are used to awaken sensor nodes in the vicinity of the future target position. Based on two properties, the signal energy feature and the residual energy, each sensor node independently decides whether to participate in target detection using a fuzzy control approach. A local routing scheme for data transmission towards the observer is also discussed. Experimental results demonstrate the efficiency of energy-aware scheduling of surveillance in wireless multimedia sensor networks: significant energy savings are achieved by the sensor awakening approach, and data transmission paths are computed with low computational complexity.
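The energy-based localization step described in this abstract can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: it assumes an inverse-square energy decay model with a made-up source gain `s0`, and fits the source position by minimizing the residual between modeled and measured energies with a basic PSO.

```python
import random

def energy_model(src, sensor, s0=100.0):
    # Assumed inverse-square decay: received energy ~ s0 / distance^2.
    d2 = (src[0] - sensor[0]) ** 2 + (src[1] - sensor[1]) ** 2
    return s0 / max(d2, 1e-6)

def residual(src, sensors, readings):
    # Sum of squared differences between modeled and measured energies.
    return sum((energy_model(src, s) - r) ** 2 for s, r in zip(sensors, readings))

def pso_localize(sensors, readings, n_particles=40, iters=100, seed=0):
    # Standard PSO (inertia 0.7, cognitive/social weights 1.5) over a 2D field.
    rng = random.Random(seed)
    pos = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [residual(p, sensors, readings) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = residual(pos[i], sensors, readings)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest

# Noiseless synthetic readings from a source at (3, 4).
sensors = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 0)]
readings = [energy_model((3.0, 4.0), s) for s in sensors]
est = pso_localize(sensors, readings)
```

With noise-free readings the residual has its global minimum at the true source, so the estimate should land close to (3, 4); real deployments would add measurement noise and bounds on the search space.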

    3D AUDIO-VISUAL SPEAKER TRACKING WITH AN ADAPTIVE PARTICLE FILTER

    We propose an audio-visual fusion algorithm for 3D speaker tracking from a localised multi-modal sensor platform composed of a camera and a small microphone array. After extracting audio-visual cues from the individual modalities, we fuse them adaptively according to their reliability in a particle filter framework. The reliability of the audio signal is measured from the maximum Global Coherence Field (GCF) peak value at each frame. The visual reliability is based on colour-histogram matching of detection results against a reference image in RGB space. Experiments on the AV16.3 dataset show that the proposed adaptive audio-visual tracker outperforms both the individual modalities and a classical approach with fixed parameters in terms of tracking accuracy. (Qian, Xinyuan; Brutti, Alessio; Omologo, Maurizio; Cavallaro, Andrea)
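One common way to realise the reliability-driven fusion this abstract describes is to raise each modality's particle likelihood to a power proportional to its normalised reliability, so a weak cue (e.g. a low GCF peak) contributes less to the fused particle weight. The sketch below is a generic illustration of that idea, not the paper's exact weighting rule; the likelihood and reliability values are invented:

```python
def fuse_weights(audio_lik, video_lik, audio_rel, video_rel):
    # Reliability-weighted product fusion: exponents alpha and (1 - alpha)
    # come from the normalised per-frame reliabilities, so the more reliable
    # modality dominates the fused particle weight.
    a = audio_rel / (audio_rel + video_rel)
    fused = [la ** a * lv ** (1.0 - a) for la, lv in zip(audio_lik, video_lik)]
    total = sum(fused)
    return [w / total for w in fused]

# Three particles: audio strongly prefers particle 0, video mildly prefers 2.
# With a high audio reliability (e.g. a strong GCF peak), audio wins out.
audio = [0.8, 0.15, 0.05]
video = [0.3, 0.3, 0.4]
w = fuse_weights(audio, video, audio_rel=0.9, video_rel=0.1)
```

If the audio reliability were low instead, the same function would shift the fused weights toward the video likelihoods.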

    Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

    Audio-visual speaker tracking has drawn increasing attention over the past few years due to its academic value and wide range of applications. The audio and visual modalities provide complementary information for localization and tracking, and with both, Bayesian-based filters can address data association, audio-visual fusion, and track management. In this paper, we conduct a comprehensive overview of audio-visual speaker tracking. To our knowledge, this is the first extensive survey of the past five years. We introduce the family of Bayesian filters and summarize the methods for obtaining audio-visual measurements. In addition, the existing trackers and their performance on the AV16.3 dataset are summarized. In the past few years, deep learning techniques have thrived, which has also boosted the development of audio-visual speaker tracking; their influence on measurement extraction and state estimation is discussed as well. Finally, we discuss the connections between audio-visual speaker tracking and other areas such as speech separation and distributed speaker tracking.

    Audiovisual head orientation estimation with particle filtering in multisensor scenarios

    This article presents a multimodal approach to head pose estimation of individuals in environments equipped with multiple cameras and microphones, such as SmartRooms or automatic video conferencing. Determining an individual's head orientation is the basis for many forms of more sophisticated interaction between humans and technical devices, and can also be used for automatic sensor selection (camera, microphone) in communications or video surveillance systems. The use of particle filters as a unified framework for estimating head orientation in both monomodal and multimodal cases is proposed. In video, head orientation is estimated from colour information by exploiting spatial redundancy among cameras. Audio information is processed to estimate the direction of the speaker's voice, making use of the directivity characteristics of the head radiation pattern. Furthermore, two particle filter schemes for fusing the audio and video streams are analysed in terms of accuracy and robustness: in the first, fusion is performed at the decision level by combining the monomodal head pose estimates, while the second uses a joint estimation system that combines information at the data level. Experimental results over the CLEAR 2006 evaluation database are reported, and the comparison of the proposed multimodal head pose estimation algorithms with the reference monomodal approaches proves the effectiveness of the proposed approach.
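Decision-level fusion of the kind this abstract contrasts with data-level fusion can be illustrated with a small sketch (a generic example, not the paper's estimator): each modality first produces its own orientation estimate, and the estimates are then combined as a confidence-weighted circular mean. The circular mean matters because head orientations wrap at 0/360 degrees and cannot be averaged linearly.

```python
import math

def decision_level_fusion(angles_deg, confidences):
    # Confidence-weighted circular mean: average unit vectors, not raw angles,
    # so estimates straddling the 0/360-degree wrap fuse correctly.
    sx = sum(c * math.cos(math.radians(a)) for a, c in zip(angles_deg, confidences))
    sy = sum(c * math.sin(math.radians(a)) for a, c in zip(angles_deg, confidences))
    return math.degrees(math.atan2(sy, sx)) % 360

# Video estimates 350 deg, audio estimates 10 deg with equal confidence;
# a naive linear average would give 180 deg, the exact opposite direction.
fused = decision_level_fusion([350.0, 10.0], [0.5, 0.5])
```

Data-level fusion, by contrast, would combine the raw audio and video likelihoods inside a single particle filter before any per-modality estimate is formed.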

    Acoustic Sensor Networks and Mobile Robotics for Sound Source Localization

    © 2019 IEEE. Localizing a sound source is a fundamental but still challenging problem in many applications, where sound information is gathered by static, local microphone sensors. This work proposes a new system that exploits advances in sensor networks and robotics to address sound source localization more accurately. By using the network infrastructure, acoustic sensors can spatially monitor acoustical phenomena more efficiently. Furthermore, a mobile robot carries an extra microphone array to collect additional acoustic signals as it travels around the environment. The robot is driven by the need to improve the quality of the data gathered by the static acoustic sensors, which leads to better probabilistic fusion of all the information gained, so that an increasingly accurate map of the sound source can be built. The proposed system has been validated in a real-life environment, where the obtained results are highly promising.
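The idea of driving the robot to where its measurement most improves the source map can be sketched as a greedy next-best-view choice over a discrete belief. This is a simplified, hypothetical illustration (a 1D grid, a distance-only sensor model, and a Gaussian likelihood), not the paper's actual planner:

```python
import math

def entropy(p):
    # Shannon entropy of a discrete belief (natural log).
    return -sum(q * math.log(q) for q in p if q > 0)

def expected_posterior_entropy(belief, cells, robot_pos, sigma=1.0):
    # For each hypothesised source cell, predict the (noise-free) distance
    # reading the robot would get there, do a Bayesian update with a Gaussian
    # likelihood, and average the resulting entropies under the current belief.
    exp_h = 0.0
    for j, cell_j in enumerate(cells):
        z = abs(cell_j - robot_pos)
        lik = [math.exp(-((abs(c - robot_pos) - z) ** 2) / (2 * sigma ** 2))
               for c in cells]
        post = [b * l for b, l in zip(belief, lik)]
        s = sum(post)
        post = [p / s for p in post]
        exp_h += belief[j] * entropy(post)
    return exp_h

def best_robot_position(belief, cells, candidates):
    # Greedy choice: move to the candidate that minimises expected entropy,
    # i.e. maximises expected information gain about the source location.
    return min(candidates, key=lambda r: expected_posterior_entropy(belief, cells, r))

# Uniform belief over four candidate source cells; a position at the centre
# (3.0) is ambiguous because it cannot tell the symmetric cells apart.
cells = [0.0, 2.0, 4.0, 6.0]
belief = [0.25, 0.25, 0.25, 0.25]
pos = best_robot_position(belief, cells, candidates=[1.0, 3.0, 5.0])
```

In the example the off-centre candidates break more symmetries than the central one, so the planner avoids the ambiguous centre position.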

    Online Audio-Visual Multi-Source Tracking and Separation: A Labeled Random Finite Set Approach

    The dissertation proposes an online solution for separating an unknown and time-varying number of moving sources using audio and visual data. The random finite set framework is used for modeling and fusing the audio and visual data, which enables an online tracking algorithm to estimate the source positions and identities at each time step. With this information, a set of beamformers can be designed to separate each desired source and suppress the interfering sources.
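The final step, steering a beamformer at a tracked source position, can be illustrated with the simplest variant, a delay-and-sum beamformer. This sketch is a generic illustration (sample-accurate integer delays, hypothetical microphone geometry and sample rate), not the dissertation's beamformer design:

```python
import math

def steering_delays(mic_positions, source, c=343.0, fs=16000):
    # Per-microphone propagation delays (in samples) from the tracked source
    # position, relative to the closest microphone.
    dists = [math.dist(m, source) for m in mic_positions]
    ref = min(dists)
    return [round((d - ref) / c * fs) for d in dists]

def delay_and_sum(signals, delays):
    # Advance each channel by its delay so wavefronts from the tracked source
    # align, then average: the desired source adds coherently while sources
    # from other directions are attenuated.
    n = min(len(s) - d for s, d in zip(signals, delays))
    return [sum(s[d + t] for s, d in zip(signals, delays)) / len(signals)
            for t in range(n)]

# A pulse that reaches the second microphone one sample later than the first;
# compensating that one-sample delay realigns the pulses before summing.
sig0 = [0, 0, 1, 0, 0, 0]
sig1 = [0, 0, 0, 1, 0, 0]
out = delay_and_sum([sig0, sig1], delays=[0, 1])
```

Replacing the delay-and-sum weights with adaptive ones (e.g. MVDR) would additionally place nulls toward the interfering sources that the tracker reports.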