4,390 research outputs found

    Robust sound event detection in bioacoustic sensor networks

    Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record sounds of wildlife over long periods of time in scalable and minimally invasive ways. Deriving per-species abundance estimates from these sensors requires detection, classification, and quantification of animal vocalizations as individual acoustic events. Yet, variability in ambient noise, both over time and across sensors, hinders the reliability of current automated systems for sound event detection (SED), such as convolutional neural networks (CNNs) in the time-frequency domain. In this article, we develop, benchmark, and combine several machine listening techniques to improve the generalizability of SED models across heterogeneous acoustic environments. As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise. Starting from a CNN yielding state-of-the-art accuracy on this task, we introduce two noise adaptation techniques, respectively integrating short-term (60 milliseconds) and long-term (30 minutes) context. First, we apply per-channel energy normalization (PCEN) in the time-frequency domain, which applies short-term automatic gain control to every subband in the mel-frequency spectrogram. Second, we replace the last dense layer in the network with a context-adaptive neural network (CA-NN) layer. Combining them yields state-of-the-art results that are unmatched by artificial data augmentation alone. We release a pre-trained version of our best performing system under the name of BirdVoxDetect, a ready-to-use detector of avian flight calls in field recordings. Comment: 32 pages, in English. Submitted to PLOS ONE journal in February 2019; revised August 2019; published October 201
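
    The per-subband gain control described in this abstract can be illustrated with a minimal sketch of the commonly used PCEN formulation: each subband energy is divided by an exponentially smoothed version of itself and then compressed. The function name `pcen` and all parameter values below are illustrative assumptions; they are not taken from the article and are not the BirdVoxDetect settings.

```python
def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization on a list of frames.

    energy: list of frames, each a list of non-negative subband energies.
    s, alpha, delta, r, eps: illustrative values (assumptions, not from the article).
    """
    if not energy:
        return []
    # Assumption: the smoother M is initialized with the first frame.
    smoothed = list(energy[0])
    out = []
    for frame in energy:
        # First-order IIR smoother, run independently in every subband:
        # M[t] = (1 - s) * M[t - 1] + s * E[t]
        smoothed = [(1 - s) * m + s * e for m, e in zip(smoothed, frame)]
        # Automatic gain control followed by root compression:
        # (E / (eps + M)**alpha + delta)**r - delta**r
        out.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, smoothed)
        ])
    return out


# With alpha = 1, a steady signal maps to almost the same value at any loudness.
quiet = pcen([[1.0, 1.0]] * 5, alpha=1.0)
loud = pcen([[100.0, 100.0]] * 5, alpha=1.0)
```

    In this sketch the smoothing coefficient `s` sets how much past context the gain control integrates; a smaller `s` averages over a longer window.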

    Fiber Optic Acoustic Sensing to Understand and Affect the Rhythm of the Cities: Proof-of-Concept to Create Data-Driven Urban Mobility Models

    In the framework of massive sensing and smart sustainable cities, this work presents an urban distributed acoustic sensing testbed in the vicinity of the School of Technology and Telecommunication Engineering of the University of Granada, Spain. After positioning the sensing technology and the state of the art of similar existing approaches, the results of the monitoring experiment are described. Details of the sensing scenario, the basic types of events that can be distinguished automatically, initial noise removal actions, and frequency and signal complexity analysis are provided. The experiment, used as a proof-of-concept, shows the enormous potential of the sensing technology to generate data-driven urban mobility models. To support this claim, examples of preliminary traffic density analysis and average speed calculation for buses, cars and pedestrians in the testbed’s neighborhood are presented, together with the accidental detection of a local earthquake. Challenges, benefits and future research directions of this sensing technology are pointed out. B-TIC-542-UGR20 funded by “Consejería de Universidad, Investigación e Innovación de la Junta de Andalucía” / ERDF A way of making Europ

    Civilian Target Recognition using Hierarchical Fusion

    The growth of computer vision technology has been marked by attempts to imitate human behavior to impart robustness and confidence to the decision making process of automated systems. Examples of disciplines in computer vision that have been targets of such efforts are Automatic Target Recognition (ATR) and fusion. ATR is the process of aided or unaided target detection and recognition using data from different sensors. Usually, it is synonymous with its military application of recognizing battlefield targets using imaging sensors. Fusion is the process of integrating information from different sources at the data or decision levels so as to provide a single robust decision as opposed to multiple individual results. This thesis combines these two research areas to provide improved classification accuracy in recognizing civilian targets. The results obtained reaffirm that fusion techniques tend to improve the recognition rates of ATR systems. Previous work in ATR has mainly dealt with military targets and a single level of data fusion. Expensive sensors and time-consuming algorithms are generally used to improve system performance. In this thesis, civilian target recognition, which is considered to be harder than military target recognition, is performed. Inexpensive sensors are used to keep the system cost low. In order to compensate for the reduced system ability, fusion is performed at two different levels of the ATR system: the event level and the sensor level. Only preliminary image processing and pattern recognition techniques have been used so as to maintain low operation times. High classification rates are obtained using data fusion techniques alone. Another contribution of this thesis is the provision of a single framework to perform all operations from target data acquisition to the final decision making.
    The Sensor Fusion Testbed (SFTB) designed by Northrop Grumman Systems has been used by the Night Vision & Electronic Sensors Directorate to obtain images of seven different types of civilian targets. Image segmentation is performed using background subtraction. The seven invariant moments are extracted from the segmented image and basic classification is performed using the k-nearest-neighbor method. Cross-validation is used to provide a better idea of the classification ability of the system. Temporal fusion at the event level is performed using majority voting, and sensor-level fusion is done using the Behavior-Knowledge Space (BKS) method. Two separate databases were used. The first database uses seven targets (2 cars, 2 SUVs, 2 trucks and 1 stake body light truck). Individual frame, temporal fusion and BKS fusion results are around 65%, 70% and 77% respectively. The second database has three targets (cars, SUVs and trucks) formed by combining classes from the first database. Higher classification accuracies are observed here: recognition rates of 75%, 90% and 95% are obtained at the frame, event and sensor levels, respectively. It can be seen that, on average, recognition accuracy improves with increasing levels of fusion. Also, distance-based classification was performed to study the variation of system performance with the distance of the target from the cameras. The results are along expected lines and indicate the efficacy of fusion techniques for the ATR problem. Future work using more complex image processing and pattern recognition routines can further improve the classification performance of the system. The SFTB can be equipped with these algorithms and field-tested to check real-time performance
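
    The event-level temporal fusion by majority voting mentioned in this abstract can be sketched as follows: the per-frame class labels of one event are pooled and the most frequent label becomes the event decision. The function name `majority_vote` and the tie-breaking rule (the tied label that appears first wins) are assumptions made for illustration; the thesis abstract does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(frame_labels):
    """Fuse per-frame class labels of one event into a single decision."""
    if not frame_labels:
        raise ValueError("at least one frame label is required")
    counts = Counter(frame_labels)
    best = max(counts.values())
    # Assumption: on a tie, return the tied label that occurs first in the sequence.
    for label in frame_labels:
        if counts[label] == best:
            return label


# Hypothetical per-frame classifier outputs for one event.
event_decision = majority_vote(["car", "SUV", "car", "truck", "car"])
```

    A single misclassified frame no longer changes the event decision, which is consistent with the abstract's observation that event-level results exceed individual-frame results.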

    Fully automated urban traffic system

    The replacement of the driver with an automatic system that could perform the functions of guiding and routing a vehicle with a human's capability of responding to changing traffic demands was discussed. The problem was divided into four technological areas: guidance, routing, computing, and communications. It was determined that the latter three areas were being developed independently of any need for fully automated urban traffic. A guidance system that would meet system requirements was not being developed but was technically feasible

    Strategies for Searching Video Content with Text Queries or Video Examples

    The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search. However, metadata is often lacking for user-generated videos, so these videos are unsearchable by current search engines. Content-based video retrieval (CBVR) tackles this metadata-scarcity problem by directly analyzing the visual and audio streams of each video. CBVR encompasses multiple research topics, including low-level feature design, feature fusion, semantic detector training and video search/reranking. We present novel strategies in these topics to enhance CBVR in both accuracy and speed under different query inputs, including pure textual queries and query by video examples. Our proposed strategies have been incorporated into our submission for the TRECVID 2014 Multimedia Event Detection evaluation, where our system outperformed other submissions in both text queries and video example queries, thus demonstrating the effectiveness of our proposed approaches

    Fall Detection Using Neural Networks

    Falls inside the home are a major concern facing the aging population. Monitoring the home environment to detect a fall can prevent profound consequences due to delayed emergency response. One option to monitor a home environment is to use a camera-based fall detection system. Conceptual designs vary from 3D positional monitoring (multi-camera monitoring) to body position and limb speed classification. Research shows varying degrees of success with such concepts when designed with a multi-camera setup. However, camera-based systems are inherently intrusive and costly to implement. In this research, we use a sound-based system to detect fall events. Acoustic sensors are used to monitor various sound events and feed a trained machine learning model that makes predictions of fall events. Audio samples from the sensors are converted to frequency domain images using the Mel-Frequency Cepstral Coefficients method. These images are used by a trained convolutional neural network to predict a fall. A publicly available dataset of household sounds is used to train the model. Varying the model's complexity, we found an optimal architecture that achieves high performance while being computationally less expensive than the other models with similar performance. We deployed this model on an NVIDIA Jetson Nano Developer Kit