4,390 research outputs found
Robust sound event detection in bioacoustic sensor networks
Bioacoustic sensors, sometimes known as autonomous recording units (ARUs),
can record sounds of wildlife over long periods of time in scalable and
minimally invasive ways. Deriving per-species abundance estimates from these
sensors requires detection, classification, and quantification of animal
vocalizations as individual acoustic events. Yet, variability in ambient noise,
both over time and across sensors, hinders the reliability of current automated
systems for sound event detection (SED), such as convolutional neural networks
(CNN) in the time-frequency domain. In this article, we develop, benchmark, and
combine several machine listening techniques to improve the generalizability of
SED models across heterogeneous acoustic environments. As a case study, we
consider the problem of detecting avian flight calls from a ten-hour recording
of nocturnal bird migration, recorded by a network of six ARUs in the presence
of heterogeneous background noise. Starting from a CNN yielding
state-of-the-art accuracy on this task, we introduce two noise adaptation
techniques, respectively integrating short-term (60 milliseconds) and long-term
(30 minutes) context. First, we apply per-channel energy normalization (PCEN)
in the time-frequency domain, which applies short-term automatic gain control
to every subband in the mel-frequency spectrogram. Secondly, we replace the
last dense layer in the network by a context-adaptive neural network (CA-NN)
layer. Combining them yields state-of-the-art results that are unmatched by
artificial data augmentation alone. We release a pre-trained version of our
best performing system under the name of BirdVoxDetect, a ready-to-use detector
of avian flight calls in field recordings.Comment: 32 pages, in English. Submitted to PLOS ONE journal in February 2019;
revised August 2019; published October 201
Fiber Optic Acoustic Sensing to Understand and Affect the Rhythm of the Cities: Proof-of-Concept to Create Data-Driven Urban Mobility Models
In the framework of massive sensing and smart sustainable cities, this work presents an
urban distributed acoustic sensing testbed in the vicinity of the School of Technology and Telecommunication
Engineering of the University of Granada, Spain. After positioning the sensing technology
and the state of the art of similar existing approaches, the results of the monitoring experiment are
described. Details of the sensing scenario, basic types of events automatically distinguishable, initial
noise removal actions and frequency and signal complexity analysis are provided. The experiment,
used as a proof-of-concept, shows the enormous potential of the sensing technology to generate
data-driven urban mobility models. In order to support this fact, examples of preliminary density
of traffic analysis and average speed calculation for buses, cars and pedestrians in the testbed’s
neighborhood are exposed, together with the accidental presence of a local earthquake. Challenges,
benefits and future research directions of this sensing technology are pointed out.B-TIC-542-UGR20 funded by “ConsejerÃa de Universidad,
Investigación e Innovacción de la Junta de AndalucÃaERDF A way of making
Europ
Civilian Target Recognition using Hierarchical Fusion
The growth of computer vision technology has been marked by attempts to imitate human behavior to impart robustness and confidence to the decision making process of automated systems. Examples of disciplines in computer vision that have been targets of such efforts are Automatic Target Recognition (ATR) and fusion. ATR is the process of aided or unaided target detection and recognition using data from different sensors. Usually, it is synonymous with its military application of recognizing battlefield targets using imaging sensors. Fusion is the process of integrating information from different sources at the data or decision levels so as to provide a single robust decision as opposed to multiple individual results. This thesis combines these two research areas to provide improved classification accuracy in recognizing civilian targets. The results obtained reaffirm that fusion techniques tend to improve the recognition rates of ATR systems.
Previous work in ATR has mainly dealt with military targets and single level of data fusion. Expensive sensors and time-consuming algorithms are generally used to improve system performance. In this thesis, civilian target recognition, which is considered to be harder than military target recognition, is performed. Inexpensive sensors are used to keep the system cost low. In order to compensate for the reduced system ability, fusion is performed at two different levels of the ATR system { event level and sensor level. Only preliminary image processing and pattern recognition techniques have been used so as to maintain low operation times. High classification rates are obtained using data fusion techniques alone. Another contribution of this thesis is the provision of a single framework to perform all operations from target data acquisition to the final decision making.
The Sensor Fusion Testbed (SFTB) designed by Northrop Grumman Systems has been used by the Night Vision & Electronic Sensors Directorate to obtain images of seven different types of civilian targets. Image segmentation is performed using background subtraction. The seven invariant moments are extracted from the segmented image and basic classification is performed using k Nearest Neighbor method. Cross-validation is used to provide a better idea of the classification ability of the system. Temporal fusion at the event level is performed using majority voting and sensor level fusion is done using Behavior-Knowledge Space method.
Two separate databases were used. The first database uses seven targets (2 cars, 2 SUVs, 2 trucks and 1 stake body light truck). Individual frame, temporal fusion and BKS fusion results are around 65%, 70% and 77% respectively. The second database has three targets (cars, SUVs and trucks) formed by combining classes from the first database. Higher classification accuracies are observed here. 75%, 90% and 95% recognition rates are obtained at frame, event and sensor levels. It can be seen that, on an average, recognition accuracy improves with increasing levels of fusion. Also, distance-based classification was performed to
study the variation of system performance with the distance of the target from the cameras. The results are along expected lines and indicate the efficacy of fusion techniques for the ATR problem. Future work using more complex image processing and pattern recognition routines can further improve the classification performance of the system. The SFTB can be equipped with these algorithms and field-tested to check real-time performance
Fully automated urban traffic system
The replacement of the driver with an automatic system which could perform the functions of guiding and routing a vehicle with a human's capability of responding to changing traffic demands was discussed. The problem was divided into four technological areas; guidance, routing, computing, and communications. It was determined that the latter three areas being developed independent of any need for fully automated urban traffic. A guidance system that would meet system requirements was not being developed but was technically feasible
Strategies for Searching Video Content with Text Queries or Video Examples
The large number of user-generated videos uploaded on to the Internet
everyday has led to many commercial video search engines, which mainly rely on
text metadata for search. However, metadata is often lacking for user-generated
videos, thus these videos are unsearchable by current search engines.
Therefore, content-based video retrieval (CBVR) tackles this metadata-scarcity
problem by directly analyzing the visual and audio streams of each video. CBVR
encompasses multiple research topics, including low-level feature design,
feature fusion, semantic detector training and video search/reranking. We
present novel strategies in these topics to enhance CBVR in both accuracy and
speed under different query inputs, including pure textual queries and query by
video examples. Our proposed strategies have been incorporated into our
submission for the TRECVID 2014 Multimedia Event Detection evaluation, where
our system outperformed other submissions in both text queries and video
example queries, thus demonstrating the effectiveness of our proposed
approaches
Fall Detection Using Neural Networks
Falls inside of the home is a major concern facing the aging population. Monitoring the home environment to detect a fall can prevent profound consequences due to delayed emergency response. One option to monitor a home environment is to use a camera-based fall detection system. Conceptual designs vary from 3D positional monitoring (multi-camera monitoring) to body position and limb speed classification. Research shows varying degree of success with such concepts when designed with multi-camera setup. However, camera-based systems are inherently intrusive and costly to implement. In this research, we use a sound-based system to detect fall events. Acoustic sensors are used to monitor various sound events and feed a trained machine learning model that makes predictions of a fall events. Audio samples from the sensors are converted to frequency domain images using Mel-Frequency Cepstral Coefficients method. These images are used by a trained convolution neural network to predict a fall. A publicly available dataset of household sounds is used to train the model. Varying the model\u27s complexity, we found an optimal architecture that achieves high performance while being computationally less extensive compared to the other models with similar performance. We deployed this model in a NVIDIA Jetson Nano Developer Kit
- …