Improving acoustic vehicle classification by information fusion
We present an information fusion approach for ground vehicle classification based on the emitted acoustic signal. Many acoustic factors can contribute to the classification accuracy of working ground vehicles. Classification relying on a single feature set may lose some useful information if its underlying sound production model is not comprehensive. To improve classification accuracy, we consider an information fusion diagram, in which various aspects of an acoustic signature are taken into account and emphasized separately by two different feature extraction methods. The first set of features aims to represent internal sound production, and a number of harmonic components are extracted to characterize the factors related to the vehicle’s resonance. The second set of features is extracted based on a computationally effective discriminatory analysis, and a group of key frequency components is selected by mutual information, accounting for the sound production from the vehicle’s exterior parts. In correspondence with this structure, we further put forward a modified Bayesian fusion algorithm, which takes advantage of matching each specific feature set with its favored classifier. To assess the proposed approach, experiments are carried out based on a data set containing acoustic signals from different types of vehicles. Results indicate that the fusion approach can effectively increase classification accuracy compared to that achieved using each individual feature set alone. The Bayesian-based decision level fusion is found to be better than a feature level fusion approac
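To make the idea of decision-level fusion concrete, the sketch below combines the class posteriors of two classifiers with a product rule, assuming the two feature sets are conditionally independent given the class. This is a generic naive-Bayes combination, not the modified Bayesian algorithm of the paper; the function name `fuse_posteriors`, the class labels, and all numbers are hypothetical.

```python
def fuse_posteriors(post_a, post_b, prior):
    """Fuse two per-class posteriors with the product rule.

    Assumes (hypothetically) that the two feature sets are conditionally
    independent given the class, so P(c | a, b) is proportional to
    P(c | a) * P(c | b) / P(c).
    """
    unnormalised = {c: post_a[c] * post_b[c] / prior[c] for c in prior}
    total = sum(unnormalised.values())
    return {c: value / total for c, value in unnormalised.items()}


# Illustrative numbers only: one posterior per feature set.
prior = {"truck": 0.5, "car": 0.5}
harmonic_posterior = {"truck": 0.6, "car": 0.4}   # classifier on harmonic features
keyfreq_posterior = {"truck": 0.7, "car": 0.3}    # classifier on key-frequency features

fused = fuse_posteriors(harmonic_posterior, keyfreq_posterior, prior)
best_class = max(fused, key=fused.get)
```

When both classifiers lean toward the same class, the fused posterior is sharper than either input, which is one intuition for why decision-level fusion can help.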
Multimodal person recognition for human-vehicle interaction
Next-generation vehicles will undoubtedly feature biometric person recognition as part of an effort to improve the driving experience. Today's technology prevents such systems from operating satisfactorily under adverse conditions. A proposed framework for achieving person recognition successfully combines different biometric modalities, borne out in two case studies
Smart Traction Control Systems for Electric Vehicles Using Acoustic Road-type Estimation
The application of traction control systems (TCS) for electric vehicles (EV)
has great potential due to easy implementation of torque control with
direct-drive motors. However, the control system usually requires road-tire
friction and slip-ratio values, which must be estimated. While it is not
possible to obtain the first one directly, the estimation of the latter value
requires accurate measurements of chassis and wheel velocity. In addition,
existing TCS structures are often designed without considering the robustness
and energy efficiency of torque control. In this work, both problems are
addressed with a smart TCS design having an integrated acoustic road-type
estimation (ARTE) unit. This unit enables the road-type recognition and this
information is used to retrieve the correct look-up table between friction
coefficient and slip-ratio. The estimation of the friction coefficient helps
the system to update the necessary input torque. The ARTE unit utilizes machine
learning, mapping the acoustic feature inputs to road-type as output. In this
study, three existing TCS for EVs are examined with and without the integrated
ARTE unit. The results show significant performance improvement with ARTE,
reducing the slip ratio by 75% while saving energy via reduction of applied
torque and increasing the robustness of the TCS.
Comment: Accepted to be published by IEEE Trans. on Intelligent Vehicles, 22
Jan 201
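The look-up step described in this abstract can be sketched as follows. The slip-ratio formula is a common textbook definition for the driving (acceleration) case and is an assumption here, as are the road-type names, the table values, and the function names; none of them are taken from the paper.

```python
from bisect import bisect_right


def slip_ratio(wheel_linear_speed, chassis_speed, eps=1e-6):
    """Longitudinal slip ratio; positive when the wheel outruns the chassis.

    wheel_linear_speed is wheel angular speed times wheel radius (m/s).
    """
    denom = max(wheel_linear_speed, chassis_speed, eps)
    return (wheel_linear_speed - chassis_speed) / denom


# Hypothetical (slip ratio, friction coefficient) points per road type.
MU_TABLES = {
    "dry_asphalt": [(0.0, 0.0), (0.1, 0.8), (0.2, 0.9), (1.0, 0.7)],
    "wet_asphalt": [(0.0, 0.0), (0.1, 0.5), (0.2, 0.6), (1.0, 0.4)],
}


def friction_coefficient(road_type, slip):
    """Linearly interpolate the friction coefficient for a recognised road type."""
    table = MU_TABLES[road_type]
    slips = [s for s, _ in table]
    slip = min(max(slip, slips[0]), slips[-1])          # clamp to the table range
    i = min(bisect_right(slips, slip), len(table) - 1)  # right end of the segment
    (s0, m0), (s1, m1) = table[i - 1], table[i]
    return m0 + (m1 - m0) * (slip - s0) / (s1 - s0)


slip = slip_ratio(wheel_linear_speed=12.0, chassis_speed=10.0)
mu = friction_coefficient("wet_asphalt", slip)
```

In a controller, the recognised road type would select the table and the interpolated friction coefficient would bound the torque command; that part is omitted here.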
Comparative Study of Different Methods in Vibration-Based Terrain Classification for Wheeled Robots with Shock Absorbers
Autonomous robots that operate in the field can enhance their security and efficiency by
accurate terrain classification, which can be realized by means of robot-terrain interaction-generated
vibration signals. In this paper, we explore the vibration-based terrain classification (VTC),
in particular for a wheeled robot with shock absorbers. Because the vibration sensors are
usually mounted on the main body of the robot, the vibration signals are dampened significantly,
which results in the vibration signals collected on different terrains being more difficult to
discriminate. Hence, the existing VTC methods applied to a robot with shock absorbers may degrade.
The contributions are two-fold: (1) Several experiments are conducted to exhibit the performance of
the existing feature-engineering and feature-learning classification methods; and (2) Building on
the long short-term memory (LSTM) network, we propose a one-dimensional convolutional LSTM
(1DCL)-based VTC method to learn both spatial and temporal characteristics of the dampened
vibration signals. The experimental results demonstrate that: (1) The feature-engineering methods,
which are efficient in VTC of the robot without shock absorbers, are not so accurate in our project;
meanwhile, the feature-learning methods are better choices; and (2) The 1DCL-based VTC method
outperforms the conventional methods with an accuracy of 80.18%, which exceeds the second-best method
(LSTM) by 8.23%
Comparing CNN and Human Crafted Features for Human Activity Recognition
Deep learning techniques such as Convolutional
Neural Networks (CNNs) have shown good results in activity
recognition. One of the advantages of using these methods resides
in their ability to generate features automatically. This ability
greatly simplifies the task of feature extraction that usually
requires domain specific knowledge, especially when using big
data where data driven approaches can lead to anti-patterns.
Despite the advantage of this approach, very little work has
been undertaken on analyzing the quality of extracted features,
and more specifically on how model architecture and parameters
affect the ability of those features to separate activity classes
in the final feature space. This work focuses on identifying the
optimal parameters for recognition of simple activities, applying
this approach to signals from both inertial and audio sensors.
The paper provides the following contributions: (i) a comparison
of automatically extracted CNN features with gold standard
Human Crafted Features (HCF) is given, (ii) a comprehensive
analysis on how architecture and model parameters affect separation
of target classes in the feature space. Results are evaluated
using publicly available datasets. In particular, we achieved a
93.38% F-Score on the UCI-HAR dataset, using 1D CNNs with
3 convolutional layers and a kernel size of 32, and a 90.5% F-Score
on the DCASE 2017 development dataset, simplified for three
classes (indoor, outdoor and vehicle), using 2D CNNs with 2
convolutional layers and a 2x2 kernel size
Listening for Sirens: Locating and Classifying Acoustic Alarms in City Scenes
This paper is about alerting acoustic event detection and sound source
localisation in an urban scenario. Specifically, we are interested in spotting
the presence of horns, and sirens of emergency vehicles. In order to obtain a
reliable system able to operate robustly despite the presence of traffic noise,
which can be copious, unstructured and unpredictable, we propose to treat the
spectrograms of incoming stereo signals as images, and apply semantic
segmentation, based on a Unet architecture, to extract the target sound from
the background noise. In a multi-task learning scheme, together with signal
denoising, we perform acoustic event classification to identify the nature of
the alerting sound. Lastly, we use the denoised signals to localise the
acoustic source on the horizon plane, by regressing the direction of arrival of
the sound through a CNN architecture. Our experimental evaluation shows an
average classification rate of 94%, and a median absolute error on the
localisation of 7.5° when operating on audio frames of 0.5 s, and of
2.5° when operating on frames of 2.5 s. The system offers excellent
performance in particularly challenging scenarios, where the noise level is
remarkably high.
Comment: 6 pages, 9 figure
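For intuition about stereo localisation, the sketch below estimates a direction of arrival the classical way: find the inter-channel lag that maximises cross-correlation, then convert the delay to an angle with a far-field model. This is a simple baseline, not the CNN regression used in the paper; the function names, microphone spacing, sample rate, and test signal are all hypothetical.

```python
import math


def best_lag(left, right, max_lag):
    """Lag (in samples) of `right` relative to `left` that maximises cross-correlation."""
    def correlation(lag):
        return sum(
            left[n] * right[n + lag]
            for n in range(len(left))
            if 0 <= n + lag < len(right)
        )
    return max(range(-max_lag, max_lag + 1), key=correlation)


def direction_of_arrival(lag, sample_rate, mic_distance, speed_of_sound=343.0):
    """Angle in degrees from broadside, assuming a far-field source.

    A positive lag means the right channel is delayed, i.e. the source is
    closer to the left microphone (a sign convention assumed here).
    """
    delay = lag / sample_rate
    sin_theta = max(-1.0, min(1.0, speed_of_sound * delay / mic_distance))
    return math.degrees(math.asin(sin_theta))


# Synthetic example: the right channel is the left channel delayed by 3 samples.
left = [0.0] * 10 + [1.0, 2.0, 1.0] + [0.0] * 10
right = [0.0] * 3 + left[:-3]

lag = best_lag(left, right, max_lag=5)
angle = direction_of_arrival(lag, sample_rate=48000, mic_distance=0.2)
```

Plain cross-correlation degrades quickly in heavy traffic noise, which is the setting the paper targets with denoising before localisation.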
Polyphonic Sound Event Detection by using Capsule Neural Networks
Artificial sound event detection (SED) aims to mimic the human ability
to perceive and understand what is happening in the surroundings. Nowadays,
Deep Learning offers valuable techniques for this goal such as Convolutional
Neural Networks (CNNs). The Capsule Neural Network (CapsNet) architecture has
been recently introduced in the image processing field with the intent to
overcome some of the known limitations of CNNs, specifically regarding the
scarce robustness to affine transformations (i.e., perspective, size,
orientation) and the detection of overlapped images. This motivated the authors
to employ CapsNets to deal with the polyphonic-SED task, in which multiple
sound events occur simultaneously. Specifically, we propose to exploit the
capsule units to represent a set of distinctive properties for each individual
sound event. Capsule units are connected through a so-called "dynamic routing"
that encourages learning part-whole relationships and improves the detection
performance in a polyphonic context. This paper reports extensive evaluations
carried out on three publicly available datasets, showing how the CapsNet-based
algorithm not only outperforms standard CNNs but also achieves the
best results with respect to the state-of-the-art algorithms
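The "dynamic routing" mentioned above can be sketched in plain Python. The code follows the routing-by-agreement procedure and squash nonlinearity as originally described for CapsNets in image processing; whether the paper uses exactly this variant is not stated in the abstract, and the function names and toy inputs are hypothetical.

```python
import math


def squash(s):
    """Shrink vector s to a length in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    """Route predictions u_hat[i][j] (input capsule i -> output capsule j)."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_out for _ in range(n_in)]
    outputs = []
    for _ in range(iterations):
        # Each input capsule distributes its vote over the output capsules.
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_out):
            s = [sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            outputs.append(squash(s))
        # Raise the logit where a prediction agrees with the output capsule.
        for i in range(n_in):
            for j in range(n_out):
                logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], outputs[j]))
    return outputs


# Toy input: both input capsules agree on output 0 and disagree on output 1.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
]
capsules = dynamic_routing(u_hat)
lengths = [math.hypot(*v) for v in capsules]
```

The output vector length acts as the confidence that the corresponding event is present, so several capsules can be active at once, which suits the polyphonic case.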