270 research outputs found
Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection
Speech Activity Detection (SAD) plays an important role in mobile communications and automatic speech recognition (ASR). Developing efficient SAD systems for real-world applications is a challenging task due to the presence of noise. We propose a new approach to SAD where we treat it as a two-dimensional multilabel image classification problem. To classify the audio segments, we compute their Short-time Fourier Transform spectrograms and classify them with a Convolutional Recurrent Neural Network (CRNN), traditionally used in image recognition. Our CRNN uses a sigmoid activation function, max-pooling in the frequency domain, and a convolutional operation as a moving average filter to remove misclassified spikes. On the development set of Task 1 of the 2019 Fearless Steps Challenge, our system achieved a decision cost function (DCF) of 2.89%, a 66.4% improvement over the baseline. Moreover, it achieved a DCF score of 3.318% on the evaluation dataset of the challenge, ranking first among all submissions
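The spike-removal step mentioned in this abstract can be illustrated with a minimal sketch. It assumes per-frame speech scores in [0, 1], a centered box filter with zero padding, and a 0.5 decision threshold; the function names, filter width, and threshold are hypothetical and are not taken from the paper.

```python
def moving_average(scores, width):
    """Smooth per-frame scores with a centered box filter (zero-padded edges)."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    padded = [0.0] * half + list(scores) + [0.0] * half
    # Each output frame is the mean of the `width` frames centered on it.
    return [sum(padded[i:i + width]) / width for i in range(len(scores))]


def smooth_decisions(scores, width=3, threshold=0.5):
    """Threshold the smoothed scores into speech (1) / non-speech (0) labels."""
    return [1 if s >= threshold else 0 for s in moving_average(scores, width)]


# Hypothetical frame scores: an isolated spike at index 2, real speech at 5..9.
frame_scores = [0.1, 0.0, 0.9, 0.1, 0.0, 0.9, 0.9, 1.0, 0.9, 0.9, 0.1, 0.0]
raw_labels = [1 if s >= 0.5 else 0 for s in frame_scores]
labels = smooth_decisions(frame_scores, width=3)
```

In this toy input the isolated high score at index 2 is suppressed after smoothing, while the sustained speech region is kept.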
FMM-X3D: FPGA-based modeling and mapping of X3D for Human Action Recognition
3D Convolutional Neural Networks are gaining increasing attention from
researchers and practitioners and have found applications in many domains, such
as surveillance systems, autonomous vehicles, human monitoring systems, and
video retrieval. However, their widespread adoption is hindered by their high
computational and memory requirements, especially when resource-constrained
systems are targeted. This paper addresses the problem of mapping X3D, a
state-of-the-art model in Human Action Recognition that achieves accuracy of
95.5% in the UCF101 benchmark, onto any FPGA device. The proposed toolflow
generates an optimised stream-based hardware system, taking into account the
available resources and off-chip memory characteristics of the FPGA device. The
generated designs push further the current performance-accuracy Pareto front,
and enable for the first time the targeting of such complex model architectures
for the Human Action Recognition task. Comment: 8 pages, 6 figures, 2 tables
Image-based Text Classification using 2D Convolutional Neural Networks
We propose a new approach to text classification
in which we consider the input text as an image and apply
2D Convolutional Neural Networks to learn the local and
global semantics of the sentences from the variations of the
visual patterns of words. Our approach demonstrates that
it is possible to get semantically meaningful features from
images with text without using optical character recognition
and sequential processing pipelines, techniques that traditional
natural language processing algorithms require. To validate
our approach, we present results for two applications: text
classification and dialog modeling. Using a 2D Convolutional
Neural Network, we were able to outperform the state-of-the-art
accuracy results for a Chinese text classification task and
achieved promising results for seven English text classification
tasks. Furthermore, our approach outperformed the memory
networks without match types when using out of vocabulary
entities from Task 4 of the bAbI dialog dataset
Comparing CNN and Human Crafted Features for Human Activity Recognition
Deep learning techniques such as Convolutional
Neural Networks (CNNs) have shown good results in activity
recognition. One of the advantages of using these methods resides
in their ability to generate features automatically. This ability
greatly simplifies the task of feature extraction that usually
requires domain specific knowledge, especially when using big
data where data driven approaches can lead to anti-patterns.
Despite the advantage of this approach, very little work has
been undertaken on analyzing the quality of extracted features,
and more specifically on how model architecture and parameters
affect the ability of those features to separate activity classes
in the final feature space. This work focuses on identifying the
optimal parameters for recognition of simple activities applying
this approach on both signals from inertial and audio sensors.
The paper provides the following contributions: (i) a comparison
of automatically extracted CNN features with gold standard
Human Crafted Features (HCF) is given, (ii) a comprehensive
analysis on how architecture and model parameters affect separation
of target classes in the feature space. Results are evaluated
using publicly available datasets. In particular, we achieved a
93.38% F-Score on the UCI-HAR dataset, using 1D CNNs with
3 convolutional layers and a kernel size of 32, and a 90.5% F-Score
on the DCASE 2017 development dataset, simplified for three
classes (indoor, outdoor and vehicle), using 2D CNNs with 2
convolutional layers and a 2x2 kernel size
NEMESYS: Enhanced Network Security for Seamless Service Provisioning in the Smart Mobile Ecosystem
As a consequence of the growing popularity of smart mobile devices, mobile
malware is clearly on the rise, with attackers targeting valuable user
information and exploiting vulnerabilities of the mobile ecosystems. With the
emergence of large-scale mobile botnets, smartphones can also be used to launch
attacks on mobile networks. The NEMESYS project will develop novel security
technologies for seamless service provisioning in the smart mobile ecosystem,
and improve mobile network security through better understanding of the threat
landscape. NEMESYS will gather and analyze information about the nature of
cyber-attacks targeting mobile users and the mobile network so that appropriate
counter-measures can be taken. We will develop a data collection infrastructure
that incorporates virtualized mobile honeypots and a honeyclient, to gather,
detect and provide early warning of mobile attacks and better understand the
modus operandi of cyber-criminals that target mobile devices. By correlating
the extracted information with the known patterns of attacks from wireline
networks, we will reveal and identify trends in the way that cyber-criminals
launch attacks against mobile devices. Comment: Accepted for publication in Proceedings of the 28th International
Symposium on Computer and Information Sciences (ISCIS'13); 9 pages; 1 figure
A novel framework for retrieval and interactive visualization of multimodal data
With the abundance of multimedia in web databases and the increasing user need for content of many modalities, such as images, sounds, etc., new methods for retrieval and visualization of multimodal media are required. In this paper, novel techniques for retrieval and visualization of multimodal data, i.e., documents consisting of many modalities, are proposed. A novel cross-modal retrieval framework is presented, in which the results of several unimodal retrieval systems are fused into a single multimodal list by the introduction of a cross-modal distance. For the presentation of the retrieved results, a multimodal visualization framework is also proposed, which extends existing unimodal similarity-based visualization methods to multimodal data. The similarity measure between two multimodal objects is defined as the weighted sum of unimodal similarities, with the weights determined via an interactive user feedback scheme. Experimental results show that the cross-modal framework outperforms unimodal and other multimodal approaches, while the visualization framework enhances existing visualization methods by efficiently exploiting multimodality and user feedback
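The weighted-sum similarity described in this abstract can be sketched as follows. This is only an illustration: the modality names, similarity values, and weights are hypothetical, and the interactive feedback scheme that sets the weights is not modeled here.

```python
def multimodal_similarity(unimodal_sims, weights):
    """Weighted sum of per-modality similarities between two multimodal objects.

    unimodal_sims: dict mapping modality name -> similarity for that modality
    weights:       dict mapping modality name -> weight (assumed given, e.g.
                   set from user feedback; how they are learned is not shown)
    """
    missing = set(unimodal_sims) - set(weights)
    if missing:
        raise KeyError(f"no weight for modalities: {sorted(missing)}")
    return sum(weights[m] * s for m, s in unimodal_sims.items())


# Hypothetical example: two objects compared on an image and a sound modality.
sims = {"image": 0.8, "sound": 0.4}
weights = {"image": 0.75, "sound": 0.25}
score = multimodal_similarity(sims, weights)  # 0.75 * 0.8 + 0.25 * 0.4
```

Raising the weight of a modality makes objects that agree in that modality appear more similar overall, which is the lever a feedback scheme would adjust.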
A Multi-Class Intrusion Detection System Based on Continual Learning
With the proliferation of smart devices, network security has become crucial to protect systems and data. In order to identify and categorise different network threats, this study introduces a flow-based Network Intrusion Detection System (NIDS) based on continual learning with a CNN backbone. Using the LYCOS-IDS2017 dataset, the study explores several continuous learning techniques for identifying threats including denial-of-service and SQL injection. Unlike previous approaches, this work treats intrusion detection as a multi-class classification problem, rather than anomaly detection. The findings show how continuously learning models may identify network intrusions with high recall rates and accuracy while generating few false alarms. This study contributes to the development of an adaptive NIDS that can handle attack classification simultaneously with detection, and that can be trained online without periodic offline training. Additionally, utilising the improved version of the dataset adds value to the research on LYCOS-IDS2017 by presenting results for untested models
Audio Content Analysis for Unobtrusive Event Detection in Smart Homes
Institute of Engineering Sciences
The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.
Environmental sound signals are multi-source, heterogeneous, and varying in time. Many systems have been proposed to process such signals for event detection in ambient assisted living applications. Typically, these systems use feature extraction, selection, and classification. However, despite major advances, several important questions remain unanswered, especially in real-world settings. This paper contributes to the body of knowledge in the field by addressing the following problems for ambient sounds recorded in various real-world kitchen environments: 1) which features and which classifiers are most suitable in the presence of background noise? 2) what is the effect of signal duration on recognition accuracy? 3) how do the signal-to-noise ratio and the distance between the microphone and the audio source affect the recognition accuracy in an environment in which the system was not trained? We show that for systems that use traditional classifiers, it is beneficial to combine gammatone frequency cepstral coefficients and discrete wavelet transform coefficients and to use a gradient boosting classifier. For systems based on deep learning, we consider 1D and 2D Convolutional Neural Networks (CNN) using mel-spectrogram energies and mel-spectrogram images as inputs, respectively, and show that the 2D CNN outperforms the 1D CNN. We obtained competitive classification results for two such systems. The first one, which uses a gradient boosting classifier,
achieved an F1-Score of 90.2% and a recognition accuracy of 91.7%. The second
one, which uses a 2D CNN with mel-spectrogram images, achieved an F1-Score
of 92.7% and a recognition accuracy of 96%
HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices
For Human Action Recognition tasks (HAR), 3D Convolutional Neural Networks
have proven to be highly effective, achieving state-of-the-art results. This
study introduces a novel streaming architecture based toolflow for mapping such
models onto FPGAs considering the model's inherent characteristics and the
features of the targeted FPGA device. The HARFLOW3D toolflow takes as input a
3D CNN in ONNX format and a description of the FPGA characteristics, generating
a design that minimizes the latency of the computation. The toolflow is
comprised of a number of parts, including i) a 3D CNN parser, ii) a performance
and resource model, iii) a scheduling algorithm for executing 3D models on the
generated hardware, iv) a resource-aware optimization engine tailored for 3D
models, v) an automated mapping to synthesizable code for FPGAs. The ability of
the toolflow to support a broad range of models and devices is shown through a
number of experiments on various 3D CNN and FPGA system pairs. Furthermore, the
toolflow has produced high-performing results for 3D CNN models that have not
been mapped to FPGAs before, demonstrating the potential of FPGA-based systems
in this space. Overall, HARFLOW3D has demonstrated its ability to deliver
competitive latency compared to a range of state-of-the-art hand-tuned
approaches, achieving up to 5× better performance compared to
some of the existing works. Comment: 11 pages, 8 figures, 6 tables