63,972 research outputs found
Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection
Speech Activity Detection (SAD) plays an important role in mobile communications and automatic speech recognition (ASR). Developing efficient SAD systems for real-world applications is a challenging task due to the presence of noise. We propose a new approach to SAD where we treat it as a two-dimensional multilabel image classification problem. To classify the audio segments, we compute their Short-time Fourier Transform spectrograms and classify them with a Convolutional Recurrent Neural Network (CRNN), traditionally used in image recognition. Our CRNN uses a sigmoid activation function, max-pooling in the frequency domain, and a convolutional operation as a moving average filter to remove misclassified spikes. On the development set of Task 1 of the 2019 Fearless Steps Challenge, our system achieved a decision cost function (DCF) of 2.89%, a 66.4% improvement over the baseline. Moreover, it achieved a DCF score of 3.318% on the evaluation dataset of the challenge, ranking first among all submissions
Fog Computing in Medical Internet-of-Things: Architecture, Implementation, and Applications
In the era when the market segment of Internet of Things (IoT) tops the chart
in various business reports, it is apparently envisioned that the field of
medicine expects to gain a large benefit from the explosion of wearables and
internet-connected sensors that surround us to acquire and communicate
unprecedented data on symptoms, medication, food intake, and daily-life
activities impacting one's health and wellness. However, IoT-driven healthcare
would have to overcome many barriers, such as: 1) There is an increasing demand
for data storage on cloud servers where the analysis of the medical big data
becomes increasingly complex, 2) The data, when communicated, are vulnerable to
security and privacy issues, 3) The communication of the continuously collected
data is not only costly but also energy hungry, 4) Operating and maintaining
the sensors directly from the cloud servers are non-trial tasks. This book
chapter defined Fog Computing in the context of medical IoT. Conceptually, Fog
Computing is a service-oriented intermediate layer in IoT, providing the
interfaces between the sensors and cloud servers for facilitating connectivity,
data transfer, and queryable local database. The centerpiece of Fog computing
is a low-power, intelligent, wireless, embedded computing node that carries out
signal conditioning and data analytics on raw data collected from wearables or
other medical sensors and offers efficient means to serve telehealth
interventions. We implemented and tested an fog computing system using the
Intel Edison and Raspberry Pi that allows acquisition, computing, storage and
communication of the various medical data such as pathological speech data of
individuals with speech disorders, Phonocardiogram (PCG) signal for heart rate
estimation, and Electrocardiogram (ECG)-based Q, R, S detection.Comment: 29 pages, 30 figures, 5 tables. Keywords: Big Data, Body Area
Network, Body Sensor Network, Edge Computing, Fog Computing, Medical
Cyberphysical Systems, Medical Internet-of-Things, Telecare, Tele-treatment,
Wearable Devices, Chapter in Handbook of Large-Scale Distributed Computing in
Smart Healthcare (2017), Springe
SecMon: End-to-End Quality and Security Monitoring System
The Voice over Internet Protocol (VoIP) is becoming a more available and
popular way of communicating for Internet users. This also applies to
Peer-to-Peer (P2P) systems and merging these two have already proven to be
successful (e.g. Skype). Even the existing standards of VoIP provide an
assurance of security and Quality of Service (QoS), however, these features are
usually optional and supported by limited number of implementations. As a
result, the lack of mandatory and widely applicable QoS and security guaranties
makes the contemporary VoIP systems vulnerable to attacks and network
disturbances. In this paper we are facing these issues and propose the SecMon
system, which simultaneously provides a lightweight security mechanism and
improves quality parameters of the call. SecMon is intended specially for VoIP
service over P2P networks and its main advantage is that it provides
authentication, data integrity services, adaptive QoS and (D)DoS attack
detection. Moreover, the SecMon approach represents a low-bandwidth consumption
solution that is transparent to the users and possesses a self-organizing
capability. The above-mentioned features are accomplished mainly by utilizing
two information hiding techniques: digital audio watermarking and network
steganography. These techniques are used to create covert channels that serve
as transport channels for lightweight QoS measurement's results. Furthermore,
these metrics are aggregated in a reputation system that enables best route
path selection in the P2P network. The reputation system helps also to mitigate
(D)DoS attacks, maximize performance and increase transmission efficiency in
the network.Comment: Paper was presented at 7th international conference IBIZA 2008: On
Computer Science - Research And Applications, Poland, Kazimierz Dolny
31.01-2.02 2008; 14 pages, 5 figure
Polyphonic Sound Event Detection by using Capsule Neural Networks
Artificial sound event detection (SED) has the aim to mimic the human ability
to perceive and understand what is happening in the surroundings. Nowadays,
Deep Learning offers valuable techniques for this goal such as Convolutional
Neural Networks (CNNs). The Capsule Neural Network (CapsNet) architecture has
been recently introduced in the image processing field with the intent to
overcome some of the known limitations of CNNs, specifically regarding the
scarce robustness to affine transformations (i.e., perspective, size,
orientation) and the detection of overlapped images. This motivated the authors
to employ CapsNets to deal with the polyphonic-SED task, in which multiple
sound events occur simultaneously. Specifically, we propose to exploit the
capsule units to represent a set of distinctive properties for each individual
sound event. Capsule units are connected through a so-called "dynamic routing"
that encourages learning part-whole relationships and improves the detection
performance in a polyphonic context. This paper reports extensive evaluations
carried out on three publicly available datasets, showing how the CapsNet-based
algorithm not only outperforms standard CNNs but also allows to achieve the
best results with respect to the state of the art algorithms
- …