63,972 research outputs found

    Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection

    Get PDF
    Speech Activity Detection (SAD) plays an important role in mobile communications and automatic speech recognition (ASR). Developing efficient SAD systems for real-world applications is a challenging task due to the presence of noise. We propose a new approach to SAD where we treat it as a two-dimensional multilabel image classification problem. To classify the audio segments, we compute their Short-time Fourier Transform spectrograms and classify them with a Convolutional Recurrent Neural Network (CRNN), traditionally used in image recognition. Our CRNN uses a sigmoid activation function, max-pooling in the frequency domain, and a convolutional operation as a moving average filter to remove misclassified spikes. On the development set of Task 1 of the 2019 Fearless Steps Challenge, our system achieved a decision cost function (DCF) of 2.89%, a 66.4% improvement over the baseline. Moreover, it achieved a DCF score of 3.318% on the evaluation dataset of the challenge, ranking first among all submissions

    Fog Computing in Medical Internet-of-Things: Architecture, Implementation, and Applications

    Full text link
    In the era when the market segment of Internet of Things (IoT) tops the chart in various business reports, it is apparently envisioned that the field of medicine expects to gain a large benefit from the explosion of wearables and internet-connected sensors that surround us to acquire and communicate unprecedented data on symptoms, medication, food intake, and daily-life activities impacting one's health and wellness. However, IoT-driven healthcare would have to overcome many barriers, such as: 1) There is an increasing demand for data storage on cloud servers where the analysis of the medical big data becomes increasingly complex, 2) The data, when communicated, are vulnerable to security and privacy issues, 3) The communication of the continuously collected data is not only costly but also energy hungry, 4) Operating and maintaining the sensors directly from the cloud servers are non-trial tasks. This book chapter defined Fog Computing in the context of medical IoT. Conceptually, Fog Computing is a service-oriented intermediate layer in IoT, providing the interfaces between the sensors and cloud servers for facilitating connectivity, data transfer, and queryable local database. The centerpiece of Fog computing is a low-power, intelligent, wireless, embedded computing node that carries out signal conditioning and data analytics on raw data collected from wearables or other medical sensors and offers efficient means to serve telehealth interventions. We implemented and tested an fog computing system using the Intel Edison and Raspberry Pi that allows acquisition, computing, storage and communication of the various medical data such as pathological speech data of individuals with speech disorders, Phonocardiogram (PCG) signal for heart rate estimation, and Electrocardiogram (ECG)-based Q, R, S detection.Comment: 29 pages, 30 figures, 5 tables. Keywords: Big Data, Body Area Network, Body Sensor Network, Edge Computing, Fog Computing, Medical Cyberphysical Systems, Medical Internet-of-Things, Telecare, Tele-treatment, Wearable Devices, Chapter in Handbook of Large-Scale Distributed Computing in Smart Healthcare (2017), Springe

    SecMon: End-to-End Quality and Security Monitoring System

    Get PDF
    The Voice over Internet Protocol (VoIP) is becoming a more available and popular way of communicating for Internet users. This also applies to Peer-to-Peer (P2P) systems and merging these two have already proven to be successful (e.g. Skype). Even the existing standards of VoIP provide an assurance of security and Quality of Service (QoS), however, these features are usually optional and supported by limited number of implementations. As a result, the lack of mandatory and widely applicable QoS and security guaranties makes the contemporary VoIP systems vulnerable to attacks and network disturbances. In this paper we are facing these issues and propose the SecMon system, which simultaneously provides a lightweight security mechanism and improves quality parameters of the call. SecMon is intended specially for VoIP service over P2P networks and its main advantage is that it provides authentication, data integrity services, adaptive QoS and (D)DoS attack detection. Moreover, the SecMon approach represents a low-bandwidth consumption solution that is transparent to the users and possesses a self-organizing capability. The above-mentioned features are accomplished mainly by utilizing two information hiding techniques: digital audio watermarking and network steganography. These techniques are used to create covert channels that serve as transport channels for lightweight QoS measurement's results. Furthermore, these metrics are aggregated in a reputation system that enables best route path selection in the P2P network. The reputation system helps also to mitigate (D)DoS attacks, maximize performance and increase transmission efficiency in the network.Comment: Paper was presented at 7th international conference IBIZA 2008: On Computer Science - Research And Applications, Poland, Kazimierz Dolny 31.01-2.02 2008; 14 pages, 5 figure

    Polyphonic Sound Event Detection by using Capsule Neural Networks

    Full text link
    Artificial sound event detection (SED) has the aim to mimic the human ability to perceive and understand what is happening in the surroundings. Nowadays, Deep Learning offers valuable techniques for this goal such as Convolutional Neural Networks (CNNs). The Capsule Neural Network (CapsNet) architecture has been recently introduced in the image processing field with the intent to overcome some of the known limitations of CNNs, specifically regarding the scarce robustness to affine transformations (i.e., perspective, size, orientation) and the detection of overlapped images. This motivated the authors to employ CapsNets to deal with the polyphonic-SED task, in which multiple sound events occur simultaneously. Specifically, we propose to exploit the capsule units to represent a set of distinctive properties for each individual sound event. Capsule units are connected through a so-called "dynamic routing" that encourages learning part-whole relationships and improves the detection performance in a polyphonic context. This paper reports extensive evaluations carried out on three publicly available datasets, showing how the CapsNet-based algorithm not only outperforms standard CNNs but also allows to achieve the best results with respect to the state of the art algorithms
    corecore