3,586 research outputs found
Spiking neural networks trained with backpropagation for low power neuromorphic implementation of voice activity detection
Recent advances in Voice Activity Detection (VAD) are driven by artificial
neural networks, including Recurrent Neural Networks (RNNs); however, using a VAD system in
battery-operated devices requires further power efficiency. This can be
achieved by neuromorphic hardware, which enables Spiking Neural Networks (SNNs)
to perform inference at very low energy consumption. Spiking networks are
characterized by their ability to process information efficiently, in a sparse
cascade of binary events in time called spikes. However, a big performance gap
separates artificial from spiking networks, mostly due to a lack of powerful
SNN training algorithms. To overcome this problem we exploit an SNN model that
can be recast into an RNN-like model and trained with known deep learning
techniques. We describe an SNN training procedure that achieves low spiking
activity, and pruning algorithms that remove 85% of the network connections with
no performance loss. The model achieves state-of-the-art performance with a
fraction of the power consumption compared to other methods.
Comment: 5 pages, 2 figures, 2 tables
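The recasting the abstract mentions can be illustrated with a leaky integrate-and-fire (LIF) layer unrolled over time: the membrane potential plays the role of an RNN hidden state, and during backpropagation a surrogate gradient stands in for the non-differentiable spike. This is a minimal sketch under assumed constants, not the authors' implementation:

```python
import numpy as np

def lif_forward(inputs, w, beta=0.9, threshold=1.0):
    """Simulate a LIF spiking layer over T time steps.

    inputs: (T, n_in) binary spike trains
    w:      (n_in, n_out) synaptic weights
    Returns a (T, n_out) array of binary output spikes.
    """
    T, _ = inputs.shape
    n_out = w.shape[1]
    v = np.zeros(n_out)                     # membrane potential = recurrent state
    spikes = np.zeros((T, n_out))
    for t in range(T):
        v = beta * v + inputs[t] @ w        # leaky integration, like an RNN update
        s = (v >= threshold).astype(float)  # Heaviside spike; in training, a
                                            # surrogate gradient replaces its
                                            # derivative during backprop
        v = v - s * threshold               # soft reset after each spike
        spikes[t] = s
    return spikes

rng = np.random.default_rng(0)
x = (rng.random((50, 8)) < 0.2).astype(float)  # sparse binary input spikes
w = rng.normal(0.0, 0.5, (8, 4))
out = lif_forward(x, w)
```

Because the forward pass is just a gated recurrent update, standard deep learning tooling (truncated BPTT, pruning) applies once the spike derivative is smoothed.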
DeepTMH: Multimodal Semi-supervised framework leveraging Affective and Cognitive engagement for Telemental Health
To aid existing telemental health services, we propose DeepTMH, a novel
framework that models telemental health session videos by extracting latent
vectors corresponding to Affective and Cognitive features frequently used in
psychology literature. Our approach leverages advances in semi-supervised
learning to tackle the data scarcity in the telemental health session video
domain and consists of a multimodal semi-supervised GAN to detect important
mental health indicators during telemental health sessions. We demonstrate the
usefulness of our framework and contrast against existing works in two tasks:
Engagement regression and Valence-Arousal regression, both of which are
important to psychologists during a telemental health session. Our framework
reports a 40% improvement in RMSE over the SOTA method in Engagement Regression and
a 50% improvement in RMSE over the SOTA method in Valence-Arousal Regression. To
tackle the scarcity of publicly available datasets in telemental health space,
we release a new dataset, MEDICA, for mental health patient engagement
detection. Our dataset, MEDICA, consists of 1299 videos, each 3 seconds long. To
the best of our knowledge, our approach is the first method to model telemental
health session data based on psychology-driven Affective and Cognitive
features, which also accounts for data sparsity by leveraging a semi-supervised
setup.
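A semi-supervised GAN discriminator of the kind this framework builds on typically outputs K real-class logits plus an implicit "fake" class: labeled clips get ordinary cross-entropy, while unlabeled clips are only pushed toward "some real class". A hedged numpy sketch of that objective (all shapes and values are illustrative, not DeepTMH's actual model):

```python
import numpy as np

def log_sum_exp(logits):
    # numerically stable log(sum(exp(logits))) per row
    m = logits.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))).squeeze(1)

def supervised_loss(logits, labels):
    """Cross-entropy over the K real classes for labeled examples."""
    lse = log_sum_exp(logits)
    return np.mean(lse - logits[np.arange(len(labels)), labels])

def unsupervised_real_loss(logits):
    """Push D(x) toward 'real': p(real) = Z/(Z+1) with Z = sum(exp(logits)),
    so the loss -log p(real) equals log(1 + 1/Z)."""
    lse = log_sum_exp(logits)
    return np.mean(np.log1p(np.exp(-lse)))

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))    # 4 clips, K=3 hypothetical engagement classes
labels = np.array([0, 2, 1, 0])
total = supervised_loss(logits, labels) + unsupervised_real_loss(logits)
```

The labeled term trains the classifier; the unlabeled term lets unannotated session video still shape the decision boundary, which is how data scarcity is mitigated.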
Segmentation of Speech and Humming in Vocal Input
Non-verbal vocal interaction (NVVI) is an interaction method in which sounds other than speech produced by a human are used, such as humming. NVVI complements traditional speech recognition systems with continuous control. In order to combine the two approaches (e.g. "volume up, mmm"), it is necessary to perform speech/NVVI segmentation of the input sound signal. This paper presents two novel methods of speech and humming segmentation. The first method classifies MFCC and RMS parameters using a neural network (MFCC method), while the other computes volume changes in the signal (IAC method). The two methods are compared on a corpus collected from 13 speakers. The results indicate that the MFCC method outperforms IAC in terms of accuracy, precision, and recall.
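The intuition behind the volume-change (IAC) approach can be sketched simply: humming tends to have a steadier amplitude envelope than speech, so frame-to-frame changes in RMS energy separate the two. Frame sizes and the threshold below are assumptions for illustration, not the paper's parameters:

```python
import numpy as np

def frame_rms(signal, frame_len=512, hop=256):
    """Root-mean-square energy of each analysis frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

def segment_by_volume_change(signal, change_thresh=0.05):
    """Label frames: True = speech-like (varying volume), False = hum-like."""
    rms = frame_rms(signal)
    change = np.abs(np.diff(rms, prepend=rms[0]))  # frame-to-frame change
    return change > change_thresh

sr = 16000
t = np.arange(sr) / sr
hum = 0.5 * np.sin(2 * np.pi * 220 * t)            # steady tone: hum-like
rng = np.random.default_rng(1)
envelope = np.repeat(rng.random(20), sr // 20)     # jumpy, speech-like volume
speech_like = envelope * np.sin(2 * np.pi * 220 * t)
hum_labels = segment_by_volume_change(hum)
speech_labels = segment_by_volume_change(speech_like)
```

The MFCC method replaces this hand-set threshold with a neural network classifying MFCC and RMS features per frame, which is why it generalizes better across speakers.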
Machine Learning and Deep Learning Approaches for Brain Disease Diagnosis : Principles and Recent Advances
This work was supported in part by the National Research Foundation of Korea Grant funded by the Korean Government (Ministry of Science and ICT) under Grant NRF 2020R1A2B5B02002478, and in part by Sejong University through its Faculty Research Program under Grant 20212023. Peer reviewed. Publisher PDF.
Corporation robots
Nowadays, various robots are built to perform multiple tasks, and having multiple robots work together on a single task is becoming important. One of the key requirements for multiple robots to work together is that a robot must be able to follow another robot. This project is mainly concerned with the design and construction of robots that can follow a line. It focuses on building leader and slave line-following robots; both robots follow the line while carrying a load. A single robot is limited in its load-handling capacity: it cannot handle heavy loads or long loads. An easier way to overcome this limitation is to have a group of mobile robots working together to accomplish an aim that no single robot can achieve alone.
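Line following of the kind this project describes is usually a proportional steering loop over an array of reflectance sensors. The sensor layout, gains, and function names below are assumptions for illustration, not this project's hardware:

```python
def steer(sensor_readings, base_speed=0.5, kp=0.3):
    """Map reflectance sensor readings to (left, right) motor speeds.

    sensor_readings: floats in [0, 1], where 1.0 means the line is directly
    under that sensor. The weighted centroid of the readings gives the line's
    offset from center; the correction steers back toward it.
    """
    n = len(sensor_readings)
    total = sum(sensor_readings)
    if total == 0:                      # line lost: keep going straight
        return base_speed, base_speed
    # sensor positions span [-1, 1] from leftmost to rightmost
    positions = [2 * i / (n - 1) - 1 for i in range(n)]
    error = sum(p * s for p, s in zip(positions, sensor_readings)) / total
    correction = kp * error
    return base_speed + correction, base_speed - correction

# Line under the rightmost sensor: the left wheel speeds up to turn right.
left, right = steer([0.0, 0.0, 0.0, 1.0])
```

A follower robot can reuse the same loop with the leader's tail marker as the "line", which is the coupling that lets the pair carry a long load together.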
A Speech Quality Classifier based on Tree-CNN Algorithm that Considers Network Degradations
Many factors can affect the users’ quality of experience (QoE) in speech communication services. The impairment factors appear due to physical phenomena that occur in the transmission channels of wireless and wired networks. Monitoring users’ QoE is important for service providers. In this context, a non-intrusive speech quality classifier based on the Tree Convolutional Neural Network (Tree-CNN) is proposed. The Tree-CNN is an adaptive network structure composed of hierarchical CNN models, and its main advantage is that it decreases training time, which is very relevant for speech quality assessment methods. In the training phase of the proposed classifier model, impaired speech signals caused by wired and wireless network degradations are used as input. Also, in the network scenario, different modulation schemes and channel degradation intensities, such as packet loss rate, signal-to-noise ratio, and maximum Doppler shift frequency, are implemented. Experimental results demonstrate that the proposed model achieves a significant reduction in training time, reaching a 25% reduction relative to another implementation based on the DRBM. The accuracy reached by the Tree-CNN model is almost 95% for each quality class. Performance assessment results show that the proposed Tree-CNN-based classifier outperforms both the current standardized algorithm described in ITU-T Rec. P.563 and the speech quality assessment method ViSQOL.
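The hierarchical structure that makes Tree-CNN training cheap can be sketched as a tree of small models, where the root routes an input to a child and each node only learns a coarse split. The stub callables below stand in for CNNs, and the class names and the `mos` feature are illustrative assumptions:

```python
class TreeNode:
    """One node of a tree of classifiers (the Tree-CNN routing idea)."""

    def __init__(self, model, children=None):
        self.model = model          # maps features -> branch index or final label
        self.children = children    # None for a leaf node

    def predict(self, x):
        out = self.model(x)
        if self.children is None:
            return out                          # leaf: final quality class
        return self.children[out].predict(x)    # internal: out selects a branch

# Stub "CNNs" routing on a single hypothetical quality feature, mimicking
# coarse-to-fine classification: a good/bad split first, then finer grades.
root = TreeNode(lambda x: 0 if x["mos"] >= 3.0 else 1, children=[
    TreeNode(lambda x: "excellent" if x["mos"] >= 4.0 else "good"),
    TreeNode(lambda x: "poor" if x["mos"] >= 2.0 else "bad"),
])

label = root.predict({"mos": 4.2})   # -> "excellent"
```

Because each node sees only its branch's subproblem, nodes can be trained (or grown) independently, which is where the reported training-time savings come from.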