3,586 research outputs found
Spiking neural networks trained with backpropagation for low power neuromorphic implementation of voice activity detection
Recent advances in Voice Activity Detection (VAD) are driven by artificial
neural networks, including Recurrent Neural Networks (RNNs); however, using a VAD system in
battery-operated devices requires further power efficiency. This can be
achieved by neuromorphic hardware, which enables Spiking Neural Networks (SNNs)
to perform inference at very low energy consumption. Spiking networks are
characterized by their ability to process information efficiently, in a sparse
cascade of binary events in time called spikes. However, a big performance gap
separates artificial from spiking networks, mostly due to a lack of powerful
SNN training algorithms. To overcome this problem we exploit an SNN model that
can be recast into an RNN-like model and trained with known deep learning
techniques. We describe an SNN training procedure that achieves low spiking
activity, and pruning algorithms that remove 85% of the network connections with
no performance loss. The model achieves state-of-the-art performance with a
fraction of the power consumption compared to other methods.
Comment: 5 pages, 2 figures, 2 tables
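The recasting the abstract mentions can be illustrated with a leaky integrate-and-fire (LIF) layer unrolled over time: the membrane potential plays the role of an RNN hidden state, and during backpropagation a surrogate gradient stands in for the non-differentiable spike. This is a minimal sketch under assumed constants, not the authors' implementation:

```python
import numpy as np

def lif_forward(inputs, w, beta=0.9, threshold=1.0):
    """Simulate a LIF spiking layer over T time steps.

    inputs: (T, n_in) binary spike trains
    w:      (n_in, n_out) synaptic weights
    Returns a (T, n_out) array of binary output spikes.
    """
    T, _ = inputs.shape
    n_out = w.shape[1]
    v = np.zeros(n_out)                     # membrane potential = recurrent state
    spikes = np.zeros((T, n_out))
    for t in range(T):
        v = beta * v + inputs[t] @ w        # leaky integration, like an RNN update
        s = (v >= threshold).astype(float)  # Heaviside spike; in training, a
                                            # surrogate gradient replaces its
                                            # derivative during backprop
        v = v - s * threshold               # soft reset after each spike
        spikes[t] = s
    return spikes

rng = np.random.default_rng(0)
x = (rng.random((50, 8)) < 0.2).astype(float)  # sparse binary input spikes
w = rng.normal(0.0, 0.5, (8, 4))
out = lif_forward(x, w)
```

Because the forward pass is just a gated recurrent update, standard deep learning tooling (truncated BPTT, pruning) applies once the spike derivative is smoothed.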
DeepTMH: Multimodal Semi-supervised framework leveraging Affective and Cognitive engagement for Telemental Health
To aid existing telemental health services, we propose DeepTMH, a novel
framework that models telemental health session videos by extracting latent
vectors corresponding to Affective and Cognitive features frequently used in
psychology literature. Our approach leverages advances in semi-supervised
learning to tackle the data scarcity in the telemental health session video
domain and consists of a multimodal semi-supervised GAN to detect important
mental health indicators during telemental health sessions. We demonstrate the
usefulness of our framework and contrast against existing works in two tasks:
Engagement regression and Valence-Arousal regression, both of which are
important to psychologists during a telemental health session. Our framework
reports a 40% improvement in RMSE over the SOTA method in Engagement Regression and
a 50% improvement in RMSE over the SOTA method in Valence-Arousal Regression. To
tackle the scarcity of publicly available datasets in telemental health space,
we release a new dataset, MEDICA, for mental health patient engagement
detection. Our dataset, MEDICA, consists of 1299 videos, each 3 seconds long. To
the best of our knowledge, our approach is the first method to model telemental
health session data based on psychology-driven Affective and Cognitive
features, which also accounts for data sparsity by leveraging a semi-supervised
setup.
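A semi-supervised GAN discriminator of the kind this framework builds on typically outputs K real-class logits plus an implicit "fake" class: labeled clips get ordinary cross-entropy, while unlabeled clips are only pushed toward "some real class". A hedged numpy sketch of that objective (all shapes and values are illustrative, not DeepTMH's actual model):

```python
import numpy as np

def log_sum_exp(logits):
    # numerically stable log(sum(exp(logits))) per row
    m = logits.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))).squeeze(1)

def supervised_loss(logits, labels):
    """Cross-entropy over the K real classes for labeled examples."""
    lse = log_sum_exp(logits)
    return np.mean(lse - logits[np.arange(len(labels)), labels])

def unsupervised_real_loss(logits):
    """Push D(x) toward 'real': p(real) = Z/(Z+1) with Z = sum(exp(logits)),
    so the loss -log p(real) equals log(1 + 1/Z)."""
    lse = log_sum_exp(logits)
    return np.mean(np.log1p(np.exp(-lse)))

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))    # 4 clips, K=3 hypothetical engagement classes
labels = np.array([0, 2, 1, 0])
total = supervised_loss(logits, labels) + unsupervised_real_loss(logits)
```

The labeled term trains the classifier; the unlabeled term lets unannotated session video still shape the decision boundary, which is how data scarcity is mitigated.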
Segmentation of Speech and Humming in Vocal Input
Non-verbal vocal interaction (NVVI) is an interaction method in which sounds other than speech produced by a human are used, such as humming. NVVI complements traditional speech recognition systems with continuous control. In order to combine the two approaches (e.g. "volume up, mmm"), it is necessary to perform speech/NVVI segmentation of the input sound signal. This paper presents two novel methods of speech and humming segmentation. The first method classifies MFCC and RMS parameters using a neural network (MFCC method), while the other computes volume changes in the signal (IAC method). The two methods are compared on a corpus collected from 13 speakers. The results indicate that the MFCC method outperforms IAC in terms of accuracy, precision, and recall.
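The intuition behind the volume-change (IAC) approach can be sketched simply: humming tends to have a steadier amplitude envelope than speech, so frame-to-frame changes in RMS energy separate the two. Frame sizes and the threshold below are assumptions for illustration, not the paper's parameters:

```python
import numpy as np

def frame_rms(signal, frame_len=512, hop=256):
    """Root-mean-square energy of each analysis frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

def segment_by_volume_change(signal, change_thresh=0.05):
    """Label frames: True = speech-like (varying volume), False = hum-like."""
    rms = frame_rms(signal)
    change = np.abs(np.diff(rms, prepend=rms[0]))  # frame-to-frame change
    return change > change_thresh

sr = 16000
t = np.arange(sr) / sr
hum = 0.5 * np.sin(2 * np.pi * 220 * t)            # steady tone: hum-like
rng = np.random.default_rng(1)
envelope = np.repeat(rng.random(20), sr // 20)     # jumpy, speech-like volume
speech_like = envelope * np.sin(2 * np.pi * 220 * t)
hum_labels = segment_by_volume_change(hum)
speech_labels = segment_by_volume_change(speech_like)
```

The MFCC method replaces this hand-set threshold with a neural network classifying MFCC and RMS features per frame, which is why it generalizes better across speakers.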
Machine Learning and Deep Learning Approaches for Brain Disease Diagnosis : Principles and Recent Advances
This work was supported in part by the National Research Foundation of Korea Grant funded by the Korean Government (Ministry of Science and ICT) under Grant NRF 2020R1A2B5B02002478, and in part by Sejong University through its Faculty Research Program under Grant 20212023. Peer reviewed. Publisher PDF.
Corporation robots
Nowadays, various robots are built to perform multiple tasks, and having multiple robots work together on a single task is becoming important. One of the key requirements for multiple robots to work together is that a robot must be able to follow another robot. This project is mainly concerned with the design and construction of robots that can follow a line. It focuses on building leader and slave line-following robots; both robots follow the line while carrying a load. A single robot is limited in its load-handling capacity: it cannot handle heavy loads or long loads. An easier way to overcome this limitation is to have a group of mobile robots working together to accomplish an aim that no single robot can achieve alone.
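Line following of the kind this project describes is usually a proportional steering loop over an array of reflectance sensors. The sensor layout, gains, and function names below are assumptions for illustration, not this project's hardware:

```python
def steer(sensor_readings, base_speed=0.5, kp=0.3):
    """Map reflectance sensor readings to (left, right) motor speeds.

    sensor_readings: floats in [0, 1], where 1.0 means the line is directly
    under that sensor. The weighted centroid of the readings gives the line's
    offset from center; the correction steers back toward it.
    """
    n = len(sensor_readings)
    total = sum(sensor_readings)
    if total == 0:                      # line lost: keep going straight
        return base_speed, base_speed
    # sensor positions span [-1, 1] from leftmost to rightmost
    positions = [2 * i / (n - 1) - 1 for i in range(n)]
    error = sum(p * s for p, s in zip(positions, sensor_readings)) / total
    correction = kp * error
    return base_speed + correction, base_speed - correction

# Line under the rightmost sensor: the left wheel speeds up to turn right.
left, right = steer([0.0, 0.0, 0.0, 1.0])
```

A follower robot can reuse the same loop with the leader's tail marker as the "line", which is the coupling that lets the pair carry a long load together.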
A Speech Quality Classifier based on Tree-CNN Algorithm that Considers Network Degradations
Many factors can affect the users’ quality of experience (QoE) in speech communication services. The impairment factors appear due to physical phenomena that occur in the transmission channels of wireless and wired networks. Monitoring users’ QoE is important for service providers. In this context, a non-intrusive speech quality classifier based on the Tree Convolutional Neural Network (Tree-CNN) is proposed. The Tree-CNN is an adaptive network structure composed of hierarchical CNN models, and its main advantage is that it decreases training time, which is very relevant for speech quality assessment methods. In the training phase of the proposed classifier model, impaired speech signals caused by wired and wireless network degradations are used as input. Also, in the network scenario, different modulation schemes and channel degradation intensities, such as packet loss rate, signal-to-noise ratio, and maximum Doppler shift frequency, are implemented. Experimental results demonstrate that the proposed model achieves a significant reduction in training time, reaching a 25% reduction relative to another implementation based on the DRBM. The accuracy reached by the Tree-CNN model is almost 95% for each quality class. Performance assessment results show that the proposed Tree-CNN-based classifier outperforms both the current standardized algorithm described in ITU-T Rec. P.563 and the speech quality assessment method ViSQOL.
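The hierarchical structure that makes Tree-CNN training cheap can be sketched as a tree of small models, where the root routes an input to a child and each node only learns a coarse split. The stub callables below stand in for CNNs, and the class names and the `mos` feature are illustrative assumptions:

```python
class TreeNode:
    """One node of a tree of classifiers (the Tree-CNN routing idea)."""

    def __init__(self, model, children=None):
        self.model = model          # maps features -> branch index or final label
        self.children = children    # None for a leaf node

    def predict(self, x):
        out = self.model(x)
        if self.children is None:
            return out                          # leaf: final quality class
        return self.children[out].predict(x)    # internal: out selects a branch

# Stub "CNNs" routing on a single hypothetical quality feature, mimicking
# coarse-to-fine classification: a good/bad split first, then finer grades.
root = TreeNode(lambda x: 0 if x["mos"] >= 3.0 else 1, children=[
    TreeNode(lambda x: "excellent" if x["mos"] >= 4.0 else "good"),
    TreeNode(lambda x: "poor" if x["mos"] >= 2.0 else "bad"),
])

label = root.predict({"mos": 4.2})   # -> "excellent"
```

Because each node sees only its branch's subproblem, nodes can be trained (or grown) independently, which is where the reported training-time savings come from.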