Spiking neural networks trained with backpropagation for low power neuromorphic implementation of voice activity detection
Recent advances in Voice Activity Detection (VAD) are driven by artificial
and Recurrent Neural Networks (RNNs); however, using a VAD system in
battery-operated devices requires further power efficiency. This can be
achieved by neuromorphic hardware, which enables Spiking Neural Networks (SNNs)
to perform inference at very low energy consumption. Spiking networks are
characterized by their ability to process information efficiently, in a sparse
cascade of binary events in time called spikes. However, a large performance gap
separates artificial from spiking networks, mostly due to a lack of powerful
SNN training algorithms. To overcome this problem, we exploit an SNN model that
can be recast into an RNN-like model and trained with known deep learning
techniques. We describe an SNN training procedure that achieves low spiking
activity, together with pruning algorithms that remove 85% of the network
connections with no performance loss. The model achieves state-of-the-art
performance at a fraction of the power consumption of other methods.
Comment: 5 pages, 2 figures, 2 tables
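The abstract's central idea is that a spiking network can be unrolled in time like an RNN and trained with standard deep learning tools. The sketch below is a minimal illustration under stated assumptions, not the paper's model: it assumes a discrete-time leaky integrate-and-fire (LIF) neuron with decay `beta` and reset by subtraction, and shows the fast-sigmoid surrogate derivative that backpropagation through time commonly substitutes for the spike step's zero-almost-everywhere gradient. The `magnitude_prune` helper shows one generic way to remove a fixed fraction (here 85%) of connections; the paper's actual pruning algorithms may differ. All function and parameter names are hypothetical.

```python
from typing import List, Tuple


def lif_layer(inputs: List[List[float]], weights: List[List[float]],
              beta: float = 0.9,
              threshold: float = 1.0) -> Tuple[List[List[int]], float]:
    """Run one layer of leaky integrate-and-fire neurons over T time steps.

    inputs:  T rows, each holding the N_in input values for one time step
    weights: N_out rows of N_in synaptic weights (hypothetical layout)
    Returns the T x N_out binary spike trains and the mean spike rate.
    """
    n_out = len(weights)
    v = [0.0] * n_out                 # membrane potentials: the recurrent state
    spikes: List[List[int]] = []
    for x_t in inputs:                # the loop over time is what makes it RNN-like
        s_t = []
        for j in range(n_out):
            current = sum(w * x for w, x in zip(weights[j], x_t))
            v[j] = beta * v[j] + current       # leaky integration
            s = 1 if v[j] >= threshold else 0  # binary spike (Heaviside step)
            if s:
                v[j] -= threshold              # reset by subtraction
            s_t.append(s)
        spikes.append(s_t)
    total = sum(map(sum, spikes))
    rate = total / (len(inputs) * n_out) if inputs and n_out else 0.0
    return spikes, rate


def surrogate_grad(v: float, threshold: float = 1.0, slope: float = 10.0) -> float:
    """Fast-sigmoid surrogate used in place of the step's derivative during BPTT."""
    return 1.0 / (1.0 + slope * abs(v - threshold)) ** 2


def magnitude_prune(weights: List[List[float]],
                    fraction: float = 0.85) -> List[List[float]]:
    """Return a copy with the given fraction of smallest-magnitude weights zeroed."""
    ranked = sorted((abs(w), i, j)
                    for i, row in enumerate(weights)
                    for j, w in enumerate(row))
    n_remove = round(fraction * len(ranked))
    pruned = [row[:] for row in weights]
    for _, i, j in ranked[:n_remove]:
        pruned[i][j] = 0.0
    return pruned


if __name__ == "__main__":
    spikes, rate = lif_layer([[1.0]] * 5, [[0.5]])
    print(spikes, rate)  # [[0], [0], [1], [0], [1]] 0.4
```

A low spike rate is what makes neuromorphic inference cheap, which is why the example reports it alongside the spike trains.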
Improving recognition accuracy on CVSD speech under mismatched conditions
Emerging technology in mobile communications is gaining increasingly wide acceptance as the preferred choice for last-mile communication. A wide range of signal compression techniques has been developed to suit the smaller bandwidths available on mobile communication channels, but speech recognition methods have been successful mostly in controlled speech environments. Designing speech recognition systems for mobile communications is nevertheless crucial for providing voice-enabled command and control and for applications such as Mobile Voice Commerce. Continuously Variable Slope Delta (CVSD) modulation, a technique for low-bitrate coding of speech, has been in use for over 30 years, particularly in military wireless environments, and has now also been adopted by Bluetooth. CVSD is particularly suitable for Internet and mobile environments because of its robustness against transmission errors, its simplicity of implementation, and the absence of a need for synchronization. In this paper, we study some characteristics of CVSD speech in the context of robust recognition of compressed speech, and present two methods of improving the recognition accuracy of Automatic Speech Recognition (ASR) systems. First, we study the features extracted for ASR and how they relate to the corresponding features computed from Pulse Code Modulation (PCM) speech, and apply this relation to correct the CVSD features and improve recognition accuracy. Second, we show that ASR performed directly on the bit-streams gives good recognition accuracy, and that combining it with our approach gives better accuracy still.
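To make the coding scheme concrete, here is a minimal CVSD encoder/decoder sketch. It assumes a textbook formulation: one bit per sample (1 if the sample is at or above the running estimate), a step size that grows when the last `RUN_LENGTH` bits are identical (slope overload) and otherwise decays, and a leaky integrator that rebuilds the waveform. All constants and names are hypothetical illustration values, not those of Bluetooth or any military standard. The decoder needs only the bit stream because it replays exactly the same adaptation rule, which is why no frame synchronization or side information is required.

```python
import math
from typing import List, Tuple

# Illustrative constants only (assumptions); deployed CVSD codecs such as the
# Bluetooth variant define their own parameter values.
RUN_LENGTH = 3      # this many identical bits in a row signals slope overload
STEP_MIN = 0.01     # smallest step size
STEP_MAX = 0.25     # largest step size
STEP_INC = 0.02     # step growth when overload is detected
STEP_DECAY = 0.95   # gradual (syllabic) shrinking of the step otherwise
LEAK = 0.98         # leakage of the reconstruction integrator


def _advance(bit: int, history: List[int], step: float,
             est: float) -> Tuple[float, float]:
    """Step-size adaptation and integrator update shared by encoder and decoder."""
    history.append(bit)
    if len(history) > RUN_LENGTH:
        del history[0]                          # keep only the last RUN_LENGTH bits
    if len(history) == RUN_LENGTH and len(set(history)) == 1:
        step = min(step + STEP_INC, STEP_MAX)   # slope overload: grow the step
    else:
        step = max(step * STEP_DECAY, STEP_MIN) # otherwise let it decay
    est = LEAK * est + (step if bit else -step)
    return step, est


def cvsd_encode(samples: List[float]) -> Tuple[List[int], List[float]]:
    """Emit one bit per sample; also return the encoder's running estimate."""
    bits: List[int] = []
    estimates: List[float] = []
    history: List[int] = []
    step, est = STEP_MIN, 0.0
    for x in samples:
        bit = 1 if x >= est else 0
        step, est = _advance(bit, history, step, est)
        bits.append(bit)
        estimates.append(est)
    return bits, estimates


def cvsd_decode(bits: List[int]) -> List[float]:
    """Rebuild the waveform from the bit stream alone by replaying the same rule."""
    out: List[float] = []
    history: List[int] = []
    step, est = STEP_MIN, 0.0
    for bit in bits:
        step, est = _advance(bit, history, step, est)
        out.append(est)
    return out


if __name__ == "__main__":
    signal = [0.5 * math.sin(2 * math.pi * n / 64) for n in range(256)]
    bits, encoder_view = cvsd_encode(signal)
    decoded = cvsd_decode(bits)
    print(len(bits), decoded == encoder_view)  # 256 True
```

Because each bit only nudges the estimate by one step, a flipped bit causes a small, decaying error rather than a lost frame, which is consistent with the robustness to transmission errors noted above.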
Evaluation and analysis of hybrid intelligent pattern recognition techniques for speaker identification
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The rapid pace of technological progress in recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem of identifying a speaker from their voice regardless of the content (i.e. text-independent identification), and to design efficient methods of combining face and voice to produce a robust authentication system.
A novel approach to speaker identification is developed using wavelet analysis and multiple neural networks, namely a Probabilistic Neural Network (PNN), a General Regression Neural Network (GRNN) and a Radial Basis Function Neural Network (RBF NN), combined with an AND voting scheme. This approach is tested on the GRID and VidTIMIT corpora, and comprehensive test results have been compared against state-of-the-art approaches. The system was found to be competitive: it improved the recognition rate by 15% compared to the classical Mel-frequency Cepstral Coefficients (MFCC), and reduced the recognition time by 40% compared to Back Propagation Neural Network (BPNN), Gaussian Mixture Models (GMM) and Principal Component Analysis (PCA).
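As a rough illustration of two building blocks named above, the sketch below implements a textbook Probabilistic Neural Network (a Parzen-window classifier that averages Gaussian kernel responses per class) and an AND vote that accepts an identity only when every classifier agrees. It assumes the wavelet feature vectors have already been extracted, and it assumes "AND voting" means unanimous agreement with rejection otherwise; the thesis may define the scheme differently. The GRNN and RBF NN classifiers are omitted, and all names, labels and the kernel width `sigma` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], str]   # (feature vector, speaker label)


def pnn_classify(x: Sequence[float], train: List[Sample],
                 sigma: float = 1.0) -> str:
    """Pick the class with the largest average Gaussian-kernel response."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in train:
        d2 = sum((a - b) ** 2 for a, b in zip(x, vec))  # squared Euclidean distance
        totals[label] += math.exp(-d2 / (2.0 * sigma ** 2))
        counts[label] += 1
    return max(totals, key=lambda lbl: totals[lbl] / counts[lbl])


def and_vote(decisions: List[str]) -> Optional[str]:
    """Accept an identity only if every classifier agrees; otherwise reject (None)."""
    if not decisions:
        return None
    first = decisions[0]
    return first if all(d == first for d in decisions) else None


if __name__ == "__main__":
    train = [([0.0, 0.0], "alice"), ([0.1, 0.0], "alice"),
             ([5.0, 5.0], "bob"), ([5.1, 4.9], "bob")]
    print(pnn_classify([0.2, 0.1], train))          # alice
    print(and_vote(["alice", "alice", "alice"]))    # alice
    print(and_vote(["alice", "bob", "alice"]))      # None
```

A PNN needs no iterative training (the stored samples are the model), which fits the abstract's emphasis on recognition time relative to BPNN.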
Another novel approach, based on vowel formant analysis, is implemented using Linear Discriminant Analysis (LDA). Vowel-formant-based speaker identification is well suited to real-time implementation and requires only a few bytes of information to be stored for each speaker, making it both storage- and time-efficient. Tested on GRID and VidTIMIT, the proposed scheme was found to be 85.05% accurate when Linear Predictive Coding (LPC) is used to extract the vowel formants, which is much higher than the accuracy of BPNN and GMM. Since the proposed scheme requires no training time beyond creating a small database of vowel formants, it is also faster. Furthermore, an increasing number of speakers makes it difficult for BPNN and GMM to sustain their accuracy, whereas the proposed score-based methodology stays almost linear.
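The LPC step can be sketched with the standard autocorrelation method: compute the frame's autocorrelation, solve for the predictor coefficients with the Levinson-Durbin recursion, and read resonances (formant candidates) from the angles of the complex poles of the prediction-error filter. This is a minimal sketch under stated assumptions: the frame is assumed to be windowed already, the predictor convention is x[n] ≈ Σ a_k x[n-k], and the hypothetical `resonance_hz` helper handles only a single second-order section; real formant tracking with higher orders needs general polynomial root finding, which is omitted. It is not presented as the thesis's exact extraction procedure.

```python
import math
from typing import List, Tuple


def autocorrelation(frame: List[float], max_lag: int) -> List[float]:
    """Autocorrelation r[0..max_lag] of one (already windowed) speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the LPC normal equations for x[n] ~ sum_k a[k] * x[n - k].

    Returns ([a_1, ..., a_order], final prediction-error energy).
    """
    a = [0.0] * (order + 1)          # a[0] unused; a[k] multiplies x[n - k]
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient for stage i
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def resonance_hz(a1: float, a2: float, fs: float) -> float:
    """Frequency of the complex pole pair of 1 - a1 z^-1 - a2 z^-2."""
    if a1 * a1 + 4.0 * a2 >= 0.0:
        raise ValueError("poles are real; no formant-like resonance")
    radius = math.sqrt(-a2)                  # pole magnitude
    theta = math.acos(a1 / (2.0 * radius))   # pole angle in radians per sample
    return theta * fs / (2.0 * math.pi)


if __name__ == "__main__":
    print(levinson_durbin([1.0, 0.5, 0.25], 2))  # ([0.5, 0.0], 0.75)
```

Storing just a handful of such formant frequencies per speaker is what keeps the per-speaker template down to a few bytes.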
Finally, a novel audio-visual fusion-based identification system is implemented using GMM and MFCC for speaker identification and PCA for face recognition. The results of speaker identification and face recognition are fused at different levels, namely the feature, score and decision levels. Both score-level and decision-level (with OR voting) fusion were shown to outperform feature-level fusion in terms of accuracy and error resilience. This result is consistent with the distinct nature of the two modalities, which lose their distinct character when combined at the feature level. The GRID and VidTIMIT test results validate that the proposed scheme is one of the best candidates for the fusion of face and voice, owing to its low computational time and high recognition accuracy.
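The difference between score-level and decision-level fusion can be shown with a small sketch. It assumes each modality produces one score per enrolled identity where higher means a better match (so PCA distances would be negated first), uses min-max normalization and a weighted sum for score-level fusion, and interprets the OR rule as accepting a claimed identity if either modality selected it. These are common textbook choices and labeled assumptions; the thesis's normalization, weights and voting details may differ, and all names are hypothetical.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Map one modality's scores onto [0, 1] so the modalities are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (s - lo) / (hi - lo) for name, s in scores.items()}


def score_level_fusion(voice: Dict[str, float], face: Dict[str, float],
                       voice_weight: float = 0.5) -> str:
    """Weighted sum of normalized scores; returns the best-scoring identity.

    Both dicts must cover the same enrolled identities, higher = better match.
    """
    nv, nf = min_max_normalize(voice), min_max_normalize(face)
    fused = {name: voice_weight * nv[name] + (1.0 - voice_weight) * nf[name]
             for name in nv}
    return max(fused, key=lambda name: fused[name])


def decision_level_or(voice_id: str, face_id: str, claimed_id: str) -> bool:
    """OR rule: accept the claimed identity if either modality picked it."""
    return voice_id == claimed_id or face_id == claimed_id


if __name__ == "__main__":
    voice = {"a": -10.0, "b": -12.0, "c": -20.0}   # e.g. GMM log-likelihoods
    face = {"a": 0.2, "b": 0.9, "c": 0.1}          # e.g. face similarity scores
    print(score_level_fusion(voice, face))          # b
    print(decision_level_or("a", "b", "b"))         # True
```

Fusing after each modality has produced its own scores or decisions keeps the voice and face evidence separate until the end, which matches the observation that feature-level fusion performed worse.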
A Survey on Semantic Communications for Intelligent Wireless Networks
With the deployment of 6G technology, it is envisioned that the competitive edge
of wireless networks will be sustained and the next decade's communication
requirements will be satisfied. 6G will also aim to aid the development of a
ubiquitous and mobile human society, while providing solutions to key
challenges such as coverage and capacity. In addition, 6G will focus on
providing intelligent use cases and applications using higher data rates over
millimeter-wave and terahertz frequencies. However, at higher frequencies
multiple undesired phenomena, such as atmospheric absorption and blockage,
occur and create a bottleneck owing to resource (spectrum and energy)
scarcity. Hence, continuing the conventional effort to reproduce at the
receiver the exact information sent by the transmitter will result in a
never-ending need for higher bandwidth. A possible solution to this challenge
lies in semantic communications, which focuses on the meaning (context) of the
received data as opposed to only reproducing the transmitted data correctly.
This in turn will require less bandwidth and will reduce the bottleneck caused
by various undesired phenomena. In this respect, this article presents a
detailed survey of recent technological trends in semantic communications for
intelligent wireless networks. We focus on the semantic communications
architecture, including the model and source and channel coding. Next, we
detail cross-layer interaction and various goal-oriented communication
applications. We also present overall semantic communications trends in
detail, and identify challenges that need timely solutions before semantic
communications can be practically implemented within 6G wireless technology.
This survey aims to contribute significantly towards initiating future
research directions in the area of semantic communications for intelligent
6G wireless networks.
IDENTIFICATION OF COVER SONGS USING INFORMATION THEORETIC MEASURES OF SIMILARITY
13 pages, 5 figures, 4 tables. v3: Accepted version