30,638 research outputs found
Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network
This paper proposes to use low-level spatial features extracted from
multichannel audio for sound event detection. We extend the convolutional
recurrent neural network to handle more than one type of these multichannel
features by learning from each of them separately in the initial stages. We
show that instead of concatenating the features of each channel into a single
feature vector the network learns sound events in multichannel audio better
when they are presented as separate layers of a volume. Using the proposed
spatial features over monaural features on the same network gives an absolute
F-score improvement of 6.1% on the publicly available TUT-SED 2016 dataset and
2.7% on the TUT-SED 2009 dataset that is fifteen times larger.Comment: Accepted for IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP 2017
Speech and crosstalk detection in multichannel audio
The analysis of scenarios in which a number of microphones record the activity of speakers, such as in a round-table meeting, presents a number of computational challenges. For example, if each participant wears a microphone, speech from both the microphone's wearer (local speech) and from other participants (crosstalk) is received. The recorded audio can be broadly classified in four ways: local speech, crosstalk plus local speech, crosstalk alone and silence. We describe two experiments related to the automatic classification of audio into these four classes. The first experiment attempted to optimize a set of acoustic features for use with a Gaussian mixture model (GMM) classifier. A large set of potential acoustic features were considered, some of which have been employed in previous studies. The best-performing features were found to be kurtosis, "fundamentalness," and cross-correlation metrics. The second experiment used these features to train an ergodic hidden Markov model classifier. Tests performed on a large corpus of recorded meetings show classification accuracies of up to 96%, and automatic speech recognition performance close to that obtained using ground truth segmentation
New receivers for DS-SS in time variant multipath channels based on the PN alignment concept
We present new combined blind equalization and detection schemes for a DS-SS system. The new proposed algorithms improve the bit error rate compared to traditional RAKE receivers in time-variant channels with multipath. This improvement is obtained in both simulated and a real ionospheric HF link. Its very low computational complexity makes them suitable to be implemented in real receivers.Peer ReviewedPostprint (published version
- …