Search CORE

5,041 research outputs found

Language Transfer of Audio Word2Vec: Learning Audio Segment Representations without Target Language Data

Author: Lee Hung-Yi
Shen Chia-Hao
Sung Janet Y.
Publication venue
Publication date: 19/07/2017
Field of study

Audio Word2Vec offers vector representations of fixed dimensionality for variable-length audio segments using Sequence-to-sequence Autoencoder (SA). These vector representations are shown to describe the sequential phonetic structures of the audio segments to a good degree, with real world applications such as query-by-example Spoken Term Detection (STD). This paper examines the capability of language transfer of Audio Word2Vec. We train SA from one language (source language) and use it to extract the vector representation of the audio segments of another language (target language). We found that SA can still catch phonetic structure from the audio segments of the target language if the source and target languages are similar. In query-by-example STD, we obtain the vector representations from the SA learned from a large amount of source language data, and found them surpass the representations from naive encoder and SA directly learned from a small amount of target language data. The result shows that it is possible to learn Audio Word2Vec model from high-resource languages and use it on low-resource languages. This further expands the usability of Audio Word2Vec.Comment: arXiv admin note: text overlap with arXiv:1603.0098

arXiv.org e-Print Archive

Crossref

Rumba : a Python framework for automating large-scale recursive internet experiments on GENI and FIRE+

Author: Ding Guiguang
Ding Xiaohan
Guo Yuchen
Han Jungong
Yan Chenggang
Publication venue
Publication date: 01/01/2018
Field of study

It is not easy to design and run Convolutional Neural Networks (CNNs) due to: 1) finding the optimal number of filters (i.e., the width) at each layer is tricky, given an architecture; and 2) the computational intensity of CNNs impedes the deployment on computationally limited devices. Oracle Pruning is designed to remove the unimportant filters from a well-trained CNN, which estimates the filters’ importance by ablating them in turn and evaluating the model, thus delivers high accuracy but suffers from intolerable time complexity, and requires a given resulting width but cannot automatically find it. To address these problems, we propose Approximated Oracle Filter Pruning (AOFP), which keeps searching for the least important filters in a binary search manner, makes pruning attempts by masking out filters randomly, accumulates the resulting errors, and finetunes the model via a multi-path framework. As AOFP enables simultaneous pruning on multiple layers, we can prune an existing very deep CNN with acceptable time cost, negligible accuracy drop, and no heuristic knowledge, or re-design a model which exerts higher accuracy and faster inferenc

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Warwick Research Archives Portal Repository

Rumba : a Python framework for automating large-scale recursive internet experiments on GENI and FIRE+

Author: Capitani M.
Maffione V.
Staessens Dimitri
Vrijders Sander
Publication venue
Publication date: 01/01/2018
Field of study

Crossref

Ghent University Academic Bibliography

End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models

Author: Busso Carlos
Tao Fei
Publication venue
Publication date: 12/09/2018
Field of study

Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, increasing the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whisper speech) or background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture in a principled way the temporal relationships between acoustic and visual features. This study explores this idea proposing a \emph{bimodal recurrent neural network} (BRNN) framework for SAD. The approach models the temporal dynamic of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are directly learned from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to absolute improvements up to 1.2% under practical scenarios over a VAD baseline using only audio implemented with deep neural network (DNN). The proposed approach achieves 92.7% F1-score when it is evaluated using the sensors from a portable tablet under noisy acoustic environment, which is only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech obtained with a high definition camera and a close-talking microphone).Comment: Submitted to Speech Communicatio

arXiv.org e-Print Archive

Speech Recognition on an FPGA Using Discrete and Continuous Hidden Markov Models

Author: Melnikoff Stephen Jonathan
Quigley Steven Francis
Russell Martin
Publication venue: Springer Verlag
Publication date: 01/01/2002
Field of study

Speech recognition is a computationally demanding task, particularly the stage which uses Viterbi decoding for converting pre-processed speech data into words or sub-word units. Any device that can reduce the load on, for example, a PC’s processor, is advantageous. Hence we present FPGA implementations of the decoder based alternately on discrete and continuous hidden Markov models (HMMs) representing monophones, and demonstrate that the discrete version can process speech nearly 5,000 times real time, using just 12% of the slices of a Xilinx Virtex XCV1000, but with a lower recognition rate than the continuous implementation, which is 75 times faster than real time, and occupies 45% of the same device

University of Birmingham Research Portal

Recommended from our members

Improving fault coverage and minimising the cost of fault identification when testing from finite state machines

Author: Guo Qiang
Publication venue: Brunel University, School of Information Systems, Computing and Mathematics Theses
Publication date: 01/01/2006
Field of study

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Software needs to be adequately tested in order to increase the confidence that the system being developed is reliable. However, testing is a complicated and expensive process. Formal specification based models such as finite state machines have been widely used in system modelling and testing. In this PhD thesis, we primarily investigate fault detection and identification when testing from finite state machines. The research in this thesis is mainly comprised of three topics - construction of multiple Unique Input/Output (UIO) sequences using Metaheuristic Optimisation Techniques (MOTs), the improved fault coverage by using robust Unique Input/Output Circuit (UIOC) sequences, and fault diagnosis when testing from finite state machines. In the studies of the construction of UIOs, a model is proposed where a fitness function is defined to guide the search for input sequences that are potentially UIOs. In the studies of the improved fault coverage, a new type of UIOCs is defined. Based upon the Rural Chinese Postman Algorithm (RCPA), a new approach is proposed for the construction of more robust test sequences. In the studies of fault diagnosis, heuristics are defined that attempt to lead to failures being observed in some shorter test sequences, which helps to reduce the cost of fault isolation and identification. The proposed approaches and techniques were evaluated with regard to a set of case studies, which provides experimental evidence for their efficacy.Brunel Research Initiative and Enterprise Fund (BRIEF) Award from Brunel University and Departmental bursary from Department of Information Systems and Computing, Brunel Universit

Brunel University Research Archive

Toward an architecture for quantum programming

Author: Bettelli S.
Calarco T.
Serafini L.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003
Field of study

It is becoming increasingly clear that, if a useful device for quantum computation will ever be built, it will be embodied by a classical computing machine with control over a truly quantum subsystem, this apparatus performing a mixture of classical and quantum computation. This paper investigates a possible approach to the problem of programming such machines: a template high level quantum language is presented which complements a generic general purpose classical language with a set of quantum primitives. The underlying scheme involves a run-time environment which calculates the byte-code for the quantum operations and pipes it to a quantum device controller or to a simulator. This language can compactly express existing quantum algorithms and reduce them to sequences of elementary operations; it also easily lends itself to automatic, hardware independent, circuit simplification. A publicly available preliminary implementation of the proposed ideas has been realized using the C++ language.Comment: 23 pages, 5 figures, A4paper. Final version accepted by EJPD ("swap" replaced by "invert" for Qops). Preliminary implementation available at: http://sra.itc.it/people/serafini/quantum-computing/qlang.htm

arXiv.org e-Print Archive

CiteSeerX

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Archivio della ricerca - Fondazione Bruno Kessler