36 research outputs found
Investigating techniques for low resource conversational speech recognition
In this paper we investigate various techniques in order to build effective speech to text (STT) and keyword search (KWS) systems for low resource conversational speech. Sub-word decoding and graphemic mappings were assessed in order to detect out-of-vocabulary keywords. To deal with the limited amount of transcribed data, semi-supervised training and data selection methods were investigated. Robust acoustic features produced via data augmentation were evaluated for acoustic modeling. For language modeling, automatically retrieved conversational-like Web data was used, as well as neural network based models. We report STT improvements with all the techniques, but interestingly only some improve KWS performance. Results are reported for the Swahili language in the context of the 2015 OpenKWS Evaluation.
Investigation of multilingual deep neural networks for spoken term detection
The development of high-performance speech processing systems for low-resource languages is a challenging area. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to use bottleneck features, or hybrid systems, trained on multilingual data for speech-to-text (STT) systems. This paper presents an investigation into the application of these multilingual approaches to spoken term detection. Experiments were run using the IARPA Babel limited language pack corpora (~10 hours/language) with 4 languages for initial multilingual system development and an additional held-out target language. STT gains achieved through using multilingual bottleneck features in a Tandem configuration are shown to also apply to keyword search (KWS). Further improvements in both STT and KWS were observed by incorporating language questions into the Tandem GMM-HMM decision trees for the training set languages. Adapted hybrid systems performed slightly worse on average than the adapted Tandem systems. A language independent acoustic model test on the target language showed that retraining or adapting of the acoustic models to the target language is currently minimally needed to achieve reasonable performance. © 2013 IEEE
Unicode-based graphemic systems for limited resource languages
© 2015 IEEE. Large vocabulary continuous speech recognition systems require a mapping from words, or tokens, into sub-word units to enable robust estimation of acoustic model parameters, and to model words not seen in the training data. The standard approach to achieve this is to manually generate a lexicon where words are mapped into phones, often with attributes associated with each of these phones. Context-dependent acoustic models are then constructed using decision trees where questions are asked based on the phones and phone attributes. For low-resource languages, it may not be practical to manually generate a lexicon. An alternative approach is to use a graphemic lexicon, where the 'pronunciation' for a word is defined by the letters forming that word. This paper proposes a simple approach for building graphemic systems for any language written in Unicode. The attributes for graphemes are automatically derived using features from the Unicode character descriptions. These attributes are then used in decision tree construction. This approach is examined on the IARPA Babel Option Period 2 languages, and a Levantine Arabic CTS task. The described approach achieves comparable, and complementary, performance to phonetic lexicon-based approaches.
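The idea of a graphemic lexicon with attributes taken from Unicode character descriptions can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice to attach combining marks to the preceding letter, and the use of the first word of the character name as a script label are all assumptions made for illustration. Only Python's standard `unicodedata` module is used.

```python
import unicodedata


def grapheme_attributes(ch):
    """Derive coarse attributes for one character from its Unicode description.

    Hypothetical helper: the attribute set is an illustrative assumption,
    not the attribute set used in the paper.
    """
    name = unicodedata.name(ch, "UNKNOWN")
    return {
        "grapheme": ch,
        "name": name,                          # e.g. "LATIN SMALL LETTER A"
        "category": unicodedata.category(ch),  # e.g. "Ll" for a lowercase letter
        "script": name.split(" ")[0],          # first word of the name, e.g. "LATIN"
    }


def graphemic_pronunciation(word):
    """Map a word to a sequence of letter units (its graphemic 'pronunciation').

    Assumption: the word is lowercased and decomposed (NFD) so that
    combining marks become attributes of the preceding base letter.
    """
    units = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        category = unicodedata.category(ch)
        if category.startswith("L"):
            # A letter starts a new sub-word unit.
            units.append({"grapheme": ch, "marks": []})
        elif category.startswith("M") and units:
            # A combining mark is recorded on the previous unit.
            units[-1]["marks"].append(unicodedata.name(ch, "UNKNOWN"))
    return units


if __name__ == "__main__":
    for unit in graphemic_pronunciation("Habari"):
        print(unit["grapheme"], grapheme_attributes(unit["grapheme"])["name"])
```

In a full system, attributes of this kind would feed the questions asked during decision tree construction; that step is outside the scope of this sketch.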
Subject access to OPACs: exploiting the capabilities of FileMaker Pro for designing a novel interface
Ever since libraries came into being, subject access has been
a problem. More often than not, subject searches result either in
no retrievals or in too many records, discouraging users from proceeding
further. Solutions to these problems have been found in improved
search methods, indexing techniques, user-friendly
novel interfaces and other methods. The present work
attempts to tackle the problems of subject access using an
experimental online catalogue with a graphic front-end
user interface. The interface incorporates an enhanced indexing
technique (a traditional classification system) coupled with an improved
search method (an end-user thesaurus), and was
built using a Macintosh-compatible software package
called FileMaker Pro. The system provides subject access by
three methods, i.e. Class Number Search (CNS), Subject Heading
Search (SHS) and Keyword Search (KWS), to cater to the needs of
two different levels of users: naive or ordinary users, and
experienced or advanced users. A cross
section of searchers was invited to evaluate the interface. On
the basis of their reactions, certain recommendations were made
for the improvement of the system. In the process the capabilities
and limitations of FileMaker Pro were assessed and suggestions
were given for its further improvement. Certain points pertaining
to further research on the subject were also recommended.
Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion
The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8
Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and makes a deep analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
This work has been partly supported by project CMC-V2
(TEC2012-37585-C02-01) from the Spanish Ministry of Economy and
Competitiveness. This research was also funded by the European Regional
Development Fund, the Galician Regional Government (GRC2014/024,
“Consolidation of Research Units: AtlantTIC Project” CN2012/160).
Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks
Taxonomic Classification of IoT Smart Home Voice Control
Voice control in the smart home is commonplace, enabling the convenient
control of smart home Internet of Things hubs, gateways and devices, along with
information seeking dialogues. Cloud-based voice assistants are used to
facilitate the interaction, yet privacy concerns surround the cloud analysis of
data. To what extent can voice control be performed using purely local
computation, to ensure user data remains private? In this paper we present a
taxonomy of the voice control technologies present in commercial smart home
systems. We first review literature on the topic, and summarise relevant work
categorising IoT devices and voice control in the home. The taxonomic
classification of these entities is then presented, and we analyse our
findings. Following on, we turn to academic efforts in implementing and
evaluating voice-controlled smart home set-ups, and we then discuss open-source
libraries and devices that are applicable to the design of a privacy-preserving
voice assistant for smart homes and the IoT. Towards the end, we consider
additional technologies and methods that could support a cloud-free voice
assistant, and conclude the work.
Information Outlook, March 2000
Volume 4, Issue 3
https://scholarworks.sjsu.edu/sla_io_2000/1002/thumbnail.jp
An overview on the evaluated video retrieval tasks at TRECVID 2022
The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis
and retrieval evaluation with the goal of promoting progress in research and
development of content-based exploitation and retrieval of information from
digital video via open, tasks-based evaluation supported by metrology. Over the
last twenty-one years this effort has yielded a better understanding of how
systems can effectively accomplish such processing and how one can reliably
benchmark their performance. TRECVID has been funded by NIST (National
Institute of Standards and Technology) and other US government agencies. In
addition, many organizations and individuals worldwide contribute significant
time and effort. TRECVID 2022 planned for the following six tasks: Ad-hoc video
search, Video to text captioning, Disaster scene description and indexing,
Activity in extended videos, deep video understanding, and movie summarization.
In total, 35 teams from various research organizations worldwide signed up to
join the evaluation campaign this year. This paper introduces the tasks,
datasets used, evaluation frameworks and metrics, as well as a high-level
results overview.
Comment: arXiv admin note: substantial text overlap with arXiv:2104.13473,
arXiv:2009.0998
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective.
The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines.
From a socio-economic perspective we take stock of the impact and legal consequences of these technical advances and point out future directions of research.