
    Investigating techniques for low resource conversational speech recognition

    In this paper we investigate various techniques for building effective speech-to-text (STT) and keyword search (KWS) systems for low-resource conversational speech. Sub-word decoding and graphemic mappings were assessed for detecting out-of-vocabulary keywords. To deal with the limited amount of transcribed data, semi-supervised training and data-selection methods were investigated. Robust acoustic features produced via data augmentation were evaluated for acoustic modeling. For language modeling, automatically retrieved conversational-like Web data was used, as well as neural network based models. We report STT improvements with all the techniques, but interestingly only some improve KWS performance. Results are reported for the Swahili language in the context of the 2015 OpenKWS Evaluation.
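The sub-word idea behind OOV keyword detection can be sketched simply: if the hypothesis is decoded at the grapheme level, a keyword that was never in the vocabulary can still be found by matching its letter sequence. The sketch below is illustrative only (all names are invented, and a real KWS system would search a lattice or confusion network, not a flat 1-best word string):

```python
# Minimal sketch of grapheme-based search for an out-of-vocabulary (OOV)
# keyword. Illustrative only: real KWS systems search decoded lattices,
# not a 1-best word sequence as here.

def to_graphemes(word: str) -> list[str]:
    """Map a word to its graphemic 'pronunciation' (one unit per letter)."""
    return list(word.lower())

def grapheme_search(keyword: str, hypothesis_words: list[str]) -> list[int]:
    """Return word indices where the keyword's grapheme sequence matches
    a sliding window over the concatenated grapheme stream."""
    query = to_graphemes(keyword)
    stream, starts = [], []
    for i, word in enumerate(hypothesis_words):
        for g in to_graphemes(word):
            stream.append(g)
            starts.append(i)  # remember which word each grapheme came from
    hits = []
    for j in range(len(stream) - len(query) + 1):
        if stream[j:j + len(query)] == query:
            hits.append(starts[j])
    return sorted(set(hits))

# Even if "mwalimu" was never in the training vocabulary, its grapheme
# sequence can still be located in a grapheme-level hypothesis:
print(grapheme_search("mwalimu", ["huyu", "mwalimu", "anafundisha"]))  # → [1]
```

The same matching principle extends to sub-word units larger than single graphemes (e.g. morphs or syllables), trading recall against false-alarm rate.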

    Unicode-based graphemic systems for limited resource languages

    © 2015 IEEE. Large-vocabulary continuous speech recognition systems require a mapping from words, or tokens, into sub-word units to enable robust estimation of acoustic model parameters and to model words not seen in the training data. The standard approach is to manually generate a lexicon in which words are mapped into phones, often with attributes associated with each phone. Context-dependent acoustic models are then constructed using decision trees, where questions are asked based on the phones and phone attributes. For low-resource languages, it may not be practical to manually generate a lexicon. An alternative is a graphemic lexicon, where the 'pronunciation' for a word is defined by the letters forming that word. This paper proposes a simple approach for building graphemic systems for any language written in Unicode. The attributes for graphemes are automatically derived using features from the Unicode character descriptions; these attributes are then used in decision-tree construction. The approach is examined on the IARPA Babel Option Period 2 languages and a Levantine Arabic CTS task, and achieves comparable, and complementary, performance to phonetic lexicon-based approaches.
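Unicode character descriptions are directly accessible in most languages' standard libraries, so the attribute-derivation step described above can be sketched concretely. The particular attribute set below (general category, name tokens, a vowel heuristic) is an illustrative assumption, not the paper's exact feature set:

```python
# Sketch of deriving grapheme attributes from Unicode character
# descriptions, in the spirit of the approach described above.
# The attribute set chosen here is illustrative; the paper's exact
# features may differ.
import unicodedata

def grapheme_attributes(ch: str) -> dict:
    name = unicodedata.name(ch, "UNKNOWN")
    tokens = name.split()
    return {
        "grapheme": ch,
        "category": unicodedata.category(ch),  # e.g. 'Ll' = lowercase letter
        "name_tokens": tokens,                 # usable as decision-tree questions
        # crude vowel heuristic based on the final name token:
        "is_vowel_name": "LETTER" in tokens and tokens[-1] in
                         {"A", "E", "I", "O", "U"},
    }

# Works for any script covered by Unicode, with no hand-built lexicon:
for ch in "a\u011f\u0448":   # 'a', 'ğ' (Turkish), 'ш' (Cyrillic)
    print(grapheme_attributes(ch))
```

Such attributes give the decision trees something to ask about (script, case, diacritics, vowel-ness) even for languages where no phonetic resources exist.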

    Subject access to OPACs: exploiting the capabilities of FileMaker Pro for designing a novel interface

    Ever since libraries came into being, subject access has been a problem. More often than not, subject searches return either no records or too many, discouraging users from proceeding further. Solutions have been sought in improved search methods, better indexing techniques, user-friendly novel interfaces, and other measures. The present work tackles the problems of subject access with an experimental online catalogue: a graphical front-end user interface in which an enhanced indexing technique (a traditional classification system) is coupled with an improved search method (an end-user thesaurus), implemented with the Macintosh-compatible software package FileMaker Pro. The system provides subject access by three methods, Class Number Search (CNS), Subject Heading Search (SHS), and Keyword Search (KWS), to cater to two different levels of users: naive or ordinary users, and experienced or advanced users. A cross-section of searchers was invited to evaluate the interface, and on the basis of their reactions recommendations were made for improving the system. In the process, the capabilities and limitations of FileMaker Pro were assessed and suggestions were given for its further improvement. Points for further research on the subject were also recommended.

    Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

    The electronic version of this article is the complete one and can be found online at http://dx.doi.org/10.1186/s13636-015-0063-8. Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and makes a deep analysis based on some search-term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms). This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness.
    This research was also funded by the European Regional Development Fund and the Galician Regional Government (GRC2014/024, "Consolidation of Research Units: AtlantTIC Project" CN2012/160).
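The evaluation format described above (detections with file, start/end times, and a confidence score, matched against reference occurrences) can be sketched as a small alignment routine. The 0.5 s tolerance and the tuple layout here are assumptions for illustration; the ALBAYZIN evaluation defines its own alignment rules and metrics:

```python
# Hedged sketch of matching STD system detections against reference
# term occurrences by time proximity. The tolerance value and data
# layout are illustrative assumptions, not the official scoring rules.

def match_detections(detections, references, tol=0.5):
    """detections / references: lists of (file_id, start_s, end_s).
    Returns counts of (hits, false_alarms, misses)."""
    unmatched = list(references)
    hits = false_alarms = 0
    for file_id, start, end in detections:
        for ref in unmatched:
            ref_file, ref_start, ref_end = ref
            if (ref_file == file_id
                    and abs(start - ref_start) <= tol
                    and abs(end - ref_end) <= tol):
                unmatched.remove(ref)  # each reference may be hit once
                hits += 1
                break
        else:
            false_alarms += 1  # detection matched no reference occurrence
    return hits, false_alarms, len(unmatched)

dets = [("talk01", 12.30, 12.85), ("talk01", 40.00, 40.60)]
refs = [("talk01", 12.40, 12.90), ("talk02", 7.10, 7.55)]
print(match_detections(dets, refs))  # → (1, 1, 1): one hit, one false alarm, one miss
```

Hit, false-alarm, and miss counts of this kind are the inputs to standard STD metrics such as the actual term-weighted value (ATWV).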

    Taxonomic Classification of IoT Smart Home Voice Control

    Voice control in the smart home is commonplace, enabling convenient control of smart home Internet of Things hubs, gateways and devices, along with information-seeking dialogues. Cloud-based voice assistants are used to facilitate the interaction, yet privacy concerns surround the cloud analysis of data. To what extent can voice control be performed using purely local computation, to ensure user data remains private? In this paper we present a taxonomy of the voice control technologies present in commercial smart home systems. We first review literature on the topic and summarise relevant work categorising IoT devices and voice control in the home. The taxonomic classification of these entities is then presented, and we analyse our findings. Following on, we turn to academic efforts in implementing and evaluating voice-controlled smart home set-ups, and we then discuss open-source libraries and devices that are applicable to the design of a privacy-preserving voice assistant for smart homes and the IoT. Towards the end, we consider additional technologies and methods that could support a cloud-free voice assistant, and conclude the work.

    Information Outlook, March 2000

    Volume 4, Issue 3

    An overview on the evaluated video retrieval tasks at TRECVID 2022

    The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis and retrieval evaluation with the goal of promoting progress in research and development of content-based exploitation and retrieval of information from digital video via open, task-based evaluation supported by metrology. Over the last twenty-one years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2022 planned for the following six tasks: Ad-hoc video search, Video to text captioning, Disaster scene description and indexing, Activity in extended videos, Deep video understanding, and Movie summarization. In total, 35 teams from various research organizations worldwide signed up to join the evaluation campaign this year. This paper introduces the tasks, the datasets used, the evaluation frameworks and metrics, and a high-level overview of the results.

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmarking initiatives that measure the performance of multimedia search engines. From the socio-economic perspective, we take stock of the impact and legal consequences of these technical advances and point out future directions of research.