
    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    Access to recorded interviews: A research agenda

    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.

    Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

    The lack of code-switching training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR model using only monolingual data. Our method encourages the distributions of the output token embeddings of the monolingual languages to be similar, and hence encourages the ASR model to code-switch easily between languages. Specifically, we propose to use constraints based on Jensen-Shannon divergence and cosine distance. The former enforces output embeddings of the monolingual languages to possess similar distributions, while the latter simply brings the centroids of the two distributions close to each other. Experimental results demonstrate the effectiveness of the proposed method, yielding up to 4.5% absolute mixed error rate improvement on a Mandarin-English code-switching ASR task. Comment: 5 pages, 3 figures, accepted to INTERSPEECH 201
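
    The two quantities named in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes discrete probability vectors for the Jensen-Shannon divergence and small hand-written embedding lists for the centroids, and all function and variable names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 when the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Toy output embeddings for two languages (illustrative values only).
lang_a_embeddings = [[1.0, 0.0], [1.0, 0.0]]
lang_b_embeddings = [[0.0, 1.0], [0.0, 1.0]]

centroid_gap = cosine_distance(centroid(lang_a_embeddings), centroid(lang_b_embeddings))
distribution_gap = js_divergence([1.0, 0.0], [0.0, 1.0])
```

    In a training setup of the kind the abstract describes, terms like these would presumably be added to the ASR loss as penalties, so that minimizing them pulls the two languages' output embeddings together; how the paper weights or estimates them is not stated here.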

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    Current trends in multilingual speech processing

    In this paper, we describe recent work at Idiap Research Institute in the domain of multilingual speech processing and provide some insights into emerging challenges for the research community. Multilingual speech processing has been a topic of ongoing interest to the research community for many years, and the field is now receiving renewed interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers. For example, discriminative features are seeing wide application by the speech recognition community, but additional issues arise when using such features in a multilingual setting. Another example is the apparent convergence of speech recognition and speech synthesis technologies in the form of statistical parametric methodologies. This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS) as well as cross-lingual speaker adaptation for TTS. The second driving force is the impetus being provided by both government and industry for technologies to help break down domestic and international language barriers, these also being barriers to the expansion of policy and commerce. Speech-to-speech and speech-to-text translation are thus emerging as key technologies at the heart of which lies multilingual speech processing.

    Optimizing L2 Vocabulary Acquisition: Applied Linguistic Research

    Any acquisition in Second Language Acquisition (SLA) starts as word recognition; as such, vocabulary acquisition is integral to language learning as a whole and is a precursor to fluent communication (Ellis, 1996; Moore, 1996). To maximize SLA, vocabulary acquisition must be optimized. However, vocabulary acquisition is understudied and underutilized, especially compared to other aspects of SLA (Paribakht & Wesche, 1997). Cook states, “…the vast bulk of examinations, syllabuses, and course books around the globe show little overt influence from SLA research” (1998, p.10). Courses, teachers, and students would benefit from directly addressing SLA research rather than utilizing inefficient methods (Cook, 1998; Moore, 1996). Problematic course books influence thousands of teachers and a multitude of students (Cook, 1998); this costs educational institutions billions of dollars globally. Prioritizing sound pedagogy when designing courses would alleviate the problems of inefficient acquisition in SLA and the financial cost. An outline is presented for creating and supplementing programs in instructed SLA; these guidelines draw on linguistic research on vocabulary acquisition:
    1) The course is built using frequency data from a spoken corpus in the target language. Zipf’s law states that word frequency follows a predictable curve in which the most frequent word is twice as common as the next most frequent word; that is, frequency is inversely proportional to word rank (Milton, 2009). The 100 most frequent words can be up to 50% of a text (Moore, 1996). The 2,000 most frequent words of English make up about 80% of the language. The next 2,000 words are 8% of the occurrences (Milton, 2009). Sorting vocabulary by frequency will provide the most useful words and will front-load functional words, allowing L2 acquirers to create grammatical constructions (Milton, 2009; Moore, 1996).
    2) This frequency-determined L2 vocabulary uses small, alliterated word lists instead of semantic sets. Alliterated word lists and phonological similarity improve L2 vocabulary retention (Hulstijn, 2003; Laufer, 2009). Semantic sets have been shown to create confusion (Hulstijn, 2003; Schmidt & Watanabe, 2001).
    3) Pseudo-immersion is avoided because it is not effective for L2 acquirers (Schmidt & Watanabe, 2001). Cody (2009) states that ‘immersion’ and incidental learning are often attempted. Although immersion is effective for (multiple) L1 acquisition, post-critical-period acquisition is radically different; ‘mere exposure’ will not work (Hyltenstam & Abrahamsson, 2003). Explicit instruction in the student’s native language is encouraged (Atkinson, 1987). Lexical meaning must be taught explicitly, and utilizing explicit instruction can double retention rates (Laufer, 2009; Laufer & Hulstijn, 2001).
    4) Mnemonic devices, visual and otherwise, are utilized. Flipping an image upside down creates a unique association with the word, rather than having the learner ‘mediate’ with the L1 representation, which they would otherwise default to (Hulstijn, 2003). Learner-generated mnemonics were found useful in Cohen’s 1987 study (Laufer, 2009). Multiple studies have determined that mnemonic devices comparing an L2 word with a semantically related L1 word are effective (Hulstijn, 2003).
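
    The frequency argument in guideline 1 can be made concrete with a small sketch of an idealized Zipf distribution, in which the weight of a word is 1/rank. The vocabulary size and the function names below are assumptions chosen for illustration, and an idealized curve will not reproduce the corpus percentages cited in the abstract exactly.

```python
def zipf_weight(rank, exponent=1.0):
    """Relative frequency of the word at a given rank under an idealized Zipf law."""
    return 1.0 / rank ** exponent


def coverage(top_k, vocab_size, exponent=1.0):
    """Share of all word occurrences accounted for by the top_k most frequent words."""
    total = sum(zipf_weight(r, exponent) for r in range(1, vocab_size + 1))
    head = sum(zipf_weight(r, exponent) for r in range(1, top_k + 1))
    return head / total


# Hypothetical vocabulary size, picked only to show the shape of the curve.
ASSUMED_VOCAB_SIZE = 50_000

for k in (100, 2_000, 4_000):
    share = coverage(k, ASSUMED_VOCAB_SIZE)
    print(f"top {k:>5} words cover {share:.1%} of occurrences (idealized)")
```

    The point of the sketch is the diminishing return: each additional block of words adds less coverage than the block before it, which is the rationale for front-loading high-frequency vocabulary.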

    Proceedings of the ACM SIGIR Workshop "Searching Spontaneous Conversational Speech"
