
    Accessing the spoken word

    Spoken-word audio collections cover many domains, including radio and television broadcasts, oral narratives, governmental proceedings, lectures, and telephone conversations. The collection, access, and preservation of such data are stimulated by political, economic, cultural, and educational needs. This paper outlines the major issues in the field, reviews the current state of technology, examines the rapidly changing policy issues relating to privacy and copyright, and presents issues relating to the collection and preservation of spoken audio content.

    Prosody-Based Automatic Segmentation of Speech into Sentences and Topics

    A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models, for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task and corpus dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration, and word-based cues dominate for natural conversation.
    Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2), Special Issue on Accessing Information in Spoken Audio, September 2000.
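The probabilistic combination of prosodic and lexical information mentioned above can be sketched as a log-linear interpolation of the two models' boundary posteriors. The function names, interpolation weight, and threshold below are illustrative assumptions, not values from the paper.

```python
import math

def combine_posteriors(p_prosody, p_lexical, lam=0.5):
    # Log-linear interpolation of the two models' boundary posteriors;
    # lam weights the prosodic model against the lexical one (assumed value).
    # Posteriors are assumed to lie in (0, 1].
    return math.exp(lam * math.log(p_prosody) + (1 - lam) * math.log(p_lexical))

def segment(words, prosody_post, lexical_post, threshold=0.5):
    # Hypothesize a sentence boundary after word i whenever the combined
    # posterior exceeds the threshold.
    return [i for i, _ in enumerate(words)
            if combine_posteriors(prosody_post[i], lexical_post[i]) > threshold]
```

For example, with toy posteriors, `segment(["ok", "so", "yes"], [0.9, 0.1, 0.8], [0.8, 0.2, 0.9])` places boundaries after the first and third words.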

    Trouble Articulating the Right Words: Evidence for a Response-Exclusion Account of Distraction During Semantic Fluency

    It is widely held that single-word lexical access is a competitive process, a view based largely on the observation that naming a picture is slowed in the presence of a distractor word. However, problematic for this view is that a low-frequency distractor word slows the naming of a picture more than a high-frequency word does. This supports an alternative, response-exclusion, account in which a distractor word interferes because it must be excluded from an articulatory output buffer before the right word (the picture name) can be articulated: a high-frequency word accesses the buffer more quickly than a low-frequency word and, as such, can also be excluded more quickly. Here we studied the respective roles of competition and response exclusion for the first time in the context of semantic verbal fluency, a setting requiring the accessing and production of multiple words from long-term memory in response to a single semantic cue. We show that disruption to semantic fluency by a sequence of to-be-ignored spoken distractors is also greater when those distractors are low in frequency, thereby extending the explanatory compass of the response-exclusion account to a multiple-word production setting and casting further doubt on the lexical-selection-by-competition view. The results can be understood as reflecting the contribution of speech output processes to semantic fluency.

    A comparison of grapheme and phoneme-based units for Spanish spoken term detection

    The ever-increasing volume of audio data available online through the World Wide Web means that automatic methods for indexing and search are becoming essential. Hidden Markov model (HMM) keyword spotting and lattice search techniques are the two most common approaches used by such systems. In keyword spotting, models or templates are defined for each search term prior to accessing the speech and are used to find matches. Lattice search (referred to as spoken term detection) uses a pre-indexing of speech data in terms of word or sub-word units, which can then quickly be searched for arbitrary terms without referring to the original audio. In both cases, the search term can be modelled in terms of sub-word units, typically phonemes. For in-vocabulary words (i.e. words that appear in the pronunciation dictionary), pronunciations are available directly and are accepted to work well. However, for out-of-vocabulary (OOV) search terms, letter-to-sound conversion must be used to generate a pronunciation for the search term. This is usually a hard decision (i.e. not probabilistic and with no possibility of backtracking), and errors introduced at this step are difficult to recover from. We therefore propose the direct use of graphemes (i.e., letter-based sub-word units) for acoustic modelling. This is expected to work particularly well in languages such as Spanish, where despite the letter-to-sound mapping being very regular, the correspondence is not one-to-one, and there are benefits from avoiding hard decisions at early stages of processing. In this article, we compare three approaches for Spanish keyword spotting or spoken term detection, and within each of these we compare acoustic modelling based on phone and grapheme units. Experiments were performed using the Spanish geographical-domain Albayzin corpus. Results achieved in the two approaches proposed for spoken term detection show that trigrapheme units for acoustic modelling match or exceed the performance of phone-based acoustic models. In the method proposed for keyword spotting, the results achieved with each acoustic model are very similar.
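Grapheme-based acoustic units such as the trigraphemes mentioned in the results can be derived directly from a word's spelling, with no letter-to-sound step. The sketch below uses a hypothetical `l-c+r` context notation, analogous to triphone naming; the `#` word-boundary marker is also an assumption.

```python
def trigraphemes(word):
    # Context-dependent grapheme units: each letter together with its left
    # and right neighbours, padded with a word-boundary marker.
    padded = "#" + word + "#"
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]
```

For the Spanish word "sol" this yields the units `#-s+o`, `s-o+l`, and `o-l+#`, each of which would be modelled by an HMM in place of a triphone.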

    Is translation semantically mediated? Evidence from Welsh-English bilingual aphasia

    The involvement of the semantic system in picture naming is undisputed. However, it has been proposed that translation could take place via direct lexical links between L1 and L2 word forms, in addition to or instead of via semantics (i.e., with a spoken word in L1 accessing its meaning and this meaning then leading to the retrieval of the translation equivalent in L2). There is conflicting evidence in the psycholinguistic literature as to the extent of semantic mediation in translation vs. picture naming tasks (Potter et al., 1984; Kroll and Stewart, 1994). More recently, Hernandez et al. (2010) investigated this question in a case study of JFF, a proficient Spanish-Catalan bilingual speaker with Alzheimer’s disease and naming difficulties due to a semantic deficit. As JFF’s semantic deficit affected not only picture naming but also translation tasks, the authors concluded against the existence of functional direct lexical links supporting translation. The goal of our study was to explore this issue further in a larger sample of proficient bilingual patients with aphasia and word-finding difficulties in both languages. More specifically, we compare the rate of semantic errors produced in naming vs. translation tasks.

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    Spanish generation from Spanish Sign Language using a phrase-based translation system

    This paper describes the development of a Spoken Spanish generator from Spanish Sign Language (LSE – Lengua de Signos Española) in a specific domain: the renewal of the Identity Document and Driver’s license. The system is composed of three modules. The first is an interface where a deaf person can specify a sign sequence in sign-writing. The second is a language translator for converting the sign sequence into a word sequence. The last module is a text-to-speech converter. The paper also describes the generation of a parallel corpus for system development, composed of more than 4,000 Spanish sentences and their LSE translations in the application domain. The paper focuses on the translation module, which uses a statistical strategy with a phrase-based translation model, and analyses the effect of the alignment configuration used during generation of the word-based translation model. The best configuration gives a 3.90% mWER and a 0.9645 BLEU.
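The mWER figure reported above is based on word-level edit distance between the system output and reference translations. As a minimal sketch (single reference, uniform edit costs; not the paper's evaluation code):

```python
def wer(ref, hyp):
    # Word error rate: Levenshtein distance over word sequences, divided by
    # the number of reference words.
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i-1][j-1] + (r[i-1] != h[j-1]),  # substitution/match
                          d[i-1][j] + 1,                      # deletion
                          d[i][j-1] + 1)                      # insertion
    return d[len(r)][len(h)] / len(r)
```

For example, `wer("el coche es rojo", "el coche rojo")` is 0.25: one deletion against four reference words. mWER ("multiple-reference WER") extends this by taking the minimum error over a set of references.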

    Towards vidcasts: a case study in the development and use of video podcasts

    This case study presents the learning journey towards the development of vidcasts at Glasgow Caledonian University (GCU). This was undertaken by the Effective Learning Service (ELS) and the Spoken Word Team at GCU. ELS was established in 2001 as a service available to all students irrespective of level, background or programme. It is currently located within the Learner Support Department together with other centralised services including Spoken Word, Careers, the Library, and Disability Services. The ELS has developed extensive, collaborative partnerships across schools in the planning and delivery of context-based workshops. Spoken Word Services originated in the international Spoken Word project, which aimed to transform higher education through the integration of digitised audio into learning and teaching. A collaboration and legal deposit agreement with BBC Information & Archives allows Spoken Word to make use of audio and video programmes from the BBC’s extensive archive for teaching and learning purposes. At GCU it is responsible for providing tools and technologies, coping with intellectual property rights, supplying engaging and valuable content, and encouraging reflection on the learning and teaching process. The team recognises that teachers need to develop ‘pedagogical pluralism’ and, in this context, aims to encourage students and their teachers to “write on and for the internet” (Wallace and Donald, 2008). Spoken Word has extensive experience in producing podcasts and exciting interactive material collaboratively. Working with the REAP Project (Re-Engineering Assessment Practices), Spoken Word has applied video podcasts as a driver for change, replacing a one-hour weekly lecture with a 15-minute video podcast designed around a blend of the lecturer’s narration, BBC audio and video clips, and the lecturer’s own PowerPoint slides (REAP Pilot Projects, 2007).

    Radio Oranje: Enhanced Access to a Historical Spoken Word Collection

    Access to historical audio collections is typically very restricted: content is often only available on physical (analog) media and the metadata is usually limited to keywords, giving access at the level of relatively large fragments, e.g., an entire tape. Many spoken word heritage collections are now being digitized, which allows the introduction of more advanced search technology. This paper presents an approach that supports online access and search for recordings of historical speeches. A demonstrator has been built, based on the so-called Radio Oranje collection, which contains radio speeches by the Dutch Queen Wilhelmina that were broadcast during World War II. The audio has been aligned with its original 1940s manual transcriptions to create a time-stamped index that enables the speeches to be searched at the word level. Results are presented together with related photos from an external database.
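The time-stamped index described above can be sketched as a mapping from each transcribed word to the audio spans where it occurs. The `(word, start, end)` alignment record format is an illustrative assumption about the forced-alignment output:

```python
def build_index(alignment):
    # alignment: iterable of (word, start_sec, end_sec) tuples produced by
    # aligning the audio with the manual transcriptions.
    index = {}
    for word, start, end in alignment:
        index.setdefault(word.lower(), []).append((start, end))
    return index

def search(index, term):
    # Return the audio spans (in seconds) where the query term is spoken.
    return index.get(term.lower(), [])
```

A player front end can then jump straight to the returned offsets in the digitized recording.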

    Towards Affordable Disclosure of Spoken Word Archives

    This paper presents and discusses ongoing work aiming at affordable disclosure of real-world spoken word archives in general, and in particular of a collection of recorded interviews with Dutch survivors of the World War II concentration camp Buchenwald. Given such collections, the least we want to provide is search at different levels and a flexible way of presenting results. Strategies for automatic annotation based on speech recognition (supporting, e.g., within-document search) are outlined and discussed with respect to the Buchenwald interview collection. In addition, usability aspects of spoken word search are discussed on the basis of our experiences with the online Buchenwald web portal. It is concluded that, although user feedback is generally fairly positive, automatic annotation performance is still far from satisfactory and requires additional research.