3,131 research outputs found

    Radio Oranje: Enhanced Access to a Historical Spoken Word Collection

    Access to historical audio collections is typically very restricted: content is often only available on physical (analog) media and the metadata is usually limited to keywords, giving access at the level of relatively large fragments, e.g., an entire tape. Many spoken word heritage collections are now being digitized, which allows the introduction of more advanced search technology. This paper presents an approach that supports online access and search for recordings of historical speeches. A demonstrator has been built, based on the so-called Radio Oranje collection, which contains radio speeches by the Dutch Queen Wilhelmina that were broadcast during World War II. The audio has been aligned with its original 1940s manual transcriptions to create a time-stamped index that enables the speeches to be searched at the word level. Results are presented together with related photos from an external database

    Clearing the transcription hurdle in dialect corpus building : the corpus of Southern Dutch dialects as case-study

    This paper discusses how the transcription hurdle in dialect corpus building can be cleared. While corpus analysis has strongly gained in popularity in linguistic research, dialect corpora are still relatively scarce. This scarcity can be attributed to several factors, one of which is the challenging nature of transcribing dialects, given a lack of both orthographic norms for many dialects and speech technological tools trained on dialect data. This paper addresses the questions (i) how dialects can be transcribed efficiently and (ii) whether speech technological tools can lighten the transcription work. These questions are tackled using the Southern Dutch dialects (SDDs) as a case study, for which the usefulness of automatic speech recognition (ASR), respeaking, and forced alignment is considered. Tests with these tools indicate that dialects still constitute a major speech technological challenge. In the case of the SDDs, the decision was made to use speech technology only for the word-level segmentation of the audio files, as the transcription itself could not be sped up by ASR tools. The discussion does, however, indicate that the usefulness of ASR and other related tools for a dialect corpus project is strongly determined by the sound quality of the dialect recordings, the availability of statistical dialect-specific models, the degree of linguistic differentiation between the dialects and the standard language, and the goals the transcripts have to serve

    A Contextual Study of Semantic Speech Editing in Radio Production

    Radio production involves editing speech-based audio using tools that represent sound using simple waveforms. Semantic speech editing systems allow users to edit audio using an automatically generated transcript, which has the potential to improve the production workflow. To investigate this, we developed a semantic audio editor based on a pilot study. Through a contextual qualitative study of five professional radio producers at the BBC, we examined the existing radio production process and evaluated our semantic editor by using it to create programmes that were later broadcast. We observed that the participants in our study wrote detailed notes about their recordings and used annotation to mark which parts they wanted to use. They collaborated closely with the presenter of their programme to structure the contents and write narrative elements. Participants reported that they often work away from the office to avoid distractions, and print transcripts so they can work away from screens. They also emphasised that listening is an important part of production, to ensure high sound quality. We found that semantic speech editing with automated speech recognition can be used to improve the radio production workflow, but that annotation, collaboration, portability and listening were not well supported by current semantic speech editing systems. In this paper, we make recommendations on how future semantic speech editing systems can better support the requirements of radio production