3,131 research outputs found

Radio Oranje: Enhanced Access to a Historical Spoken Word Collection

Author: Heeren Willemijn
Jong Franciska de
Ordelman Roeland
Werff Laurens van der
Publication venue: Landelijke Onderzoekschool Taalwetenschap
Publication date: 01/01/2007

Access to historical audio collections is typically very restricted: content is often only available on physical (analog) media and the metadata is usually limited to keywords, giving access at the level of relatively large fragments, e.g., an entire tape. Many spoken word heritage collections are now being digitized, which allows the introduction of more advanced search technology. This paper presents an approach that supports online access and search for recordings of historical speeches. A demonstrator has been built, based on the so-called Radio Oranje collection, which contains radio speeches by the Dutch Queen Wilhelmina that were broadcast during World War II. The audio has been aligned with its original 1940s manual transcriptions to create a time-stamped index that enables the speeches to be searched at the word level. Results are presented together with related photos from an external database.
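The word-level, time-stamped index mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the alignment step has already produced (recording, word, start time) triples, and the function names, recording IDs, and timings are all invented for illustration.

```python
import re
from collections import defaultdict


def normalize(word):
    """Lowercase a word and strip punctuation so lookups are forgiving."""
    return re.sub(r"[^\w]", "", word.lower())


def build_index(aligned_words):
    """Map each normalized word to the (recording_id, start_sec) pairs where it occurs.

    `aligned_words` is an iterable of (recording_id, word, start_sec) triples,
    i.e. the assumed output of an audio/transcript alignment step.
    """
    index = defaultdict(list)
    for recording_id, word, start_sec in aligned_words:
        key = normalize(word)
        if key:  # skip tokens that were punctuation only
            index[key].append((recording_id, start_sec))
    return index


def search(index, query):
    """Return every (recording_id, start_sec) position for a single query word."""
    return index.get(normalize(query), [])


# Hypothetical aligned data; IDs and timings are made up for this example.
aligned = [
    ("speech_01", "Landgenoten,", 0.42),
    ("speech_01", "vrijheid", 3.10),
    ("speech_02", "Vrijheid", 12.75),
]

index = build_index(aligned)
hits = search(index, "vrijheid")
```

Each hit gives a recording and an offset in seconds, so a player could jump directly to the moment the word is spoken rather than to the start of an entire tape.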

University of Twente Research Information

Utrecht University Repository

Clearing the transcription hurdle in dialect corpus building : the corpus of Southern Dutch dialects as case-study

Author: Breitbarth Anne
Farasyn Melissa
Ghyselen Anne-Sophie
van Hessen Arjan
Van Keymeulen Jacques
Publication venue: Frontiers Media SA
Publication date: 01/01/2020

This paper discusses how the transcription hurdle in dialect corpus building can be cleared. While corpus analysis has strongly gained in popularity in linguistic research, dialect corpora are still relatively scarce. This scarcity can be attributed to several factors, one of which is the challenging nature of transcribing dialects, given a lack of both orthographic norms for many dialects and speech technological tools trained on dialect data. This paper addresses the questions (i) how dialects can be transcribed efficiently and (ii) whether speech technological tools can lighten the transcription work. These questions are tackled using the Southern Dutch dialects (SDDs) as a case study, for which the usefulness of automatic speech recognition (ASR), respeaking, and forced alignment is considered. Tests with these tools indicate that dialects still constitute a major speech technological challenge. In the case of the SDDs, the decision was made to use speech technology only for the word-level segmentation of the audio files, as the transcription itself could not be sped up by ASR tools. The discussion does, however, indicate that the usefulness of ASR and other related tools for a dialect corpus project is strongly determined by the sound quality of the dialect recordings, the availability of statistical dialect-specific models, the degree of linguistic differentiation between the dialects and the standard language, and the goals the transcripts have to serve.

Ghent University Academic Bibliography

Automated Metadata Extraction for Semantic Access to Spoken Word Archives

Author: de Jong Franciska M.G.
Heeren W.F.L.
Nijholt Antinus
Ordelman Roeland J.F.
van Hessen Adrianus J.
Publication venue: Centre for Applied Linguistics
Publication date: 17/01/2011

University of Twente Research Information

A Contextual Study of Semantic Speech Editing in Radio Production

Author: Anguera Miro
Arons
Baume
Bell
Berthouzoz
Burke
Casares
Choi
Engström
Hart
Kim
Klemmer
Long
Loviscach
Matsuo
Nielsen
Panagiotakis
Perry
Rubin
Shin
Sivaraman
Suhm
Sun
Vemuri
Weibel
Whittaker
Whittaker
Whittaker
Wolfe
Yoon
Publication venue: Elsevier BV
Publication date: 01/07/1997

Radio production involves editing speech-based audio using tools that represent sound using simple waveforms. Semantic speech editing systems allow users to edit audio using an automatically generated transcript, which has the potential to improve the production workflow. To investigate this, we developed a semantic audio editor based on a pilot study. Through a contextual qualitative study of five professional radio producers at the BBC, we examined the existing radio production process and evaluated our semantic editor by using it to create programmes that were later broadcast. We observed that the participants in our study wrote detailed notes about their recordings and used annotation to mark which parts they wanted to use. They collaborated closely with the presenter of their programme to structure the contents and write narrative elements. Participants reported that they often work away from the office to avoid distractions, and print transcripts so they can work away from screens. They also emphasised that listening is an important part of production, to ensure high sound quality. We found that semantic speech editing with automated speech recognition can be used to improve the radio production workflow, but that annotation, collaboration, portability and listening were not well supported by current semantic speech editing systems. In this paper, we make recommendations on how future semantic speech editing systems can better support the requirements of radio production.

Crossref

Surrey Research Insight

eCommons@Cornell