
20 research outputs found

A spoken document retrieval application in the oral history domain

Author: Huijbregts Marijn
Jong Franciska de
Ordelman Roeland
Publication venue: University of Patras, Wire Communications Laboratory Moscow State Linguistics University
Publication date: 01/01/2005

The application of automatic speech recognition in the broadcast news domain is well studied. Recognition performance is generally high and, accordingly, spoken document retrieval can be applied successfully in this domain, as demonstrated by a number of commercial systems. In other domains, similar recognition performance is hard to obtain, or even far out of reach, for example due to a lack of suitable training material. This is a serious impediment to the successful application of spoken document retrieval techniques to data other than news. This paper outlines our first steps towards a retrieval system that can automatically be adapted to new domains. We discuss our experience with a recently implemented spoken document retrieval application attached to a web portal that aims at the disclosure of a multimedia data collection in the oral history domain. The paper illustrates that simply deploying an off-the-shelf broadcast news system in this task domain will produce error rates that are too high to be useful for retrieval tasks. By applying adaptation techniques at the acoustic level and the language model level, system performance can be improved considerably, but additional research on unsupervised adaptation and search interfaces is required to create an adequate search environment based on speech transcripts.

University of Twente Research Information

Unravelling the voice of Willem Frederik Hermans: an oral history indexing case study

Author: Huijbregts Marijn
Jong Franciska de
Ordelman Roeland
Publication venue: University of Twente, Centre for Telematics and Information Technology (CTIT)
Publication date: 01/01/2009

University of Twente Research Information

Robust audio indexing for Dutch spoken-word collections

Author: Huijbregts Marijn
Jong Franciska de
Leeuwen David van
Ordelman Roeland
Publication venue: KNAW
Publication date: 01/01/2005

Whereas the growth of storage capacity is in accordance with widely acknowledged predictions, the possibilities to index and access the archives created are lagging behind. This is especially the case in the oral history domain, and much of the rich content in these collections runs the risk of remaining inaccessible for lack of robust search technologies. This paper addresses the history and development of robust audio indexing technology for searching Dutch spoken-word collections and compares Dutch audio indexing in the well-studied broadcast news domain with an oral history case study. It is concluded that, despite significant advances in Dutch audio indexing technology and demonstrated applicability in several domains, further research is indispensable for successful automatic disclosure of spoken-word collections.

University of Twente Research Information

Language attitudes revisited: auditory affective priming

Author: Geeraerts Dirk
Impe Leen
Speelman Dirk
Spruyt Adriaan
Publication venue: Elsevier BV
Publication date: 01/01/2013

Ghent University Academic Bibliography

Detecting grammatical errors in machine translation output using dependency parsing and treebank querying

Author: Hoste Veronique
Macken Lieve
Tezcan Arda
Publication date: 01/01/2016

Despite the recent advances in the field of machine translation (MT), MT systems cannot guarantee that the sentences they produce will be fluent and coherent in both syntax and semantics. Detecting and highlighting errors in machine-translated sentences can help post-editors to focus on the erroneous fragments that need to be corrected. This paper presents two methods for detecting grammatical errors in Dutch machine-translated text, using dependency parsing and treebank querying. We test our approach on the output of a statistical and a rule-based MT system for English-Dutch and evaluate performance at the sentence and word level. The results show that our method can be used to detect grammatical errors with high accuracy at the sentence level in both types of MT output.

Ghent University Academic Bibliography

Enriching a Descriptive Grammar with Treebank Queries

Author: Frank Landsbergen
Gosse Bouma
Jan Odijk
Marjo Van Koppen
Matje Van De Camp
Ton Van Der Wouden
Publication date: 05/03/2020

The Syntax of Dutch (SoD) is a detailed descriptive grammar of Dutch that provides data for many issues raised in linguistic theory. We present the results of a pilot project that investigated the possibility of enriching the online version of the text with links to queries that provide relevant results from syntactically annotated corpora.

CiteSeerX

Acoustic Correlates of Prosodic Boundaries in French: A Review of Corpus Data / Correlatos acústicos de fronteiras prosódicas em francês: uma revisão de dados de corpora

Author: George Christodoulides
Publication venue: Faculdade de Letras da UFMG
Publication date: 01/10/2018

In this article we investigate the acoustic correlates of prosodic boundaries in French speech. We compare the prosodic structure annotation performed by experts in two multi-genre corpora (Rhapsodie and LOCAS-F). A uniform analysis procedure is applied to both corpora. The results show that the main acoustic correlates of prosodic boundaries are silent pauses and pre-boundary syllable lengthening. Pitch movements contribute to the perception of boundaries but are essentially correlates of boundary function, rather than boundary strength. Two levels of the four-level annotation of boundary strength in the Rhapsodie corpus (periods and packages) correspond to the two levels of strength in the LOCAS-F corpus. Keywords: prosody; speech segmentation; prosodic boundaries; corpus linguistics; French.

Directory of Open Access Journals

Enriching a Scientific Grammar with Links to Linguistic Resources: The Taalportaal

Author: Bouma Gosse
Landsbergen Frank
Odijk Jan
van de Camp Matje
van der Wouden Ton
van Hessen Arjan
van Koppen Marjo
Publication venue: UBIQUITY PRESS LTD
Publication date: 28/12/2017

Scientific research within the humanities is different from what it was a few decades ago. For instance, new sources of information, such as digital grammars, lexical databases and large corpora of real-language data, offer new opportunities for linguistics. The Taalportaal grammatical database, with its links to other linguistic resources via the CLARIN infrastructure, is a prime example of a new type of tool for linguistic research.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Utrecht University Repository

Dissertations of the University of Groningen

Difference between written and spoken Czech: The case of verbal nouns denoting an action

Author: Kolar Jan
Kolářová Veronika
Mikulová Marie
Publication venue: Walter de Gruyter GmbH
Publication date: 01/04/2017

The present paper extends understanding of differences in expressing actions by verbal nouns in corpora of written vs. spoken Czech, namely in the Czech part of the Prague Czech-English Dependency Treebank and in the Prague Dependency Treebank of Spoken Czech. We show that while the written corpus includes more complex noun phrases with more explicit expression of adnominal participants, noun phrases in the spoken corpus contain more deletions and more exophoric references. We also carried out a quantitative analysis focusing on relative frequencies of combinations of participants modifying verbal nouns; although the written corpus shows higher relative frequencies, the order of the relative frequencies of particular combinations is the same in both types of communication.

Crossref

University of Birmingham Research Portal

The Construction of a 500-Million-Word Reference Corpus of Contemporary Written Dutch

Author: A Bosch Van den
A Braasch
C Rijsbergen Van
G Aston
J Leveling
J Trapman
JC Carletta
M Recasens
M Reynaert
Martin W. C. Reynaert
W Daelemans
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Crossref

Springer - Publisher Connector

Tilburg University Repository