Towards a Knowledge Graph based Speech Interface
Applications which use human speech as an input require a speech interface with high recognition accuracy. The words or phrases in the recognised text are annotated with a machine-understandable meaning and linked to knowledge graphs for further processing by the target application. These semantic annotations of recognised words can be represented as subject-predicate-object triples, which collectively form a graph often referred to as a knowledge graph. This type of knowledge representation makes speech interfaces usable with any spoken-input application: since the information is represented in a logical, semantic form, it can be stored and retrieved with any standard web query language.
languages. In this work, we develop a methodology for linking speech input to
knowledge graphs and study the impact of recognition errors in the overall
process. We show that for a corpus with a lower word error rate (WER), a considerable proportion of entities can be annotated and linked to the DBpedia knowledge graph. DBpedia Spotlight, a tool for interlinking text documents with linked open data, is used to link the speech recognition output to the DBpedia knowledge graph. Such a knowledge-based speech recognition interface is useful for applications such as question answering or spoken dialog systems.
Comment: Under review at the International Workshop on Grounding Language Understanding, Satellite of Interspeech 201
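As a minimal sketch of the representation this abstract describes, recognised entities can be stored as subject-predicate-object triples and retrieved with a standard query language such as SPARQL. The entity names below are hypothetical illustrations, not drawn from the paper's corpus:

```python
# Semantic annotations of recognised words, stored as a tiny
# knowledge graph of (subject, predicate, object) triples.
triples = [
    ("dbr:Berlin", "rdf:type", "dbo:City"),        # hypothetical entities
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
]

def to_sparql(subject):
    """Build a SPARQL query retrieving every fact about a subject."""
    return f"SELECT ?p ?o WHERE {{ {subject} ?p ?o . }}"

def facts(subject, graph):
    """Retrieve all (predicate, object) pairs for a subject."""
    return [(p, o) for s, p, o in graph if s == subject]

print(facts("dbr:Berlin", triples))
```

In a real deployment the query built by `to_sparql` would be sent to a public SPARQL endpoint such as DBpedia's, rather than evaluated over an in-memory list.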
Measuring Syntactic Complexity in Spoken and Written Learner Language: Comparing the Incomparable?
Spoken and written language are two modes of language. When learners aim at higher skill levels, the expected outcome of successful second language learning is usually to become a fluent speaker and writer who can produce accurate and complex language in the target language. There is an axiomatic difference between speech and writing, but together they form the essential parts of learners’ L2 skills. The two modes have their own characteristics, and there are differences between native and nonnative language use. For instance, hesitations and pauses are not visible in the end result of the writing process, but they are characteristic of nonnative spoken language use. The present study is based on the analysis of L2 English spoken and written productions of 18 L1 Finnish learners with a focus on syntactic complexity. As earlier spoken language segmentation units mostly come from fluency studies, we conducted an experiment with a new unit, the U-unit, and examined how using this unit as the basis of spoken language segmentation affects the results. According to the analysis, written language was more complex than spoken language. However, the difference in the level of complexity was greatest when the traditional units, T-units and AS-units, were used in segmenting the data. Using the U-unit revealed that spoken language may, in fact, be closer to written language in its syntactic complexity than earlier studies had suggested. Therefore, further research is needed to discover whether the differences in spoken and written learner language are primarily due to the nature of these modes or, rather, to the units and measures used in the analysis.
Overview of VideoCLEF 2009: New perspectives on speech-based multimedia content enrichment
VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access in a multilingual environment. For each task, video data (Dutch-language television, predominantly documentaries) accompanied by speech recognition transcripts were provided.
The Subject Classification Task involved automatic tagging of videos with subject theme labels. The best performance was achieved by approaching subject tagging as an information retrieval task and using both speech recognition transcripts and archival metadata. Alternatively, classifiers were trained using either the training data provided or data collected from Wikipedia or via general Web search. The Affect Task involved detecting narrative peaks, defined as points where viewers perceive heightened dramatic tension. The task was carried out on the “Beeldenstorm” collection containing 45 short-form documentaries on the visual arts. The best runs exploited affective vocabulary and audience directed speech. Other approaches included using topic changes, elevated speaking pitch, increased speaking intensity and radical visual changes. The Linking Task, also called “Finding Related Resources Across Languages,” involved linking video to material on the same subject in a different language.
Participants were provided with a list of multimedia anchors (short video segments) in the Dutch-language “Beeldenstorm” collection and were expected to return target pages drawn from English-language Wikipedia. The best performing methods used the transcript of the speech spoken during the multimedia anchor to build a query to search an index of the Dutch-language Wikipedia. The Dutch Wikipedia pages returned were used to identify related English pages. Participants also experimented with pseudo-relevance feedback, query translation and methods that targeted proper names.
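The best-performing linking pipeline above can be sketched in a few lines: build a query from the anchor's speech transcript, rank Dutch Wikipedia pages by overlap, then follow the interlanguage link to English. The stopword list, index contents, and page names below are hypothetical stand-ins, not the participants' actual resources:

```python
import re

# Tiny illustrative Dutch stopword list (hypothetical, not the real resource).
STOPWORDS = {"de", "het", "een", "en", "van", "is"}

def build_query(transcript):
    """Keep content words from the anchor's speech transcript."""
    words = re.findall(r"\w+", transcript.lower())
    return [w for w in words if w not in STOPWORDS]

# Hypothetical index: Dutch page -> (keywords, English interlanguage link).
NL_INDEX = {
    "Rembrandt": ({"rembrandt", "schilder"}, "Rembrandt"),
    "Nachtwacht": ({"nachtwacht", "schilderij"}, "The_Night_Watch"),
}

def link(transcript):
    """Return the English Wikipedia page best matching the anchor transcript."""
    query = set(build_query(transcript))
    # Rank Dutch pages by keyword overlap, then follow the language link.
    best = max(NL_INDEX, key=lambda page: len(NL_INDEX[page][0] & query))
    return NL_INDEX[best][1]

print(link("de Nachtwacht is een schilderij"))
```

A real system would replace the toy overlap count with a retrieval engine over a full Dutch Wikipedia index and read interlanguage links from the page markup.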