Language-based multimedia information retrieval

Author: Gauvain J.L.
Hiemstra D.
Jong F.M.G. de
Netter K.
Publication date: 01/01/2000

This paper describes various methods and approaches for language-based multimedia information retrieval, which have been developed in the projects POP-EYE and OLIVE and which will be developed further in the MUMIS project. All of these projects aim at supporting automated indexing of video material using human language technologies. Thus, in contrast to image or sound-based retrieval methods, where both the query language and the indexing methods build on non-linguistic data, these methods attempt to exploit advanced text retrieval technologies for the retrieval of non-textual material. While POP-EYE built on subtitles or captions as the prime language key for disclosing video fragments, OLIVE makes use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which then serve as the basis for text-based retrieval functionality.

CiteSeerX

Radboud Repository

University of Twente Research Information
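
The abstract above mentions time-coded linguistic elements serving as the basis for text-based retrieval. As a rough illustration only, and not the OLIVE or POP-EYE implementation, the idea can be sketched as an inverted index from words to time offsets. The function names, the `window` parameter, and the sample transcript are all hypothetical.

```python
from collections import defaultdict


def build_index(timed_words):
    """Map each lower-cased word to the sorted start times (seconds) at which it occurs."""
    index = defaultdict(list)
    for start, word in timed_words:
        index[word.lower()].append(start)
    for times in index.values():
        times.sort()
    return dict(index)


def search(index, query, window=5.0):
    """Return start times of the first query term where every other
    query term occurs within `window` seconds of it."""
    terms = query.lower().split()
    if not terms or any(term not in index for term in terms):
        return []
    hits = []
    for t0 in index[terms[0]]:
        if all(any(abs(t - t0) <= window for t in index[term]) for term in terms[1:]):
            hits.append(t0)
    return hits


# Made-up (start_time_in_seconds, word) pairs standing in for recognizer output.
transcript = [
    (0.0, "the"), (0.4, "striker"), (0.9, "scores"),
    (42.0, "the"), (42.5, "keeper"), (43.1, "saves"),
]
index = build_index(transcript)
print(search(index, "striker scores"))
```

The returned offsets could then be used to jump to the matching video fragment; a real system would also need to handle recognition errors, stemming, and ranking, none of which are modeled here.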

Joint Modeling of Content and Discourse Relations in Dialogues

Author: Kim Joseph
Qin Kechen
Wang Lu
Publication date: 01/01/2017

We present a joint modeling approach to identify salient discussion points in spoken meetings as well as to label the discourse relations between speaker turns. A variation of our model is also discussed in which discourse relations are treated as latent variables. Experimental results on two popular meeting corpora show that our joint model can outperform state-of-the-art approaches for both phrase-based content selection and discourse relation prediction tasks. We also evaluate our model on predicting the consistency among team members' understanding of their group decisions. Classifiers trained with features constructed from our model achieve significantly better predictive performance than the state of the art. Comment: Accepted by ACL 2017. 11 pages

arXiv.org e-Print Archive

Crossref

An Experimental Digital Library Platform - A Demonstrator Prototype for the DigLib Project at SICS

Author: Hulth Anette
Jonsson Anna
Publication venue: Swedish Institute of Computer Science
Publication date: 01/01/1999

Within the framework of the Digital Library project at SICS, this thesis describes the implementation of a demonstrator prototype of a digital library (DigLib): an experimental platform integrating several functions in one common interface. It includes descriptions of the structure and formats of the digital library collection, the tailoring of the search engine Dienst, the construction of a keyword extraction tool, and the design and development of the interface. The platform was realised through sicsDAIS, an agent interaction and presentation system, and is to be used for testing and evaluating various tools for information seeking. The platform supports various user interaction strategies by providing: search in bibliographic records (Dienst); an index of keywords (the Keyword Extraction Function (KEF)); and browsing through the hierarchical structure of the collection. KEF was developed for this thesis work, and extracts and presents keywords from Swedish documents. Although based on a comparatively simple algorithm, KEF contributes by meeting a long-felt need in the area of Information Retrieval. Evaluations of the tasks and the interface still remain to be done, but the digital library is very much up and running. By implementing the platform through sicsDAIS, DigLib can deploy additional tools and search engines without interfering with already running modules. If desired, agents providing services other than those SICS can supply can be plugged in.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Towards Affordable Disclosure of Spoken Word Archives

Author: Heeren W.F.L.
Hiemstra D.
Huijbregts M.A.H.
Jong F.M.G. de
Ordelman R.J.F.
Publication venue: ILPS, University of Amsterdam
Publication date: 01/01/2008

This paper presents and discusses ongoing work aiming at affordable disclosure of real-world spoken word archives in general, and in particular of a collection of recorded interviews with Dutch survivors of World War II concentration camp Buchenwald. Given such collections, the least we want to be able to provide is search at different levels and a flexible way of presenting results. Strategies for automatic annotation based on speech recognition (supporting, e.g., within-document search) are outlined and discussed with respect to the Buchenwald interview collection. In addition, usability aspects of the spoken word search are discussed on the basis of our experiences with the online Buchenwald web portal. It is concluded that, although user feedback is generally fairly positive, automatic annotation performance is still far from satisfactory and requires additional research.

CiteSeerX

Radboud Repository

University of Twente Research Information

TMX markup: a challenge when adapting SMT to the localisation environment

Author: Du Jinhua
Roturier Johann
Way Andy
Publication venue: European Association for Machine Translation
Publication date: 01/05/2010

Translation memory (TM) plays an important role in localisation workflows and is used as an efficient and fundamental tool to carry out translation. In recent years, statistical machine translation (SMT) techniques have developed rapidly, and translation quality and speed have improved significantly as well. However, when applying SMT techniques to facilitate post-editing in the localisation industry, we need to adapt SMT to TM data that is formatted with special mark-up. In this paper, we explore some issues that arise when adapting SMT to Symantec-formatted TM data. Three different methods are proposed to handle the Translation Memory eXchange (TMX) markup, and a comparative study is carried out between them. Furthermore, we also compare the TMX-based SMT systems with a customised SYSTRAN system through human evaluation and automatic evaluation metrics. Experimental results on the French and English language pair show that SMT can perform well using TMX as the input format, either during training or at runtime.

DCU Online Research Access Service

SCRIPTKELL: a tool for measuring cognitive effort and time processing in writing and other complex cognitive activities

Author: Olive O
Piolat A
Roussey JY
Thunin O
Ziegler J C
Publication date: 01/01/1999

We present SCRIPTKELL, a computer-assisted experimental tool that makes it possible to measure the time and cognitive effort allocated to the subprocesses of writing and other cognitive activities. SCRIPTKELL was designed to make it easy to use and modulate Kellogg's (1986) triple-task procedure, which consists of a combination of three tasks: a writing task (or another task), a reaction time task (auditory signal detection), and a directed retrospection task (after each signal detection during writing). We demonstrate how this tool can be used to address several novel empirical and theoretical issues. In sum, SCRIPTKELL should facilitate the flexible realization of experimental designs and the investigation of critical issues concerning the functional characteristics of complex cognitive activities.

Simple principles for a complex output: An experiment in early syntactic development

Author: Parisse Christophe
Publication date: 01/09/2001

A set of iterative mechanisms, the Three-Step Algorithm, is proposed to account for the burst in the syntactic capacities of children over age two. These mechanisms are based on children's perception, memory, elementary rule-like behavior and cognitive capacities, and do not require any specific innate grammatical capacities. The relevance of the Three-Step Algorithm is tested using the large Manchester corpus in the CHILDES database. The results show that 80% of the utterances can be exactly reconstructed and that, when incomplete reconstructions are taken into account, 94% of all utterances are reconstructed. The Three-Step Algorithm should be followed by the progressive acquisition of syntactic categories and use of slot-and-frame structures, which lead to a greater and more complex linguistic mastery.

CogPrints Cognitive Sciences Eprint Archive