Search CORE

19,601 research outputs found

Relevance of ASR for the Automatic Generation of Keywords Suggestions for TV programs

Author: Brugman Hennie
Gazendam Luit
Heeren Willemijn
Malaisé Véronique
Ordelman Roeland
Publication venue: Association pour le traitement automatique des langues
Publication date: 01/01/2009
Field of study

Semantic access to multimedia content in audiovisual archives is to a large extent dependent on quantity and quality of the metadata, and particularly the content descriptions that are attached to the individual items. However, given the growing amount of materials that are being created on a daily basis and the digitization of existing analogue collections, the traditional manual annotation of collections puts heavy demands on resources, especially for large audiovisual archives. One way to address this challenge, is to introduce (semi) automatic annotation techniques for generating and/or enhancing metadata. The NWO funded CATCH-CHOICE project has investigated the extraction of keywords form textual resources related to the TV programs to be archived (context documents), in collaboration with the Dutch audiovisual archives, Sound and Vision. Besides the descriptions of the programs published by the broadcasters on their Websites, Automatic Speech Transcription (ASR) techniques from the CATCH-CHoral project, also provide textual resources that might be relevant for suggesting keywords. This paper investigates the suitability of ASR for generating such keywords, which we evaluate against manual annotations of the documents and against keywords automatically generated from context documents

VU Research Portal

University of Twente Research Information

Sound and Vision Publications

Context-aware person identification in personal photo collections

Author: O'Hare Neil
Smeaton Alan F.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/02/2009
Field of study

Identifying the people in photos is an important need for users of photo management systems. We present MediAssist, one such system which facilitates browsing, searching and semi-automatic annotation of personal photos, using analysis of both image content and the context in which the photo is captured. This semi-automatic annotation includes annotation of the identity of people in photos. In this paper, we focus on such person annotation, and propose person identiﬁcation techniques based on a combination of context and content. We propose language modelling and nearest neighbor approaches to context-based person identiﬁcation, in addition to novel face color and image color content-based features (used alongside face recognition and body patch features). We conduct a comprehensive empirical study of these techniques using the real private photo collections of a number of users, and show that combining context- and content-based analysis improves performance over content or context alone

Crossref

Irish Universities

DCU Online Research Access Service

Collaborative analysis of multi-gigapixel imaging data using Cytomine

Author: Benjamin Stévens
Carpenter
de Souza
Gilles Louppe
Jean-Michel Begon
Leroi
Louis Wehenkel
Loïc Rollus
Marée
Philipp Kainz
Pierre Geurts
Raphaël Marée
Renaud Hoyoux
Rémy Vandaele
Weekers
Publication venue: 'Oxford University Press (OUP)'
Publication date: 10/01/2016
Field of study

Motivation: Collaborative analysis of massive imaging datasets is essential to enable scientific discoveries. Results: We developed Cytomine to foster active and distributed collaboration of multidisciplinary teams for large-scale image-based studies. It uses web development methodologies and machine learning in order to readily organize, explore, share and analyze (semantically and quantitatively) multi-gigapixel imaging data over the internet. We illustrate how it has been used in several biomedical applications

Central Archive at the University of Reading

Crossref

PubMed Central

Open Repository and Bibliography - Liège

Learning Multimodal Latent Attributes

Author: Fu Y
Gong S
Hospedales TM
Xiang T
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/02/2014
Field of study

Abstract—The rapid development of social media sharing has created a huge demand for automatic media classification and annotation techniques. Attribute learning has emerged as a promising paradigm for bridging the semantic gap and addressing data sparsity via transferring attribute knowledge in object recognition and relatively simple action classification. In this paper, we address the task of attribute learning for understanding multimedia data with sparse and incomplete labels. In particular we focus on videos of social group activities, which are particularly challenging and topical examples of this task because of their multi-modal content and complex and unstructured nature relative to the density of annotations. To solve this problem, we (1) introduce a concept of semi-latent attribute space, expressing user-defined and latent attributes in a unified framework, and (2) propose a novel scalable probabilistic topic model for learning multi-modal semi-latent attributes, which dramatically reduces requirements for an exhaustive accurate attribute ontology and expensive annotation effort. We show that our framework is able to exploit latent attributes to outperform contemporary approaches for addressing a variety of realistic multimedia sparse data learning tasks including: multi-task learning, learning with label noise, N-shot transfer learning and importantly zero-shot learning

CiteSeerX

Queen Mary Research Online

On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow

Author: Allamanis M.
Chowdhury S.
Cicchetti A.
Lin Y.
Pedregosa F.
Settles B.
Settles B.
Soliman M.
Ying A.
Publication venue
Publication date: 01/01/2017
Field of study

Abundant data is the key to successful machine learning. However, supervised learning requires annotated data that are often hard to obtain. In a classification task with limited resources, Active Learning (AL) promises to guide annotators to examples that bring the most value for a classifier. AL can be successfully combined with self-training, i.e., extending a training set with the unlabelled examples for which a classifier is the most certain. We report our experiences on using AL in a systematic manner to train an SVM classifier for Stack Overflow posts discussing performance of software components. We show that the training examples deemed as the most valuable to the classifier are also the most difficult for humans to annotate. Despite carefully evolved annotation criteria, we report low inter-rater agreement, but we also propose mitigation strategies. Finally, based on one annotator's work, we show that self-training can improve the classification accuracy. We conclude the paper by discussing implication for future text miners aspiring to use AL and self-training.Comment: Preprint of paper accepted for the Proc. of the 21st International Conference on Evaluation and Assessment in Software Engineering, 201

arXiv.org e-Print Archive

Lund University Publications

Crossref

Swedish Institute of Computer Science Publications Database

Automatic tagging and geotagging in video collections and communities

Author: Jones Gareth J.F.
Larson Martha
Serdyukov Pavel
Soleymani Mohammad
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/04/2011
Field of study

Automatically generated tags and geotags hold great promise to improve access to video collections and online communi- ties. We overview three tasks offered in the MediaEval 2010 benchmarking initiative, for each, describing its use scenario, definition and the data set released. For each task, a reference algorithm is presented that was used within MediaEval 2010 and comments are included on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that are assigned by users to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information including user-generated metadata, speech recognition transcripts, audio, and visual features

Irish Universities

DCU Online Research Access Service