1,436 research outputs found
The LIG Multi-Criteria System for Video Retrieval
International audienceThe LIG search system uses a user-controlled combination of six criteria: keywords, phonetic string, similarity to example images, semantic categories, similarity to already identified positive images, and temporal closeness to already identified positive images
Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop
We summarize the accomplishments of a multi-disciplinary workshop exploring
the computational and scientific issues surrounding the discovery of linguistic
units (subwords and words) in a language without orthography. We study the
replacement of orthographic transcriptions by images and/or translated text in
a well-resourced language to help unsupervised discovery from raw speech.Comment: Accepted to ICASSP 201
CHORUS Deliverable 4.3: Report from CHORUS workshops on national initiatives and metadata
Minutes of the following Workshops:
• National Initiatives on Multimedia Content Description and Retrieval, Geneva, October 10th, 2007.
• Metadata in Audio-Visual/Multimedia production and archiving, Munich, IRT, 21st – 22nd November 2007
Workshop in Geneva 10/10/2007
This highly successful workshop was organised in cooperation with the European Commission. The event brought together
the technical, administrative and financial representatives of the various national initiatives, which have been established
recently in some European countries to support research and technical development in the area of audio-visual content
processing, indexing and searching for the next generation Internet using semantic technologies, and which may lead to an
internet-based knowledge infrastructure. The objective of this workshop was to provide a platform for mutual information
and exchange between these initiatives, the European Commission and the participants. Top speakers were present from
each of the national initiatives. There was time for discussions with the audience and amongst the European National
Initiatives. The challenges, communalities, difficulties, targeted/expected impact, success criteria, etc. were tackled. This
workshop addressed how these national initiatives could work together and benefit from each other.
Workshop in Munich 11/21-22/2007
Numerous EU and national research projects are working on the automatic or semi-automatic generation of descriptive and
functional metadata derived from analysing audio-visual content. The owners of AV archives and production facilities are
eagerly awaiting such methods which would help them to better exploit their assets.Hand in hand with the digitization of
analogue archives and the archiving of digital AV material, metadatashould be generated on an as high semantic level as
possible, preferably fully automatically. All users of metadata rely on a certain metadata model. All AV/multimedia search
engines, developed or under current development, would have to respect some compatibility or compliance with the
metadata models in use. The purpose of this workshop is to draw attention to the specific problem of metadata models in the
context of (semi)-automatic multimedia search
Video shot boundary detection: seven years of TRECVid activity
Shot boundary detection (SBD) is the process of automatically detecting the boundaries between shots in video. It is a problem which has attracted much attention since video became available in digital form as it is an essential pre-processing step to almost all video analysis, indexing, summarisation, search, and other content-based operations. Automatic SBD was one of the tracks of activity within the annual TRECVid benchmarking exercise, each year from 2001 to 2007 inclusive. Over those seven years we have seen 57 different research groups from across the world work to determine the best approaches to SBD while using a common dataset and common scoring metrics. In this paper we present an overview of the TRECVid shot boundary detection task, a high-level overview of the most significant of the approaches taken, and a comparison of performances, focussing on one year (2005) as an example
IRIM at TRECVID 2012: Semantic Indexing and Instance Search
International audienceThe IRIM group is a consortium of French teams work- ing on Multimedia Indexing and Retrieval. This paper describes its participation to the TRECVID 2012 se- mantic indexing and instance search tasks. For the semantic indexing task, our approach uses a six-stages processing pipelines for computing scores for the likeli- hood of a video shot to contain a target concept. These scores are then used for producing a ranked list of im- ages or shots that are the most likely to contain the tar- get concept. The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classi cation, fusion of descriptor variants, higher-level fusion, and re-ranking. We evaluated a number of dif- ferent descriptors and tried di erent fusion strategies. The best IRIM run has a Mean Inferred Average Pre- cision of 0.2378, which ranked us 4th out of 16 partici- pants. For the instance search task, our approach uses two steps. First individual methods of participants are used to compute similrity between an example image of in- stance and keyframes of a video clip. Then a two-step fusion method is used to combine these individual re- sults and obtain a score for the likelihood of an instance to appear in a video clip. These scores are used to ob- tain a ranked list of clips the most likely to contain the queried instance. The best IRIM run has a MAP of 0.1192, which ranked us 29th on 79 fully automatic runs
Rushes summarization by IRIM consortium: redundancy removal and multi-feature fusion
International audienceIn this paper, we present the first participation of a consortium of French laboratories, IRIM, to the TRECVID 2008 BBC Rushes Summarization task. Our approach resorts to video skimming. We propose two methods to reduce redundancy, as rushes include several takes of scenes. We also take into account low and midlevel semantic features in an ad-hoc fusion method in order to retain only significant content
Describing Common Human Visual Actions in Images
Which common human actions and interactions are recognizable in monocular
still images? Which involve objects and/or other people? How many is a person
performing at a time? We address these questions by exploring the actions and
interactions that are detectable in the images of the MS COCO dataset. We make
two main contributions. First, a list of 140 common `visual actions', obtained
by analyzing the largest on-line verb lexicon currently available for English
(VerbNet) and human sentences used to describe images in MS COCO. Second, a
complete set of annotations for those `visual actions', composed of
subject-object and associated verb, which we call COCO-a (a for `actions').
COCO-a is larger than existing action datasets in terms of number of actions
and instances of these actions, and is unique because it is data-driven, rather
than experimenter-biased. Other unique features are that it is exhaustive, and
that all subjects and objects are localized. A statistical analysis of the
accuracy of our annotations and of each action, interaction and subject-object
combination is provided
Video-4-Video: using video for searching, classifying and summarising video
YouTube has meant that we are now becoming accustomed to searching for video clips, and finding them, for both work and leisure pursuits. But YouTube, like the Internet Archive, OpenVideo and almost everything other video library, doesn't use video to find video, it uses metadata, usually based on user generated content (UGC). But what if we don't know what we're looking for and the metadata doesn't help, or we have poor metadata or no UGC, can we use the video to find video ? Can we automatically derive semantic concepts directly from video which we can use for retrieval or summarisation ? Many dozens of research groups throughout the world work on the problems associated with content-based video search, content-based detection of semantic concepts, shot boundary detection, content-based summarisation and content-based event detection. In this presentation we give a summary of the achievements of almost a decade of research by the TRECVid community, including a report on performance of groups in different TRECVid tasks. We present the modus operandi of the annual TRECVid benchmarking, the problems associated with running an annual evaluation for nearly 100 research groups every year and an overview of the most successful approaches to each task
Video Mining using LIM Based Clustering and Self Organizing Maps
AbstractVideo mining has grown as an energetic research area and given incremental concentration in recent years due to impressive and rapid raise in the volume of digital video databases. The aim of this research work is to find out new objects in videos. This work proposes a novel approach for video mining using LIM based clustering technique and self organizing maps to recognize novelty in the frames of video sequence. The proposed work is designed and implemented on MATLAB. It is tested with the sample videos and provides promising results. And it is suitable for day to day video mining applications and object detection systems including remote video surveillance in defense for national and international border tracking
- …