23 research outputs found

    Concept-based video search with the PicSOM multimedia retrieval system

    Definition of enriched relevance feedback in PicSOM : deliverable D1.3.1 of FP7 project nº 216529 PinView

    This report defines and implements communication principles and data formats for transferring enriched relevance feedback to the PicSOM content-based image retrieval system used in the PinView project. The modalities of enriched relevance feedback include recorded eye movements, pointer and keyboard events, and audio, including speech. The communication is based on AJAX technology, in which the client and server exchange XML-formatted content using the XMLHttpRequest method.
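
    A minimal sketch of what such a client-to-server exchange could look like, written in Python for illustration; the XML element names and the endpoint URL are assumptions, not the actual PicSOM/PinView data format.

```python
# Hypothetical sketch: package pointer and keyboard events as XML and POST
# them to a retrieval server, mirroring an XMLHttpRequest-based client.
# Element names and the URL are illustrative, not the real PicSOM interface.
import urllib.request
import xml.etree.ElementTree as ET

def build_feedback_xml(image_id, pointer_events, key_events):
    """Serialize pointer and keyboard events for one image into XML."""
    root = ET.Element("feedback", attrib={"image": image_id})
    for x, y, t in pointer_events:
        ET.SubElement(root, "pointer", attrib={"x": str(x), "y": str(y), "t": str(t)})
    for key, t in key_events:
        ET.SubElement(root, "key", attrib={"code": key, "t": str(t)})
    return ET.tostring(root, encoding="utf-8")

def send_feedback(payload, url):
    """POST the XML payload and return the server's (XML) reply."""
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/xml"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Two clicks and one key press recorded while viewing image "img_0042"
payload = build_feedback_xml(
    "img_0042",
    pointer_events=[(120, 88, 0.4), (310, 200, 1.1)],
    key_events=[("Enter", 1.5)])
```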

    Measuring concept similarities in multimedia ontologies: analysis and evaluations

    The recent development of large-scale multimedia concept ontologies has provided new momentum for research in the semantic analysis of multimedia repositories. Different methods for generic concept detection have been extensively studied, but the question of how to exploit the structure of a multimedia ontology and existing inter-concept relations has not received similar attention. In this paper, we present a clustering-based method for modeling semantic concepts in low-level feature spaces and study the evaluation of the quality of such models with entropy-based methods. We cover a variety of methods for assessing the similarity of different concepts in a multimedia ontology. We study three ontologies and apply the proposed techniques in experiments involving visual and semantic similarities, manual annotation of video, and concept detection. The results show that modeling inter-concept relations can provide a promising resource for many different application areas in semantic multimedia processing.
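
    As a rough illustration of the entropy-based evaluation idea (a simplified sketch, not the paper's exact procedure), one can cluster low-level feature vectors and measure how mixed the concept labels are inside each cluster; lower weighted entropy indicates that the concept separates well in that feature space. The scikit-learn dependency below is an assumption.

```python
# Sketch: cluster low-level features, then compute the size-weighted entropy
# of the concept labels within each cluster as a quality measure.
import numpy as np
from sklearn.cluster import KMeans

def label_entropy(labels):
    """Shannon entropy (bits) of a discrete label distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def clustering_entropy(features, concept_labels, n_clusters=10):
    """features: (n_samples, n_dims) array; concept_labels: (n_samples,) array."""
    assignments = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    total = 0.0
    for c in range(n_clusters):
        members = concept_labels[assignments == c]
        if len(members):
            total += len(members) / len(concept_labels) * label_entropy(members)
    return total  # lower = purer clusters for this concept/feature pairing
```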

    Evaluation of pointer click relevance feedback in PicSOM : deliverable D1.2 of FP7 project nº 216529 PinView

    This report presents the results of a series of experiments in which knowledge of the most relevant part of an image is given as additional information to a content-based image retrieval system. The most relevant parts have been identified by search-task-dependent pointer clicks on the images. As such, they provide a rudimentary form of explicit enriched relevance feedback and to some extent mimic the genuine implicit eye movement measurements that are an essential ingredient of the PinView project.
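
    As an illustrative sketch of how a pointer click might be turned into region-level feedback (hypothetical code, not the PicSOM implementation), one could crop a window around the clicked location and describe it with a simple color histogram that is then matched against stored image features.

```python
# Sketch: convert a click into a local region descriptor for relevance feedback.
import numpy as np

def click_region_histogram(image, click_xy, window=64, bins=8):
    """image: HxWx3 uint8 array; click_xy: (x, y) pixel coordinates."""
    h, w, _ = image.shape
    x, y = click_xy
    half = window // 2
    x0, x1 = max(0, x - half), min(w, x + half)
    y0, y1 = max(0, y - half), min(h, y + half)
    region = image[y0:y1, x0:x1]
    # One histogram per color channel, concatenated into a single descriptor
    hist = [np.histogram(region[..., c], bins=bins, range=(0, 256))[0]
            for c in range(3)]
    hist = np.concatenate(hist).astype(float)
    return hist / hist.sum()  # normalized so regions of different size compare

# The resulting descriptor can be compared against database features so that
# subsequent results are re-ranked toward the clicked content.
```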

    The Effectiveness of Concept Based Search for Video Retrieval

    In this paper we investigate how a small number of high-level concepts derived for video shots, such as Sport, Face, Indoor, etc., can be used effectively for ad hoc search in video material. We answer the following questions: 1) Can we automatically construct concept queries from ordinary text queries? 2) What is the best way to combine evidence from single concept detectors into final search results? We evaluated algorithms for automatic concept query formulation using WordNet-based concept extraction, and we evaluated algorithms for fast, on-line combination of concepts. Experimental results on data from the TREC Video 2005 workshop and 25 test users show the following: 1) automatic query formulation through WordNet-based concept extraction can achieve results comparable to user-created query concepts, and 2) combination methods that take neighboring shots into account outperform simpler combination methods.
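
    A rough sketch of WordNet-based concept query formulation, assuming NLTK's WordNet interface; the concept lexicon and the use of path similarity are illustrative choices, not necessarily those of the paper.

```python
# Map free-text query terms to the closest entries of a fixed concept lexicon
# via WordNet similarity. Requires: pip install nltk, then nltk.download("wordnet").
from nltk.corpus import wordnet as wn

CONCEPTS = ["sports", "face", "indoor", "vehicle", "water"]  # example lexicon

def best_concepts(query, top_k=2, threshold=0.3):
    """Return up to top_k lexicon concepts most similar to the query terms."""
    scores = {}
    for term in query.lower().split():
        for t_syn in wn.synsets(term):
            for concept in CONCEPTS:
                for c_syn in wn.synsets(concept):
                    sim = t_syn.path_similarity(c_syn) or 0.0
                    scores[concept] = max(scores.get(concept, 0.0), sim)
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [c for c, s in ranked[:top_k] if s >= threshold]

# e.g. best_concepts("people swimming at the beach") might return ["water", "sports"]
```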

    An empirical study of inter-concept similarities in multimedia ontologies

    Generic concept detection has been a widely studied topic in recent research on multimedia analysis and retrieval, but the issue of how to exploit the structure of a multimedia ontology, as well as different inter-concept relations, has not received similar attention. In this paper, we present results from our empirical analysis of different types of similarity among semantic concepts in two multimedia ontologies, LSCOM-Lite and CDVP-206. The results suggest that the proposed methods can help provide insight into the existing inter-concept relations within an ontology and into selecting the most suitable set of concepts and hierarchical relations. Such an analysis can be utilized in various tasks, such as building more reliable concept detectors and designing large-scale ontologies.
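
    One simple co-occurrence-based similarity that could feed such an analysis is pointwise mutual information computed from a binary annotation matrix; the sketch below is illustrative and not one of the paper's exact measures.

```python
# PMI between two concept columns of a (videos x concepts) 0/1 annotation matrix.
import numpy as np

def pmi_similarity(annotations, i, j, eps=1e-12):
    """annotations: (n_items, n_concepts) binary matrix; i, j: concept indices."""
    p_i = annotations[:, i].mean()
    p_j = annotations[:, j].mean()
    p_ij = (annotations[:, i] * annotations[:, j]).mean()
    return float(np.log((p_ij + eps) / (p_i * p_j + eps)))

# Concepts that tend to be annotated on the same items (e.g. "road" and "car")
# get a positive score; concepts that rarely co-occur get a negative one.
```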

    Video Summarization with SOMs

    Video summarization is a process in which a long video file is converted into a considerably shorter form. The video summary can then be used to facilitate efficient searching and browsing of video files in large video collections. The aim of successful automatic summarization is to preserve as much as possible of the essential content of each video. What is essential is, of course, subjective and also depends on the intended use of the videos and the overall content of the collection. In this paper we present an overview of the SOM-based methodology we have used for video summarization, which analyzes the temporal trajectories of the best-matching units of frame-wise feature vectors. It has been developed as a part of PicSOM, our content-based multimedia information retrieval and analysis framework. The video material used in our experiments comes from NIST's annual TRECVID evaluation for content-based video retrieval systems.
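
    A compact illustration of the best-matching-unit trajectory idea, assuming the third-party minisom package; the map size, training parameters, and jump threshold are arbitrary choices for this sketch, not the PicSOM configuration.

```python
# Train a SOM on frame-wise features, follow each frame's best-matching unit,
# and keep frames where the trajectory jumps as candidate summary frames.
import numpy as np
from minisom import MiniSom

def summarize(frame_features, map_size=(10, 10), jump_threshold=2.0):
    """frame_features: (n_frames, n_dims) array of per-frame descriptors."""
    som = MiniSom(map_size[0], map_size[1], frame_features.shape[1],
                  sigma=1.0, learning_rate=0.5)
    som.train_random(frame_features, 1000)
    # Trajectory of best-matching units over time
    bmus = np.array([som.winner(f) for f in frame_features], dtype=float)
    # Large moves on the map between consecutive frames signal content changes
    jumps = np.linalg.norm(np.diff(bmus, axis=0), axis=1)
    keyframes = [0] + [i + 1 for i, d in enumerate(jumps) if d > jump_threshold]
    return keyframes  # indices of frames to include in the summary
```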

    Video Content Understanding Using Text

    The rise of social media and the video streaming industry has provided us with a plethora of videos and their corresponding descriptive information in the form of concepts (words) and textual video captions. Given the vast amount of available videos and textual data, there has never been a better time to study computer vision and machine learning problems related to videos and text. In this dissertation, we tackle multiple problems associated with the joint understanding of videos and text. We first address the task of multi-concept video retrieval, where the input is a set of words as concepts, and the output is a ranked list of full-length videos. This approach deals with multi-concept input and the prolonged length of videos by incorporating multiple latent variables to tie together the information within each shot (a short clip of a full video) and across shots. Secondly, we address the problem of video question answering, in which the task is to answer a question, in the form of Fill-In-the-Blank (FIB), given a video. Answering a question is a task of retrieving a word from a dictionary (all possible words suitable for an answer) based on the input question and video. Following the FIB problem, we introduce a new problem, called Visual Text Correction (VTC), i.e., detecting and replacing an inaccurate word in the textual description of a video. We propose a deep network that can simultaneously detect an inaccuracy in a sentence, benefiting from 1D-CNNs/LSTMs to encode short/long-term dependencies, and fix it by replacing the inaccurate word(s). Finally, in the last part of the dissertation, we tackle the problem of video generation from user-provided natural language sentences. Our proposed video generation method constructs two distributions from the input text, corresponding to the latent representations of the first and last frames. We generate high-fidelity videos by interpolating latent representations and using a sequence of CNN-based up-pooling blocks.
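
    A toy sketch of the latent-interpolation step for video generation; the decode function stands in for whatever generator network is used and is purely an assumption for illustration.

```python
# Blend the first-frame latent into the last-frame latent and decode each
# intermediate latent into a frame.
import numpy as np

def interpolate_latents(z_first, z_last, n_frames):
    """Return n_frames latent vectors linearly blending z_first into z_last."""
    alphas = np.linspace(0.0, 1.0, n_frames)
    return [(1 - a) * z_first + a * z_last for a in alphas]

def generate_video(z_first, z_last, n_frames, decode):
    """decode: function mapping a latent vector to an image (e.g. a CNN decoder)."""
    return [decode(z) for z in interpolate_latents(z_first, z_last, n_frames)]
```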

    The AXES submissions at TrecVid 2013

    The AXES project participated in the interactive instance search task (INS), the semantic indexing task (SIN), the multimedia event recounting task (MER), and the multimedia event detection task (MED) for TRECVid 2013. Our interactive INS focused this year on using classifiers trained at query time with positive examples collected from external search engines. Our INS experiments were carried out by students and researchers at Dublin City University. Our best INS runs performed on par with the top-ranked INS runs in terms of P@10 and P@30, and around the median in terms of mAP. For SIN, MED and MER, we use systems based on state-of-the-art local low-level descriptors for motion, image, and sound, as well as high-level features to capture speech and text from the visual and audio streams, respectively. The low-level descriptors were aggregated by means of Fisher vectors into high-dimensional video-level signatures, and the high-level features were aggregated into bag-of-words histograms. Using these features we train linear classifiers, and use early and late fusion to combine the different features. Our MED system achieved the best score of all submitted runs in the main track, as well as in the ad-hoc track. This paper describes in detail our INS, MER, and MED systems and the results and findings of our experiments.
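
    For illustration, a minimal late-fusion sketch in the spirit described above (not the AXES pipeline): train one linear classifier per feature type and average their decision scores at test time; the scikit-learn dependency is an assumption.

```python
# Late fusion: one LinearSVC per feature type, scores averaged at test time.
import numpy as np
from sklearn.svm import LinearSVC

def train_per_feature(feature_sets, labels):
    """feature_sets: list of (n_samples, dim_k) arrays, one per feature type."""
    return [LinearSVC().fit(X, labels) for X in feature_sets]

def late_fusion_scores(classifiers, test_feature_sets):
    """Average per-feature decision scores into a single ranking score."""
    scores = [clf.decision_function(X)
              for clf, X in zip(classifiers, test_feature_sets)]
    return np.mean(scores, axis=0)

# Early fusion, by contrast, would concatenate the feature arrays first and
# train a single classifier on the combined representation.
```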