Probabilistic visual concept trees
This paper presents probabilistic visual concept trees, a model for large visual semantic taxonomy structures, and its use in visual concept detection. Organizing visual semantic knowledge systematically is one of the key challenges towards large-scale concept detection, and one that is complementary to optimizing visual classification for individual concepts. Semantic concepts have traditionally been treated as isolated nodes, a densely-connected web, or a tree. Our analysis shows that none of these models are sufficient in modeling the typical relationships on a real-world visual taxonomy, and these relationships belong to three broad categories: semantic, appearance and statistics. We propose probabilistic visual concept trees for modeling a taxonomy forest with observation uncertainty. As a Bayesian network with parameter constraints, this model is flexible enough to account for the key assumptions in all three types of taxonomy relations, yet it is robust enough to accommodate expansion or deletion in a taxonomy. Our evaluation results on a large web image dataset show that the classification accuracy has considerably improved upon baselines without, or with only a subset of, concept relationships.
Visual Transfer Learning: Informal Introduction and Literature Overview
Transfer learning techniques are important to handle small training sets and
to allow for quick generalization even from only a few examples. The following
paper is the introduction as well as the literature overview part of my thesis
related to the topic of transfer learning for visual recognition problems. Comment: part of my PhD thesis
Deliverable D1.1 State of the art and requirements analysis for hypervideo
This deliverable presents a state-of-the-art and requirements analysis report for hypervideo, authored as part of WP1 of the LinkedTV project. Initially, we present some use-case (viewer) scenarios in the LinkedTV project, and through the analysis of the distinctive needs and demands of each scenario we point out the technical requirements from a user-side perspective. Subsequently, we study methods for the automatic and semi-automatic decomposition of the audiovisual content in order to effectively support the annotation process. Considering that the multimedia content comprises different types of information, i.e., visual, textual and audio, we report various methods for the analysis of these three different streams. Finally, we present various annotation tools which could integrate the developed analysis results so as to effectively support users (video producers) in the semi-automatic linking of hypervideo content, and based on them we report on the initial progress in building the LinkedTV annotation tool. For each of the different classes of techniques discussed in the deliverable, we present the evaluation results from the application of one such method from the literature to a dataset well-suited to the needs of the LinkedTV project, and we indicate the future technical requirements that should be addressed in order to achieve higher levels of performance (e.g., in terms of accuracy and time-efficiency), as necessary.