Knowledge-rich Image Gist Understanding Beyond Literal Meaning
We investigate the problem of understanding the message (gist) conveyed by
images and their captions as found, for instance, on websites or news articles.
To this end, we propose a methodology to capture the meaning of image-caption
pairs on the basis of large amounts of machine-readable knowledge that has
previously been shown to be highly effective for text understanding. Our method
identifies the connotation of objects beyond their denotation: where most
approaches to image understanding focus on the denotation of objects, i.e.,
their literal meaning, our work addresses the identification of connotations,
i.e., iconic meanings of objects, to understand the message of images. We view
image understanding as the task of representing an image-caption pair on the
basis of a wide-coverage vocabulary of concepts such as the one provided by
Wikipedia, and cast gist detection as a concept-ranking problem with
image-caption pairs as queries. To enable a thorough investigation of the
problem of gist understanding, we produce a gold standard of over 300
image-caption pairs and over 8,000 gist annotations covering a wide variety of
topics at different levels of abstraction. We use this dataset to
experimentally benchmark the contribution of signals from heterogeneous
sources, namely image and text. The best result, a Mean Average Precision
(MAP) of 0.69, indicates that by combining both dimensions we are able to better
understand the meaning of our image-caption pairs than when using language or
vision information alone. We test the robustness of our gist detection approach
when receiving automatically generated input, i.e., using automatically
generated image tags or generated captions, and prove the feasibility of an
end-to-end automated process
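
The MAP figure quoted above comes from treating each image-caption pair as a query and scoring the ranked list of candidate concepts. The sketch below shows how such a score is typically computed. It assumes binary relevance (a concept either is or is not a gist annotation); the abstract does not say how graded annotations were handled. The function names and toy concept labels are hypothetical and are not taken from the paper.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of the average precision over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example: two image-caption "queries" with hypothetical concept labels.
runs = [
    (["climate_change", "polar_bear", "global_warming", "ice"],
     {"climate_change", "global_warming"}),
    (["forest", "deforestation"], {"deforestation"}),
]
score = mean_average_precision(runs)
```

In the first toy query the relevant concepts sit at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2. In the second the single relevant concept sits at rank 2, giving 1/2.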
Sparse Multi-modal probabilistic Latent Semantic Analysis for Single-Image Super-Resolution
This paper presents a novel single-image super-resolution (SR) approach
based on latent topics in order to take advantage of the semantics pervading
the topic space when super-resolving images. Image semantics has been shown to
be useful in relieving the ill-posed nature of the SR problem; however, the most
accepted clustering-based approach used to define semantic concepts limits the
capability of representing complex visual relationships. The proposed approach
provides a new probabilistic perspective where the SR process is performed
according to the semantics encapsulated by a new topic model, the Sparse Multimodal
probabilistic Latent Semantic Analysis (sMpLSA). Firstly, the sMpLSA
model is formulated. Subsequently, a new SR framework based on sMpLSA is
defined. Finally, an experimental comparison is conducted using seven learning-based
SR methods over three different image datasets. Experiments reveal the
potential of latent topics in SR by reporting that the proposed approach
provides competitive performance
Proximal Methods for Hierarchical Sparse Coding
Sparse coding consists in representing signals as sparse linear combinations
of atoms selected from a dictionary. We consider an extension of this framework
where the atoms are further assumed to be embedded in a tree. This is achieved
using a recently introduced tree-structured sparse regularization norm, which
has proven useful in several applications. This norm leads to regularized
problems that are difficult to optimize, and we propose in this paper efficient
algorithms for solving them. More precisely, we show that the proximal operator
associated with this norm is computable exactly via a dual approach that can be
viewed as the composition of elementary proximal operators. Our procedure has a
complexity linear, or close to linear, in the number of atoms, and allows the
use of accelerated gradient techniques to solve the tree-structured sparse
approximation problem at the same computational cost as traditional ones using
the L1-norm. Our method is efficient and scales gracefully to millions of
variables, which we illustrate in two types of applications: first, we consider
fixed hierarchical dictionaries of wavelets to denoise natural images. Then, we
apply our optimization tools in the context of dictionary learning, where
learned dictionary elements naturally organize in a prespecified arborescent
structure, leading to a better performance in reconstruction of natural image
patches. When applied to text documents, our method learns hierarchies of
topics, thus providing a competitive alternative to probabilistic topic models
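
The "composition of elementary proximal operators" mentioned in this abstract can be illustrated with a small sketch. It rests on several assumptions the abstract does not spell out: each group is a node together with all its descendants, the norm is a weighted sum of group l2 norms, and groups are processed from the leaves up to the root. All names (`tree_prox`, `children`, `lam`) are hypothetical. This naive version rebuilds each subtree, so it is not the linear-time procedure the authors describe.

```python
import math


def group_soft_threshold(v, group, threshold):
    """Elementary prox of threshold * ||v_group||_2, applied in place."""
    norm = math.sqrt(sum(v[i] ** 2 for i in group))
    # Shrink the whole group toward zero; zero it out if its norm is small.
    scale = 0.0 if norm <= threshold else 1.0 - threshold / norm
    for i in group:
        v[i] *= scale


def subtree(children, node):
    """Indices of a node and all of its descendants."""
    out = [node]
    for child in children.get(node, []):
        out.extend(subtree(children, child))
    return out


def postorder(children, root):
    """Nodes ordered so every child is visited before its parent."""
    order = []
    for child in children.get(root, []):
        order.extend(postorder(children, child))
    order.append(root)
    return order


def tree_prox(u, children, root, lam, weights=None):
    """Compose group-wise prox operators from the leaves up to the root."""
    v = list(u)  # work on a copy so the input is left untouched
    for node in postorder(children, root):
        w = 1.0 if weights is None else weights[node]
        group_soft_threshold(v, subtree(children, node), lam * w)
    return v


# Toy tree: atom 0 is the root, atoms 1 and 2 are its children.
children = {0: [1, 2]}
result = tree_prox([3.0, 4.0, 0.5], children, root=0, lam=1.0)
```

In the toy example the small coefficient on atom 2 is zeroed at its own leaf group, and the root group then shrinks the remaining coefficients jointly. This reflects the hierarchical structure: a coefficient can only stay nonzero if its ancestors' groups survive.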
Therapeutic creativity and the lived experience of grief in the collaborative fiction film Lost Property
This collaborative project aimed to represent the embodied experience of grief in a fiction film by drawing on research, and on the personal and professional experience of all involved: academics; an artist; bereavement therapists and counsellors; and professional actors, cinematographers, sound engineers and other film crew. By representing grief in a more phenomenologically minded manner, the project sought to capture the lived experience of loss on screen while contributing meaningfully to the discourse on practice-as-research. Hay, Dawson and Rosling used a collaborative fiction film and participatory action research to investigate whether storying loss, and representing it through narrative, images and embodied movement, is therapeutic. Participatory action research was beneficial in facilitating changes in the co-researchers’ thinking, feeling and practice, and in enabling participants to inhabit multiple roles in a manner that expanded their disciplinary boundaries. However, while the project’s effect on some of the participants demonstrated the ways that creativity and meaning making can support adaptive grieving, it also revealed the risks of using participatory action research and fiction film to investigate highly emotive topics such as grief
Dublin City University video track experiments for TREC 2003
In this paper, we describe our experiments for both the News Story Segmentation task and Interactive Search task for
TRECVID 2003. Our News Story Segmentation task involved the use of a Support Vector Machine (SVM) to combine evidence from audio-visual analysis tools in order to generate a listing of news stories from a given news programme. Our
Search task experiment compared a video retrieval system based on text, image and relevance feedback with a text-only
video retrieval system in order to identify which was more effective. To do so, we developed two variations of our Físchlár video retrieval system and conducted user testing in a controlled lab environment. In this paper we outline our work on both tasks
Modifying the Yamaguchi Four-Component Decomposition Scattering Powers Using a Stochastic Distance
Model-based decompositions have gained considerable attention after the
initial work of Freeman and Durden. Their decomposition assumes the target
to be reflection symmetric, an assumption later relaxed in the Yamaguchi et al.
decomposition with the addition of the helix parameter. Since then, many
decompositions have been proposed where either the scattering model was modified
to fit the data or the coherency matrix representing the second order
statistics of the full polarimetric data is rotated to fit the scattering
model. In this paper we propose to modify the Yamaguchi four-component
decomposition (Y4O) scattering powers using the concept of statistical
information theory for matrices. In order to achieve this modification we
propose a method to estimate the polarization orientation angle (OA) from
full-polarimetric SAR images using the Hellinger distance. In this method, the
OA is estimated by maximizing the Hellinger distance between the un-rotated and
the rotated and the components of the coherency matrix.
Then, the powers of the Yamaguchi four-component model-based
decomposition (Y4O) are modified using the maximum relative stochastic distance
between the and the components of the coherency matrix at the
estimated OA. The results show that the overall double-bounce powers over
rotated urban areas have significantly improved with the reduction of volume
powers. The percentage of pixels with negative powers has also decreased relative to
the Y4O decomposition. The proposed method is both qualitatively and
quantitatively compared with the results obtained from the Y4O and the Y4R
decompositions for a Radarsat-2 C-band San Francisco dataset and a UAVSAR
L-band Hayward dataset.
Comment: Accepted for publication in IEEE J-STARS (IEEE Journal of Selected
Topics in Applied Earth Observations and Remote Sensing)
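
For readers unfamiliar with the Hellinger distance used in the abstract above, the sketch below computes it for two discrete probability distributions. This is a deliberately simplified stand-in: the paper applies a stochastic distance to components of polarimetric coherency matrices, and that matrix form is not given in the abstract and is not reproduced here. The function name is hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    Both inputs are assumed to be non-negative and to sum to 1.
    The result lies in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


same = hellinger([0.5, 0.5], [0.5, 0.5])      # identical distributions
disjoint = hellinger([1.0, 0.0], [0.0, 1.0])  # no overlap at all
```

Maximizing such a distance over a rotation angle, as the abstract describes, would amount to evaluating it on a grid of candidate angles and keeping the angle with the largest value.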