
    Knowledge-rich Image Gist Understanding Beyond Literal Meaning

    We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or in news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that has previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where most approaches to image understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. To enable a thorough investigation of the problem of gist understanding, we produce a gold standard of over 300 image-caption pairs and over 8,000 gist annotations covering a wide variety of topics at different levels of abstraction. We use this dataset to experimentally benchmark the contribution of signals from heterogeneous sources, namely image and text. The best result, with a Mean Average Precision (MAP) of 0.69, indicates that by combining both dimensions we are able to better understand the meaning of our image-caption pairs than when using language or vision information alone. We test the robustness of our gist detection approach when receiving automatically generated input, i.e., using automatically generated image tags or generated captions, and demonstrate the feasibility of an end-to-end automated process.
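The concept-ranking formulation above lends itself to evaluation with Mean Average Precision. The following is a minimal sketch of that setup in Python; the concept names, scores and gold gists are invented for illustration and do not come from the paper's dataset.

```python
def average_precision(ranked_concepts, gold_gists):
    """AP for one query: mean of precision@k taken at each relevant hit."""
    hits, precisions = 0, []
    for k, concept in enumerate(ranked_concepts, start=1):
        if concept in gold_gists:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(gold_gists) if gold_gists else 0.0

# Toy query: one image-caption pair scored against a small concept vocabulary.
scores = {"Statue_of_Liberty": 0.9, "Freedom": 0.7,
          "New_York_City": 0.6, "Copper": 0.2}
ranking = sorted(scores, key=scores.get, reverse=True)
gold = {"Freedom", "New_York_City"}   # annotated gists for this pair
print(round(average_precision(ranking, gold), 2))  # → 0.58
```

MAP over the whole benchmark is then just the mean of this quantity across all image-caption pair queries.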

    Sparse Multi-modal probabilistic Latent Semantic Analysis for Single-Image Super-Resolution

    This paper presents a novel single-image super-resolution (SR) approach based on latent topics in order to take advantage of the semantics pervading the topic space when super-resolving images. Image semantics has been shown to be useful in relieving the ill-posed nature of the SR problem; however, the most widely accepted clustering-based approach to defining semantic concepts limits the capability of representing complex visual relationships. The proposed approach provides a new probabilistic perspective in which the SR process is performed according to the semantics encapsulated by a new topic model, Sparse Multi-modal probabilistic Latent Semantic Analysis (sMpLSA). Firstly, the sMpLSA model is formulated. Subsequently, a new SR framework based on sMpLSA is defined. Finally, an experimental comparison is conducted against seven learning-based SR methods over three different image datasets. The experiments reveal the potential of latent topics in SR, showing that the proposed approach is able to provide competitive performance.
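As a point of reference for the model family involved, the sketch below implements one EM loop for plain pLSA, the base model that sMpLSA extends with sparsity and a multi-modal vocabulary (e.g. joint low/high-resolution patch descriptors). The sparse multi-modal extension itself is not reproduced here, and the toy counts are invented.

```python
import random

def plsa_em(counts, n_topics, iters=50, seed=0):
    """counts[d][w] = count of word w in document d; returns P(w|z), P(z|d)."""
    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_wz = [normalize([rng.random() for _ in range(n_words)])
            for _ in range(n_topics)]
    p_zd = [normalize([rng.random() for _ in range(n_topics)])
            for _ in range(n_docs)]
    for _ in range(iters):
        acc_wz = [[1e-12] * n_words for _ in range(n_topics)]
        acc_zd = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior over topics for this (doc, word) pair
                post = normalize([p_zd[d][z] * p_wz[z][w]
                                  for z in range(n_topics)])
                # M-step accumulation, weighted by the observed count
                for z in range(n_topics):
                    acc_wz[z][w] += counts[d][w] * post[z]
                    acc_zd[d][z] += counts[d][w] * post[z]
        p_wz = [normalize(row) for row in acc_wz]
        p_zd = [normalize(row) for row in acc_zd]
    return p_wz, p_zd

# Two toy "documents" with disjoint vocabularies.
p_wz, p_zd = plsa_em([[4, 4, 0, 0], [0, 0, 4, 4]], n_topics=2)
```

In the SR setting, "documents" would correspond to image patches and "words" to quantized patch descriptors from the two modalities.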

    Proximal Methods for Hierarchical Sparse Coding

    Sparse coding consists of representing signals as sparse linear combinations of atoms selected from a dictionary. We consider an extension of this framework in which the atoms are further assumed to be embedded in a tree. This is achieved using a recently introduced tree-structured sparse regularization norm, which has proven useful in several applications. This norm leads to regularized problems that are difficult to optimize, and in this paper we propose efficient algorithms for solving them. More precisely, we show that the proximal operator associated with this norm can be computed exactly via a dual approach that can be viewed as the composition of elementary proximal operators. Our procedure has a complexity that is linear, or close to linear, in the number of atoms, and allows the use of accelerated gradient techniques to solve the tree-structured sparse approximation problem at the same computational cost as traditional approaches based on the L1-norm. Our method is efficient and scales gracefully to millions of variables, which we illustrate in two types of applications: first, we consider fixed hierarchical dictionaries of wavelets to denoise natural images; then, we apply our optimization tools in the context of dictionary learning, where the learned dictionary elements naturally organize themselves in a prespecified arborescent structure, leading to better performance in the reconstruction of natural image patches. When applied to text documents, our method learns hierarchies of topics, thus providing a competitive alternative to probabilistic topic models.
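The composition result can be sketched concretely: for groups nested in a tree, applying an elementary group soft-thresholding (shrinkage) operator to each group, ordered from the leaves up to the root, yields the exact proximal operator of the tree-structured norm. The tree, weights and input vector below are toy values chosen for illustration.

```python
import math

def group_shrink(v, idx, tau):
    """Elementary prox of tau * ||v_idx||_2: shrink the subvector toward 0."""
    norm = math.sqrt(sum(v[i] ** 2 for i in idx))
    scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
    for i in idx:
        v[i] *= scale

def tree_prox(v, groups, lam):
    """Exact prox of lam * sum_g w_g * ||v_g||_2 for tree-nested groups.

    groups: list of (indices, weight) pairs ordered from leaves to root;
    the cost is linear in the total size of the groups.
    """
    v = list(v)
    for idx, w in groups:
        group_shrink(v, idx, lam * w)
    return v

# Toy 3-variable chain: leaf {2}, internal node {1, 2}, root {0, 1, 2}.
groups = [((2,), 1.0), ((1, 2), 1.0), ((0, 1, 2), 1.0)]
out = tree_prox([3.0, 2.0, 0.5], groups, lam=0.5)
```

Because a child group is always shrunk before its ancestors, a variable zeroed at a leaf stays zero, which is exactly the hierarchical sparsity pattern the norm induces.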

    Therapeutic creativity and the lived experience of grief in the collaborative fiction film Lost Property

    This collaborative project aimed to represent the embodied experience of grief in a fiction film by drawing on research, and on the personal and professional experience of all involved: academics; an artist; bereavement therapists and counsellors; and professional actors, cinematographers, sound engineers and other film crew. By representing grief in a more phenomenologically minded manner, the project sought to capture the lived experience of loss on screen while contributing meaningfully to the discourse on practice-as-research. Hay, Dawson and Rosling used a collaborative fiction film and participatory action research to investigate whether storying loss, and representing it through narrative, images and embodied movement, is therapeutic. Participatory action research was beneficial in facilitating changes in the co-researchers’ thinking, feeling and practice, and in enabling participants to inhabit multiple roles in a manner that expanded their disciplinary boundaries. However, while the project’s effect on some of the participants demonstrated the ways that creativity and meaning making can support adaptive grieving, it also revealed the risks of using participatory action research and fiction film to investigate highly emotive topics such as grief.

    Dublin City University video track experiments for TREC 2003

    In this paper, we describe our experiments for both the News Story Segmentation task and the Interactive Search task of TRECVID 2003. Our News Story Segmentation experiment used a Support Vector Machine (SVM) to combine evidence from audio-visual analysis tools in order to generate a listing of news stories from a given news programme. Our Search task experiment compared a video retrieval system based on text, image and relevance feedback with a text-only video retrieval system in order to identify which was more effective. To do so, we developed two variations of our Físchlár video retrieval system and conducted user testing in a controlled lab environment. In this paper we outline our work on both of these tasks.
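As an illustration of the fusion step, the sketch below trains a linear SVM by Pegasos-style sub-gradient descent to combine several detector scores into a boundary/no-boundary decision. The feature names and data are invented stand-ins, not the paper's actual TRECVID detectors or kernel choice.

```python
def train_linear_svm(X, y, lam=0.01, epochs=500):
    """Pegasos-style sub-gradient training; labels y in {-1, +1}.

    Returns a weight vector whose last entry acts as the bias term.
    """
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            xb = xi + [1.0]                            # append bias feature
            margin = yi * sum(wj * xj for wj, xj in zip(w, xb))
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularization step
            if margin < 1:                             # hinge-loss sub-gradient
                w = [wj + eta * yi * xj for wj, xj in zip(w, xb)]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x + [1.0])) >= 0 else -1

# Rows: [shot-cut score, audio-silence score, anchor-shot score] (invented).
X = [[0.9, 0.8, 0.9], [0.1, 0.2, 0.0], [0.8, 0.7, 1.0], [0.2, 0.1, 0.1]]
y = [1, -1, 1, -1]
w = train_linear_svm(X, y)
print(predict(w, [0.85, 0.75, 0.9]))
```

In practice an off-the-shelf SVM with a non-linear kernel would normally be used for this kind of evidence combination; the hand-rolled linear version above only keeps the example self-contained.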

    Modifying the Yamaguchi Four-Component Decomposition Scattering Powers Using a Stochastic Distance

    Model-based decompositions have gained considerable attention since the initial work of Freeman and Durden. This decomposition, which assumes the target to be reflection symmetric, was later relaxed in the Yamaguchi et al. decomposition with the addition of the helix parameter. Since then, many decompositions have been proposed in which either the scattering model is modified to fit the data or the coherency matrix representing the second-order statistics of the full polarimetric data is rotated to fit the scattering model. In this paper we propose to modify the Yamaguchi four-component decomposition (Y4O) scattering powers using the concept of statistical information theory for matrices. In order to achieve this modification, we propose a method to estimate the polarization orientation angle (OA) from full-polarimetric SAR images using the Hellinger distance. In this method, the OA is estimated by maximizing the Hellinger distance between the un-rotated and the rotated T33 and T22 components of the coherency matrix [T]. Then, the powers of the Yamaguchi four-component model-based decomposition (Y4O) are modified using the maximum relative stochastic distance between the T33 and the T22 components of the coherency matrix at the estimated OA. The results show that the overall double-bounce powers over rotated urban areas have significantly improved, with a reduction of the volume powers. The percentage of pixels with negative powers has also decreased relative to the Y4O decomposition. The proposed method is compared both qualitatively and quantitatively with the results obtained from the Y4O and the Y4R decompositions for a Radarsat-2 C-band San Francisco dataset and a UAVSAR L-band Hayward dataset.

    Comment: Accepted for publication in IEEE J-STARS (IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing).
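A loose sketch of the OA search is given below: the coherency matrix is rotated about the line of sight and a Hellinger distance between un-rotated and rotated diagonal powers is maximized over the angle. For brevity each power is modelled here as the mean of a scalar exponential intensity law, which is an illustrative simplification of the paper's matrix-variate stochastic-distance formulation; the matrix values are toy numbers.

```python
import math

def rotate_T(T, theta):
    """Rotate a 3x3 coherency matrix about the radar line of sight."""
    c, s = math.cos(2.0 * theta), math.sin(2.0 * theta)
    R = [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]
    RT = [[sum(R[i][k] * T[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    return [[sum(RT[i][k] * R[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

def hellinger_exp(m1, m2):
    """Hellinger distance between exponential laws with means m1 and m2."""
    return math.sqrt(max(0.0, 1.0 - 2.0 * math.sqrt(m1 * m2) / (m1 + m2)))

def estimate_oa(T, steps=360):
    """Grid-search the angle in [-pi/4, pi/4) maximizing the distance."""
    best_theta, best_d = 0.0, -1.0
    for k in range(steps):
        theta = -math.pi / 4.0 + k * (math.pi / 2.0) / steps
        Tr = rotate_T(T, theta)
        d = (hellinger_exp(T[1][1], Tr[1][1])
             + hellinger_exp(T[2][2], Tr[2][2]))
        if d > best_d:
            best_theta, best_d = theta, d
    return best_theta

# Toy real-valued coherency matrix (complex parts omitted for brevity).
T = [[1.0, 0.1, 0.0], [0.1, 0.6, 0.2], [0.0, 0.2, 0.3]]
theta_hat = estimate_oa(T)
```

A real implementation would work with complex Hermitian [T] matrices estimated per pixel and plug the estimated OA back into the Y4O power modification, which this sketch does not attempt.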