
    Multi modal multi-semantic image retrieval

    The rapid growth in the volume of visual information, e.g. images and video, can overwhelm users’ ability to find and access the specific visual information of interest to them. In recent years, ontology knowledge-based (KB) image information retrieval techniques have been adopted in order to extract knowledge from these images and enhance retrieval performance. A KB framework is presented to support semi-automatic annotation and semantic image retrieval using multimodal cues (visual features and text captions). In addition, a hierarchical structure for the KB allows metadata to be shared and supports multiple semantics (polysemy) for concepts. The framework builds up an effective knowledge base pertaining to a domain-specific image collection, e.g. sports, and is able to disambiguate and assign high-level semantics to ‘unannotated’ images. Local feature analysis of visual content, namely Scale Invariant Feature Transform (SIFT) descriptors, has been deployed in the ‘Bag of Visual Words’ (BVW) model as an effective method to represent visual content information and to enhance its classification and retrieval. Local features are more useful than global features, e.g. colour, shape or texture, as they are invariant to image scale, orientation and camera angle. An innovative approach is proposed for the representation, annotation and retrieval of visual content using a hybrid technique based upon unstructured visual words and a (structured) hierarchical ontology KB model. The structural model facilitates the disambiguation of unstructured visual words and a more effective classification of visual content, compared to a vector space model, by exploiting local conceptual structures and their relationships. 
The key contributions of this framework in using local features for image representation include: first, a method to generate visual words using the semantic local adaptive clustering (SLAC) algorithm, which takes term weights and the spatial locations of keypoints into account, so that semantic information is preserved. Second, a technique to detect domain-specific ‘non-informative visual words’, which are ineffective at representing the content of visual data and degrade its categorisation. Third, a method to combine an ontology model with a visual word model to resolve synonym (visual heterogeneity) and polysemy problems. The experimental results show that this approach can discover semantically meaningful visual content descriptions and efficiently recognise specific events, e.g. sports events, depicted in images. Since discovering the semantics of an image is an extremely challenging problem, one promising approach to enhance visual content interpretation is to use any associated textual information that accompanies an image as a cue to predict its meaning, by transforming this textual information into a structured annotation, e.g. using XML, RDF, OWL or MPEG-7. Although text and image are distinct types of information representation and modality, there are strong, invariant, implicit connections between images and any accompanying text. Semantic analysis of image captions can therefore be used by image retrieval systems to retrieve selected images more precisely. To do this, Natural Language Processing (NLP) is exploited first in order to extract concepts from image captions. Next, an ontology-based knowledge model is deployed in order to resolve natural language ambiguities. To deal with the accompanying text, two methods to extract knowledge from textual information have been proposed. 
First, metadata can be extracted automatically from text captions and restructured with respect to a semantic model. Second, the use of Latent Semantic Indexing (LSI) in relation to a domain-specific ontology-based knowledge model enables the combined framework to tolerate ambiguities and variations (incompleteness) in metadata. The ontology-based knowledge model allows the system to find indirectly relevant concepts in image captions and thus leverage these to represent the semantics of images at a higher level. Experimental results show that the proposed framework significantly enhances image retrieval and narrows the semantic gap between lower-level machine-derived and higher-level human-understandable conceptualisations.
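The Bag of Visual Words pipeline the abstract builds on can be sketched in outline. Below is a minimal illustration using plain k-means over SIFT-like descriptors; the thesis's SLAC algorithm additionally weights terms and keypoint locations, which is not reproduced here, and all function names are illustrative.

```python
import numpy as np

def kmeans(descriptors, k, iters=20, seed=0):
    """Plain k-means over local descriptors. This stands in for the
    thesis's SLAC clustering, which also uses term weights and
    keypoint positions (not reproduced here)."""
    rng = np.random.default_rng(seed)
    centres = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest visual word centre
        dists = np.linalg.norm(descriptors[:, None] - centres[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)
    return centres

def bvw_histogram(descriptors, centres):
    """Represent one image as a normalised histogram of visual words."""
    dists = np.linalg.norm(descriptors[:, None] - centres[None], axis=2)
    hist = np.bincount(dists.argmin(axis=1),
                       minlength=len(centres)).astype(float)
    return hist / hist.sum()
```

Each image then becomes a fixed-length histogram that can be indexed and compared, which is the representation the ontology layer disambiguates.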

    Compact Tensor Pooling for Visual Question Answering

    Performing high-level cognitive tasks requires the integration of feature maps with drastically different structure. In Visual Question Answering (VQA), image descriptors have spatial structure, while lexical inputs inherently follow a temporal sequence. The recently proposed Multimodal Compact Bilinear pooling (MCB) forms the outer products, via count-sketch approximation, of the visual and textual representations at each spatial location. While this procedure preserves spatial information locally, outer products are taken independently for each fiber of the activation tensor and therefore do not include spatial context. In this work, we introduce the multi-dimensional sketch (MD-sketch), a novel extension of count-sketch to tensors. Using this new formulation, we propose Multimodal Compact Tensor Pooling (MCT) to fully exploit the global spatial context during bilinear pooling operations. In contrast to MCB, our approach preserves spatial context by directly convolving the MD-sketch of the visual tensor features with the text vector feature using a higher-order FFT. Furthermore, we apply MCT incrementally at each step of the question embedding and accumulate the multimodal vectors with a second LSTM layer before the final answer is chosen.
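The count-sketch trick underlying MCB (and generalised by MCT to tensors) can be illustrated for plain vectors: the count sketch of an outer product equals the circular convolution of the two individual count sketches, which is computable via FFT. A minimal sketch of that idea, not the authors' MD-sketch implementation; hash choices and the seed are illustrative.

```python
import numpy as np

def count_sketch(x, h, s, d):
    """Project x into d dimensions using an index hash h and a
    random sign hash s (the standard count-sketch projection)."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)
    return out

def compact_bilinear(x, y, d, seed=0):
    """Approximate a sketch of the outer product x (x) y by circularly
    convolving the two count sketches in the Fourier domain."""
    rng = np.random.default_rng(seed)
    hx = rng.integers(0, d, len(x))
    hy = rng.integers(0, d, len(y))
    sx = rng.choice([-1, 1], len(x))
    sy = rng.choice([-1, 1], len(y))
    cx = count_sketch(x, hx, sx, d)
    cy = count_sketch(y, hy, sy, d)
    # circular convolution via FFT, giving a d-dim pooled feature
    return np.fft.irfft(np.fft.rfft(cx) * np.fft.rfft(cy), n=d)
```

The pooled vector has fixed dimension d regardless of the input sizes, which is what makes the bilinear interaction compact; MCT extends this so the sketch is taken over the whole activation tensor rather than fiber by fiber.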

    Accessing and browsing 3D anatomical images with a navigational ontology.

    The problem that our research addresses is the lack of a comprehensive, universally useful system for navigating 3D images of anatomical structures. In this paper we discuss the organization of anatomical information in a navigational ontology, a knowledge representation formalism that supports intelligent browsing of 3D anatomical images. For the purposes of this project, 'intelligent' means that the computer system behaves as if it had accurate knowledge of human anatomy consistent with that of a trained anatomist (though not necessarily as complete). To give a simple example, if the user asks to see the component structures of the urinary system, the system will return a list of the structures, a model of them, or both, just as an anatomy instructor might do. The Vesalius Anatomy Browser provides an interface for navigating 3D anatomical images in which the images are linked to a hierarchical representation of conceptual information that corresponds directly to the images displayed on the screen. This association of concepts with images makes simultaneous visual exploration of anatomical information possible via both word and image.
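A navigational part-of hierarchy of the kind described can be modelled minimally as follows; the structure names and dictionary schema are illustrative only, not the Vesalius browser's actual data model.

```python
# Illustrative part-of hierarchy; entries are examples, not the
# Vesalius Anatomy Browser's real ontology.
PART_OF = {
    "urinary system": ["kidney", "ureter", "urinary bladder", "urethra"],
    "kidney": ["renal cortex", "renal medulla", "renal pelvis"],
}

def components(structure, recursive=False):
    """Return the component structures of an anatomical entity,
    answering the kind of query an anatomy instructor would."""
    parts = list(PART_OF.get(structure, []))
    if recursive:
        for part in PART_OF.get(structure, []):
            parts += components(part, recursive=True)
    return parts
```

In the browser, each such concept would additionally carry links to the 3D image regions it labels, so a query like `components("urinary system")` can drive both the textual listing and the rendered model.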

    Representation of contralateral visual space in the human hippocampus

    The initial encoding of visual information primarily from the contralateral visual field is a fundamental organizing principle of the primate visual system. Recently, the presence of such retinotopic sensitivity has been shown to extend well beyond early visual cortex to regions not historically considered retinotopically sensitive. In particular, human scene-selective regions in parahippocampal and medial parietal cortex exhibit prominent biases for the contralateral visual field. Here we used fMRI to test the hypothesis that the human hippocampus, which is thought to be anatomically connected with these scene-selective regions, would also exhibit a biased representation of contralateral visual space. First, population receptive field mapping with scene stimuli revealed strong biases for the contralateral visual field in bilateral hippocampus. Second, the distribution of retinotopic sensitivity suggested a more prominent representation in anterior medial portions of the hippocampus. Finally, the contralateral bias was confirmed in independent data taken from the Human Connectome Project initiative. The presence of contralateral biases in the hippocampus - a structure considered by many as the apex of the visual hierarchy - highlights the truly pervasive influence of retinotopy. Moreover, this finding has important implications for understanding how this information relates to the allocentric global spatial representations known to be encoded therein.

SIGNIFICANCE STATEMENT: Retinotopic encoding of visual information is an organizing principle of visual cortex. Recent work demonstrates this sensitivity in structures far beyond early visual cortex, including those anatomically connected to the hippocampus. Here, using population receptive field modelling in two independent sets of data, we demonstrate a consistent bias for the contralateral visual field in bilateral hippocampus. Such a bias highlights the truly pervasive influence of retinotopy, with important implications for understanding how the presence of retinotopy relates to more allocentric spatial representations.
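The population receptive field (pRF) approach used above can be sketched schematically: each voxel's pRF is modelled as a 2D Gaussian over the visual field, and its predicted response is the overlap between that Gaussian and the stimulus aperture at each time point. A toy illustration of that forward model only, not the authors' fitting pipeline; the grid size and parameters are made up.

```python
import numpy as np

def prf_response(stimulus, x0, y0, sigma):
    """Predicted response of a pRF modelled as an isotropic 2D
    Gaussian at (x0, y0) with width sigma, given a binary stimulus
    aperture of shape (time, height, width)."""
    h, w = stimulus.shape[1:]
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))
    g /= g.sum()
    # overlap of the aperture with the Gaussian at each time point
    return (stimulus * g).sum(axis=(1, 2))
```

A pRF centred in one half of the visual field responds more strongly to stimuli there, which is the kind of lateralised sensitivity the contralateral-bias analysis quantifies.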

    Indexing Audio-Visual Sequences by Joint Audio and Video Processing

    The focus of this work is the creation of a content-based hierarchical organisation of audio-visual data (a description scheme) and of meta-data (descriptors) to associate with audio and/or visual signals. The generation of efficient indices to access audio-visual databases is strictly connected to the generation of content descriptors and to the hierarchical representation of audio-visual material. Once a hierarchy can be extracted from the data analysis, a nested indexing structure can be created to access relevant information at a specific level of detail. Accordingly, a query can be made very specific in relation to the level of detail required by the user. In order to construct the hierarchy, we describe how to extract information content from audio-visual sequences so as to obtain different hierarchical indicators (or descriptors), which can be associated with each medium (audio, video). At this stage, video and audio signals can be separated into temporally consistent elements. At the lowest level, information is organised in frames (groups of pixels for visual information, groups of consecutive samples for audio information). At a higher level, low-level consistent temporal entities are identified: in the case of digital image sequences, these consist of shots (or continuous camera records), which can be obtained by detecting cuts or special effects such as dissolves, fade-ins and fade-outs; in the case of audio information, these are consistent audio segments belonging to one specific audio type (such as speech, music, silence, ...). One more level up, patterns of video shots or audio segments can be recognised so as to reflect more meaningful structures such as dialogues, actions, ... At the highest level, information is organised so as to establish correlations beyond the temporal organisation of information, allowing it to reflect classes of visual or audio types: we call these classes idioms. 
The paper ends with a description of possible solutions that allow a cross-modal analysis of audio and video information, which may validate or invalidate the proposed hierarchy and, in some cases, enable more sophisticated levels of representation of information content.
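The shot-boundary detection step described above is commonly implemented by thresholding the difference between consecutive frame histograms. A minimal sketch under that assumption; the bin count and threshold are illustrative, and gradual effects such as dissolves and fades need more elaborate handling than this abrupt-cut detector provides.

```python
import numpy as np

def detect_cuts(frames, bins=16, threshold=0.5):
    """Flag an abrupt shot boundary at frame i when the normalised
    grey-level histograms of frames i-1 and i differ by more than
    `threshold` (half the L1 distance, so the score lies in [0, 1])."""
    cuts = []
    prev = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()
        if prev is not None and 0.5 * np.abs(hist - prev).sum() > threshold:
            cuts.append(i)
        prev = hist
    return cuts
```

Each detected cut closes one shot, and the resulting shot list is the low-level temporal entity on which the higher levels of the hierarchy (dialogue patterns, idioms) are built.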

    The effect of graphic organizer on student's learning in school.

    Graphic organizers (GOs) are instruments for representing, illustrating and modelling information in visual or graphic form, used to achieve meaningful learning. GOs are a set of learning strategies which involve translating words expressed in linear form into visual structures. When written material or difficult concepts are expressed graphically, students can develop alternative structures for understanding the course concepts. This paper illustrates the use of graphic organizers and their effect on students’ learning in schools. Previous research studies investigating the effects of graphic organizers on students’ learning in schools are reviewed. It was found that graphic organizers improved students’ comprehension, performance and motivation in learning.