114,894 research outputs found

    A correspondence-based neural mechanism for position invariant feature processing

    Poster presentation: Introduction. We focus here on constructing a hierarchical neural system for position-invariant recognition, which is one of the most fundamental forms of invariant recognition achieved in visual processing [1,2]. Invariant recognition has been hypothesized to work by matching the sensory image of a particular object on the retina to the most suitable representation stored in the memory of the higher visual cortical area. Here a general problem arises: in such visual processing, the position of the object image on the retina is initially uncertain. Furthermore, the retinal activities carrying the sensory information are far removed from those in the higher area, with a loss of sensory object information along the way. Nevertheless, despite this ambiguity, the particular object can be recognized effortlessly. Our aim in this work is to resolve this general recognition problem. ..

    Discovery of Federal Income Tax Returns and the New “Qualified” Privileges

    The notion of scale selection refers to methods for estimating characteristic scales in image data and for automatically determining locally appropriate scales in a scale-space representation, so as to adapt subsequent processing to the local image structure and to compute scale-invariant image features and image descriptors. An essential aspect of the approach is that it allows for a bottom-up determination of the inherent scales of features and objects without first recognizing them or delimiting (alternatively, segmenting) them from their surroundings. Scale selection methods have also been developed from other viewpoints, such as performing noise suppression and exploiting top-down information.
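
As an illustration of the bottom-up idea in this abstract, the sketch below selects a characteristic scale by maximizing a scale-normalized Laplacian response over scales. This is one common instance of scale selection, chosen here as an assumption; the abstract does not say which differential expression is used. The function names `scale_normalized_laplacian` and `select_scale`, the FFT-based smoothing, and the synthetic Gaussian blob are all hypothetical choices made for this example.

```python
import numpy as np

def scale_normalized_laplacian(image, sigma):
    """Return t * (L_xx + L_yy) at scale t = sigma**2, computed in the Fourier domain."""
    rows, cols = image.shape
    # Angular frequencies of the periodic grid (radians per pixel).
    wy = 2.0 * np.pi * np.fft.fftfreq(rows)[:, None]
    wx = 2.0 * np.pi * np.fft.fftfreq(cols)[None, :]
    w2 = wx ** 2 + wy ** 2
    t = sigma ** 2
    # Gaussian smoothing multiplies the spectrum by exp(-t * |w|^2 / 2);
    # the Laplacian multiplies it by -|w|^2; the factor t normalizes for scale.
    multiplier = -t * w2 * np.exp(-0.5 * t * w2)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * multiplier))

def select_scale(image, row, col, sigmas):
    """Pick the sigma whose normalized Laplacian magnitude is largest at (row, col)."""
    responses = [abs(scale_normalized_laplacian(image, s)[row, col]) for s in sigmas]
    return sigmas[int(np.argmax(responses))], responses

# Synthetic input (an assumption for the demo): a Gaussian blob of known width.
size, sigma0 = 128, 6.0
yy, xx = np.mgrid[0:size, 0:size]
blob = np.exp(-((yy - 64) ** 2 + (xx - 64) ** 2) / (2.0 * sigma0 ** 2))

sigmas = np.arange(2.0, 12.5, 0.5)  # candidate scales to search over
best_sigma, _ = select_scale(blob, 64, 64, sigmas)
print("selected sigma:", best_sigma, "blob sigma:", sigma0)
```

For a two-dimensional Gaussian blob of variance t0, the normalized response at the centre is proportional to t / (t0 + t)^2, which peaks at t = t0, so the selected scale should track the blob size without any prior segmentation.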

    Scanner Invariant Representations for Diffusion MRI Harmonization

    Purpose: In the present work we describe the correction of diffusion-weighted MRI for site and scanner biases using a novel method based on invariant representation. Theory and Methods: Pooled imaging data from multiple sources are subject to variation between the sources. Correcting for these biases has become very important as imaging studies increase in size and multi-site cases become more common. We propose learning an intermediate representation invariant to site/protocol variables, a technique adapted from information theory-based algorithmic fairness; by leveraging the data processing inequality, such a representation can then be used to create an image reconstruction that is uninformative of its original source, yet still faithful to underlying structures. To implement this, we use a deep learning method based on variational auto-encoders (VAE) to construct scanner invariant encodings of the imaging data. Results: To evaluate our method, we use training data from the 2018 MICCAI Computational Diffusion MRI (CDMRI) Challenge Harmonization dataset. Our proposed method shows improvements on independent test data relative to a recently published baseline method on each subtask, mapping data from three different scanning contexts to and from one separate target scanning context. Conclusion: As imaging studies continue to grow, the use of pooled multi-site imaging will similarly increase. Invariant representation presents a strong candidate for the harmonization of these data

    Generalized local N-ary patterns for texture classification

    Local Binary Pattern (LBP) has been well recognised and widely used in various texture analysis applications of computer vision and image processing. It integrates properties of structural and statistical texture analysis. LBP is invariant to monotonic gray-scale variations and also has extensions to rotation-invariant texture analysis. In recent years, various improvements have been achieved based on LBP. One extensive development replaced the binary representation with a ternary representation, yielding the Local Ternary Pattern (LTP). This paper further generalises the local pattern representation by formulating it as a generalised weight problem of Bachet de Meziriac and proposes the Local N-ary Pattern (LNP). Encouraging performance is achieved on three benchmark datasets when compared with its predecessors. © 2013 IEEE
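
To make the gray-scale invariance claim concrete, here is a minimal sketch of the basic 3x3 LBP operator. It is not the paper's LNP method. The function name `lbp_8_neighbors`, the clockwise bit ordering, and the `>=` comparison convention are assumptions made for this example; other orderings are equally valid as long as they are used consistently.

```python
import numpy as np

def lbp_8_neighbors(image):
    """Return the basic 3x3 LBP code for every interior pixel of a 2-D array."""
    img = np.asarray(image, dtype=np.float64)
    rows, cols = img.shape
    center = img[1:-1, 1:-1]
    # Offsets (dy, dx) of the eight neighbours, clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:rows - 1 + dy, 1 + dx:cols - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbor >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

# Centre pixel 5: only neighbours 6, 9, 8, 7 pass the test, i.e. bits 3 to 6.
patch = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
print(lbp_8_neighbors(patch)[0, 0])  # 8 + 16 + 32 + 64 = 120

# A strictly increasing gray-scale mapping leaves every code unchanged.
rng = np.random.default_rng(0)
texture = rng.integers(0, 256, size=(16, 16))
same = np.array_equal(lbp_8_neighbors(texture), lbp_8_neighbors(2 * texture + 5))
print(same)
```

Because each bit depends only on the ordering of a neighbour relative to the centre, any strictly increasing intensity mapping preserves the codes, which is the invariance the abstract refers to.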

    The What-And-Where Filter: A Spatial Mapping Neural Network for Object Recognition and Image Understanding

    The What-and-Where filter forms part of a neural network architecture for spatial mapping, object recognition, and image understanding. The Where filter responds to an image figure that has been separated from its background. It generates a spatial map whose cell activations simultaneously represent the position, orientation, and size of all the figures in a scene (where they are). This spatial map may be used to direct spatially localized attention to these image features. A multiscale array of oriented detectors, followed by competitive and interpolative interactions between position, orientation, and size scales, is used to define the Where filter. This analysis discloses several issues that need to be dealt with by a spatial mapping system that is based upon oriented filters, such as the role of cliff filters with and without normalization, the double peak problem of maximum orientation across size scale, and the different self-similar interpolation properties across orientation than across size scale. Several computationally efficient Where filters are proposed. The Where filter may be used for parallel transformation of multiple image figures into invariant representations that are insensitive to the figures' original position, orientation, and size. These invariant figural representations form part of a system devoted to attentive object learning and recognition (what it is). Unlike some alternative models where serial search for a target occurs, a What and Where representation can be used to rapidly search in parallel for a desired target in a scene. Such a representation can also be used to learn multidimensional representations of objects and their spatial relationships for purposes of image understanding. 
The What-and-Where filter is inspired by neurobiological data showing that a Where processing stream in the cerebral cortex is used for attentive spatial localization and orientation, whereas a What processing stream is used for attentive object learning and recognition. Funding: Advanced Research Projects Agency (ONR-N00014-92-J-4015, AFOSR 90-0083); British Petroleum (89-A-1204); National Science Foundation (IRI-90-00530, Graduate Fellowship); Office of Naval Research (N00014-91-J-4100, N00014-95-1-0409, N00014-95-1-0657); Air Force Office of Scientific Research (F49620-92-J-0499, F49620-92-J-0334)

    Leveraging Image-based Generative Adversarial Networks for Time Series Generation

    Generative models for images have gained significant attention in computer vision and natural language processing due to their ability to generate realistic samples from complex data distributions. To leverage the advances of image-based generative models for the time series domain, we propose a two-dimensional image representation for time series, the Extended Intertemporal Return Plot (XIRP). Our approach captures the intertemporal time series dynamics in a scale-invariant and invertible way, reducing training time and improving sample quality. We benchmark synthetic XIRPs obtained by an off-the-shelf Wasserstein GAN with gradient penalty (WGAN-GP) to other image representations and models regarding similarity and predictive ability metrics. Our novel, validated image representation for time series consistently and significantly outperforms a state-of-the-art RNN-based generative model regarding predictive ability. Further, we introduce an improved stochastic inversion to substantially improve simulation quality regardless of the representation and provide the prospect of transfer potentials in other domains

    Multi modal multi-semantic image retrieval

    The rapid growth in the volume of visual information, e.g. images and video, can overwhelm users’ ability to find and access the specific visual information of interest to them. In recent years, ontology knowledge-based (KB) image information retrieval techniques have been adopted in an attempt to extract knowledge from these images, enhancing the retrieval performance. A KB framework is presented to promote semi-automatic annotation and semantic image retrieval using multimodal cues (visual features and text captions). In addition, a hierarchical structure for the KB allows metadata to be shared that supports multi-semantics (polysemy) for concepts. The framework builds up an effective knowledge base pertaining to a domain-specific image collection, e.g. sports, and is able to disambiguate and assign high-level semantics to ‘unannotated’ images. Local feature analysis of visual content, namely using Scale Invariant Feature Transform (SIFT) descriptors, has been deployed in the ‘Bag of Visual Words’ model (BVW) as an effective method to represent visual content information and to enhance its classification and retrieval. Local features are more useful than global features, e.g. colour, shape or texture, as they are invariant to image scale, orientation and camera angle. An innovative approach is proposed for the representation, annotation and retrieval of visual content using a hybrid technique based upon the use of an unstructured visual word model and upon a (structured) hierarchical ontology KB model. The structural model facilitates the disambiguation of unstructured visual words and a more effective classification of visual content, compared to a vector space model, through exploiting local conceptual structures and their relationships. 
The key contributions of this framework in using local features for image representation include: first, a method to generate visual words using the semantic local adaptive clustering (SLAC) algorithm, which takes term weight and spatial locations of keypoints into account. Consequently, the semantic information is preserved. Second, a technique is used to detect the domain-specific ‘non-informative visual words’, which are ineffective at representing the content of visual data and degrade its categorisation ability. Third, a method to combine an ontology model with a visual word model to resolve synonym (visual heterogeneity) and polysemy problems is proposed. The experimental results show that this approach can discover semantically meaningful visual content descriptions and recognise specific events, e.g. sports events, depicted in images efficiently. Since discovering the semantics of an image is an extremely challenging problem, one promising approach to enhance visual content interpretation is to use any associated textual information that accompanies an image as a cue to predict the meaning of the image, by transforming this textual information into a structured annotation for the image, e.g. using XML, RDF, OWL or MPEG-7. Although text and image are distinct types of information representation and modality, there are some strong, invariant, implicit connections between images and any accompanying text information. Semantic analysis of image captions can be used by image retrieval systems to retrieve selected images more precisely. To do this, Natural Language Processing (NLP) is first exploited in order to extract concepts from image captions. Next, an ontology-based knowledge model is deployed in order to resolve natural language ambiguities. To deal with the accompanying text information, two methods to extract knowledge from textual information have been proposed. 
First, metadata can be extracted automatically from text captions and restructured with respect to a semantic model. Second, the use of LSI in relation to a domain-specific ontology-based knowledge model enables the combined framework to tolerate ambiguities and variations (incompleteness) of metadata. The use of the ontology-based knowledge model allows the system to find indirectly relevant concepts in image captions and thus leverage these to represent the semantics of images at a higher level. Experimental results show that the proposed framework significantly enhances image retrieval and leads to a narrowing of the semantic gap between lower-level machine-derived and higher-level human-understandable conceptualisation.
