122,739 research outputs found

    Efficient On-the-fly Category Retrieval using ConvNets and GPUs

    We investigate the gains in precision and speed that can be obtained by using Convolutional Networks (ConvNets) for on-the-fly retrieval - where classifiers are learnt at run time for a textual query from downloaded images, and used to rank large image or video datasets. We make three contributions: (i) we present an evaluation of state-of-the-art image representations for object category retrieval over standard benchmark datasets containing 1M+ images; (ii) we show that ConvNets can be used to obtain features which are highly performant, yet of much lower dimensionality than previous state-of-the-art image representations, and that their dimensionality can be reduced further without loss in performance by compression using product quantization or binarization. Consequently, features with state-of-the-art performance on large-scale datasets of millions of images can fit in the memory of even a commodity GPU card; (iii) we show that an SVM classifier can be learnt within a ConvNet framework on a GPU in parallel with downloading the new training images, allowing for a continuous refinement of the model as more images become available, and simultaneous training and ranking. The outcome is an on-the-fly system that significantly outperforms its predecessors in terms of precision of retrieval, memory requirements, and speed, facilitating accurate on-the-fly learning and ranking in under a second on a single GPU. Comment: Published in proceedings of ACCV 2014
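
A rough, hedged illustration of the binarization idea in point (ii) above: real-valued ConvNet descriptors are sign-thresholded into packed binary codes and ranked by Hamming distance. This is a minimal Python sketch, not the authors' code; the function names and toy sizes are assumptions.

```python
import numpy as np

def binarize(features):
    """Compress real-valued descriptors to packed binary codes by sign thresholding."""
    bits = features > 0                    # one bit per feature dimension
    return np.packbits(bits, axis=1)       # 32x smaller than float32 storage

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query's binary code."""
    diff = np.unpackbits(query_code ^ db_codes, axis=1)  # XOR, then count set bits
    return np.argsort(diff.sum(axis=1))

# toy usage: 128-D descriptors become 16-byte codes
db = np.random.randn(10_000, 128).astype(np.float32)
codes = binarize(db)
ranking = hamming_rank(binarize(db[:1]), codes)  # item 0 ranks itself first
```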

    Image annotation and retrieval based on multi-modal feature clustering and similarity propagation.

    The performance of content-based image retrieval systems has proved to be inherently constrained by the low-level features used, and cannot give satisfactory results when the user's high-level concepts cannot be expressed by low-level features. In an attempt to bridge this semantic gap, recent approaches started integrating both low-level visual features and high-level textual keywords. Unfortunately, manual image annotation is a tedious process and may not be possible for large image databases. In this thesis we propose a system for image retrieval that has three main components. The first component of our system consists of a novel possibilistic clustering and feature weighting algorithm based on robust modeling of the Generalized Dirichlet (GD) finite mixture. Robust estimation of the mixture model parameters is achieved by incorporating two complementary types of membership degrees. The first one is a posterior probability that indicates the degree to which a point fits the estimated distribution. The second membership represents the degree of typicality and is used to identify and discard noise points. Robustness to noisy and irrelevant features is achieved by transforming the data to make the features independent and follow a Beta distribution, and by learning an optimal relevance weight for each feature subset within each cluster. We extend our algorithm to find the optimal number of clusters in an unsupervised and efficient way by exploiting some properties of the possibilistic membership function. We also outline a semi-supervised version of the proposed algorithm. The second component of our system consists of a novel approach to unsupervised image annotation. Our approach is based on: (i) the proposed semi-supervised possibilistic clustering; (ii) a greedy selection and joining algorithm (GSJ); (iii) Bayes' rule; and (iv) a probabilistic model that is based on possibilistic membership degrees to annotate an image. The third component of the proposed system consists of an image retrieval framework based on multi-modal similarity propagation. The proposed framework is designed to deal with two data modalities: low-level visual features and high-level textual keywords generated by our proposed image annotation algorithm. The multi-modal similarity propagation system exploits the mutual reinforcement of relational data and results in a nonlinear combination of the different modalities. Specifically, it is used to learn the semantic similarities between images by leveraging the relationships between features from the different modalities. The proposed image annotation and retrieval approaches are implemented and tested with a standard benchmark dataset. We show the effectiveness of our clustering algorithm in handling high-dimensional and noisy data. We compare our proposed image annotation approach to three state-of-the-art methods and demonstrate the effectiveness of the proposed image retrieval system.
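
To make the two membership types concrete: a minimal Python illustration of posterior (probabilistic) versus typicality (possibilistic) degrees, using a simple distance-based typicality. This is an assumed toy formulation; the thesis's GD-mixture version is far more involved.

```python
import numpy as np

def posterior_memberships(sq_dists):
    """Probabilistic posteriors: relative fit across clusters, rows sum to one."""
    w = np.exp(-sq_dists)
    return w / w.sum(axis=1, keepdims=True)

def typicality_memberships(sq_dists, eta=1.0):
    """Possibilistic typicality: decays with absolute distance to each cluster,
    independently of the other clusters, so outliers score low everywhere."""
    return 1.0 / (1.0 + sq_dists / eta)

# a point near cluster 0 vs. a point far from both clusters
d = np.array([[0.5, 4.0],     # inlier
              [25.0, 30.0]])  # outlier: its posteriors still sum to 1,
                              # but both its typicalities are near 0
print(posterior_memberships(d))
print(typicality_memberships(d))
```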

    Hierarchical visual perception and two-dimensional compressive sensing for effective content-based color image retrieval

    Content-based image retrieval (CBIR) has been an active research theme in the computer vision community for over two decades. While the field is relatively mature, significant research is still required to develop solutions for practical applications. One reason that practical solutions have not yet been realized could be a limited understanding of the cognitive aspects of the human vision system. Inspired by three cognitive properties of human vision, namely hierarchical structuring, color perception and embedded compressive sensing, a new CBIR approach is proposed. In the proposed approach, the Hue, Saturation and Value (HSV) color model and the Similar Gray Level Co-occurrence Matrix (SGLCM) texture descriptors are used to generate elementary features. These features then form a hierarchical representation of the data, to which a two-dimensional compressive sensing (2D CS) feature mining algorithm is applied. Finally, a weighted feature matching method is used to perform image retrieval. We present a comprehensive set of results of applying our proposed Hierarchical Visual Perception Enabled 2D CS approach to publicly available datasets and demonstrate the efficacy of our techniques when compared with other recently published, state-of-the-art approaches.
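
As a sketch of what such elementary features might look like, here is a Python version using scikit-image that combines an HSV color histogram with standard GLCM texture statistics. Note this uses the ordinary GLCM, not the paper's SGLCM variant, and the bin counts are assumptions.

```python
import numpy as np
from skimage import color, img_as_ubyte
from skimage.feature import graycomatrix, graycoprops

def elementary_features(rgb, hsv_bins=(8, 4, 4)):
    """One feature vector per image: HSV color histogram + GLCM texture stats."""
    hsv = color.rgb2hsv(rgb)                      # channels in [0, 1]
    hist, _ = np.histogramdd(hsv.reshape(-1, 3), bins=hsv_bins,
                             range=((0, 1), (0, 1), (0, 1)))
    hist = hist.ravel() / hist.sum()

    gray = img_as_ubyte(color.rgb2gray(rgb))
    glcm = graycomatrix(gray, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    texture = np.hstack([graycoprops(glcm, p).ravel()
                         for p in ("contrast", "homogeneity", "energy")])
    return np.concatenate([hist, texture])
```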

    Robust phase retrieval for high resolution edge illumination x-ray phase-contrast computed tomography in non-ideal environments

    Edge illumination x-ray phase contrast tomography is a recently developed imaging technique which enables three-dimensional visualisation of low-absorbing materials. Dedicated phase retrieval algorithms can provide separate computed tomography (CT) maps of sample absorption, refraction and scattering properties. In this paper we propose a novel "modified local retrieval" method which is capable of accurately retrieving sample properties in a range of realistic, non-ideal imaging environments. These include system misalignment, defects in the optical elements used, and system geometry variations over time due to vibrations or temperature fluctuations. System instabilities were analysed, modelled and incorporated into a simulation study. As a result, an additional modification was introduced to the retrieval procedure to account for changes in the imaging system over time, as well as local variations over the field of view. The performance of the proposed method was evaluated against a previously used "global retrieval" method by applying both approaches to experimental CT data of a rat's heart acquired in a non-ideal environment. The proposed method removed major artefacts, leading to a significant improvement in image quality. It will therefore enable the acquisition of high-resolution, reliable CT data of large samples in realistic settings.
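
For readers unfamiliar with edge-illumination retrieval, the following is a schematic toy-model sketch in Python of the basic two-image principle (absorption and refraction only): two intensities measured on opposite slopes of the illumination curve are inverted for transmission and the refraction-induced shift. Everything here (the Gaussian curve, working points, brackets) is an illustrative assumption; the paper's method additionally retrieves scattering and handles the non-idealities it describes.

```python
import numpy as np
from scipy.optimize import brentq

# Toy Gaussian illumination curve C(x); in practice C is measured without the sample.
curve = lambda x: np.exp(-x**2 / (2 * 0.1**2))

def retrieve_pixel(i_plus, i_minus, x_plus=0.05, x_minus=-0.05):
    """Recover transmission t and refraction-induced shift dx for one pixel
    from two measurements on opposite slopes, using the model I = t * C(x + dx)."""
    ratio = i_plus / i_minus
    f = lambda dx: curve(x_plus + dx) / curve(x_minus + dx) - ratio
    dx = brentq(f, -0.04, 0.04)          # bracket must straddle the root
    t = i_plus / curve(x_plus + dx)
    return t, dx

# round-trip check with synthetic ground truth t = 0.8, dx = 0.01
i_p, i_m = 0.8 * curve(0.05 + 0.01), 0.8 * curve(-0.05 + 0.01)
print(retrieve_pixel(i_p, i_m))          # approximately (0.8, 0.01)
```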

    Facial image restoration and retrieval through orthogonality

    University of Technology Sydney, Faculty of Engineering and Information Technology. Orthogonality has different definitions in geometry, statistics and calculus. This thesis studies how to incorporate orthogonality into facial image restoration and retrieval tasks. A facial image restoration method and three retrieval methods are proposed. Blur in facial images significantly impedes the efficiency of recognition approaches. However, most existing blind deconvolution methods cannot generate satisfactory results, due to their dependence on strong edges, which are plentiful in natural images but not in facial images. A novel method is proposed in this thesis. Point spread functions (PSFs) are represented by a linear combination of a set of pre-defined orthogonal PSFs and, similarly, an estimated intrinsic sharp face image (EI) is represented by a linear combination of a set of pre-defined orthogonal face images. In doing so, PSF and EI estimation is simplified to discovering two sets of linear combination coefficients, which are found simultaneously by the proposed coupled learning algorithm. To make the method robust to different kinds of blurry face images, several candidate PSFs and EIs are generated for a test image, and a non-blind deconvolution method is then adopted to generate more EIs from those candidate PSFs. Finally, a blind image quality assessment metric is deployed to automatically select the optimal EI. Orthogonality is also incorporated into the proposed unimodal image retrieval method. Hashing methods have been widely investigated for fast approximate nearest neighbor search in large datasets. Most existing methods use binary vectors in lower-dimensional spaces to represent data points that are usually real vectors of higher dimensionality. The proposed method divides the hashing process into two steps. Data points are first embedded in a low-dimensional space, and the Global Positioning System (GPS) method is subsequently introduced, but modified for binary embedding. Data-independent and data-dependent methods are devised to distribute the satellites at appropriate locations. The proposed methods are based on finding the tradeoff between the information losses in these two steps. Experiments show that the data-dependent method outperforms other methods on datasets ranging in size from 100K to 10M. By incorporating the orthogonality of the code matrix, both data-independent and data-dependent methods are particularly impressive in experiments on longer bit lengths. In social networks, heterogeneous multimedia data are correlated, such as videos and their corresponding tags on YouTube and image-text pairs on Facebook. Nearest neighbor retrieval across multiple modalities on large datasets has become a hot yet challenging problem. Hashing is expected to be an efficient solution, since it represents data as binary codes. As bit-wise XOR operations can be handled quickly, the retrieval time is greatly reduced. Few existing multi-modal hashing methods consider the correlation among hashing bits, which has a negative impact on the hashing codes: as the code length grows, retrieval performance improves more slowly. The proposed method incorporates a so-called minimum correlation constraint, which can be treated as a generalization of the orthogonality constraint. Experiments show that the superiority of the proposed method grows as the code length increases.
    Deep neural networks are expected to be an efficient way to perform multi-modal hashing. We propose a hybrid neural network which consists of a convolutional neural network for facial images and a fully-connected neural network for tags or labels. The minimum correlation regularization is imposed on the parameters of the output layers. Experiments validate the superiority of the proposed hybrid neural network.
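
One plausible reading of a minimum-correlation regularizer, sketched in Python: softly penalize the Gram matrix of the hash projections for deviating from the identity, which reduces to a hard orthogonality constraint when the penalty is driven to zero. The exact form used in the thesis may differ; this formulation is an assumption.

```python
import numpy as np

def min_correlation_penalty(W, lam=1.0):
    """Soft penalty on correlation between hash projections (columns of W).
    ||W^T W - I||_F^2 vanishes exactly when the columns are orthonormal,
    so minimizing it generalizes a hard orthogonality constraint."""
    G = W.T @ W
    return lam * np.sum((G - np.eye(W.shape[1])) ** 2)

def min_correlation_grad(W, lam=1.0):
    """Gradient of the penalty, to add to any gradient-based hashing objective."""
    return 4.0 * lam * W @ (W.T @ W - np.eye(W.shape[1]))

# near-orthogonal random projections incur only a small penalty
W = np.random.randn(512, 64) / np.sqrt(512)
print(min_correlation_penalty(W))
```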

    Towards open-universe image parsing with broad coverage

    One of the main goals of computer vision is to develop algorithms that allow the computer to interpret an image not as a pattern of colors but as the semantic relationships that make up a real-world three-dimensional scene. In this dissertation, I present a system for image parsing, or labeling the regions of an image with their semantic categories, as a means of scene understanding. Most existing image parsing systems use a fixed set of a few hundred hand-labeled images as examples from which they learn how to label image regions, but our world cannot be adequately described with only a few hundred images. A new breed of open-universe datasets has recently started to emerge. These datasets not only have more images but are constantly expanding, with new images and labels assigned by users on the web. Here I present a system that is able both to learn from these larger datasets of labeled images and to scale as the dataset expands, thus greatly broadening the number of class labels that can correctly be identified in an image. Throughout this work I employ a retrieval-based methodology: I first retrieve images similar to the query and then match image regions from this set of retrieved images. My system can assign multiple forms of meaning to each image region: for example, it can simultaneously label the wing of a crow as an animal, crow, wing, and feather. I also broaden the label coverage by using both region- and detector-based similarity measures to effectively match a broad range of label types. This work shows the power of retrieval-based systems and the importance of having a diverse set of image cues and interpretations.
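
A bare-bones Python sketch of the retrieve-then-match methodology described above: shortlist globally similar images, then transfer region labels by nearest-neighbor matching. The descriptor spaces and function names are assumptions; the dissertation's region matching and multi-level labeling are considerably richer.

```python
import numpy as np

def retrieve_similar(query_global, db_globals, k=20):
    """Step 1: shortlist the k database images closest to the query
    in some global descriptor space."""
    dists = np.linalg.norm(db_globals - query_global, axis=1)
    return np.argsort(dists)[:k]

def transfer_labels(query_regions, pool_regions, pool_labels):
    """Step 2: give each query region the label of its nearest region
    among the regions pooled from the shortlisted images."""
    labels = []
    for r in query_regions:
        dists = np.linalg.norm(pool_regions - r, axis=1)
        labels.append(pool_labels[int(np.argmin(dists))])
    return labels
```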

    Image mining: trends and developments

    [Abstract]: Advances in image acquisition and storage technology have led to tremendous growth in very large and detailed image databases. These images, if analyzed, can reveal useful information to human users. Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the images. Image mining is more than just an extension of data mining to the image domain. It is an interdisciplinary endeavor that draws upon expertise in computer vision, image processing, image retrieval, data mining, machine learning, databases, and artificial intelligence. In this paper, we examine the research issues in image mining and current developments in the field, in particular image mining frameworks, state-of-the-art techniques and systems. We also identify some future research directions for image mining.

    Exploiting Deep Features for Remote Sensing Image Retrieval: A Systematic Investigation

    Remote sensing (RS) image retrieval is of great significance for geological information mining. Over the past two decades, a large amount of research on this task has been carried out, mainly focused on three core issues: feature extraction, similarity metrics and relevance feedback. Due to the complexity and multiformity of ground objects in high-resolution remote sensing (HRRS) images, there is still room for improvement in current retrieval approaches. In this paper, we analyze the three core issues of RS image retrieval and provide a comprehensive review of existing methods. Furthermore, with the goal of advancing the state of the art in HRRS image retrieval, we focus on the feature extraction issue and delve into how powerful deep representations can be used for this task. We conduct a systematic investigation of the correlative factors that may affect the performance of deep features. By optimizing each factor, we obtain remarkable retrieval results on publicly available HRRS datasets. Finally, we explain the experimental phenomena in detail and draw conclusions from our analysis. Our work can serve as a guide for research on content-based RS image retrieval.
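
One common way to extract such deep features, sketched in Python with PyTorch/torchvision: a pretrained CNN with its classifier head removed serves as a fixed descriptor extractor, and retrieval reduces to cosine similarity. The choice of backbone, layer and preprocessing here is an assumption; the paper evaluates many such factors.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as T

# Pretrained CNN as a fixed extractor: drop the classifier head and use the
# pooled activations as the image descriptor.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def describe(pil_image):
    feat = backbone(preprocess(pil_image).unsqueeze(0))
    return F.normalize(feat, dim=1)   # L2-normalize so dot product = cosine

# retrieval then reduces to: scores = db_features @ describe(query).T
```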

    Image mining: issues, frameworks and techniques

    [Abstract]: Advances in image acquisition and storage technology have led to tremendous growth in very large and detailed image databases. These images, if analyzed, can reveal useful information to human users. Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the images. Image mining is more than just an extension of data mining to the image domain. It is an interdisciplinary endeavor that draws upon expertise in computer vision, image processing, image retrieval, data mining, machine learning, databases, and artificial intelligence. Despite the development of many applications and algorithms in the individual research fields cited above, research in image mining is still in its infancy. In this paper, we examine the research issues in image mining and current developments in the field, in particular image mining frameworks, state-of-the-art techniques and systems. We also identify some future research directions for image mining at the end of this paper.

    Aggregated Deep Local Features for Remote Sensing Image Retrieval

    Remote sensing image retrieval remains a challenging topic due to the special nature of remote sensing imagery. Such images contain many different semantic objects, which clearly complicates the retrieval task. In this paper, we present an image retrieval pipeline that uses attentive, local convolutional features and aggregates them using the Vector of Locally Aggregated Descriptors (VLAD) to produce a global descriptor. We study various system parameters, such as the multiplicative and additive attention mechanisms and descriptor dimensionality. We propose a query expansion method that requires no external inputs. Experiments demonstrate that even without training, the local convolutional features and global representation outperform other systems. After system tuning, we can achieve state-of-the-art or competitive results. Furthermore, we observe that our query expansion method increases overall system performance by about 3%, using only the top three retrieved images. Finally, we show how dimensionality reduction produces compact descriptors with increased retrieval performance and fast retrieval computation times, e.g. 50% faster than current systems. Comment: Published in Remote Sensing. The first two authors have equal contributions.
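
For reference, a minimal Python sketch of standard VLAD aggregation as referenced in the abstract: local descriptors are assigned to their nearest codeword, residuals are accumulated, and the result is power- and L2-normalized. This is the textbook VLAD, without the paper's attention weighting, and the codebook is assumed to be trained separately (e.g. by k-means).

```python
import numpy as np

def vlad(local_descs, codebook):
    """Aggregate local descriptors into one global VLAD vector: sum the
    residuals to each descriptor's nearest codeword, then power- and
    L2-normalize the concatenated result."""
    k, d = codebook.shape
    assign = np.argmin(
        np.linalg.norm(local_descs[:, None, :] - codebook[None], axis=2), axis=1)
    v = np.zeros((k, d))
    for i in range(k):
        members = local_descs[assign == i]
        if len(members):
            v[i] = (members - codebook[i]).sum(axis=0)
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))        # power normalization
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```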