57 research outputs found

    Deep Hashing for Image Similarity Search

    Hashing for similarity search is one of the most widely used methods for solving the approximate nearest neighbor search problem. In this method, one first maps data items from a real-valued high-dimensional space to a suitable low-dimensional binary code space and then performs the approximate nearest neighbor search in this code space instead. This is beneficial because the search in the code space can be solved more efficiently in terms of runtime complexity and storage consumption. For this method to succeed, similar data items must be mapped to binary code words that have small Hamming distance. For real-world data such as images, one usually proceeds as follows. For each data item, a pre-processing algorithm removes noise and insignificant information and extracts important discriminating information to generate a feature vector that captures the important semantic content. Next, a vector hash function maps this real-valued feature vector to a binary code word. The raw feature vectors can also be used afterwards to further process the search result candidates produced by the binary hash codes. This dissertation focuses on three topics. First, it develops a learning-based counterpart of the MinHash hashing algorithm. Second, it presents UmapHash, a new unsupervised hashing method that maps the neighborhood relations of data items from the feature vector space to the binary hash code space. Finally, it applies these hashing methods to rapid face image recognition.
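The two-stage search described in this abstract (shortlist by Hamming distance on binary codes, then further process the candidates with the raw feature vectors) can be sketched as follows. This is only an illustration under stated assumptions: it uses sign random projections as a stand-in hash function, not the learned MinHash counterpart or UmapHash from the dissertation, and all function names and parameters are hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian hyperplane per output bit (assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_vector(vec, hyperplanes):
    """Map a real-valued vector to a binary code packed into a Python int."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def squared_euclidean(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def search(query, database, codes, hyperplanes, num_candidates, k):
    """Return indices of the k best matches among a Hamming-distance shortlist."""
    q_code = hash_vector(query, hyperplanes)
    # Stage 1: cheap shortlist using only the binary codes.
    shortlist = sorted(range(len(database)),
                       key=lambda i: hamming(q_code, codes[i]))[:num_candidates]
    # Stage 2: re-rank the shortlist with the raw feature vectors.
    shortlist.sort(key=lambda i: squared_euclidean(query, database[i]))
    return shortlist[:k]


# Toy data: four 3-dimensional "feature vectors".
database = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
planes = make_hyperplanes(num_bits=16, dim=3)
codes = [hash_vector(v, planes) for v in database]
top = search([1.0, 0.0, 0.0], database, codes, planes, num_candidates=3, k=2)
print(top)
```

In a realistic setting `num_candidates` would be far smaller than the database size, so the expensive exact distance is computed only for the shortlist.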

    Large scale visual search

    With the ever-growing amount of image data on the web, much attention has been devoted to large-scale image search. It is one of the most challenging problems in computer vision for several reasons. First, it must address the various appearance transformations, such as changes in perspective, rotation and scale, that exist in the huge amount of image data. Second, it needs to minimize memory requirements and computational cost when generating image representations. Finally, it needs to construct an efficient index space and a suitable similarity measure to reduce the response time for users. This thesis aims to provide robust image representations that are less sensitive to the above-mentioned appearance transformations and are suitable for large-scale image retrieval. Beyond its contributions to large-scale image retrieval, the thesis also presents additional challenges and future research directions that build on those contributions. China Scholarship Council (CSC); Computer Systems, Imagery and Medi

    Streaming Facility Location in High Dimension via New Geometric Hashing

    In Euclidean Uniform Facility Location, the input is a set of clients in $\mathbb{R}^d$ and the goal is to place facilities to serve them, so as to minimize the total cost of opening facilities plus connecting the clients. We study the classical setting of dynamic geometric streams, where the clients are presented as a sequence of insertions and deletions of points in the grid $\{1,\ldots,\Delta\}^d$, and we focus on the high-dimensional regime, where the algorithm's space complexity must be polynomial (and certainly not exponential) in $d\cdot\log\Delta$. We present a new algorithmic framework, based on importance sampling from the stream, for $O(1)$-approximation of the optimal cost using only $\mathrm{poly}(d\cdot\log\Delta)$ space. This framework is easy to implement in two passes, one for sampling points and the other for estimating their contribution. Over random-order streams, we can extend this to a one-pass algorithm by using the two halves of the stream separately. Our main result, for arbitrary-order streams, computes $O(d^{1.5})$-approximation in one pass by using the new framework but combining the two passes differently. This improves upon previous algorithms that either need space exponential in $d$ or only guarantee $O(d\cdot\log^2\Delta)$-approximation, and therefore our algorithms for high-dimensional streams are the first to avoid the $O(\log\Delta)$-factor in approximation that is inherent to the widely-used quadtree decomposition. Our improvement is achieved by introducing a novel geometric hashing scheme that maps points in $\mathbb{R}^d$ into buckets of bounded diameter, with the key property that every point set of small-enough diameter is hashed into at most $\mathrm{poly}(d)$ distinct buckets.
Finally, we complement our results by showing $1.085$-approximation requires space exponential in $\mathrm{poly}(d\cdot\log\Delta)$, even for insertion-only streams. Comment: The abstract is shortened to meet the length constraint of arXi
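To make the objective in this abstract concrete, the sketch below evaluates the cost of one candidate solution: the cost of opening the chosen facilities plus the cost of connecting every client to its nearest facility. It assumes that "uniform" means a single opening cost shared by all facilities and that the connection cost is the Euclidean distance; the function and parameter names are hypothetical. It is an offline evaluation of the objective only, not the streaming algorithm or the geometric hashing scheme from the paper.

```python
import math


def facility_location_cost(clients, facilities, opening_cost):
    """Total cost = opening_cost * (number of facilities)
    + sum over clients of the distance to the nearest open facility."""
    if not facilities:
        raise ValueError("at least one facility is required")
    connection = sum(min(math.dist(c, f) for f in facilities) for c in clients)
    return opening_cost * len(facilities) + connection


clients = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0)]

# One facility at the origin: cheap to open, expensive to connect.
one = facility_location_cost(clients, [(0.0, 0.0)], opening_cost=2.0)

# A second facility near the far clients trades opening cost for connection cost.
two = facility_location_cost(clients, [(0.0, 0.0), (3.0, 4.0)], opening_cost=2.0)

print(one, two)
```

With these toy numbers the second solution is cheaper, which shows the trade-off the optimization balances: each extra facility adds a fixed opening cost but can shorten many connections.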

    Automatic 3D Facial Performance Acquisition and Animation using Monocular Videos

    Facial performance capture and animation is an essential component of many applications such as movies, video games, and virtual environments. Video-based facial performance capture is particularly appealing as it offers the lowest cost and the potential use of legacy sources and uncontrolled videos. However, it is also challenging because of complex facial movements at different scales, ambiguity caused by the loss of depth information, and a lack of discernible features on most facial regions. Unknown lighting conditions and camera parameters further complicate the problem. This dissertation explores video-based 3D facial performance capture systems that use a single video camera, overcome the aforementioned challenges, and produce accurate and robust reconstruction results. We first develop a novel automatic facial feature detection/tracking algorithm that accurately locates important facial features across the entire video sequence, which are then used for 3D pose and facial shape reconstruction. The key idea is to combine the respective powers of local detection, spatial priors for facial feature locations, Active Appearance Models (AAMs), and temporal coherence for facial feature detection. The algorithm runs in real time and is robust to large pose and expression variations and occlusions. We then present an automatic high-fidelity facial performance capture system that works on monocular videos. It uses the detected facial features along with multilinear facial models to reconstruct 3D head poses and large-scale facial deformation, and uses per-pixel shading cues to add fine-scale surface details such as emerging or disappearing wrinkles and folds. We iterate the reconstruction procedure on large-scale facial geometry and fine-scale facial details to improve the accuracy of facial reconstruction.
We further improve the accuracy and efficiency of the large-scale facial performance capture by introducing a local binary feature based 2D feature regression and a convolutional neural network based pose and expression regression, and complement it with an efficient 3D eye gaze tracker to achieve real-time 3D eye gaze animation. We have tested our systems on various monocular videos, demonstrating their accuracy and robustness under a variety of uncontrolled lighting conditions and across significant shape differences between individuals.

    The 1993 Space and Earth Science Data Compression Workshop

    The Earth Observing System Data and Information System (EOSDIS) is described in terms of its data volume, data rate, and data distribution requirements. Opportunities for data compression in EOSDIS are discussed.

    Development of Deep Learning Techniques for Image Retrieval

    Images are used in many real-world applications, ranging from personal photo repositories to medical imaging systems. Image retrieval is a process in which the images in a database are first ranked in terms of their similarity to a query image, and then a certain number of the most similar images are retrieved from the ranked list. The performance of an image retrieval algorithm is measured in terms of mean average precision. There are numerous applications of image retrieval. For example, face retrieval can help identify a person for security purposes, medical image retrieval can help doctors make more informed medical diagnoses, and commodity image retrieval can help customers find desired commodities. In recent years, image retrieval has gained popularity with the emergence of large-capacity storage devices and the availability of low-cost image acquisition equipment. On the other hand, with the size and diversity of image databases continuously growing, the task of image retrieval has become increasingly complex. Recent image retrieval techniques have focused on deep learning because of its exceptional feature extraction capability. However, deep image retrieval methods often employ very complex networks to achieve a desired performance, which limits their practicability in applications with limited storage and power capacity. The objective of this thesis is to design high-performance, low-complexity deep networks for the task of image retrieval. This objective is achieved by developing three different low-complexity strategies for generating rich sets of discriminating features. Spatial information contained in images is crucial for providing detailed information about the positioning and interrelation of various elements within an image, and thus it plays an important role in distinguishing different images.
As a result, designing a network to extract features that characterize this spatial information within an image is beneficial for the task of image retrieval. In light of the importance of spatial information, in our first strategy, we develop two deep convolutional neural networks capable of extracting features with a focus on the spatial information. In the design of the first network, multi-scale dilated convolution operations are used to extract spatial information, whereas in the design of the second network, fusion of feature maps obtained from different hierarchical levels is employed to extract spatial information. Textural, structural, and edge information is very important for distinguishing images, and therefore a network capable of extracting features characterizing this type of information could be very useful for the task of image retrieval. Hence, in our second strategy, we develop a deep convolutional neural network that is guided to extract the textural, structural, and edge information contained in an image. Since morphological operations process the texture and structure of the objects within an image based on their geometrical properties, and edges are fundamental features of an image, we use morphological operations to guide the network in extracting textural and structural information, and a novel pooling operation for extracting the edge information in an image. Most researchers in the area of image retrieval have focused on developing algorithms aimed at yielding good retrieval performance at low computational complexity by outputting a list of a certain number of images ranked in decreasing order of similarity with respect to the query image. However, other researchers have chosen to improve the results of an already existing image retrieval algorithm through re-ranking.
A re-ranking scheme for image retrieval accesses the list of the images retrieved by an image retrieval algorithm and re-ranks them so that the re-ranked list at the output of the scheme has a mean average precision value higher than that of the originally retrieved list. A re-ranking scheme is an overhead to the process of image retrieval, and therefore its complexity should be as small as possible. Most of the re-ranking schemes in the literature aim to boost the retrieval performance at the expense of a very high computational complexity. Therefore, in our third strategy, we develop a computationally efficient re-ranking scheme for image retrieval, whose performance is superior to that of the existing re-ranking schemes. Since image hashing offers the dual benefits of computational efficiency and the ability to generate versatile image representations, we adopt it in the proposed re-ranking scheme. Extensive experiments are performed on benchmark datasets to demonstrate the effectiveness of the proposed strategies in designing low-complexity deep networks for image retrieval.
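Since this abstract measures retrieval quality by mean average precision and describes re-ranking as a way to raise it, a small worked example may help. The sketch below assumes one common convention: average precision is the mean of precision@k over the ranks k at which a relevant image appears in the retrieved list. The thesis may use a different normalization, and the function names and relevance lists are made up for illustration.

```python
def average_precision(relevance):
    """Average precision of one ranked list.

    `relevance` is a list of 0/1 flags, best-ranked item first.
    Returns 0.0 when the list contains no relevant item.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(relevance_lists):
    """Mean of the per-query average precision values."""
    if not relevance_lists:
        return 0.0
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)


# Hypothetical result list for one query: relevant images sit at ranks 2 and 4.
original = [0, 1, 0, 1]
# The same four images after a re-ranking step moved the relevant ones to the top.
reranked = [1, 1, 0, 0]

print(average_precision(original))   # 0.5
print(average_precision(reranked))   # 1.0
print(mean_average_precision([original, reranked]))  # 0.75
```

The re-ranked list contains exactly the same images as the original one; only their order changed, which is why a re-ranking scheme can improve mean average precision without retrieving anything new.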