81,801 research outputs found

    Application-Driven Near-Data Processing for Similarity Search

    Full text link
    Similarity search is a key to a variety of applications including content-based search for images and video, recommendation systems, data deduplication, natural language processing, computer vision, databases, computational biology, and computer graphics. At its core, similarity search manifests as k-nearest neighbors (kNN), a computationally simple primitive consisting of highly parallel distance calculations and a global top-k sort. However, kNN is poorly supported by today's architectures because of its high memory bandwidth requirements. This paper proposes an application-driven near-data processing accelerator for similarity search: the Similarity Search Associative Memory (SSAM). By instantiating compute units close to memory, SSAM benefits from the higher memory bandwidth and density exposed by emerging memory technologies. We evaluate the SSAM design down to layout on top of the Micron hybrid memory cube (HMC), and show that SSAM can achieve up to two orders of magnitude area-normalized throughput and energy efficiency improvement over multicore CPUs; we also show SSAM is faster and more energy efficient than competing GPUs and FPGAs. Finally, we show that SSAM is also useful for other data intensive tasks like kNN index construction, and can be generalized to semantically function as a high capacity content addressable memory.Comment: 15 pages, 8 figures, 7 table

    Large-scale image analysis using docker sandboxing

    Full text link
    With the advent of specialized hardware such as Graphics Processing Units (GPUs), large scale image localization, classification and retrieval have seen increased prevalence. Designing scalable software architecture that co-evolves with such specialized hardware is a challenge in the commercial setting. In this paper, we describe one such architecture (\textit{Cortexica}) that leverages scalability of GPUs and sandboxing offered by docker containers. This allows for the flexibility of mixing different computer architectures as well as computational algorithms with the security of a trusted environment. We illustrate the utility of this framework in a commercial setting i.e., searching for multiple products in an image by combining image localisation and retrieval

    Deep and Wide Multiscale Recursive Networks for Robust Image Labeling

    Full text link
    Feedforward multilayer networks trained by supervised learning have recently demonstrated state of the art performance on image labeling problems such as boundary prediction and scene parsing. As even very low error rates can limit practical usage of such systems, methods that perform closer to human accuracy remain desirable. In this work, we propose a new type of network with the following properties that address what we hypothesize to be limiting aspects of existing methods: (1) a `wide' structure with thousands of features, (2) a large field of view, (3) recursive iterations that exploit statistical dependencies in label space, and (4) a parallelizable architecture that can be trained in a fraction of the time compared to benchmark multilayer convolutional networks. For the specific image labeling problem of boundary prediction, we also introduce a novel example weighting algorithm that improves segmentation accuracy. Experiments in the challenging domain of connectomic reconstruction of neural circuity from 3d electron microscopy data show that these "Deep And Wide Multiscale Recursive" (DAWMR) networks lead to new levels of image labeling performance. The highest performing architecture has twelve layers, interwoven supervised and unsupervised stages, and uses an input field of view of 157,464 voxels (54354^3) to make a prediction at each image location. We present an associated open source software package that enables the simple and flexible creation of DAWMR networks

    Linked Component Analysis from Matrices to High Order Tensors: Applications to Biomedical Data

    Full text link
    With the increasing availability of various sensor technologies, we now have access to large amounts of multi-block (also called multi-set, multi-relational, or multi-view) data that need to be jointly analyzed to explore their latent connections. Various component analysis methods have played an increasingly important role for the analysis of such coupled data. In this paper, we first provide a brief review of existing matrix-based (two-way) component analysis methods for the joint analysis of such data with a focus on biomedical applications. Then, we discuss their important extensions and generalization to multi-block multiway (tensor) data. We show how constrained multi-block tensor decomposition methods are able to extract similar or statistically dependent common features that are shared by all blocks, by incorporating the multiway nature of data. Special emphasis is given to the flexible common and individual feature analysis of multi-block data with the aim to simultaneously extract common and individual latent components with desired properties and types of diversity. Illustrative examples are given to demonstrate their effectiveness for biomedical data analysis.Comment: 20 pages, 11 figures, Proceedings of the IEEE, 201

    Combining Visual Analytics and Content Based Data Retrieval Technology for Efficient Data Analysis

    Full text link
    One of the most useful techniques to help visual data analysis systems is interactive filtering (brushing). However, visualization techniques often suffer from overlap of graphical items and multiple attributes complexity, making visual selection inefficient. In these situations, the benefits of data visualization are not fully observable because the graphical items do not pop up as comprehensive patterns. In this work we propose the use of content-based data retrieval technology combined with visual analytics. The idea is to use the similarity query functionalities provided by metric space systems in order to select regions of the data domain according to user-guidance and interests. After that, the data found in such regions feed multiple visualization workspaces so that the user can inspect the correspondent datasets. Our experiments showed that the methodology can break the visual analysis process into smaller problems (views) and that the views hold the expectations of the analyst according to his/her similarity query selection, improving data perception and analytical possibilities. Our contribution introduces a principle that can be used in all sorts of visualization techniques and systems, this principle can be extended with different kinds of integration visualization-metric-space, and with different metrics, expanding the possibilities of visual data analysis in aspects such as semantics and scalability.Comment: Published as Jose Rodrigues, Luciana A. S. Romani, Agma Juci Machado Traina, Caetano Traina Jr (2010), Combining Visual Analytics and Content Based Data Retrieval Technology for Efficient Data Analysis, 14th Int Conf on Inf Visualisation, 61-6

    Unsupervised Parallel Extraction based Texture for Efficient Image Representation

    Full text link
    SOM is a type of unsupervised learning where the goal is to discover some underlying structure of the data. In this paper, a new extraction method based on the main idea of Concurrent Self-Organizing Maps (CSOM), representing a winner-takes-all collection of small SOM networks is proposed. Each SOM of the system is trained individually to provide best results for one class only. The experiments confirm that the proposed features based CSOM is capable to represent image content better than extracted features based on a single big SOM and these proposed features improve the final decision of the CAD. Experiments held on Mammographic Image Analysis Society (MIAS) dataset.Comment: arXiv admin note: substantial text overlap with arXiv:1408.414

    Fast MPEG-CDVS Encoder with GPU-CPU Hybrid Computing

    Full text link
    The compact descriptors for visual search (CDVS) standard from ISO/IEC moving pictures experts group (MPEG) has succeeded in enabling the interoperability for efficient and effective image retrieval by standardizing the bitstream syntax of compact feature descriptors. However, the intensive computation of CDVS encoder unfortunately hinders its widely deployment in industry for large-scale visual search. In this paper, we revisit the merits of low complexity design of CDVS core techniques and present a very fast CDVS encoder by leveraging the massive parallel execution resources of GPU. We elegantly shift the computation-intensive and parallel-friendly modules to the state-of-the-arts GPU platforms, in which the thread block allocation and the memory access are jointly optimized to eliminate performance loss. In addition, those operations with heavy data dependence are allocated to CPU to resolve the extra but non-necessary computation burden for GPU. Furthermore, we have demonstrated the proposed fast CDVS encoder can work well with those convolution neural network approaches which has harmoniously leveraged the advantages of GPU platforms, and yielded significant performance improvements. Comprehensive experimental results over benchmarks are evaluated, which has shown that the fast CDVS encoder using GPU-CPU hybrid computing is promising for scalable visual search

    GPU-FV: Realtime Fisher Vector and Its Applications in Video Monitoring

    Full text link
    Fisher vector has been widely used in many multimedia retrieval and visual recognition applications with good performance. However, the computation complexity prevents its usage in real-time video monitoring. In this work, we proposed and implemented GPU-FV, a fast Fisher vector extraction method with the help of modern GPUs. The challenge of implementing Fisher vector on GPUs lies in the data dependency in feature extraction and expensive memory access in Fisher vector computing. To handle these challenges, we carefully designed GPU-FV in a way that utilizes the computing power of GPU as much as possible, and applied optimizations such as loop tiling to boost the performance. GPU-FV is about 12 times faster than the CPU version, and 50\% faster than a non-optimized GPU implementation. For standard video input (320*240), GPU-FV can process each frame within 34ms on a model GPU. Our experiments show that GPU-FV obtains a similar recognition accuracy as traditional FV on VOC 2007 and Caltech 256 image sets. We also applied GPU-FV for realtime video monitoring tasks and found that GPU-FV outperforms a number of previous works. Especially, when the number of training examples are small, GPU-FV outperforms the recent popular deep CNN features borrowed from ImageNet. The code can be downloaded from the following link https://bitbucket.org/mawenjing/gpu-fv.Comment: accepted by ICMR 201

    Monolingual sentence matching for text simplification

    Full text link
    This work improves monolingual sentence alignment for text simplification, specifically for text in standard and simple Wikipedia. We introduce a convolutional neural network structure to model similarity between two sentences. Due to the limitation of available parallel corpora, the model is trained in a semi-supervised way, by using the output of a knowledge-based high performance aligning system. We apply the resulting similarity score to rescore the knowledge-based output, and adapt the model by a small hand-aligned dataset. Experiments show that both rescoring and adaptation improve the performance of knowledge-based method

    Deep Feature Learning for Graphs

    Full text link
    This paper presents a general graph representation learning framework called DeepGL for learning deep node and edge representations from large (attributed) graphs. In particular, DeepGL begins by deriving a set of base features (e.g., graphlet features) and automatically learns a multi-layered hierarchical graph representation where each successive layer leverages the output from the previous layer to learn features of a higher-order. Contrary to previous work, DeepGL learns relational functions (each representing a feature) that generalize across-networks and therefore useful for graph-based transfer learning tasks. Moreover, DeepGL naturally supports attributed graphs, learns interpretable features, and is space-efficient (by learning sparse feature vectors). In addition, DeepGL is expressive, flexible with many interchangeable components, efficient with a time complexity of O(∣E∣)\mathcal{O}(|E|), and scalable for large networks via an efficient parallel implementation. Compared with the state-of-the-art method, DeepGL is (1) effective for across-network transfer learning tasks and attributed graph representation learning, (2) space-efficient requiring up to 6x less memory, (3) fast with up to 182x speedup in runtime performance, and (4) accurate with an average improvement of 20% or more on many learning tasks
    • …
    corecore