Application-Driven Near-Data Processing for Similarity Search
Similarity search is a key to a variety of applications including
content-based search for images and video, recommendation systems, data
deduplication, natural language processing, computer vision, databases,
computational biology, and computer graphics. At its core, similarity search
manifests as k-nearest neighbors (kNN), a computationally simple primitive
consisting of highly parallel distance calculations and a global top-k sort.
However, kNN is poorly supported by today's architectures because of its high
memory bandwidth requirements.
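The kNN primitive described above can be sketched in a few lines of plain Python (a hypothetical brute-force reference, not SSAM's hardware design): independent distance calculations per database point, followed by a global top-k selection.

```python
import heapq

def knn(query, points, k):
    """Brute-force k-nearest neighbors: the primitive behind similarity search."""
    # Each distance is independent of the others -- this is the highly
    # parallel part that near-data accelerators like SSAM exploit.
    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(query, p))
    # Global top-k: keep the indices of the k closest database points.
    return heapq.nsmallest(k, range(len(points)), key=lambda i: dist2(points[i]))
```

The bandwidth problem is visible even in this sketch: every query touches every database point once, so throughput is bounded by how fast points can be streamed past the compute units.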
This paper proposes an application-driven near-data processing accelerator
for similarity search: the Similarity Search Associative Memory (SSAM). By
instantiating compute units close to memory, SSAM benefits from the higher
memory bandwidth and density exposed by emerging memory technologies. We
evaluate the SSAM design down to layout on top of the Micron hybrid memory cube
(HMC), and show that SSAM can achieve up to two orders of magnitude improvement
in area-normalized throughput and energy efficiency over multicore CPUs; we
also show SSAM is faster and more energy efficient than competing GPUs and
FPGAs. Finally, we show that SSAM is also useful for other data-intensive
tasks like kNN index construction, and can be generalized to semantically
function as a high-capacity content-addressable memory.
Comment: 15 pages, 8 figures, 7 table
Large-scale image analysis using docker sandboxing
With the advent of specialized hardware such as Graphics Processing Units
(GPUs), large scale image localization, classification and retrieval have seen
increased prevalence. Designing scalable software architecture that co-evolves
with such specialized hardware is a challenge in the commercial setting. In
this paper, we describe one such architecture (Cortexica) that
leverages scalability of GPUs and sandboxing offered by docker containers. This
allows for the flexibility of mixing different computer architectures as well
as computational algorithms with the security of a trusted environment. We
illustrate the utility of this framework in a commercial setting, i.e.,
searching for multiple products in an image by combining image localisation
and retrieval.
Deep and Wide Multiscale Recursive Networks for Robust Image Labeling
Feedforward multilayer networks trained by supervised learning have recently
demonstrated state of the art performance on image labeling problems such as
boundary prediction and scene parsing. As even very low error rates can limit
practical usage of such systems, methods that perform closer to human accuracy
remain desirable. In this work, we propose a new type of network with the
following properties that address what we hypothesize to be limiting aspects of
existing methods: (1) a 'wide' structure with thousands of features, (2) a
large field of view, (3) recursive iterations that exploit statistical
dependencies in label space, and (4) a parallelizable architecture that can be
trained in a fraction of the time compared to benchmark multilayer
convolutional networks. For the specific image labeling problem of boundary
prediction, we also introduce a novel example weighting algorithm that improves
segmentation accuracy. Experiments in the challenging domain of connectomic
reconstruction of neural circuitry from 3d electron microscopy data show that
these "Deep And Wide Multiscale Recursive" (DAWMR) networks lead to new levels
of image labeling performance. The highest performing architecture has twelve
layers, interwoven supervised and unsupervised stages, and uses an input field
of view of 157,464 voxels (54 x 54 x 54) to make a prediction at each image
location. We present an associated open source software package that enables
the simple and flexible creation of DAWMR networks.
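The recursive iterations in property (3) can be illustrated with a minimal sketch (the `predict` callable here is hypothetical, standing in for a trained DAWMR stage): each pass consumes the image together with the label map produced by the previous pass.

```python
def recursive_label(predict, image, iterations=3):
    # Each iteration re-predicts labels from the image plus the previous
    # label map, exploiting statistical dependencies in label space.
    labels = None
    for _ in range(iterations):
        labels = predict(image, labels)
    return labels
```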
Linked Component Analysis from Matrices to High Order Tensors: Applications to Biomedical Data
With the increasing availability of various sensor technologies, we now have
access to large amounts of multi-block (also called multi-set,
multi-relational, or multi-view) data that need to be jointly analyzed to
explore their latent connections. Various component analysis methods have
played an increasingly important role for the analysis of such coupled data. In
this paper, we first provide a brief review of existing matrix-based (two-way)
component analysis methods for the joint analysis of such data with a focus on
biomedical applications. Then, we discuss their important extensions and
generalization to multi-block multiway (tensor) data. We show how constrained
multi-block tensor decomposition methods are able to extract similar or
statistically dependent common features that are shared by all blocks, by
incorporating the multiway nature of data. Special emphasis is given to the
flexible common and individual feature analysis of multi-block data with the
aim to simultaneously extract common and individual latent components with
desired properties and types of diversity. Illustrative examples are given to
demonstrate their effectiveness for biomedical data analysis.
Comment: 20 pages, 11 figures, Proceedings of the IEEE, 201
Combining Visual Analytics and Content Based Data Retrieval Technology for Efficient Data Analysis
One of the most useful techniques to help visual data analysis systems is
interactive filtering (brushing). However, visualization techniques often
suffer from overlap of graphical items and multiple attributes complexity,
making visual selection inefficient. In these situations, the benefits of data
visualization are not fully observable because the graphical items do not pop
up as comprehensive patterns. In this work we propose the use of content-based
data retrieval technology combined with visual analytics. The idea is to use
the similarity query functionalities provided by metric space systems in order
to select regions of the data domain according to user-guidance and interests.
After that, the data found in such regions feed multiple visualization
workspaces so that the user can inspect the correspondent datasets. Our
experiments showed that the methodology can break the visual analysis process
into smaller problems (views) and that the views hold the expectations of the
analyst according to his/her similarity query selection, improving data
perception and analytical possibilities. Our contribution introduces a
principle that can be used in all sorts of visualization techniques and
systems; this principle can be extended with different kinds of
visualization-metric-space integration, and with different metrics, expanding
the possibilities of visual data analysis in aspects such as semantics and
scalability.
Comment: Published as Jose Rodrigues, Luciana A. S. Romani, Agma Juci Machado
Traina, Caetano Traina Jr (2010), Combining Visual Analytics and Content
Based Data Retrieval Technology for Efficient Data Analysis, 14th Int Conf on
Inf Visualisation, 61-6
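The similarity-query selection described above reduces to a range query in a metric space; a minimal sketch (the metric and data types are hypothetical stand-ins for a real metric-space system):

```python
def range_query(center, radius, items, metric):
    # Select the region of the data domain within `radius` of the
    # user-chosen query object; the result feeds a visualization workspace.
    return [x for x in items if metric(center, x) <= radius]
```

Any valid metric can be plugged in, e.g. absolute difference for scalars or a feature-space distance for images.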
Unsupervised Parallel Extraction based Texture for Efficient Image Representation
The self-organizing map (SOM) is a type of unsupervised learning whose goal is
to discover underlying structure in the data. In this paper, a new feature
extraction method based on the main idea of Concurrent Self-Organizing Maps
(CSOM), representing a winner-takes-all collection of small SOM networks, is
proposed. Each SOM of the system is trained individually to provide the best
results for one class only. The experiments confirm that the proposed
CSOM-based features are capable of representing image content better than
features extracted from a single big SOM, and that these features improve the
final decision of the CAD system. Experiments were held on the Mammographic
Image Analysis Society (MIAS) dataset.
Comment: arXiv admin note: substantial text overlap with arXiv:1408.414
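The winner-takes-all decision across per-class SOMs can be sketched as follows (codebooks here are plain lists of prototype vectors, and the class labels are hypothetical; a real CSOM would first train each small SOM on its own class):

```python
def quantization_error(x, codebook):
    # Distance from x to its best-matching unit within one SOM's codebook.
    return min(sum((a - b) ** 2 for a, b in zip(x, w)) for w in codebook)

def csom_classify(x, soms):
    # soms maps class label -> trained codebook; the SOM whose
    # best-matching unit lies closest to x wins (winner-takes-all).
    return min(soms, key=lambda c: quantization_error(x, soms[c]))
```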
Fast MPEG-CDVS Encoder with GPU-CPU Hybrid Computing
The compact descriptors for visual search (CDVS) standard from ISO/IEC moving
pictures experts group (MPEG) has succeeded in enabling the interoperability
for efficient and effective image retrieval by standardizing the bitstream
syntax of compact feature descriptors. However, the intensive computation of
the CDVS encoder unfortunately hinders its wide deployment in industry for
large-scale visual search. In this paper, we revisit the merits of the low
complexity design of CDVS core techniques and present a very fast CDVS encoder
by leveraging the massive parallel execution resources of the GPU. We shift
the computation-intensive and parallel-friendly modules to state-of-the-art
GPU platforms, on which the thread block allocation and the memory access are
jointly optimized to eliminate performance loss. In addition, operations with
heavy data dependence are allocated to the CPU to relieve the GPU of extra,
unnecessary computation burden. Furthermore, we demonstrate that the proposed
fast CDVS encoder works well with convolutional neural network approaches that
also leverage the advantages of GPU platforms, yielding significant
performance improvements. Comprehensive experimental results over benchmarks
show that the fast CDVS encoder using GPU-CPU hybrid computing is promising
for scalable visual search.
GPU-FV: Realtime Fisher Vector and Its Applications in Video Monitoring
The Fisher vector has been widely used in many multimedia retrieval and visual
recognition applications with good performance. However, its computational
complexity prevents its usage in real-time video monitoring. In this work, we
propose and implement GPU-FV, a fast Fisher vector extraction method with the
help of modern GPUs. The challenge of implementing Fisher vectors on GPUs lies
in the data dependency in feature extraction and the expensive memory access
in Fisher vector computation. To handle these challenges, we carefully
designed GPU-FV in a way that utilizes the computing power of the GPU as much
as possible, and applied optimizations such as loop tiling to boost
performance. GPU-FV is about 12 times faster than the CPU version, and 50%
faster than a non-optimized GPU implementation. For standard video input
(320*240), GPU-FV can process each frame within 34 ms on a modern GPU. Our
experiments show that GPU-FV obtains a similar recognition accuracy to
traditional FV on the VOC 2007 and Caltech 256 image sets. We also applied
GPU-FV to real-time video monitoring tasks and found that GPU-FV outperforms a
number of previous works. In particular, when the number of training examples
is small, GPU-FV outperforms the recently popular deep CNN features borrowed
from ImageNet. The code can be downloaded from
https://bitbucket.org/mawenjing/gpu-fv.
Comment: accepted by ICMR 201
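Loop tiling, one of the optimizations mentioned above, restructures a loop nest into blocks so that each tile's working set stays in fast memory; a minimal CPU-side illustration in Python (the tile size and workload are hypothetical, and the result is unchanged by tiling):

```python
def tiled_pairwise_sum(A, B, tile=2):
    # Walk the (i, j) index space tile by tile instead of row by row,
    # improving locality; the computed value is identical either way.
    n, m = len(A), len(B)
    total = 0.0
    for ii in range(0, n, tile):
        for jj in range(0, m, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, m)):
                    total += A[i] * B[j]
    return total
```

On a GPU the same idea maps tiles onto thread blocks staging data through shared memory, which is where the memory-access optimization pays off.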
Monolingual sentence matching for text simplification
This work improves monolingual sentence alignment for text simplification,
specifically for text in standard and simple Wikipedia. We introduce a
convolutional neural network structure to model similarity between two
sentences. Due to the limitation of available parallel corpora, the model is
trained in a semi-supervised way, by using the output of a knowledge-based high
performance aligning system. We apply the resulting similarity score to rescore
the knowledge-based output, and adapt the model by a small hand-aligned
dataset. Experiments show that both rescoring and adaptation improve the
performance of the knowledge-based method.
Deep Feature Learning for Graphs
This paper presents a general graph representation learning framework called
DeepGL for learning deep node and edge representations from large (attributed)
graphs. In particular, DeepGL begins by deriving a set of base features (e.g.,
graphlet features) and automatically learns a multi-layered hierarchical graph
representation where each successive layer leverages the output from the
previous layer to learn higher-order features. Contrary to previous work,
DeepGL learns relational functions (each representing a feature) that
generalize across networks and are therefore useful for graph-based transfer
learning tasks. Moreover, DeepGL naturally supports attributed graphs, learns
interpretable features, and is space-efficient (by learning sparse feature
vectors). In addition, DeepGL is expressive, flexible with many
interchangeable components, efficient with a low time complexity, and scalable
for large networks via an efficient parallel implementation. Compared with the
state-of-the-art method, DeepGL is (1) effective for across-network transfer
learning tasks and attributed graph representation learning, (2)
space-efficient, requiring up to 6x less memory, (3) fast, with up to 182x
speedup in runtime performance, and (4) accurate, with an average improvement
of 20% or more on many learning tasks.
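One DeepGL-style layer can be sketched as applying a set of relational operators over each node's neighbors' previous-layer features (the adjacency-dict graph representation, the operators, and scalar per-node features are simplifying assumptions, not the paper's exact formulation):

```python
def deepgl_layer(graph, feats, ops):
    # graph: node -> list of neighbor nodes; feats: node -> scalar feature
    # from the previous layer; ops: relational operators (e.g. max, min).
    # Stacking such layers composes the operators into higher-order features.
    new_feats = {}
    for v, nbrs in graph.items():
        vals = [feats[u] for u in nbrs]
        new_feats[v] = [op(vals) for op in ops]
    return new_feats
```

Because each output is a named composition of operators over base features, the learned functions transfer to other graphs where the same base features can be computed.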