Modeling the Resource Requirements of Convolutional Neural Networks on Mobile Devices
Convolutional Neural Networks (CNNs) have revolutionized research in computer vision due to their ability to capture complex patterns, resulting in high inference accuracies. However, the increasingly complex nature of these networks means that they are particularly suited to server computers with powerful GPUs. We envision that deep learning applications will eventually be widely deployed on mobile devices, e.g., smartphones, self-driving cars, and drones. Therefore, in this paper, we aim to understand
the resource requirements (time, memory) of CNNs on mobile devices. First, by
deploying several popular CNNs on mobile CPUs and GPUs, we measure and analyze
the performance and resource usage for every layer of the CNNs. Our findings
point out the potential ways of optimizing the performance on mobile devices.
Second, we model the resource requirements of the different CNN computations.
Finally, based on the measurement, profiling, and modeling, we build and evaluate our modeling tool, Augur, which takes a CNN configuration (descriptor) as input and estimates the compute time and resource usage of the CNN, giving insights into whether and how efficiently a CNN can be run on a given mobile platform. In doing so, Augur tackles several challenges: (i) how to overcome profiling and measurement overhead; (ii) how to capture the variance across mobile platforms with different processors, memory, and cache sizes; and (iii) how to account for the variance in the number, type, and size of layers across different CNN configurations.
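To make the kind of per-layer estimate concrete, the following is a minimal sketch in Python of a FLOPs-based cost model for a single convolutional layer; the function name, descriptor fields, and the flops_per_second constant are illustrative assumptions, not Augur's actual input format or model.

    # Rough time/memory estimate for one convolutional layer from its descriptor.
    # All parameter names and the throughput constant are illustrative assumptions.
    def conv_layer_cost(h, w, c_in, c_out, k, stride=1, flops_per_second=20e9):
        out_h, out_w = h // stride, w // stride
        # Multiply-accumulates: one k*k*c_in dot product per output element and channel.
        flops = 2 * out_h * out_w * c_out * k * k * c_in
        # Memory: weights plus input and output activations, 4 bytes per float32 value.
        mem_bytes = 4 * (k * k * c_in * c_out + h * w * c_in + out_h * out_w * c_out)
        return flops / flops_per_second, mem_bytes

    # Example: a 3x3 convolution mapping 64 to 128 channels on a 56x56 feature map.
    t, m = conv_layer_cost(56, 56, 64, 128, 3)
    print(f"~{t * 1e3:.1f} ms, ~{m / 1e6:.1f} MB")

Summing such estimates over the layers listed in a CNN descriptor gives a first-order picture of whether the network fits the time and memory budget of a given mobile platform.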
Anytime Stereo Image Depth Estimation on Mobile Devices
Many applications of stereo depth estimation in robotics require the
generation of accurate disparity maps in real time under significant
computational constraints. Current state-of-the-art algorithms force a choice
between either generating accurate mappings at a slow pace, or quickly
generating inaccurate ones, and additionally these methods typically require
far too many parameters to be usable on power- or memory-constrained devices.
Motivated by these shortcomings, we propose a novel approach for disparity
prediction in the anytime setting. In contrast to prior work, our end-to-end
learned approach can trade off computation and accuracy at inference time.
Depth estimation is performed in stages, during which the model can be queried
at any time to output its current best estimate. Our final model can process
1242×375 resolution images within a range of 10-35 FPS on an NVIDIA Jetson TX2 module with only marginal increases in error, using two orders of magnitude fewer parameters than the most competitive baseline. The source code is available at https://github.com/mileyan/AnyNet.
Comment: Accepted by ICRA 2019
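The anytime behaviour described above can be sketched as a staged inference loop; the stage callables and time budget below are hypothetical placeholders, not the actual AnyNet implementation.

    import time

    # Minimal anytime-inference sketch: each stage refines the previous disparity
    # estimate, and the loop can stop (or be queried) whenever the budget expires.
    def anytime_disparity(stages, left_img, right_img, budget_s):
        start = time.monotonic()
        estimate = None
        for stage in stages:
            estimate = stage(left_img, right_img, estimate)  # coarse-to-fine refinement
            if time.monotonic() - start >= budget_s:
                break  # out of time: return the current best estimate
        return estimate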
Highly accurate numerical computation of implicitly defined volumes using the Laplace-Beltrami operator
This paper introduces a novel method for the efficient and accurate
computation of the volume of a domain whose boundary is given by an orientable
hypersurface which is implicitly given as the iso-contour of a sufficiently
smooth level-set function. After spatial discretization, local approximation of
the hypersurface and application of the Gaussian divergence theorem, the volume
integrals are transformed to surface integrals. Application of the surface
divergence theorem allows for a further reduction to line integrals which are
advantageous for numerical quadrature. We discuss the theoretical foundations
and provide details of the numerical algorithm. Finally, we present numerical
results for convex and non-convex hypersurfaces embedded in cuboidal domains, showing both high accuracy and third- to fourth-order convergence in space.
Comment: 25 pages, 17 figures, 3 tables
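The first reduction step mentioned above can be written as a worked identity; this is the standard Gaussian divergence theorem applied with the vector field F(x) = x/d (for which ∇·F = 1), not a reproduction of the paper's specific discretization:

\[
|\Omega| = \int_{\Omega} 1 \, dV = \int_{\Omega} \nabla \cdot \frac{\mathbf{x}}{d} \, dV = \frac{1}{d} \oint_{\partial\Omega} \mathbf{x} \cdot \mathbf{n} \, dS ,
\]

where d is the spatial dimension and n is the outward unit normal of the implicitly defined boundary. The paper's subsequent application of the surface divergence theorem reduces each surface contribution further to line integrals, which are convenient for numerical quadrature.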
Large-scale Ferrofluid Simulations on Graphics Processing Units
We present an approach to molecular-dynamics simulations of ferrofluids on
graphics processing units (GPUs). Our numerical scheme is based on a
GPU-oriented modification of the Barnes-Hut (BH) algorithm designed to increase
the parallelism of computations. For an ensemble of one million ferromagnetic particles, the proposed algorithm on a Tesla M2050 GPU demonstrated a computational-time speed-up of four orders of magnitude over the sequential All-Pairs (AP) algorithm on a single-core CPU, and of two orders of magnitude over the optimized AP algorithm on the GPU. The accuracy of the scheme is corroborated by comparing the results of numerical simulations with theoretical predictions.
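The gap between the two reference algorithms reflects their asymptotic cost: the all-pairs evaluation of long-range dipolar interactions scales as O(N^2), whereas the Barnes-Hut tree walk is O(N log N) because distant groups of particles are replaced by a single aggregate once they pass an opening-angle test. A generic sketch of that test follows; the threshold and node fields are illustrative assumptions, not the GPU-specific modification proposed in the paper.

    import math

    # Generic Barnes-Hut acceptance criterion: if a tree node subtends a small
    # enough angle as seen from the particle, approximate its contents by a
    # single aggregate (e.g., total dipole moment); otherwise descend further.
    def use_aggregate(node_size, node_center, particle_pos, theta=0.5):
        dist = math.dist(node_center, particle_pos)
        return node_size / dist < theta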
Fast Parallel Randomized Algorithm for Nonnegative Matrix Factorization with KL Divergence for Large Sparse Datasets
Nonnegative Matrix Factorization (NMF) with Kullback-Leibler Divergence
(NMF-KL) is one of the most significant NMF problems and equivalent to
Probabilistic Latent Semantic Indexing (PLSI), which has been successfully
applied in many applications. For sparse count data, a Poisson distribution and
KL divergence provide sparse models and sparse representation, which describe
the random variation better than a normal distribution and Frobenius norm.
Specifically, sparse models provide a more concise understanding of how attributes appear across latent components, while sparse representation offers a concise interpretation of how latent components contribute to individual instances. However, minimizing NMF with KL divergence is much more difficult
than minimizing NMF with Frobenius norm; and sparse models, sparse
representation and fast algorithms for large sparse datasets are still
challenges for NMF with KL divergence. In this paper, we propose a fast parallel randomized coordinate descent algorithm with fast convergence on large sparse datasets that achieves sparse models and sparse representation. In our experiments, the proposed algorithm outperforms existing approaches to this problem.
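For reference, the objective being minimized is the generalized Kullback-Leibler divergence between the data matrix and its factorization; this is the standard NMF-KL formulation rather than a detail specific to the paper:

\[
\min_{W \ge 0,\; H \ge 0} \; D_{\mathrm{KL}}(V \,\|\, WH) = \sum_{i,j} \left( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right),
\]

where V is the m-by-n nonnegative count matrix and W, H are m-by-r and r-by-n nonnegative factors. Minimizing this objective is equivalent to maximum-likelihood estimation under a Poisson model for the counts, which is the connection to PLSI noted above.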