Geometry-Oblivious FMM for Compressing Dense SPD Matrices
We present GOFMM (geometry-oblivious FMM), a novel method that creates a
hierarchical low-rank approximation, "compression," of an arbitrary dense
symmetric positive definite (SPD) matrix. For many applications, GOFMM enables
an approximate matrix-vector multiplication in $N \log N$ or even $N$ time,
where $N$ is the matrix size. Compression requires $N \log N$ storage and work.
In general, our scheme belongs to the family of hierarchical matrix
approximation methods. In particular, it generalizes the fast multipole method
(FMM) to a purely algebraic setting by only requiring the ability to sample
matrix entries. Neither geometric information (i.e., point coordinates) nor
knowledge of how the matrix entries have been generated is required, thus the
term "geometry-oblivious." Also, we introduce a shared-memory parallel scheme
for hierarchical matrix computations that reduces synchronization barriers. We
present results on the Intel Knights Landing and Haswell architectures, and on
the NVIDIA Pascal architecture for a variety of matrices.
Comment: 13 pages, accepted by SC'1
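The idea of compressing an SPD matrix using only sampled entries can be illustrated with a much simpler technique than GOFMM. The sketch below uses a pivoted Cholesky factorization, which is a global (not hierarchical) low-rank approximation and is not the method of the paper. The names `entry`, `pivoted_cholesky` and `low_rank_matvec` are hypothetical and chosen only for this illustration. With r factors, the approximate matrix-vector product costs on the order of N·r operations instead of N².

```python
import math

def pivoted_cholesky(entry, n, max_rank, tol=1e-10):
    """Return vectors f_1..f_r (each of length n) with K ~= sum_k f_k f_k^T.

    `entry(i, j)` is the only access to the SPD matrix K (assumption:
    a user-supplied callback). No point coordinates are used.
    """
    diag = [entry(i, i) for i in range(n)]       # residual diagonal
    factors = []
    for _ in range(max_rank):
        p = max(range(n), key=diag.__getitem__)  # largest residual pivot
        if diag[p] <= tol:
            break                                # residual is negligible
        col = [entry(i, p) for i in range(n)]    # sample one column of K
        for f in factors:                        # subtract earlier terms
            fp = f[p]
            for i in range(n):
                col[i] -= f[i] * fp
        scale = math.sqrt(diag[p])
        new = [c / scale for c in col]
        for i in range(n):
            diag[i] -= new[i] * new[i]
        factors.append(new)
    return factors

def low_rank_matvec(factors, x):
    """Approximate K @ x as sum_k f_k (f_k . x), costing O(n * r)."""
    y = [0.0] * len(x)
    for f in factors:
        dot = sum(fi * xi for fi, xi in zip(f, x))
        for i, fi in enumerate(f):
            y[i] += fi * dot
    return y
```

For a smooth kernel matrix the loop typically stops well before `max_rank` is reached, so far fewer than n columns of the matrix are ever sampled.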
Accelerating Deep Neural Networks on Low Power Heterogeneous Architectures
Deep learning applications are able to recognise images and
speech with great accuracy, and their use is now everywhere in our daily
lives. However, developing deep learning architectures such as deep neural networks for embedded systems is a challenging task because of their
demanding computational and power requirements. Hence, sophisticated algorithms and methods that exploit the hardware of the
embedded systems need to be investigated. This paper is our first step
towards examining methods and optimisations for deep neural networks
that can leverage the hardware architecture of low power embedded devices. In particular, in this work we accelerate the inference time of the
VGG-16 neural network on the ODROID-XU4 board. More specifically,
a serial version of VGG-16 is parallelised for both the CPU and GPU
present on the board using OpenMP and OpenCL. We also investigate
several optimisation techniques that exploit the specific hardware architecture of the ODROID board and can accelerate the inference further.
One of these optimisations uses the CLBlast library specifically tuned
for the ARM Mali-T628 GPU present on the board. Overall, we improve
the inference time of the initial serial version of the code by 2.8X using
OpenMP, and by 9.4X using the most optimised version of OpenCL.
Grocery Shopping Assistant Using OpenCV
In this paper we present an Android mobile application that lets users keep track of the food products and grocery items bought on each shopping trip, along with their nutrient information. Users can obtain the nutrient information of a product just by taking a photo. Product matching is performed using SURF feature detection followed by FLANN feature matching. The nutrition facts table is extracted from its image using erosion, dilation and contour detection. Groceries are classified by object categorization using Bag of Words (BOW) and an SVM. The application comprises three main subsystems: client (Android), server (Node.js) and image processing (OpenCV).
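The erosion and dilation steps mentioned above can be sketched on a small binary image without OpenCV. This is a minimal illustration under stated assumptions, not the application's actual pipeline: the function names `erode`, `dilate` and `opening` are hypothetical, the structuring element is assumed to be a square, and pixels outside the image are treated as background. Erosion followed by dilation (an opening) removes isolated speckle while keeping larger shapes, which is one common way to clean an image before contour detection.

```python
def _neighbourhood(img, r, c, k):
    """Yield pixel values in the (2k+1)x(2k+1) window around (r, c).

    Positions outside the image are treated as background (0).
    """
    h, w = len(img), len(img[0])
    for rr in range(r - k, r + k + 1):
        for cc in range(c - k, c + k + 1):
            yield img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0

def erode(img, k=1):
    """Keep a pixel only if every pixel in its window is foreground."""
    return [[int(all(_neighbourhood(img, r, c, k)))
             for c in range(len(img[0]))] for r in range(len(img))]

def dilate(img, k=1):
    """Set a pixel if any pixel in its window is foreground."""
    return [[int(any(_neighbourhood(img, r, c, k)))
             for c in range(len(img[0]))] for r in range(len(img))]

def opening(img, k=1):
    """Erosion followed by dilation: removes specks smaller than the window."""
    return dilate(erode(img, k), k)
```

Applied to a 7x7 image containing a 3x3 block and a single stray pixel, `opening` returns the block alone.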
The AXIOM software layers
The AXIOM project aims at developing a heterogeneous computing board (SMP-FPGA). The software layers developed in the AXIOM project are explained. OmpSs provides an easy way to execute heterogeneous codes on multiple cores. People and objects will soon share the same digital network for information exchange, in a world known as the age of cyber-physical systems. The general expectation is that people and systems will interact in real time. This puts pressure on system design to support increasing demands on computational power while keeping a low power envelope. Additionally, modular scaling and easy programmability are important to ensure that these systems become widespread. This whole set of expectations imposes scientific and technological challenges that need to be properly addressed. The AXIOM project (Agile, eXtensible, fast I/O Module) will research new hardware/software architectures for cyber-physical systems to meet such expectations. The technical approach aims at solving fundamental problems to enable easy programmability of heterogeneous multi-core multi-board systems. AXIOM proposes the use of the task-based OmpSs programming model, leveraging low-level communication interfaces provided by the hardware. Modular scalability will be possible thanks to a fast interconnect embedded into each module. To this end, an innovative ARM and FPGA-based board will be designed, with enhanced capabilities for interfacing with the physical world. Its effectiveness will be demonstrated with key scenarios such as Smart Video-Surveillance and Smart Living/Home (domotics).
Peer Reviewed. Postprint (author's final draft