ViPS: Visual Processing System for Medical Imaging
Tassadaq Hussain1, Oscar Palomar2, Adrian Cristal2, Eduard Ayguadé2
1 Riphah International University Islamabad, Pakistan 2 Computer Sciences, Barcelona Supercomputing Center, Barcelona, Spain
Email: {tassaduq.hussain@riphah.edu.pk}
Abstract—Imaging has become an indispensable tool in modern
medicine. Various powerful and expensive platforms for studying
medical imaging applications have appeared in recent years. In
this article, we design and propose a Visual Processing System
(ViPS) that processes medical imaging applications efficiently.
ViPS provides a user-friendly programming environment and a
high-performance architecture to perform image analysis, feature
extraction, and object recognition for complex real-time images
or videos. The data structure of an image or video is described
in the program memory using pattern descriptors; ViPS uses a
specialized 3D memory structure to handle complex images or
videos and processes them on microprocessors or application-
specific hardware accelerators. The proposed system is efficient
in terms of cost, performance, and power. A ViPS-based system is
implemented and tested on a Xilinx Virtex-7 FPGA VC707
Evaluation Kit. The performance of ViPS is compared with
graphics systems based on an Intel i7 multi-core processor and
on a GPU Jetson TK1 Embedded Development Kit with 192 CUDA
cores. Compared with the Intel and GPU-based systems, the
results show that ViPS performs real-time video reconstruction
at 2x and 1.45x higher frame rates, achieves up to 14.6x and
4.8x speedup while executing different image processing
applications, and achieves up to 20.3% and 12.6% speedup for
video processing algorithms, respectively.
I. INTRODUCTION
Graphics software is in growing demand across a wide spectrum
ranging from medical science to gaming technology because it
can generate realistic images and enable graphics effects for
user interfaces that give different viewpoints and visual cues.
Different software engines [1] [2] [3] have been introduced
that provide an efficient means of reuse between a set of
related products. These software solutions provide flexibility
and re-programmability, but graphics performance is limited by
the computation power of the graphics devices.
Graphics systems are now being used in medicine to diagnose
and manage the physical condition of patients. As the biomedical
industry tries to lower patient cost and achieve earlier
disease prediction, medical imaging equipment takes on an
increasingly critical role in health care. To meet these
goals, the biomedical industry is pushing towards
high-performance computing designs. As the performance of
these devices grows, application-specific and high-performance
hardware is required to run complex applications.
A number of High Performance Computing (HPC) graphics
engines, e.g., ATI [4] and nVidia [5], are now available on the
market. A major drawback of these architectures is the lack
of programming models for medical imaging applications.
They draw on their knowledge of the personal computer market
and provide a generic programming model for HPC applications,
which does not fulfill the demands of medical scientists solving
medical imaging problems. Therefore, the medical imaging
industry needs an architecture that not only gives high
performance but also provides a programming model that lets
medical scientists write their applications without going into
the details of hardware design.
The research leading to these results has received funding from the Riphah
International University, Islamabad, Pakistan.
In this work, we develop a low-power, low-cost, easy-to-use,
and high-performance graphics architecture called the Visual
Processing System (ViPS). ViPS provides a high-performance
FPGA-based design that takes complex image/video data from
medical imaging interfaces or from memory, manages it in an
on-chip Specialized Memory, and processes it using specialized
hardware accelerators or a multi-core system. The ViPS
programming model aims to remove the programming effort of
manually arranging data transfer requests, memory management,
and input/output peripheral management while meeting the
performance requirements of imaging applications. The approach
reduces application processing time and offers a promising
interconnection approach for multiple imaging peripherals, with
the potential to exploit parallelism while coping with
memory/network latencies and balancing the workload. The ViPS
bus scheduler provides low-cost and simple control that arranges
requests from multiple imaging peripherals and communicates with
the integrated processing units. We integrate dedicated hardware
accelerators in the design because they have a small footprint
and low power consumption and give high-performance computation.
ViPS supports multiple peripherals (camera, display) and a
processor core without the support of master cores or an
operating system (OS). The integration of ViPS with peripherals
helps the graphics system overcome wire (interconnection) and
memory read/write delays and improves the performance of
application kernels by arranging complex on-chip data transfers.
Compared with the Intel and GPU-based systems, the results show
that ViPS performs real-time video reconstruction at 2x and
1.45x higher frame rates, achieves up to 14.6x and 4.8x speedup
while executing different image processing applications, and
achieves up to 20.3% and 12.6% speedup for video processing
algorithms, respectively.
The rest of this paper is organized as follows: Section II
discusses the related work. Section III describes the ViPS
system. Section IV presents the results, and Section V
provides the conclusions.
II. RELATED WORK
Bakalash et al. [6] proposed the MediCube system for 3D
medical imaging. The system supports the reconstruction,
manipulation, analysis, and display of 3D volumetric medical
images. The system is based on the general-purpose voxel-based
Cube architecture, which employs parallel memory and parallel
processing to support real-time manipulation and display
of voxel imagery. ViPS handles 3D medical imaging
using a specialized scratchpad memory and uses reconfigurable
application-specific hardware accelerators.
Specialized DSP-based systems like the Bluetechnix [7]
Blackfin camera boards provide superior image processing
abilities at the expense of power, price, and complexity. Lee
et al. [8] presented an advanced video camera system based on
a SONY digital signal processor (DSP) with robust automatic
focus (AF), automatic exposure (AE), and automatic white-
balance (AWB) control. ViPS offers a low-cost and low-power
architecture with support for image processing.
Jinghong et al. [9] proposed an image processing system
structure based on DSP and FPGA. The system uses the DSP as an
advanced image processing unit and the FPGA as a logic unit for
image sampling and display. The developed system can capture
images, display images, and perform image processing operations
that include geometry transform, orthographic transform,
pixel-based operations, image compression, and color space
conversion. ViPS uses application-specific hardware accelerators
for high-performance applications and to take high-speed data
from sensors. A 32-bit RISC core is integrated in the design for
programmability, moderate performance, and low-cost systems.
Hussain et al. proposed the Programmable Graphics Controller
[10] [11] for low-cost and low-power graphics systems.
That system processes two-dimensional images. ViPS uses a
specialized memory that lets the architecture handle
three-dimensional images for medical applications.
The ViPS architecture also provides an easy-to-use programming
environment for the applications.
III. VIPS GRAPHICS SYSTEM SPECIFICATION
Architectural investigation for a visual processing system
ranges from high-level system architecture to analog and
circuit-level design. The ViPS architecture covers the reuse of
processing elements, data parallelism, and the network
architecture.
In this section, we describe the specification of the ViPS
system and design its architecture. The section is divided
into five subsections: Overview of ViPS, Memory Unit,
Network Unit, Processing Unit, and Programming Model.
A. Overview of ViPS
The ViPS architecture is pipelined from the sensor chip over
the wire to the processing chip, covering data flow and onboard
data storage. The internal architecture of ViPS is shown in
Figure 1, which displays the interconnection of the processing
units and memory. The system uses a combined hardware/software
solution that includes hardware accelerators and a RISC
processor core. The camera and display are controlled by
custom Application Specific Hardware Accelerators (ASHA).
The Specialized Memory holds the complex image data for
image/video processing and efficiently accesses, reuses, and
feeds data to the Processing Unit. The Main Memory is integrated
to store high-resolution/high-density images. The Program
Memory holds the application program description and
data transfer information. Depending upon the data transfer,
the Memory Manager takes single or multiple instructions
from the Program Memory and schedules the data movement.
The ViPS Scheduler handles concurrent bus requests from
different analog peripherals (e.g., camera and display),
rearranges multiple data access requests, and arbitrates data
transfers without creating on-chip bus contention.
Fig. 1. ViPS: Internal Structure
B. Memory Unit
The ViPS memory is organized into three sections: the
Program Memory, the Specialized Memory, and the Main
Memory.
1) Program Memory: The Program Memory holds descriptors
[12], [13] that define the data movement between the
processing unit and the memory unit. The descriptors allow
the programmer to describe the shape and size of images and
their location in memory. A single descriptor is represented
by five parameters: command, source address, destination
address, stream, and stride. The command specifies the operation
to be performed. The address parameters specify the source
and destination locations. Stream defines the number of pixels
to be transferred. Stride indicates the distance between two
consecutive memory addresses of a stream. C/C++ function
calls are provided to define a complex image structure in
software.
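To make the five descriptor parameters concrete, the following C sketch models one descriptor and expands it into the source addresses it touches. It is only an illustration under stated assumptions: the type names, the command codes, and the helper functions are hypothetical and are not the ViPS driver interface, and addresses are counted in pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command codes; the paper does not list the real encoding. */
typedef enum { VIPS_CMD_READ = 0, VIPS_CMD_WRITE = 1 } vips_command_t;

/* One pattern descriptor with the five parameters named in the text. */
typedef struct {
    vips_command_t command; /* operation to perform */
    uint32_t src_addr;      /* source address, in pixels */
    uint32_t dst_addr;      /* destination address, in pixels */
    uint32_t stream;        /* number of pixels to transfer */
    uint32_t stride;        /* distance between consecutive source addresses */
} vips_descriptor_t;

/* Write the source address of each pixel of the stream into out[].
 * Returns the number of addresses produced (at most max_out). */
size_t vips_expand_addresses(const vips_descriptor_t *d,
                             uint32_t *out, size_t max_out)
{
    assert(d != NULL && out != NULL);
    size_t n = d->stream < max_out ? d->stream : max_out;
    for (size_t i = 0; i < n; ++i) {
        out[i] = d->src_addr + (uint32_t)i * d->stride;
    }
    return n;
}

/* Describe one column of a row-major image: `height` pixels, one per row,
 * so the stride equals the image width. */
vips_descriptor_t vips_column_descriptor(uint32_t base, uint32_t width,
                                         uint32_t height, uint32_t column,
                                         uint32_t dst)
{
    vips_descriptor_t d = { VIPS_CMD_READ, base + column, dst, height, width };
    return d;
}
```

For instance, a column of a 640-pixel-wide image becomes a single descriptor whose stride is 640, instead of one address computation per pixel in software.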
2) Specialized Memory: The ViPS Specialized Memory [14]
(SM) is directly connected to the Processing Unit
and provides single-cycle data access. Like a cache, the SM
temporarily holds data to speed up later accesses. Unlike
a cache, data is deliberately placed in the SM at a known
location rather than automatically cached according to a
fixed hardware policy. The on-chip encapsulation of the SM with
the Processing Unit allows applications to access data without
the additional delay of on-chip data management. Depending
upon the available block RAMs, the SM can be organized into
multiple banks. Each bank has two ports (PortA & PortB),
which allow the Processing Unit to perform reads and writes
in parallel. To better exploit parallelism, the banks of the SM
are organized physically into a multi-dimensional (1D/2D/3D)
architecture so that the kernel access pattern can be mapped
onto the SM.
Fig. 2. (a) 3D Medical Image of Human Foot (b) ViPS Specialized Memory Architecture (c) 3D Image Placed in Specialized Memory
For example, a generic 3D image structure is shown in
Figure 2(a). The SM shown in Figure 2(b) uses multiple banks
to accommodate a 3D image. An example of a 3D image placed in
the SM is shown in Figure 2(c). Each bank handles a single image,
and its read/write operations are independent of the other banks
and can be performed in parallel.
In our current evaluation on the Xilinx Virtex-7, the SM has 8
banks, and each bank holds a 3 KPixel image. Each bank uses
multiple BRAMs, is controlled by a separate BRAM
controller, and has a different base address. One- or
two-dimensional data sets are placed in a single bank and can
use one or more BRAMs. ViPS accesses and places data in
tiles if the data set is larger than the SM structure.
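The bank organization and tiling described above can be illustrated with a small C sketch. It assumes a layout the paper does not spell out: each image slice of a 3D tile is stored in its own bank in row-major order, and "3 KPixel" is read as 3 * 1024 pixels. The names `sm_map` and `sm_tile_count` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SM_BANKS       8u    /* banks in the evaluated configuration */
#define SM_BANK_PIXELS 3072u /* assumption: "3 KPixel" read as 3 * 1024 */

typedef struct {
    uint32_t bank;   /* bank that holds the pixel */
    uint32_t offset; /* pixel offset inside that bank */
} sm_location_t;

/* Map voxel (x, y, z) of a tile to a bank and an offset in that bank.
 * Assumed layout: slice z lives in bank z, pixels are row-major. */
sm_location_t sm_map(uint32_t x, uint32_t y, uint32_t z,
                     uint32_t tile_width, uint32_t tile_height)
{
    assert(x < tile_width && y < tile_height && z < SM_BANKS);
    assert(tile_width * tile_height <= SM_BANK_PIXELS);
    sm_location_t loc = { z, y * tile_width + x };
    return loc;
}

/* Ceiling division helper. */
static uint32_t ceil_div(uint32_t a, uint32_t b)
{
    return (a + b - 1u) / b;
}

/* Number of tiles needed to pass a width x height x depth volume through
 * the SM when one tile is tile_width x tile_height x SM_BANKS. */
uint32_t sm_tile_count(uint32_t width, uint32_t height, uint32_t depth,
                       uint32_t tile_width, uint32_t tile_height)
{
    assert(tile_width > 0u && tile_height > 0u);
    assert(tile_width * tile_height <= SM_BANK_PIXELS);
    return ceil_div(width, tile_width) * ceil_div(height, tile_height)
         * ceil_div(depth, SM_BANKS);
}
```

Under these assumptions a 64 x 48 tile exactly fills one bank, and because the slices sit in different banks, the same (x, y) position can be read from several slices in parallel.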
3) Main Memory: The Main Memory is the slowest type of memory
in the ViPS architecture and is accessible by the whole
system. It provides interfaces to SDRAM, SD/SDHC memories,
etc. for reading and writing data.
C. Network Unit
The Network Unit transfers data from external graphics
components, such as camera and display sensors, to the processing
core. The data width of the Network Unit is a significant factor.
ViPS supports two types of interfaces: the
Parallel Front-End Interface (PFEI) and the Serial Front-End
Interface (SFEI). RGB Raw Data and ITU-R 656 (YUV)
are PFEIs; they are available in nearly all graphics sensors and
are used for low-cost applications without an integrated Image
Signal Processor (ISP). The parallel interface has a frame rate
limitation at higher resolutions and has electromagnetic
interference issues at high pixel clocks. Most SFEIs are based
upon LVDS (Low Voltage Differential Signaling) or subLVDS.
LVDS uses low-voltage differential signals and drives
point-to-point and multi-drop links with low-voltage, low-power,
differential technology. The most attractive features of LVDS
are its high signaling rate, low power consumption, and
electromagnetic compatibility.
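The frame rate limitation of a parallel interface follows from simple arithmetic: the pixel clock must carry at least width x height x fps pixels per second. The C sketch below works this out under simplifying assumptions that are not from the paper: one pixel per clock cycle and no blanking intervals. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Active-pixel rate in pixels per second; blanking is ignored. */
uint64_t active_pixel_rate(uint32_t width, uint32_t height, uint32_t fps)
{
    return (uint64_t)width * height * fps;
}

/* Highest whole frame rate that fits a given pixel clock (in Hz),
 * assuming one pixel per clock cycle and no blanking. */
uint32_t max_fps_for_clock(uint64_t pixel_clock_hz,
                           uint32_t width, uint32_t height)
{
    assert(width > 0u && height > 0u);
    return (uint32_t)(pixel_clock_hz / ((uint64_t)width * height));
}
```

For example, VGA (640 x 480) at 30 fps needs 9,216,000 pixels per second, while a QSXGA frame (2560 x 2048) holds 5,242,880 pixels, so an illustrative 96 MHz pixel clock would sustain at most 18 QSXGA frames per second under these assumptions.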
The ViPS Network Unit uses the Scheduler and the Memory
Manager to manage the processing units and memory units. The
ViPS Scheduler, along with the Memory Manager, arranges requests
coming from single or multiple imaging peripherals. The
Memory Manager holds the address and control information
and plays a critical role in managing and allocating data for
the application kernel.
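The paper does not give the arbitration policy of the Scheduler, so the C sketch below shows only one plausible policy for illustration: the pending request with the highest priority wins, and requesters of equal priority take turns. All names, the number of requesters, and the policy itself are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REQUESTERS 4 /* hypothetical: e.g. two cameras, display, core */

typedef struct {
    uint8_t pending[NUM_REQUESTERS];  /* 1 if a request is waiting */
    uint8_t priority[NUM_REQUESTERS]; /* larger value wins */
    int last_granted;                 /* last granted index, -1 at reset */
} arbiter_t;

void arbiter_init(arbiter_t *a)
{
    assert(a != NULL);
    for (int i = 0; i < NUM_REQUESTERS; ++i) {
        a->pending[i] = 0;
        a->priority[i] = 0;
    }
    a->last_granted = -1;
}

/* Register a bus request from requester `id` with the given priority. */
void arbiter_request(arbiter_t *a, int id, uint8_t priority)
{
    assert(a != NULL && id >= 0 && id < NUM_REQUESTERS);
    a->pending[id] = 1;
    a->priority[id] = priority;
}

/* Grant one pending request per call; returns its index or -1 if idle.
 * The scan starts after the last granted requester, so requesters with
 * equal priority are served in turn. */
int arbiter_grant(arbiter_t *a)
{
    assert(a != NULL);
    int winner = -1;
    for (int k = 1; k <= NUM_REQUESTERS; ++k) {
        int i = (a->last_granted + k + NUM_REQUESTERS) % NUM_REQUESTERS;
        if (a->pending[i] &&
            (winner < 0 || a->priority[i] > a->priority[winner])) {
            winner = i;
        }
    }
    if (winner >= 0) {
        a->pending[winner] = 0;
        a->last_granted = winner;
    }
    return winner;
}
```

Granting exactly one requester per call is what keeps two peripherals from driving the bus at the same time in this model.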
The memory accesses are rearranged at compile-time to fit in
the minimum number of descriptors in the Program Memory.
At run-time, the Memory Manager executes these descriptor
blocks in hardware, typically overlapping address generation
and memory requests with computation in the processing
unit. This avoids overhead such as the time spent handling
request and grant signals between processor and memory, as
well as address generation delay. The Memory Manager manages
the access patterns of application kernels with complex image
data layouts. It keeps track of the data currently stored in
the SM and reuses data when possible.
D. Processing Unit
ViPS supports two types of cores: the Application Specific
Hardware Accelerator and the RISC Core.
1) Application Specific Hardware Accelerators: Application
Specific Hardware Accelerators (ASHA) are used in the
design for the imaging peripherals. A camera ASHA grabs raw
data from the image sensor, processes it, and transfers it to the
system via the Network Unit. The primary function blocks of a
camera ASHA are the Camera Interface Front-End, Image Signal
Processor, Color Processing, Scaling, Compression, and Bus
Controller. A display ASHA is used to display image data on an
LCD panel. It supports LCD color depths from 16 bpp up to 24 bpp
and user-defined resolutions from VGA to QSXGA. Programming is
done by register read/write transactions through a slave interface.
2) Processor Core: A low-power, lightweight 32-bit RISC
processor core provides programmability, flexibility, and
software data processing. ViPS provides a software API that
can be used to correct design errors, update the system to a
new graphics standard, and add more features to the graphics
system. The proposed processor core has a 32-bit data bus,
32 general-purpose registers, a custom instruction set,
non-pipelined load/store access, a hardwired control unit,
a 64 KByte address space, 16 interrupts, and memory-mapped I/O.
E. Programming Model
With the ViPS programming model, the programmer does not
need to worry about hardware-related programming and
configuration constraints. In ViPS, memory operations are
shaped into patterns and are scheduled in parallel with the
processing unit. ViPS supports complex irregular accesses,
strided 1D, 2D, and 3D accesses, and automated blocking
for image/video access operations that transfer data between
the Network Unit, Local Memory, and Main Memory. The Processing
Unit communicates with ViPS through a group of command,
control, status, and data registers and signals. Table I shows
the function calls used to program the current ViPS architecture.
IV. RESULTS AND DISCUSSION
In this section, we analyze the results of different
experiments conducted on ViPS. To evaluate the performance
of ViPS, the results are compared with a laptop based on an
Intel i7-2670QM quad-core processor (2.2 GHz, 6 MB cache) and
a GPU Jetson TK1 board with a quad-core ARM Cortex-A15 processor
and 192 CUDA cores. The architectures are connected to
CMOS and ultrasonic imaging sensors. The experiments are
classified into three subsections: Real-time Image
Reconstruction, Image Processing, and Video Processing.
A. Real-time Image Reconstruction
For complex imaging algorithms, the computational complexity
of image reconstruction has increased dramatically.
High-speed image reconstruction is therefore required, and it
is even more critical in real-time imaging applications such as
online adaptive therapy. Ultra-fast image reconstruction could
also allow the clinician to adjust reconstruction parameters
interactively and optimize the noise/spatial resolution trade-off.
A multi-camera graphics system can be used for 3D graphics
using geometric transformation and a projection plane [15]. In
this section, two THDB-D5M image sensors generate two separate,
simultaneous video streams, and an Alpha blending application
is applied to evaluate the performance of the system. Each
camera operates at VGA color resolution. The video from the
dual image sensors is combined into a single stream, processed
by the graphics core, and then displayed. The key issue of the
dual-camera system is receiving the images synchronously, in
the right format, and on the right bus. The graphics system
sends the configuration data to both image sensors and ensures
that they are properly configured and synchronized. Once both
sensors are set up and synchronized, they begin to transmit
image data. The graphics system looks for the appropriate
control characters so that it recognizes the start of a frame
and the start of a line for each sensor. ViPS does this by
looking for a control character and a sequence of sensor
commands. Alpha blending is applied to give a translucent
effect to the incoming video stream. The application blends the
color values of the pixels at the same position in the two
sensor images. This blending is done according to the alpha
value associated with the pixel. The alpha value represents the
opacity of the given pixel. Results show that ViPS handles the
dual-camera system and supports video at up to 50 fps. The
Intel i7 and GPU-based dual-camera graphics systems support
video at up to 25 and 35 fps, respectively. The ViPS on-chip
scheduler updates multi-camera information in a status register.
This allows both cameras to synchronize without using extra
clocks.
TABLE I
C/C++ DEVICE DRIVERS TO PROGRAM/OPERATE VIPS
API Function                      | Description
READ_IMAGE, WRITE_IMAGE,          | Image Data Access
ViPS_MEMCPY(SPECIALIZED_MEMORY,   | SPECIALIZED_MEMORY indicates Local Memory buffer
  DATA_SET, Priority)             | DATA_SET indicates Main Memory data set
3D_FILTER(SPECIALIZED_MEMORY,     | Specialized Data Transfer
  DATA_SET, Kernel, Priority)     | SPECIALIZED_MEMORY indicates SM structure
                                  | DATA_SET indicates Main Memory data set
                                  | Kernel defines the type of filter
Fig. 3. Application Execution Time: ViPS ASHA, ViPS SSP, Intel i7 and GPU
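The alpha blending step used in the dual-camera experiment can be sketched in C as follows. This is an illustrative software version, not the ViPS hardware implementation: it assumes 8-bit interleaved RGB frames of equal size and one 8-bit alpha value per pixel, where 255 shows only the first camera and 0 shows only the second. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blend one 8-bit channel: alpha = 255 keeps a, alpha = 0 keeps b. */
uint8_t blend_channel(uint8_t a, uint8_t b, uint8_t alpha)
{
    uint32_t mixed = (uint32_t)a * alpha + (uint32_t)b * (255u - alpha);
    return (uint8_t)((mixed + 127u) / 255u); /* rounded division by 255 */
}

/* Blend two interleaved RGB frames pixel by pixel.
 * alpha[] holds one opacity value per pixel (not per channel). */
void alpha_blend_rgb(const uint8_t *cam0, const uint8_t *cam1,
                     const uint8_t *alpha, uint8_t *out, size_t pixel_count)
{
    assert(cam0 != NULL && cam1 != NULL && alpha != NULL && out != NULL);
    for (size_t p = 0; p < pixel_count; ++p) {
        for (size_t c = 0; c < 3; ++c) {
            size_t i = p * 3 + c;
            out[i] = blend_channel(cam0[i], cam1[i], alpha[p]);
        }
    }
}
```

Each output pixel depends only on the two input pixels at the same position, which is why the operation suits a streaming pipeline fed by two synchronized sensors.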
B. Image Processing
In this section, we execute the Thresholding (Thresh), Finite
Impulse Response (FIR), Fast Fourier Transform (FFT), and
Laplacian Filter (Laplacian) application kernels on ViPS SSP
and ViPS ASHA based systems. The ViPS SSP and ViPS ASHA
systems execute application kernels on a soft scalar processor
and on application-specific hardware accelerators, respectively.
Figure 3 shows the execution time of the applications. The
X-axis presents the application names, and the Y-axis displays
execution time in seconds (lower is better). ViPS results are
compared with the Intel i7 and GPU-based systems. The systems
read one still image of QSXGA resolution from the camera sensor
and write it to the Main Memory. The processor core reads
the image, performs the computation, and then writes it back to
the Main Memory. The results show that while executing
Thresh, ViPS ASHA achieves 14.6x and 4.4x speedups
compared to the Intel i7 and GPU systems, respectively. This
application kernel requires a single pixel element and very few
operations. The FIR application has a streaming data access
pattern and performs multiplication and addition; ViPS ASHA
achieves 12.3x and 4.7x speedups, respectively. The FFT
application kernel reads a 1D block of data, performs complex
computation, and writes it back to the Main Memory; ViPS
ASHA achieves 7.7x and 4.8x speedups, respectively. The
Laplacian application kernel processes a 2D block of data;
ViPS ASHA achieves 7.4x and 2.6x speedups, respectively.
ViPS places access patterns in the Program Memory at program
time, and they are programmed in such a way that few operations
are required for generating addresses at run-time. The Intel i7
and GPU-based systems use multiple load/store or DMA calls
to access complex patterns. The speedups are possible because
ViPS can manage data transfers with a single descriptor.
At run-time, ViPS takes descriptors from the Program Memory
independently and manages them in the Specialized Memory,
whereas the baseline systems depend on the processor
core to manage on-chip data, data transfer instructions, and
the Main Memory data. The stand-alone operation
of ViPS removes the overhead of processor/memory system
request/grant delay.
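To illustrate why the kernels differ in their access patterns, the C sketch below gives plain software versions of two of them: thresholding, which touches one pixel per output, and a Laplacian, which needs a 2D neighbourhood. These are generic textbook formulations written for illustration; the 4-neighbour stencil, the zeroed border, and the function names are assumptions and not the ViPS accelerator design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Thresholding: each output pixel depends on a single input pixel. */
void threshold(const uint8_t *in, uint8_t *out, size_t n, uint8_t level)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = (in[i] >= level) ? 255u : 0u;
    }
}

/* 4-neighbour Laplacian over a row-major image.
 * Border pixels have no full neighbourhood and are set to 0. */
void laplacian(const uint8_t *in, int16_t *out, size_t width, size_t height)
{
    assert(in != NULL && out != NULL);
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            size_t i = y * width + x;
            if (x == 0 || y == 0 || x + 1 == width || y + 1 == height) {
                out[i] = 0;
                continue;
            }
            out[i] = (int16_t)(in[i - width] + in[i + width]
                             + in[i - 1] + in[i + 1] - 4 * in[i]);
        }
    }
}
```

The Laplacian reads pixels from three different rows for every output, so on a conventional system each output involves several strided loads, whereas the text describes ViPS capturing such a pattern in a descriptor.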
C. Video Processing
In this section, we use real-time data from different sensors
and apply video processing algorithms using a CMOS
image sensor. The systems apply Object Detection, Object
Recognition, 3D Stereo Filtering, and Ultrasonic Image
Reconstruction to real-time sensor data for 10 minutes; the
results are shown in Figure 4. The X-axis presents the
applications processed by the ViPS, Intel i7, and GPU systems.
The ViPS system processes each application using ASHAs. The
Y-axis shows the number of frames processed in 10 minutes by
each system (higher is better). The CMOS image sensor
is programmed at 640×480 resolution and 30 frames per
second (fps); therefore, the maximum number of frames is 18000.
If the application is complex, the system takes more time and
skips some of the frames. We count only those frames that are
processed by the system. While executing the Object Detection
application, the results show that ViPS achieves 20.3% and 8.4%
speedups compared to the Intel i7 and GPU systems, respectively.
This application takes 2D images directly from the image sensor.
The Object Recognition application reads a 2D block of data
and performs a recognition algorithm; ViPS achieves 7.7% and
4.8% speedups, respectively. The 3D Stereo Filtering application
processes a 3D block of data; ViPS reaches 17.4%
and 12.6% speedups, respectively. ViPS uses the Specialized
Memory to handle 3D stereo images and organizes the graphics
data access patterns in the form of descriptors, which take
few operations to generate addresses at run-time. We also
use the ultrasound sensor to create an image from sound
and apply Ultrasonic Image Reconstruction using Bayesian
image reconstruction [16]. The results show that ViPS achieves
19.4% and 10.6% speedups, respectively, while applying the
Ultrasonic Image Reconstruction application to the ultrasound
sensor data.
Fig. 4. Video Processed Frames: ViPS, Intel i7 and GPU
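The frame budget quoted in this experiment is easy to verify, and the C sketch below does so. It also shows one way a percentage speedup could be derived from processed-frame counts; the paper does not state its exact formula, so the relative-gain definition and the function names here are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Frames delivered by the sensor over the whole run. */
uint32_t frames_available(uint32_t fps, uint32_t minutes)
{
    return fps * minutes * 60u;
}

/* Relative gain in processed frames, in percent: 100 * (a - b) / b.
 * Assumed definition; the paper does not give its formula. */
double speedup_percent(uint32_t frames_a, uint32_t frames_b)
{
    assert(frames_b > 0u);
    return 100.0 * ((double)frames_a - (double)frames_b) / (double)frames_b;
}
```

At 30 fps for 10 minutes the sensor delivers 30 x 60 x 10 = 18000 frames, matching the maximum stated above. As a purely hypothetical illustration of the formula, a system that processes 12000 frames while another processes 10000 shows a 20% gain.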
V. CONCLUSION
In this paper, we have proposed a Visual Processing System
(ViPS) for medical imaging applications. The system takes
high-resolution images and supports video at a higher frame rate
without the help of a processor. ViPS provides
efficient data access from image sensors that eliminates the
on-chip/off-chip bus delays for arranging and gathering data.
In the future, we plan to provide more functions to support
image/video processing applications.
REFERENCES
[1] OGRE: Object-Oriented Graphics Rendering Engine.
http://www.ogre3d.org/.
[2] Irrlicht: An Open Source High Performance Realtime 3D Engine.
http://irrlicht.sourceforge.net/.
[3] Ali Alzaabi et al. TCCT: A GUI table comparison computer tool. In
Emerging Trends in Computing, Informatics, Systems Sciences, and
Engineering. Springer.
[4] ATI Technologies Inc. http://www.amd.com/.
[5] Visual computing technology from NVIDIA. http://www.nvidia.com/.
[6] Reuven Bakalash and Arie Kaufman. Medicube: A 3d medical imaging
architecture. Computers & Graphics, 13(2), 1989.
[7] Bluetechnix Black DSP. http://www.xbow.com.
[8] June-Sok Lee, You-Young Jung, Byung-Soo Kim, and Sung-Jea Ko.
An advanced video camera system with robust AF, AE, and AWB
control. IEEE Transactions on Consumer Electronics.
[9] Duan Jinghong, Deng Yaling, and Liang Kun. Development of
image processing system based on DSP and FPGA. In Electronic
Measurement and Instruments, 2007. ICEMI’07.
[10] Tassadaq Hussain, Oscar Palomar, Adrian Cristal, Osman Unsal, Eduard
Ayguadé, Mateo Valero and Amna Haider. Stand-alone Memory
Controller for Graphics System. In The 10th International Symposium
on Applied Reconfigurable Computing (ARC 2014). ACM, 2014.
[11] Tassadaq Hussain and Amna Haider. PGC: A Pattern-Based Graphics
Controller. Int. J. Circuits and Architecture Design, 2014.
[12] Tassadaq Hussain, Miquel Pericas, Nacho Navarro and Eduard Ayguadé.
Reconfigurable Memory Controller with Programmable Pattern Support.
HiPEAC Workshop on Reconfigurable Computing, Jan, 2011.
[13] Tassadaq Hussain, Muhammad Shafiq, Miquel Pericas, Nacho Navarro
and Eduard Ayguadé. PPMC: A Programmable Pattern based Memory
Controller. In ARC 2012.
[14] Tassadaq Hussain, Oscar Palomar, Adrian Cristal, Osman Unsal, Eduard
Ayguadé and Mateo Valero. Advanced Pattern based Memory Controller
for FPGA based HPC Applications. In International Conference on High
Performance Computing & Simulation, page 8. ACM, IEEE, 2014.
[15] Richard Hartley and Andrew Zisserman. Multiple view geometry in
computer vision, volume 2. Cambridge Univ Press, 2000.
[16] Jinyi Qi, Richard M Leahy, Simon R Cherry, Arion Chatziioannou, and
Thomas H Farquhar. High-resolution 3d bayesian image reconstruction
using the micropet small-animal scanner. Physics in medicine and
biology, 43(4), 1998.
