FPGA implementation of a simple 3D graphics pipeline by Kašík, Vladimír & Kurečka, Aleš
CONTROL ENGINEERING VOLUME: 13 | NUMBER: 1 | 2015 | MARCH
FPGA Implementation of a Simple 3D Graphics
Pipeline
Vladimir KASIK, Ales KURECKA
Department of Cybernetics and Biomedical Engineering, Faculty of Electrical Engineering and Computer
Science, VSB–Technical University of Ostrava, 17. listopadu 15, 708 33 Ostrava, Czech Republic
vladimir.kasik@vsb.cz, ales.kurecka@vsb.cz
DOI: 10.15598/aeee.v13i1.1125
Abstract. Conventional methods for computing 3D
projections are nowadays usually implemented on standard
or graphics processors. The performance of these devices
is limited especially by their architecture, which to some
extent works in a sequential manner. In this article we
describe a project which utilizes parallel computation for
simple projection of a wireframe 3D model. The algorithm
is optimized for an FPGA-based implementation. The design
of the numerical logic is described in VHDL with the use of
several basic IP cores used especially for computing
trigonometric functions. The implemented algorithms allow
smooth rotation of the model in two axes (azimuth and
elevation) and a change of the viewing angle. Tests carried
out on a Xilinx Spartan-6 FPGA development board have
resulted in real-time rendering at over 5000 fps. In the
conclusion of the article, we discuss additional
possibilities for increasing the computational output in
graphics applications via the use of HPC (High Performance
Computing).
Keywords
3D projection, FPGA, parallel processing, real
time, VGA, VHDL.
1. Introduction
Drawing 3D graphics scenes from their representations
requires the processing of large volumes of data. Special
chips are available for this purpose: GPUs, which rely on
massive parallelization. Under usual circumstances, CPUs
are not suitable for these tasks (even though instruction
sets supporting multiple simultaneous computations do
exist), since by design they process instructions serially
and hence would require much higher clock frequencies to
achieve comparable speeds. FPGAs also support a high degree
of parallelization and may be used to achieve high
computational throughput.
This project originated as a semester project with an
initial goal of drawing 3D projections of simple wireframe
models in real time on a single chip, where the intent was
to achieve a very high frame rate. Since the described
problem commonly lies beyond the capabilities of usual
microcontrollers/CPUs, the solution has led to the creation
of a hardware graphics pipeline for drawing on a screen via
the VGA interface [1].
2. Graphics Pipeline
Current GPUs comprise many cores containing unified
shaders, which allow the realization of operations
previously carried out by vertex units, pixel units, TMUs
(texture mapping units) and ROPs (render output units).
Drawing a 3D model on the screen is basically the result of
several consecutive blocks (simplified) [5]:
• Primitive processing – reading primitives, vertices and
their connections.
• Vertex shader – the vertex shader transforms the
coordinates of vertices by multiplying them with the
matrices of the scene. This is where the transformation
from 3D → 2D occurs.
• Primitive assembly – vertices are joined into primitives.
• Rasterization – primitives are rasterized into pixels.
• Pixel shader – this is applied to each pixel of the
rasterized scene and computes its color. This step also
applies textures.
© 2015 ADVANCES IN ELECTRICAL AND ELECTRONIC ENGINEERING 39
A vertex unit (whose functionality is nowadays included in
the vertex shader) will suffice for the purposes of this
work. The data source is a ROM with the vertices of the
model, which are transformed by the vertex unit to vertices
in a plane. These vertices are then joined by line segments
and drawn into the video RAM, from which the VGA adapter
subsequently generates the VGA signal for the screen.
The graphics pipeline consists of the unit computing the
projection matrix (GrxGenerateProjectionMatrix) and the
unit multiplying the projection matrix with the vertices of
the displayed model (GrxVertexProjection). Together,
GrxGenerateProjectionMatrix and GrxVertexProjection form
the vertex unit and carry out the actual 3D → 2D
projection. The obtained 2D vertices are scaled to the
required size, converted from fixed-point numbers to
integers in the monitor coordinate system and stored in the
memory cache (the 2D vertex bank). Vertices from the cache
are read by the unit drawing the wireframe model based on
their connection map (from the model ROM). The model is
drawn into the black-and-white video RAM (frame buffer),
from which display data are read by the VGA adapter and
displayed on the screen.
Display is handled by the VGA adapter with binary
modulation of the base colors. The whole Cubido3D project
is controlled by one primary and several local FSMs (Finite
State Machines). A simplified diagram of the graphics
pipeline is provided in Fig. 1.
Fig. 1: Simplified diagram of the Cubido3D graphics pipeline.
Individual blocks of the graphics pipeline consist of
separate VHDL modules or optimized IP cores which
are a part of the Xilinx ISE development kit.
2.1. Perspective Projection
Displaying a 3D object in two dimensions is a linear
transformation from the vector space $\mathbb{R}^3$ into
$\mathbb{R}^2$. A special case of this is the projection
transformation, which can be described by the projection
matrix. During projection, the dimension is reduced from
$\dim \mathbb{R}^3 = 3$ to $\dim \mathbb{R}^2 = 2$ and the
vectors obtained by the transformation can be used to
display the object on a plane (e.g. a screen). There exist
two types of projection which are used in graphics:
parallel projections (which include isometric, orthogonal,
oblique projections etc.) and perspective projections. In
this article we focus on the latter type. Linear
perspective projections always work with a representation
of the rays from the projected object to the observer's eye
(the camera) through a plane, on which the object is
projected. See Fig. 2 for an illustration.
Fig. 2: Perspective projection of an object onto a plane.
The transformation matrix realizing perspective projection
is obtained by multiplication of 3 matrices [4]:
$$A = P \cdot T \cdot R. \tag{1}$$
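• The transformation matrix R, which is obtained by a
composition of rotations about the x and z axes:
$$R_x = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x & 0 \\ 0 & \sin\theta_x & \cos\theta_x & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \tag{2}$$
$$R_z = \begin{pmatrix} \cos\theta_z & -\sin\theta_z & 0 & 0 \\ \sin\theta_z & \cos\theta_z & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \tag{3}$$
$$R = R_x \cdot R_z. \tag{4}$$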
Rotation is typically entered in the form of an azimuth and
elevation, where the following relations hold:
$$\theta_z = -\theta_{az}, \quad \ldots \tag{5}$$
Figure 3 illustrates the effects of the transforma-
tion matrix.
Fig. 3: Effects of the transformation matrix on the displayed
object.
• The translation matrix T translates the origin [0, 0, 0]
of the coordinate system of the object and has the form:
$$T = \begin{pmatrix} 1 & 0 & 0 & v_x \\ 0 & 1 & 0 & v_y \\ 0 & 0 & 1 & v_z \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{6}$$
The translation moves the origin of the coordinate system,
usually to the center of the object. This forms the center
based on which the object …
$$\ldots \begin{pmatrix} \cos\theta_{el} \cdot \sin\theta_{az} \\ -\cos\theta_{el} \cdot \cos\theta_{az} \\ \sin\theta_{el} \end{pmatrix}. \tag{7}$$
Fig. 4: Effects of the translation matrix on the displayed object.
If the coordinates of the object vertices are chosen so
that the object is centered at [0, 0, 0], the matrix
becomes an identity matrix and the translation is not
necessary.
• The perspective transformation P carries out the actual
conversion from 3D → 2D and has the form:
$$P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -\frac{1}{f} & \frac{d}{f} \end{pmatrix}, \quad f = d, \tag{8}$$
where f is the focal distance, which can be computed …
$$\frac{\ldots}{2 \cdot \tan\left(\frac{\varphi}{2}\right)}, \tag{9}$$
φ determines the degree of perspective deformation of the
object. If φ = 0, the projection degenerates to an
orthogonal projection.
Fig. 5: Effect of viewing angle on perspective deformation of
the display.
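The composition in Eq. (1) can be illustrated with a short floating-point sketch. It is not the authors' implementation: the function names are hypothetical, the rotation angles are passed in directly rather than derived from azimuth and elevation, and the distance d is left as a free parameter. The first row of P is taken as 1 0 0 0, as in the usual perspective matrix.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rot_x(t):
    """Rotation about the x axis, Eq. (2)."""
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]

def rot_z(t):
    """Rotation about the z axis, Eq. (3)."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def translation(vx, vy, vz):
    """Translation matrix, Eq. (6)."""
    return [[1, 0, 0, vx], [0, 1, 0, vy], [0, 0, 1, vz], [0, 0, 0, 1]]

def perspective(d):
    """Perspective matrix, Eq. (8), with f = d."""
    f = d
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
            [0, 0, -1.0 / f, d / f]]

def projection_matrix(theta_x, theta_z, v, d):
    """A = P * T * R with R = Rx * Rz, Eq. (1) and Eq. (4)."""
    r = matmul(rot_x(theta_x), rot_z(theta_z))
    return matmul(perspective(d), matmul(translation(*v), r))

def project(a, vertex):
    """Apply A to a 3D vertex and divide by the homogeneous coordinate."""
    coords = (vertex[0], vertex[1], vertex[2], 1.0)
    h = [sum(a[i][j] * coords[j] for j in range(4)) for i in range(4)]
    return h[0] / h[3], h[1] / h[3]
```

With no rotation, no translation and d = 2, a vertex at z = 1 is magnified by a factor of 2 relative to a vertex at z = 0, which is the perspective deformation described above.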
3. Implementation
The Cubido3D project implements a hardware graphics
pipeline on an FPGA. Projection, translation of objects and
generation of the video signal are adapted for an
architecture based on programmable logic.
Aside from the graphics pipeline, the project also includes
a VGA adapter generating the video signal for the screen, a
memory of displayed models, blocks generating the
background image and the panel image, and supporting logic
(distribution of the clock signal, control of the graphics
pipeline etc.). Pipelining and strong parallelism are used
throughout to reach the target frequency of 100 MHz.
3.1. Cubido3D
Cubido3D forms the top-level module. It synchronizes input
signals and controls the azimuth and elevation by counters
with acceleration and the viewing angle by a counter with
overflow protection. Combinations of buttons also allow the
selection of the drawn model and the generation of a global
reset.
The selected model data are, together with the azimuth,
elevation and viewing angle, transferred to GrxGraphicUnit.
Cubido3D also connects the VGA adapter and the FPS
indicator on a 7-segment display. Since the FPGA does not
have sufficient memory for double buffering, the video RAM
has a capacity of only 1 frame and must thus be redrawn
synchronously while the screen is in an inactive area. For
this reason, redrawing is always triggered on receipt of
the vertical synchronization pulse. Drawing is fast
(approximately 200 µs in the case of a cube) and thus
finishes before the screen enters the active area.
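The timing argument above can be checked with a short calculation. The sketch assumes a standard 640×480 at 60 Hz VGA mode (25.175 MHz pixel clock, 800 pixel clocks per line, 2 sync lines and 33 back-porch lines). The article does not state the resolution used, so these timing values and all constant names are assumptions.

```python
# Values taken from the text.
CLOCK_HZ = 100_000_000      # target frequency of the design
DRAW_TIME_S = 200e-6        # approximate drawing time of a cube

# Assumed VGA mode (not stated in the article): 640x480 at 60 Hz.
PIXEL_CLOCK_HZ = 25_175_000
CLOCKS_PER_LINE = 800
VSYNC_LINES = 2
BACK_PORCH_LINES = 33

# Clock cycles available for drawing one frame of the cube.
cycles_per_frame = round(CLOCK_HZ * DRAW_TIME_S)

# Frame rate if frames were drawn back to back.
max_fps = 1.0 / DRAW_TIME_S

# Time from the start of the vertical sync pulse to the first visible line.
line_time_s = CLOCKS_PER_LINE / PIXEL_CLOCK_HZ
blank_after_vsync_s = (VSYNC_LINES + BACK_PORCH_LINES) * line_time_s

print(cycles_per_frame, round(max_fps), round(blank_after_vsync_s * 1e6))
```

Under these assumptions the drawing time corresponds to about 20 000 clock cycles, which matches the frame rate of roughly 5000 fps quoted in the abstract, and it fits several times into the interval between the vertical synchronization pulse and the first visible line.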
Fig. 6: Example of an output generated by FPGA captured on
a VGA interface.
3.2. Implementation of the Graphics
Pipeline
The graphics pipeline consists of the unit for computing
the projection matrix, the vertex unit, the simple
rendering unit and a two-port video memory.
The pipeline first clears the video RAM, computes the
projection matrix from the transferred parameters (azimuth,
elevation, viewing angle) and reads the vertices of the
normalized 3D model and their interconnections from the
model memory. The vertices are then projected into 2D and
rescaled for display on the screen (numbers are converted
with a certain offset and scale from FXP to integer form).
The computed 2D vertices are stored in a small cache. The
wireframe model is then drawn into the video memory from
the computed 2D vertices.
1) Calculation of the Projection Matrix
(GrxGenerateProjectionMatrix)
GrxGenerateProjectionMatrix is a unit which computes the
perspective projection matrix from the received values of
the azimuth, elevation and viewing angle (amount of
perspective deformation). The computation of Eq. (1) to
Eq. (9) is adapted for processing on the FPGA. The center
of the projection (the target point) is fixed to the origin
of the coordinate system. Despite best efforts to make the
computation as parallel as possible, it is strongly
sequential and is carried out by a state machine with 28
states. Trigonometric functions are computed by the CORDIC
unit (CordicCore_SINCOS entity), which computes the sine
and cosine of the entered angle in parallel. The tangent of
the viewing angle is computed in 2 steps: first the sine
and cosine of the viewing angle are computed, and then they
are divided. Division is carried out in an attached serial
divider. All other computations are carried out by the
DSP48A1 unit, which is part of the architecture of the FPGA
used. DSP48A1 is specifically configured to carry out the
following computation:
$$R = A + (B - C) \cdot D. \tag{10}$$
The unit works with fixed-point (FXP) numbers in the 10Q8
format. The computation is illustrated in Fig. 7:
Fig. 7: Computation of the projection matrix by the
GrxGenerateProjectionMatrix module.
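The CORDIC principle referred to above, and the tangent obtained as the quotient of sine and cosine, can be sketched as follows. This is a generic rotation-mode CORDIC written with floating-point numbers for readability, not the Xilinx core used in the design. The iteration count and all names are assumptions, and in hardware the multiplications by powers of two become shifts on fixed-point values.

```python
import math

ITERATIONS = 16  # assumed; more iterations give more accurate results

# Elementary rotation angles atan(2^-i) and the accumulated CORDIC gain.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))

def cordic_sincos(theta):
    """Return (sin, cos) of theta in [-pi/2, pi/2] by rotation-mode CORDIC."""
    # Start with a vector pre-scaled by 1/GAIN so the result has unit length.
    x, y, z = 1.0 / GAIN, 0.0, theta
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0      # rotate towards z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x

def cordic_tan(theta):
    """Tangent in two steps: sine and cosine first, then one division."""
    s, c = cordic_sincos(theta)
    return s / c
```

Sine and cosine come out of the same iteration, which is why a single core can deliver both values in parallel and only one additional division is needed for the tangent.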
The resulting projection matrix is output on a 4×4×10Q8
bus, which thus has a width of 288 bits (16 elements of 18
bits each). The bus is connected to the computational units
via registers and bus multiplexers controlled by the state
machine. Most of the running time is taken by the serial
dividers, and hence they are started shortly after the
computation of the matrix begins and work in parallel with
the other computations.
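As an illustration of the 10Q8 arithmetic and of Eq. (10), the following sketch models the operation on integers. It assumes that 10Q8 means 10 integer bits (including the sign) and 8 fractional bits stored in a signed 18-bit word, and that the product is truncated back to 8 fractional bits. Both points, and all function names, are assumptions rather than details confirmed by the article.

```python
FRAC_BITS = 8    # fractional bits of the 10Q8 format
WORD_BITS = 18   # 10 integer bits (sign assumed included) + 8 fractional

def to_fxp(value):
    """Convert a real number to its raw 10Q8 integer representation."""
    return int(round(value * (1 << FRAC_BITS)))

def from_fxp(raw):
    """Convert a raw 10Q8 integer back to a real number."""
    return raw / (1 << FRAC_BITS)

def wrap(raw):
    """Wrap an integer to a signed 18-bit two's complement value."""
    raw &= (1 << WORD_BITS) - 1
    if raw & (1 << (WORD_BITS - 1)):
        raw -= 1 << WORD_BITS
    return raw

def dsp_op(a, b, c, d):
    """R = A + (B - C) * D on raw 10Q8 operands, Eq. (10)."""
    product = (b - c) * d                 # carries 16 fractional bits
    return wrap(a + (product >> FRAC_BITS))

# Width of the matrix bus: 16 elements of 18 bits each.
MATRIX_BUS_BITS = 4 * 4 * WORD_BITS
```

For example, `dsp_op(to_fxp(1.0), to_fxp(2.5), to_fxp(0.5), to_fxp(1.5))` yields the raw value of 1 + (2.5 − 0.5) · 1.5 = 4.0, and `MATRIX_BUS_BITS` evaluates to the 288 bits stated above.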
2) Vertex Unit (GrxVertexProjection)
The vertex unit computes the 2D vertices of the image by
multiplying the corresponding 3D vertices of the model with
the projection matrix received from
GrxGenerateProjectionMatrix. The whole principle is very
similar to the computation carried out by the
GrxGenerateProjectionMatrix unit. The computation is
controlled by a state machine (12 states) together with a
divider and 2 DSP48A1 units. Both DSP48A1 units realize the
following computation:
$$R = A + B \cdot D. \tag{11}$$
The steps of the computation carried out by the vertex unit
are illustrated in Fig. 8:
Fig. 8: Multiplication of the projection matrix and a vertex.
Since the computations of vertices are mutually
independent, they can be carried out in parallel in several
identical vertex units.
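The multiplication of a vertex by the projection matrix can be written as a chain of the multiply-accumulate steps of Eq. (11), followed by a division by the homogeneous coordinate. The sketch below uses raw integers with 8 fractional bits. The sequencing, the truncation of products and the function names are assumptions, and Python's signed floor division stands in for the unsigned serial divider with separate sign handling.

```python
FRAC_BITS = 8
ONE = 1 << FRAC_BITS   # raw representation of 1.0

def mac(a, b, d):
    """R = A + B * D, Eq. (11), with the product rescaled to 8 fractional bits."""
    return a + ((b * d) >> FRAC_BITS)

def project_vertex_fxp(matrix, vertex):
    """Multiply a 4x4 raw matrix by the vertex (x, y, z, 1) and divide by w."""
    coords = list(vertex) + [ONE]
    rows = []
    for row in matrix:
        acc = 0
        for m, c in zip(row, coords):
            acc = mac(acc, m, c)          # one multiply-accumulate per element
        rows.append(acc)
    w = rows[3]
    # Perspective division; the dividend is pre-shifted to keep 8 fractional bits.
    return (rows[0] * ONE) // w, (rows[1] * ONE) // w

def project_model_fxp(matrix, vertices):
    """Vertices are independent, so this loop could run in parallel units."""
    return [project_vertex_fxp(matrix, v) for v in vertices]
```

Each call of `project_vertex_fxp` depends only on the matrix and on its own vertex, which is the independence that allows several identical vertex units to work side by side.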
The coordinates of the 3D vertices are read from the model
memory by the supervising state machine in GrxGraphicsUnit.
The obtained coordinates of vertices in the plane are first
converted from FXP format to coordinates on the screen, or
more specifically in the video memory, and stored in the
memory cache (Vertex2D_Bank).
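The conversion from FXP to screen coordinates amounts to a scale, an offset and the removal of the fractional bits. A minimal sketch follows, assuming a 640×480 screen with the origin in its center, a scale of 200 pixels per model unit and truncation instead of rounding. None of these values is given in the article and the names are hypothetical.

```python
FRAC_BITS = 8

# Assumed display parameters (not stated in the article).
SCREEN_W, SCREEN_H = 640, 480
SCALE = 200   # pixels per model unit

def to_screen(x_fxp, y_fxp):
    """Map raw fixed-point plane coordinates to integer pixel coordinates."""
    sx = SCREEN_W // 2 + ((x_fxp * SCALE) >> FRAC_BITS)
    sy = SCREEN_H // 2 - ((y_fxp * SCALE) >> FRAC_BITS)   # screen y grows downwards
    return sx, sy

def fill_vertex_bank(projected):
    """Build the list of integer 2D vertices that plays the role of the cache."""
    return [to_screen(x, y) for x, y in projected]
```

The resulting list is indexed in the same order as the 3D vertices of the model, so that the connection map can address the 2D vertices directly.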
3) Wireframe Model Rendering
(GrxDrawWireframe)
GrxDrawWireframe draws the wireframe from the 2D vertex
bank (Vertex2D_Bank). The unit reads the connection map of
the wireframe model from the model memory; the vertices
referenced by each connection are then read from the bank
of 2D vertices and transferred to GrxDrawLine2D, which
draws a line segment between them.
Memory control is managed by the unit's own resources.
4) Drawing a Line Segment
(GrxDrawLine2D)
This unit draws a line segment defined by 2 integer 2D
vertices into the memory. The vertices need not be ordered
or otherwise preprocessed (the unit takes care of this
automatically). The speed of generating points on the line
is 1 point per 2 clocks. Generation, and hence also
writing, can be stopped by a signal, and if necessary this
can be set up by the state machine controlling the video
memory.
Fig. 9: One of the models stored in ModelDescriptionROM
drawn directly by the graphics pipeline with a detailed
view of the rasterization of the wireframe model.
The state machine (12 states) first captures, compares and
if necessary adjusts the order and coordinates of the input
vertices (the unit transforms all line segments into the
first half of the first quadrant, i.e. a slope angle of 0
to π/4). This is necessary to allow the generation of
coordinate x by a counter; steeper slopes would lead to the
loss of points on the line, see Fig. 10. It then computes
the slope of the segment (the increase along the vertical
axis per unit step along the horizontal axis). The number
of fractional bits of the slope is set automatically so as
to reach the target vertex. The unit then uses the counter
to generate the horizontal coordinate and generates the
vertical one by the accumulator. The vertical coordinate is
rounded. Coordinates are also adjusted by the offset
specified in point 1 and transformed back into their
original quadrant. Finally, the write signal is created.
Fig. 10: Drawing of a line segment and the effect of the angular
coefficient.
Generation of coordinates for a write request takes 2
cycles, and the unit is hence capable of drawing 1 pixel
per 2 cycles. In general the drawing process takes circa
40 + 2n cycles (where n is the number of drawn points).
Interruptions of writing requested by the video memory unit
are not taken into account. Figure 9 illustrates the
generation of a line segment.
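The procedure described in this section can be sketched in software as follows: the segment is mirrored into the octant with a slope between 0 and 1, the horizontal coordinate is produced by a counter, the vertical coordinate by a fixed-point accumulator with rounding, and each point is mirrored back. The fixed number of 16 fractional bits is an assumption (the text says the unit chooses it automatically), as are the function names.

```python
SLOPE_FRAC_BITS = 16   # assumed; sufficient for coordinates below 32768

def draw_line(p0, p1):
    """Return the pixels of the segment p0-p1 (integer 2D vertices)."""
    (x0, y0), (x1, y1) = p0, p1
    steep = abs(y1 - y0) > abs(x1 - x0)
    if steep:                               # mirror about the diagonal
        x0, y0, x1, y1 = y0, x0, y1, x1
    if x0 > x1:                             # order vertices so that x increases
        x0, y0, x1, y1 = x1, y1, x0, y0
    dx, dy = x1 - x0, y1 - y0
    y_sign = 1 if dy >= 0 else -1           # mirror so that the slope is >= 0
    dy = abs(dy)
    slope = (dy << SLOPE_FRAC_BITS) // dx if dx else 0
    acc = 1 << (SLOPE_FRAC_BITS - 1)        # start at 0.5 to round the result
    points = []
    for step in range(dx + 1):              # x is generated by a counter
        x = x0 + step
        y = y0 + y_sign * (acc >> SLOPE_FRAC_BITS)
        points.append((y, x) if steep else (x, y))   # mirror back
        acc += slope                        # y is generated by the accumulator
    return points

def estimated_cycles(n):
    """Approximate drawing time from the text: 40 + 2n clock cycles."""
    return 40 + 2 * n
```

Because the slope never exceeds 1 after the mirroring, every step of the counter produces exactly one pixel and no point of the line is skipped. A segment of n points then takes about `estimated_cycles(n)` clock cycles.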
5) Video Memory (GrxVideoRAM )
This module implements the video memory with a function for
quick clearing of the whole RAM. Reading and writing are
realized by a state machine, which works with a wider
internal data bus than the required external one (and hence
also a narrower internal address bus). This allows much
quicker clearing of the RAM. Part of the external address
bus (the upper bits) addresses a location in RAM, whereas
the lower bits map the external data bus onto the wider
internal data bus. Writing can in some cases take longer
than 1 clock, since it is necessary to first read a data
block from RAM, change the corresponding group of bits in
this block according to the input data and then write the
block back, as illustrated in Fig. 11. The current block is
cached.
Fig. 11: Video memory architecture.
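The addressing scheme and the read-modify-write cycle can be modelled with a small class. The sketch assumes a 1-bit pixel, a 32-bit internal word and a 640×480 frame; the article gives neither the internal bus width nor the resolution, and the class and method names are hypothetical. Caching of the current block is not modelled.

```python
class VideoRAM:
    """Bit-per-pixel frame buffer stored in wide words (software model)."""

    def __init__(self, width, height, word_bits=32):
        self.width = width
        self.word_bits = word_bits
        self.words = [0] * ((width * height + word_bits - 1) // word_bits)

    def clear(self):
        """Clear one whole word per step; returns the number of steps."""
        for i in range(len(self.words)):
            self.words[i] = 0
        return len(self.words)

    def write_pixel(self, x, y, value):
        """Read-modify-write of the word that contains the pixel."""
        addr = y * self.width + x
        word_index = addr // self.word_bits   # upper address bits
        bit_index = addr % self.word_bits     # lower address bits
        word = self.words[word_index]         # read the block
        if value:
            word |= 1 << bit_index            # change the selected bit
        else:
            word &= ~(1 << bit_index)
        self.words[word_index] = word         # write the block back

    def read_pixel(self, x, y):
        addr = y * self.width + x
        return (self.words[addr // self.word_bits] >> (addr % self.word_bits)) & 1
```

With these assumed sizes, `VideoRAM(640, 480).clear()` needs 9600 word writes instead of 307 200 single-pixel writes, which illustrates why the wider internal bus makes clearing much faster.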
3.3. VGA Adapter
(GrxAdapterVGA)
This is a generic VGA adapter which supports various
resolutions and color depths, based on the configuration of
[6]. Its correct operation depends on an incoming clock
signal with the VGA pixel frequency, derived from the main
clock distribution through the DCM (Digital Clock Manager)
module.
Clock domains are strictly separated. All simple signals
are resynchronized by the GResynchronizer block (part of
the library of general project components) and the
transmission of the video signal from the input clock
domain is resolved via a FIFO queue with separate clocks
for reading and writing. This queue is also used as a line
cache (writing is carried out on a line-by-line basis). The
queue is instantiated as an IP core to ensure timing and
synchronization. Synchronization of writing into the line
cache is carried out via the LineStrobe (beginning of a
line) and FrameStrobe (beginning of a frame) signals in
combination with the PixelRequestAxis (current coordinate
for writing) signal.
3.4. Project-Oriented Modules
The source files of the project connect individual com-
ponents into larger wholes – they connect the memo-
ries, the graphics pipeline, and the VGA adapter and
also take care of synchronization, adjustment and pro-
cessing of input and output signals.
1) Model Memory
(ModelDescriptionROM )
The definition of models consists of a list of vertices
(Vertex3DBankROM) and their connections (VertexPointerROM).
Models are stored sequentially. ModelsDescriptorROM stores,
for each model, the offsets into Vertex3DBankROM and
VertexPointerROM at which the model begins, together with
the length of the model's records in these memories.
VertexPointerROM contains the map of vertex connections:
tuples of indices (addresses) into Vertex3DBankROM, each of
which defines a line segment of the wireframe model.
Figure 12 illustrates the model memory architecture.
Fig. 12: Model memory architecture.
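The organization of the three memories can be illustrated with plain lists. The sketch stores two hypothetical models (a cube and a tetrahedron) one after another and uses a descriptor holding the offset and length for both memories. The descriptor layout, the choice of model-relative indices and the model data themselves are assumptions made for this example.

```python
# Vertex bank: all models stored sequentially.
VERTEX_BANK = [
    # model 0: cube, 8 vertices
    (-1, -1, -1), (1, -1, -1), (-1, 1, -1), (1, 1, -1),
    (-1, -1, 1), (1, -1, 1), (-1, 1, 1), (1, 1, 1),
    # model 1: tetrahedron, 4 vertices
    (1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1),
]

# Pointer memory: index pairs, each defining one line segment.
# Indices are relative to the first vertex of the model (assumed).
POINTER_ROM = [
    # model 0: 12 edges of the cube
    (0, 1), (2, 3), (4, 5), (6, 7),
    (0, 2), (1, 3), (4, 6), (5, 7),
    (0, 4), (1, 5), (2, 6), (3, 7),
    # model 1: 6 edges of the tetrahedron
    (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
]

# Descriptor: (vertex offset, vertex count, pointer offset, pointer count).
DESCRIPTORS = [
    (0, 8, 0, 12),
    (8, 4, 12, 6),
]

def model_segments(model):
    """Return the line segments of a model as pairs of 3D vertices."""
    v_off, v_len, p_off, p_len = DESCRIPTORS[model]
    vertices = VERTEX_BANK[v_off:v_off + v_len]
    pointers = POINTER_ROM[p_off:p_off + p_len]
    return [(vertices[a], vertices[b]) for a, b in pointers]
```

A wireframe renderer only needs to iterate over `model_segments(model)`, or over the same index pairs applied to the bank of projected 2D vertices, and draw one line segment per entry.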
2) Background Video Signal Generator
(BackgroundVisualizer)
This block generates the RGB signals for drawing the
background in VGA. It draws a vertical color gradient
starting from black.
Fig. 13: Examples of generated background for various azi-
muths and elevation (rotated by 90◦).
The hue is based on the azimuth and elevation and hence
changes depending on the "rotation" of the object. The
formula used for the video signal v is:
…
Trigonometric functions are computed by a separate CORDIC
core with its own controller, which periodically sends the
azimuth and elevation to the core and then writes the
results into registers. The rest of the computation is a
simple arrangement of adders and multipliers with suitable
pipelining.
Since an 8-bit color depth is insufficient for drawing a
smooth color gradient, dithering is applied by adding
simple noise generated by the GRandGeneratorLFSR block.
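Dithering with LFSR noise can be sketched as follows. The generator is a common 16-bit Fibonacci LFSR with taps 16, 14, 13 and 11, and the intensity is reduced from 12 to 3 bits in the example. The polynomial, the bit widths and all names are assumptions, since the article does not describe the internals of GRandGeneratorLFSR or the exact dithering logic.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def dither(value, in_bits, out_bits, noise):
    """Quantize value from in_bits to out_bits after adding noise."""
    drop = in_bits - out_bits
    noisy = value + (noise & ((1 << drop) - 1))   # noise fills the dropped bits
    noisy = min(noisy, (1 << in_bits) - 1)        # saturate instead of wrapping
    return noisy >> drop

def dithered_row(value, in_bits, out_bits, count, seed=0xACE1):
    """Quantize the same intensity count times with fresh noise each time."""
    state, out = seed, []
    for _ in range(count):
        state = lfsr16_step(state)
        out.append(dither(value, in_bits, out_bits, state))
    return out
```

An intensity that falls between two output levels is rendered as a mixture of both levels whose average approaches the original value, which hides the visible banding of a coarse gradient.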
3) Panel Image (PanelImageROM )
This ROM stores the panel image drawn on the screen
(Fig. 14). The image has a resolution of 640×32 with a
2-bit color depth. The colors on the image are indexed.
Fig. 14: One of the images stored in PanelImageROM .
Image data are stored in a ROM organized as
(640 × 32) × 2 bits. The address is thus computed from the
coordinates and the data output is 2 bits wide, i.e. 4
indexed colors in total. The output of the ROM is sent to a
look-up table which converts the index to a specific color
with an 8-bit (256-color) depth. Figure 15 illustrates the
realization of the memory.
Fig. 15: Storage of the panel image.
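The address computation and the color look-up can be sketched in a few lines. The row-major address order and the palette values are assumptions chosen for illustration; only the image size and the 2-bit index come from the text.

```python
PANEL_WIDTH, PANEL_HEIGHT = 640, 32

# Hypothetical palette: four 8-bit colors selected by the 2-bit index.
PALETTE = [0x00, 0x49, 0x92, 0xFF]

def panel_address(x, y):
    """ROM address of a pixel, assuming row-major storage."""
    return y * PANEL_WIDTH + x

def panel_color(rom, x, y):
    """Read the 2-bit index from the ROM and convert it through the look-up table."""
    index = rom[panel_address(x, y)] & 0b11
    return PALETTE[index]
```

The ROM then holds 640 · 32 = 20 480 two-bit entries, and changing the appearance of the panel only requires a different look-up table.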
3.5. The Libgenerics Library
This library provides the basic functional blocks and
functions:
• GResynchronizer – Resynchronizer of one-bit
asynchronous signals into the internal clock do-
main.
• GResetSynchronizer – Resynchronizer of reset into
the internal clock domain.
• GDebounceFilter – Debouncing filter for button
press.
• GEdgeDetector – Detector of rising/falling/both
edges of the monitored signal.
• GAccumulator – Accumulator register with syn-
chronous reset and overflow detection.
• GRandGeneratorLFSR – Linear-feedback shift register
(LFSR) implementing a general pseudo-random number
generator. This is required primarily for the creation of
a smooth color gradient in the drawn background.
• GNonOverflowCounter – A non-overflowing bidirectional
synchronous counter with synchronous reset and prescaler.
This is used to set the viewing angle.
• GAccelerateCounter – Bidirectional binary counter with
customizable TOP value and synchronous reset. The counter
freely overflows in both directions and has a configurable
acceleration speed and a clock prescaler. This is used to
set the azimuth and elevation and creates the effect of
"gradual" rotation of the object on the screen.
• GBcdCounter – Generic BCD increasing counter
with synchronous reset.
• GBcd7Display – Driver for a display consisting of
7-segment BCD digits. Forms the FPS indicator together
with GBcdCounter.
• GBlockRAM – 2-port block RAM with a single
clock signal.
• GSerialDivider – Unsigned generic serial divider.
Designed based on the application note [2]. The
computation takes approximately 2n + 2 clock cycles
(where n is the bus width), which leaves room for future
improvement.
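The serial division mentioned in the last item can be illustrated by the classic shift-and-subtract (restoring) algorithm, which produces one quotient bit per iteration. This is a generic sketch and not the VHDL of GSerialDivider; the function names are hypothetical and the cycle estimate simply restates the figure from the list above.

```python
def serial_divide(dividend, divisor, n):
    """Unsigned restoring division of n-bit operands, one bit per iteration."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    remainder, quotient = 0, 0
    for i in range(n - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:          # trial subtraction succeeds
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

def estimated_cycles(n):
    """Approximate duration from the text: 2n + 2 clock cycles."""
    return 2 * n + 2
```

For 18-bit operands such as the 10Q8 values used in the pipeline, the estimate gives 38 clock cycles per division, which is consistent with the dividers being the slowest part of the matrix computation.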
4. Future Work
Parallel processing may under certain conditions be
used to improve the efficiency of the 3D functions.
However, it needs to be said that this method is only
suitable for algorithms which can be efficiently paral-
lelized. In this case it is possible to use a large number
of FPGA circuits which are interconnected via appro-
priate data channels. Currently, there exists a num-
ber of commercially available HPC (High-Performance
Computing) systems ranging from dozens to thousands
of FPGA circuits. These computational units may sig-
nificantly speed up the computations in various areas,
such as medical imaging, cryptography, statistical data
processing, biological sciences etc.
Efficient HPC systems usually place FPGAs in in-
dividual cards inserted into slots of the motherboard
with high data throughput. Communication between
the user and FPGA cards is secured by the host com-
puter via the corresponding API interface.
The Rivyera HPC system (SciEngines GmbH) with Xilinx FPGA
circuits was selected for the further development of the
project. The efficient use of such an HPC system depends
especially on a suitable design of the logical structure
for the FPGA circuits and on the programming of
applications for data exchange between the user and the
FPGA logic. The digital design for the FPGA based on VHDL
can be created through the Xilinx ISE development kit. The
programming of application software on the host PC is then
possible through the API for C and Java.
Fig. 16: Overview of a Rivyera HPC with FPGA circuits.
Rivyera HPC offers a highly efficient bus system
which allows the organization of FPGA circuits into
a systolic chain, which minimizes delays in the system
caused by connections.
Fig. 17: Linear systolic chain based on FPGA [8].
Individual Rivyera cards are equipped with massive
FPGA circuits of the S6-LX150 line. Each user FPGA
circuit on the card is connected to a memory subsys-
tem consisting of up to 512 MiB DDR3 RAM, 256 kiB
of EEPROM memory and a micro SD/HC Flash (se-
lected). Data transmission between individual FPGA
circuits will be carried out through the bus architecture
and connection diagrams implemented in the sophisti-
cated API.
Fig. 18: HW and SW computing methods in Rivyera HPC.
Assuming the efficient use of connections and an optimal
digital design, we can reach a data throughput between
adjacent FPGAs of up to 2 Gb·s⁻¹. However, the actually
usable throughput may differ based on API and FPGA
limitations [8]. An efficient systolic chain can be used
between FPGA circuits on a single card as well as between
individual cards in the system. The Rivyera can be equipped
with up to 128 FPGA circuits (or up to 256 FPGA circuits by
doubling the number of cards).
Fig. 19: RIVYERA supercomputer with 256 FPGAs [8].
5. Conclusion
The implementation of the project is intended for the
Xilinx Spartan-6 circuit with a graphics output to VGA,
e.g. [3]. The procedures and outputs of the project are
useful in many embedded control systems which are based on
FPGA circuits. The most interesting applications will
probably be found in technical equipment which relies on
virtualization and real-time computations. One of the areas
where highly efficient computations and parallelism are
both required is medical data imaging. Another advantage of
this hardware-based solution is that it increases functional
safety, which is a difficult task in the case of sequen-
tial tools based on microprocessors. Similar methods
have been used in areas such as mobile applications and
even home care systems, see [7]. Techniques for design
verification form an important part of the design me-
thods for these programmable circuits. These provide
us with near-certainty regarding the actual reliability
of the programmable logical circuits.
Tab. 1: FPGA device utilization summary.
Resource                      Used   Available   Utilization
Number of FSMs                  12           -             -
Number of Block RAMs            32          32         100 %
Number of Slice LUTs          4237        9112          46 %
Number of bonded IOBs           28         232          12 %
Number of BUFG/BUFCTRLs          3          16          18 %
Number of DSP48A1s              10          32          31 %
Number of PLL_ADVs               1           2          50 %
Acknowledgment
This paper has been elaborated in the framework of
the project "Support research and development in the
Moravian-Silesian Region 2013 DT 1 - International
research teams" (RRC/05/2013). Financed from the
budget of the Moravian-Silesian Region. The work
and the contributions were supported by the project
SP2014/194 "Biomedicinske inzenyrske systemy X".
References
[1] KASIK, V., A. KURECKA and P. POSPECH.
3D Graphics Processing Unit with VGA Output.
IEE Proceedings - Circuits, Devices and Systems.
2005, vol. 152, iss. 3, pp. 388–393. ISSN 1474-6670.
DOI: 10.3182/20130925-3-CZ-3023.00081.
[2] AVR200. Multiply and Divide Routines. Atmel, 2009.
Available at: http://www.atmel.com/Images/doc0936.pdf.
[3] Nexys3. Board Reference Manual. Digilent, 2013.
Available at: http://www.digilentinc.com/Data/Products/NEXYS3/Nexys3_rm.pdf.
[4] BENSAALI, F., A. AMIRA and A. BOURI-
DANE. Accelerating matrix product on re-
configurable hardware for image processing
applications. IEE Proceedings - Circuits, Devices
and Systems. 2005, vol. 152, iss. 3, pp. 236–246.
ISSN 1350-2409. DOI: 10.1049/ip-cds:20040838.
[5] HO AHN, S. OpenGL programming tutorials, examples
and notes written with C++ [online]. 2013.
Available at: http://www.songho.ca/opengl/index.html.
[6] TRAN, V.-H. and X.-T. TRAN. An efficient architecture
design for VGA monitor controller. In: 2011 International
Conference on Consumer Electronics, Communications and
Networks (CECNet). XianNing: IEEE, 2011, pp. 3917–3921.
ISBN 978-1-61284-458-9. DOI: 10.1109/CECNET.2011.5768261.
[7] PENHAKER, M., M. STANKUS, J. KIJONKA
and P. GRYGAREK. Design and Application of
Mobile Embedded Systems for Home Care Ap-
plications. In: 2010 Second International Confer-
ence on Computer Engineering and Applications.
Bali Island: IEEE, 2010, pp. 412–416. ISBN 978-
1-4244-6079-3. DOI: 10.1109/ICCEA.2010.86.
[8] SciEngines GmbH. 2014. Available at:
http://www.sciengines.com.
About Authors
Vladimir KASIK was born in Vyskov, Czech Re-
public, in 1973. He received his M.Sc. in Cybernetics,
Automation and Control from the Brno University of
Technology, Czech Republic, in 1996 and his Ph.D. in
Technical Cybernetics from VSB–Technical University
of Ostrava, Czech Republic in 2000. Currently he is
an assistant professor at VSB–Technical University of
Ostrava, Department of Cybernetics and Biomedical
Engineering, Ostrava, Czech Republic, where he
teaches and collaborates with industry in the areas of
programmable logic, electronics, embedded and con-
trol systems. He is the author of several international
publications and in earlier years he attended a vari-
ety of lecture stays in Universite Joseph Fourier and
L’Institut National Polytechnique de Grenoble, France.
Ales KURECKA was born in 1989. He received
his M.Sc. degree from VSB–Technical University of
Ostrava, Czech Republic in 2013. He is currently a
Ph.D. student at the department of Cybernetics and
Biomedical Engineering. His research interests include
primarily localization techniques and embedded sys-
tems.
