Integration of a Fractal Generator with Mali GPU by Kjøll, Per Kristian
Integration of a Fractal Generator with Mali GPU
Per Kristian Kjøll
Master of Science in Electronics
Supervisor: Per Gunnar Kjeldsberg, IET
Co-supervisor: Øystein Gjermundnes, ARM Norway AS
Department of Electronics and Telecommunications
Submission date: June 2012
Norwegian University of Science and Technology

Problem Description
Candidate name: Per Kristian Kjøll
Assignment title: Integration of a Fractal Generator with Mali-GPU
Assignment Text
This proposal describes a possible subject for a project thesis for students
with background in microelectronics and computer graphics. The description
given here is meant for an autumn project only. A possible extension of
the content into a master thesis in the spring may be discussed with the
supervisors as the work proceeds.
Abstract
In a recent master thesis, Per Christian Corneliussen successfully developed
a fractal generator. The purpose of this project thesis is to take this
fractal generator, design AXI and APB interfaces for it, integrate it into a
system with a Mali-400 GPU, and finally run an OpenGL demo that uses data
structures generated by the fractal generator. A demo has already been
implemented by Per Christian, but the student is encouraged to extend it in
order to give it a personal touch. The fractal generator will be realized in
an FPGA. When the fractal generator is successfully integrated in the system,
the student may participate in one of the regular demo competitions at ARM
Norway.
Introduction
A fractal is "a rough or fragmented geometric shape that can be split into
parts, each of which is (at least approximately) a reduced-size copy of the
whole,"[4] a property called self-similarity. A mathematical fractal is based
on an equation that undergoes iteration, a form of feedback based on
recursion[9]. One such mathematical fractal is the fractal defined by the
Mandelbrot set. The Mandelbrot set is a mathematical set of points in the
complex plane, the boundary of which forms a fractal [7]. The point c belongs
to the Mandelbrot set if and only if
$$|Z_n| \le 2 \ \text{for all}\ n \ge 0, \quad \text{where}\ Z_{n+1} = Z_n^2 + c \ \text{and}\ Z_0 = 0 \qquad (1)$$
An image can be created from the Mandelbrot set by mapping the (x,y)
coordinates of the pixels in the image to the real and imaginary parts of a
complex number c. For each pixel, the number of iterations of Eq. 1 needed
before the absolute value of the complex number Zn exceeds 2 is computed.
The number of iterations is then used as an index into a colour palette,
which finally determines the colour of the pixel.
A globe is made by wrapping a map around a sphere. This process is known as
texture mapping in the field of computer graphics, and is used for drawing or
wrapping an image onto a 3D object. Textures are in many cases generated
before a computer game is run, but they could also be generated on the fly.
Thesis statement
The master thesis by Per Christian Corneliussen describes a fractal generator
for use with a Mali GPU. Moreover it describes a demo program that creates
an animated 3D landscape where the height and the colour of the landscape
at any given point is determined by the z-coordinate of vertices generated by
the fractal generator.
In the first part of this thesis the student should present the Mandelbrot set
and give an overview of the fractal generator.
The core of the fractal generator is complete, but it is necessary to also
implement an AXI and an APB interface in order to integrate it into a system
with Mali-400. This will be the main task in this project. The RTL code
must be written in Verilog.
In order to run the demo it will also be necessary to make some minor changes
to the software driver so that it configures the fractal generator prior to
each frame. The demo itself could also be reworked in order to give the
student the possibility of adding a personal touch to the demo.
Finally, the student must synthesize the system for FPGA, set the fractal
generator up to feed the OpenGL ES application with data structures, and
participate in a demo competition held at ARM Norway.
Co-supervisor: Øystein Gjermundnes, ARM Norway AS
Supervisor: Per Gunnar Kjeldsberg, NTNU
Abstract
The Mandelbrot set is a well-known fractal with mathematical properties
that can be exploited to create 3D-landscapes. The operations required to
calculate a heightmap using the Mandelbrot set are highly parallelizable
and are thus suitable for a hardware implementation. Generation of
3D-landscapes on-the-fly using the Mandelbrot set is desirable since the
Mandelbrot set is infinitely complex[4] and deterministic. This makes it
possible to create many different landscapes with complex patterns in, for
example, computer games.
A previous master thesis[4] presents a vertex array generator (VAG) that
generates the vertices of a 3D-landscape based on an area of the Mandelbrot
set. This thesis explores different architectures that connect this vertex
array generator with the Mali-400 graphics processing unit (GPU). The result
is that the VAG in its current state is not suitable for integration, mostly
since it does not support random access to vertices. Thus, a new fractal
generator architecture is presented, reusing parts of the VAG.
The new fractal generator is implemented in Verilog and its functionality
is verified using the Universal Verification Methodology (UVM). Then, the
fractal generator is integrated with the Mali-400 GPU in an FPGA framework
and synthesized on FPGA. Tests are also performed at each step of
integration.
An OpenGL for Embedded Systems 2.0 demo is written to showcase the
functionality of the fractal generator. Changes have been made to the
Mali-400 drivers to automatically configure and set up the fractal generator
while the demo is running.
The fractal generator is shown to be working as intended with a scalable
performance based on a number of internal cores. Using 64 cores the fractal
generator has a worst-case frame time of 51.1 ms at 400 MHz which equals a
frame rate of 450 frames per second, vastly outperforming a software
implementation.
The fractal generator is currently limited to creating landscapes of 128 · 128
points. The intention was to use the demo and driver to increase the
resolution, but this has not been solved.
Increasing the resolution and optimizing the cache size of the fractal
generator have been left for future work.
Summary
The Mandelbrot set is a well-known fractal with mathematical properties that
can be used to draw 3D-landscapes. The mathematical calculations needed to
compute the heights of a landscape are highly parallelizable and well suited
for implementation in hardware. Generating landscapes based on the
Mandelbrot set is desirable since the set is infinitely complex and
deterministic, so that many different landscapes can be created from the set.
A previous master thesis described a fractal generator (VAG) that generated
the points of a 3D-landscape based on an area of the Mandelbrot set. This
master thesis explores different hardware architectures that can connect the
VAG to the Mali-400 GPU. The VAG turns out to be unsuited for integration
with Mali, and it is decided that a new fractal generator, which can reuse
parts of the VAG, shall be made.
The new fractal generator is implemented in Verilog and its functionality
is tested with the Universal Verification Methodology (UVM). The fractal
generator is then integrated with the Mali-400 and synthesized on FPGA.
An OpenGL for Embedded Systems 2.0 demo has been written to show the
functionality of the fractal generator. Changes have been made to the
Mali-400 driver to automatically configure and set up the fractal generator
while the demo is running.
The fractal generator works as planned, with scalable performance based on a
number of internal cores. Using 64 cores, the fractal generator has a
worst-case frame time of 51.1 ms at 400 MHz, which amounts to a rate of 450
frames per second. The fractal generator proves to be much faster than a
software implementation of the fractal generator.
The fractal generator is limited to creating landscapes with a resolution of
128 · 128 points. The intention was to use the demo and the driver to
increase the resolution, but this did not work as planned.
Increasing the resolution and optimizing the cache size of the fractal
generator have been left for future work.
Preface
The problem description for this thesis was originally given for a term project
and asked for two interfaces to an existing fractal generator. The existing
fractal generator was designed in a previous master thesis given by ARM.
During the term project it was decided that the existing fractal generator was
not suited for integration with Mali-400 and a new fractal generator should
be designed.
The term project began the work presented in this thesis. Specifically, it
explored different architectures and started on the fractal generator design.
Some of this work is recapped in this thesis to gather the whole fractal
generator design process in one place.
The Vertex Array Generator presented in Chapter 3 was designed by Per
Christian Corneliussen in a previous master thesis. The architecture
exploration and discussion in Chapter 4 was done in the term project.
The rest of the chapters present the work done for this thesis.
To keep the thesis brief, the chapters do not cover low-level implementation
details. Some exceptions are made when the details are important to
understand the discussions and choices made. If more details are needed, see
the appendices for source code.
In addition to this, descriptions of the Mali-400 and its drivers have been
limited in some cases due to NDA. The driver source code has been left out
for this reason as well.
Thanks to Per Gunnar Kjeldsberg (NTNU) and Øystein Gjermundnes (ARM
Norway AS) for help throughout the semester.
Contents
1 Introduction
1.1 Thesis Objectives
1.2 Thesis Outline
2 Background Theory
2.1 The Mandelbrot Set
2.2 OpenGL ES 2.0
2.3 The Mali-400 GPU
2.4 The AMBA AXI Protocol
2.5 The AMBA APB Protocol
2.6 The PL301
2.7 Hardware Verification
3 The Vertex Array Generator
4 Architecture Exploration
4.1 The Vertex Array Generator as an AXI Slave
4.2 The Vertex Array Generator with DMA
4.3 Fractal Generator With Cache
5 Fractal Generator Design
5.1 Configuration Parameters and Data Types
5.2 Relationship Between AXI address and Coordinates of a Fractal
5.2.1 Discussion
5.3 The APB Interface
5.3.1 Verilog Implementation
5.3.2 Discussion
5.4 The AXI Interface
5.4.1 Verilog Implementation
5.4.2 Discussion
5.5 The Arbiter
5.5.1 Verilog Implementation
5.5.2 Discussion
5.6 The Coordinate Cache
5.6.1 Verilog Implementation
5.6.2 Discussion
5.7 Fractal Generator Verification
5.7.1 The Verification Framework
5.7.2 The Verification Plan
5.7.3 Results and Discussion
6 Integration of the Fractal Generator
6.1 Connecting Mali and the fractal generator
6.1.1 Test Results
6.1.2 Integration Discussion and Conclusion
6.2 FPGA Integration and Test
6.2.1 Platform and Framework Description
6.2.2 Testing the FPGA Framework
6.2.3 Test Results
6.2.4 Synthesis Results and Test on FPGA
6.2.5 FPGA Test Discussion and Conclusion
7 The OpenGL ES 2.0 Demo and Driver
7.1 The Demo
7.1.1 The Initialization Phase
7.1.2 The Rendering Phase
7.1.3 Configuring the Fractal Generator
7.1.4 The Vertex Shader
7.1.5 The Fragment Shader
7.1.6 Software Fractal Generator
7.2 The OpenGL ES 2.0 Driver
7.2.1 The Mali GPU Open GL ES Driver Architecture
7.2.2 Controlling the fractal generator using the Mali driver
7.3 Results and Discussion
7.3.1 The Fractal Resolution
8 Profiling
9 Conclusion
9.1 Future Work
10 Appendices
A Source Code for the Fractal Generator
B Source Code for UVM Verification Framework
C Source Code for the Fractal Demo
List of Figures
1 An example system with Mali and the fractal generator. Produced from [13].
2 The Mandelbrot set[4].
3 The Mandelbrot Set with color and zoom[9].
4 Two images of the Mandelbrot set with different iteration limits. Left = 160 iterations, right = 80 iterations. Points that hit the limit are colored red.
5 The OpenGL ES 2.0 Graphics Pipeline [6].
6 The Architecture of the VAG. Taken from [4].
7 The Vertex Array Generator as an AXI slave.
8 The Vertex Array Generator as an AXI master (DMA).
9 Structure of the Fractal Generator and how it draws a frame.
10 Internal and external AXI address mapping of the vertices in a 3x3 fractal.
11 The internal components of the Arbiter and their state descriptions.
12 The structure of the Coordinate Cache.
13 The verification framework.
14 Coverage metrics logged by VCS while running the verification framework.
15 Left: Fractal landscape using a blue to white gradient. Right: The Odroid-A tablet.
16 ARM Test Bench for Mali [15].
17 Connecting Mali and the fractal generator.
18 1. Motherboard Express µATX. 2-3. Virtex6 FPGAs. 4. CoreTile Express. 5. The A9 CPU.
19 Hardware and software components of the Mali graphics system architecture (Linux). [18, p. 1.4]
20 The fractal demo zooming in on a point.
List of Tables
1 Synthesis results for the fractal generator with 8 FPGs on a Xilinx Virtex-6x FPGA.
2 Average frame time performance over 500 frames of the demo.
3 Average geometry processor frame time, vertex shader time, and Polygon List Builder Unit time during 30 frames of the worst-case scenario. All results estimated at 400 MHz.
1 Introduction
Today, more than half of the world population has a mobile phone. In 2011,
17% of mobile phones were smartphones and the percentage is increasing
rapidly[8].
One factor leading to the increased adoption of smartphones has been the
improvement in display and graphics technologies[3]. The phone has gone
from being just a phone to an entertainment platform with music, video,
games and the internet.
Since 3D-rendering for games, among other tasks, can always be done more
efficiently on special-purpose hardware than on a general-purpose CPU[3],
the increased demand for powerful 3D-rendering in smartphones requires new
phones to have dedicated graphics processing units (GPUs). An example of
such a GPU is ARM's Mali-400 (Mali), which is found in, for example, the
Samsung Galaxy S3 smartphone.
In computer games and other graphical visualizations it is common to provide
a 3D-landscape or terrain for the user to navigate. The height coordinates of
a 3D-landscape can be calculated by exploiting the mathematical properties
of the Mandelbrot set, a well-known geometric fractal. The computations
involved are expensive but highly parallelizable: each point can be
processed independently of the others, so the calculations are well suited
for hardware implementation.
In this thesis, a fractal generator is a component that calculates the height
coordinates as explained above. The main motivation for designing a fractal
generator in hardware is 3D-landscape generation. A fractal generator can
generate landscapes on-the-fly, requiring little or no memory between frames.
Since the Mandelbrot set is infinitely complex[4] and deterministic, the
generator can be used to draw a large number of different landscapes. Another
use of a fractal generator is to draw aesthetically pleasing 2D images.
1.1 Thesis Objectives
The main objective in this thesis is to design a fractal generator and integrate
it with Mali on an FPGA. The fractal generator shall be able to accelerate
the generation of a 3D-landscape, where the 3D-landscape is based on the
Mandelbrot set and animated using OpenGL ES. The fractal generator must
be implemented in Verilog and shall communicate with Mali using the ARM
APB and AXI bus communication protocols. Figure 1 illustrates how a
fractal generator could be connected to a Mali system.
Figure 1: An example system with Mali and the fractal generator. Produced
from [13].
The secondary objective in the project is to alter the Mali drivers, such that
control of the fractal generator is performed automatically by the driver,
based on input from the OpenGL ES demo.
1.2 Thesis Outline
The thesis starts with background theory and previous work.
Chapter 2 explains the concepts, terms and components needed to understand
the later chapters of the thesis.
Chapter 3 describes the task of the fractal generator in detail and presents a
previous fractal generator implementation.
Chapter 4 explores different hardware architectures that integrate a fractal
generator with Mali, discusses advantages and disadvantages, and ultimately
selects an architecture to implement in Verilog.
Chapter 5 presents the chosen fractal generator design and all its modules
in detail. Alternative choices are also discussed briefly for each module.
Section 5.7 verifies the behavior of the fractal generator hardware using the
Universal Verification Methodology (UVM).
To avoid difficult debugging, it is necessary to ensure working hardware on
FPGA before starting on the driver.
Chapter 6 describes the methodology used to achieve this. It connects the
fractal generator to Mali, integrates the whole system with an FPGA
framework and synthesizes it. Tests are performed at each step.
Chapter 7 presents the OpenGL ES demo, written to showcase the functionality
of the fractal generator, and how it communicates with the driver to
control and set up the fractal generator prior to each frame. Furthermore it
presents the Mali driver architecture and the changes made to enable software
control of the fractal generator.
Chapter 8 covers profiling. It examines the performance of the fractal
generator and compares the hardware-accelerated version with other
solutions.
Chapter 9 concludes the thesis. It discusses whether using the fractal
generator increased performance, and whether the fractal generator is useful
for any practical purposes.
2 Background Theory
This chapter explains the terms and theory needed to understand the
concepts, descriptions and discussions used in the rest of the thesis.
2.1 The Mandelbrot Set
The Mandelbrot set is a fractal set of complex numbers c. The set is formally
defined by the iterative equation
$$Z_{n+1} = Z_n^2 + c, \qquad c, Z \in \mathbb{C} \qquad (1)$$
where the number c is part of the Mandelbrot set if the equation remains
bounded when $n \to \infty$. It can be shown that if
$(|Z_{re}| > 2) \lor (|Z_{im}| > 2)$, the equation will diverge [2, p. 81].
Thus, to examine whether a point is in the Mandelbrot set, one iterates the
equation until the absolute value of either the real or the imaginary part
of Z exceeds two.
The Mandelbrot set is named after Benoit B. Mandelbrot and is infinitely
complex[5, p. 197] and connected[10]. If the Mandelbrot set is plotted in a
two-dimensional coordinate system as in Figure 2, using the real and
imaginary parts of c as its x- and y-coordinates, it shows a boundary with a
distinctive and easily recognizable two-dimensional fractal shape.
Figure 2: The Mandelbrot set[4].
By coloring points in the aforementioned 2D coordinate system with a
gradient, based on the number of iterations required to determine whether the
point is in the Mandelbrot set, the Mandelbrot set reveals a complex and
aesthetically pleasing structure. Zooming in on specific areas of the set
shows self-similarity and high detail, as shown in Figure 3.
Figure 3: The Mandelbrot Set with color and zoom[9].
Since points inside the set never diverge, deciding their membership would
require an infinite number of iterations. It is therefore necessary to set an
iteration limit when drawing the Mandelbrot set in practice. This limit will
affect the detail of the resulting image (Figure 4).
Figure 4: Two images of the Mandelbrot set with different iteration limits.
Left = 160 iterations, right = 80 iterations. Points that hit the limit are
colored red.
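The iteration and its limit can be illustrated with a minimal Python sketch. It is not the hardware implementation described later in the thesis; the function name `mandelbrot_iterations` and the convention of counting the iterations performed before the divergence test triggers are assumptions made for this example.

```python
def mandelbrot_iterations(c: complex, max_iterations: int) -> int:
    """Count iterations of Z -> Z*Z + c (with Z0 = 0) until Z leaves the bound.

    Returns max_iterations if the limit is reached first, in which case the
    point is treated as a member of the set.
    """
    z = 0j
    for n in range(max_iterations):
        # Divergence test from the text: |Zre| > 2 or |Zim| > 2.
        if abs(z.real) > 2 or abs(z.imag) > 2:
            return n
        z = z * z + c
    return max_iterations


if __name__ == "__main__":
    print(mandelbrot_iterations(0 + 0j, 80))  # the origin never diverges
    print(mandelbrot_iterations(1 + 0j, 80))  # diverges after a few iterations
```

A higher limit only changes the result for points that have not diverged when the lower limit is reached, which is why the two images in Figure 4 differ mainly near the boundary of the set.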
The Mandelbrot set can be used to create 3D landscapes. Instead of, or in
addition to, using the number of iterations to color each point, as explained
above, the number is used as a third coordinate z. The z-coordinate
represents the height of the landscape at that point. For example, the point
C1 in Listing 1 would get a z-coordinate of 80 (the iteration limit). Since
zooming in on the Mandelbrot set constantly reveals new patterns and more
complexity, the set can be used to generate a large number of unique
landscapes.
    MAX_ITERATIONS = 80
    C1 = (0, 0) = 0 + i0
    Z0 = 0
    Z1 = Z0^2 + C1 = 0
    ...
    Z80 = 0
    Z is still bounded when the algorithm reaches the iteration limit,
    so (0, 0) is part of the Mandelbrot set.
Listing 1: Example of iteration
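As a rough software illustration of the landscape idea, the sketch below builds a small grid of (x, y, z) vertices where z is the iteration count. The function name `heightmap` and its parameters (`re_min`, `im_min`, `step`, `size`) are hypothetical and chosen only for this example; they do not correspond to the configuration parameters of the fractal generator.

```python
def heightmap(re_min, im_min, step, size, max_iterations=80):
    """Return size*size vertices (x, y, z) in row-major order.

    x and y are the real and imaginary parts of c, and z is the number of
    iterations performed before Z left the bound (or the iteration limit).
    """
    vertices = []
    for row in range(size):
        for col in range(size):
            c = complex(re_min + col * step, im_min + row * step)
            z, n = 0j, 0
            # Iterate while within the limit and while Z is still bounded.
            while n < max_iterations and abs(z.real) <= 2 and abs(z.imag) <= 2:
                z = z * z + c
                n += 1
            vertices.append((c.real, c.imag, n))
    return vertices


if __name__ == "__main__":
    for vertex in heightmap(-1.0, -1.0, 1.0, 3):
        print(vertex)
```

With these arguments the centre vertex is the point (0, 0) from Listing 1, so its height equals the iteration limit of 80.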
2.2 OpenGL ES 2.0
OpenGL ES is an Application Programming Interface (API) for 3D graphics
in embedded systems (ES). It is based on the widespread desktop API
OpenGL, and aims to be smaller and optimized for constrained devices such
as mobile phones[6]. In short, the API is a portable and fast software
interface to graphics hardware[7]. This thesis will use OpenGL ES 2.0 when
coding a demo (Section 7.1) for use with the fractal generator.
Figure 5: The OpenGL ES 2.0 Graphics Pipeline [6].
OpenGL ES 2.0 implements the graphics pipeline in Figure 5. The basic
function of the API, or pipeline, is to project vertices represented in
three-dimensional virtual space onto a two-dimensional screen. In addition to
this, the API facilitates vertex transformation prior to the projection. For
example, the vertex shader can transform the vertex positions, compute
the lighting at each point and more. Since many of these transformations
are done using expensive matrix operations, special hardware (GPUs) is used
to accelerate the API calls.
When an application uses OpenGL ES functions, it calls the OpenGL ES
driver. The driver passes information (data structures and hardware control
register settings) to the GPU, if there is one [19] [16]. Section 2.3
explains how the Mali GPU uses this information to draw frames.
A brief explanation of some OpenGL ES concepts:
Primitives:
When OpenGL ES draws an object it constructs it by combining groups
of vertices into primitives[7]. The most common primitive is the triangle,
and there are several drawing modes that use the triangle as a primitive[4].
If an array contains three vertices, i.e. nine coordinates (3 × (x,y,z)), and
it is drawn with the GL_TRIANGLES mode, it will be drawn as a single triangle.
Adding three more vertices will add another triangle.
Triangle Strip:
A second triangle does not need three new vertices as above. It can be
drawn by adding only one more vertex and connecting it to two vertices of
the first triangle. The GL_TRIANGLE_STRIP mode exploits this fact and avoids
submitting the vertices shared between triangles multiple times[6][7].
However, to keep the concept of primitives, some vertices must still be
specified twice.
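The difference between the two drawing modes can be made concrete with a small Python sketch. It only models how vertex indices are grouped into triangles and makes no OpenGL ES calls; the helper names `strip_to_triangles` and `vertices_needed` are made up for this example.

```python
def strip_to_triangles(indices):
    """Expand a triangle strip into the individual triangles it describes."""
    triangles = []
    for i in range(len(indices) - 2):
        a, b, c = indices[i], indices[i + 1], indices[i + 2]
        # Every second triangle swaps its first two vertices so that all
        # triangles in the strip keep the same winding order.
        triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles


def vertices_needed(num_triangles, strip):
    """Vertices submitted for a run of connected triangles."""
    return num_triangles + 2 if strip else 3 * num_triangles


if __name__ == "__main__":
    print(strip_to_triangles([0, 1, 2, 3]))
    print(vertices_needed(100, strip=False), vertices_needed(100, strip=True))
```

For a long run of connected triangles the strip approaches one vertex per triangle, compared with three per triangle in GL_TRIANGLES mode.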
Vertex buffer object (VBO):
Vertex buffer objects allow OpenGL ES applications to allocate and cache
vertex data in high-performance graphics memory[6]. There are two types of
buffer objects: array buffer objects contain vertex data (the coordinates),
and element buffer objects describe the order in which to connect the
vertices to create an object. Element buffer objects are also called index
arrays, which is the term used throughout this thesis. All OpenGL ES vertices
in this thesis are stored in VBOs.
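To show what the two kinds of buffer hold, the following sketch builds plain Python lists with the same content an array buffer and an index array could have for a regular grid drawn as separate triangles. This is an illustration under stated assumptions: the function `grid_buffers`, the row-major layout and the choice of diagonal are hypothetical, and uploading the data to the GPU is not shown.

```python
def grid_buffers(width, height):
    """Build buffer contents for a width x height grid of vertices.

    vertex_data: flat list [x0, y0, z0, x1, y1, z1, ...] (array buffer)
    index_data:  vertex indices, three per triangle (index array)
    """
    vertex_data = []
    for y in range(height):
        for x in range(width):
            vertex_data += [float(x), float(y), 0.0]  # z is a placeholder

    index_data = []
    for y in range(height - 1):
        for x in range(width - 1):
            i = y * width + x  # first corner of this grid cell
            index_data += [i, i + width, i + 1]              # first triangle
            index_data += [i + 1, i + width, i + width + 1]  # second triangle
    return vertex_data, index_data


if __name__ == "__main__":
    vertices, indices = grid_buffers(3, 3)
    print(len(vertices) // 3, "vertices,", len(indices) // 3, "triangles")
```

Because the index array refers to vertices by number, each vertex is stored once in the array buffer even though it is shared by several triangles.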
2.3 The Mali-400 GPU
The Mali-400 MP GPU (Mali) is a hardware accelerator for 2D and 3D
graphics systems. The GPU implements a graphics pipeline supporting the
OpenGL ES and OpenVG APIs[16]. The main processing units in the GPU
are the geometry processor (GP) and the pixel processors (PP). The pipeline
is divided into two main jobs, a GP job and a PP job. Each job performs
specific parts of the pipeline, with the GP job doing the vertex shading and
the PP job doing the fragment shading.
Below is a brief explanation of how the Mali-400 GPU reads and processes
vertex data to draw geometry.
As described in the previous section, the OpenGL ES driver creates data
structures in memory for Mali and configures the hardware prior to each
scene (frame). Following this step, the Mali geometry processor transforms
each vertex with the instructions in a vertex shader program[16]. This vertex
shader program is written by the user and loaded in the OpenGL ES
application. The OpenGL ES driver compiles the program into a command list
for the Mali vertex shader.
The Mali vertex shader can have several input streams of vertex data, where
each stream is read from memory by the vertex loader component. Vertices
from each stream can be transformed, moved or combined in the 3D space.
The shader can also add lighting or change the perspective of the vertices.
After the vertex shader there are several more steps in the graphics pipeline.
However, they are not relevant for this thesis and will not be explained here.
See [16] for more information about each stage. See [6] or [7] for a more
thorough explanation of the shader language.
2.4 The AMBA AXI Protocol
The Advanced Extensible Interface (AXI) protocol is a part of the Advanced
Microcontroller Bus Architecture (AMBA) family. The AXI protocol is a
communication protocol suitable for high-performance, high-frequency
system designs[17].
The AXI protocol uses separate address/control and data phases. The
protocol supports unaligned transfers, burst transactions, multiple
outstanding addresses and out-of-order transactions[17]. Figure 1 shows the
bus structure of an example system using the AXI and APB protocols.
The vertex loader component in Mali uses the AXI protocol. It reads vertex
data in transactions of 32 bytes. The external AXI bus width from Mali is
128 bits. This means that the AXI requests from the vertex loader will result
in incrementing bursts with two 16-byte (128-bit) transfers in each burst. The
vertex loader is used to read data from the fractal generator.
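The burst arithmetic above can be checked with a short Python sketch. It assumes an aligned start address and a transaction size that is a multiple of the bus width; the function name `incr_burst` and the dictionary keys are illustrative, while the encodings follow the AXI convention that the burst length field holds the number of transfers minus one and the size field holds log2 of the bytes per transfer.

```python
def incr_burst(start_address, total_bytes=32, bus_width_bits=128):
    """Describe the incrementing burst needed to read total_bytes."""
    bytes_per_transfer = bus_width_bits // 8       # 128 bits -> 16 bytes
    transfers = total_bytes // bytes_per_transfer  # 32 / 16 -> 2 transfers
    return {
        "AxLEN": transfers - 1,                          # transfers minus one
        "AxSIZE": bytes_per_transfer.bit_length() - 1,   # log2(bytes per transfer)
        "transfer_addresses": [
            start_address + i * bytes_per_transfer for i in range(transfers)
        ],
    }


if __name__ == "__main__":
    print(incr_burst(0x1000))
```

For the default arguments this gives two transfers, at the start address and 16 bytes above it.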
2.5 The AMBA APB Protocol
The APB protocol is a communication protocol optimized for low power
consumption and reduced interface complexity. The APB protocol can either
be used with low-bandwidth peripherals that require less performance than
the AXI protocol provides, or it can be used to program control registers of
peripheral devices[14].
In Mali, the APB protocol is used to configure internal control registers.
Note that there is no common external APB bus on the GPU; each APB interface
is instead given a range of addresses on the AXI bus. Reads or writes to this
range are converted to 32-bit APB signals ahead of the APB interfaces (see
footnote 1).
2.6 The PL301
When several AXI slaves and one AXI master are connected to the same AXI
bus, there needs to be a way to determine the intended target of an AXI
request. This can be done by giving each slave a range of AXI addresses. ARM
has a component called PL301 that can be inserted onto the bus to perform
this type of address management. The address mapping in the PL301 can be
configured using the ARM software AMBA Designer. The software outputs
the complete Verilog code of the configured component.
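The address-range idea can be sketched in a few lines of Python. This is a behavioural illustration only, not the PL301 or its generated Verilog, and every name, base address and size in `ADDRESS_MAP` is hypothetical.

```python
# Hypothetical address map: (slave name, base address, size in bytes).
ADDRESS_MAP = [
    ("gp_apb", 0x0000, 0x1000),
    ("fractal_generator", 0x1000, 0x1000),
    ("memory", 0x8000_0000, 0x1000_0000),
]


def decode(address):
    """Return the name of the slave whose address range contains address."""
    for name, base, size in ADDRESS_MAP:
        if base <= address < base + size:
            return name
    return None  # no slave is mapped at this address


if __name__ == "__main__":
    for addr in (0x0800, 0x1000, 0x8000_0040, 0x4000):
        print(hex(addr), "->", decode(addr))
```

Each request is routed on its address alone, so the master does not need to know which slaves exist, only which address ranges to use.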
2.7 Hardware Veriﬁcation
This section explains the terms and techniques that will be used to verify
the functionality of the fractal generator in Section 5.7.
Simulation-based verification is the most commonly used verification
approach [12]. Simulation-based verification uses a test bench to apply input
stimuli to, and record output from, a design. The output from the design is
then compared to a reference output. Thus, simulation-based verification is
a form of verification by redundancy [12].
Directed testing: Using the directed test approach, a list of tests that each
concentrate on a set of specific features is obtained from the hardware
specification. The list of tests is then used as a verification plan[22]. Test
bench stimuli vectors are written manually to exercise each test and examine
the specific features in the device under test (DUT). Once the test works, it
is marked as successful in the verification plan, and the procedure is
repeated for the next test[22].
Constrained random testing: In a complex design, the directed testing
approach consumes a lot of time and resources. When the complexity doubles,
it takes twice as long, or twice as many people, to complete[22]. A faster
methodology is needed in order to obtain a high level of coverage.
Footnote 1: For example, AXI requests in the address range 0x0000-0x1000
could be routed to the APB interface on the geometry processor.
Constrained random testing(CRT) can help eliminate much of the manual
nature of directed testing[1]. The CRT approach adds a component to the
test bench which automatically generates random stimuli and applies it to
the DUT. The stimuli can be constrained to emphasize certain aspects of the
DUT, and to avoid illegal inputs to the design[12].
Coverage: When a test bench using CRT is is randomly testing the design
states of a DUT, two important questions are: What have been tested by
this stimuli? Have the test bench veriﬁed enough? [12][22] When using
either CRT or directed testing, the answer to these questions, the veriﬁcation
progress, can be gauged using coverage.
Functional coverage is a measure of which design features have been exercised
by the tests [22]. It measures the completeness and correctness of the
implementation of the functions obtained from the design specification [12].
It is closely tied to the design intent and is sometimes called specification
coverage.
Code coverage provides insight into how thoroughly the code of a design is
simulated by a test bench [12]. It is the easiest way to measure verification
progress [22] and can be measured automatically by simulator tools.
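Functional coverage can be thought of as a set of bins that the stimuli must hit. The sketch below is a minimal model under assumed values: the address window, the four equally sized bins and the function name are hypothetical and only illustrate how coverage is used to decide when random testing has verified enough.

```python
import random

def functional_coverage(addresses, window=0x1000, bins=4):
    """Return the fraction of equally sized address bins hit at least once."""
    bin_size = window // bins
    hit = {addr // bin_size for addr in addresses if 0 <= addr < window}
    return len(hit) / bins

rng = random.Random(7)                 # fixed seed for reproducibility
coverage = 0.0
applied = []
# Keep applying random stimuli until every bin has been exercised.
while coverage < 1.0:
    applied.append(rng.randrange(0, 0x1000, 16))
    coverage = functional_coverage(applied)
```

In a real test bench the bins would be derived from the features listed in the design specification rather than from address ranges alone.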
The Universal Verification Methodology
The Universal Verification Methodology (UVM) is a complete methodology that
codifies the best practices for efficient and exhaustive verification [21].
The UVM is implemented as a SystemVerilog class library, and consists of
several object-oriented, reusable UVM verification components (UVC).
The UVM and libraries provide:
• Infrastructure to partition the veriﬁcation environment into speciﬁc,
hierarchical and reusable components. The infrastructure also stream-
lines the creation of test bench environments.
• Built-in functions to perform common activities like printing, compar-
ing and packing items.
• Module- and system level stimulus generation, where data can be ran-
domized and conﬁgured according to the system state.
• Incorporating functional coverage and data checks using best-known
practices.
3 The Vertex Array Generator
The background theory states that objects drawn in OpenGL ES are passed
to the vertex shader as a stream of vertices. The vertex shader can then
perform transformations on the stream. Thus, in order for Mali to perform
transformations on the landscape created by the fractal generator, the
landscape vertices have to be provided to the vertex shader. The actual job
of the fractal generator (in hardware) can now be formulated more precisely.
It has to perform two specific tasks:
1. Calculate the number of iterations of each point in a given landscape
with the Mandelbrot set equation.
2. Provide the landscape to the Mali GP as a stream of vertices.
This chapter presents a hardware fractal generator from a previous master
thesis [4] and explains how it generates a stream of vertices. To avoid
confusion with the fractal generator design in later chapters, the fractal
generator in this chapter is from now on referred to as the vertex array
generator (VAG). The discussions in Chapter 4 will use the VAG as a reference
point.²
The Vertex Array Generator
The vertex array generator (VAG) implements the following algorithm to
calculate the Mandelbrot set.
1. Take as input an area of the Mandelbrot set.
2. Calculate the height(using equation 1) of all vertices in a single row of
the area, starting with the bottom row.
3. Store the row as a triangle strip of vertices(as explained in Chapter 2).
4. When the triangle strip is ready, signal Mali for retrieval.
5. Transmit the strip and go back to step two, calculating all the rows from
bottom to top until the entire area is calculated and transmitted as
triangle strips.
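The steps above can be sketched as a small behavioural model. Several details are assumptions made for this illustration: Equation 1 is taken to be the usual Mandelbrot iteration z = z² + c with an escape radius of 2, the iteration limit of 64 is arbitrary, a strip is formed by interleaving two adjacent rows, and handing a strip over to Mali is modelled by a Python generator. None of the names are taken from the VAG source code.

```python
def iterations(cx, cy, max_iter=64):
    """Escape-time count for c = cx + i*cy (assumed form of Equation 1)."""
    z = 0j
    c = complex(cx, cy)
    for n in range(1, max_iter + 1):
        z = z * z + c
        if abs(z) > 2.0:               # point has escaped, not in the set
            return n
    return max_iter                    # limit reached, treated as in the set

def vag_strips(left, bottom, step, numpoints, max_iter=64):
    """Yield one triangle strip per pair of adjacent rows, bottom to top."""
    def row(r):
        y = bottom + r * step
        return [(left + c * step, y, iterations(left + c * step, y, max_iter))
                for c in range(numpoints)]

    lower = row(0)                     # step 2 for the bottom row
    for r in range(1, numpoints):
        upper = row(r)                 # only the new row is calculated
        # Step 3: interleave the two rows into one strip of (x, y, z) vertices.
        strip = [vertex for pair in zip(lower, upper) for vertex in pair]
        yield strip                    # steps 4 and 5: hand the strip over
        lower = upper                  # reuse this row in the next strip

strips = list(vag_strips(left=-2.0, bottom=-1.0, step=0.5, numpoints=4))
```

Reusing `lower` between strips mirrors the way the VAG avoids recalculating the z-coordinates of a row that was part of the previous strip.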
The VAG is implemented in hardware by several components, as shown in
Figure 6. Each component and its role in the algorithm are described below.
² Chapter 4 explores different architectures that integrate a fractal generator with Mali.
Figure 6: The Architecture of the VAG. Taken from [4]
The control state machine (CSM) controls the execution of the algorithm. It
takes three values as input: left, bottom and step size. These values, together
with NUMPOINTS described below, define which area of the Mandelbrot
set to draw. Left and bottom are the coordinates of the bottom-left point
of the area, and step size is the distance between points. After receiving the
area, the CSM calculates the xy-coordinates of each vertex in the
current strip of the area. These coordinates are then fed to the fractal point
generators (FPGs).
The fractal point generator (FPG) does the calculations for each vertex in
step two of the algorithm. It takes the x- and y-coordinates of a point as
input, uses them as the real and imaginary parts of the point c in
Equation 1, and returns the z-coordinate, i.e. how many iterations it takes to
determine whether the point is in the Mandelbrot set. The VAG can contain
several FPGs to calculate z-coordinates in parallel.
The FPG arbiter decides which FPG gets to do the calculation for an
xy-coordinate from the CSM; it keeps track of which FPGs are busy and which
are free. The FPG arbiter is also responsible for storing results from the FPGs
in the Z memory.
All the vertices of the current triangle strip are stored in the Z memory. The
Z memory consists of two buffers with space for NUMPOINTS
vertex coordinates (x, y, z). NUMPOINTS is a constant parameter set ahead
of runtime, where the entire fractal has a resolution of NUMPOINTS ·
NUMPOINTS. The CSM keeps track of which buffer to store coordinates
in and avoids recalculating z-coordinates that were calculated for the previous
triangle strip.
All coordinates in the VAG are represented by 16-bit GLfloat values. This
data type was chosen instead of the 32-bit equivalent to save bandwidth when
transferring the vertices to Mali.
4 Architecture Exploration
This chapter will examine diﬀerent architectures that integrate the fractal
generator and Mali. The chapter will discuss the advantages and disadvan-
tages of each architecture, and ultimately use the discussion to choose a
fractal generator design for Verilog implementation.
4.1 The Vertex Array Generator as an AXI Slave
As described in the previous chapter, the vertex array generator (VAG)
outputs its vertex data as triangle strips. A frame is drawn strip by
strip from bottom to top. One possible way to integrate the vertex array
generator with Mali is to connect the VAG as a slave to the AXI bus from
the Mali L2 cache (Figure 7). Mali can then request strips one at a time
from the VAG with slightly modified AXI requests. The area of the
Mandelbrot set to draw from could be specified by extending the VAG with an
APB slave. This architecture is very memory efficient since all
the coordinates (x, y and z) are transferred directly, i.e. they are not stored
in memory ahead of Mali (although Mali will store the data later). There are,
however, many disadvantages to this integration.
Figure 7: The Vertex Array Generator as an AXI slave.
The main disadvantage of this architecture is that Mali has no say in the
order in which the vertices arrive: the AXI address has no effect, and Mali
has to read strip by strip. This is unacceptable, since the AXI bus from the
L2 cache does not always request data in the same order. Mali would have to
buffer all the vertices internally and wait for the correct vertex before
continuing. Thus, the design needs to support random access to the
z-coordinates.
Another disadvantage is that Mali would have to check for a new strip very
often and be constantly ready to receive data; otherwise the VAG would
stop when the Z memory became full. Polling like this would consume time from
the L2, and the VAG would be a bottleneck for the overall performance of
the system. An improvement would be to transfer data from the VAG to a
buffer connected to the bus, like a cache, and then transfer data in bursts.
Unfortunately, the vertex shader could still end up waiting a long time for a
coordinate if it was not included in the previous burst.
4.2 The Vertex Array Generator with DMA
Another way to integrate the vertex array generator is to make it
a master on the AXI bus and transfer data to the system RAM instead of to
Mali (Figure 8). With this solution Mali can request data from the RAM
whenever it chooses, and random access is supported once the VAG has
finished a frame.
Figure 8: The Vertex Array Generator as an AXI master(DMA).
However, the arrays would still arrive in order and, if Mali requests a vertex
that is not yet in the RAM, it could take a long time before the vertex is
ready. In addition, this would double the traffic on the RAM bus, since data
would be transferred to the RAM from the VAG, and then to Mali from the RAM.
Bus traffic is already a bottleneck in the Mali system(source) so this is a
large deterrent.
Another big disadvantage to using the VAG, in both the previous methods, is
that the VAG transfers all the coordinates of a vertex(x,y and z) via the tri-
angle strips. This is unnecessary since the xy-coordinates of the Mandelbrot
set are not needed to draw the landscape, they are only needed to calculate
the height at each point. Instead, a ﬂat landscape with static xy-coordinates,
stored in memory, can be used as a frame for all the fractal landscapes. When
drawing the landscape, the z-coordinates are read from the fractal generator
and the xy-coordinates are read from the static landscape. As long as the
total number of points is the same, and the z-coordinates are placed in the
correct order, the landscape will be correct.
Using static xy-coordinates is a big advantage since it greatly reduces the
traffic between the fractal generator and Mali. Thus, instead of being
transferred to Mali from the fractal generator, the xy-coordinates should be
stored using vertex buffer objects in OpenGL ES. In theory, this would enable
Mali to store the entire static landscape in cache. In addition, the
bandwidth saved by the static xy-coordinates can be used to change the
representation of the coordinates from 16 to 32 bits. This will allow the
fractal generator to zoom in further on the Mandelbrot set before losing
detail to rounding of the decimal numbers.
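The bandwidth argument can be made concrete with a short calculation. The sketch below assumes NUMPOINTS = 128 (the value recommended in Section 5.1) and counts only payload bytes per frame; it ignores bus protocol overhead and any vertices that are repeated between consecutive triangle strips, so the numbers are indicative only.

```python
def frame_bytes(numpoints, coords_per_vertex, bits_per_coord):
    """Payload bytes transferred per frame from the generator to Mali."""
    return numpoints * numpoints * coords_per_vertex * bits_per_coord // 8

NUMPOINTS = 128
xyz_16bit = frame_bytes(NUMPOINTS, 3, 16)   # VAG: x, y and z as 16-bit values
z_32bit = frame_bytes(NUMPOINTS, 1, 32)     # static xy, only z as 32-bit values
```

Under these assumptions the VAG would transfer 98304 bytes per frame, while sending only z-coordinates at 32 bits requires 65536 bytes, so the wider representation still fits within the bandwidth previously used.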
One last disadvantage for the VAG is that the VAG forces Mali to use
the GL_TRIANGLE_STRIP method when drawing the landscape with
OpenGL ES. This is mostly a theoretical disadvantage, since triangle strips
are good for landscapes and one would rarely wish to use another drawing
method.
4.3 Fractal Generator With Cache
As discussed in the above sections, there are several disadvantages when
connecting the vertex array generator with Mali. This section will present a
new fractal generator that does not use the VAG. The new fractal generator
will use the discussions above as a basis. It will reuse the good elements of
the VAG and redesign the bad.
The new design will use the fractal point generators from the VAG to
calculate the z-coordinates. No disadvantages have been connected to this
component. It will also use the same method as the VAG to describe an area of
the Mandelbrot set, except that it will use the xy-coordinates of the top-left
corner, instead of the bottom-left corner, together with the step size between
points. Using the top-left corner seems a more intuitive way to define an
area, but this is mostly personal preference.
The initial xy-coordinates will be referred to as x0 and y0 throughout the
thesis. The rest of the fractal generator design is new and is made to avoid
the disadvantages discussed above.
The new fractal generator is connected to the Mali AXI bus as an AXI slave.
The fractal generator consists of four main components described below. The
components are connected, and draw a frame, as described in Figure 9.
• The APB interface: This interface allows an APB master to conﬁgure
hardware registers inside the fractal generator. The registers decide
which area of the Mandelbrot set the z-coordinates will be calculated
from.
• The coordinate cache: A cache where the z-coordinates are stored as
they are calculated. The cache is built from several RAMs joined
together.
• The AXI interface: This component is responsible for answering AXI
requests. Based on the address of a request, the AXI interface tries
to read the corresponding z-coordinates from the coordinate cache. If
the coordinates are not there, the AXI interface signals the arbiter to
calculate them. After reading the coordinates they are transmitted on
the AXI bus.
• The arbiter: This module uses the values of the APB registers to cal-
culate the x- and y-coordinates of all the vertices in the chosen area of
the Mandelbrot set. These coordinates are then arbitrated to a param-
eterized number of fractal point generators[4]. The FPGs calculate the
number of iterations for each point and the resulting z-coordinates are
stored in the coordinate cache.
The landscapes that the fractal generator generates consist of a static number
of vertices. The number is set by a parameter, NUMPOINTS, and is
equal to NUMPOINTS^2. Another parameterized value in the design is
the number of FPG units inside the arbiter; this value decides how many
z-coordinates can be calculated in parallel. These parameters are the
same as used in the VAG [4] and are presented further in Section 5.1.
This architecture has several advantages when compared to the ones using
the VAG. Most importantly it supports random access of vertices by AXI
addressing. Furthermore, continually calculating coordinates and using a
cache means that Mali can request coordinates when it wants to, without
aﬀecting the performance of the fractal generator. No polling is required. In
addition to this, the architecture saves bandwidth since it only transfers z-
coordinates, and since it doesn't use hardware triangle strips it is not limited
to using GL_TRIANGLE_STRIP in OpenGL ES.
Figure 9: Structure of the Fractal Generator and how it draws a frame.
This architecture combats all of the disadvantages seen when using the VAG.
It has been selected for implementation in Verilog. A detailed description of
the design is presented in the next chapter.
5 Fractal Generator Design
The previous chapter chose a hardware architecture for the fractal generator
and explained its general functionality(Section 4.3). This chapter describes
the functionality of the fractal generator in detail by presenting the frac-
tal generator's submodules. In addition to this, the chapter discusses some
alternative implementations of each module and explains why the chosen
implementation has been selected.
Section 5.1 presents the conﬁguration parameters of the fractal generator and
the data types used to represent coordinates used by the fractal generator.
Section 4.3 states that the AXI interface tries to read z-coordinates from the
cache based on the received AXI address. This requires a predeﬁned relation-
ship between AXI addresses and each vertex(x,y,z) in the fractal landscape.
A predeﬁned relationship enables Mali to request the z-coordinates of speciﬁc
xy-pairs by setting the current AXI address to the address that is mapped
to the wanted coordinate.
Section 5.2 explains how the fractal generator maps the vertices internally
and how to translate from an AXI address to a given vertex. The actual
translation is done by the AXI interface.
The following sections explain the functionality and Verilog implementation
of each submodule of the fractal generator(Figure 9).
5.1 Conﬁguration Parameters and Data Types
This section describes the three conﬁgurable parameters in the fractal gen-
erator. It also describes the data types used by the fractal generator.
NUMPOINTS This parameter sets the total number of points in the fractal
made by the fractal generator: the fractal generator creates a
square fractal of NUMPOINTS · NUMPOINTS z-coordinates. This
parameter, together with the APB configuration registers, decides which
area of the Mandelbrot set to cover with the fractal. The NUMPOINTS
parameter is limited to powers of two; this is done to reduce
calculations with the parameter to bit-shift operations.
The current OpenGL ES demo uses one index array to describe a
fractal. Since the index array is limited by the data type unsigned short,
the maximum value of NUMPOINTS with this implementation is
128.³ Unless drawing very small landscapes, 128 is recommended as
the standard value for this parameter. The resolution of the final
landscape is further discussed in Section 7.3.1.
NUMUNITS This is the number of FPGs in the system. The parameter
configures the number of z-coordinates that can be calculated in
parallel. Increasing this parameter will increase the performance of the
fractal generator at the cost of area. Chapter 8 examines the
performance of the fractal generator with varying NUMUNITS.
MAX_ITERATIONS This is the iteration limit the FPGs use when
determining whether a point is in the Mandelbrot set. If the iteration count
reaches this limit, the point is defined as part of the Mandelbrot set.
Lowering this number will increase the performance of the fractal
generator, since each point will be finished faster by the FPGs. However,
it will also reduce the detail of the produced image, since the maximum
height is limited. The points that would otherwise have reached higher
iteration counts will all have the same height and color. Chapter 8 will show
an example of this.
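The effect of MAX_ITERATIONS on the heights can be illustrated with a small software model. The sketch assumes that Equation 1 is the usual Mandelbrot iteration z = z² + c with an escape radius of 2; the grid size, corner coordinates and step size are arbitrary example values, not settings used in the thesis.

```python
def iterations(cx, cy, max_iterations):
    """Escape-time count, capped at max_iterations (assumed form of Equation 1)."""
    z = 0j
    c = complex(cx, cy)
    for n in range(1, max_iterations + 1):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iterations              # limit reached: counted as in the set

def heights(max_iterations, numpoints=16, x0=-2.0, y0=-1.25, step=0.15625):
    """Heights of a numpoints x numpoints landscape in row-major order."""
    return [iterations(x0 + col * step, y0 + row * step, max_iterations)
            for row in range(numpoints) for col in range(numpoints)]

low = heights(8)     # fast, but every height above 8 is flattened to 8
high = heights(64)   # slower, more distinct heights
```

In this model each height obtained with the lower limit equals the corresponding height obtained with the higher limit clamped to 8, which is the loss of detail described above.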
Data Types
As mentioned in Section 4.2, the number of bits used to
represent coordinates in the Mandelbrot set affects the maximum zoom level
before detail is lost. This loss of detail is caused by rounding of decimal
numbers when they reach the limits of their data types. Increasing the size of
the data types, e.g. from 16-bit to 32-bit floating point (fp32) values,
increases the detail of the landscape at the cost of bandwidth and storage.⁴
Since many interesting aspects of the Mandelbrot set appear at large levels of
zoom, it was decided to use fp32 values when calculating on the Mandelbrot set
internally.
This means that the inputs to the APB interface are 32 bits each for the
initial coordinate (x0, y0) and step size. In addition, the internal
calculations on the fractal landscape are done with fp32 operations. Since
fp32 calculations are done in one cycle, the internal performance is not
affected by choosing this data type, but storing the temporary coordinates
takes more space. See Section 5.5 for more information about the calculations.
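The loss of detail at high zoom levels can be demonstrated by rounding neighbouring x-coordinates to each format. The sketch below uses Python's struct module to round to IEEE 754 binary16 and binary32; the starting coordinate and step size are example values chosen for illustration, and only the final coordinate is rounded, not the intermediate multiplication.

```python
import struct

def round_to(fmt, value):
    """Round a Python float to binary16 ('<e') or binary32 ('<f')."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

def distinct_columns(fmt, x0, step, columns=8):
    """Count how many neighbouring x-coordinates remain distinct after rounding."""
    return len({round_to(fmt, x0 + step * col) for col in range(columns)})

fp16_distinct = distinct_columns('<e', x0=-0.75, step=1e-4)
fp32_distinct = distinct_columns('<f', x0=-0.75, step=1e-4)
```

With this step size the eight 32-bit coordinates stay distinct, while several of the 16-bit coordinates collapse onto the same value, so neighbouring vertices would receive identical heights.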
The output of the fractal generator consists of the z-coordinates of the
landscape. Since the z-coordinates are the number of iterations in the
Mandelbrot set equation, they can be represented as integers without loss of
detail. Thus, the z-coordinates are stored as integers internally.
³ Since the vertex array with NUMPOINTS = 256 would have indexes above the
range of unsigned short.
⁴ The thesis uses the IEEE754 binary32 standard to represent numbers in fp32.
However, the programming language used to write the vertex shader and
fragment shader (Section 7.1) does not allow a stream of integers as input [11,
p. 31]. It only allows floating point numbers. Thus, the z-coordinates are
converted to fp32 on the output of the cache. It would be better to convert
the coordinates to fp16, but since the limitation of the shader language was
discovered late in the design, and ARM had finished unsigned integer to fp32
converters, fp32 was chosen instead. This causes a slight performance loss
since reading data requires twice as many bursts. Since it does not affect the
internal performance, and the internal performance of the fractal generator
will be the bottleneck in the system,⁵ converting to fp16 is left as future
work.
5.2 Relationship Between AXI address and Coordinates
of a Fractal
As explained above there needs to be a predetermined method of translation
between an AXI address and a given fractal vertex, this section explains how
the translation is performed.
Internally, the fractal generator maps the z-coordinates to the address range
0 to NUMPOINTS^2 − 1 (Figure 10). The first row of the fractal is given
addresses 0 to NUMPOINTS − 1, with increasing address from left to
right. There is a total of NUMPOINTS rows, where each row is divided
into addresses like the first. Thus, the address range of row n is given by the
equation below.
\[
\text{row } n \text{ address range} = [\, n \cdot \mathit{NUMPOINTS},\; n \cdot \mathit{NUMPOINTS} + \mathit{NUMPOINTS} - 1 \,], \quad 0 \le n \le \mathit{NUMPOINTS} - 1 \tag{2}
\]
This address setup enables easy translation from an address to a specific
fractal vertex: the row of the vertex is found by address / NUMPOINTS
(integer division) and the column is found by address mod NUMPOINTS. Using
the parameters set via the APB interface, the column and row can be used to
calculate the exact x- and y-coordinates (Section 5.5), which can then be used
to calculate the z-coordinates with the Mandelbrot equation (Equation 1). To
ensure fast address translation in hardware, NUMPOINTS is limited to be a
power of two, which reduces these calculations to shift operations. See the
arbiter source code for the implementation, and see Section 5.1 for setting
the NUMPOINTS parameter.
⁵ Mali only has to read data while the fractal generator is calculating.
Figure 10: Internal and external AXI address mapping of the vertices in a
3x3 fractal.
Externally, the AXI bus uses byte addressing and the fractal generator is
mapped to a specific address range on the bus. Otherwise the mapping
is the same as the internal one; coordinates are address mapped in row-major
order (Figure 10).⁶
The AXI interface is the only module that sees the external requests from
Mali, and it has to convert the addresses to the internal format before
communicating with the other modules. The conversion is done by ignoring
the leftmost bits of the AXI address and dividing the resulting number by
four (by shift). The leftmost bits can be removed since they are used to select
components on the AXI bus, and this selection is performed ahead of the
AXI interface. The division by four is needed since each coordinate occupies
four bytes.
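The two translation steps can be sketched in a few lines. In this sketch the size of the address window, and therefore the mask used to drop the leftmost bits, is an assumption (one four-byte word per vertex), as are the constant and function names and the base address in the usage note; the actual bit widths are defined in the Verilog source.

```python
NUMPOINTS = 128                           # must be a power of two
COL_BITS = NUMPOINTS.bit_length() - 1     # log2(NUMPOINTS)
BYTES_PER_COORD = 4                       # one fp32 z-coordinate
# Assumed window size: one word per vertex, so the mask keeps the low bits.
WINDOW_MASK = NUMPOINTS * NUMPOINTS * BYTES_PER_COORD - 1

def to_internal(axi_address):
    """Drop the component-select bits, then divide by four with a shift."""
    return (axi_address & WINDOW_MASK) >> 2

def row_and_column(internal_address):
    """address / NUMPOINTS and address mod NUMPOINTS as shift and mask."""
    return internal_address >> COL_BITS, internal_address & (NUMPOINTS - 1)
```

For example, with a hypothetical base address of 0x40000000, the byte address of the vertex in row 5, column 3 is 0x40000000 + 4 · (5 · 128 + 3); `to_internal` maps it to internal address 643, and `row_and_column(643)` returns (5, 3).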
5.2.1 Discussion
Why not use the external mapping internally and remove the conversion?
Using the external mapping as the internal mapping and removing the
conversion would simplify the AXI interface. However, this would complicate
several of the internal calculations in the fractal generator. For instance, the
arbiter uses the address to calculate the row and column of the xy-coordinate
in the Mandelbrot set (Equation 1 on page 4). To calculate this with the
external mapping, one would have to both remove the leftmost bits and divide
by four (to find the column). This operation is identical to the address
translation, so nothing is saved.
⁶ The mappings are not equal by coincidence; the external mapping is set by the index
array in the OpenGL ES demo.
Another example is the coordinate cache, which uses the address to
determine which RAMs to read from and write to. To properly determine the
RAM column, the address would have to be converted in this component as
well. Thus, it was chosen to separate the internal and external mappings.
No other internal mappings have been considered since no drawbacks have
been connected to the current one. One could switch to column major ad-
dressing in both hardware and the demo, but the performance would be the
same.
5.3 The APB Interface
The APB interface enables an APB master, e.g. the Mali driver, to configure
hardware registers in the fractal generator. The registers control which area
of the Mandelbrot set is used by the fractal generator when drawing a
frame. Changing the contents of these registers thus changes the landscape
drawn by the fractal generator.
5.3.1 Verilog Implementation
The APB interface is implemented in Verilog as a ﬁnite state machine. When
the interface detects an incoming APB write, it stores the incoming data in
one of the following registers. The destination register is determined by the
APB address.
• The x0-coordinate register, which determines the x-coordinate, in the
Mandelbrot set, of the top-left point of the fractal.
• The y0-coordinate register, which determines the y-coordinate, in the
Mandelbrot set, of the top-left point of the fractal.
• The step size register, which determines the distance between each
point in the chosen area of the Mandelbrot set. The step size value
determines the zoom-level of the area to be drawn from.
After storing data in one of the registers, the interface signals to the APB
master that data has been received and that it is ready to receive new data.
In addition to the above registers, the APB interface also provides read access
to a couple of debug registers. These registers provide information from the
arbiter and the AXI interface and can be read via APB. These registers are
not in use during normal operation, but are useful for debugging purposes.
For example, the Mali driver can read from these registers to conﬁrm that
the fractal generator has been conﬁgured properly and that it has started its
calculations.
5.3.2 Discussion
The APB interface is a very lightweight interface; it only uses a small subset
of the APB protocol to communicate. Since it is so small, does no computation,
and no disadvantages have been observed, no alternative APB interfaces have
been considered.
5.4 The AXI Interface
The AXI interface module is defined as an AXI slave device; it cannot initiate
AXI transfers on the bus [17]. Its main responsibility is to wait for an AXI
request from Mali and respond with the z-coordinates that correspond to the
AXI address of the request. The coordinates are read from the coordinate
cache described in Section 5.6 and transmitted on the bus in two bursts
with four coordinates (128 bits) in each burst. Note that an AXI request only
provides the address of the first coordinate in the transfer; the AXI interface
has to read and transmit the subsequent coordinates automatically, based on
the initial AXI address and the burst and transfer length signals.⁷
5.4.1 Verilog Implementation
The AXI interface is implemented as a finite state machine. When there
is an AXI request, the AXI interface checks if the first four coordinates of
the transfer are in the coordinate cache. If they are ready, the AXI interface
checks the next four. If they are not, the AXI interface
interrupts the arbiter with the address of the first missing coordinate. The
arbiter will then prioritize the request from the AXI interface and calculate
the missing coordinates. The AXI interface proceeds to wait for the
coordinates to be ready. After all eight coordinates are ready, the AXI
interface checks whether Mali is ready to receive the burst. If Mali is ready,
the coordinates are sent in two cycles, four coordinates each cycle. After
transmission, the AXI interface goes back to the waiting state and waits for a
new request.
⁷ The AXI interface only supports transfers with one or two bursts, which equals four
or eight coordinates. The Mali vertex loader will never ask for more than two bursts, so
no more is needed.
5.4.2 Discussion
The AXI part of the AXI interface is optimized for communication with the
vertex loader in Mali; like the APB interface, it uses the bare minimum of
signals to communicate. No other implementations have been considered for
this part. On the other hand, several implementations have been considered
for reading coordinates from the coordinate cache and for checking
whether a coordinate is valid (i.e. calculated by the arbiter and not belonging
to an old frame). Currently, the coordinates in a burst are read four at a
time, and if one of them is not ready, the address of the first one is given to
the arbiter. It would also be possible to check the coordinates one by one,
and only give the address of the missing one to the arbiter, but this would
require more cycles to check. Also, since the arbiter calculates coordinates
sequentially, it is likely that one missing coordinate in a burst means that
the other coordinates are also missing. In this case, checking one
coordinate at a time would waste a lot of time compared to the chosen solution,
since the arbiter would have to be interrupted several times in a row. See the
discussion in the coordinate cache section for details on the validity check.
5.5 The Arbiter
The primary function of the arbiter⁸ is to calculate all the z-coordinate
values of a fractal and store them in the coordinate cache. As displayed
in Figure 11, the arbiter consists of several sub-components: a feeder, the
fractal point generators (FPGs) and a coordinate storer. The number of FPGs
is parameterized and determines the number of z-coordinates that can be
calculated in parallel. The functionality of each component is explained
below.
⁸ This is not a descriptive name, since the module does more than arbitration, but good
names are hard to find.
The job of the feeder is to feed x- and y-coordinates to the FPGs; the FPGs
will then calculate the z-coordinate of the given vertex. The feeder is given
the initial coordinates and step size of the fractal landscape from the APB
interface; once the values have been received, the feeder starts calculating
the xy-coordinates of the vertices. When an xy-coordinate is ready, it is fed
to the first available FPG.
The feeder assumes that Mali will request the coordinates in incremental
order, based on AXI addresses as specified in Section 5.2, so it calculates
coordinates in row-major order until the fractal is complete or there is an
interrupt from the AXI interface. An interrupt means that the AXI interface
has just tried to read missing coordinates from the coordinate cache. These
coordinates must be given priority since Mali is waiting for them. The feeder,
after finishing the current coordinate, handles the interrupt by calculating
the coordinates of the data burst that belongs to the address from the AXI
interface.⁹ Once the interrupt has been handled, the feeder continues to
calculate from the position at which it was interrupted.
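The scheduling behaviour of the feeder can be modelled as follows. This is a simplified sketch with several assumptions: time is measured in the number of coordinates issued so far, the burst length is a parameter, and coordinates that were already calculated while serving an interrupt are skipped when the feeder resumes. The function name and the form of the `interrupts` argument are hypothetical.

```python
def feeder_order(numpoints, interrupts, burst_len=8):
    """Return the order in which internal addresses are handed to the FPGs.

    `interrupts` maps "number of coordinates issued so far" to the internal
    address reported by the AXI interface at that moment.
    """
    total = numpoints * numpoints
    pending = dict(interrupts)         # copy so the caller's dict is untouched
    done = set()
    order = []
    position = 0                       # next address in row-major order

    def issue(address):
        if address < total and address not in done:
            done.add(address)
            order.append(address)

    while len(order) < total:
        request = pending.pop(len(order), None)
        if request is not None:
            # Interrupt: calculate the burst Mali is waiting for first.
            for address in range(request, request + burst_len):
                issue(address)
        else:
            # Normal operation: continue from the saved position.
            while position in done:
                position += 1
            issue(position)
    return order
```

For a 4x4 fractal with a single interrupt for address 8 after two coordinates and a burst length of four, the model returns 0, 1, 8, 9, 10, 11 and then resumes with 2 to 7 and 12 to 15.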
The FPG is taken from the work in [4] and is explained in Chapter 3. It
uses Equation 1 to calculate the z-coordinate of a vertex, using the x- and
y-coordinate of the vertex as input.
The coordinate storer reads ﬁnished z-coordinates from the FPGs and stores
them in the coordinate cache.
5.5.1 Verilog Implementation
The feeder is implemented as a state machine: At the initial state it checks
for an interrupt from the AXI Interface. As explained above, the status of
the interrupt decides which pair of xy-coordinates to calculate next. In the
case of an interrupt, the feeder keeps track of its current position with an
internal address register.
⁹ The address given to the arbiter from the AXI interface is in the internal format
described in Section 5.2. The arbiter converts this address into the corresponding
xy-coordinate.
Calculation of coordinates is done according to the steps below. Note that
the feeder uses two memories to store finished xy-coordinates; this is done
to save calculations, as the coordinates are used NUMPOINTS times in the
calculations of a frame.
1. The coordinates of the top-left vertex (x0, y0) and the step size between
vertices of the fractal are read from the APB interface.
2. The row and column of the vertex (x, y, z) are found by translating the
vertex address as described in Section 5.2.
3. The feeder checks if the xy-coordinates are in memory. If they are, they
have already been calculated and are fed straight to an available FPG,
skipping the next steps.
4. If the xy-coordinates are not in memory they are calculated:
y = y0 + (stepsize ∗ rownumber)
x = x0 + (stepsize ∗ columnnumber)
This step requires one 32bit ﬂoating-point adder and one 32bit ﬂoating-
point multiplier. In addition, it requires a 32bit ﬁxed- to ﬂoating-
point converter to convert the row and column numbers. All these
components are provided by ARM libraries.
5. Finally, the coordinates are fed to an FPG and stored in memory for
reuse at a later time.
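The steps above can be sketched as a small software model. The class and attribute names are hypothetical, the two memories are assumed to hold one x-coordinate per column and one y-coordinate per row, and Python floats stand in for the fp32 hardware units, so the numerical results are only indicative.

```python
class Feeder:
    """Behavioural model of the xy-coordinate calculation with reuse."""

    def __init__(self, x0, y0, step, numpoints):
        # Step 1: values that would be read from the APB interface.
        self.x0, self.y0, self.step = x0, y0, step
        self.numpoints = numpoints               # power of two
        self.x_mem = [None] * numpoints          # one x-coordinate per column
        self.y_mem = [None] * numpoints          # one y-coordinate per row
        self.computed = 0                        # multiply-add evaluations

    def xy(self, address):
        # Step 2: translate the internal address to row and column.
        shift = self.numpoints.bit_length() - 1
        row, col = address >> shift, address & (self.numpoints - 1)
        # Steps 3 and 4: calculate only the values not yet in memory.
        if self.x_mem[col] is None:
            self.x_mem[col] = self.x0 + self.step * col
            self.computed += 1
        if self.y_mem[row] is None:
            self.y_mem[row] = self.y0 + self.step * row
            self.computed += 1
        # Step 5: the pair is now ready to be fed to an FPG.
        return self.x_mem[col], self.y_mem[row]
```

For a 4x4 fractal, requesting all 16 addresses in this model evaluates the multiply-add only eight times (four columns and four rows), which is the saving that motivates the two memories.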
The FPGs are taken straight from [4] so the implementation details are not
presented here. See Chapter 3.
The coordinate storer is implemented as a state machine. It continuously
checks if any of the FPGs are ﬁnished with a coordinate. When an FPG
ﬁnishes, its z-coordinate is written to the z-coordinate's address in the coor-
dinate cache.
5.5.2 Discussion
The arbiter has many design options, especially regarding the calculation
and storing of xy-coordinates. Currently it calculates the xy-coordinates and
then stores them in a memory. Another option is to not have memories; this
would save area at the cost of performance. Since performance is the most
important criterion for the fractal generator, memories are used.
Currently the xy-coordinates are calculated when they are used for the ﬁrst
time. An option that has been considered is to calculate the xy-coordinates
in parallel with the regular operation of the Feeder. This would improve per-
formance since the xy-coordinates would be more likely to be in the memories
when needed. However, implementing this would make the design more complex,
and it would only have a performance impact once for each vertex (the
first time an x- or y-coordinate is seen). Thus, it has not been chosen for
implementation.
5.6 The Coordinate Cache
The coordinate cache stores the computed z-coordinates of the fractal sur-
face. It has two address inputs; one from the arbiter and one from the AXI
interface. The address from the arbiter is used to write a single coordinate
to memory, and the address from the AXI interface is used to read a burst
of four coordinates at once, starting from the coordinate at the AXI inter-
face address. When reading a burst, the coordinate cache indicates if all the
coordinates in a burst exist in memory. This information is used by the AXI
interface to interrupt the arbiter when coordinates are missing.
Upon reset, all the data in the coordinate cache is cleared to zero by the
AXI interface. The data is cleared in bursts of four coordinates each cycle.
Since the Mandelbrot equation can never result in zero iterations, a zero in
memory indicates that the coordinate at that address is missing. In addition
to the global reset, the fractal generator's reset is connected to the reset of
the Mali geometry processor(GP). The GP is reset after each completed GP
job; since the fractal generator only provides data to the GP, this can be used
to clear the coordinate cache as well. In fact, since the clear takes much less
time than the PP job following the GP job, this clear is completely without
performance loss.
5.6.1 Verilog Implementation
The coordinate cache consists of four separate memory blocks with com-
mon input and output signals(Figure 12). The separate memory blocks
are written in Verilog to infer as block RAMs when synthesizing on a Xil-
inx FPGA. Each memory has space for one z-coordinate in width, and
NUMPOINTS ∗ NUMPOINTS/4 coordinates in depth. Since the co-
ordinate cache is organized in rows of 4 coordinates it needs to do some
math to translate between address and z-coordinate position. All RAMs are
connected to the same enable and address(depth) bus. When reading and
writing from the RAMs the buses are set according to the following equation.
Write:      enable[ARBITER_ADDRESS % 4] <= 1;
            address <= (ARBITER_ADDRESS >> 2);
Read burst: enable[3:0] <= 4'hf;
            address <= (AXI_ADDRESS >> 2);                          (3)
The NUMPOINTS parameter must be restricted to be a multiple of 4 if
the addresses are shifted as above.
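The addressing in Equation 3 can be illustrated with a small software model.
This is a sketch only: the type coord_cache, the function names and the
32-bit element width are assumptions, and the model ignores timing. It shows
how the two low address bits select a RAM, how the remaining bits select the
depth, and how a zero marks a missing coordinate. As in Equation 3, a burst
is assumed to start at an address that is a multiple of 4.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUMPOINTS  128                          /* must be a multiple of 4 */
#define BANK_DEPTH (NUMPOINTS * NUMPOINTS / 4)

/* Hypothetical model of the four memory blocks. */
typedef struct {
    uint32_t bank[4][BANK_DEPTH];   /* element width is an assumption */
} coord_cache;

/* Reset: every entry becomes 0, which means "coordinate missing". */
void cache_clear(coord_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Write: enable[ARBITER_ADDRESS % 4], address = ARBITER_ADDRESS >> 2. */
void cache_write(coord_cache *c, uint32_t arbiter_address, uint32_t z)
{
    c->bank[arbiter_address % 4][arbiter_address >> 2] = z;
}

/* Read burst: all four RAMs enabled, address = AXI_ADDRESS >> 2.
 * Returns true only if all four coordinates exist in memory. */
bool cache_read_burst(const coord_cache *c, uint32_t axi_address,
                      uint32_t out[4])
{
    uint32_t depth = axi_address >> 2;
    bool all_valid = true;
    for (int b = 0; b < 4; b++) {
        out[b] = c->bank[b][depth];
        if (out[b] == 0) {
            all_valid = false;
        }
    }
    return all_valid;
}
```

A false return value corresponds to the case where the AXI interface has to
interrupt the arbiter because coordinates in the burst are missing.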
5.6.2 Discussion
The main advantage of organizing the coordinate cache with 4 separate
RAMs is that both reading a burst of coordinates and checking if the
coordinates are valid can be done in two clock cycles. This enables the AXI
interface to respond very quickly to Mali, leading to a higher frame rate if the
coordinates needed are in the cache. The main disadvantage of this method
is the translation needed from address to coordinate when reading and writ-
ing. These operations could potentially reduce the maximum frequency of
the system.
An easier implementation would be to have one big memory block with a
depth of NUMPOINTS^2 coordinates. Reading a burst from this RAM
would take 5 cycles, and if the last coordinate was missing, the cycles would
have been wasted since the arbiter would get interrupted later. Since
performance is the main requirement of the fractal generator, the first
option was chosen.
Determining if data is valid in the cache can be done in many ways. Initially
the cache used a valid bit for each coordinate; when a coordinate was read
or written the bit got set to 0 or 1 respectively. As with the chosen solution,
this requires an initial reset to get the cache in a known state. Additionally
it requires an extra storage bit and a little extra logic in the memory blocks
compared to the chosen solution.
Since clearing the whole cache between frames doesn't aﬀect performance,
except at the ﬁrst frame, clearing the cache was chosen. Performance might
be lost at the ﬁrst frame, since Mali might request data while the clear is
ongoing. For all other frames the cache is cleared during a PP job. 10
10 This consideration did not include power consumption. Since the clearing
method requires more switching of bits, it probably uses more power overall.
Cache Size
Currently the cache stores the entire NUMPOINTS^2 fractal. Since Mali
requests vertices in a fairly predictable manner according to the OpenGL ES
index array, the memory size can be reduced without much performance loss.
For example, a cache with ten rows and NUMPOINTS
columns could ﬁrst calculate the ﬁrst 10 rows of the fractal. As Mali began
reading from the rows, the data would be continuously replaced with the
fractal data from subsequent rows. However, this would require a more ad-
vanced addressing scheme. The fractal generator would need to keep track
of the rows currently in the cache, and know which external addresses they
belong to.
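One way to picture the more advanced addressing scheme is a window of rows
that slides down the fractal. The sketch below is not part of the thesis
design; it assumes that vertices are indexed row by row, that the cache holds
CACHE_ROWS consecutive rows, and that a row is stored in the slot given by
its row number modulo CACHE_ROWS. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUMPOINTS  128
#define CACHE_ROWS 10   /* the "ten rows" from the example above */

/* Hypothetical bookkeeping: which fractal rows are currently cached. */
typedef struct {
    uint32_t first_row;   /* lowest fractal row currently in the cache */
} row_window;

bool window_contains(const row_window *w, uint32_t row)
{
    return row >= w->first_row && row < w->first_row + CACHE_ROWS;
}

/* Translate a vertex index of the full fractal into an index in the
 * reduced cache. Returns false if the row is not in the cache. */
bool window_lookup(const row_window *w, uint32_t vertex_index,
                   uint32_t *cache_index)
{
    uint32_t row = vertex_index / NUMPOINTS;
    uint32_t col = vertex_index % NUMPOINTS;
    if (!window_contains(w, row)) {
        return false;
    }
    *cache_index = (row % CACHE_ROWS) * NUMPOINTS + col;
    return true;
}

/* Retire the oldest row; its slot is reused for row first_row + CACHE_ROWS. */
void window_advance(row_window *w)
{
    if (w->first_row + CACHE_ROWS < NUMPOINTS) {
        w->first_row++;
    }
}
```

With this layout the replaced row always lands in the slot of the row that
was retired, so no data has to be moved when the window advances.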
Reducing and optimizing the cache size has been left for future work. Using
the recommended parameters, the cache stores 128 · 128 coordinates which
equals 16kB of RAM.
Figure 11: The internal components of the Arbiter and their state descrip-
tions.
Figure 12: The structure of the Coordinate Cache.
5.7 Fractal Generator Veriﬁcation
This section will use the Universal Veriﬁcation Methodology to create a ver-
iﬁcation environment and perform tests to ensure that the fractal generator
is working as intended. This is done to limit the sources of error when inte-
grating the fractal generator with Mali in later chapters.
The Universal Verification Methodology is an emerging standard of
verification. Its biggest strength arguably lies in a large amount of code
reuse and flexibility. However, setting up the first framework for tests
requires quite a bit of code, so it is not optimal when doing a single
verification like in this thesis. Ultimately, the methodology was chosen
mostly for educational purposes.
A single veriﬁcation environment will be used to verify the fractal generator.
I.e., the framework will only connect to the top-level module of the generator.
In an industry situation, each subcomponent would have its own tests and
framework. The sub-tests would get inherited by the top-level framework,
ensuring correctness at all levels.
The framework uses a combination of directed and constrained-random test-
ing.
5.7.1 The Veriﬁcation Framework
The veriﬁcation framework is shown in Figure 13. The list below brieﬂy
explains the functionality of each of the components. The full source code
can be found in the appendix.
APB Agent The APB agent is connected to the APB interface of the fractal
generator. It consists of a sequence, sequencer, driver and monitor. The
sequencer provides APB items from the sequence to the driver. The
driver simulates the APB protocol and stimulates the fractal generator
with the contents in the given APB items. The monitor logs the APB
traﬃc and transmits the traﬃc information to the scoreboard.
An APB item contains the address, data and direction of an APB transfer.
The sequencer sends three items to configure the fractal generator
with x0, y0 and the step size.
AXI Agent The AXI agent has the same subcomponents as the APB agent.
It does the same as the APB agent, except that it is connected to the AXI
interface of the fractal generator and uses the AXI protocol to read
data from the fractal generator.
Virtual Sequencer The virtual sequencer controls the sequences in the
agents and thus controls the timing of the entire system. The sequencer
makes sure that the fractal generator is conﬁgured through the APB
interface before the AXI agent starts reading vertices.
Scoreboard The scoreboard is responsible for checking if the response from
the fractal generator is correct. It uses the data from both agents'
monitors to do its own calculation of the fractal coordinates. The
calculation is done with the following steps.
1. The scoreboard registers when an APB transfer conﬁgures the
fractal generator and stores the conﬁguration values.
2. When an AXI read is performed by the virtual sequencer, the scoreboard
checks the address of the read request. This address, and
the configuration values from step 1, are used to calculate the
coordinate in the Mandelbrot set that maps to this address (Section 5.2).
3. The coordinate from step 2 is used to calculate the rest of the
burst. The coordinates are input into a function calculate_iterations
that runs the Mandelbrot set equation on the coordinates.
calculate_iterations is an external C function that has been imported
into SystemVerilog; it uses the same algorithm as the demo to
calculate the iterations for each coordinate.
4. The iteration values from the scoreboard calculations are com-
pared with the actual data read from the fractal generator.
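The scoreboard steps above can be sketched as a small reference model in C.
This is an illustration under stated assumptions, not the thesis code: the
function reference_iterations is a stand-in for calculate_iterations, the
iteration limit of 80 and the escape radius of 2 are assumed, and the address
is assumed to map to the grid row by row (the real translation is the one
described in Section 5.2). The coordinate formulas follow the feeder
description in Section 5.5.

```c
#include <stdint.h>

#define NUMPOINTS      128
#define MAX_ITERATIONS 80u   /* assumed iteration limit */

/* Stand-in for calculate_iterations: iterate z = z^2 + c, starting from
 * z = 0, until |z| > 2 or the limit is reached. The result is never 0. */
uint32_t reference_iterations(float cr, float ci)
{
    float zr = 0.0f, zi = 0.0f;
    uint32_t n = 0;
    while (n < MAX_ITERATIONS) {
        float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
        n++;
        if (zr * zr + zi * zi > 4.0f) {
            break;
        }
    }
    return n;
}

/* Steps 2 and 3: map the read address to a coordinate and calculate the
 * expected values for the burst of four coordinates. */
void expected_burst(float x0, float y0, float stepsize,
                    uint32_t address, uint32_t out[4])
{
    for (uint32_t k = 0; k < 4; k++) {
        uint32_t a   = address + k;
        uint32_t row = a / NUMPOINTS;   /* assumed row-major mapping */
        uint32_t col = a % NUMPOINTS;
        float x = x0 + stepsize * (float)col;
        float y = y0 + stepsize * (float)row;
        out[k] = reference_iterations(x, y);
    }
}

/* Step 4: compare the expected values with the data read over AXI. */
int burst_matches(const uint32_t expected[4], const uint32_t actual[4])
{
    for (int k = 0; k < 4; k++) {
        if (expected[k] != actual[k]) {
            return 0;
        }
    }
    return 1;
}
```

Because the iteration count starts at 1 for every point, the reference model
agrees with the cache convention that a zero never represents a valid
coordinate.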
5.7.2 The Veriﬁcation Plan
The following functional coverpoints have been chosen to verify the fractal
generator.
• Read the version register from the APB interface and conﬁrm the value.
This conﬁrms that the APB interface is properly connected.
• Read the debug register and check that the fractal generator has started.
This conﬁrms proper connection between the APB interface and the
other components. It also conﬁrms proper reset behavior.
• Read one address from the top, middle and bottom row of a given
fractal configuration. This confirms the general functionality of
the fractal generator. It also checks all math and address translation
between the components.
Figure 13: The verification framework.
If the system were to be completely veriﬁed one should perform these checks
for a large range of random fractal conﬁgurations, i.e. many areas of the
Mandelbrot set. However, this verification will only use one area of the
Mandelbrot set (x0 = -1.5, y0 = 0.8, step size = 0.01) with NUMPOINTS = 128.
The functional coverpoints will be veriﬁed with assert-statements in the
scoreboard, manually changing the addresses of the AXI requests between
runs. In a more thorough test case one would use UVM to generate these
addresses randomly and test all the possible addresses of a fractal.
In addition to functional coverage, line coverage, state coverage and condi-
tional coverage will be measured automatically by the Synopsys VCS simu-
lator.
5.7.3 Results and Discussion
All the coverpoints in the verification plan were successfully tested; the
functional coverage is 100%. The coverage measurements done by VCS are shown
in Figure 14.
Figure 14: Coverage metrics logged by VCS while running the verification
framework.
The functional verification showed that the fractal generator is working as
intended; the calculation of coordinates and the address translation are
correct.
The values in Figure 14 are a measure of how thoroughly the test bench
simulates the fractal generator. Some of the numbers seem too low, but
when looking deeper they are all acceptable.
For example, the conditional coverage of the AXI interface is only 33%, but
there are only three conditional blocks and two of them contain debug checks
for signal values that should never happen. Thus, the conditions will never
be fulﬁlled unless there is an error.
Two other examples are the state coverages of the AXI interface and the
arbiter. Both modules contain a state for flushing of memories. Almost all
the missing state transitions involve moving to the flushing state from the
normal operation states. This should never happen, so the state coverage will
be low.
In conclusion, the verification was successful and the test confirms that the
general functionality of the fractal generator is implemented correctly. If
more time were to be spent on verification, the test should be run with more
APB configurations to examine the fractal generator in detail. These tests
should also use constrained random testing and measure the functional
coverage automatically using the UVM capabilities.
6 Integration of the Fractal Generator
The two previous chapters presented the chosen design for the fractal gen-
erator and veriﬁed its functionality. This chapter will describe and verify
the integration of the fractal generator with Mali. In addition to this, the
chapter will connect the fractal generator and Mali to an FPGA framework
and synthesize the complete system onto an FPGA.
Ultimately, the fractal generator is to be controlled by the software driver.
Implementing and testing the driver without testing the communication be-
tween Mali and the fractal generator could lead to a lot of painful debugging;
a problem could be located in either hardware, software or both. It was there-
fore decided early in the design process that the hardware should be tested
independently. The following steps were performed to conﬁrm working hard-
ware on FPGA.
1. Design and verify the fractal generator(done in previous chapters).
2. Connect, as described in Chapter 4, the fractal generator with Mali.
3. Test the connection between Mali and the fractal generator.
4. Integrate Mali and the fractal generator with the FPGA framework.
5. Test the complete FPGA framework.
6. Synthesize the framework onto an FPGA and test it again.
7. Start implementing the driver.
ARM has a test system (Figure 16) that allows configuration of Mali by
parsing commands from text files. The files are parsed and simulated using
the Synopsys VCS simulator. The commands can perform writes using the APB
protocol and change memory contents. Each test in the list above uses almost
the same set of text files (see the sections below) to run specific GP and
PP jobs (see Section 2.3). The test system also provides access to the frame
buffer output from the jobs. The framebuffer data can be displayed as an
image of the frame drawn by the jobs. This will be used to compare the
output from the tests with the original input.
For a test to examine the integration of Mali and the fractal generator, the
input text files need to actually use the fractal generator for something.
This means that the test cannot draw a random image; it has to be made
specifically for the fractal generator. Making these text files manually is
not feasible, so a different method was required.
The Mali drivers can be compiled and conﬁgured to activate dumping of
frames. When dumping a frame, you can choose to also dump all the infor-
mation about the memory, and the jobs, used to draw the frame. The dump
information is stored in hexadecimal format precisely like the input text ﬁles.
Thus, the following steps were taken to obtain the proper test stimuli:
1. Write the OpenGL ES demo(Section 7.1) that draws an image from
the Mandelbrot set using the fractal generator, but use a software im-
plementation of the fractal generator to draw it.
2. Run the demo on a platform containing the Mali-400 GPU, and dump
a single frame of the demo. This dump can now, with some slight
changes, be used as input to all the tests listed above 11. The platform
used for this was the Odroid-A tablet and the output image can be
seen in Figure 15.
Figure 15: Left: Fractal landscape using a blue to white gradient. Right: The
Odroid-A tablet.
The test ﬁles obtained from the Odroid were used as inputs to all the tests
listed above. The below sections describe how the hardware was connected
in the diﬀerent scenarios, and the results of running the test.
11 Changes needed: change the address of the fractal generator vertex stream,
add configuration of the fractal generator, and delete all the input in the
PP job that is supposed to come from the GP.
6.1 Connecting Mali and the fractal generator
A Mali-400 test bench has been provided by ARM; the fractal generator is to
be connected with Mali using this test bench. The text ﬁles obtained above
can be used to conﬁgure Mali and the fractal generator through this test
bench. As shown in Figure 16, the ARM test bench usually connects Mali
directly to an external memory(the RAM) via the AXI bus.
Figure 16: ARM Test Bench for Mali [15].
To integrate the fractal generator with this test bench, it needs to be con-
nected to the AXI bus. This is done by inserting the PL301 component, as
described in Chapter 2, between Mali and the slave memory in Figure 16.
The memory and the fractal generator are then both connected to the PL301
as AXI slaves, and the PL301 is connected to Mali. The PL301 now routes
the AXI communication from Mali to the correct AXI slave. In this case
the PL301 is configured such that AXI addresses 0x1000_0000 to 0x1fff_ffff
select the fractal generator, while all other addresses select the memory.
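The slave selection described above amounts to a range check on the AXI
address. The following C sketch only models that decision; the enum and
function names are hypothetical and it does not describe how the PL301 is
actually configured.

```c
#include <stdint.h>

/* Hypothetical names for the two AXI slaves behind the interconnect. */
typedef enum { SLAVE_MEMORY, SLAVE_FRACTAL_GENERATOR } axi_slave;

#define FRACTAL_FIRST 0x10000000u   /* 0x1000_0000 */
#define FRACTAL_LAST  0x1fffffffu   /* 0x1fff_ffff */

/* Addresses inside the fractal generator range select the generator,
 * all other addresses select the memory. */
axi_slave decode_axi_address(uint32_t address)
{
    if (address >= FRACTAL_FIRST && address <= FRACTAL_LAST) {
        return SLAVE_FRACTAL_GENERATOR;
    }
    return SLAVE_MEMORY;
}
```

Since the range covers exactly the addresses whose upper four bits are 0001,
the same decision can be made by comparing address >> 28 with 1.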
In addition to this, the original test setup has an APB bus connected
directly to Mali. In the final test bench the APB bus is reconfigured by
adding a mux so that the fractal generator has its own APB address range.
The new top level module consisting of Mali, the fractal generator and PL301
can be seen in Figure 17. It replaces the Mali_toplevel module in Figure 16.
Figure 17: Connecting Mali and the fractal generator.
6.1.1 Test Results
All the data written to memory by Mali during the test is dumped to a hex
file. The frame buffer, and thus also the frame produced by the test, can be
extracted from this hex file using ARM software. The image produced was
identical to the image produced by the Odroid in Figure 15.
6.1.2 Integration Discussion and Conclusion
The integration test was successful and shows that the communication between
Mali and the fractal generator works as intended. It also shows that the
OpenGL ES demo and vertex shader are configured correctly: only the vertex
stream of z-coordinates is retrieved from the fractal generator; the other
streams are read from memory. This test says nothing about performance or
achieved frame rate, though; it only confirms that the communication between
Mali and the fractal generator works.
The image retrieved from the frame buﬀer after running the test is identical
to the image produced by the demo without the fractal generator. The fractal
generator is producing the correct z-values for this frame and the test was
successful.
6.2 FPGA Integration and Test
The previous section conﬁrmed that Mali and the fractal generator(the sys-
tem) are communicating properly. ARM has provided a platform and FPGA
framework for the system. This section will describe the framework and how
the system is connected with the framework. In addition to this, the section
will test the complete framework, synthesize it and load it onto an FPGA.
When the system is on the FPGA it will be tested once more. All the tests
in this section are as described at the beginning of Chapter 6.
6.2.1 Platform and Framework Description
The platform used in this thesis is the Versatile Express Motherboard
Express µATX with the daughterboard Coretile Express A9 and two Virtex 6
FPGAs (Figure 18). A complete Verilog framework for use with Mali-450 was
given by the ARM FPGA group. To integrate the system used in Chapter 6.1
with this framework, the Mali-450 was removed from the framework
and replaced with the new top-level module described in Figure 17. In
addition to this, the features unique to Mali-450 were removed to accommodate
the Mali-400, which is used with the fractal generator. Makefiles to compile,
build, and synthesize the framework were also reconfigured to include the
fractal generator and to use Mali-400 instead of Mali-450. The framework
was compiled and simulated in VCS and synthesized using Synopsys Certify.
Certify also partitioned the framework across the two FPGAs.
Figure 18: 1. Motherboard Express µATX. 2-3. Virtex6 FPGAs. 4. Coretile
Express. 5. The A9 CPU.
6.2.2 Testing the FPGA Framework
Included with the framework was a test bench similar to the one in
Chapter 6.1. This allows the simulation of the test system directly on the
framework with VCS. The original plan was to use the Odroid dumps directly
on this test bench, just as in Section 6.1, to confirm that the new top level
module was connected correctly in the framework.
However, since the memory mapping differed between the Odroid and the
FPGA, the addresses in the dump had to be remapped 12. To do this without
corrupting the information contained in the dumps, specifically the
connection between GP and PP jobs, the memory dumps (hex files containing the
data structures of the jobs) had to be left alone. Instead, the memory
management units (MMUs) inside Mali were used to remap the data. An MMU
page table was constructed to remap the addresses from Odroid mapping
to FPGA mapping, and then manually edited to map the fractal generator
address as well. After constructing the page table, the dumps were edited
slightly to enable the MMUs and read the height coordinates from the fractal
generator. The MMU mapping was successful, and the dumps could now be
used to test single frames on both the FPGA test bench and on the FPGA
itself.
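The idea of remapping through a page table, rather than editing the addresses
inside the dumps, can be illustrated with a generic model. The sketch below
does not reproduce the page table format of the Mali MMU. It assumes a single
flat table with 4 KiB pages, and every address in the usage example is
hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                  /* assumed 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  16u                  /* tiny table, for illustration only */

/* Generic single-level table: one physical page base per virtual page. */
typedef struct {
    uint32_t virt_base;                 /* first virtual address covered */
    uint32_t phys_page[NUM_PAGES];
    uint8_t  present[NUM_PAGES];
} remap_table;

void remap_init(remap_table *t, uint32_t virt_base)
{
    t->virt_base = virt_base;
    for (uint32_t i = 0; i < NUM_PAGES; i++) {
        t->phys_page[i] = 0;
        t->present[i] = 0;
    }
}

/* Map a run of pages starting at virt to consecutive pages from phys. */
void remap_range(remap_table *t, uint32_t virt, uint32_t phys, uint32_t pages)
{
    uint32_t first = (virt - t->virt_base) >> PAGE_SHIFT;
    for (uint32_t i = 0; i < pages && first + i < NUM_PAGES; i++) {
        t->phys_page[first + i] = phys + i * PAGE_SIZE;
        t->present[first + i] = 1;
    }
}

/* Translate an address from the dump; returns 0 if it is not mapped. */
int remap_translate(const remap_table *t, uint32_t virt, uint32_t *phys)
{
    uint32_t index = (virt - t->virt_base) >> PAGE_SHIFT;
    if (virt < t->virt_base || index >= NUM_PAGES || !t->present[index]) {
        return 0;
    }
    *phys = t->phys_page[index] | (virt & (PAGE_SIZE - 1u));
    return 1;
}
```

In this model the dump keeps its original addresses; only the table decides
where each page ends up, and one extra entry is enough to point the height
stream at the fractal generator instead of memory.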
6.2.3 Test Results
After remapping the dumps the framework produced the same image as the
software fractal generator(Figure 15).
6.2.4 Synthesis Results and Test on FPGA
After synthesizing the framework onto the FPGA the framework was again
tested. Since the dumps were already modiﬁed to ﬁt the FPGA address
mapping, the same dumps could be used again. Simulation on the FPGA
could not be done using VCS, since the FPGA is actual hardware and not
a simulation. Instead, an ARM program that parses the same text ﬁles to
conﬁgure the hardware on the platform was used.
The image produced by the FPGA was identical to the image produced by
the software fractal generator(Figure 15).
12 This was not the case with the Mali integration.
Module                       Max clock frequency (MHz)   Number of LUTs
Fractal Generator (8 FPGs)   48.1                        15826
Arbiter (8 FPGs)             48.6                        15398
One FPG                      91.3                        1750
AXI interface                446                         211
APB interface                555                         13
Coordinate Cache             204                         501
Table 1: Synthesis results for the fractal generator with 8 FPGs on a Xilinx
Virtex-6x FPGA.
6.2.5 FPGA Test Discussion and Conclusion
The tests were successful; the proper image was produced both in simulation
of the framework and when testing it on the FPGA. Mali and the fractal
generator are properly connected with the FPGA framework.
The address map collision between Odroid and the FPGA cost a lot of time
to debug and should have been avoided. The Odroid-A address map can be
adjusted in the driver, and ideally this mapping should have been set equal
to the FPGA's address range before starting the tests.
Ultimately, the series of tests performed in this chapter revealed a lot of
bugs that would have been harder to solve with the driver in place. The
test methodology worked as intended, and the system is now ready to be
controlled by the driver.
7 The OpenGL ES 2.0 Demo and Driver
The problem description states that an OpenGL ES demo(the demo) must
be written to showcase the functionality of the fractal generator. In order to
run a demo with the fractal generator, changes to the Mali software driver
are needed to set up the fractal generator prior to each frame. This chapter
starts with a brief overview of the written demo and the driver, how they
communicate and what they use the fractal generator for. Furthermore, the
chapter presents and discusses the details of the chosen implementations for
both the demo and driver.
The demo is an application that draws 3D graphics using the OpenGL ES
library function calls. The task of the driver is to conﬁgure the Mali GPU and
the fractal generator to perform these function calls in hardware, improving
their performance when compared to a pure software implementation.
The demo constructs a 3D landscape by combining two separate vertex
streams. The first stream provides the x- and y-coordinates of a static and
flat landscape. This landscape can be viewed as a frame to put height
coordinates on; its xy-coordinates do not change throughout the demo 13.
The second stream is the important one in this thesis. This stream is gen-
erated by the fractal generator based on the Mandelbrot set and provides
the z-coordinates of the landscape. Combining the heights from the fractal
generator with the static landscape makes a three-dimensional landscape.
Changing the stream from the fractal generator, by changing the chosen area
of the Mandelbrot set, thus changes the landscape.
For the fractal generator to generate a stream it needs to be conﬁgured. The
conﬁguration must be done via the APB interface and it must set the initial
xy-coordinate and step size to calculate the wanted area of the Mandelbrot
set. To configure the fractal generator, the demo has to let the driver know
when a function call uses the generator, and pass along the settings required
to configure it. The driver reads the settings, configures the fractal
generator, and then configures Mali to perform the function call. The
function call tells Mali to read data from the streams and combine them into
a landscape. The combined landscape vertices then proceed through the rest of
the OpenGL ES pipeline.
The above explains the general relationship between the driver, demo and
fractal generator. The following sections present the implementations in more
detail.
13 This stream has nothing to do with the Mandelbrot set.
7.1 The Demo
The demo is divided into two major phases, one initialization phase and one
rendering phase, described below. The demo also implements a software
version of the fractal generator. The software implementation was used to
obtain test stimuli in Chapter 6, and will be used as a reference point when
testing the hardware performance in Chapter 8.
This section will only present the OpenGL ES information relevant to the
fractal generator; there is much mandatory OpenGL ES coding that has been
left out. The demo source code can be found in the appendix. The section
will also present the shaders written for the demo. Discussions around the
implementations in this chapter will be done in Section 7.3 since the choices
made in the demo are strongly connected to the driver implementations.
7.1.1 The Initialization Phase
The initialization phase performs platform speciﬁc EGL and OpenGL ES ini-
tialization and loads the shaders. Afterwards it constructs the ﬂat landscape
mentioned in the above section.
The 2D landscape is drawn using the GL_TRIANGLE_STRIP mode. The
vertex data and index array (Section 2.2) of the landscape are stored as one
VBO each and bound to a variable in the vertex shader. This enables the
vertex shader to read data from the VBO as a stream of vertices.
In addition to this, the phase creates and binds a third VBO that will contain
the height vertices of the 2D landscape. An initial point and step size in the
Mandelbrot set are used to calculate the height values. This VBO uses the
same index array as the 2D landscape but is bound to a different variable in
the vertex shader; this makes the z-coordinates a separate stream input to
the shader.
If the hardware fractal generator is used the third VBO will be left empty in
the demo; the vertices will be given from the fractal generator. Otherwise,
the software implementation of the fractal generator is used to calculate and
store all the height vertices of the landscape in this VBO.
7.1.2 The Rendering Phase
The rendering phase is written as a looping state machine, where each loop
focuses on animations around an interesting point in the Mandelbrot set.
The list below explains what is done in each state.
State 1: The demo contains a list of interesting points in the Mandelbrot
set. Here, interesting means that zooming in on the point reveals some good
looking geometrical structures. The ﬁrst state in the demo positions the
camera into a starting position and randomly selects a point from the list.
State 2: In the second state the camera is moved towards the current
interesting point, positioning the camera to start zooming down into the
fractal.
State 3: In the third state the camera zooms in on the interesting point by
keeping the point in the center while lowering the step size between points
in the Mandelbrot set.
Each of the above states can make several calls to glDrawElements (i.e., a
state can last for several frames) and has to either calculate the height
values with the software fractal generator or configure the hardware one for
each call. The rendering phase does not do anything with the 2D landscape
created by the initialization phase; it only moves the camera and changes the
fractal generator configuration.
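The looping state machine described above can be sketched as follows. The
sketch is not the demo source: the frame counts, the starting step size and
the zoom factor are illustrative values, the names are hypothetical, and the
randomly selected point is passed in as a parameter to keep the example
deterministic. Camera movement is left out.

```c
/* Hypothetical state names for the three states of the rendering phase. */
typedef enum { STATE_SELECT_POINT, STATE_APPROACH, STATE_ZOOM } demo_state;

#define APPROACH_FRAMES 3       /* illustrative values, not from the demo */
#define ZOOM_FRAMES     4
#define START_STEPSIZE  0.01f
#define ZOOM_FACTOR     0.5f

typedef struct {
    demo_state state;
    int   frames_in_state;
    int   point_index;          /* index into the list of interesting points */
    float stepsize;             /* step size sent to the fractal generator */
} demo;

/* Advance the demo by one frame (one call to glDrawElements). */
void demo_frame(demo *d, int next_point_index)
{
    switch (d->state) {
    case STATE_SELECT_POINT:    /* State 1: pick a point, reset the view */
        d->point_index = next_point_index;
        d->stepsize = START_STEPSIZE;
        d->frames_in_state = 0;
        d->state = STATE_APPROACH;
        break;
    case STATE_APPROACH:        /* State 2: move towards the point */
        if (++d->frames_in_state == APPROACH_FRAMES) {
            d->frames_in_state = 0;
            d->state = STATE_ZOOM;
        }
        break;
    case STATE_ZOOM:            /* State 3: lower the step size each frame */
        d->stepsize *= ZOOM_FACTOR;
        if (++d->frames_in_state == ZOOM_FRAMES) {
            d->state = STATE_SELECT_POINT;
        }
        break;
    }
}
```

Only point_index and stepsize change the fractal generator configuration;
everything else in a frame is camera movement, which matches the observation
that the 2D landscape itself is never modified.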
7.1.3 Conﬁguring the Fractal Generator
The demo uses an array of three floats called fractal_configuration for each
of the interesting points. The array contains the three APB settings (x0, y0
and step size) that select an area of the Mandelbrot set. To configure the
fractal generator this array has to be passed to the driver.
When a draw call is made in OpenGL ES the driver copies the OpenGL ES
state and data structures into a big C struct. Thus, inside the driver, one
has access to all the relevant OpenGL ES information for the draw call.
glVertexAttribPointer is a function that describes the data inside a VBO.
When the VBO is drawn by a draw call, the information stored by this func-
tion is available in the driver. The function can set a parameter pointer which
is an oﬀset into the VBO where the data described by glVertexAttribPointer
is stored. This can be used to store diﬀerent sections of data in the same
VBO. However, this functionality is not crucial, since one can always make
another VBO without any oﬀset instead. Thus, the pointer ﬁeld is used to
pass the fractal conﬁguration to the driver. Instead of using the ﬁeld as an
oﬀset, the pointer points to the fractal_conﬁguration array for the current
Mandelbrot set area to be drawn. If the data in a VBO is not provided by
the fractal generator, the pointer ﬁeld must be set to the null pointer. See
Listing 2.
glVertexAttribPointer(afheightLoc, 1, GL_FLOAT, GL_FALSE, 0,
                      landscapes[i]->fractal_configuration);
Listing 2: Calling glVertexAttribPointer with a fractal conﬁguration.
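On the driver side, the pointer field then has to be interpreted as a
configuration instead of an offset. The C sketch below shows one way this
could look. It is not the Mali driver code: the register indices, the
function names and the use of a plain array in place of the APB registers are
all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register indices for the three APB settings. */
enum { REG_X0 = 0, REG_Y0 = 1, REG_STEPSIZE = 2, REG_COUNT = 3 };

/* Copy the bit pattern of a 32-bit float into a 32-bit register value. */
static uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* If the pointer field is non-null, treat it as fractal_configuration
 * (x0, y0, step size) and write the values to the generator's registers.
 * Returns 1 if the generator was configured, 0 if the stream is ordinary
 * VBO data and the generator was left untouched. */
int configure_fractal_generator(volatile uint32_t *apb_regs,
                                const void *attrib_pointer)
{
    if (attrib_pointer == NULL) {
        return 0;
    }
    const float *cfg = (const float *)attrib_pointer;
    apb_regs[REG_X0]       = float_bits(cfg[0]);
    apb_regs[REG_Y0]       = float_bits(cfg[1]);
    apb_regs[REG_STEPSIZE] = float_bits(cfg[2]);
    return 1;
}
```

The null check is what makes the convention safe: a stream that is not
provided by the fractal generator must pass a null pointer, so its pointer
field is never misread as a configuration.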
7.1.4 The Vertex Shader
As explained in the above sections, two vertex streams are input to the vertex
shader. One stream is the coordinates in the 2D-landscape, and one is the
height vertices. The vertex shader combines the two streams into a single
3D landscape and then performs matrix operations on it (moving the camera).
The merging is done by simply setting the y-coordinate of the 2D landscape
to the current value in the height stream 14. Since both streams contain
an equal number of points and use the same index array, this makes the height
coordinates appear at the correct point in the array, drawing the Mandelbrot
set. The vertex shader also divides the height values by 80. This is done to
normalize the height values, since the standard OpenGL ES range is between
-1 and 1.
7.1.5 The Fragment Shader
The fragment shader reads the height value for each coordinate from the
vertex shader. It examines the value of the height coordinate and adds color
to it. The color of each point is decided using the built-in function mix and
a gradient, but if the point has the maximum iteration value it is colored
red. The maximum iteration points are represented by a separate
color to provide a clear visual boundary, and to easily see diﬀerences with
varying numbers of the MAX_ITERATIONS parameter.
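The coloring rule can be written out on the CPU to make it concrete. The
following C sketch is not the shader itself: it assumes a blue to white
gradient (as in Figure 15), an iteration limit of 80, and it models the GLSL
function mix(a, b, t) as the linear blend a * (1 - t) + b * t. The type and
function names are hypothetical.

```c
/* Simple RGB color with components in the range 0 to 1. */
typedef struct { float r, g, b; } rgb;

#define MAX_ITERATIONS 80.0f   /* assumed iteration limit */

/* CPU equivalent of the GLSL built-in mix for scalars. */
static float mixf(float a, float b, float t)
{
    return a * (1.0f - t) + b * t;
}

/* Map a height (iteration count) to a color: red at the maximum
 * iteration value, otherwise a blend from blue (low) to white (high). */
rgb height_to_color(float iterations)
{
    const rgb blue  = { 0.0f, 0.0f, 1.0f };
    const rgb white = { 1.0f, 1.0f, 1.0f };
    if (iterations >= MAX_ITERATIONS) {
        rgb red = { 1.0f, 0.0f, 0.0f };
        return red;
    }
    float t = iterations / MAX_ITERATIONS;
    rgb c = { mixf(blue.r, white.r, t),
              mixf(blue.g, white.g, t),
              mixf(blue.b, white.b, t) };
    return c;
}
```

Because red does not occur anywhere on the blue to white gradient, the points
that reach the iteration limit stand out clearly from their neighbors.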
14 This is a bit confusing since it is the z-coordinate, not y, that has been
referred to as the height throughout the thesis. However, in the shader the
height is put in the y-coordinate position.
7.1.6 Software Fractal Generator
The demo is used in Chapter 6 to obtain test stimuli for the fractal generator.
In order to do this, the demo must simulate the hardware fractal generator
with a software implementation. The software fractal generator is presented
below.
The software fractal generator uses the same three values (x0, y0 and step
size) for configuration as the hardware version. These parameters and a
pointer to an array are input to a function which calculates the
z-coordinates of the Mandelbrot set and stores them in the given array. The
main function uses an internal function, getFractPoint, made by Per Christian
Corneliussen in [4] to calculate the iterations for each point. See Listing 3.
// Calculates the height/iterations of all the vertices in the fractal
// specified by x, y, stepsize and numpoints. The fractal is assumed to be
// square = numpoints^2.
// Assumes an array with space for numpoints^2 elements.
void getFractalHeights(float x, float y, float stepsize, int numpoints,
                       GLfloat *iterations)
{
    int i, j;
    for (i = 0; i < numpoints; i++)       // y-axis = imaginary
    {
        for (j = 0; j < numpoints; j++)   // x-axis
        {
            iterations[j + (i * numpoints)] =
                getFractPoint(x + (j * stepsize), y - (i * stepsize));
        }
    }
}
Listing 3: The software fractal generator.
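Listing 3 relies on getFractPoint from [4], which is not reproduced here. As
a rough idea of what such a per-point function typically does, the following
is a minimal escape-time sketch. The function name escape_time_iterations,
the explicit max_iterations argument and the escape radius of 2 are
assumptions made for illustration; the original getFractPoint may differ.

```c
#include <assert.h>

/* Hypothetical stand-in for getFractPoint: counts how many iterations of
 * z <- z^2 + c are performed before |z| exceeds 2, capped at
 * max_iterations. The point c = c_re + i*c_im is the landscape coordinate. */
static int escape_time_iterations(float c_re, float c_im, int max_iterations)
{
    float z_re = 0.0f, z_im = 0.0f;
    int n = 0;
    assert(max_iterations > 0);
    /* |z| <= 2 is tested as |z|^2 <= 4 to avoid a square root. */
    while (n < max_iterations && (z_re * z_re + z_im * z_im) <= 4.0f) {
        float next_re = z_re * z_re - z_im * z_im + c_re;
        z_im = 2.0f * z_re * z_im + c_im;
        z_re = next_re;
        n++;
    }
    return n;
}
```

Points inside the Mandelbrot set, such as c = 0, never escape and therefore
return max_iterations, while points far outside escape after very few
iterations.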
7.2 The OpenGL ES 2.0 Driver
This section explains the Mali driver architecture and the changes made to
the driver to properly configure the fractal generator for each call to
glDrawElements made by the demo.
7.2.1 The Mali GPU OpenGL ES Driver Architecture
The main task of the Mali driver is to configure, and provide input to, the
Mali GPU based on OpenGL ES calls from a graphics application. The driver
uses the Mali GPU to accelerate the OpenGL ES operations, which improves
performance and reduces power consumption compared to a software-only
solution [19].
The Mali driver is modular and divided into separate layers (Figure 19).
This provides several levels of abstraction and enables different drivers to
share common software components. Below is a brief presentation of each
layer that is relevant to the fractal generator.
Graphics application: The graphics application that makes the OpenGL ES
calls. In this thesis the application is the OpenGL ES demo.
OpenGL ES driver: The OpenGL ES driver translates the current OpenGL ES
state and draw operations into Mali GPU data structures and jobs. The data
structures enable hardware graphics features based on the current state.
This driver also performs set-up and initialization of the different
hardware jobs.
Base driver: The base driver provides the OpenGL ES driver with an
abstraction from the operating system and hardware platform. It is
responsible for memory management, job handling and interfacing with the OS.
The base driver provides communication between user-mode and kernel-mode OS
operation, and hides the low-level details from the OpenGL ES API.
Mali device driver: The Mali device driver runs in the OS kernel and is the
interface between the base driver and the hardware. It reads and writes
hardware control registers, dispatches jobs to the Mali hardware, performs
low-level memory management and handles multiple concurrent users.
Figure 19: Hardware and software components of the Mali graphics system
architecture (Linux). [18, p. 1.4]
7.2.2 Controlling the fractal generator using the Mali driver
For the Mali GPU to read any vertices from the fractal generator, the
fractal generator needs to be configured (and thus started) via the APB
interface. The driver has to perform these APB writes at the correct moment,
based on the information provided by the OpenGL ES demo as explained in
Section 7.1. Specifically, the driver has to monitor the draw calls made by
the demo, determine whether a draw call contains a fractal stream (a stream
of vertices that are to be provided by the fractal generator), determine the
fractal configuration of the stream and transmit this information to the
fractal generator through its APB interface. In addition, the fractal
generator must be configured at the right time, to avoid reconfiguration in
the middle of the previous draw call.
The Mali GPU also has to be informed when a stream of vertices is to be read
from the fractal generator instead of main memory. The driver does this by
setting the stream address to the physical address of the fractal generator,
which is hard-coded in the RTL to access the fractal generator's AXI
interface (footnote 15).
The following changes have been made to the driver code to enable the above
features.
Changes to the base driver
The base driver uses a struct to describe a Mali GP job. The struct is
visible to the OpenGL ES driver and contains an array of 6 address registers
that point to important data structures for the current job. Before a job is
started, this address array is copied to the Mali device driver. Thus, the
array can be used to provide the Mali device driver with the fractal
configuration described in Section 7.1.3. The array is therefore extended
with room for the three fractal_configuration floats.
Changes to the OpenGL ES driver
When a call is made to glDrawElements in the demo, the OpenGL ES context is
stored in a data structure in the driver, and a series of function calls is
made to configure and initialize the hardware. One of these functions,
_gles_gb_setup_input_streams, is responsible for setting up the vertex
stream information that is provided to the vertex shader (and a given GP
job) in the Mali GPU. One of the streams to the vertex shader is to be
provided by the fractal generator, so this function has to be changed. The
function uses the OpenGL ES context to examine the type of each incoming
stream, e.g. checking whether it is a vertex buffer object (VBO), and then
configures, among other things, the memory address of the stream so that the
Mali GPU knows where to find the vertices to work with. A stream from the
fractal generator needs a specific physical address. Since this function is
responsible for setting addresses, the change can be implemented here.
15 This is the address 0x1000_0000, which has been mentioned earlier in the
thesis.
The implementation is done in the following manner:
When the function checks whether a stream is a vertex buffer object, a
second check is made to see if a pointer was passed with the VBO. This
pointer is usually used as an offset into the VBO, but in the OpenGL ES demo
it points to the fractal configuration array. Only the VBO to be provided by
the fractal generator will ever have a non-null pointer here, so this check
determines whether a stream is to come from the fractal generator (assuming
all fractal streams are VBOs in the demo).
If a stream is a fractal stream, the Mali memory address of the stream is
set to 0x1000_0000, and the fractal configuration array is copied into the
registers that have been added to the Mali GP job structure described above.
/* A non-null pointer on a VBO marks the stream as a fractal stream. */
if (attrib_array->pointer != 0)
{
    fractal_configuration = attrib_array->pointer;
    /* Copy the three configuration floats bit-for-bit into the extra
       registers added to the GP job struct. */
    unsigned int *lol = (unsigned int *) fractal_configuration;
    gp_job->registers[6] = *lol;
    gp_job->registers[7] = *(lol + 1);
    gp_job->registers[8] = *(lol + 2);
    /* Point the stream at the fractal generator instead of main memory. */
    mem->mali_memory->mali_address = 0x10000000;
    mem_addr = 0x10000000;
    printf("Driver: Mali address of block= %x\n",
           mem->mali_memory->mali_address);
    printf("Driver: Cpu-seen address of block=%x\n",
           mem->mali_memory->cpu_address);
    /* Clear the pointer so it is not used as an offset into the VBO. */
    attrib_array->pointer = 0;
    printf("Driver: mem_addr2 = %x\n", mem_addr);
}
Listing 4: The OpenGL ES driver finding a fractal stream.
Changes to the Mali device driver
Allocating memory to the fractal generator
The Mali GP and PP have their own MMUs. This means that when the GP tries to
read vertices from address 0x1000_0000, as set by the OpenGL ES driver, this
address has to be in the MMU page table and it has to be mapped to a valid
allocated address. Since the fractal generator is at physical address
0x1000_0000, the virtual address should be mapped to the same physical
address. Since the Mali device driver is responsible for low-level memory
management on the Mali GPU, this mapping has to be done there.
The page table has to be mapped and the memory allocated. Fortunately, the
driver has built-in functions that do this. Thus, in the function that
(among other things) allocates the standard page table, a second page table
is added that maps virtual address 0x1000_0000 to physical address
0x1000_0000 and allocates enough space for a NUMPOINTS * NUMPOINTS array of
height coordinates.
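The size of the mapped window can be estimated with a short C sketch. It
assumes 4-byte height values (the heights are transferred as 32-bit floats)
and a 4 KiB MMU page size; the page size and all names below are assumptions
for illustration and are not taken from the driver source.

```c
#include <assert.h>
#include <stdint.h>

#define FRACTAL_BASE_ADDR 0x10000000u /* address named in the text */
#define BYTES_PER_HEIGHT  4u          /* one 32-bit float per point */
#define PAGE_SIZE_BYTES   4096u       /* assumed MMU page size */

/* Bytes needed for a NUMPOINTS * NUMPOINTS array of heights. */
static uint32_t fractal_window_bytes(uint32_t numpoints)
{
    assert(numpoints > 0u && numpoints <= 16384u); /* avoid overflow */
    return numpoints * numpoints * BYTES_PER_HEIGHT;
}

/* Number of pages that must be mapped, rounded up. */
static uint32_t fractal_window_pages(uint32_t numpoints)
{
    uint32_t bytes = fractal_window_bytes(numpoints);
    return (bytes + PAGE_SIZE_BYTES - 1u) / PAGE_SIZE_BYTES;
}

/* With a one-to-one mapping, a virtual address belongs to the fractal
 * generator exactly when it falls inside this window. */
static int in_fractal_window(uint32_t addr, uint32_t numpoints)
{
    uint32_t size = fractal_window_bytes(numpoints);
    return addr >= FRACTAL_BASE_ADDR && (addr - FRACTAL_BASE_ADDR) < size;
}
```

Under these assumptions a 128 x 128 landscape needs 65536 bytes, i.e. 16
pages starting at 0x1000_0000.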
Adding the fractal generator to the APB memory map
The fractal generator must be configured through its APB interface. For this
to be possible, the fractal generator's APB addresses must be allocated in
the device driver.
The Mali GP is represented in the device driver by a struct called
mali_gp_core. The struct contains a hardware core (this is the GP in
hardware) that also needs memory allocated to it. Functions that create and
map the hardware core already exist, so to add the fractal generator a new
hardware core called "fractal core" has been added to the GP struct. With
some small changes, the fractal core can now be created and mapped in the
same function as the GP core.
Configuring and starting the fractal generator
After the APB addresses and memory have been allocated, the fractal
generator is ready to be configured via APB. The device driver provides the
functions needed to read and write hardware registers (APB), so the question
is where and when to configure. Since the fractal core was added to the GP
as explained in the above section, the register functions can be used on the
fractal generator in any function that has access to the GP core.
The previous sections state that the fractal generator configuration is
contained within every GP job struct; thus the fractal generator should be
configured ahead of every GP job. The device driver is responsible for
configuring the hardware and initializing jobs, and it uses a function
called mali_gp_job_start to start each job. The function is called with the
struct of the job and the struct of the GP core as arguments. These structs
provide the fractal configuration and access to the fractal generator's APB
interface. This is everything needed to configure the fractal generator, so
this function is chosen as the place to start the fractal generator.
At every call to this function, a check examines the contents of the array
of address registers mentioned in Section 7.2.2 to determine whether the
starting job requires the fractal generator. If it does, the fractal
configuration is read from the address registers and written to the fractal
generator using the provided device driver functions. mali_gp_job_start then
proceeds to start the job as normal.
After starting a GP job, the device driver continually monitors for
interrupts. One of the possible interrupts occurs when the GP requires more
heap memory. When this happens, the job is stopped and then restarted using
a different function (mali_gp_resume_with_new_heap). The interrupt causes
the fractal generator to reset, so it must be reconfigured in this function
as well, in the same way as above.
/* starting fractal generator */
/* reg6 = x, reg7=y, reg8=stepsize */
if (job->frame_registers[8] != 0)
{
    MALI_DEBUG_PRINT(1, ("Devicedrv: This is a fractal job. Configuring registers\n"));
    mali_hw_core_register_write(&core->fractal_core, 0x0000, job->frame_registers[6]);
    mali_hw_core_register_write(&core->fractal_core, 0x0004, job->frame_registers[7]);
    mali_hw_core_register_write(&core->fractal_core, 0x0008, job->frame_registers[8]);
    MALI_DEBUG_PRINT(1, ("Devicedrv: Configured registers\n"));
    MALI_DEBUG_PRINT(1, ("Devicedrv: X = 0x%08X\n",
        mali_hw_core_register_read(&core->fractal_core, 0x0000)));
    MALI_DEBUG_PRINT(1, ("Devicedrv: Y = 0x%08x\n",
        mali_hw_core_register_read(&core->fractal_core, 0x0004)));
    MALI_DEBUG_PRINT(1, ("Devicedrv: STEPSIZE = 0x%x\n",
        mali_hw_core_register_read(&core->fractal_core, 0x0008)));
    MALI_DEBUG_PRINT(1, ("Devicedrv: Fractal register nr = 0x%08X\n",
        mali_hw_core_register_read(&core->fractal_core, 0x0014)));
}
Listing 5: Finding a fractal job and conﬁguring the fractal generator in the
device driver.
Figure 20: The fractal demo zooming in on a point.
7.3 Results and Discussion
The driver and demo implementations were successful. Figure 20 shows a
photo of the demo running with the hardware fractal generator. The perfor-
mance of the demo is examined in Chapter 8.
Demo Discussion
An important aspect of the implementation was the communication between the
demo and the driver. The fractal generator configuration has to be
obtainable from the OpenGL ES state for the driver to access it. In
addition, the configuration has to be connected to a VBO, since a single
draw call can draw several VBOs and not all VBO data will be provided by the
fractal generator.
glVertexAttribPointer satisfies all the requirements: it is called to
configure each VBO, and the parameters set by the function are accessible in
the driver. In addition, the function is called regardless of whether the
fractal generator is used, so there is no performance loss from using it.
The only cost is the three floats that configure the fractal generator. A
disadvantage of using the pointer field in the function is that the field
cannot be used for its normal purpose. However, this is a very minor
inconvenience. Thus, the glVertexAttribPointer solution was chosen. Since
the chosen solution has no practical cost, no other solutions have been
explored. In theory, any function that configures a VBO could be used. One
could also modify existing functions or add extra functions to avoid
drawbacks like losing the offset pointer. However, this would be much work
for minimal, if any, gain.
Driver Discussion
Checking whether a stream is a fractal stream is done in the driver by
checking the pointer field of all VBOs. The driver already checks whether
input streams are VBOs, so the only change needed is an extra check of the
pointer field. When a fractal stream is discovered, the
fractal_configuration array is stored in the base driver and the address is
set to the fractal generator. Thus, the cost of the chosen solution is one
if statement and three assignments for each fractal stream. This is a very
efficient solution, so no other solutions have been evaluated. In fact, no
other solutions have been identified; the modified function
(_gles_gb_setup_input_streams) is the only place where one can easily change
the address of input streams.
Allocating memory and adding the fractal generator to the memory map has to
be done. The chosen implementation mirrors the memory management of existing
components (like the geometry processor) in the driver. It uses optimized
built-in functions for all memory management, and the functions are only
called once during the lifetime of the demo. There is little to no room for
improvement here, so no other solutions have been explored.
The actual configuration of the fractal generator is done by the device
driver. The current solution mirrors the initialization and startup of the
geometry processor to start the fractal generator. Since the fractal
generator provides data to the GP, this ensures that they stay in sync. The
implementation consists of a single check of the fractal_configuration
registers and a few register writes to the fractal generator. The solution
is very efficient and is implemented in functions that have to be called
anyway. There is very little to gain from changing this solution, so no
others have been considered. If the fractal generator were to be configured
at a different time, one would have to make sure that the configurations
stay synchronized with the GP. No method to do this has been identified.
7.3.1 The Fractal Resolution
As mentioned in Section 5.1, the resolution of the landscape created by the
fractal generator is equal to NUMPOINTS · NUMPOINTS. That section also
states that when using a single index array for a landscape, the maximum
value of NUMPOINTS is 128.
The demo was originally written to draw several 128x128 landscapes and
connect them together into one big high-resolution fractal. The plan was to
draw each 128x128 fractal with a separate draw call, configuring the fractal
generator for each call. The plan assumed that each draw call would become
one GP job on Mali, which would make it possible to configure the fractal
generator correctly for all the landscapes. However, when the driver sees
that draw calls are combined to draw a single frame, it also combines the
draw calls into one big GP job. No method to split the calls into separate
GP jobs was identified. Thus, the intended method cannot be used to increase
the resolution, since the driver does not know when to configure the fractal
generator: there is no identified way for the driver to determine which draw
call the GP is currently working on.
The resolution of the demo in this thesis has been limited to 128 · 128, and
improving the resolution has been left for future work. Below is a list of
suggestions to increase the resolution.
1. Change the driver to split the draw calls into separate jobs. This would
likely require many changes to the driver, but all hardware could be left
as-is. Separate jobs lead to more set-up overhead, so the performance
would be worse than with a single job.
2. Make the driver monitor the combined GP job and understand when it is
working on a given part of the landscape, then configure the fractal
generator to create that part. This method requires intimate knowledge
of the GP to determine what it is currently doing. In addition, there is
no easy way to monitor the hardware from the driver. It might require
big changes to the Mali GP or constant monitoring of memory, so the
method is not seen as feasible.
3. Currently the landscape calculated by the fractal generator is limited
by the index array describing the landscape. Instead of having one VBO
(and one index array) for the vertices from the fractal generator, one
could split a landscape into several VBOs and give each VBO a separate
index array. This would allow the fractal generator to create a big
landscape where each VBO is assigned a part of the landscape and an
address range on the fractal generator.
This would require changes to the hardware of the fractal generator (at
least the addressing has to be changed), but it is definitely a feasible
method.
8 Proﬁling
To determine whether the hardware fractal generator is useful, its
performance has to be measured. This chapter measures the average frame time
when the demo runs with the hardware fractal generator and compares the
results with the software implementation of the fractal generator
(footnote 16).
The first performance measurement is done inside the demo. However,
measuring performance in the demo is not straightforward, since the
landscape is not drawn on screen at any exact time in the program; the calls
and data are pipelined in hardware. A good practical solution is to measure
the time between consecutive returns of the swapbuffer command [20]
(footnote 17). Even if this does not give the exact time taken by the
hardware to draw a frame, it is good enough to use as a comparison tool
between solutions.
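The averaging step of this measurement can be sketched in C as follows. The
sketch assumes that the demo records a wall-clock timestamp (in seconds)
immediately after each swapbuffer call returns; how the timestamps are
obtained is left out, and the function name average_frame_time is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Average frame time, given timestamps (in seconds) recorded right after
 * each buffer swap returns. `count` timestamps span `count - 1` frames. */
static double average_frame_time(const double *swap_return_times,
                                 size_t count)
{
    assert(swap_return_times != NULL && count >= 2);
    double elapsed = swap_return_times[count - 1] - swap_return_times[0];
    return elapsed / (double)(count - 1);
}
```

Averaging over many frames, as is done over 500 frames in this chapter,
smooths out the variation caused by the pipelining described above.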
The performance is measured for different numbers of FPGs in the fractal
generator, and for different values of the max iterations parameter.
It is important to note that the software fractal generator is running on
the A9 CPU on the platform. This processor is not on the FPGA and is capable
of running at 400 MHz [?], a much higher frequency than the circuit on the
FPGA. The frequency of the hardware version also varies by small amounts
because of the complicated routing done by the synthesis tool: Certify uses
random seeds when it routes, so the results may vary slightly.
The table below presents the achieved results. HW/SW indicates the hardware
or software version of the fractal generator. MAX_ITER is the maximum number
of iterations at each point. NU is the number of FPGs, i.e. the number of
points that are calculated in parallel. TIME is the average frame time over
500 frames of the demo. FRQ is the frequency of the fractal generator used
in that test.
To get a more precise measure of how the fractal generator is doing, another
performance measurement is done using the instrumented mode of the Mali
drivers. This mode dumps a lot of performance data to a text file, which
allows a precise measurement of the hardware jobs. This shows where the time
is consumed in hardware while avoiding the time consumption of the demo.
16 Frame time is the time taken to draw a single frame and is the inverse of
frames per second (FPS). Frame time is considered to be a more accurate
performance measure than FPS since frame time is a linear value.
17 The swapbuffer command posts the current color buffer to the native
screen.
HW/SW  MAX_ITER  NU   TIME (s)  FRQ
SW     80        N/A  0.757     400 MHz
SW     160       N/A  1.167     400 MHz
HW     80        64   0.316     20 MHz
HW     160       32   0.427     20 MHz
HW     80        32   0.429     20 MHz
HW     80        16   0.431     20 MHz
HW     80        4    0.574     20 MHz
Table 2: Average frame time performance over 500 frames of the demo.
Since the instrumented drivers are much slower when performing the
measurements, running 500 frames of the demo is not feasible. Instead, this
measurement tests the computing capabilities of the fractal generator in the
worst-case scenario. The worst-case scenario is a landscape where each point
reaches the MAX_ITERATIONS limit, and the measurement is performed over a
duration of 30 frames. In addition, the performance dump allows frequency
scaling, so the frame time at higher frequencies can be estimated. This is
used to estimate the frame time of the fractal generator at 400 MHz, to
compare properly with the software implementation.
HW/SW  MAX_ITER  NU  FT (ms)  VST (ms)  PLBUT (ms)  FPS
HW     80        64  2.592    0.637     0.622       450
HW     80        32  3.907    0.637     2.713       261
Table 3: Average geometry processor frame time (FT), vertex shader time
(VST), and Polygon List Builder Unit time (PLBUT) during 30 frames of the
worst-case scenario. All results estimated at 400 MHz.
The detailed measurements show that the vertex shader time is the same for
both fractal configurations. This is as expected, since the delay should be
ahead of the vertex shader. Increasing the performance of the fractal
generator instead improves the PLBU time, which drops from 2.713 ms with 32
FPGs to 0.622 ms with 64 FPGs.
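The frequency scaling used for the Table 3 estimates can be illustrated with
a small C sketch. It assumes that the measured work is entirely clock-bound,
so that time scales inversely with clock frequency; the function name
scale_frame_time_ms is hypothetical and the sketch is not taken from the
instrumented driver.

```c
#include <assert.h>

/* Estimate the frame time at another clock frequency, assuming the work is
 * entirely clock-bound so that time scales inversely with frequency. */
static double scale_frame_time_ms(double time_ms, double from_mhz,
                                  double to_mhz)
{
    assert(from_mhz > 0.0 && to_mhz > 0.0);
    return time_ms * from_mhz / to_mhz;
}
```

Under this assumption, the 2.592 ms frame time estimated at 400 MHz
corresponds to roughly 51.8 ms at the 20 MHz used on the FPGA.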
9 Conclusion
The fractal generator designed in this thesis works as intended. It is able
to calculate the heights of a landscape based on the Mandelbrot set and
provide the Mali-400 GPU with the results. The fractal generator is designed
with scalable performance at the cost of area.
The area consumption of the design is 1826 LUTs plus 1750 LUTs per FPG. For
a system with 32 FPGs the area consumption is about 58k LUTs. The achieved
clock frequency of the fractal generator is 48.1 MHz on a Virtex-6x FPGA.
This frequency is higher than Mali's maximum frequency, so the frequency of
the system is limited by Mali.
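The area figures above follow a simple linear model, which can be written as
a short C sketch. The constants are the ones reported in this chapter; the
function name estimate_luts is hypothetical.

```c
#include <assert.h>

#define BASE_LUTS    1826u /* fixed cost reported in the text */
#define LUTS_PER_FPG 1750u /* additional cost per FPG reported in the text */

/* Estimated LUT count for a fractal generator with num_fpgs FPGs. */
static unsigned estimate_luts(unsigned num_fpgs)
{
    assert(num_fpgs > 0u); /* the generator needs at least one FPG */
    return BASE_LUTS + LUTS_PER_FPG * num_fpgs;
}
```

For 32 FPGs the model gives 1826 + 32 · 1750 = 57826 LUTs, which matches the
figure of about 58k LUTs quoted above.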
The fractal generator outperformed a software implementation of itself by a
large margin. With 64 cores running at 20 MHz, the demo running with the
fractal generator achieved a total frame time of 0.316 seconds, compared
with 0.757 seconds for the software implementation at the same iteration
limit (Table 2).
Without the demo, the fractal generator with 64 cores achieved a frame time
of 2.592 ms at 400 MHz, which equals 51.8 ms at 20 MHz. The fractal
generator is fast enough to generate real-time 3D landscapes. Thus, the
thesis has successfully implemented a useful fractal generator for landscape
generation.
9.1 Future Work
Scaling the resolution of the fractal landscape did not work as intended
(Section 7.3.1). Increasing the resolution has been left for future work.
The fractal generator currently stores all the z-coordinates of the
landscape in cache. This is not necessary for operation, and optimizing the
cache size has been left for future work (Section 5.6.2).
The fractal generator currently transfers its z-coordinates to Mali in
32-bit floating point format. Reducing the format to 16-bit floating point
is possible without loss of detail and has been left for future work.
References
[1] Hamilton B. Carter and Shankar G. Hemmady. Metric Driven Design
Verification. 2007.
[2] Bodil Branner. The Mandelbrot set. Proceedings of Symposia in Applied
Mathematics, 39:75-105, 1989.
[3] Tolga Capin, Kari Pulli, and Tomas Akenine-Moller. The state of the art
in mobile graphics research. 2008.
[4] Per Christian Corneliussen. Design of a fractal generator for on-the-fly
generation of textures for Mali GPU. Master's thesis, Norwegian University
of Science and Technology, 2011.
[5] David Darling. The Universal Book of Mathematics: From Abracadabra to
Zeno's Paradoxes. 2004.
[6] Aaftab Munshi, Dan Ginsburg, and Dave Shreiner. OpenGL ES 2.0
Programming Guide. 2008.
[7] Richard S. Wright et al. OpenGL SuperBible. 2010.
[8] Clayton Shepard et al. Livelab: Measuring wireless networks and
smartphone users in the field. 2011.
[9] Wikimedia Foundation, Inc. The Mandelbrot set.
http://en.wikipedia.org/wiki/Mandelbrot_set.
[10] Adrien Douady and John H. Hubbard. Étude dynamique des polynômes
complexes. 1984.
[11] Robert J. Simpson and John Kessenich. The OpenGL ES Shading Language,
2009.
[12] William K. Lam. Hardware Design Verification. 2005.
[13] ARM Limited. Mali-400 MP GPU Integration Manual, 2009.
[14] ARM Limited. AMBA APB Protocol Specification, 2010.
[15] ARM Limited. Mali-400 MP GPU Configuration and Sign-off Guide, 2010.
[16] ARM Limited. Mali-400 MP GPU Technical Overview, 2010.
[17] ARM Limited. AMBA AXI and ACE Protocol Specification, 2011.
[18] ARM Limited. Mali-200 and Mali-400 MP GPU Linux DDK Integration Guide,
2011.
[19] ARM Limited. Mali-200 and Mali-400 MP GPU OpenGL ES Technical Overview,
2011.
[20] OpenGL.org. Performance. http://www.opengl.org/wiki/Performance.
[21] Sharon Rosenberg and Kathleen A Meade. A Practical Guide to Adopting
the Universal Verification Methodology. 2010.
[22] Chris Spear. System Verilog for Verification. 2006.
10 Appendices
A Source Code for the Fractal Generator
module apb_interface(
  input CLK,
  input RESET_N,
  // APB signals for receiving start coordinates from Mali driver.
  input PSEL, //x?
  input PWRITE,
  input [31:0] PWDATA,
  input [31:0] PWADDR,
  input PENABLE, // Currently unused cause I assume no wait states. TODO: OK?
  output reg PREADY,
  output reg [31:0] PRDATA,
  // Signals to and from the fractal generator core (FGC)
  output reg [31:0] x_fgc, // x coordinate of bottom left pixel
  output reg [31:0] y_fgc,
  output reg [31:0] stepsize_fgc, // stepsize between x and y coords.
  input [31:0] AXI_DEBUG,
  input [31:0] ARBITER_DEBUG
);

  // CSM for the fractal_generator
  reg [1:0] state;
  parameter idle = 0, receive = 1, send = 2;
  // Don't know how the APB addressing works entirely so this parameter can be
  // set to change the address space of the apb_interface module.
  // Currently x = start_address, y=x+1, stepsize = y+1;
  parameter start_address = 32'h0000C000; //

  always @ (posedge CLK or negedge RESET_N)
  begin
    if (!RESET_N) begin
      state <= idle;
      PREADY <= 0;
      x_fgc <= 0;
      y_fgc <= 0;
      stepsize_fgc <= 0;
      PRDATA <= 0;
    end
    else begin
      case (state)
        idle: begin // wait for APB activity, store data.
          if ((PSEL && PWRITE) == 1) begin
            // no wait states needed. so doesn't have to check enable
            if (PWADDR[15:0] == 16'hC000) begin
              state <= receive;
              PREADY <= 1;
              x_fgc <= PWDATA;
            end
            else if (PWADDR[15:0] == 16'hC004) begin
              state <= receive;
              PREADY <= 1;
              y_fgc <= PWDATA;
            end
            else if (PWADDR[15:0] == 16'hC008) begin
              state <= receive;
              PREADY <= 1;
              stepsize_fgc <= PWDATA;
            end
          end
          else if ((PSEL && ~PWRITE) == 1) begin
            // APB read interface for debugging purposes.
            //TODO: Rewrite into case
            if (PWADDR[15:0] == 16'hC000) begin
              state <= send;
              PREADY <= 1;
              PRDATA <= x_fgc;
            end
            else if (PWADDR[15:0] == 16'hC004) begin
              state <= send;
              PREADY <= 1;
              PRDATA <= y_fgc;
            end
            else if (PWADDR[15:0] == 16'hC008) begin
              state <= send;
              PREADY <= 1;
              PRDATA <= stepsize_fgc;
            end
            else if (PWADDR[15:0] == 16'hc00c) begin
              state <= send;
              PREADY <= 1;
              PRDATA <= ARBITER_DEBUG;
            end
            else if (PWADDR[15:0] == 16'hc010) begin
              state <= send;
              PREADY <= 1;
              PRDATA <= AXI_DEBUG;
            end
            else if (PWADDR[15:0] == 16'hc014) begin
              state <= send;
              PREADY <= 1;
              PRDATA <= 32'h1337feed;
            end
          end // else if
        end
        receive: begin
          PREADY <= 0;
          state <= idle;
        end
        send: begin
          PREADY <= 0;
          state <= idle;
          PRDATA <= 0; // TODO: This is not needed I suppose.
        end
      endcase
    end // else if reset is off
  end // CSM APB
endmodule
../source/generator/apb_interface.v
// The job o f the a r b i t e r i s to c a l c u l a t e f r a c t a l c oo rd ina t e s
based on addre s s e s
2 // from the ax i_ in t e r f a c e . I f no p r i o r i t y addre s s e s are g iven the
a r b i t e r t r i e s
// to p r ed i c t coo rd ina t e s and put them in to cache .
4 //
6
// Does log2 (n) : )
8 ` d e f i n e c logb2 (n) ( ( n) <= (1<<0) ? 0 : (n) <= (1<<1) ? 1 : \
(n) <= (1<<2) ? 2 : (n) <= (1<<3) ? 3 : \
10 (n) <= (1<<4) ? 4 : (n) <= (1<<5) ? 5 : \
(n) <= (1<<6) ? 6 : (n) <= (1<<7) ? 7 : \
12 (n) <= (1<<8) ? 8 : (n) <= (1<<9) ? 9 : \
(n) <= (1<<10) ? 10 : (n) <= (1<<11) ? 11 : \
14 (n) <= (1<<12) ? 12 : (n) <= (1<<13) ? 13 : \
(n) <= (1<<14) ? 14 : (n) <= (1<<15) ? 15 : \
16 (n) <= (1<<16) ? 16 : (n) <= (1<<17) ? 17 : \
(n) <= (1<<18) ? 18 : (n) <= (1<<19) ? 19 : \
18 (n) <= (1<<20) ? 20 : (n) <= (1<<21) ? 21 : \
(n) <= (1<<22) ? 22 : (n) <= (1<<23) ? 23 : \
20 (n) <= (1<<24) ? 24 : (n) <= (1<<25) ? 25 : \
(n) <= (1<<26) ? 26 : (n) <= (1<<27) ? 27 : \
22 (n) <= (1<<28) ? 28 : (n) <= (1<<29) ? 29 : \
(n) <= (1<<30) ? 30 : (n) <= (1<<31) ? 31 : 32)
24
26 module a r b i t e r (
III
input CLK,
28 input RESET_N,
input [ 3 1 : 0 ] ADDRESS, // t h i s i s a p r i o r i t y address from the
ax i_ in t e r f a c e .
30 input PRIORITY, // shouldn ' t be needed but here f o r now .
input [ 3 1 : 0 ] FRAME1_X, FRAME1_Y, FRAME1_STEPSIZE,
32 output reg [ 3 1 : 0 ] CACHE_ADDRESS,
34 // CACHE WRITER SIGNALS
output reg CACHE_WRITE, // t e l l s the cache to wr i t e
36 output reg [ 7 : 0 ] CACHE_DATA,
/∗ ARBITER_DEBUG i s used to debug v ia tha APB−i n t e r f a c e .
38 ∗ I t cu r r en t l y keeps t rack o f some o f the a r b i t e r s t a t e s .
∗ This i s mainly done to conf i rm that the a r b i t e r has s t a r t ed
v ia the d r i v e r .
40 ∗/
output reg [ 3 1 : 0 ] ARBITER_DEBUG
42 ) ;

  // CONFIGURABLE PARAMETERS:
  // NOTE: the idle state assumes NUMPOINTS = 2^n; the top level overrides
  // the default below with 128.
  parameter NUMPOINTS = 32'd400;  // Number of points in each direction.
  parameter NUMUNITS = 16;        // Number of fractal point generators (fpg).
  parameter LOG2_NUMPOINTS = `clogb2(NUMPOINTS);

  // ARBITER - FPG COMMUNICATION TODO: Comment what these regs are for.
  reg [31:0] feed_x, feed_y;
  // fpg_address is the input address to the fpgs, while ADDRESS_reg is a temp
  // register used for calculating feed_x and feed_y. Might not need both TBH.
  wire [31:0] ADDRESS_reg, fpg_address;  // TODO: Need two assignments? Try to smarten it up.
  reg [0:NUMUNITS-1] fpg_en;
  wire [1:0] fpg_busy [0:NUMUNITS-1];
  reg [0:NUMUNITS-1] fpg_read;
  wire [7:0] fpg_n [0:NUMUNITS-1];
  wire [31:0] fpg_addr_out [0:NUMUNITS-1];

  generate
    // Generate FPGs.
    genvar gi;
    for (gi=0; gi<NUMUNITS; gi=gi+1) begin : UNITS
      fpg fpgi (
        .clk(CLK), .reset_n(RESET_N),
        .c_re(feed_x), .c_im(feed_y), .enable(fpg_en[gi]), .addr_in(fpg_address),
        .n(fpg_n[gi]), .busy(fpg_busy[gi]), .addr_out(fpg_addr_out[gi]),
        .iterations_read(fpg_read[gi])
      );
    end
  endgenerate

  // MEMORY FOR X AND Y COORDINATES
  // So that each coordinate is only calculated once.
  reg [31:0] xmem_addr, ymem_addr;
  reg [32:0] xmem_in, ymem_in;  // 33 bits, 1 bit is a valid bit.
  wire [32:0] xmem_out, ymem_out;
  reg xmem_en, xmem_we, ymem_en, ymem_we;

  block_ram #(NUMPOINTS, 33) xmem (
    .CLK(CLK), .ADDRESS(xmem_addr),
    .DATA_IN(xmem_in), .DATA_OUT(xmem_out),
    .WE(xmem_we), .EN(xmem_en)
  );

  block_ram #(NUMPOINTS, 33) ymem (
    .CLK(CLK), .ADDRESS(ymem_addr),
    .DATA_IN(ymem_in), .DATA_OUT(ymem_out),
    .WE(ymem_we), .EN(ymem_en)
  );

  // ARITHMETIC COMPONENTS
  // Used for calculating FP32 numbers.
  reg [31:0] converter_in;

  wire [31:0] mul_in;
  reg mul_enable;
  reg [31:0] add_in;
  reg add_enable;
  wire [31:0] mul_out, add_out;
  wire mul_valid, add_valid;

  // 32 bit integer to 32 bit fp converter.
  vithar_lib_u32_to_f32 converter (.outp(mul_in), .inp(converter_in));

  // fp32 multiplier
  vithar_lib_f32_mul mul (
    .clk(CLK), .reset_n(RESET_N), .enable(mul_enable),
    .a(FRAME1_STEPSIZE), .b(mul_in), .dout(mul_out),
    .valid(mul_valid)
  );

  // fp32 adder
  vithar_lib_f32_addsub add (
    .clk(CLK), .reset_n(RESET_N), .enable(add_enable),
    .a(mul_out), .b(add_in), .dout(add_out),
    .valid(add_valid),
    .dout_guard(), .dout_round(), .dout_sticky()  // TODO: Needed?
  );

  // ARBITER CSM START
  reg [31:0] pred_address;
  integer i;  // Loop counter.
  reg [3:0] state;
  // pred_status is 1 if the current coordinates are predicted, i.e. not priority.
  reg pred_status;
  reg found;  // Used in feed_coordinates to find an available fpg.
  reg [3:0] burst_offset;

  // Burst offset is used to get all the coordinates in an AXI burst from just
  // the initial address.
  assign fpg_address = (pred_status) ? pred_address : (ADDRESS + burst_offset);
  assign ADDRESS_reg = (PRIORITY) ? (ADDRESS + burst_offset) : pred_address;

  parameter idle = 0, check_xy = 1, calculate_x = 2, calculate_x2 = 3, calculate_y = 4,
            calculate_y2 = 5, feed_coordinates = 6, flush_mem = 7,
            wait_cache_flush = 8;

  always @ (posedge CLK or negedge RESET_N)
  begin
    if (!RESET_N) begin
      // Flush memories upon reset
      ARBITER_DEBUG <= 0;
      xmem_addr <= (NUMPOINTS-1);
      xmem_en <= 1'b1;
      xmem_we <= 1'b1;
      xmem_in <= 33'b0;
      ymem_en <= 1'b1;
      ymem_addr <= (NUMPOINTS-1);
      ymem_we <= 1'b1;
      ymem_in <= 33'b0;
      state <= flush_mem;
      // Initial signal values
      feed_x <= 0;
      feed_y <= 0;
      fpg_en <= 0;
      pred_address <= 0;
      pred_status <= 0;
      burst_offset <= 0;
      converter_in <= 0;
      mul_enable <= 0;
      add_in <= 0;
      add_enable <= 0;
    end
    else begin
      case (state)
        wait_cache_flush: begin
          ARBITER_DEBUG <= 1;
          // Wait for the axi_interface to finish flushing the cache.
          // Needed to avoid writing to the cache during flush.
          if (ADDRESS == 32'hffffffff) begin
            state <= wait_cache_flush;
          end
          else begin
            state <= idle;
          end
        end
        idle: begin
          fpg_en <= 0;  // fpg_en is set in feed_coordinates, only needs to be high for 1 cycle.
          if (pred_status) begin
            // Previous coordinate was a predicted one.
            pred_status <= 0;
            pred_address <= pred_address + 1;
          end
          if (pred_address >= (NUMPOINTS*NUMPOINTS)) begin
            // Predictor has predicted all needed coordinates here.
            // Wait for reset.
            state <= idle;
          end
          else if (FRAME1_STEPSIZE != 0) begin
            // Stepsize is the last to get updated in my TB. TODO: Mali might
            // start with stepsize... make general? :)
            // ADDRESS_reg mod NUMPOINTS, assumes that NUMPOINTS = 2^n
            xmem_addr <= ADDRESS_reg & (NUMPOINTS-1);
            // ADDRESS_reg / NUMPOINTS, same assumption.
            ymem_addr <= ADDRESS_reg >> LOG2_NUMPOINTS;
            xmem_en <= 1;
            ymem_en <= 1;
            state <= check_xy;
            pred_status <= !PRIORITY;
          end
        end // idle
        check_xy: begin
          ARBITER_DEBUG <= 2;
          if ((xmem_en && xmem_we) || (ymem_en && ymem_we)) begin
            // A write has been done during this cycle. Write has priority, so
            // data has now been written regardless of read. Need to set write low
            // to get data out.
            xmem_we <= 0;
            ymem_we <= 0;
            state <= check_xy;
          end
          else if ((ymem_out[32] && xmem_out[32]) == 1) begin
            // The values exist in the RAMs and are valid.
            feed_y <= ymem_out;
            feed_x <= xmem_out;
            state <= feed_coordinates;
            xmem_en <= 0;
            ymem_en <= 0;
          end
          else if (xmem_out[32] == 1) begin
            // If xmem == 1, ymem != 1, could be x so need to check this way.
            converter_in <= ymem_addr;
            mul_enable <= 1;
            state <= calculate_y;
          end
          else begin
            // Either ymem == 1, or (xmem && ymem) != 1. Both could be 0 or x.
            // Calculate xmem.
            converter_in <= xmem_addr;
            mul_enable <= 1;
            state <= calculate_x;
          end
        end // check_xy

        calculate_y: begin
          // Calculate the Y coordinate corresponding to the address given.
          // Y = FRAME1_Y - FRAME1_STEPSIZE*(ADDRESS/NUMPOINTS)
          // Y = FRAME1_Y - FRAME1_STEPSIZE*ymem_addr
          /* The sign of the y coordinate is inverted to enable subtraction
             using the same adder. The sign of the result is also inverted
             to obtain the correct result.
             (-y+x)*-1 = y-x
          */
          if (mul_valid) begin
            // fp32 addition.
            add_in[30:0] <= FRAME1_Y[30:0];
            add_in[31] <= ~FRAME1_Y[31];
            add_enable <= 1;
            state <= calculate_y2;
          end
          else begin
            state <= calculate_y;
          end
        end
        calculate_y2: begin
          if (add_valid) begin
            ymem_in[32] <= 1'b1;
            ymem_in[30:0] <= add_out[30:0];
            ymem_in[31] <= ~add_out[31];
            ymem_we <= 1;
            add_enable <= 0;
            mul_enable <= 0;
            state <= check_xy;
          end
          else begin
            state <= calculate_y2;
          end
        end

        calculate_x: begin
          // Calculate the X coordinate corresponding to the address given.
          // X = FRAME1_X + FRAME1_STEPSIZE*(ADDRESS % NUMPOINTS);
          // X = FRAME1_X + FRAME1_STEPSIZE*xmem_addr;
          if (mul_valid) begin
            // fp32 addition
            add_in <= FRAME1_X;
            add_enable <= 1;
            state <= calculate_x2;
          end
          else begin
            state <= calculate_x;
          end
        end

        calculate_x2: begin
          if (add_valid) begin
            xmem_in[32] <= 1'b1;
            xmem_in[31:0] <= add_out;
            xmem_we <= 1;
            add_enable <= 0;
            mul_enable <= 0;
            state <= check_xy;
          end
          else begin
            state <= calculate_x2;
          end
        end

        feed_coordinates: begin
          // Finds a free fpg. Enable is reset to 0 in idle.
          found = 0;
          for (i=0; i<NUMUNITS; i=i+1) begin : find_free_fpg
            if (!found) begin
              if (fpg_busy[i] == 2'b00) begin
                fpg_en[i] <= 1;
                found = 1;
                state <= idle;
                // Adjust burst_offset to get the next coordinate in the burst.
                // Needs to be inside loop to only add when state changes :)
                if (!pred_status) begin
                  if (burst_offset == 15)
                    burst_offset <= 0;
                  else
                    burst_offset <= burst_offset + 1;
                end
              end
            end
          end
        end

        flush_mem: begin
          // Reset all values in xmem and ymem to zero.
          if (!ymem_addr) begin
            // Flush complete
            state <= wait_cache_flush;
            xmem_en <= 1'b0;
            xmem_we <= 1'b0;
            ymem_en <= 1'b0;
            ymem_we <= 1'b0;
          end
          else begin
            // Iterate through memories.
            // Inputs are zero, write and enable == 1 here.
            xmem_addr <= xmem_addr - 1'b1;
            ymem_addr <= ymem_addr - 1'b1;
            state <= flush_mem;
          end
        end // flush_mem
      endcase
    end // else RESET_N
  end

  // CACHE_WRITER - writes finished coordinates to cache.
  reg writer_state;
  reg z_found;  // Used to find finished fpg
  integer j;    // Writer fpg counter
  parameter write = 1;

  always @ (posedge CLK or negedge RESET_N)
  begin
    if (!RESET_N) begin
      // reset stuff
      CACHE_WRITE <= 0;
      CACHE_DATA <= 0;
      CACHE_ADDRESS <= 0;
      fpg_read <= 0;
      writer_state <= idle;
    end
    else begin
      case (writer_state)
        idle: begin
          z_found = 0;
          for (j=0; j<NUMUNITS; j=j+1) begin : find_finished_fpg
            if (!z_found) begin
              if (fpg_busy[j] == 2'b10) begin
                // The fpg has calculated the number of iterations for the
                // Z coordinate @ fpg_addr_out.
                CACHE_ADDRESS <= fpg_addr_out[j];
                CACHE_DATA[7:0] <= fpg_n[j];
                CACHE_WRITE <= 1;
                fpg_read[j] <= 1;
                writer_state <= write;
                z_found = 1;
              end
            end
          end
        end
        write: begin
          fpg_read <= 0;  // Not reading from anyone here. They can keep working if free.
          CACHE_WRITE <= 0;
          writer_state <= idle;
        end
      endcase
    end // else if RESET_N
  end
endmodule
../source/generator/arbiter.v
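
The address-to-coordinate mapping described in the arbiter comments can be cross-checked with a small software reference model. The following is only a sketch in Python, not part of the original design: the names `f32`, `address_to_coordinate` and `negate_f32_bits` are hypothetical, and it assumes that the FP32 units round to nearest single precision, which the listing does not state.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def address_to_coordinate(address, frame_x, frame_y, stepsize, numpoints):
    """Map a linear point address to its (x, y) coordinate.

    Follows the formulas in the arbiter comments:
        X = FRAME1_X + FRAME1_STEPSIZE * (ADDRESS % NUMPOINTS)
        Y = FRAME1_Y - FRAME1_STEPSIZE * (ADDRESS / NUMPOINTS)
    numpoints must be a power of two so that a mask and a shift can
    replace the modulo and the division, as in the idle state.
    """
    if numpoints <= 0 or numpoints & (numpoints - 1):
        raise ValueError("numpoints must be a power of two")
    log2_numpoints = numpoints.bit_length() - 1
    column = address & (numpoints - 1)   # ADDRESS % NUMPOINTS
    row = address >> log2_numpoints      # ADDRESS / NUMPOINTS
    # Assumed rounding: every intermediate result is rounded to FP32.
    x = f32(f32(frame_x) + f32(f32(stepsize) * column))
    y = f32(f32(frame_y) - f32(f32(stepsize) * row))
    return x, y


def negate_f32_bits(word: int) -> int:
    """Negate an FP32 value by flipping bit 31 of its 32-bit encoding."""
    return word ^ 0x8000_0000
```

With `numpoints = 128`, address 129 is column 1 of row 1, so `address_to_coordinate(129, -1.5, 1.0, 0.25, 128)` gives one step to the right of and one step below the frame origin. `negate_f32_bits` corresponds to the sign-bit inversion that lets the arbiter reuse the adder for the subtraction in the Y formula.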
// This module is an AXI slave responsible for transferring the
// Z-coordinates stored in the coordinate_cache through the AXI bus to Mali.
// The module reads the address given by Mali and the 31 subsequent addresses
// from the coordinate_cache. This will give a total of 32 Z-coordinates to
// transfer, 16 each burst. Each coordinate is 8 bits.
// If any of the Z-coordinates are missing from the cache, the interface
// has to ask the FPG-arbiter to calculate them and then transfer.

// Written by Per Kristian Kjøll.

// Need to define parameter NUMPOINTS.

module axi_interface (
  // AXI SIGNALS - have only included the ones in use.
  input ACLK,
  input ARESET_N,
  input [31:0] ARADDR,
  input [3:0] ARLEN,  // Number of bursts - 1. Usually 2 bursts (ARLEN = 1).
  input [4:0] ARID,
  // ARSIZE = max number of bytes in a burst, see table A3-2.
  // Usually = 0b100 = 16 bytes
  input [2:0] ARSIZE,
  input [1:0] ARBURST,  // Burst type. Usual = Incremental = 0b01
  input ARVALID,
  input RREADY,  // Master is ready to accept the read data and response information
  output reg ARREADY,  // Ready to accept address and control signals.
  output reg [4:0] RID,
  output reg RVALID,
  output reg RLAST,
  output reg [127:0] RDATA,
  output reg DEBUG,  // Signal to indicate unexpected inputs... for testing only.
  /* AXI_DEBUG is used to debug via the APB interface.
   * It currently counts the number of initiated AXI reads.
   */
  output reg [31:0] AXI_DEBUG,
  // COORDINATE_CACHE SIGNALS
  output reg [31:0] CACHE_ADDRESS,  // Cache has space for one frame currently.
  output reg CACHE_READ_BURST,
  output reg CACHE_CLEAR_BURST,

  input [127:0] CACHE_BURST,
  input [1:0] CACHE_BURST_VALID,
  input ARBITER_WRITE,  // The arbiter is currently writing to the cache.
  // ARBITER SIGNALS
  // Used to calculate coordinates of the signals that are invalid at time of request.
  output reg [31:0] ARBITER_ADDR,  // TODO: Combine this with CACHE_ADDRESS?
  output reg ARBITER_PRIORITY
);

  // AXI registers
  reg [1:0] burst_counter;
  // Keeps track of how many of the transfer coordinates have yet to be checked
  reg [127:0] burst1;
  reg [127:0] burst2;

  reg [2:0] state;
  parameter waiting_for_address = 0, check_coordinates_a = 1, check_coordinates_b = 2,
            check_coordinates = 3, send_coordinates = 4, send_coordinates2 = 5,
            end_transfer = 6, flush_cache = 7;

  // This is the start of the AXI address region of the fractal generator.
  parameter start_address = 32'h1000_0000;
  // State machine
  always @ (posedge ACLK or negedge ARESET_N)
  begin
    if (!ARESET_N) begin
      state <= flush_cache;
      CACHE_ADDRESS <= 32'h00003ffc;  // TODO: Check, 3fea = 128x128 last address.
      CACHE_CLEAR_BURST <= 1;
      ARBITER_ADDR <= 32'hffffffff;
      burst1 <= 0;
      burst2 <= 0;
      burst_counter <= 0;
      ARREADY <= 0;
      RID <= 0;
      RVALID <= 0;
      RDATA <= 0;
      RLAST <= 0;
      DEBUG <= 0;
      CACHE_READ_BURST <= 0;
      ARBITER_PRIORITY <= 0;
      AXI_DEBUG <= 0;
    end
    else begin
      case (state)
        waiting_for_address: begin
          ARREADY <= 1;
          if (ARVALID == 1) begin
            state <= check_coordinates_a;
            AXI_DEBUG <= AXI_DEBUG + 1'b1;
            RID <= ARID;
            // This works since address is 0x1000_0000
            // ARADDR is divided by 4. Each coordinate is 4 bytes and in the
            // cache each coordinate has its own address while externally it is
            // byte addressed.
            CACHE_ADDRESS[31:28] <= 4'b0000;
            // General: ARADDR[31:0] - start_address; // TODO: Faster general way?
            CACHE_ADDRESS[27:0] <= (ARADDR[27:0] >> 2);
            CACHE_READ_BURST <= 1;
            // Number of bursts = ARLEN + 1
            if (ARLEN == 1) begin
              burst_counter <= 2;
            end
            else if (ARLEN == 0) begin
              // burst_counter = 0 supports bursts of length 1.
              DEBUG <= 1'b1;
              burst_counter <= 0;
            end
            else begin
              // Interface only supports ARLEN==0 and ARLEN==1 (1 or 2 bursts).
              DEBUG <= 1'b1;
            end
            if ((ARBURST != 2'b01) || (ARSIZE != 3'b100)) begin
              DEBUG <= 1;  // Not incremental burst, find out why.
            end
          end
        end

        check_coordinates_a: begin
          // Read burst arrives to axi on the second cycle. Wait a cycle here.
          state <= check_coordinates_b;
        end
        check_coordinates_b: begin
          state <= check_coordinates;
        end

        check_coordinates: begin
          // Checks if all coordinates in burst are valid; if not, calculates the
          // ones that aren't by using the arbiter.
          // TODO: This uses 1 cycle each coordinate, I hope. Could modify
          // coordinate_cache to check all coordinates in a cycle... but then
          // would not know which, if any, fails. Or I could, with a big check
          // vector.. hmm.
          // Find out how often all coords are valid.. if they are most of
          // the time, make a faster check. Potential speedup here, if it is too
          // slow.
          if (CACHE_BURST_VALID[0] == 1'b1) begin
            // The entire burst is valid.
            ARBITER_ADDR <= 0;
            ARBITER_PRIORITY <= 0;
            burst_counter <= burst_counter - 1;
            if (burst_counter == 1) begin
              // Store the second burst.
              burst2 <= CACHE_BURST;
              CACHE_READ_BURST <= 0;
              state <= send_coordinates;
            end
            else begin
              // Store the first burst
              burst1 <= CACHE_BURST;
              CACHE_ADDRESS <= CACHE_ADDRESS + 4;  // TODO: Check this.
              state <= check_coordinates_a;
            end
          end
          else if (CACHE_BURST_VALID[1] == 1'b1) begin
            // There was a write during the last read. This should rarely happen.
            // Burst might be valid here but not updated yet. Wait.
            state <= check_coordinates;
          end
          else begin
            // The burst is not valid, send the initial address of the burst to
            // arbiter. Some of the coordinates in the burst might be ready,
            // but no matter.
            ARBITER_ADDR <= CACHE_ADDRESS;  // Stay in same state until valid.
            ARBITER_PRIORITY <= 1;
            state <= check_coordinates;
          end
        end

        send_coordinates: begin
          // Sends two bursts in two cycles.
          // TODO: Check timings here more carefully.
          // NOTE: "RREADY <= 1" is a less-than-or-equal comparison, which is
          // always true for a 1-bit signal, so the else branch below is
          // never taken.
          if (RREADY <= 1) begin
            RDATA <= burst1;
            RVALID <= 1;  // timing?
            if (burst_counter == 2'b00) begin
              state <= send_coordinates2;
            end
            else begin
              // TODO: As of now, this will never happen?
              // The burst_counter started at 0 (ARLEN == 0, burst length = 1)
              RLAST <= 1;
              state <= end_transfer;  // This is added to support ARLEN == 0
            end
          end
          else
            state <= send_coordinates;
        end

        send_coordinates2: begin
          RLAST <= 1;
          RDATA <= burst2;
          state <= end_transfer;
        end

        end_transfer: begin
          RVALID <= 0;  // Here or the previous cycle?
          RLAST <= 0;
          ARREADY <= 0;  // TODO: Don't think this is necessary
          state <= waiting_for_address;
        end

        flush_cache: begin
          // Deletes all data in the cache. This shall be done at reset
          // between each frame.
          if (!CACHE_ADDRESS) begin
            // Flush complete
            state <= waiting_for_address;
            CACHE_CLEAR_BURST <= 1'b0;
            ARBITER_ADDR <= 32'h00000000;
          end
          else begin
            CACHE_ADDRESS <= CACHE_ADDRESS - 4;
            state <= flush_cache;
          end
        end // flush_cache

        default: begin
          // shouldn't happen ^^,
          DEBUG <= 1;
        end
      endcase
    end // reset_n else
  end
endmodule
../source/generator/axi_interface.v
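
The way the interface interprets a read request (ARLEN + 1 bursts, 2^ARSIZE bytes per burst, and the cache address taken from ARADDR[27:0] >> 2) can be summarised in a few lines. The sketch below is a hypothetical Python helper, `decode_read_request`, written only for illustration; it assumes 4 bytes (one FP32 word) per coordinate, as stated in the comment in the waiting_for_address state.

```python
def decode_read_request(araddr: int, arlen: int, arsize: int):
    """Decode the AXI read-address fields as the interface comments describe.

    Returns (cache_address, bursts, bytes_per_burst, coords_per_burst).
    """
    bursts = arlen + 1                      # Number of bursts = ARLEN + 1
    bytes_per_burst = 1 << arsize           # 0b100 -> 16 bytes = 128 bits
    # ARADDR[27:0] >> 2: drop the 0x1000_0000 region offset, then convert
    # the byte address into a coordinate (word) address.
    cache_address = (araddr & 0x0FFF_FFFF) >> 2
    # Assumption: each coordinate occupies 4 bytes on the bus.
    coords_per_burst = bytes_per_burst // 4
    return cache_address, bursts, bytes_per_burst, coords_per_burst
```

For the usual request (ARLEN = 1, ARSIZE = 0b100) this gives two bursts of four coordinates each, i.e. eight coordinates per read, matching the comment in the top-level module.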
// Block RAM for use with the coordinate cache module.
// Simple synchronous RAM to make use of the Xilinx block RAM on the FPGA.
//

module block_ram (CLK, ADDRESS, WE, EN, DATA_IN, DATA_OUT);

  parameter DEPTH = 1024;
  parameter WIDTH = 8;  // Bits per read burst.

  input CLK, WE, EN;
  input [31:0] ADDRESS;
  input [WIDTH-1:0] DATA_IN;

  output reg [WIDTH-1:0] DATA_OUT;

  // Note that the last width bit is used as a valid bit.
  reg [WIDTH-1:0] mem [0:(DEPTH-1)];

  always @ (posedge CLK)
  begin
    if (EN) begin
      if (WE == 1)
        // Displays new value ASAP.
        DATA_OUT <= DATA_IN;
      else
        DATA_OUT <= mem[ADDRESS];
    end
    else
      DATA_OUT <= 0;
  end

  always @ (posedge CLK)
  begin
    if (EN && WE)
    begin
      mem[ADDRESS] <= DATA_IN;
    end
  end
endmodule
../source/generator/block_ram.v
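
The RAM above is write-first: on a clock edge with EN and WE high, DATA_OUT takes the value being written, and with EN low DATA_OUT is forced to zero. A cycle-level behavioural model makes this easy to check in a test. The class below is a hypothetical Python sketch, not part of the design; unlike the Verilog memory, which is uninitialised after power-up, the model assumes that all cells start at zero.

```python
class BlockRam:
    """Cycle-level model of the write-first synchronous RAM."""

    def __init__(self, depth: int = 1024, width: int = 8):
        self.mask = (1 << width) - 1
        self.mem = [0] * depth      # Assumption: memory starts zeroed.
        self.data_out = 0

    def clock(self, address: int, data_in: int = 0,
              we: bool = False, en: bool = True) -> int:
        """Apply one rising clock edge and return the new DATA_OUT."""
        if not en:
            self.data_out = 0                      # Disabled: output cleared.
        elif we:
            self.mem[address] = data_in & self.mask
            self.data_out = data_in & self.mask    # Write-first behaviour.
        else:
            self.data_out = self.mem[address]      # Plain synchronous read.
        return self.data_out
```

A typical use is `ram = BlockRam(depth=128, width=33)` for the X and Y memories of the arbiter, followed by one `ram.clock(...)` call per simulated clock cycle.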
// Explain functionality here :)

// Set syn_preserve synplify directive to deactivate equivalent optimization.
// Otherwise it optimizes away several of the RAM-blocks.

module coordinate_cache (
  input CLK,
  input [31:0] AXI_ADDRESS,
  input [31:0] ARBITER_ADDRESS,
  input READ_COORDINATE,
  input WRITE,
  input READ_BURST,  // Special function to speed up the usual request behavior.
  input [7:0] DATA_IN,  // Data in should also contain the two status bits :)
  input CLEAR_BURST,

  output [1:0] BURST_VALID,
  output reg [7:0] COORDINATE,     // Data out from single read
  output [127:0] COORDINATE_BURST  // Data out from burst read
);

  // reg [33:0] mem_bank [0:(NUMPOINTS^2)];
  // mem_bank has two extra status bits in addition to the 8 for Z-iterations;
  // read and valid. Read is used by arbiter and valid is used by axi.
  parameter xpoints = 256;
  parameter ypoints = 256;
  parameter total_points = (xpoints * ypoints);
  parameter BLOCKSIZE = total_points/4;

  parameter NUMBER_OF_RAMS = 4;

  reg [31:0] blockAddress;

  reg [7:0] ram_in;
  reg we;
  reg [NUMBER_OF_RAMS-1:0] en;
  wire [7:0] blockOut [NUMBER_OF_RAMS-1:0];
  wire byte_valid;
  reg write_during_read;

  assign BURST_VALID[0] = byte_valid;
  assign BURST_VALID[1] = write_during_read;

  // Generate memories. These should be inferred to Xilinx block RAM.

  generate
    genvar k;
    for (k=0; k<NUMBER_OF_RAMS; k=k+1) begin : BLOCKS
      /* synthesis syn_preserve = 1 */
      block_ram #(BLOCKSIZE) ram (
        .CLK(CLK), .EN(en[k]),
        .ADDRESS(blockAddress), .WE(we),
        .DATA_IN(ram_in), .DATA_OUT(blockOut[k])
      );
    end
  endgenerate

  // Convert burst output from u8 to f32. This is done since
  // float is required (?) when using attributes in the shader.
  wire [31:0] u32_in [3:0];
  wire [31:0] f32_out [3:0];

  generate
    genvar k;
    for (k=0; k<4; k=k+1) begin
      vithar_lib_u32_to_f32 u_vithar_lib_u32_to_f32
      (
        .inp(u32_in[k]), .outp(f32_out[k])
      );
    end
  endgenerate

  assign u32_in[0] = blockOut[0];
  assign u32_in[1] = blockOut[1];
  assign u32_in[2] = blockOut[2];
  assign u32_in[3] = blockOut[3];

  assign COORDINATE_BURST[31:0] = f32_out[0];
  assign COORDINATE_BURST[63:32] = f32_out[1];
  assign COORDINATE_BURST[95:64] = f32_out[2];
  assign COORDINATE_BURST[127:96] = f32_out[3];

  // If any of the four words in the burst is all zeroes, the burst is not valid.
  // NOTE: the expression below ANDs the four words bitwise, so it can also
  // evaluate to zero when no single word is zero
  // (e.g. 32'h3f800000 & 32'h40000000).
  assign byte_valid = ((COORDINATE_BURST[31:0] & COORDINATE_BURST[63:32] &
                        COORDINATE_BURST[95:64] & COORDINATE_BURST[127:96])
                       == 32'b0) ? 1'b0 : 1'b1;
  // Block RAM control
  // TODO: This does not have to be clocked? :)
  always @ (posedge CLK) begin
    // Write has priority.
    // Arbiter never has to read, only write.
    ram_in <= DATA_IN;
    if (WRITE) begin
      we <= 1;
      en <= 0;
      en[ARBITER_ADDRESS % NUMBER_OF_RAMS] <= 1;
      COORDINATE <= 0;
      blockAddress <= (ARBITER_ADDRESS >> 2);
      if (READ_BURST) begin
        write_during_read <= 1;
      end
    end
    else if (CLEAR_BURST == 1) begin
      // Clears one row in the RAM.
      ram_in <= 8'h00;
      we <= 1;
      en[3:0] <= 4'hf;
      blockAddress <= (AXI_ADDRESS >> 2);
    end
    else if (READ_BURST) begin
      we <= 0;
      blockAddress <= (AXI_ADDRESS >> 2);
      en[3:0] <= 4'hf;
    end
    else begin
      we <= 0;
      en <= 0;
      blockAddress <= 0;
      write_during_read <= 0;
      COORDINATE <= 0;
    end
  end
endmodule
../source/generator/coordinate_cache.v
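
The cache spreads consecutive point addresses over four RAMs (bank = address % 4, row = address >> 2) and converts each 8-bit iteration count to FP32 before packing four of them into one 128-bit burst. The Python sketch below models that data path; all function names are hypothetical. `burst_valid` models one reading of the comment on byte_valid (a burst is valid only if every count is non-zero), which is stricter than the bitwise AND in the listing.

```python
import struct

NUMBER_OF_RAMS = 4


def bank_and_row(address: int):
    """Split a point address into (RAM bank, row within that bank)."""
    return address % NUMBER_OF_RAMS, address >> 2


def u8_to_f32_bits(value: int) -> int:
    """Return the FP32 bit pattern of an unsigned 8-bit iteration count."""
    if not 0 <= value <= 255:
        raise ValueError("iteration count must fit in 8 bits")
    return struct.unpack("<I", struct.pack("<f", float(value)))[0]


def pack_burst(counts) -> int:
    """Pack four counts into one 128-bit word, bank 0 in bits [31:0]."""
    burst = 0
    for k, count in enumerate(counts):
        burst |= u8_to_f32_bits(count) << (32 * k)
    return burst


def burst_valid(counts) -> bool:
    """Assumed intent: a zero count means 'not calculated yet'."""
    return all(count != 0 for count in counts)
```

The model also shows why a bitwise AND is a risky validity test: the encodings of 1.0 (0x3F800000) and 2.0 (0x40000000) share no set bits, so `u8_to_f32_bits(1) & u8_to_f32_bits(2)` is zero although both counts are non-zero.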
// This is the top level of the fractal generator; it connects all the
// sub-components.

module fractal_generator_main (
  // General
  input CLK,
  input RESET_N,
  // APB signals for receiving start coordinates from Mali driver.
  input PSEL,  // x?
  input PWRITE,
  input [31:0] PWDATA,
  input [31:0] PWADDR,

  input PENABLE,
  output PREADY,
  output [31:0] PRDATA,
  // AXI I/O
  input [4:0] ARID,
  input [31:0] ARADDR,  // How big should this be?
  input [3:0] ARLEN,    // Only need 4 bits, AXI 3.
  input [2:0] ARSIZE,   // Max number of bytes in a burst, see table A3-2, Usually =
  input [1:0] ARBURST,  // Burst type
  input ARVALID,
  input RREADY,  // Master is ready to accept the read data and response information.
  // Lots of extra inputs that the module doesn't care about.
  output ARREADY,  // Ready to accept address and control signals.
  output DEBUG,
  // Data is 128 bits each burst, usually two bursts => 16*2 = 32 bytes = 8 coordinates
  output [4:0] RID,
  output [127:0] RDATA,
  output RVALID, RLAST
);
  wire [31:0] FRAME1_X, FRAME1_Y, FRAME1_STEPSIZE;  // APB -> ARBITER
  wire [127:0] CACHE_BURST_OUT;  // CACHE -> AXI
  wire [31:0] CACHE_ADDRESS_AXI, CACHE_ADDRESS_ARBITER, ARBITER_ADDR;
  wire [7:0] CACHE_DATA;
  wire [7:0] CACHE_READ_OUT;  // TODO: Bad name IMO. Not currently used actually.
  wire [1:0] CACHE_BURST_VALID;
  wire CACHE_READ_BURST, CACHE_CLEAR_BURST,
       ARBITER_PRIORITY, CACHE_WRITE, CACHE_READ;
  wire [31:0] ARBITER_DEBUG, AXI_DEBUG;
  // Resolution of fractal
  //
  parameter numpoints = 128;  // Points in each direction, total res = np^2.
  parameter numunits = 8;     // The number of fpgs in the arbiter.

  apb_interface apb (
    .CLK(CLK), .RESET_N(RESET_N),
    .PSEL(PSEL), .PWRITE(PWRITE), .PWDATA(PWDATA), .PWADDR(PWADDR),
    .PRDATA(PRDATA), .PENABLE(PENABLE), .PREADY(PREADY),
    .x_fgc(FRAME1_X), .y_fgc(FRAME1_Y), .stepsize_fgc(FRAME1_STEPSIZE),
    .AXI_DEBUG(AXI_DEBUG), .ARBITER_DEBUG(ARBITER_DEBUG)
  );

  axi_interface axi (
    .ACLK(CLK), .ARESET_N(RESET_N),
    .ARID(ARID), .ARADDR(ARADDR), .ARLEN(ARLEN), .ARSIZE(ARSIZE),
    .ARBURST(ARBURST), .ARVALID(ARVALID), .RREADY(RREADY),
    .ARREADY(ARREADY), .DEBUG(DEBUG),
    .RID(RID), .RDATA(RDATA), .RVALID(RVALID), .RLAST(RLAST),
    .CACHE_ADDRESS(CACHE_ADDRESS_AXI), .CACHE_BURST(CACHE_BURST_OUT),
    .CACHE_READ_BURST(CACHE_READ_BURST), .CACHE_CLEAR_BURST(CACHE_CLEAR_BURST),
    .CACHE_BURST_VALID(CACHE_BURST_VALID),
    .ARBITER_WRITE(CACHE_WRITE),  // Arbiter is writing to cache
    .ARBITER_ADDR(ARBITER_ADDR), .ARBITER_PRIORITY(ARBITER_PRIORITY),
    .AXI_DEBUG(AXI_DEBUG)
  );
  // Need to add single to axi?

  arbiter #(numpoints, numunits) arbiter (
    .CLK(CLK), .RESET_N(RESET_N), .ADDRESS(ARBITER_ADDR), .PRIORITY(ARBITER_PRIORITY),
    .FRAME1_X(FRAME1_X), .FRAME1_Y(FRAME1_Y), .FRAME1_STEPSIZE(FRAME1_STEPSIZE),
    .CACHE_ADDRESS(CACHE_ADDRESS_ARBITER),
    .CACHE_WRITE(CACHE_WRITE), .CACHE_DATA(CACHE_DATA),
    .ARBITER_DEBUG(ARBITER_DEBUG)
  );

  // Currently no single read operation done with the cache. Might need to
  // later, if not all transfers are in burst sizes.
  coordinate_cache #(numpoints, numpoints) cache (
    .CLK(CLK),
    .AXI_ADDRESS(CACHE_ADDRESS_AXI), .ARBITER_ADDRESS(CACHE_ADDRESS_ARBITER),
    .READ_COORDINATE(CACHE_READ), .WRITE(CACHE_WRITE),
    .READ_BURST(CACHE_READ_BURST), .DATA_IN(CACHE_DATA),
    .COORDINATE(CACHE_READ_OUT), .COORDINATE_BURST(CACHE_BURST_OUT),
    .CLEAR_BURST(CACHE_CLEAR_BURST), .BURST_VALID(CACHE_BURST_VALID)
  );

endmodule // fractal_generator
../source/generator/fractal_generator_main.v
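
The start coordinates reach the generator over APB as raw FP32 bit patterns in FRAME1_X, FRAME1_Y and FRAME1_STEPSIZE. The Python sketch below shows how a driver or testbench could derive those 32-bit words. The helper names are hypothetical, and the register offsets 0xC000, 0xC004 and 0xC008 are taken from the APB test sequence in Appendix B rather than from the RTL shown here.

```python
import struct


def f32_bits(value: float) -> int:
    """Return the IEEE-754 single-precision bit pattern of value."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def frame_register_writes(x0: float, y0: float, stepsize: float):
    """Build (address, data) pairs for the three frame registers.

    Offsets assumed from the Appendix B sequence:
    0xC000 = X0, 0xC004 = Y0, 0xC008 = STEPSIZE.
    """
    return [
        (0x0000_C000, f32_bits(x0)),
        (0x0000_C004, f32_bits(y0)),
        (0x0000_C008, f32_bits(stepsize)),
    ]
```

For X0 = -1.5, Y0 = 0.8 and STEPSIZE = 0.01 this yields the data words 0xBFC00000, 0x3F4CCCCD and 0x3C23D70A, the same constants the test sequence writes.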
B Source Code for UVM Verification Framework
// Apb interface test classes
`ifndef APB_COMP_LIB_SVH
`define APB_COMP_LIB_SVH

// APB INTERFACE
interface apb_if;  // APB protocol signals
  logic clk, resetn;
  logic psel, pwrite, pready, penable;
  logic [31:0] pwdata, pwaddr, prdata;
  logic [31:0] x_fgc, y_fgc, stepsize_fgc;
endinterface : apb_if

// APB DATA ITEM
class apb_item extends uvm_sequence_item;
  rand int unsigned pwaddr;
  rand int unsigned pwdata;
  int unsigned prdata;
  rand logic pwrite;
  // constraint c1 { pwaddr == 32'h000C000; }  // Random value atm...

  // UVM automation macros
  `uvm_object_utils_begin(apb_item)
    `uvm_field_int(pwaddr, UVM_DEFAULT)
    `uvm_field_int(pwdata, UVM_DEFAULT)
  `uvm_object_utils_end

  // Constructor
  function new (string name = "apb_item");
    super.new(name);
  endfunction : new
endclass : apb_item

// APB SEQUENCE AND SEQUENCER
class apb_sequencer extends uvm_sequencer #(apb_item);
  `uvm_component_utils(apb_sequencer)
  function new (string name, uvm_component parent);
    super.new(name, parent);
    $display("Apb sequencer");
  endfunction : new
endclass : apb_sequencer

class apb_sequence extends uvm_sequence #(apb_item);
  function new (string name = "apb_sequence");
    super.new(name);
    $display("Apb sequence");
  endfunction

  `uvm_object_utils(apb_sequence)

  virtual task body();
    uvm_test_done.raise_objection(this, "APB sequence");
    `uvm_info("apb_sequence", "Starting sequence", UVM_MEDIUM)
    `uvm_create(req)
    // Configure the fractal generator
    // X0 = -1.5
    start_item(req);
    req.randomize() with { pwaddr==32'h0000c000; pwdata==32'hbfc00000; pwrite==1'b1; };
    finish_item(req);
    // Y0 = 0.8
    start_item(req);
    req.randomize() with { pwaddr==32'h0000c004; pwdata==32'h3f4ccccd; pwrite==1'b1; };
    finish_item(req);
    // STEPSIZE = 0.01
    start_item(req);
    req.randomize() with { pwaddr==32'h0000c008; pwdata==32'h3c23d70a; pwrite==1'b1; };
    finish_item(req);
    // Check the version register to ensure correct setup.
    start_item(req);
    req.randomize() with { pwaddr==32'h0000c014; pwrite==1'b0; };
    finish_item(req);
    if (req.prdata==32'h1337feed) begin
      $display("LEETFEED");
    end
    else begin
      $display("Not leet :(");
      $display("prdata = %h", req.prdata);
    end
    // Wait for the clearing of caches and memories to complete.
    // Use the DEBUG registers to check this
    // Check AXI_DEBUG

    forever begin
      start_item(req);
      req.randomize() with { pwaddr==32'h0000c00c; pwrite==1'b0; };
      finish_item(req);
      if (req.prdata==32'h00000002) begin
        $display("ARBITER STARTED. CACHE IS FLUSHED HERE. Ready to read via AXI.");
        uvm_test_done.drop_objection(this, "APB sequence");
        break;
      end
      else begin
        // Send delay item
        $display("AXI not started. Prdata = %h", req.prdata);
        start_item(req);
        req.randomize() with { pwaddr==32'h0000_0000; pwrite==1'b0; };
        finish_item(req);
      end
    end
  endtask
endclass : apb_sequence

// APB DRIVER
class apb_driver extends uvm_driver #(apb_item);
  virtual apb_if apb_vif;
  // UVM automation macros
  `uvm_component_utils(apb_driver)

  // Constructor
  function new (string name = "apb_driver", uvm_component parent = null);
    super.new(name, parent);
  endfunction : new

  // Build phase registers the vif resource
  function void build_phase (uvm_phase phase);
    string inst_name;
    super.build_phase(phase);
    if (!uvm_config_db#(virtual apb_if)::get(this, "", "apb_vif", apb_vif))
    begin
      `uvm_fatal("NOVIF", {"virtual interface must be set for: ", get_full_name(), ".apb_vif"});
    end
  endfunction : build_phase

  // run_phase retrieves data_items and drives them
  task run_phase (uvm_phase phase);
    apb_item a_item;
    super.run_phase(phase);
    @(posedge apb_vif.resetn);
    forever begin
      seq_item_port.get_next_item(a_item);
      drive_item(a_item);
      seq_item_port.item_done();
    end
  endtask : run_phase

  // drive_item is the logic required to push the item onto the APB.
  // In this case it is the APB protocol signals and timing.
  // An item with pwaddr == 0 is treated as a delay of 1000 clock cycles.
  task drive_item (input apb_item item);
    if (item.pwaddr == 0) begin
      repeat (1000) @ (posedge apb_vif.clk);
    end
    else begin
      @ (posedge apb_vif.clk)
      begin
        apb_vif.psel <= 1'b1;
        apb_vif.pwrite <= item.pwrite;
        apb_vif.pwaddr <= item.pwaddr;
        if (item.pwrite == 1'b1) begin
          apb_vif.pwdata <= item.pwdata;
          // $display("An item is to be sent here");
        end
        // read
        else begin
          @(posedge apb_vif.pready);
          // $display("An item is to be received here");
          item.prdata <= apb_vif.prdata;
        end
      end
      @ (posedge apb_vif.clk)
      apb_vif.psel <= 1'b0;
      apb_vif.pwrite <= 1'b0;
    end // else
  endtask : drive_item
endclass : apb_driver

// APB MONITOR
class apb_monitor extends uvm_monitor;
  virtual apb_if apb_vif;
  bit checks_enable = 1;
  bit coverage_enable = 1;

  uvm_analysis_port #(apb_item) item_collected_port;
  event apb_written_event;  // Events needed to trigger covergroups
  protected apb_item apb_write;
  `uvm_component_utils_begin(apb_monitor)
    `uvm_field_int(checks_enable, UVM_ALL_ON)
    `uvm_field_int(coverage_enable, UVM_ALL_ON)
  `uvm_component_utils_end
  covergroup cov_apb_write @apb_written_event;
    option.per_instance = 1;
    pwaddr : coverpoint apb_write.pwaddr { option.auto_bin_max = 8; }
    pwdata : coverpoint apb_write.pwdata { option.auto_bin_max = 8; }
  endgroup : cov_apb_write

  // Constructor
  function new (string name, uvm_component parent);
    super.new(name, parent);
    cov_apb_write = new();
    cov_apb_write.set_inst_name({get_full_name(), ".cov_apb_write"});
    apb_write = new();
    item_collected_port = new("item_collected_port", this);
  endfunction : new

  function void build_phase (uvm_phase phase);
    super.build_phase(phase);
    if (!uvm_config_db#(virtual apb_if)::get(this, "", "apb_vif", apb_vif))
      `uvm_fatal("NOVIF", {"virtual interface must be set for: ", get_full_name(), ".apb_vif"});
  endfunction : build_phase

  virtual task run_phase (uvm_phase phase);
    fork
      collect_transactions();  // Spawn collector task
    join
  endtask : run_phase
208 v i r t u a l protec t ed task c o l l e c t_ t r an s a c t i o n s ( ) ; // TODO: Why
protec ted ?
@( posedge apb_vif . r e s e tn )
210 f o r e v e r begin
@( posedge apb_vif . c l k i f f ( ( apb_vif . pwr i te==1'b1 ) && ( apb_vif .
p s e l==1'b1 ) ) ) ;
212 begin
// Co l l e c t data from the bus in to apb_write .
214 apb_write . pwaddr = apb_vif . pwaddr ;
apb_write . pwdata = apb_vif . pwdata ;
216 `uvm_info ( get_type_name ( ) , $ s fo rmat f ( "Trans fe r c o l l e c t e d :\\n%
s" ,
apb_write . s p r i n t ( ) ) , UVM_FULL)
218 i f ( checks_enable )
per form_transfer_checks ( ) ;
220 i f ( coverage_enable )
per form_transfer_coverage ( ) ;
222 item_col lected_port . wr i t e ( apb_write ) ;
end
224 end
endtask : c o l l e c t_ t r an s a c t i o n s
226
v i r t u a l protec t ed func t i on void per form_transfer_coverage ( ) ;
XXVIII
228 // S igna l coverage event , t h i s samples the cove rpo in t s .
−> apb_written_event ;
230 endfunct ion : per form_transfer_coverage ;
232 v i r t u a l protec t ed func t i on perform_transfer_checks ( ) ;
// Perform data checks on t r an s_co l l e c t ed here . . .
234 endfunct ion : per form_transfer_checks
236 v i r t u a l f unc t i on void report_phase (uvm_phase phase ) ;
`uvm_info ( get_full_name ( ) , $ s fo rmat f ( "Covergroup ' cov_apb_write '
coverage :%2 f " ,
238 cov_apb_write . get_inst_coverage ( ) ) , UVM_LOW)
endfunct ion : report_phase
240 endc l a s s : apb_monitor
// APB AGENT
class apb_agent extends uvm_agent;
  // Agent components
  uvm_active_passive_enum is_active = UVM_ACTIVE;
  apb_sequencer sequencer;
  apb_driver driver;
  apb_monitor monitor;
  virtual apb_if apb_vif;

  // Constructor and UVM macros here
  `uvm_component_utils_begin(apb_agent)
    `uvm_field_enum(uvm_active_passive_enum, is_active, UVM_DEFAULT)
    `uvm_field_object(sequencer, UVM_ALL_ON)
    `uvm_field_object(driver, UVM_ALL_ON)
    `uvm_field_object(monitor, UVM_ALL_ON)
  `uvm_component_utils_end
  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction : new

  // Create subcomponents using factory
  virtual function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    monitor = apb_monitor::type_id::create("monitor", this);
    if (is_active == UVM_ACTIVE) begin
      // Build the sequencer and driver.
      sequencer = apb_sequencer::type_id::create("sequencer", this);
      driver = apb_driver::type_id::create("driver", this);
    end
    if (!uvm_config_db#(virtual apb_if)::get(this, "", "apb_vif", apb_vif)) begin
      `uvm_fatal("APB/AGT/NOVIF", "No virtual interface specified for agent")
    end
  endfunction : build_phase

  // Connect driver and sequencer
  virtual function void connect_phase(uvm_phase phase);
    if (is_active == UVM_ACTIVE) begin
      driver.seq_item_port.connect(sequencer.seq_item_export);
    end
  endfunction : connect_phase
endclass : apb_agent

`endif
../source/tbench/apb_comp_lib.sv
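The coverpoints in the APB monitor above limit automatic bin creation with option.auto_bin_max = 8. As a rough illustration of what that means for the collected values, the following C sketch computes which bin a sample lands in, assuming a 32-bit coverpoint whose value range is divided uniformly over the bins (with any remainder placed in the last bin). The helper name auto_bin_index is hypothetical and is not part of the testbench.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the testbench): index of the automatic
 * bin that a 32-bit sample falls into when the 2^32 possible values are
 * split uniformly over num_bins bins. Values left over when 2^32 is not
 * divisible by num_bins are counted in the last bin. */
unsigned auto_bin_index(uint32_t value, unsigned num_bins)
{
    uint64_t range = (uint64_t)1 << 32;   /* number of possible values */
    uint64_t width = range / num_bins;    /* values covered by each bin */
    uint64_t index = value / width;

    if (index >= num_bins)                /* remainder goes to the last bin */
        index = num_bins - 1;
    return (unsigned)index;
}
```

With eight bins each bin spans 2^29 values, so a register address such as 0x0000C000 falls in bin 0. Under these assumptions, writes confined to a small address window exercise only one of the eight pwaddr bins.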
// AXI interface test classes
`ifndef AXI_COMP_LIB_SVH
`define AXI_COMP_LIB_SVH

// AXI INTERFACE
interface axi_if; // AXI protocol signals (read address and read data channels)
  logic clk, resetn;
  logic [4:0] arid, rid;
  logic [3:0] arlen;
  logic [2:0] arsize;
  logic [1:0] arburst;
  logic arvalid, rready, arready, rvalid, rlast;
  logic [31:0] araddr;
  logic [127:0] rdata;
endinterface : axi_if

// AXI DATA ITEM
class axi_item extends uvm_sequence_item;
  rand logic [31:0] araddr;
  rand logic [4:0] arid;
  // Both bursts are stored in rdata.
  // 255-128 is the first burst
  logic [255:0] rdata;

  logic [4:0] rid;

  // UVM automation macros
  `uvm_object_utils_begin(axi_item)
    `uvm_field_int(araddr, UVM_DEFAULT)
    `uvm_field_int(arid, UVM_DEFAULT)
    `uvm_field_int(rdata, UVM_DEFAULT)
    `uvm_field_int(rid, UVM_DEFAULT)
  `uvm_object_utils_end

  // Constructor
  function new(string name = "axi_item");
    super.new(name);
  endfunction : new
endclass : axi_item

// AXI SEQUENCE AND SEQUENCER
class axi_sequencer extends uvm_sequencer #(axi_item);
  `uvm_component_utils(axi_sequencer)
  function new(string name, uvm_component parent);
    super.new(name, parent);
    $display("Axi sequencer");
  endfunction : new
endclass : axi_sequencer

class axi_sequence extends uvm_sequence #(axi_item);
  function new(string name = "axi_sequence");
    super.new(name);
    $display("AXI sequence");
  endfunction

  `uvm_object_utils(axi_sequence)

  virtual task body();
    uvm_test_done.raise_objection(this, "AXI sequence");
    `uvm_info("axi_sequence", "Starting sequence", UVM_MEDIUM)
    `uvm_create(req)
    // Perform an AXI read to address 0000_4000.
    start_item(req);
    $display("axi_item started");
    req.randomize() with { araddr == 32'h0000_4000; };
    finish_item(req);
    uvm_test_done.drop_objection(this, "AXI sequence");
  endtask
endclass : axi_sequence

// AXI DRIVER
class axi_driver extends uvm_driver #(axi_item);
  virtual axi_if axi_vif;
  // UVM automation macros
  `uvm_component_utils(axi_driver)

  // Constructor
  function new(string name = "axi_driver", uvm_component parent = null);
    super.new(name, parent);
  endfunction : new

  // Build phase registers the vif resource
  function void build_phase(uvm_phase phase);
    string inst_name;
    super.build_phase(phase);
    if (!uvm_config_db#(virtual axi_if)::get(this, "", "axi_vif", axi_vif))
    begin
      `uvm_fatal("NOVIF", {"virtual interface must be set for: ",
                           get_full_name(), ".axi_vif"});
    end
  endfunction : build_phase

  // run_phase retrieves data_items and drives them
  task run_phase(uvm_phase phase);
    axi_item a_item;
    super.run_phase(phase);
    @(posedge axi_vif.resetn);
    forever begin
      seq_item_port.get_next_item(a_item);
      drive_item(a_item);
      seq_item_port.item_done();
    end
  endtask : run_phase

  // drive_item is the logic required to push the item onto the AXI bus.
  // In this case it is the AXI protocol signals and timing.
  task drive_item(input axi_item item);
    // Wait for the fractal generator to be ready.
    // Then perform an AXI read.
    $display("Driving AXI item");
    forever begin
      @(posedge axi_vif.clk iff axi_vif.arready == 1);
      axi_vif.arvalid <= 1'b1;
      axi_vif.arlen <= 4'b0001;
      axi_vif.arburst <= 2'b01;
      axi_vif.arsize <= 3'b001;

      axi_vif.araddr <= item.araddr;
      axi_vif.arid <= item.arid;
      $display("AXI read requested");
      break;
    end
    @(posedge axi_vif.clk);
    @(posedge axi_vif.clk)
    axi_vif.arvalid <= 1'b0;
    axi_vif.rready <= 1'b1; // Ready to accept data.

    // Wait for transmission from the fractal generator
    @(posedge axi_vif.rvalid);
    $display("Receiving AXI-data.");
    item.rdata[127:0] <= axi_vif.rdata;
    item.rid <= axi_vif.rid;
    @(posedge axi_vif.clk);
    item.rdata[255:128] <= axi_vif.rdata;
  endtask : drive_item
endclass : axi_driver

// AXI MONITOR
class axi_monitor extends uvm_monitor;
  virtual axi_if axi_vif;
  bit checks_enable = 1;
  bit coverage_enable = 1;

  uvm_analysis_port#(axi_item) item_collected_port;
  event axi_read_event; // Events needed to trigger covergroups
  protected axi_item axi_read;
  `uvm_component_utils_begin(axi_monitor)
    `uvm_field_int(checks_enable, UVM_ALL_ON)
    `uvm_field_int(coverage_enable, UVM_ALL_ON)
  `uvm_component_utils_end
  covergroup cov_axi_read @axi_read_event;
    option.per_instance = 1;
    rdata : coverpoint axi_read.rdata {option.auto_bin_max = 8;}
    rid : coverpoint axi_read.rid {option.auto_bin_max = 8;}
  endgroup : cov_axi_read

  // Constructor
  function new(string name, uvm_component parent);
    super.new(name, parent);
    cov_axi_read = new();
    cov_axi_read.set_inst_name({get_full_name(), ".cov_axi_read"});
    axi_read = new();
    item_collected_port = new("item_collected_port", this);
  endfunction : new

  function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    if (!uvm_config_db#(virtual axi_if)::get(this, "", "axi_vif", axi_vif))
      `uvm_fatal("NOVIF", {"virtual interface must be set for: ",
                           get_full_name(), ".axi_vif"});
  endfunction : build_phase

  virtual task run_phase(uvm_phase phase);
    phase.raise_objection(this);
    fork
      collect_transactions(); // Spawn collector task
    join
    phase.drop_objection(this);
  endtask : run_phase

  virtual protected task collect_transactions(); // TODO: Why protected?
    @(posedge axi_vif.clk iff (axi_vif.rvalid == 1'b1));
    // Collect data from the bus into axi_read.
    $display("Collecting axi data");
    axi_read.araddr = axi_vif.araddr;
    axi_read.rdata[127:0] = axi_vif.rdata;
    axi_read.rid = axi_vif.rid;
    @(posedge axi_vif.clk);
    axi_read.rdata[255:128] = axi_vif.rdata;
    $display("Collecting axi data2");
    `uvm_info(get_type_name(), $sformatf("Transfer collected :\n%s",
              axi_read.sprint()), UVM_FULL)
    if (checks_enable) begin
      perform_transfer_checks();
    end
    if (coverage_enable) begin
      perform_transfer_coverage();
      $display("writing axi data");
      item_collected_port.write(axi_read);
    end
  endtask : collect_transactions

  virtual protected function void perform_transfer_coverage();
    // Signal coverage event, this samples the coverpoints.
    -> axi_read_event;
  endfunction : perform_transfer_coverage;

  virtual protected function perform_transfer_checks();
    // Perform data checks on trans_collected here...
  endfunction : perform_transfer_checks

  virtual function void report_phase(uvm_phase phase);
    `uvm_info(get_full_name(), $sformatf("Covergroup 'cov_axi_read' coverage: %2f",
              cov_axi_read.get_inst_coverage()), UVM_LOW)
  endfunction : report_phase
endclass : axi_monitor

// AXI AGENT
class axi_agent extends uvm_agent;
  // Agent components
  uvm_active_passive_enum is_active = UVM_ACTIVE;
  axi_sequencer sequencer;
  axi_driver driver;
  axi_monitor monitor;
  virtual axi_if axi_vif;

  // Constructor and UVM macros here
  `uvm_component_utils_begin(axi_agent)
    `uvm_field_enum(uvm_active_passive_enum, is_active, UVM_DEFAULT)
    `uvm_field_object(sequencer, UVM_ALL_ON)
    `uvm_field_object(driver, UVM_ALL_ON)
    `uvm_field_object(monitor, UVM_ALL_ON)
  `uvm_component_utils_end
  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction : new

  // Create subcomponents using factory
  virtual function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    monitor = axi_monitor::type_id::create("monitor", this);
    if (is_active == UVM_ACTIVE) begin
      // Build the sequencer and driver.
      sequencer = axi_sequencer::type_id::create("sequencer", this);
      driver = axi_driver::type_id::create("driver", this);
    end
    if (!uvm_config_db#(virtual axi_if)::get(this, "", "axi_vif", axi_vif)) begin
      `uvm_fatal("APB/AGT/NOVIF", "No virtual interface specified for agent")
    end
  endfunction : build_phase

  // Connect driver and sequencer
  virtual function void connect_phase(uvm_phase phase);
    if (is_active == UVM_ACTIVE) begin
      driver.seq_item_port.connect(sequencer.seq_item_export);
    end
  endfunction : connect_phase
endclass : axi_agent

`endif /* AXI_COMP_LIB_SVH */
../source/tbench/axi_comp_lib.sv
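The AXI driver above requests a read with ARLEN = 4'b0001 and ARSIZE = 3'b001 and then captures two 128-bit beats into the 256-bit rdata field. The following C sketch evaluates the standard AXI relations between these fields and the amount of data transferred: the number of beats is ARLEN + 1 and the number of bytes per beat is 2^ARSIZE. The function names are hypothetical helpers introduced only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of the testbench) that evaluate the
 * standard AXI burst relations. */

/* Number of data beats in a burst: ARLEN encodes (beats - 1). */
unsigned axi_burst_beats(uint8_t arlen)
{
    return (unsigned)arlen + 1u;
}

/* Bytes transferred per beat: ARSIZE encodes log2(bytes per beat). */
unsigned axi_bytes_per_beat(uint8_t arsize)
{
    return 1u << arsize;
}

/* Total number of bytes moved by one burst. */
unsigned axi_burst_bytes(uint8_t arlen, uint8_t arsize)
{
    return axi_burst_beats(arlen) * axi_bytes_per_beat(arsize);
}
```

For ARLEN = 1 the burst has two beats, which matches the two captures in drive_item. A full 128-bit beat corresponds to ARSIZE = 3'b100 (16 bytes, 32 bytes per burst), whereas the value 3'b001 used in the listing nominally denotes 2-byte beats. Whether the fractal generator inspects ARSIZE at all is not evident from this listing.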
// Fractal verification top level module

`include "uvm_pkg.sv"
import uvm_pkg::*;
`include "apb_comp_lib.sv"
`include "axi_comp_lib.sv"
`include "fractal_testlib.sv"
`include "fractal_generator_main.v"

module fractal_tb_top;

  apb_if apb_vif(); // Interface to the component
  axi_if axi_vif();

  fractal_generator_main fractal_generator (
    .CLK(apb_vif.clk), .RESET_N(apb_vif.resetn),
    // APB
    .PSEL(apb_vif.psel), .PWRITE(apb_vif.pwrite),
    .PREADY(apb_vif.pready), .PENABLE(apb_vif.penable),
    .PWDATA(apb_vif.pwdata), .PWADDR(apb_vif.pwaddr),
    .PRDATA(apb_vif.prdata),
    // AXI
    .ARID(axi_vif.arid), .ARADDR(axi_vif.araddr), .ARLEN(axi_vif.arlen),
    .ARSIZE(axi_vif.arsize), .ARBURST(axi_vif.arburst),
    .ARVALID(axi_vif.arvalid), .ARREADY(axi_vif.arready),
    .RREADY(axi_vif.rready),
    .RID(axi_vif.rid), .RDATA(axi_vif.rdata),
    .RVALID(axi_vif.rvalid), .RLAST(axi_vif.rlast),
    .DEBUG()
  );
  /*
  apb_interface apb (
    .CLK(apb_vif.clk), .RESET_N(apb_vif.resetn),
    .PSEL(apb_vif.psel), .PWRITE(apb_vif.pwrite), .PREADY(apb_vif.pready), .PENABLE(apb_vif.penable),
    .PWDATA(apb_vif.pwdata), .PWADDR(apb_vif.pwaddr),
    .x_fgc(apb_vif.x_fgc), .y_fgc(apb_vif.y_fgc), .stepsize_fgc(apb_vif.stepsize_fgc)
  );
  */

  // The AXI interface instance shares clock and reset with the APB interface instance.
  assign axi_vif.clk = apb_vif.clk;
  assign axi_vif.resetn = apb_vif.resetn;

  initial begin
    uvm_config_db#(virtual apb_if)::set(uvm_root::get(), "*", "apb_vif", apb_vif);
    uvm_config_db#(virtual axi_if)::set(uvm_root::get(), "*", "axi_vif", axi_vif);
    run_test();
  end

  initial begin
    $vcdpluson;
    $vcdplusmemon;
    apb_vif.resetn <= 1'b0;
    apb_vif.clk <= 1'b1;
    #51 apb_vif.resetn <= 1'b1;
  end

  always begin
    #5 apb_vif.clk <= ~apb_vif.clk;
  end

endmodule : fractal_tb_top
../source/tbench/fractal_tb_top.sv
// Fractal_testlib constructs the fractal verification env and tests

// VIRTUAL SEQUENCE
import "DPI" function int unsigned calculate_iterations(input shortreal re, input shortreal im);

class fractal_virtual_sequence extends uvm_sequence;
  `uvm_object_utils(fractal_virtual_sequence)

  function new(string name = "fractal_virtual_sequence");
    super.new(name);
    $display("Fractal sequence");
  endfunction : new

  apb_sequencer apb_seqr;
  axi_sequencer axi_seqr;

  apb_sequence apb_seq;
  axi_sequence axi_seq;

  virtual task body();
    apb_seq = apb_sequence::type_id::create("apb_seq");
    axi_seq = axi_sequence::type_id::create("axi_seq");
    apb_seq.start(apb_seqr, this);
    axi_seq.start(axi_seqr, this);
  endtask : body
endclass : fractal_virtual_sequence

class fractal_scoreboard extends uvm_scoreboard;
  `uvm_component_utils(fractal_scoreboard)
  int sbd_error = 0;
  `uvm_analysis_imp_decl(_axi)
  `uvm_analysis_imp_decl(_apb)
  uvm_analysis_imp_apb #(apb_item, fractal_scoreboard) apb_export;
  uvm_analysis_imp_axi #(axi_item, fractal_scoreboard) axi_export;
  // Fractal configuration, captured from the APB writes.
  // Initialised to 0.0 because null is not a valid shortreal value.
  shortreal x0 = 0.0;
  shortreal y0 = 0.0;
  shortreal stepsize = 0.0;

  protected bit disable_scoreboard = 0;

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction : new

  function void build_phase(uvm_phase phase);
    apb_export = new("apb_export", this);
    axi_export = new("axi_export", this);
  endfunction : build_phase

  virtual function void write_apb(apb_item apb);
    if (!disable_scoreboard)
    begin
      $display("apb_write");
      if (apb.pwaddr == 32'h000_C000) begin
        this.x0 = $bitstoshortreal(apb.pwdata);
      end
      else if (apb.pwaddr == 32'h000_C004) begin
        this.y0 = $bitstoshortreal(apb.pwdata);
      end
      else if (apb.pwaddr == 32'h000_C008) begin
        this.stepsize = $bitstoshortreal(apb.pwdata);
      end
    end
  endfunction : write_apb

  virtual function void write_axi(axi_item axi);
    int row = 0;
    int column = 0;
    int unsigned lol = 0;
    int unsigned iteration = 0;
    int unsigned temp[8];
    int start, stop;
    shortreal x, y;
    // Skip all checks when the scoreboard is disabled.
    if (disable_scoreboard)
      return;
    $display("axi_write with araddr = %h", axi.araddr[31:0]);
    // rdata1 is the 4 first coordinates
    row = (axi.araddr/4)/128;
    column = (axi.araddr/4)%128;
    x = this.x0 + (this.stepsize*column);
    y = this.y0 - (this.stepsize*row);
    $display("x, y=%f,%f", x, y);
    // Assumes requests do not cross rows.
    for (int i = 0; i < 8; i++) begin
      iteration = calculate_iterations(x + (i*this.stepsize), y);
      // Assert each of the coordinates in the transfer
      temp[0] = fp32toint(axi.rdata[31+(0*32):0*32]);
      temp[1] = fp32toint(axi.rdata[31+(1*32):1*32]);
      temp[2] = fp32toint(axi.rdata[31+(2*32):2*32]);
      temp[3] = fp32toint(axi.rdata[31+(3*32):3*32]);
      temp[4] = fp32toint(axi.rdata[31+(4*32):4*32]);
      temp[5] = fp32toint(axi.rdata[31+(5*32):5*32]);
      temp[6] = fp32toint(axi.rdata[31+(6*32):6*32]);
      temp[7] = fp32toint(axi.rdata[31+(7*32):7*32]);
      // There are some rounding issues, so I give a bit of
      // leeway.
      // A chained comparison (a <= b <= c) is not a range check in
      // SystemVerilog, so both bounds are tested explicitly. The lower
      // bound is written as iteration+1 >= temp[i] to avoid unsigned underflow.
      assert ((iteration + 1 >= temp[i]) && (iteration <= temp[i] + 1)) begin
        $display("ASSERT PASSED");
        $display("iteration=%d, temp=%d", iteration, temp[i]);
        $display("rdata = %h", axi.rdata);
      end
      else begin
        $error("iteration=%d, temp=%d", iteration, temp[i]);
        $display("rdata = %h", axi.rdata);
      end
    end
  endfunction : write_axi

  // Used to convert fp32 iterations to integers.
  // Fp32 iterations are always integers and need no rounding.
  function int fp32toint(int lol);
    int temp = 0;
    if ((lol[31] != 1'b0) || (lol[21:0] != 0)) begin
      $display("Decimal value in iteration lol=%h", lol);
    end

    // Raw exponent
    temp[7:0] = lol[30:23];
    temp -= 127;
    temp = 2**temp;
    if (lol[22] == 1)
      temp += 1;
    return temp;
  endfunction : fp32toint

endclass : fractal_scoreboard

// FRACTAL TESTBENCH
class fractal_tb extends uvm_env;
  `uvm_component_utils(fractal_tb)
  virtual apb_if apb_vif;
  virtual axi_if axi_vif;

  apb_agent apb0;
  axi_agent axi0;

  fractal_scoreboard fractal_scoreboard0;
  // Constructor
  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction : new

  virtual function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    uvm_config_db#(int)::set(this, "*.apb0.sequencer", "count", 0);
    uvm_config_db#(int)::set(this, "axi0.sequencer", "count", 0);
    apb0 = apb_agent::type_id::create("apb0", this);
    if (!uvm_config_db#(virtual apb_if)::get(this, "", "apb_vif", apb_vif))
    begin
      `uvm_fatal("NOVIF", {"virtual interface must be set for: ",
                           get_full_name(), ".apb_vif"});
    end

    axi0 = axi_agent::type_id::create("axi0", this);
    if (!uvm_config_db#(virtual axi_if)::get(this, "", "axi_vif", axi_vif))
    begin
      `uvm_fatal("NOVIF", {"virtual interface must be set for: ",
                           get_full_name(), ".axi_vif"});
    end

    fractal_scoreboard0 = fractal_scoreboard::type_id::create("fractal_scoreboard0", this);
  endfunction : build_phase

  virtual function void connect_phase(uvm_phase phase);
    // Connect monitors to scoreboard.
    apb0.monitor.item_collected_port.connect(fractal_scoreboard0.apb_export);
    axi0.monitor.item_collected_port.connect(fractal_scoreboard0.axi_export);
  endfunction : connect_phase
endclass : fractal_tb

class fractal_base_test extends uvm_test;
  `uvm_component_utils(fractal_base_test)
  fractal_tb fractal_tb0;
  uvm_table_printer printer;
  bit test_pass = 1;

  // Constructor
  function new(string name = "fractal_base_test", uvm_component parent = null);
    super.new(name, parent);
  endfunction : new

  // Build phase
  virtual function void build_phase(uvm_phase phase);
    // Configure default sequence in the sequencer
    super.build_phase(phase);

    fractal_tb0 = fractal_tb::type_id::create("fractal_tb0", this);

    // Enable transaction recording for everything
    uvm_config_db#(int)::set(this, "*", "recording_detail", UVM_FULL);

    // Create test bench and printer
    printer = new();
    printer.knobs.depth = 3;
  endfunction : build_phase

  task run_phase(uvm_phase phase);
    // Build virtual sequence
    fractal_virtual_sequence virtual_sequence;
    virtual_sequence = fractal_virtual_sequence::type_id::create("virtual_sequence");
    phase.raise_objection(this);

    // Connect sequence to sequencers
    // Connect virtual sequencer to sequencers
    virtual_sequence.apb_seqr = fractal_tb0.apb0.sequencer;
    virtual_sequence.axi_seqr = fractal_tb0.axi0.sequencer;
    virtual_sequence.grab(fractal_tb0.apb0.sequencer);

    // Start sequence
    virtual_sequence.start(null);

    phase.drop_objection(this);
  endtask : run_phase

  function void extract_phase(uvm_phase phase);
    if (fractal_tb0.fractal_scoreboard0.sbd_error)
      test_pass = 1'b0;
  endfunction : extract_phase

  function void report_phase(uvm_phase phase);
    if (test_pass) begin
      `uvm_info(get_type_name(), "** UVM TEST PASSED **", UVM_NONE)
    end
    else begin
      `uvm_error(get_type_name(), "** UVM TEST FAIL **")
    end
  endfunction : report_phase

endclass : fractal_base_test
../source/tbench/fractal_testlib.sv
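The scoreboard above maps each AXI read address to a row and column of the fractal, recomputes the expected iteration count and compares it with the fp32 values returned by the fractal generator. The C sketch below illustrates these steps under stated assumptions: 128 points per row and 4 bytes per point (taken from the divisions in write_axi), IEEE-754 single precision, and a reference iteration function modelled on getFractPoint from the demo source. All function names here are hypothetical. Unlike fp32toint in the listing, which uses only the exponent and the top mantissa bit, the conversion here reinterprets the full bit pattern.

```c
#include <stdint.h>
#include <string.h>

#define ROW_POINTS 128  /* assumed points per row, from the /128 and %128 in write_axi */

/* Reinterpret a 32-bit pattern as an IEEE-754 single and truncate to int. */
int fp32_bits_to_int(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);   /* well-defined way to reinterpret the bits */
    return (int)f;
}

/* Row and column of the first point addressed by a read (4 bytes per point). */
void addr_to_row_col(uint32_t araddr, int *row, int *col)
{
    uint32_t point = araddr / 4;
    *row = (int)(point / ROW_POINTS);
    *col = (int)(point % ROW_POINTS);
}

/* Reference iteration count with the simplified boundary check,
 * modelled on getFractPoint; returns 80 if the point never escapes. */
int reference_iterations(float re, float im)
{
    int n;
    float z_re = 0.0f, z_im = 0.0f;
    for (n = 1; n < 80; n++) {
        float z_re_old = z_re;
        z_re = z_re * z_re - z_im * z_im + re;
        z_im = 2.0f * z_re_old * z_im + im;
        if (z_re >= 2.0f || z_re <= -2.0f || z_im >= 2.0f || z_im <= -2.0f)
            break;
    }
    return n;
}

/* Tolerance check: 1 if actual is within one iteration of expected. */
int within_one(int expected, int actual)
{
    return actual >= expected - 1 && actual <= expected + 1;
}
```

A check for one returned word would then read along the lines of within_one(reference_iterations(x, y), fp32_bits_to_int(word)), where x and y are derived from the configured origin, the step size and the row and column computed by addr_to_row_col.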
C Source Code for the Fractal Demo
// Generation of fractal figures here

#include "../include/fractals.h"

//#define SKYPOINTS 64
//#define DEBUG

// Arrays for 2D fractal (sky)
/*GLfloat skyVertices[(SKYPOINTS*SKYPOINTS*4)];
GLfloat skyIterations[(SKYPOINTS*SKYPOINTS)];
GLushort skyIndices[((SKYPOINTS-1)*(2*SKYPOINTS)) + (2*(SKYPOINTS-2))];
*/

#ifndef FRACTAL_GENERATOR
void zoomAtPoint(struct fractal_landscape *landscape)
{
    landscape->fractal_configuration[0] = landscape->fractal_configuration[0]
        - ((NUMPOINTS/2) * landscape->fractal_configuration[2]);
    landscape->fractal_configuration[1] = landscape->fractal_configuration[1]
        + ((NUMPOINTS/2) * landscape->fractal_configuration[2]);
    updateIterations(landscape);
}
#endif /* !FRACTAL_GENERATOR */

// getFractPoint is copied from Corneliussen's master thesis.
int getFractPoint(float re, float im)
// Returns the number of iterations before |z| exceeds 2.
// If the number is 80, the point never escaped within the iteration limit
// and is treated as belonging to the Mandelbrot set.
{
    int n;
    float z_re = 0.0f, z_im = 0.0f;
    for (n = 1; n < 80; n++)
    {
        float z_re_old = z_re;
        z_re = z_re*z_re - z_im*z_im + re;
        z_im = 2.0f * z_re_old * z_im + im;

        // Simplified boundary check
        if ((z_re >= 2.0f) || (z_re <= -2.0f) || (z_im >= 2.0f) || (z_im <= -2.0f))
            break;
        // Precise boundary check.

        // if (_hypotf(z_re, z_im) >= 2.0f)
        //     break;
    }
    return n;
}

#ifndef FRACTAL_GENERATOR
void getFractalHeights(float x, float y, float stepsize, int numpoints, GLfloat *iterations)
// Calculates the height/iterations of all the vertices in the fractal specified
// by x, y, stepsize and numpoints. The fractal is assumed to be square = numpoints^2.
// Assumes an array with space for numpoints^2 integers.
{
    int i, j;
    for (i = 0; i < numpoints; i++) // y-axis = imaginary
    {
        for (j = 0; j < numpoints; j++) // x-axis
        {
            iterations[j + (i*numpoints)] = getFractPoint(x + (j*stepsize), y - (i*stepsize));
        }
    }
}
#endif /* !FRACTAL_GENERATOR */

void printFractalHeights(GLfloat *iterations, int numpoints)
{
    int i, j;
    for (i = 0; i < numpoints; i++) // y-axis = imaginary
    {
        for (j = 0; j < numpoints; j++) // x-axis
        {
            printf("%2.0f ", iterations[j + (i*numpoints)]);
        }
        printf("\n");
    }
}
void printFractalVertices(float *vertices, int numpoints)
{
    int i;
    for (i = 0; i < (numpoints*numpoints)*4; i++) // y-axis = imaginary
    {
        if ((i % (numpoints*numpoints) == 0) && (i != 0))
            printf("\n");
        else if ((i % numpoints == 0) && (i != 0))
            printf(" ");
        printf("%1.1f, ", vertices[i]);
    }
    printf("\n");
    printf("\n");

}

void getFractalVertices(GLfloat x, GLfloat y, GLfloat stepsize, GLfloat *vertices)
/* Creates a fractal square with numpoints*numpoints points.
 * Reaching from x -> x+stepsize*numpoints, same with y.
 * Note that both x and y can be negative.
 * All y-coordinates are == 0.0f
 */
{
    int i = 0; // Counts the index in vertices.
    int k;     // Loop counters.
    int j;
    // Starts with the top left coordinate.
    for (j = 0; j < NUMPOINTS; j++)
    {
        for (k = 0; k < NUMPOINTS; k++)
        {
            vertices[i]   = x + (k*stepsize); // x
            vertices[i+1] = 0.0f;             // y
            vertices[i+2] = y - (j*stepsize); // z
            vertices[i+3] = 1.0f;             // w
            i = i + 4;
        }
    }
}

void getFractalIndices(GLushort *indices, int numpoints)
// Takes in a pointer to a square of vertices and generates an array of indices to draw
// the square.
{
    int i;
    int j = 0;
    int x, y;
    // For each row of the vertices
    i = 0;
    y = 0;
    while (y < (numpoints*numpoints))
    {
        x = i*numpoints;
        y = x + numpoints;
        while (x < ((i*numpoints) + (numpoints-1)))
        {
            // Draws two triangles to create a little square for each column of x.
            indices[j]   = x;
            indices[j+1] = x+1;
            indices[j+2] = y;
            indices[j+3] = x+1;
            indices[j+4] = y;
            indices[j+5] = y+1;
            y++;
            x++;
            j = j + 6;
        }
        i++;
    }
}

void getFractalIndicesTriangleStrip(struct fractal_landscape *landscape)
{
    // Uses degenerate triangles to create one big triangle strip.
    int i = 0;
    int j = 0;
    GLushort x, y;
    // For each row of the vertices
    y = NUMPOINTS, x = 0;
    while (y < (NUMPOINTS*NUMPOINTS))
    {
        while ((x < ((i*NUMPOINTS) + (NUMPOINTS))))
        {
            landscape->indices[j] = x;
            landscape->indices[j+1] = y;
            j = j + 2;
            x++;
            y++;
        }
        // Add degenerate triangles
        landscape->indices[j] = y - 1;
        landscape->indices[j+1] = x;
        j = j + 2;
        i++;
    }
}

#ifndef FRACTAL_GENERATOR
void updateIterations(struct fractal_landscape *landscape)
{ // Used for zooming in on the landscape by updating the iterations.
    getFractalHeights(landscape->fractal_configuration[0], landscape->fractal_configuration[1],
                      landscape->fractal_configuration[2], NUMPOINTS, landscape->iterations);
    glBindBuffer(GL_ARRAY_BUFFER, landscape->vboIds[1]);
    GL_CHECK(glBufferData(GL_ARRAY_BUFFER, sizeof(landscape->iterations),
                          landscape->iterations, GL_STATIC_DRAW));
}
#endif /* !FRACTAL_GENERATOR */

void createFractalLandscape(struct fractal_landscape* landscape)
{
#ifdef DEBUG
  int debug;
#endif // DEBUG

  // Get attribute locations of non-fixed attributes like colour and texture coordinates from the shader
  positionLoc = GL_CHECK(glGetAttribLocation(landscape->shader, "av4position"));
  afheightLoc = GL_CHECK(glGetAttribLocation(landscape->shader, "afheight"));
  // Set number of dimensions to three.
  // This is done to use the same shader when drawing the 3D landscape and the sky.
  // TODO: Remove this and add extra shader?
  dimensionLoc = GL_CHECK(glGetUniformLocation(landscape->shader, "dimension"));
  glUniform1f(dimensionLoc, 3); // Set number of dimensions to three.
#ifndef FRACTAL_GENERATOR
  getFractalHeights(landscape->fractal_configuration[0], landscape->fractal_configuration[1],
                    landscape->fractal_configuration[2], NUMPOINTS, landscape->iterations);
#endif /* !FRACTAL_GENERATOR */
  // VBO initialization for fractal landscape here
  landscape->vboIds = calloc(3, sizeof(GLuint));
  GL_CHECK(glGenBuffers(3, landscape->vboIds)); // One coordinate vertices, one indices, one heights.
  // Vertex 2-D coordinates
  GL_CHECK(glBindBuffer(GL_ARRAY_BUFFER, landscape->vboIds[0])); // Vertex data in vboIds[0]
  GL_CHECK(glBufferData(GL_ARRAY_BUFFER, sizeof(landscape->vertices),
                        landscape->vertices, GL_STATIC_DRAW));

  // Vertex height coordinates, Y coordinate. Merged with 2D in shader.
  GL_CHECK(glBindBuffer(GL_ARRAY_BUFFER, landscape->vboIds[1]));
  GL_CHECK(glBufferData(GL_ARRAY_BUFFER, sizeof(landscape->iterations),
                        landscape->iterations, GL_STATIC_DRAW));

  GL_CHECK(glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, landscape->vboIds[2]));
  GL_CHECK(glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(landscape->indices),
                        landscape->indices, GL_STATIC_DRAW));

#ifdef DEBUG
  // for (debug = 0; debug < (((NUMPOINTS - 1) * (2 * NUMPOINTS)) + (2 * (NUMPOINTS - 2))); debug++)
  //   fprintf(stderr, "landscapeIndices[%i] = %i\n", debug, landscape->indices[debug]);
  for (debug = 0; debug < (4 * NUMPOINTS * NUMPOINTS); debug = debug + 1)
    fprintf(stderr, "landscapeVertices[%i] = %f\n", debug, landscape->vertices[debug]);

  // for (debug = 0; debug < NUMPOINTS * NUMPOINTS; debug++)
  //   fprintf(stderr, "iterations[%i] = %i\n", debug, landscape->iterations[debug]);

  // sizeof yields a size_t, so it is cast to int to match the %i conversion.
  printf("IndexVBOid = %u\n", landscape->vboIds[2]);
  fprintf(stderr, "sizeof landscapeIterations = %i\n", (int)sizeof(landscape->iterations));
  fprintf(stderr, "sizeof landscapeIndices = %i\n", (int)sizeof(landscape->indices));
  fprintf(stderr, "sizeof landscapeVertices = %i\n", (int)sizeof(landscape->vertices));

  fprintf(stderr, "sizeof GLfloat = %i\n", (int)sizeof(GLfloat));
#endif /* DEBUG */
}
../source/demo/fractals.c
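The degenerate-triangle technique used by getFractalIndicesTriangleStrip can be illustrated in isolation. The following is a minimal sketch, not the thesis implementation: the names strip_index_count and build_strip_indices are hypothetical, and the grid size is passed as a parameter instead of using the NUMPOINTS macro. Unlike the listing above, the sketch does not write a trailing degenerate pair after the last row, so its index count agrees with the expression in the commented-out DEBUG loop of createFractalLandscape.

```c
#include <stddef.h>

/* Hypothetical helper: number of indices needed to draw an n x n vertex
 * grid as one triangle strip. Each of the n-1 row pairs contributes 2*n
 * indices, and every row pair except the last is followed by 2 extra
 * (degenerate) indices. */
size_t strip_index_count(unsigned n)
{
    if (n < 2)
        return 0;
    return (size_t)(n - 1) * 2u * n + 2u * (size_t)(n - 2);
}

/* Hypothetical helper: fills `out`, which must hold at least
 * strip_index_count(n) entries, and returns the number written. */
size_t build_strip_indices(unsigned short *out, unsigned n)
{
    size_t j = 0;
    unsigned row, col;

    if (n < 2)
        return 0;
    for (row = 0; row + 1 < n; row++) {
        for (col = 0; col < n; col++) {
            out[j++] = (unsigned short)(row * n + col);       /* upper row */
            out[j++] = (unsigned short)((row + 1) * n + col); /* lower row */
        }
        if (row + 2 < n) {
            /* Repeat the last index of this row pair and the first index
             * of the next one. The resulting zero-area triangles join the
             * rows without drawing anything. */
            out[j++] = (unsigned short)((row + 1) * n + (n - 1));
            out[j++] = (unsigned short)((row + 1) * n);
        }
    }
    return j;
}
```

For a 3x3 grid this yields the 14 indices 0 3 1 4 2 5 5 3 3 6 4 7 5 8, where the pair 5 3 in the middle is the degenerate bridge between the two rows.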

#include "../include/main.h"
#include "../include/shader.h"
#include "../include/matrix.h"
#include "../include/fractals.h"
#include "../include/platform.h"

#include "GLES2/gl2.h"
#include "EGL/egl.h"

#ifdef PROFILING
#include "sys/time.h"
#endif /* PROFILING */

/* Global variables */
#ifdef _WIN32
HWND hWindow;
HDC hDisplay;
#endif /* _WIN32 */
GLint iLocPosition = 0;
GLint iLocColour, iLocMVP;
GLuint uiProgram, uiFragShader, uiVertShader;

// Array of pointers to landscapes
struct fractal_landscape* landscapes[(TOTAL_POINTS / NUMPOINTS) * (TOTAL_POINTS / NUMPOINTS)];
EGLDisplay sEGLDisplay;
EGLContext sEGLContext;
EGLSurface sEGLSurface;
const unsigned int uiWidth = 1;
const unsigned int uiHeight = 1;
GLushort num_of_landscapes;
int RENDER_STATE = 5;
// Camera movement etc
float Identity[16] = { 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1 };
float yRotateAngle = 0.0f;
float xRotateAngle = 65.f;
// Set initial camera position
float zTranslation = -3.5f; // Move camera a little back initially.
float yTranslation = 0.0f;
float xTranslation = 0.0f;

// Create landscapes combines several NUMPOINTS^2 landscapes into one large TOTALPOINTS^2 landscape
void create_landscape(struct fractal_landscape* landscapes[], GLushort* num_of_landscapes,
                      GLfloat x, GLfloat y, GLfloat stepsize)
{
  short i, j, temp;
  // Stepsize between the static vertex coordinates.
  float vertex_stepsize = (2.0f / (TOTAL_POINTS - 1.0f));
  *num_of_landscapes = (TOTAL_POINTS / NUMPOINTS);
  // Landscapes is an array of pointers to landscape structs
  // landscapes = calloc(*num_of_landscapes * (*num_of_landscapes), sizeof(struct fractal_landscape*));
  for (i = 0; i < (*num_of_landscapes); i++)
  {
    for (j = 0; j < (*num_of_landscapes); j++)
    {
      temp = (i * (*num_of_landscapes)) + j;
      landscapes[temp] = malloc(sizeof(struct fractal_landscape));
      landscapes[temp]->fractal_configuration[0] = x + (j * NUMPOINTS * stepsize);
      printf("landscapes[temp]->fractal_configuration[0]= %f\n",
             landscapes[temp]->fractal_configuration[0]);
      landscapes[temp]->shader = uiProgram;
      landscapes[temp]->fractal_configuration[2] = stepsize;
      landscapes[temp]->fractal_configuration[1] = y - (i * NUMPOINTS * stepsize);
      getFractalIndicesTriangleStrip(landscapes[temp]);
      getFractalVertices((-1.0f + (j * (NUMPOINTS - 1) * vertex_stepsize)),
                         (1.0f - (i * (NUMPOINTS - 1) * vertex_stepsize)),
                         vertex_stepsize, landscapes[temp]->vertices);
      printf("loop lol\n");
    }
  }
}


// Fractal point functions
/*
 * initial_rotation rotates around the y-axis until yRotateAngle
 * reaches fp->initial_angle.
 */

void initial_rotation(struct fractal_point* fp, float* rotation_matrix)
{
  yRotateAngle += 0.1f * fp->r_speed;
  rotate_matrix(yRotateAngle, 0.0f, 1.0f, 0.0f, rotation_matrix);
  if (yRotateAngle >= fp->initial_angle) // Rotation is finished
    RENDER_STATE = 1;
}

void navigate_to_point(struct fractal_point* fp, float* matrix)
{
  matrix[14] += 0.1f * fp->n_speed;
  if (matrix[14] <= fp->distance)
    RENDER_STATE = 2;
}

/* zoom_point
 * This is currently hardcoded to use landscapes[0]. To enable zoom with a
 * higher resolution than 128x128, this needs to be made general.
 */
void zoom_point(struct fractal_point* fp)
{
  landscapes[0]->fractal_configuration[0] = fp->x - (NUMPOINTS / 2) * landscapes[0]->fractal_configuration[2];
  landscapes[0]->fractal_configuration[1] = fp->y + (NUMPOINTS / 2) * landscapes[0]->fractal_configuration[2];
  landscapes[0]->fractal_configuration[2] = fp->z_speed * 0.95f * landscapes[0]->fractal_configuration[2];
#ifndef FRACTAL_GENERATOR
  updateIterations(landscapes[0]);
#endif /* !FRACTAL_GENERATOR */
  if (landscapes[0]->fractal_configuration[2] <= fp->final_stepsize)
    RENDER_STATE = 3;
}

void printMatrix(float* matrix)
{
  int i;
  for (i = 0; i < 4; i++)
  {
    printf("%1.1f %1.1f %1.1f %1.1f\n", matrix[i], matrix[i + 4], matrix[i + 8], matrix[i + 12]);
  }
}

#ifdef _WIN32
void initializeEGL() {
  /* EGL Configuration */
  EGLint aEGLAttributes[] = {
    EGL_RED_SIZE, 8,
    EGL_GREEN_SIZE, 8,
    EGL_BLUE_SIZE, 8,
    EGL_DEPTH_SIZE, 16,
    EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
    EGL_NONE
  };

  EGLint aEGLContextAttributes[] = {
    EGL_CONTEXT_CLIENT_VERSION, 2,
    EGL_NONE
  };

  EGLConfig aEGLConfigs[1];
  EGLint cEGLConfigs;

  /* EGL Init */
  hDisplay = EGL_DEFAULT_DISPLAY;
  sEGLDisplay = EGL_CHECK(eglGetDisplay(hDisplay));
  EGL_CHECK(eglInitialize(sEGLDisplay, NULL, NULL));
  EGL_CHECK(eglChooseConfig(sEGLDisplay, aEGLAttributes, aEGLConfigs, 1, &cEGLConfigs));
  if (cEGLConfigs == 0) {
    printf("No EGL configurations were returned.\n");
    exit(-1);
  }
  hWindow = create_window(uiWidth, uiHeight);
  sEGLSurface = EGL_CHECK(eglCreateWindowSurface(sEGLDisplay, aEGLConfigs[0],
                                                 (EGLNativeWindowType)hWindow, NULL));
  if (sEGLSurface == EGL_NO_SURFACE) {
    printf("Failed to create EGL surface.\n");
    exit(-1);
  }
  sEGLContext = EGL_CHECK(eglCreateContext(sEGLDisplay, aEGLConfigs[0], EGL_NO_CONTEXT,
                                           aEGLContextAttributes));
  if (sEGLContext == EGL_NO_CONTEXT) {
    printf("Failed to create EGL context.\n");
    exit(-1);
  }
  EGL_CHECK(eglMakeCurrent(sEGLDisplay, sEGLSurface, sEGLSurface, sEGLContext));
}
#endif /* _WIN32 */

void initializeShaders()
{
  /* Shader Initialisation */
  process_shader(&uiVertShader, "shader.vert", GL_VERTEX_SHADER);
  printf("Vertex shader completed.\n");
  process_shader(&uiFragShader, "shader.frag", GL_FRAGMENT_SHADER);

  printf("Fragment shader completed.\n");
  /* Create uiProgram (ready to attach shaders) */
  uiProgram = GL_CHECK(glCreateProgram());

  /* Attach shaders and link uiProgram */
  GL_CHECK(glAttachShader(uiProgram, uiVertShader));
  GL_CHECK(glAttachShader(uiProgram, uiFragShader));
  GL_CHECK(glLinkProgram(uiProgram));

  /* Get uniform locations */
  iLocMVP = GL_CHECK(glGetUniformLocation(uiProgram, "mvp"));

  GL_CHECK(glUseProgram(uiProgram));
}

#ifndef _WIN32
void render(platform_info_t* platform_info)
{
#else
void render()
{
  MSG sMessage;
#endif /* !_WIN32 */
  int bDone = 0;
  int i;
#ifdef WORST_CASE
  int num_frames = 30;
#else
  int num_frames = 500;
#endif /* WORST_CASE */

#ifdef PROFILING
  struct timeval tim;
  double t1 = 0;
  double t2 = 0;
  double time_passed = 0;
#endif /* PROFILING */
  struct fractal_point fp[4];
  struct fractal_point initial_fp;
  struct fractal_point* current_fp;
  // Matrices
  float rotate[16] = { 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1 };
  float landscapeScaling[16] = { 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1 };
  float landscapeMvp[16];
  float aPerspective[16] = { 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1 };
  float landscapeModelView[16] = { 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1 };
  /* Interesting points
     -0.1011f, 0.9563f
     -0.743643, 0.131825
  */
  /* Initial fp */

#ifndef WORST_CASE
  initial_fp.x = -1.5;
  initial_fp.y = 0.8;
  initial_fp.initial_stepsize = 0.01;
  initial_fp.final_stepsize = 0.01;
  initial_fp.z_speed = 1;
#else
  // Worst case
  initial_fp.x = 0.0;
  initial_fp.y = 0.0;
  initial_fp.initial_stepsize = 0.00001;
  initial_fp.final_stepsize = 0.00001;
  initial_fp.z_speed = 1;
#endif /* !WORST_CASE */

  fp[0] = initial_fp;

  fp[1].x = -0.743643f;
  fp[1].y = 0.131825f;
  fp[1].initial_angle = 0.5f;
  fp[1].distance = -0.5;
  fp[1].r_speed = 1;
  fp[1].n_speed = 1;
  fp[1].z_speed = 1;
  fp[1].final_stepsize = 0.0000001f;

  fp[2].x = -0.1011f;
  fp[2].y = 0.9563f;
  fp[2].initial_angle = 0.5f;
  fp[2].distance = -0.5;
  fp[2].r_speed = 1;
  fp[2].n_speed = 1;
  fp[2].z_speed = 1;
  fp[2].final_stepsize = 0.0000001f;

  // Worst case point 0.0
  fp[3].x = 0.0f;
  fp[3].y = 0.0f;
  fp[3].initial_angle = 0.5f;
  fp[3].distance = -0.5;
  fp[3].r_speed = 1;
  fp[3].n_speed = 1;
  fp[3].z_speed = 1;
  fp[3].final_stepsize = 0.0000001f;

  glEnable(GL_DEPTH_TEST);
  // glCullFace(GL_BACK);
  // glDisable(GL_CULL_FACE);
  /* Enter event loop */
#ifdef PROFILING
  gettimeofday(&tim, NULL);
  t2 = tim.tv_sec + (tim.tv_usec / 1000000.0);
#endif /* PROFILING */
  while (bDone < num_frames)
  {
#ifdef _WIN32
    if (PeekMessage(&sMessage, NULL, 0, 0, PM_REMOVE)) {
      if (sMessage.message == WM_QUIT) {
        bDone = 1;
      } else {
        TranslateMessage(&sMessage);
        DispatchMessage(&sMessage);
      }
    }
#endif /* _WIN32 */

    perspective_matrix(1.05, (double)uiWidth / (double)uiHeight, 0.01, 100.0, aPerspective);
    multiply_matrix(aPerspective, landscapeModelView, landscapeMvp);

    // Draw landscape VBO
    glUniformMatrix4fv(iLocMVP, 1, GL_FALSE, landscapeMvp);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);

    for (i = 0; i < (num_of_landscapes * num_of_landscapes); i++)
    {
      // Bind attributes
      // printf("Landscapes.xy = %f,%f\n", landscapes[i]->x, landscapes[i]->y);
      // printf("Landscapes.vxy = %f,%f\n", landscapes[i]->vertices[0], landscapes[i]->vertices[2]);
      glBindBuffer(GL_ARRAY_BUFFER, landscapes[i]->vboIds[0]);
      glEnableVertexAttribArray(positionLoc);
      glVertexAttribPointer(positionLoc, 4, GL_FLOAT, GL_FALSE, 0, 0);

      glBindBuffer(GL_ARRAY_BUFFER, landscapes[i]->vboIds[1]);
      glEnableVertexAttribArray(afheightLoc);
#ifdef FRACTAL_GENERATOR
      glVertexAttribPointer(afheightLoc, 1, GL_FLOAT, GL_FALSE, 0,
                            landscapes[i]->fractal_configuration);
#else
      glVertexAttribPointer(afheightLoc, 1, GL_FLOAT, GL_FALSE, 0, 0);
#endif /* FRACTAL_GENERATOR */
      glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, landscapes[i]->vboIds[2]);
      GL_CHECK(glDrawElements(GL_TRIANGLE_STRIP, sizeof(landscapes[i]->indices) / 2,
                              GL_UNSIGNED_SHORT, NULL));
    }
    // printf("\n\n");
    // Draw Sky
    // GL_CHECK(glDrawElements(GL_TRIANGLE_STRIP, indexInfo[1].size, GL_UNSIGNED_SHORT, NULL));
#ifdef _WIN32
    if (!eglSwapBuffers(sEGLDisplay, sEGLSurface)) {
      printf("Failed to swap buffers.\n");
    }
    Sleep(20);
#else
    platform_swapbuffers(platform_info);
    bDone++;
#ifdef PROFILING
    gettimeofday(&tim, NULL);
    t1 = tim.tv_sec + (tim.tv_usec / 1000000.0);
    time_passed += (t1 - t2);
    t2 = t1;
    printf("Time passed = %.6lf seconds\n", time_passed);
#endif /* PROFILING */
#endif /* _WIN32 */

    switch (RENDER_STATE)
    {
    case (0):
      printf("Demo: Performing initial rotation.\n");
      initial_rotation(current_fp, rotate);
      multiply_matrix(rotate, landscapeModelView, landscapeModelView);
      break;
    case (1):
      printf("Demo: Navigating to point.\n");
      navigate_to_point(current_fp, landscapeModelView);
      multiply_matrix(rotate, landscapeModelView, landscapeModelView);
      break;
    case (2):
      printf("Demo: Zooming towards point\n");
      zoom_point(current_fp);
      break;
    default:
      // Move camera to starting position
      printf("Demo: Moving camera to starting position.\n");
      landscapes[0]->fractal_configuration[0] = initial_fp.x;
      landscapes[0]->fractal_configuration[1] = initial_fp.y;
      landscapes[0]->fractal_configuration[2] = initial_fp.initial_stepsize;
      rotate_matrix(xRotateAngle, 1.0f, 0.0f, 0.0f, rotate);
      multiply_matrix(rotate, Identity, landscapeModelView);
      rotate_matrix(yRotateAngle, 0.0f, 1.0f, 0.0f, rotate);
      multiply_matrix(rotate, landscapeModelView, landscapeModelView);
      landscapeModelView[14] = zTranslation;
      landscapeModelView[13] = yTranslation;
      // Select a random point from the list of interesting points.
      // There is no call to srand so always same sequence, but that is okay.
#ifndef WORST_CASE
      // current_fp = &fp[(rand() % 2) + 1];
      current_fp = &fp[1];
#else
      current_fp = &fp[3];
#endif /* !WORST_CASE */
      RENDER_STATE = 0;
      break;
    }
  } /* while */
#ifdef PROFILING
  // time_passed is only declared when PROFILING is defined.
  printf("The average frame time was = %.6lf\n", time_passed / num_frames);
#endif /* PROFILING */
}

void cleanup() {
  /* Cleanup shaders */
  GL_CHECK(glUseProgram(0));
  GL_CHECK(glDeleteShader(uiVertShader));
  GL_CHECK(glDeleteShader(uiFragShader));
  GL_CHECK(glDeleteProgram(uiProgram));
#ifdef _WIN32
  /* EGL clean up */
  EGL_CHECK(eglMakeCurrent(sEGLDisplay, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT));
  EGL_CHECK(eglDestroySurface(sEGLDisplay, sEGLSurface));
  EGL_CHECK(eglDestroyContext(sEGLDisplay, sEGLContext));
  EGL_CHECK(eglTerminate(sEGLDisplay));
#endif /* _WIN32 */
}

int main(int argc, char** argv)
{
  // Declarations on top
  struct fractal_landscape fractal;
  int i;
#ifndef _WIN32
  platform_info_t platform_info;
  platform_info.num_bits_rgba[0] = 5;
  platform_info.num_bits_rgba[1] = 6;
  platform_info.num_bits_rgba[2] = 5;
  platform_info.num_bits_rgba[3] = 0;
  platform_info.num_samples = 4;
  platform_info.width = 400;
  platform_info.height = 300;
  platform_info.api = PLATFORM_EGL_API_GLES2;
  if (FALSE == platform_init(&platform_info))
  {
    printf("Failed to initialize windowing system.\n");
    return -1;
  }
#endif /* !_WIN32 */
  // Pre rendering
  printf("Initializing\n");
#ifdef _WIN32
  initializeEGL();
  printf("Initialized EGL\n");
#endif /* _WIN32 */
  initializeShaders();
  printf("Initialized shaders\n");
  // Initialize and create fractals. TODO: Should be functions to make large landscape.
  fractal.shader = uiProgram;
  fractal.fractal_configuration[0] = -1.5f;
  fractal.fractal_configuration[1] = 0.8f;
  fractal.fractal_configuration[2] = 0.01f;
  printf("Creating landscape\n");
  create_landscape(landscapes, &num_of_landscapes,
                   fractal.fractal_configuration[0],
                   fractal.fractal_configuration[1],
                   fractal.fractal_configuration[2]);
  printf("Landscape initialized\n");
  printf("num_of_l = %i\n", num_of_landscapes);
  for (i = 0; i < (num_of_landscapes * num_of_landscapes); i++)
    createFractalLandscape(landscapes[i]);

  printf("Landscape created\n");
  // Render and cleanup
#ifdef _WIN32
  render();
#else
  render(&platform_info);
#endif /* _WIN32 */
  printf("Finished render\n");
  cleanup();
  return 0;
}

#ifdef _WIN32
HWND create_window(int uiWidth, int uiHeight) {
  WNDCLASS wc;
  RECT wRect;
  HWND sWindow;
  HINSTANCE hInstance;

  wRect.left = 0L;
  wRect.right = (long)uiWidth;
  wRect.top = 0L;
  wRect.bottom = (long)uiHeight;

  hInstance = GetModuleHandle(NULL);

  wc.style = CS_HREDRAW | CS_VREDRAW | CS_OWNDC;
  wc.lpfnWndProc = (WNDPROC)process_window;
  wc.cbClsExtra = 0;
  wc.cbWndExtra = 0;
  wc.hInstance = hInstance;
  wc.hIcon = LoadIcon(NULL, IDI_WINLOGO);
  wc.hCursor = LoadCursor(NULL, IDC_ARROW);
  wc.hbrBackground = NULL;
  wc.lpszMenuName = NULL;
  wc.lpszClassName = "OGLES";

  RegisterClass(&wc);

  AdjustWindowRectEx(&wRect, WS_OVERLAPPEDWINDOW, FALSE,
                     WS_EX_APPWINDOW | WS_EX_WINDOWEDGE);

  sWindow = CreateWindowEx(WS_EX_APPWINDOW | WS_EX_WINDOWEDGE, "OGLES", "main",
                           WS_OVERLAPPEDWINDOW | WS_CLIPSIBLINGS | WS_CLIPCHILDREN,
                           0, 0, uiWidth, uiHeight, NULL, NULL, hInstance, NULL);

  ShowWindow(sWindow, SW_SHOW);
  SetForegroundWindow(sWindow);
  SetFocus(sWindow);

  return sWindow;
}

/*
 * process_window(): This function handles Windows callbacks.
 */
LRESULT CALLBACK process_window(HWND hWnd, UINT uiMsg, WPARAM wParam, LPARAM lParam) {
  switch (uiMsg)
  {
  case WM_CLOSE:
    PostQuitMessage(0);
    return 0;
  case WM_CHAR:
    switch (wParam)
    {
    case 0x1B: // Escape
      PostQuitMessage(0);
      return 0;
    }
  case WM_ACTIVATE:
  case WM_KEYDOWN:
    switch (wParam)
    {
    case VK_LEFT:
      // Rotates camera to the left
      yRotateAngle += 0.5f;
      return 0;
    case VK_RIGHT:
      // Rotates camera to the right.
      yRotateAngle -= 0.5f;
      return 0;
    case VK_DOWN:
      // Moves camera backward
      zTranslation -= 0.02f; // Move camera a little back.
      return 0;
    case VK_UP:
      // Moves camera forward
      zTranslation += 0.02f;
      return 0;
    case VK_PRIOR: // Page up.
      // Rotates camera up
      xRotateAngle += 0.5f;
      return 0;
    case VK_NEXT: // Page down.
      // Rotates camera down.
      xRotateAngle -= 0.5f;
      return 0;
    case VK_HOME:
      return 0;
    case VK_INSERT:
      return 0;
    case VK_DELETE:
      return 0;
    case VK_END:
      return 0;
    }
  case WM_KEYUP:
  case WM_SIZE:
    return 0;
  }
  return DefWindowProc(hWnd, uiMsg, wParam, lParam);
}
#endif /* _WIN32 */
../source/demo/main.c
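The tiling arithmetic in create_landscape can be studied without any OpenGL state. The sketch below mirrors the expressions in that function under stated assumptions: struct tile_origin and tile_origin_for are hypothetical names introduced for illustration, and the grid sizes are passed as parameters in place of the NUMPOINTS and TOTAL_POINTS macros.

```c
/* Hypothetical struct, for illustration only. */
struct tile_origin {
    float fractal_x; /* left edge of the tile in the complex plane */
    float fractal_y; /* top edge of the tile in the complex plane */
    float vertex_x;  /* left edge of the tile in [-1, 1] vertex space */
    float vertex_y;  /* top edge of the tile in [-1, 1] vertex space */
};

/* Computes the origin of the tile at (row, col), using the same
 * expressions as create_landscape: the fractal origin advances by
 * numpoints samples per tile, the vertex origin by numpoints - 1
 * vertex steps per tile. */
struct tile_origin tile_origin_for(int row, int col, int numpoints,
                                   int total_points,
                                   float x, float y, float stepsize)
{
    struct tile_origin t;
    float vertex_stepsize = 2.0f / ((float)total_points - 1.0f);

    t.fractal_x = x + (float)(col * numpoints) * stepsize;
    t.fractal_y = y - (float)(row * numpoints) * stepsize;
    t.vertex_x = -1.0f + (float)(col * (numpoints - 1)) * vertex_stepsize;
    t.vertex_y = 1.0f - (float)(row * (numpoints - 1)) * vertex_stepsize;
    return t;
}
```

With the values used in main (x = -1.5, y = 0.8, stepsize = 0.01) and an assumed numpoints of 128, the tile at row 1, column 1 would start at approximately (-0.22, -0.48) in the complex plane.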
/*
 * This proprietary software may be used only as
 * authorised by a licensing agreement from ARM Limited
 * (C) COPYRIGHT 2009 - 2011 ARM Limited
 * ALL RIGHTS RESERVED
 * The entire notice above must be reproduced on all authorised
 * copies and copies may only be made to the extent permitted
 * by a licensing agreement from ARM Limited.
 */

#define MAX_ITERATIONS 80.0
attribute vec4 av4position;
attribute float afheight;

uniform mat4 mvp;

varying float vfcolor;

void main()
{
  // afheight2 is used to avoid two divisions by 80,
  // once here and once in the fragment shader.
  float afheight2 = afheight / MAX_ITERATIONS;
  vfcolor = afheight2;
  vec4 pos = vec4(av4position.x, afheight2, av4position.zw);
  gl_Position = mvp * pos;
}
../source/demo/shader.vert
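The per-vertex work of the vertex shader can be reproduced on the CPU for checking purposes. The following C sketch is an illustration only: struct vec4, mat4_mul_vec4 and landscape_vertex are hypothetical names, and the matrix is assumed to be stored column-major, which is the layout glUniformMatrix4fv expects when its transpose argument is GL_FALSE.

```c
#define MAX_ITERATIONS 80.0f

/* Hypothetical CPU-side stand-in for the GLSL vec4 type. */
struct vec4 { float x, y, z, w; };

/* Returns m * v for a column-major 4x4 matrix. */
struct vec4 mat4_mul_vec4(const float m[16], struct vec4 v)
{
    struct vec4 r;
    r.x = m[0] * v.x + m[4] * v.y + m[8] * v.z + m[12] * v.w;
    r.y = m[1] * v.x + m[5] * v.y + m[9] * v.z + m[13] * v.w;
    r.z = m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w;
    r.w = m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w;
    return r;
}

/* Mirrors main() of the vertex shader: the iteration count is normalised
 * by MAX_ITERATIONS, stored as the colour value, and substituted for the
 * y component of the position before the MVP transform. */
struct vec4 landscape_vertex(const float mvp[16], struct vec4 av4position,
                             float afheight, float *vfcolor)
{
    float afheight2 = afheight / MAX_ITERATIONS;
    struct vec4 pos;

    pos.x = av4position.x;
    pos.y = afheight2; /* the incoming y component is discarded */
    pos.z = av4position.z;
    pos.w = av4position.w;
    *vfcolor = afheight2;
    return mat4_mul_vec4(mvp, pos);
}
```

An iteration count of 40 therefore maps to a height and colour value of 0.5, and a count of 80 maps to 1.0, the value the fragment shader treats as MAX_ITERATIONS reached.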
/*
 * This proprietary software may be used only as
 * authorised by a licensing agreement from ARM Limited
 * (C) COPYRIGHT 2009 - 2011 ARM Limited
 * ALL RIGHTS RESERVED
 * The entire notice above must be reproduced on all authorised
 * copies and copies may only be made to the extent permitted
 * by a licensing agreement from ARM Limited.
 */

precision lowp float;

varying float vfcolor;

void main()
{
  vec3 v_color;

  const vec3 c1 = vec3(0.165, 0.244, 0.518); // Light blue
  const vec3 c2 = vec3(1.0, 1.0, 1.0); // White

  if (vfcolor == 1.0) // MAX_ITERATIONS REACHED
    v_color = vec3(1.0, 0.0, 0.0); // Red
  else
    v_color = mix(c1, c2, (vfcolor));

  gl_FragColor = vec4(v_color, 1.0);
}
../source/demo/shader.frag
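The colour mapping of the fragment shader can likewise be written as a small C function. This is a sketch under stated assumptions: struct rgb, mix1 and landscape_colour are hypothetical names, and mix1 implements the GLSL definition mix(x, y, a) = x * (1 - a) + y * a for a single component.

```c
/* Hypothetical CPU-side stand-in for a GLSL vec3 colour. */
struct rgb { float r, g, b; };

/* Scalar equivalent of GLSL mix(): linear blend from x (a = 0) to y (a = 1). */
static float mix1(float x, float y, float a)
{
    return x * (1.0f - a) + y * a;
}

/* Mirrors main() of the fragment shader: points that reached
 * MAX_ITERATIONS (vfcolor == 1.0) are drawn red, all others are blended
 * from light blue towards white by the normalised iteration count. */
struct rgb landscape_colour(float vfcolor)
{
    const struct rgb c1 = { 0.165f, 0.244f, 0.518f }; /* light blue */
    const struct rgb c2 = { 1.0f, 1.0f, 1.0f };       /* white */
    struct rgb out;

    if (vfcolor == 1.0f) {
        out.r = 1.0f;
        out.g = 0.0f;
        out.b = 0.0f;
    } else {
        out.r = mix1(c1.r, c2.r, vfcolor);
        out.g = mix1(c1.g, c2.g, vfcolor);
        out.b = mix1(c1.b, c2.b, vfcolor);
    }
    return out;
}
```

The exact comparison against 1.0 works here because the sketch runs at full float precision on a single value. On the GPU the varying is interpolated between vertices at lowp precision, so whether a fragment compares equal to 1.0 may depend on the hardware.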
