On-chip bus standards in a broadcast architecture by Romaguera i Restudis, Josep-Oriol
  
 
 
 
 
 
 
FINAL PROJECT 
 
On-chip bus standards in a broadcast architecture 
 
 
  
 
  
 
Field: Electronic Engineering 
Author: Josep-Oriol Romaguera i Restudis 
Supervisor: Lauri Koskinen 
May 2012 
 
   
 
Aalto University - School of Electrical Engineering 
Department of Micro- and Nanosciences (Electronic Circuit Design unit) 
 
Bachelor’s Thesis 
 
Author: Josep-Oriol Romaguera i Restudis 
 
 
Title: On-chip bus standards in a broadcast architecture 
 
 
Date: 21.05.2012 Language: English Number of Pages: 48 
 
 
Supervisor: Lauri Koskinen 
 
 
 
Keywords: Broadcast communication, AMBA, wire model, wire delay, 
VHDL. 
 
 
On-chip bus standards in a broadcast architecture                  3 
 
 
   
Contents 
 
Acknowledgment
List of abbreviations and symbols
Abstract
1. Introduction
  1.1 Objective and Scope
  1.2 Outline
2. Theory
  2.1 AMBA Bus
  2.2 CAN Bus
3. System approach
  3.1 Specifications
  3.2 Design alternatives
  3.3 System block diagram
4. Design
  4.1 AMBA AHB system (VHDL)
    4.1.1 Master
    4.1.2 Slave
    4.1.3 Arbiter
    4.1.4 Decoder
  4.2 Wire model
    4.2.1 Calculation of wire parameters
    4.2.2 Wire delay and energy per bit
    4.2.3 Wire delay model in VHDL
5. Verification
6. Conclusions
7. References
Appendix A. AHB Signals
Appendix B. VHDL Code
Acknowledgment 
 
 
I would like to thank my supervisor, Lauri Koskinen, for giving me the
opportunity to work on a project and for providing inspiration and help when
needed.

In addition, I want to thank the student services personnel, especially
Nina Huovinen, for answering all my emails and making my arrival easier.

Last but not least, I would like to thank my family and, above all, my
girlfriend, for their support during the last 8 months and for “allowing me”
to do the final project abroad.
List of abbreviations and symbols 
 
AMBA Advanced Microcontroller Bus Architecture 
AHB Advanced High-performance Bus 
CAN Controller Area Network 
VHDL Combination of the abbreviations “VHSIC” and “HDL” 
VHSIC Very High Speed Integrated Circuit 
HDL Hardware Description Language 
 
   Sheet resistance 
  Electrical resistivity 
     Relative permittivity 
    Oxide permittivity  
   Vacuum permittivity (8.854…e-12 F/m) 
    Delay of a wire (using RC lumped model)   
      Wire resistance per millimetre 
      Wire capacitance per millimetre 
       Total delay of a 1mm wire 
          Repeater delay 
     Optimal sections of the wire (to minimize the delay) 
   Load capacitance 
  Supply voltage 
   Total bit energy per useful bit and per millimetre 
  
  Bit energy per useful bit and per millimetre (without repeaters) 
      Repeater energy per useful bit 
Abstract 
 
This final project has been developed at the School of Electrical Engineering
(Aalto University), and its main goal is to evaluate whether AMBA (Advanced
Microcontroller Bus Architecture) is a good option for an on-chip broadcast
architecture.
To this end, several alternatives are taken into account: using full AMBA (with all
its signals), using the CAN (Controller Area Network) bus instead of AMBA, and using
a modified AMBA (keeping only the signals that are actually needed). Once one
alternative has been chosen, it is implemented in VHDL (Very high speed integrated
circuit - Hardware Description Language).
Afterwards, the wire delay per millimetre is calculated (with and without buffering).
With the optimization, an improvement of 40.7% is obtained, with a delay of … . In
addition, a delay model is designed and implemented in VHDL.
Finally, the energy per bit is also calculated and compared with that of a
point-to-point architecture. As expected, the broadcast design consumes more energy
per bit (…) than a point-to-point system. Nevertheless, it is important to highlight
that point-to-point communication needs many more wires.
 
1. Introduction
This chapter presents the main goals of the thesis and describes the contents of
each of the following chapters.
 
1.1 Objective and Scope 
The first goal of this project is to implement an on-chip broadcast architecture
using VHDL (Very high speed integrated circuit - Hardware Description Language).
To carry it out, a shared bus such as AMBA (Advanced Microcontroller Bus
Architecture) or CAN (Controller Area Network) must be used. Therefore, before
writing any VHDL description, several design alternatives have to be set out and
evaluated. Furthermore, it is important to point out that this architecture must be
designed for a large number of modules that can transmit to each other. However,
the design starts with a reduced number of modules (eight), which makes the
explanation and the implementation easier. Once the eight-module system has been
simulated, the design is extended to N modules, where N may take any value.
The second goal is to model the wire delay of the interconnections between these
modules. Throughout most of the history of integrated circuits, on-chip interconnect
wires were only considered in special cases. With the introduction of deep-submicron
semiconductor technologies, the situation changed rapidly. The parasitic effects
introduced by the wires scale differently from active devices such as transistors,
and they tend to gain importance as device dimensions are reduced and circuit speed
increases. In fact, they start to dominate some of the relevant metrics of digital
integrated circuits, such as speed, energy consumption and cost. Therefore, a careful
and in-depth analysis of the role and behaviour of interconnect wires in a
semiconductor technology is not only desirable, but even essential. Consequently,
this part of the thesis consists of finding the wire parameters (such as resistance
and capacitance per millimetre), the delay per millimetre, and the energy consumed
per useful bit and per millimetre. Note that for the last part (energy per bit), the
placement of the modules must be discussed in order to obtain minimum distances
between them.
Finally, after the delay has been modelled in VHDL, the system is simulated using
ModelSim. At this point, the behaviour of the system can be checked. In addition,
the non-delayed and the delayed systems are compared in order to draw conclusions
about the delay model and to determine across how many modules the system can
transfer data.
 
1.2 Outline 
The main contents of this document are split into five chapters:
- Chapter 2: introduces the theory needed for this work, dealing with the AMBA bus
and, briefly, the CAN bus.

- Chapter 3: presents the system approach. In particular, it covers the
specifications of the design, the design alternatives, and the block diagram of the
chosen alternative.

- Chapter 4: describes the whole design process and is divided into two sections.
The first one presents the VHDL design of the chosen alternative. The second one
deals with the wire parameters, such as resistance and capacitance, the calculation
of the wire delay and the energy per bit and, finally, the wire model and its VHDL
implementation.

- Chapter 5: verifies that the system works satisfactorily by means of several
commented simulations.

- Chapter 6: finally, presents some conclusions about the design.
 
 
 
2. Theory
This chapter presents the theory needed to understand this document. Basically, it
shows how the AMBA bus works and, more briefly, how the CAN bus works.
 
2.1 AMBA Bus 
AMBA (Advanced Microcontroller Bus Architecture) is a specification introduced
by ARM Ltd in 1996, and it is used as the on-chip bus in system-on-chip (SoC)
designs. The first AMBA buses were the Advanced System Bus (ASB) and the Advanced
Peripheral Bus (APB). Then, the AHB (Advanced High-performance Bus) protocol was
introduced, completing the AMBA 2 version.
In 2003, ARM introduced the third generation, AMBA 3, and later AMBA 4. These
generations also have their own protocols, but they are not explained here because
they are beyond the aim of this thesis. In fact, AMBA 2 is enough, since it is the
simplest one [1].
Regarding AMBA 2, it is important to point out that it defines three different
buses. Their specifications are shown below [1] [2]:
1) Advanced High-performance Bus (AHB)
a. High performance.
b. High clock frequency.
c. It is the main bus and supports efficient connections to processors,
memories, etc.

2) Advanced System Bus (ASB)
a. It is also intended for high performance.
b. It is an alternative to the AHB bus when the high-performance features
of AHB are not required.

3) Advanced Peripheral Bus (APB)
a. It is intended for low-power peripherals.
b. Its interface is less complex than those of AHB and ASB.
c. An AHB/ASB-to-APB bridge is required to connect an AHB or ASB bus with
the APB bus.
 
A typical AMBA 2 system is illustrated in Figure 1. 
 
Figure 1: Typical AMBA 2 system [1]

In this project, only the AMBA AHB protocol is used, because it is enough for our
goals. APB is not needed because it is oriented to peripheral devices, and ASB is
similar to AHB; however, AHB offers more performance features, hence AHB is used.
Focusing now on the AHB bus, it is important to highlight that it is designed to
meet the requirements of high-performance, high-frequency designs, including, among
other features [1] [3]:
- Burst transfers.
- Split transactions.
- Single-cycle bus master handover.
- Single-clock-edge operation.
- Non-tristate implementation.
- Wider data bus configurations (64/128 bits).
The following lines give a brief introduction to AMBA AHB, divided into three topics:
(1) the basic elements of an AHB system and the function of each one, (2) the
interconnection between these elements and the main signals, and (3) how a basic
transfer is performed.
MAIN ELEMENTS OF AHB SYSTEM 
The main elements are the masters, the slaves, one arbiter and one decoder. A
brief description of each element is shown below [3]:
1) MASTER:
It is able to start read and write operations, providing the address and
control signals of the transfer. The AHB bus supports master-slave
communication with multiple masters, but only one master can transfer at a
time; in other words, two or more masters cannot use the bus
simultaneously. Examples of masters are the microprocessor, the errors
interface, or a Direct Memory Access (DMA) controller.

2) SLAVE:
Its function is to respond to read and write operations within a given
address range. The slave generates specific signals that indicate the state
of the transfer to the active master. Examples of slaves are internal
memories or external memory interfaces.

3) ARBITER:
Its function is to decide which master can use the bus at each moment. The
arbitration algorithm can be implemented according to the needs of the
application. It is important to point out that every design must have only
one arbiter.

4) DECODER:
Its function is to decode the address sent by the active master in order to
select the slave involved in the transfer. Only one decoder is needed in a
system.
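As a minimal sketch of the decoder's role, the Python function below turns an address into a one-hot vector of slave-select bits (called HSELx in the AHB specification). The address map, the function name `decode` and the range sizes are hypothetical and chosen only for illustration; a real system would take the ranges from its own memory map.

```python
from typing import List, Tuple

# Hypothetical address map: one (base, size) range per slave.
ADDRESS_MAP: List[Tuple[int, int]] = [
    (0x0000_0000, 0x1000),  # slave 0
    (0x0000_1000, 0x1000),  # slave 1
    (0x0000_2000, 0x2000),  # slave 2
]


def decode(haddr: int, address_map: List[Tuple[int, int]] = ADDRESS_MAP) -> List[int]:
    """Return one-hot slave-select bits (HSELx) for the address haddr.

    At most one bit is set. An address outside every range selects no slave
    here; a complete AHB system would normally route it to a default slave.
    """
    hsel = [0] * len(address_map)
    for index, (base, size) in enumerate(address_map):
        if base <= haddr < base + size:
            hsel[index] = 1
            break
    return hsel


print(decode(0x1004))  # [0, 1, 0]
print(decode(0x8000))  # [0, 0, 0]
```

Because the ranges do not overlap, only one slave can be selected for any address, which matches the statement that a single decoder chooses the slave involved in each transfer.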
 
BUS INTERCONNECTION 
Firstly, note that a list of the AHB signals and arbitration signals, together with a
brief description of each signal, is given in Appendix A to help understand the
design.
The AMBA AHB bus protocol is designed to be used with a central multiplexor
interconnection scheme, as can be seen in Figure 2. With this interconnection, all
masters can drive the address and control signals of the transfer that they want to
perform, and the arbiter decides which master's signals reach the slaves. On the
other side, a decoder is needed to select the slave with which the active master is
communicating.
Figure 2: Multiplexor interconnection [3] 
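The central multiplexor of Figure 2 can be sketched as a selection indexed by the current bus owner. In this hedged Python model, each master's outputs are bundled in a small record, and the HMASTER value produced by the arbiter picks which bundle reaches every slave. The record type `MasterOutputs` and the function `central_mux` are illustrative assumptions; only the signal names HADDR, HTRANS, HWDATA and HMASTER come from [3].

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MasterOutputs:
    """Signals driven by one master (a reduced, illustrative subset)."""
    haddr: int
    htrans: int  # 2-bit transfer type
    hwdata: int  # write data


def central_mux(masters: Sequence[MasterOutputs], hmaster: int) -> MasterOutputs:
    """Route the outputs of master number `hmaster` to all slaves.

    Masters that are not granted still drive their own outputs, but the
    multiplexor ignores them, so no tristate bus is needed.
    """
    if not 0 <= hmaster < len(masters):
        raise ValueError("HMASTER does not identify a master")
    return masters[hmaster]


masters = [MasterOutputs(0x100, 0b10, 0xAA), MasterOutputs(0x200, 0b00, 0x55)]
print(central_mux(masters, 1).hwdata)  # 85
```

This sketch ignores one timing detail: in a full AHB implementation the write-data multiplexor is typically controlled by a copy of HMASTER delayed by one cycle, because the data phase lags the address phase.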
 
BASIC TRANSFER 
Before a transfer between a master and a slave starts, the master must request
the bus from the arbiter using a request signal, HBUSREQx. After that, the arbiter
can give access to the bus by asserting the HGRANTx signal of the requesting master.
Then, when HREADY is asserted, the transfer can start.
It is important to highlight that an AHB transfer consists of an address and control
cycle, and one or more cycles for the data. Therefore, the master starts a transfer by
setting up the address and control signals. These signals indicate the transfer
direction (read or write), the data width, and whether the transfer belongs to a
burst. This information is available for only one clock cycle, hence all the slaves
have to sample it at that time.
Two buses are used to transfer data: one for write operations and one for read
operations. The write data bus is used to transfer data from masters to slaves,
while the read data bus is used in the other direction. Unlike the control signals,
data can stay on the bus for as long as needed; it is only necessary to keep HREADY
low.
During the transfer, the slave indicates the state of the transfer with the HRESP
signal. This signal can take four values: OKAY when the transfer is completed
normally; ERROR when something has gone wrong; RETRY and SPLIT when the
transfer has to be interrupted and resumed later [3].
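For reference, the 2-bit encodings that the AMBA 2 AHB specification [3] assigns to HRESP, and to HTRANS (used later by the master), can be captured as enumerations. The class names and the helper `must_resume_later` are illustrative Python; only the encodings themselves come from the specification.

```python
from enum import IntEnum


class HResp(IntEnum):
    """HRESP[1:0]: response driven by the slave."""
    OKAY = 0b00
    ERROR = 0b01
    RETRY = 0b10
    SPLIT = 0b11


class HTrans(IntEnum):
    """HTRANS[1:0]: transfer type driven by the master."""
    IDLE = 0b00
    BUSY = 0b01
    NONSEQ = 0b10
    SEQ = 0b11


def must_resume_later(resp: HResp) -> bool:
    """RETRY and SPLIT both mean the transfer is interrupted and resumed later."""
    return resp in (HResp.RETRY, HResp.SPLIT)


print(HTrans.NONSEQ == 0b10)           # True
print(must_resume_later(HResp.SPLIT))  # True
```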
At this point, the basic idea of a transfer has been given. However, a simple transfer
example might clarify the procedure: one clock cycle for the control signals and
address, and one clock cycle for the data (see Figure 3). In this transfer, the master
drives the address and control signals after the first rising clock edge; the slave
samples these values at the next edge. Afterwards, the slave drives its response,
which is sampled by the master at the third edge.
 
Figure 3: Simple transfer [3] 
 
The slave can extend the data phase by keeping HREADY low, as can be seen in
Figure 4. This gives the slave additional time to complete the transfer. In write
transfers, the master must keep the data on the bus during the whole data phase,
until HREADY is asserted. Similarly, in read transfers, the slave does not have to
provide valid data on the bus until it asserts HREADY.
 
 
Figure 4: Transfer with wait states [3] 
 
The address phase and the data phase of a transfer occur in different clock cycles.
In fact, the address phase of one transfer overlaps with the data phase of the
previous transfer. This overlapping of address and data is fundamental to the
pipelined nature of the bus and allows high-performance operation, while still
providing adequate time for a slave to respond to a transfer. As can be seen in
Figure 5, if the data phase of a transfer is extended, the address phase of the next
transfer is also extended. In any case, slaves only sample the control signals when
HREADY is high.
 
Figure 5: Multiple transfers [3] 
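The overlap between phases can be made concrete with a small cycle-count model. It rests on simplifying assumptions: each address phase takes one cycle when HREADY is high, the data phase of transfer i lasts 1 + w_i cycles (w_i being the wait states the slave inserts by holding HREADY low), and the next address phase is stretched together with it. Under these assumptions, the hypothetical function `completion_cycles` returns how many clock cycles have elapsed when each transfer's data phase ends.

```python
from typing import List


def completion_cycles(wait_states: List[int]) -> List[int]:
    """Return, for each transfer, the cycles elapsed when its data phase ends.

    Cycle 0 holds the address phase of the first transfer. Every later address
    phase overlaps the previous data phase, so only the first address phase
    adds latency of its own.
    """
    completions = []
    cycle = 1  # the first address phase occupies one cycle
    for waits in wait_states:
        cycle += 1 + waits  # data phase plus any wait states
        completions.append(cycle)
    return completions


print(completion_cycles([0, 0, 0]))  # [2, 3, 4]
print(completion_cycles([0, 2, 0]))  # [2, 5, 6]
```

In the second example, the two wait states of the middle transfer also delay the address phase of the last one, as in Figure 5. Without any overlap, the same three transfers would need (2 + 0) + (2 + 2) + (2 + 0) = 8 cycles instead of 6.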
 
More information about AMBA AHB may be found in [3].  
2.2 CAN Bus 
The CAN (Controller Area Network) bus is a multi-master broadcast serial bus
standard for connecting electronic control units, developed by Robert Bosch. Each
node is able to send and receive messages, but not at the same time. A message
consists primarily of an ID (identifier), which represents the priority of the
message, and up to eight data bytes. This information is transmitted serially on the
bus.
If the bus is free, any node may begin to transmit. If two or more nodes begin
sending messages at the same time, the message with the more dominant ID
overwrites the less dominant IDs of the other nodes, so that only the dominant
message remains and is received by all nodes. Dominant bits are zeroes, and the
identifiers are compared bit by bit, so the ID that first sends a dominant bit where
the others send a recessive one wins. This mechanism is referred to as
priority-based bus arbitration. Messages with numerically smaller IDs have higher
priority and are transmitted first.
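The priority-based arbitration described above can be sketched as follows. The bus behaves as a wired-AND: a dominant 0 from any node overrides a recessive 1. Each node transmits its identifier most significant bit first and withdraws as soon as it reads back a dominant bit after sending a recessive one. This Python model assumes 11-bit identifiers (standard frame format) and ignores bit stuffing; the function name `can_arbitrate` is hypothetical.

```python
from typing import List


def can_arbitrate(ids: List[int], id_bits: int = 11) -> int:
    """Return the identifier that wins bitwise CAN arbitration.

    All nodes start transmitting together; identifiers are sent MSB first.
    CAN requires unique identifiers, so ties are not handled specially.
    """
    if not ids:
        raise ValueError("at least one node must transmit")
    contenders = list(ids)
    for bit in range(id_bits - 1, -1, -1):
        sent = [(node_id >> bit) & 1 for node_id in contenders]
        bus_level = min(sent)  # wired-AND: any dominant 0 forces the bus to 0
        # Nodes that sent recessive (1) but read dominant (0) stop transmitting.
        contenders = [n for n, b in zip(contenders, sent) if b == bus_level]
    return contenders[0]


print(can_arbitrate([0x65, 0x64, 0x7FF]))  # 100, i.e. 0x64
```

The winner is always the numerically smallest identifier, which is exactly why smaller IDs have higher priority.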
Some of the basic features of this bus are: 
- Message priority 
- Formed by only two wires 
- Flexible configuration 
- Multicast communication 
- Multi-master system 
Because this bus has only two wires, the information sent on the bus has to be
encapsulated in frames and transmitted bit by bit. The frame format is illustrated in
Figure 6:
 
 
Figure 6: Frame format of CAN bus [4] 
 
The Data Frame begins with a dominant Start of Frame (SOF) bit for hard
synchronization of all nodes. The SOF bit is followed by the Arbitration Field,
reflecting the content and priority of the message. The next field is the Control
Field, which mainly specifies the number of data bytes contained in the message.
The Data Field then carries these 0 to 8 bytes.
The Cyclic Redundancy Check (CRC) Field is used to detect possible
transmission errors. It consists of a 15-bit CRC sequence followed by the
recessive CRC delimiter bit. During the Acknowledgement (ACK) Field, the
transmitting node sends out a recessive bit. Any node that has received an
error-free frame acknowledges its correct reception by sending back a dominant
bit. The recessive bits of the End of Frame field close the Data Frame. Between two
frames there must be a recessive 3-bit Intermission field.
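To see why bit-by-bit transmission is costly, the length of a standard (11-bit identifier) CAN data frame can be counted field by field. The sketch below uses the field widths of CAN 2.0A and, as a stated simplification, ignores the stuff bits that the protocol inserts after five consecutive equal bits, so the result is a lower bound. The dictionary and function names are illustrative.

```python
# Field widths (in bits) of a standard CAN 2.0A data frame, excluding data.
CAN_FIELDS = {
    "SOF": 1,
    "identifier": 11,
    "RTR": 1,
    "IDE": 1,
    "r0": 1,
    "DLC": 4,
    "CRC sequence": 15,
    "CRC delimiter": 1,
    "ACK slot": 1,
    "ACK delimiter": 1,
    "EOF": 7,
}
INTERMISSION_BITS = 3


def can_frame_bits(data_bytes: int, include_intermission: bool = True) -> int:
    """Bit times on the bus for one data frame, without bit stuffing."""
    if not 0 <= data_bytes <= 8:
        raise ValueError("a classic CAN frame carries 0 to 8 data bytes")
    bits = sum(CAN_FIELDS.values()) + 8 * data_bytes
    if include_intermission:
        bits += INTERMISSION_BITS
    return bits


print(can_frame_bits(1))  # 55 bit times to move a single byte
```

With the one-byte payload required by the specifications of section 3.1, roughly 55 bit times per frame are needed under these assumptions, of which only 8 carry data.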
 
3. System approach
This chapter describes the main specifications and goals of this project, evaluates
several design alternatives, and finally chooses the best one, giving the reasons for
the choice.
 
3.1 Specifications 
In this section, two kinds of specifications are presented: behavioural and
electrical. Behavioural specifications describe what the system must do, and they
are used to write the VHDL code. Electrical specifications, on the other hand, are
used to find a simple but accurate wire model and the wire delay in this system.
These specifications are shown below.
 
BEHAVIOURAL SPECIFICATIONS 
- The system under design has to be formed by N modules, each one able to
send and receive data.
- Every module must be able to send at least 8 bits (1 byte) of data to the
other modules of the system that have a higher identifier than the sender
(see Figure 7).
Diagram: N modules, holding data D0, D1, D2, D3, …, DN-2, DN-1, connected to a shared bus.
 
Figure 7: System architecture 
 
At the end, the second module must hold its own data (D1) and the data from
modules with a lower identifier (only D0). Likewise, the last module must hold its
own data (DN-1) and the data from all modules with a lower identifier (all data
packets, from D0 to DN-2).
- The interconnection between these modules must be done with a shared bus.
The AMBA bus is recommended, but other options can be analysed.
- The placement of these modules on the chip should be designed to minimize
the delay. The modules may be distributed in one dimension, as Figure 7
shows, or in two dimensions.
 
ELECTRICAL SPECIFICATIONS 
- To find a wire model, each wire is considered together with the four wires
around it. Figure 8 shows exactly which wires must be considered, the
distances between them and the relevant electrical parameters.
 
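Diagram: the wire (width W, height H, length L) and its neighbouring wires at distances distL and distH.
Parameters: W = 100 nm, H = 200 nm, distL = 100 nm, distH = 150 nm, k = 3, R□ = 150 mΩ.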
 
Figure 8: The wire 
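As a first-order illustration of how the parameters of Figure 8 translate into per-millimetre values, the sketch below computes the wire resistance from the sheet resistance (R = R□ · L / W, i.e. the number of squares along the wire) and estimates the capacitance with a simple parallel-plate approximation towards the four neighbours. Reading distL as the lateral spacing and distH as the vertical spacing, treating k as the relative permittivity, and ignoring fringing fields are all assumptions of this sketch; the calculation in section 4.2.1 may use a different capacitance model.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def wire_rc(w: float, h: float, dist_l: float, dist_h: float,
            k: float, r_sheet: float, length: float = 1e-3) -> tuple:
    """Return (resistance in ohm, capacitance in farad) of a wire of `length` metres.

    Resistance: R = R_sheet * L / W.
    Capacitance (assumed parallel-plate model, no fringing): two lateral
    neighbours (plate area H*L, gap distL) plus two vertical neighbours
    (plate area W*L, gap distH).
    """
    resistance = r_sheet * length / w
    c_lateral = 2 * k * EPS0 * h * length / dist_l
    c_vertical = 2 * k * EPS0 * w * length / dist_h
    return resistance, c_lateral + c_vertical


r, c = wire_rc(w=100e-9, h=200e-9, dist_l=100e-9, dist_h=150e-9,
               k=3, r_sheet=0.150)
print(f"{r:.0f} ohm/mm, {c * 1e15:.1f} fF/mm")  # 1500 ohm/mm, 141.7 fF/mm
```

The resistance figure follows directly from R□ and W; the capacitance figure should be read only as an order-of-magnitude estimate under the stated geometry assumptions.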
 
3.2 Design alternatives 
Before starting any design, it is very important to analyse different alternatives.
This section considers three alternatives and discusses their advantages and
drawbacks. Finally, one of these alternatives is chosen for implementation.
ALTERNATIVE 1 
The first alternative uses the AMBA bus, and every module consists of a master
and a slave. A scheme of this system with 8 modules is illustrated in Figure 9.
Diagram: modules M0 to M7 attached to the AMBA AHB bus together with the ARBITER; below each module Mk, the data it holds after all transfers (its own Dk plus D0 … Dk-1 received from lower modules). Legend: data of own module, data received from other modules, transfer.
 
Figure 9: Scheme of first alternative with 8 modules (using AMBA bus) 
 
In this case, each module sends its own data to the other modules with a higher
identifier. Note that:
- These transfers happen serially: first, one module sends its data to all the other
modules; after that, another module sends its data, and so on.
- The order of the transfers is determined by the arbiter. It is not necessary for
M0 to send its data first, M1 second, etc. Nevertheless, in this design, the
arbitration algorithm gives priority to the module with the lowest ID.
- At the end, each module holds the data packets shown below it in Figure 9.
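The arbitration policy chosen for this design, priority to the module with the lowest ID, can be sketched as a fixed-priority arbiter. In the hypothetical Python model below, `requests` holds the HBUSREQ bits and `fixed_priority_grant` returns a one-hot HGRANT vector; `grant_order` repeatedly grants the bus and clears the winner's request, which reproduces the serial order M0, M1, M2, … when every module requests the bus at once.

```python
from typing import List


def fixed_priority_grant(requests: List[int]) -> List[int]:
    """Return one-hot HGRANT bits: the lowest-index requesting master wins.

    With no request at all, no bit is set (a real AHB arbiter would grant
    a default master instead).
    """
    grant = [0] * len(requests)
    for index, req in enumerate(requests):
        if req:
            grant[index] = 1
            break
    return grant


def grant_order(requests: List[int]) -> List[int]:
    """Serve all pending requests one at a time and return the service order."""
    pending = list(requests)
    order = []
    while any(pending):
        winner = fixed_priority_grant(pending).index(1)
        order.append(winner)
        pending[winner] = 0  # the master releases HBUSREQ after its transfer
    return order


print(grant_order([1, 0, 1, 1]))  # [0, 2, 3]
```

A fixed-priority scheme is the simplest choice and is adequate here because every module transmits only once; with repeated traffic, a round-robin policy would avoid starving high-ID modules.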
Finally, Table 1 shows the advantages and the drawbacks of this design 
alternative.  
ADVANTAGES:
- The AMBA bus has a parallel data bus that allows all bits of a packet to be sent at
the same time.
DRAWBACKS:
- Since the AMBA bus is used, every module must be both a master and a slave.
This means that N masters and N slaves are needed for a system with N modules.
- If N is large, the complete set of transfers takes a prohibitive time, because
transfers happen serially.
- An arbiter is needed to grant the bus.
 
Table 1: Advantages and drawbacks of alternative 1 
 
ALTERNATIVE 2 
The second alternative uses the CAN bus. This means that when a module sends its
own data, all the other modules can read it at the same time. Therefore, with this
alternative, only N transfers are required, although it is important to remember that
every transfer is done bit by bit [4] [5]. Hence, several clock cycles are needed for
each transfer (see section 2.2 for more information).
The advantages and drawbacks of this alternative are summarized in Table 2. 
ADVANTAGES:
- A frame sent by a module can be read by all the other modules at the same time.
- Only two wires are required for this bus, hence fewer resources than the AMBA
bus.
- An arbiter is not necessary, because the arbitration is done at the beginning of
the frame using dominant bits.
DRAWBACKS:
- The transfer of a frame is done bit by bit because there are only two wires.
Consequently, although only N frames are sent, the complete set of transfers takes
a prohibitive time.
 
Table 2: Advantages and drawbacks of alternative 2 
 
ALTERNATIVE 3 
After alternatives 1 and 2 have been analysed, a third alternative that combines
features of the first and the second ones might be a good option. The idea is to take
the first alternative (with the AMBA bus) as a base and make some modifications.
The main modification is to drop the AMBA address bus: when a module is
transmitting data, all the other modules read it at the same time (hence, the
transfers from one module no longer happen serially). In addition, there are other
differences between the original AMBA signals and the signals used; these
differences are shown in section 4.1.
The advantages and drawbacks of this alternative are summarized in Table 3.
ADVANTAGES:
- The AMBA bus has a parallel data bus that allows all bits of a packet to be sent at
the same time.
- Transfers do not happen serially, so the total transfer time is lower.
DRAWBACKS:
- An arbiter and control signals are needed. However, not all AMBA control signals
are used.
 
Table 3: Advantages and drawbacks of alternative 3 
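The difference in total transfer time between the three alternatives can be estimated by counting bus operations. This back-of-the-envelope Python sketch rests on assumptions that are not part of the comparison above: in alternative 1, every (sender, receiver) pair needs its own addressed AHB transfer of two cycles without pipelining; in alternative 3, each module needs one broadcast transfer of two cycles; and in alternative 2, each CAN frame carries one data byte and takes 55 bit times (standard frame, no stuff bits, intermission included). The function name and default values are hypothetical.

```python
def alternative_costs(n: int, cycles_per_ahb_transfer: int = 2,
                      bits_per_can_frame: int = 55) -> dict:
    """Rough cost of delivering each module's byte to all higher-ID modules."""
    if n < 1:
        raise ValueError("at least one module is required")
    alt1_transfers = n * (n - 1) // 2  # one addressed transfer per receiver
    alt3_transfers = n                 # following the text, one broadcast per module
    return {
        "alt1_ahb_cycles": alt1_transfers * cycles_per_ahb_transfer,
        "alt2_can_bit_times": n * bits_per_can_frame,
        "alt3_ahb_cycles": alt3_transfers * cycles_per_ahb_transfer,
    }


print(alternative_costs(8))
# {'alt1_ahb_cycles': 56, 'alt2_can_bit_times': 440, 'alt3_ahb_cycles': 16}
```

Under these assumptions the cost of alternative 1 grows quadratically with N while that of alternative 3 grows linearly, and a CAN bit time usually spans several clock cycles, so alternative 2 is slower still per module.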
3.3 System block diagram 
At this point, the alternative to be implemented is known: the third one, a
modified AMBA bus without an address bus. Below, the blocks that make up the
system are shown, together with the interconnections between them. The whole
system is illustrated in Figure 11. Note that there is no decoder, because the
implemented alternative does not use the address bus. Nevertheless, section 4.1.4
describes a decoder design (a decoder might be used, for example, in alternative 1).
In addition, remember that each module consists of a master and a slave, as can be
seen in Figure 10.
 
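Diagram: Module_n contains a MASTER AHB block and a SLAVE AHB block, both clocked by HCLK and reset by HRESETn. External signals: HBUSREQn, HGRANTn, HMASTER, HWDATA_Mn, HWDATA_Sn, CONTR_OUTn and CONTR_INn; internal control signals: CONTR_OUTmaster, CONTR_INmaster, CONTR_OUTslave and CONTR_INslave.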
 
Figure 10: Block diagram of “Module_n” 
 
 
 
 
Diagram: Module_0, Module_1, …, Module_N-1 and the ARBITER share HCLK and HRESETn. Each Module_i sends HBUSREQ(i) to the arbiter and receives HGRANT(i); the arbiter also produces HMASTER. A CONTROL MUX routes CONTR_OUT0 … CONTR_OUT(N-1) to CONTR_IN0 … CONTR_IN(N-1), and a DATA MUX routes HWDATA_M0 … HWDATA_M(N-1) to HWDATA_S0 … HWDATA_S(N-1).
 
Figure 11: Block diagram of the system 
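The data flow of Figure 11 can be summarised with a behavioural model. This Python sketch is not the VHDL of section 4.1; it assumes a fixed-priority arbiter (lowest ID first, as stated in section 3.2), a data multiplexor that broadcasts the granted module's HWDATA_M to every HWDATA_S input, and, as a hypothetical filtering rule, slaves that keep a byte only when HMASTER is lower than their own ID. With these assumptions, each module ends up with exactly the data required by the behavioural specifications of section 3.1.

```python
from typing import Dict, List


def broadcast_system(data: List[int]) -> Dict[int, List[int]]:
    """Simulate N modules sharing one broadcast bus.

    data[i] is the byte owned by module i. Returns, for each module, the
    bytes it holds at the end: received bytes followed by its own byte.
    """
    n = len(data)
    received: Dict[int, List[int]] = {i: [] for i in range(n)}
    pending = [1] * n                  # every module requests the bus
    while any(pending):
        hmaster = pending.index(1)     # fixed priority: lowest ID wins
        hwdata = data[hmaster]         # data multiplexor output
        for slave in range(n):         # the same byte reaches every slave
            if hmaster < slave:        # assumed filtering rule based on HMASTER
                received[slave].append(hwdata)
        pending[hmaster] = 0           # request released after the transfer
    return {i: received[i] + [data[i]] for i in range(n)}


print(broadcast_system([10, 11, 12, 13]))
# {0: [10], 1: [10, 11], 2: [10, 11, 12], 3: [10, 11, 12, 13]}
```

Only N bus transfers are performed, one per module, instead of one per (sender, receiver) pair.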
4. Design
This chapter presents the VHDL design of the system and explains its blocks.
Moreover, the wire model is derived and designed: calculation of the wire
parameters, calculation of the delay per millimetre and of the energy per bit and,
finally, implementation of the wire model in VHDL.
 
4.1 AMBA AHB system (VHDL) 
In this section, the VHDL description is presented. Remember that an AHB system
needs four specific blocks (Master, Slave, Arbiter and Decoder). Each block is
described below, paying special attention to its main purpose, its basic features and
its implementation. In addition, it is important to highlight that, as presented in
section 3.2, the system is designed using the AMBA AHB bus, but not all AHB signals
are used in this design. The differences between the original signals and the signals
used are also shown below.
 
4.1.1  Master 
The master is the block that starts a transfer. It has to be able to request the bus,
drive the address and control values onto the bus and, after that, write data to or
read data from the bus.
 
FEATURES 
The signal interface of this block [3] is shown in Figure 12, and the function of each
signal is explained in Table 4.
Diagram: MASTER AHB block with inputs HCLK, HRESETn, HGRANT, HREADY and HRESP (2 bits), and outputs HBUSREQ, HTRANS (2 bits) and HWDATA (8 bits).
 
Figure 12: Master AHB block 
Signal | Direction (bits) | Description
HCLK | In (1) | Clock of the system. All signal timings are related to the rising edge of HCLK.
HRESETn | In (1) | Reset of the system. Active low.
HBUSREQ | Out (1) | Indicates that the master requires the bus.
HGRANT | In (1) | Indicates that the master is currently the highest-priority master in the system and has access to the bus.
HWDATA | Out (8) | The write data bus, used to transfer data from the master to the slaves.
HREADY | In (1) | When HIGH, indicates that a transfer has finished on the bus. The slave may drive it LOW to extend a transfer.
HTRANS | Out (2) | Indicates the type of the current transfer. In this design, 1 bit would be enough.
HRESP | In (2) | Indicates additional information on the status of the transfer. In this design, 1 bit would be enough.
Table 4: Inputs and outputs of Master AHB block
 
Differences between original signals and used signals:

- There is no read data bus in this Master block. Remember that the master only
needs to write information to the slaves.
- There is no address bus, because this Master block writes data to all slaves at
the same time. Therefore, there is no need to identify the slaves.
- The AMBA specification defines other signals that are not used in this design:
  - HSIZE: indicates the size of the transfer, but all our transfers have a size of
  8 bits.
  - HBURST: indicates whether the transfer forms part of a burst. Nevertheless,
  all our transfers are single transfers.
  - HPROT: provides additional protection information, which is not necessary for
  this design.
 
IMPLEMENTATION 
The procedure that must be followed to perform a simple transfer is shown in
Figure 3. Before that, the master must request the bus by asserting the HBUSREQ
signal until HGRANT becomes active high. Afterwards, the master has access to the
bus and may start the simple transfer. The Algorithmic State Machine (ASM) of this
block is shown in Figure 13.
 
 
[Figure 13 ASM: S0 sets HBUSREQ=’1’; S1 waits for HGRANT=’1’ & HREADY=’1’, then sets HTRANS=”10” and HBUSREQ=’0’; S2 waits for HREADY=’1’, then sets HWDATA=”DATA” and HTRANS=”00”; S3 waits for newDATA=’1’ and then returns to S0]
 
Figure 13: ASM of Master AHB block 
 
State S0 requests the bus. State S1 then waits for the response from the arbiter (assertion of HGRANT) and immediately indicates that the transfer will be a simple transfer (HTRANS=”10”). Next, the master waits in state S2 until all slaves have read the control signals (HREADY=’1’); the data is then written on the bus and read by the slaves. Finally, if the master has new data to send, the procedure starts again; otherwise, the master waits in state S3.
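To make the four-state sequence concrete, the following Python sketch models the master ASM of Figure 13 one HCLK rising edge at a time. It is a behavioural illustration, not the VHDL implementation; the names MasterState and master_step are hypothetical, and signals that the ASM does not assign in a given cycle are assumed to keep their previous registered value, so they are simply left out of the returned dictionary.

```python
from enum import Enum


class MasterState(Enum):
    S0 = "request"     # assert HBUSREQ
    S1 = "wait_grant"  # wait for HGRANT and HREADY, then start the transfer
    S2 = "wait_ready"  # wait for HREADY, then drive the data phase
    S3 = "idle"        # wait until there is new data to send


def master_step(state, hgrant, hready, new_data, data):
    """Evaluate one HCLK rising edge of the master ASM of Figure 13.

    Returns (next_state, outputs). `outputs` only contains the signals the
    ASM assigns in this cycle; the others keep their previous value.
    """
    outputs = {}
    if state is MasterState.S0:
        outputs["HBUSREQ"] = 1
        return MasterState.S1, outputs
    if state is MasterState.S1:
        if hgrant and hready:
            outputs["HTRANS"] = "10"  # control phase of a simple transfer
            outputs["HBUSREQ"] = 0    # the bus has been granted
            return MasterState.S2, outputs
        return MasterState.S1, outputs
    if state is MasterState.S2:
        if hready:
            outputs["HWDATA"] = data  # data phase: all slaves read HWDATA
            outputs["HTRANS"] = "00"  # no further transfer
            return MasterState.S3, outputs
        return MasterState.S2, outputs
    # S3: go back to S0 only when new data is available
    return (MasterState.S0 if new_data else MasterState.S3), outputs


state = MasterState.S0
for hgrant, hready, new_data in [(0, 1, 0), (0, 1, 0), (1, 1, 0), (1, 1, 0)]:
    state, outputs = master_step(state, hgrant, hready, new_data, "00000001")
    print(state.name, outputs)
```

Returning the next state and the assigned outputs separately keeps the sketch close to the ASM notation, where conditional outputs are attached to decision boxes.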
 
 
4.1.2  Slave 
The slave is the block that waits for a write or read request from a master. In this design, the slave only waits for write requests, each formed by two phases: a control phase and a data phase.
 
FEATURES 
This block has the signal interface [3] shown in Figure 14.
[Figure 14 block diagram: SLAVE AHB with ports HCLK, HRESETn, HWDATA (8 bits), HTRANS (2 bits), HRESP (2 bits), HREADY and HMASTER (M bits), where M = log2(N) and N is the number of modules]
 
Figure 14: Slave AHB block 
 
The signals of this block are almost the same as those of the Master AHB, without the arbitration signals. The only new signal is HMASTER, which carries the ID of the master that currently owns the bus.
 
• Differences between the original signals and the signals used:

• All the differences described in section 4.1.1 also apply here.
• In addition, the original AMBA protocol uses the HSEL signal, which selects the slave that the master is writing to. This signal originally comes from the decoder. However, as mentioned before, all slaves read the data at the same time, so this signal is not necessary.
 
IMPLEMENTATION 
Figure 15 shows the ASM of the slave block, which has two states (S0 and S1). State S0 waits for HTRANS=”10”, that is, until a master initiates a simple transfer. The slave then asserts HREADY (if it is not busy) and stores in signal pos the ID of the transmitting master. This signal is used to place the received data in the corresponding position of a vector that holds the data from all masters. In state S1, the data from the master is stored and a response is given (HRESP=”00”, OKAY). If no new transfer follows (HTRANS≠”10”), the state machine returns to state S0; otherwise, the ID of the new master is stored in pos and the procedure of state S1 is repeated.
[Figure 15 ASM: S0 waits for HTRANS=”10”, then sets HREADY=’1’ and pos=HMASTER; S1 sets DATA_VECTOR(pos)=HWDATA, HRESP=”00” and HREADY=’1’, then either reloads pos=HMASTER (HTRANS=”10”) and stays in S1, or returns to S0]
 
Figure 15: ASM of Slave AHB block 
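A similar Python sketch of the slave ASM of Figure 15 follows. The class SlaveModel is hypothetical, the slave is assumed never to be busy (HREADY stays high), and HWDATA is sampled in the cycle after the control phase, as in the two-phase transfer described above.

```python
class SlaveModel:
    """Behavioural sketch of the slave ASM of Figure 15 (one call per HCLK edge)."""

    def __init__(self, n_modules):
        self.state = "S0"
        self.pos = 0                          # ID of the transmitting master
        self.data_vector = [None] * n_modules
        self.hready = 1                       # assumed never busy in this sketch
        self.hresp = "00"

    def step(self, htrans, hmaster, hwdata):
        if self.state == "S0":
            if htrans == "10":                # a master starts a simple transfer
                self.hready = 1
                self.pos = hmaster
                self.state = "S1"
            return
        # S1: data phase of the transfer announced in the previous cycle
        self.data_vector[self.pos] = hwdata
        self.hresp = "00"                     # OKAY
        self.hready = 1
        if htrans == "10":                    # back-to-back transfer
            self.pos = hmaster
        else:
            self.state = "S0"


slave = SlaveModel(8)
slave.step("10", 0, None)         # control phase of M0
slave.step("10", 1, "00000001")   # data of M0, control phase of M1
slave.step("00", 1, "00000010")   # data of M1, no new transfer
print(slave.state, slave.data_vector[:2])
```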
 
4.1.3  Arbiter 
This block manages access to the bus and provides the HMASTER signal (the ID of the master that currently has access to the bus).
 
FEATURES 
This block has the signal interface [3] shown in Figure 16; the function of each signal is explained in Table 5.
 
[Figure 16 block diagram: ARBITER with inputs HCLK, HRESETn and HBUSREQ (N bits), and outputs HGRANT (N bits) and HMASTER (M bits)]
 
Figure 16: Arbiter block 
 
Signal    Direction (bits)  Description
HCLK      In (1)            System clock. All signal timings are related to the rising edge of HCLK.
HRESETn   In (1)            System reset. Active low.
HBUSREQ   In (N)            Bus formed by the HBUSREQ bits coming from all masters.
HGRANT    Out (N)           Bus formed by the HGRANT bits going to all masters.
HMASTER   Out (M)           ID of the master that has access to the bus, in binary code.
Table 5: Inputs and outputs of Arbiter block
 
• Differences between the original signals and the signals used:

• The original HMASTER signal has four bits, which allows a maximum of sixteen masters in the system. This design uses M bits because more than sixteen masters will probably be necessary.
• In addition, three arbitration signals are not used:
- HLOCKx: when HIGH, indicates that the master requires locked access to the bus. This feature is not needed in the design.
- HMASTLOCK: indicates that the current master is performing a locked sequence of transfers.
- HSPLITx[15:0]: used by a slave to indicate to the arbiter which masters may re-attempt a split transfer that was previously stopped because the slave was busy. This feature is not implemented in the design.
 
IMPLEMENTATION 
Figure 17 shows the ASM of the arbiter block, which consists of two state machines. The first one (a) arbitrates access to the bus, while the second one (b) provides the HMASTER signal.
 
Figure 17: ASM of Arbiter block. (a) Arbitration process. (b) HMASTER generation
 
[Figure 17 ASMs. (a) S0 waits for HBUSREQ≠”0”; a for loop over j = 0 to N-1 then asserts HGRANT(j)=’1’ and sets act_master=j for the first master with HBUSREQ(j)=’1’. S1 waits for HBUSREQ(act_master)=’0’, sets HGRANT(act_master)=’0’ and either repeats the for loop (HBUSREQ≠”0”) or returns to S0. (b) When act_master≠prev_master & HREADY=’1’, prev_master=act_master and HMASTER=act_master.]
Note: in (a), ‘j’ is the variable of a “for loop”, so its changes take effect immediately; it does not wait for the next rising edge to be updated.
The arbitration process starts by waiting for the assertion of any bit of the HBUSREQ bus, which indicates that at least one master wants to access the bus. A “for loop” then immediately looks for the master that requires the bus, and when it is found, the corresponding HGRANT signal is asserted. Afterwards, in state S1, the process waits until the master that owns the bus deasserts HBUSREQ, and then HGRANT is deasserted. Finally, if more masters are requesting the bus, a new master is selected with the “for loop” and the procedure of state S1 starts again; otherwise, the state machine returns to state S0.
On the other hand, the HMASTER generation process compares the current owner of the bus with the previous owner. While these values are equal, no action is required because the owner of the bus has not changed. If they differ (and HREADY=’1’), HMASTER is updated with the ID of the new master.
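The two arbiter processes can be sketched in Python as follows (hypothetical class ArbiterModel). The for loop stops at the first requesting master, so the lowest index has the highest priority, which matches the HGRANT(0), HGRANT(1) order seen in Simulation 1. Evaluating process (b) before process (a) is an assumption meant to mimic VHDL signal semantics, where (b) sees the value of act_master from the previous rising edge.

```python
class ArbiterModel:
    """Behavioural sketch of the two arbiter state machines of Figure 17."""

    def __init__(self, n_masters):
        self.n = n_masters
        self.state = "S0"
        self.hgrant = [0] * n_masters
        self.act_master = 0
        self.prev_master = 0
        self.hmaster = 0

    def _grant_first_requester(self, hbusreq):
        # "for loop": the first (lowest-index) requesting master wins
        for j in range(self.n):
            if hbusreq[j]:
                self.hgrant[j] = 1
                self.act_master = j
                return

    def step(self, hbusreq, hready):
        # (b) HMASTER generation, evaluated first so that it sees the
        # act_master of the previous edge, as a VHDL signal would
        if self.act_master != self.prev_master and hready:
            self.prev_master = self.act_master
            self.hmaster = self.act_master
        # (a) arbitration process
        if self.state == "S0":
            if any(hbusreq):
                self._grant_first_requester(hbusreq)
                self.state = "S1"
        elif not hbusreq[self.act_master]:    # owner has released its request
            self.hgrant[self.act_master] = 0
            if any(hbusreq):
                self._grant_first_requester(hbusreq)
            else:
                self.state = "S0"


arb = ArbiterModel(4)
for req in ([1, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]):
    arb.step(req, hready=1)
    print(arb.hgrant, arb.hmaster)
```

In this sketch HMASTER follows HGRANT with a one-cycle lag, which is the behaviour that lets the slaves keep using the previous owner's ID during its data phase.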
 
4.1.4  Decoder 
The decoder in an AMBA system performs a centralized address decoding function, which improves the portability of peripherals by making them independent of the system memory map. In our design, the decoder is not used because there is no address bus [3]. However, this block is implemented and explained because it may be useful in future configurations that use an address bus.
 
FEATURES 
The signal interface of this block is shown in Figure 18. It is a very simple interface, with one input and one output: the input is the address bus, and the output corresponds to the HSEL signal (described in section 4.1.2). In addition, this block is purely combinational (it has no clock signal), because it must not introduce any cycle of delay.
[Figure 18 block diagram: DECODER with input SEL and output OUTPUT (N bits)]
 
Figure 18: Decoder block 
 
 
IMPLEMENTATION 
The block is implemented as a VHDL process without a clock, shown in Figure 19.
 
Figure 19: VHDL process of Decoder block 
 
Input SEL is in the sensitivity list of the process. Therefore, every time SEL changes, the process runs and decodes the input into an output vector of N bits (N is the total number of slaves). For instance, if SEL=”11”, then OUTPUT(3)=’1’ and all other bits are ’0’.
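Behaviourally, the decoder is a pure function of SEL that is re-evaluated whenever SEL changes. The Python sketch below (hypothetical function decode) reproduces the one-hot mapping described for Figure 19; rejecting out-of-range values is an added assumption, since the VHDL process itself is not reproduced here.

```python
def decode(sel, n):
    """Combinational one-hot decoder: OUTPUT(sel) = '1', all other bits '0'.

    `sel` is the binary value on SEL; the result is indexed like OUTPUT(i).
    """
    if not 0 <= sel < n:
        raise ValueError("SEL selects a slave that does not exist")
    return [1 if i == sel else 0 for i in range(n)]


print(decode(int("11", 2), 4))   # SEL="11" -> prints [0, 0, 0, 1]
```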
 
4.2 Wire model 
This section is divided into three parts. The first one shows the calculation of the wire parameters, such as resistance and capacitance. The second one shows the delay of the wire (with and without repeaters) and the energy consumed per useful bit. Finally, the last part shows the implementation of the wire model in VHDL.
 
4.2.1  Calculation of wire parameters 
First, as mentioned in the specifications section, the configuration of Figure 8, where the main wire is surrounded by four other wires, is used to calculate the wire parameters. The resistance of the wire per unit length is given by equation (1) [6] [7] [8]:

$$r = \frac{\rho}{W \cdot H}$$   (1)
 
 
 
Therefore, the total resistance of a wire of length L is:

$$R = r \cdot L = \frac{\rho}{W \cdot H} \cdot L$$   (2)
 
 
NOTES (APPLY TO THE WHOLE DOCUMENT):
- L is the length of the wire in millimetres. 
- To differentiate between distributed (per unit length) wire parameters versus 
total lumped values, lowercase will be used to denote the former and 
uppercase for the latter.  
 
The next parameter is the capacitance of the wire. It can be modelled by four parallel-plate capacitors for the top, bottom, right, and left sides [9]. In addition, as Figure 20 shows, the total capacitance has several contributions: inter-wire, ground, and parallel-plate capacitances [6]. However, for the specified dimensions (W/H=0.5), the total capacitance can be approximated by the inter-wire (wire-to-wire) capacitance. Therefore, the other contributions are disregarded and the scenario of Figure 21 is considered.
 
 
Figure 20: Effect of wire capacitances [6] 
 
Figure 20 annotations: when W becomes smaller than 1.75·H, the inter-wire capacitance starts to dominate (red line). In our case, W/H = 0.5 (green line), so C_total ≈ C_interwire.
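[Figure 21 cross-section: main wire of width W and height H with two C_h and two C_v coupling capacitances to its four neighbours; W = 100 nm, H = 200 nm, dist_L = 100 nm, dist_H = 150 nm]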
 
Figure 21: Wire-to-wire capacitances in our scenario 
 
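The capacitances are calculated with the parallel-plate model, neglecting fringing capacitance [6] [8]:

$$\varepsilon = \varepsilon_r \cdot \varepsilon_0, \qquad c_h = \varepsilon \cdot \frac{H}{dist_L}, \qquad c_v = \varepsilon \cdot \frac{W}{dist_H}$$   (3)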
 
Therefore, the total wire capacitance per unit length is the sum of the four capacitances in Figure 21 [7] [10]:

$$c = 2 \cdot c_h + 2 \cdot c_v$$   (4)
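As a numerical aid, equations (1), (3) and (4) can be evaluated directly for the dimensions of Figure 21. This Python sketch assumes the parallel-plate forms r = ρ/(W·H), c_h = ε·H/dist_L, c_v = ε·W/dist_H and c = 2c_h + 2c_v; the resistivity and relative permittivity in the example are hypothetical placeholders, not values from the specifications, so the printed numbers are illustrative only.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

# Dimensions from Figure 21, in metres
W, H = 100e-9, 200e-9
DIST_L, DIST_H = 100e-9, 150e-9


def r_per_length(rho, w=W, h=H):
    """Eq. (1): wire resistance per unit length, r = rho / (W*H), in ohm/m."""
    return rho / (w * h)


def c_per_length(eps_r, w=W, h=H, dist_l=DIST_L, dist_h=DIST_H):
    """Eqs. (3)-(4): parallel-plate coupling to the four neighbours, no fringing.

    c_h couples the side faces (height H) across dist_l,
    c_v couples the top/bottom faces (width W) across dist_h.
    Returns c = 2*c_h + 2*c_v in F/m.
    """
    eps = eps_r * EPS0
    c_h = eps * h / dist_l
    c_v = eps * w / dist_h
    return 2 * c_h + 2 * c_v


if __name__ == "__main__":
    rho_example = 2.2e-8   # ohm*m, hypothetical placeholder
    eps_r_example = 2.5    # hypothetical placeholder
    r = r_per_length(rho_example)
    c = c_per_length(eps_r_example)
    print(f"r = {r * 1e-3:.1f} ohm/mm, c = {c * 1e12:.1f} fF/mm")
```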
 
  
4.2.2  Wire delay and energy per bit 
This subsection derives the wire delay (per millimetre) and the energy per useful bit. These parameters characterize the system and indicate its viability.
 
WIRE DELAY 
At this point, a wire model is needed to calculate the delay (T_RC) as a function of the length (L). The simplest one is the lumped RC model [6], shown in Figure 22:
 
Figure 22: RC lumped model [6] 
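The Elmore delay of this chain network is shown in equation (5):

$$\tau_N = R_1 C_1 + (R_1 + R_2) C_2 + \dots + (R_1 + R_2 + \dots + R_N) C_N = \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j$$   (5)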
 
 
This model can be used to approximate a resistive-capacitive wire. Consider a wire of total length L partitioned into N identical segments, each of length L/N. The resistance and capacitance of each segment are then rL/N and cL/N, respectively. Applying the Elmore formula [6] gives an approximation of the wire delay:

$$\tau_{DN} = \left(\frac{L}{N}\right)^2 (rc + 2rc + \dots + Nrc) = (rcL^2)\,\frac{N(N+1)}{2N^2} = RC\,\frac{N+1}{2N}$$   (6)
 
 
Here R = r·L and C = c·L are the total resistance and capacitance of the wire of length L. For large values of N, expression (7) is obtained:

$$\tau_{DN} = \frac{RC}{2} = \frac{rcL^2}{2}$$   (7)
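Equations (5) to (7) can be checked numerically: the sketch below sums the Elmore delay of a ladder of N identical segments and compares it with RC·(N+1)/(2N), which tends to RC/2 as N grows. The function names are hypothetical, and R and C are normalised to 1.

```python
def elmore_delay(resistances, capacitances):
    """Eq. (5): Elmore delay at the end of an RC ladder.

    tau_N = sum_i C_i * (R_1 + ... + R_i)
    """
    delay = 0.0
    upstream_r = 0.0
    for r_i, c_i in zip(resistances, capacitances):
        upstream_r += r_i          # resistance between the source and node i
        delay += upstream_r * c_i
    return delay


def segmented_wire_delay(r_total, c_total, n):
    """Eq. (6): wire with total R and C split into n identical segments."""
    return elmore_delay([r_total / n] * n, [c_total / n] * n)


for n in (1, 2, 10, 1000):
    tau = segmented_wire_delay(1.0, 1.0, n)   # normalised R = C = 1
    closed_form = (n + 1) / (2 * n)            # RC (N+1) / (2N)
    print(n, round(tau, 6), round(closed_form, 6))
```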
 
 
The delay of a wire is therefore a quadratic function of its length: doubling the length of the wire quadruples its delay [6]. Substituting the r and c values from (2) and (4) into equation (7) gives the time delay T_RC (L in mm):

$$T_{RC} = \frac{r \cdot c}{2} \cdot L^2 = 105.75 \cdot L^2 \ \text{ps}$$   (8)
 
 
The next step is to optimize the delay of a 1 mm wire by inserting repeaters. Without repeaters and without the effect of the load capacitance, the total delay of this wire is 105.75 ps (8). However, the load capacitance at the end of the wire must be considered, as Figure 23 shows. This capacitance is given by [7]:
                          
                                        
 
(9) 
 
 
Taking the driver resistance R_driver as 50 Ω, the delay in (10) is obtained (without repeaters) [7]:
[Figure 23 circuit: V_DRIVER and R_DRIVER drive the wire, modelled as ½ R_WIRE, C_WIRE, ½ R_WIRE, which is loaded by C_LOAD; L = 1 mm]
 
Figure 23: Wire model without repeaters 
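$$T = R_{driver}\,(C_{wire} + C_{load}) + \frac{R_{wire}}{2}\,(C_{wire} + C_{load}) + \frac{R_{wire}}{2}\,C_{load}$$   (10)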
 
 
Next, repeaters are inserted in the design, as Figure 24 shows: a wire divided into N segments contains N-1 repeaters.
 
 
[Figure 24 circuit: V_DRIVER and R_DRIVER drive N segments, each modelled as ½ r_wire·(L/N), c_wire·(L/N), ½ r_wire·(L/N) and separated by repeaters (D_repeater); the last segment drives C_LOAD]
Figure 24: Wire model with repeaters 
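As a result, the total delay is:

$$T = R_{driver} \cdot c\,\frac{L}{N} + N \cdot \frac{rc}{2}\left(\frac{L}{N}\right)^2 + (N-1) \cdot t_{rep} + r\,\frac{L}{N} \cdot C_{load}$$   (11)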
 
 
 
To optimize the delay, equation (12) must be solved:

$$\frac{\partial T}{\partial N} = 0$$   (12)
 
 
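However, to simplify the calculation, the optimization is done without the effect of R_driver and C_load [6]. In that case, the delay is given by equation (13):

$$T = N \cdot \frac{rc}{2}\left(\frac{L}{N}\right)^2 + (N-1) \cdot t_{rep} = \frac{T_{RC}}{N} + (N-1) \cdot t_{rep}$$   (13)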
 
 
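Differentiating (13) and setting the result to zero gives expression (14):

$$\frac{\partial T}{\partial N} = -\frac{T_{RC}}{N^2} + t_{rep} = 0 \;\Rightarrow\; N_{opt} = L\sqrt{\frac{rc}{2\,t_{rep}}} = \sqrt{\frac{T_{RC}}{t_{rep}}}$$   (14)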
 
Taking the delay of a repeater as about 15 ps [11], the optimal value of N, and hence the number of repeaters (N-1), is obtained in (15):

$$N_{opt} = \sqrt{\frac{105.75\ \text{ps}}{15\ \text{ps}}} \approx 2.66$$   (15)
 
Consequently, substituting (15) into (13) gives the optimal delay in a 1mm wire:  
                                                     (16) 
 
Therefore, an improvement of 40.7% in delay is obtained.  
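The repeater optimisation can be reproduced with a few lines of Python, assuming the simplified delay T = T_RC/N + (N - 1)·t_rep with T_RC = 105.75 ps for 1 mm and t_rep ≈ 15 ps, as given in the text. Because R_driver and C_load are omitted, this sketch does not attempt to reproduce the 40.7% figure; it only shows how N_opt and the best integer number of segments follow from that model. The function names are hypothetical.

```python
import math

T_RC = 105.75   # ps, 1 mm wire without repeaters, eq. (8)
T_REP = 15.0    # ps, repeater delay [11]


def repeated_wire_delay(n, t_rc=T_RC, t_rep=T_REP):
    """Eq. (13): n segments and n-1 repeaters, without R_driver and C_load."""
    return t_rc / n + (n - 1) * t_rep


def optimal_segments(t_rc=T_RC, t_rep=T_REP):
    """Continuous optimum of eq. (13), from dT/dN = 0 (eq. 14)."""
    return math.sqrt(t_rc / t_rep)


n_opt = optimal_segments()
best_n = min(range(1, 8), key=repeated_wire_delay)   # integer number of segments
print(f"N_opt = {n_opt:.2f}, best integer N = {best_n}, "
      f"T = {repeated_wire_delay(best_n):.2f} ps")
# prints: N_opt = 2.66, best integer N = 3, T = 65.25 ps
```

Under these assumptions, three segments (two repeaters) give the lowest delay, which appears consistent with the two buffers per millimetre used in the energy calculation that follows.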
 
ENERGY PER USEFUL BIT 
Finally, the energy per useful bit is calculated. It is interesting to compare the energy per bit in three scenarios: a full AMBA system, the designed system (modified AMBA), and a point-to-point communication. The calculations are performed below:
• Full AMBA system
In this case, all control and arbitration signals of the AMBA AHB system are considered. Therefore, every module of the system (master and slave) would have [3]:
- 64 bits of data (32 bits of write data and 32 bits of read data) 
- 17 bits of control and 32 bits of address 
- 3.3125 bits of arbitration 
 
This gives 64 useful bits (u_bits) out of 132.3125 total bits per module. Consequently, the energy per bit [12] without the energy of the buffers (Eb’) is given in (17):
           
  
       
         
       
  
  
 (    )  
             
         
 
     
  
   
  
 
 
(17) 
 
 
On the other hand, every buffer consumes energy when it switches. Considering that a buffer is a CMOS inverter (so that Cin = 2·Cgate), the energy consumed by each buffer per useful bit is:
          
  
       
         
 (                   )  (    )
  
             
         
 (      
  
  
               )  (    )  
             
         
      
  
   
 
 
 
 
(18) 
 
 
Finally, using equations (17) and (18), the total energy per useful bit can be calculated (recall that there are 2 buffers in a length of 1 mm):

$$E_b = E_b' + 2 \cdot E_{buff}$$   (19)
 
 
• Modified AMBA system
In this case, only the signals described in section 4.1 are considered to calculate Eb. Therefore, every module has:
- 8 bits of data
- 3 bits of control
- 2 bits of arbitration
This gives 8 useful bits out of 13 total bits. Therefore, Eb’ and Ebuff are given by equations (20) and (21).
          
  
  
 (    )  
       
        
      
  
      
 
 
(20) 
 
      (      
  
  
               )  (    )  
       
        
      
  
      
 
 
(21) 
 
Finally, using equations (20) and (21), the total Eb is obtained:

$$E_b = E_b' + 2 \cdot E_{buff}$$   (22)
 
 
 
• Point-to-point communication system
In the last case, the arbitration signals are not considered because every link has its own wire. In addition, there are no control signals because a point-to-point system can be operated synchronously. Therefore, every module would have 8 bits of data and hence 8 useful bits. Consequently, Eb’ and Ebuff are:
          
  
  
 (    )       
  
      
 
 
(23) 
 
      (      
  
  
               )  (    )       
  
      
 
 
(24) 
 
Finally, using (23) and (24), the total Eb is obtained:

$$E_b = E_b' + 2 \cdot E_{buff}$$   (25)
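To see where the energy differences between the three scenarios come from, the sketch below compares only the number of switched wire bits per useful bit (132.3125/64, 13/8 and 8/8). It assumes that E_b' and E_buff both scale with this ratio and ignores absolute energies; under that assumption the modified AMBA comes out about 21.4% below full AMBA and 62.5% above point-to-point, close to but not identical with the 21.52% and 62.21% reported in the conclusions. The dictionary and function names are hypothetical.

```python
# (total bits, useful bits) per module, as listed in Section 4.2.2
SYSTEMS = {
    "full AMBA": (132.3125, 64),
    "modified AMBA": (13, 8),
    "point-to-point": (8, 8),
}


def bits_per_useful_bit(name):
    """Number of wire bits that switch for every useful bit transferred."""
    total, useful = SYSTEMS[name]
    return total / useful


def relative_change(name, reference):
    """Relative difference of the bit overhead of `name` against `reference`."""
    return bits_per_useful_bit(name) / bits_per_useful_bit(reference) - 1


for name in SYSTEMS:
    print(f"{name:15s} {bits_per_useful_bit(name):.4f} wire bits per useful bit")
print(f"modified vs full AMBA:      {relative_change('modified AMBA', 'full AMBA'):+.2%}")
print(f"modified vs point-to-point: {relative_change('modified AMBA', 'point-to-point'):+.2%}")
```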
 
 
 
4.2.3  Wire delay model in VHDL 
Once the delay per millimetre is known, the placement of the modules must be decided. The distances between modules then follow, and the VHDL model can be implemented.
The placement proposed in this project is shown in Figure 25. The modules are distributed in two dimensions because the distances between modules are shorter than in a one-dimensional arrangement; consequently, the delay is also smaller.
Recall that any module that wants to transfer must request the bus from the arbiter. In Figure 25, this request costs three steps of 0.316 mm in the worst case (a request from the orange modules), and the answer from the arbiter to the module also costs three steps.
On the other hand, when a module in the last region (orange modules in Figure 25) sends data to another module in the last region, the distance is six steps. This is therefore the worst case for calculating over how many modules the system can transfer.
[Figure 25 layout: the arbiter (ARB) and multiplexers (MUX) in the centre element, surrounded by modules M0 to M23 on a grid with a pitch of 0.316 mm]
 
Figure 25: Placement of the modules 
 
The distance between two modules is a function of their region IDs. Considering that modules with the same colour in Figure 25 belong to the same region, and that the regions are formed by the modules listed in Table 6, the distance between two regions (Rx and Ry) is x+y steps of 0.316 mm.
 
Region Formed by modules 
R1 0  to  3 
R2 4  to  11 
R3 12  to  23 
… … 
Rn 2(n-1)n  to  2n(n+1)-1  
Table 6: Regions and modules 
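The region lookup that the VHDL model needs can be written in closed form. In the Python sketch below (hypothetical names), the first ID of region n satisfies 1 + 2·id = (2n - 1)², so an integer square root gives the region directly; the distance between two modules is then x + y steps, as stated above.

```python
import math


def region_of(module_id):
    """Region ID of a module (Table 6): region n holds IDs 2(n-1)n .. 2n(n+1)-1.

    The first ID of region n satisfies 1 + 2*id = (2n-1)**2, so the region
    follows from an integer square root.
    """
    if module_id < 0:
        raise ValueError("module IDs start at 0")
    return (math.isqrt(1 + 2 * module_id) - 1) // 2 + 1


def distance_steps(module_a, module_b):
    """Steps of 0.316 mm between two modules: x + y for regions Rx and Ry."""
    return region_of(module_a) + region_of(module_b)


print(distance_steps(12, 23))   # two modules of region R3: prints 6
```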
At this point, it is important to know the maximum distance between two modules, which depends on the total area of the system. It is therefore necessary to express the maximum distance as a function of the area. First, note that every side of the square should consist of an odd number of elements (0.1 mm² each). In that case, the configuration is fully symmetric with respect to the centre element (arbiter and multiplexers), which simplifies the calculation.
Two maximum distances can then be calculated:
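1) Considering only the modules of the complete regions (coloured modules in Figure 25), with A the total area of the system:

$$d_{max} = \left( \left\lfloor \sqrt{\frac{A}{0.1\ \text{mm}^2}} \right\rfloor - 1 \right) \cdot 0.316\ \text{mm}$$   (26)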
 
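2) Considering all modules in the square:

$$d_{max,all} = 2 \cdot \left( \left\lfloor \sqrt{\frac{A}{0.1\ \text{mm}^2}} \right\rfloor - 1 \right) \cdot 0.316\ \text{mm}$$   (27)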
 
 
NOTE: Equations (26) and (27) use the floor function, which returns the largest integer less than or equal to its argument.
Therefore, once the total area is known, the maximum distance can be found. With the maximum distance, it is then possible to find over how many regions the system can transfer and, using Table 6, over how many modules.
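A minimal Python sketch of equations (26) and (27), assuming they take the form d_max = (⌊√(A/0.1 mm²)⌋ - 1)·0.316 mm for the complete regions and twice that value for the whole square, where A is the total area in mm² including the central element. The reasoning behind that form: with k elements per side (k odd), the complete regions reach (k - 1)/2 steps from the centre, so two of their modules are at most k - 1 steps apart, while the corners are k - 1 steps from the centre, giving 2(k - 1) steps for the whole square. Function names are hypothetical.

```python
import math

ELEMENT_AREA = 0.1   # mm^2, area of one element (module, or the central arbiter)
STEP = 0.316         # mm, side of one element (about sqrt(0.1))


def elements_per_side(total_area_mm2):
    """floor(sqrt(A / 0.1 mm^2)); a tiny tolerance absorbs float rounding."""
    return math.floor(math.sqrt(total_area_mm2 / ELEMENT_AREA) + 1e-9)


def max_distance_complete_regions(total_area_mm2):
    """Eq. (26): farthest pair of modules inside the complete regions, in mm."""
    return (elements_per_side(total_area_mm2) - 1) * STEP


def max_distance_all_modules(total_area_mm2):
    """Eq. (27): farthest pair of modules anywhere in the square, in mm."""
    return 2 * (elements_per_side(total_area_mm2) - 1) * STEP


for area in (0.9, 2.5, 4.9):     # squares of 3x3, 5x5 and 7x7 elements
    print(area, round(max_distance_complete_regions(area), 3),
          round(max_distance_all_modules(area), 3))
```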
 
Finally, the last step is to implement the delay model in VHDL. This requires a function that returns the region ID for a given module ID (Table 6), so that all distances in the configuration are known.
In addition, every module must have delay blocks for all its input and output signals. These blocks introduce a delay corresponding to the distance between the module and the centre of the configuration (arbiter and multiplexers), so that all paths are modelled.
Figure 26 shows an example of how the delay of the arbitration signals is implemented (in the Master AHB). Note that:
- The signals that end with “_aux” are the input and output signals of the master block. Recall that HBUSREQ is an output signal and HGRANT is an input signal. Thus, HBUSREQ (which goes to the arbiter) is the HBUSREQ_aux signal (from the master) delayed, while HGRANT_aux (which goes to the master) is the HGRANT signal (from the arbiter) delayed.
- Delays are implemented using “after” statements inside a process whose sensitivity list is formed by the signal that must be delayed. Thus, the new signal is updated “reg*STEP_DELAY” after the non-delayed signal has changed.
- “reg*STEP_DELAY” is a time constant: reg indicates the region of the module (that is, the number of steps between the module and the centre), and STEP_DELAY indicates the delay of each step. In this case, the step is 0.316 mm and its delay is 21.651 ps.
 
 
Figure 26: VHDL delay model: after statements 
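The Python sketch below mimics the per-port delay blocks at the level of numbers: each module's inputs and outputs are shifted by reg·STEP_DELAY, as an “after” clause in a VHDL signal assignment would do. It takes STEP_DELAY as 21.651 ps per 0.316 mm step, the value that matches the order of magnitude quoted in the conclusions (about 1 ns for 12 mm); the function names are hypothetical.

```python
import math

STEP_DELAY_PS = 21.651   # delay of one 0.316 mm step (Section 4.2.3)


def region_of(module_id):
    """Region ID from Table 6 (region n holds IDs 2(n-1)n .. 2n(n+1)-1)."""
    return (math.isqrt(1 + 2 * module_id) - 1) // 2 + 1


def port_delay_ps(module_id):
    """Delay of each delay block of a module: reg * STEP_DELAY."""
    return region_of(module_id) * STEP_DELAY_PS


def transfer_delay_ps(src, dst):
    """Module -> centre (arbiter/multiplexers) -> module."""
    return port_delay_ps(src) + port_delay_ps(dst)


print(f"{port_delay_ps(5):.3f} ps")           # M5 lies in R2: prints 43.302 ps
print(f"{transfer_delay_ps(12, 23):.3f} ps")  # worst case of Figure 25: prints 129.906 ps
```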
 
  
 
 
 
5. Verification
Finally, the behaviour of the whole design must be verified. This chapter shows two simulations of an 8-module system. Both simulations are run without the delay model, because including it adds no information: the clock frequency is 10 MHz (period = 100 ns) and the delays are insignificant compared with the period.
 
SIMULATION 1: 
Simulation 1 shows all AMBA signals in an 8-module system, which validates how all modules request the bus and send data to other modules. The following points describe the procedure in detail:
1) After reset, all modules request the bus using the HBUSREQ signal.
2) The arbiter decides which module gets the bus. In this case, it asserts HGRANT(0), so M0 becomes the new owner of the bus. Note that HBUSREQ(0) is deasserted at the next rising edge.
3) Once M0 has the bus, the next step is to write the control signals. Note that HTRANS=”10” (indicating that the transfer is starting).
4) In the next cycle, the data must be written on the bus. Note that the data from M0 is “00000001”. This is also the last cycle in which the master needs the bus, so the arbiter asserts HGRANT(1) to give M1 access in the next cycle (as long as HREADY is asserted).
5) Finally, in the next cycle, HREADY is asserted and HMASTER changes to “001” (indicating that the new owner is M1). The procedure then starts again.
 
SIMULATION 2: 
Simulation 2 shows all signals of one module, in particular module 4 (M4).
1) First, M0 has the bus (HMASTER=”000”). Note that HTRANS_S (S = slave) changes to “10”, so the transfer of M0 starts.
2) Then, M0 writes its data to the bus and the data arrives at all other modules.
3) At this rising edge, the data from M0 is stored in the “data_vector” signal, whose function is to store the data from all modules in the system. Points (1), (2) and (3) are repeated until M4 gets the bus.
4) This is the first rising edge at which both HGRANT(4) and HREADY are asserted, so the transfer of M4 starts, following the same steps described above.
5) Once its transfer is finished, M4 continues receiving and storing data from the other modules.
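At transaction level, the behaviour shown by the two simulations can be summarised with a short Python sketch: every module requests the bus after reset, the arbiter serves the lowest-index requester, and each broadcast word is stored by every module at position HMASTER of its data_vector. Cycle-level handshaking (HTRANS, HREADY) is abstracted away, the data values other than M0's “00000001” are hypothetical, and storing a module's own word in its own data_vector is a simplifying assumption.

```python
def broadcast_round(module_data):
    """Transaction-level sketch of one round of the 8-module simulations.

    Every module requests the bus after reset; the arbiter always grants the
    lowest-index requester; each transfer is stored by every module in its
    data_vector at position HMASTER.
    """
    n = len(module_data)
    width = max(1, (n - 1).bit_length())        # bits of HMASTER
    data_vectors = [[None] * n for _ in range(n)]
    pending = set(range(n))                     # HBUSREQ asserted by all
    hmaster_sequence = []
    while pending:
        owner = min(pending)                    # HGRANT(owner) asserted
        pending.discard(owner)                  # owner drops HBUSREQ
        hmaster_sequence.append(format(owner, f"0{width}b"))
        for vector in data_vectors:             # HWDATA is seen by all modules
            vector[owner] = module_data[owner]
    return hmaster_sequence, data_vectors


data = [format(i + 1, "08b") for i in range(8)]   # M0 sends "00000001", ...
sequence, vectors = broadcast_round(data)
print(sequence[:3])   # prints ['000', '001', '010']
```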
[Figure: Simulation 1 waveforms, annotated with markers 1 to 5 that correspond to the steps listed for Simulation 1]
 
[Figure: Simulation 2 waveforms, annotated with markers 1 to 5 that correspond to the steps listed for Simulation 2]
 
6. Conclusions
Once the whole design has been implemented and verified, the main goals have been achieved: a broadcast communication using a shared bus (AMBA) has been implemented. In addition, a wire delay model has been designed and implemented in VHDL, and the energy per bit has been calculated. Concerning these results, the following points should be highlighted:
 
1) The full AMBA system is not used. The design uses only the signals that are needed; consequently, it uses 84.6% fewer bits than a full AMBA system.
 
2) The delay of the wire is a quadratic function of its length: doubling the length of the wire quadruples its delay.
 
3) Using a buffering architecture, an improvement of 40.7% in delay is obtained. 
 
4) The delay model in VHDL has been implemented, although it has not provided additional information, because:
a. Transfers are very fast, since the buffering architecture has been optimized for speed. For instance, a system of 800 modules has a maximum length of 12 mm and, therefore, a maximum delay of about 1 ns.

b. The clock frequency is 10 MHz, so the clock period is 100 ns. A delay of 1 ns is therefore only 1% of the clock period.
 
5) The performed design has the energy per bit given by equation (22). It is:

a. 21.52% lower than Eb in a full AMBA system (equation (19)). Therefore, it is a good decision not to use full AMBA and to use a modified AMBA with only the signals that are useful for the design.

b. 62.21% higher than Eb in a point-to-point communication system (equation (25)). However, a point-to-point communication may not be a good option, because many wires are needed in a system with a huge number of modules.
 
6) The modules are distributed in two dimensions because the distances between modules are shorter than in a one-dimensional arrangement; consequently, the delay is also smaller.
 
Finally, this design has been optimized for speed, but not for energy. The next step would be to optimize it for energy and, therefore, to reduce the number of buffers. This will have a negative impact on speed, but the goal is to find a compromise between energy and speed. This kind of optimization will also reduce the energy difference between the performed design and a point-to-point communication system.
 
 
7. References
 
[1] WIKIPEDIA, FREE ENCYCLOPEDIA. “Advanced Microcontroller Bus Architecture”.  
URL: http://en.wikipedia.org/wiki/Advanced_Microcontroller_Bus_Architecture  
 
[2] ARM, Company. “AMBA Open Specifications”.
URL: http://www.arm.com/products/system-ip/amba/amba-open-specifications.php
 
[3] ARM HOLDINGS Company. “AMBA™ Specification (Rev. 2.0)”. Year 1999. 
Pages: (2-1) to (3-58).  
 
[4] SOFTING AG, Company. “CAN bus Data Frame”.
URL: http://www.softing.com/home/en/industrial-automation/products/can-bus/more-can-bus/data-frame.php?navanchor=3010395
 
[5] BOSCH, Robert. “CAN Specification (Version 2.0)”. Year 1991.  
 
[6] RABAEY, Jan M.; CHANDRAKASAN, Anantha; NIKOLIC, Borivoje. “Digital Integrated 
circuits – A design perspective (2nd edition)”. Pearson Education, 2003. Pages: 
136-156, 425-427. 
 
[7] KNEPPER, Ronald W. Slides “Introduction to CMOS logic circuits”. Boston 
University. URL: people.bu.edu/rknepper/sc571/chapter4_a.ppt 
 
[8] WHITE, Richard M. Slides of the course ‘Introduction to Microelectronics’ 
(Lecture 25a - fall 2004). University of California, Berkeley. 
 URL: http://inst.eecs.berkeley.edu/~ee40/fa04/  
 
[9] HO, Ron. “On-chip wires: Scaling and efficiency”. Stanford University, August 
2003. Pages: 5-22, 61-68. 
 
[10] WESTE, Neil H.E.; HARRIS, David Money. Slides from the book “CMOS VLSI Design – A Circuits and Systems Perspective (4th edition)”. Addison Wesley, 2010. URL: http://esaki.ee.boun.edu.tr/courses/ee537/lect14-wires.pdf
[11] GINOSAR, Ran; KOLODNY, Avinoam. Slides “Fast Asynchronous Shift Register for Bit-Serial Communication”. Israel Institute of Technology, March 2006.
URL: http://tima.imag.fr/conferences/async/Technical_Program/Tuesday/Session_4/SrPresAsync06_10_pdf.pdf. Pages: 18.
 
[12] BROCKMAN, Jay; HARRIS, David. Slides “Interconnect: a.k.a. wires” of the course ‘Introduction to CMOS VLSI design’. Harvey Mudd College, 2011.
URL: http://www.cse.nd.edu/courses/cse60462/www/Public/Lectures/L07C_Interconnect.pdf. Pages: 7-9, 15-22, 25, 39-43.
 
[13] OWENS, John D.; DALLY, William J.; HO, Ron; JAYASIMHA, D.N. (Jay); KECKLER, Stephen W.; PEH, Li-Shiuan. “Research Challenges for On-Chip Interconnection Networks”, September-October 2007. Pages: 99-100.

Appendix A. AHB signals

Appendix B. VHDL Code
