Design modified architecture for MCS-51 with innovated instructions based on VHDL  by Fouda, Abd-Elmoneim Mohamed & Eldeen, Assem Badr
Ain Shams Engineering Journal (2013) 4, 723–733
Ain Shams University
Ain Shams Engineering Journal
www.elsevier.com/locate/asej
www.sciencedirect.com

ELECTRICAL ENGINEERING
Design modified architecture for MCS-51 with innovated instructions based on VHDL
Abd-Elmoneim Mohamed Fouda 1, Assem Badr Eldeen *
Computer Engineer Department, Modern Academy for Engineer & Technology, Cairo, Egypt
Received 8 May 2011; revised 15 November 2012; accepted 10 December 2012
Available online 3 May 2013

KEYWORDS
VHDL;
FPGA;
μC;
ISA;
Memory and Amdahl’s law

* Corresponding author. Tel.: +20 1005874973.
E-mail addresses: Fouda1964@maktoob.com (A.-E.M Fouda), Engineerasmbdr@hotmail.com (A.B. Eldeen).
1 Tel.: +20 1227644930.
Peer review under responsibility of Ain Shams University.
2090-4479 © 2013 Ain Shams University. Production and hosting by Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.asej.2012.12.001

Abstract This paper introduces two new complex instructions for an application-specific instruction set processor. For the MCS-51 family, by utilizing a reserved bit and the unused machine code ‘‘A5h’’, we modify the conventional instruction set architecture (ISA) and develop two macro instructions for data manipulation. The first moves a block of data from specific memory locations to any other memory locations, while the second obtains the maximum byte value within a group of 8 bytes and loads it into the Accumulator. The development proceeds in two basic steps: in step 1, we modify the architecture of the conventional 8051 microcontroller using a hardware description language (HDL); in step 2, we modify the instruction set architecture (ISA) of the 8051 μC. This development improves the performance of the μC: faster execution, smaller machine code (and hence lower storage requirements), and lower power consumption.

1. Introduction
Conventional general-purpose μCs are insufficient to achieve a high performance/cost ratio for advanced communication systems, control systems, and digital signal processing (DSP). To satisfy these requirements, instruction set design is one of the important issues: an instruction can be customized for a specific application to improve performance. However, the limited encoding space does not allow specific complex instructions to be added to the conventional ISA. It is therefore necessary to develop the conventional microcontroller so as to balance meeting the specified application requirements against cost [1]. The developed μC would be dedicated to a specific application, such as data manipulation or DSP; therefore, this work presents a synthesizable VHDL μC core, which can later be implemented on an FPGA chip.

The question here is: which conventional μC should be developed? This question was asked before by Dallas Semiconductor Corporation, which created 26 μCs (such as DS2252T, DS5000T, DS5250, DS80C310, DS80C390, DS80C400, DS87C520 and DS89C450) based on the venerable 8051 instruction set, because it is one of the most popular 8-bit μC architectures in the world. The instruction set is simple to understand, making it a favorite of embedded system designers. Many of the instructions directly address I/O pins, allowing quick manipulation (bit-banging) of external peripherals. A tremendous variety of on-chip peripherals is available in an almost limitless number of combinations. In addition, development tools for the 8051 μC family are widely available, so it is easy and inexpensive to start developing an application [2]. More than 1000 modern modified μCs based on the 8051 family are listed at ‘‘www.keil.com/dd’’.
From a survey of the datasheet and the instruction set manual of the Intel MCS-51 μC family, we found that there is one reserved machine code, ‘‘A5h’’, as shown in Fig. 1, and one reserved bit in the program status word register, PSW(1), as shown in Fig. 2 [3]. By combining the ‘‘A5’’ machine code with the state of the bit ‘‘PSW(1)’’, we can develop two new macro operations (sets of instructions), as shown in Fig. 3.

Figure 1 Part of the MCS-51 family instruction set.
Figure 2 Program status word (PSW) bit assignment.
Figure 3 Modified MCS-51 family instruction set.

The idea depends on the state of PSW(1). When PSW(1) = ‘‘0’’, ‘‘A5’’ performs the first proposed instruction, ‘‘MOVBK Adr2, Adr1’’. During execution of this instruction, a block of 8 data bytes is transferred from address ‘‘Adr1’’ (in the μC’s RAM) to eight other addresses starting from ‘‘Adr2’’. In the other case, when PSW(1) = ‘‘1’’, the second proposed instruction, ‘‘GetMAX Adr’’, is performed to obtain the maximum data byte of a block of eight data bytes starting from address ‘‘Adr’’ in the μC’s RAM; the maximum byte is stored in the Accumulator. Both proposed instructions (‘‘MOVBK’’ and ‘‘GetMAX’’), which are compiled into the machine code ‘‘A5h’’, are executed in two machine cycles.

The paper is organized as follows. Section 2 describes the overall design steps of the developed μC: Section 2.1 introduces the ISA design methodology, and Section 2.2 presents the design specification and the VHDL design technique. Section 3 presents the numerical results, and Section 4 concludes the paper.

2. Processor design
This section introduces the basic principles needed to design new VHDL code for the conventional complex instruction set computer (CISC) 8051 μC in order to improve its performance. The developed code is obtained by inserting two advanced VHDL code blocks into the standard architecture of the 8051 μC and modifying that μC’s ISA to match the inserted code. The approach is: first, design a VHDL architecture for the conventional 8051 μC (as a soft processor); then modify the instruction set of this soft processor by inserting two VHDL code blocks that represent a modified unit, as explained in Section 2.1; and finally modify the organization and architecture of the conventional μC to match these requirements, as explained in Section 2.2.

2.1. Instruction set modification
This section modifies the processor’s ISA. As a design methodology, the ISA can be adapted or extended to meet modern application requirements [4]. By ISA modification we mean adding two macro instructions (each equivalent to a group of individual conventional instructions); these two macro operations can be used in signal processing as well as in data manipulation.

The two proposed instructions are associated with the main memory of the conventional μC. The first transfers the contents of eight successive memory locations to another eight successive memory locations through eight parallel data channels concurrently. The second transfers the contents of eight successive memory locations through eight parallel data channels to a comparator circuit, which obtains the maximum value among the data bytes received from the eight channels.
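The intended behavior of the two instructions can be summarized in a small behavioral model. The sketch below is a Python illustration, not the authors’ VHDL: the internal RAM is modelled as a list of 256 byte values, and the names `movbk`, `getmax` and `_check_block` are hypothetical. The paper does not say what happens when a block would run past address FFh, so this sketch simply rejects such addresses (an assumption).

```python
from __future__ import annotations

RAM_SIZE = 256  # internal data RAM modelled as 256 x 8 (cf. Fig. 6a)
BLOCK = 8       # both macro instructions work on 8 consecutive bytes


def _check_block(adr: int) -> None:
    # Assumption: a block that would run past FFh is rejected (not specified in the paper).
    if not 0 <= adr <= RAM_SIZE - BLOCK:
        raise ValueError(f"block at {adr:#04x} does not fit in {RAM_SIZE}-byte RAM")


def movbk(ram: list[int], adr2: int, adr1: int) -> None:
    """MOVBK Adr2, Adr1: copy RAM[Adr1..Adr1+7] to RAM[Adr2..Adr2+7]."""
    _check_block(adr1)
    _check_block(adr2)
    block = ram[adr1:adr1 + BLOCK]  # all eight source bytes are read first...
    ram[adr2:adr2 + BLOCK] = block  # ...then written to the destination together


def getmax(ram: list[int], adr: int) -> int:
    """GetMAX Adr: return max(RAM[Adr..Adr+7]), the value loaded into the Accumulator."""
    _check_block(adr)
    return max(ram[adr:adr + BLOCK])


# Usage with illustrative data (not taken from the paper's figures)
ram = [0] * RAM_SIZE
ram[0x00:0x08] = [0x12, 0x7F, 0x03, 0xC8, 0x40, 0x99, 0x00, 0x5A]
movbk(ram, 0x20, 0x00)   # PSW(1) = 0 case
acc = getmax(ram, 0x20)  # PSW(1) = 1 case
print(f"ACC = {acc:02X}h")  # ACC = C8h
```

The copy is written as "read the whole block, then write the whole block" to mirror the eight parallel channels described above rather than a byte-at-a-time loop.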
Direct Memory Access (DMA) is a capability provided by some computer bus architectures that allows data to be sent directly from an attached device (such as a disk drive) to the memory on the computer’s motherboard. The microprocessor is freed from involvement with the data transfer, thus speeding up overall computer operation [5]. Our modification is similar to, but not identical with, a simple DMA architecture: it has eight data channels among addressable locations, but it has a single read/write command for the main memory, with no error detection, parity check, handshaking, etc.; it has just eight parallel data buses to transfer 8 bytes concurrently. Our approach uses the reserved op-code (machine code ‘‘A5h’’) coupled with the reserved bit PSW(1) to perform two modified CISC instructions: one moves a data block, and the other gets the maximum byte of a data block, as shown in Fig. 3.

Previously, during execution of A5, the instruction decoder (ID) decoded A5 as no operation because it is a reserved machine code; its only effect was a delay of one machine cycle. Moreover, the code ‘‘00h’’ is already used for ‘‘no operation’’. We now utilize ‘‘A5h’’, combined with the PSW(1) bit, to perform new data manipulations (‘‘MOVBK’’ and ‘‘GetMAX’’) in two machine cycles. Such instruction developments would be meaningless if performed with the standard internal architecture of the MCS-51 μCs, because their conventional ALU (arithmetic logic unit) is designed to operate sequentially for the whole ISA except the code ‘‘A5h’’. Now, when A5 is decoded by the ID, it is branched (gated with PSW(1)) as shown in Fig. 4, so that the developed operation is carried out within the same machine cycle by the SUB-ALU of the developed architecture. Section 3 quantifies the overall enhancement achieved by this development.
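The gating of the decoded A5h with PSW(1) amounts to a three-way dispatch. The following Python sketch models only that selection logic, not the timing of the real decoder; `Action`, `dispatch` and the string labels are hypothetical names. The PSW values 00h and 02h are the ones used in the simulations of Section 3.

```python
from enum import Enum

OPCODE_A5H = 0xA5  # reserved MCS-51 opcode reused by the modified core


class Action(Enum):
    CONVENTIONAL = "conventional ALU"
    MOVBK = "SUB-ALU: MOVBK Adr2, Adr1"
    GETMAX = "SUB-ALU: GetMAX Adr"


def dispatch(opcode: int, psw: int) -> Action:
    """Select the execution path for one fetched opcode (behavioral sketch)."""
    signal_a5h = opcode == OPCODE_A5H  # ID flags the reserved code
    psw1 = (psw >> 1) & 1              # PSW(1) is bit 1 of PSW (bit address D1h)
    if signal_a5h and psw1 == 1:       # first AND gate: Signal_A5h AND PSW(1)
        return Action.GETMAX
    if signal_a5h and psw1 == 0:       # second AND gate: Signal_A5h AND NOT PSW(1)
        return Action.MOVBK
    return Action.CONVENTIONAL         # every other opcode keeps its usual meaning


for psw in (0x00, 0x02):
    print(f"PSW = {psw:02X}h -> {dispatch(0xA5, psw).value}")
```

Because the reserved code was previously a no-operation, every other opcode falls through to the conventional path unchanged.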
2.2. Processor’s architecture modification
This section modifies the organization and architecture of the conventional μC. High-level design tools and field programmable gate arrays (FPGAs) significantly reduce the effort, cost and risk of hardware implementation. These technologies can be incorporated into a manageable and affordable prototyping framework, a VLSI-scale ‘‘breadboard’’ for exploring and evaluating new microprocessor designs [6,7]. Utilizing FPGA technologies, we are able to modify the conventional processor. In the Harvard architecture, program and data are accessed on separate buses and occupy two separate memory spaces (one for instructions, the other for data), which offers a good opportunity to improve and extend the system architecture by adding further modified blocks [1]. The developed architecture is therefore obtained by adding blocks for branching, gating, and a SUB-ALU. After the decoding process, if the instruction decoder (ID) detects the code ‘‘A5h’’, the modified control signal ‘‘Signal_A5h’’ is branched from the instruction register and then gated with the PSW(1) bit (using two AND gates) to execute the two new macro operations, as shown in Fig. 4. The output of the first AND gate is used when PSW(1) = ‘‘1’’ to control the additional modified block for the instruction ‘‘GetMax Adr’’; the output of the second AND gate is used when PSW(1) = ‘‘0’’ to control the additional modified block for the instruction ‘‘MOVBK’’. Both instructions are executed by the developed SUB-ALU block.

Figure 4 Architecture of the conventional μC 8051 with its modified unit.

The VHDL code for the first modified instruction, ‘‘MOVBK’’ (implemented in the SUB-ALU part of the modified architecture), is shown in Fig. 5 and works as follows. The instruction ‘‘MOVBK Adr2, Adr1’’ is activated when signal A5h is set to ‘‘1’’ and PSW(1) is cleared to ‘‘0’’ (line 257). Lines 257 to 268 together perform the instruction. Lines 258 to 260 read the source operand ‘‘Adr1’’, the lowest address of a group of eight successive RAM locations; these locations contain the eight data bytes to be copied to the destination locations in RAM. Lines 262 and 263 obtain the destination operand ‘‘Adr2’’, the lowest address of the destination group of eight RAM locations. Lines 265 to 268 form a FOR-LOOP that copies the eight RAM locations starting at source address ‘‘Adr1’’ to the eight RAM locations starting at destination address ‘‘Adr2’’.

Figure 5 VHDL code for executing the modified instruction ‘‘MOVBK’’.

From the look-up table point of view, specially synthesized multiplexers/demultiplexers, called ‘‘vector MUX’’ and ‘‘vector DEMUX’’, transfer data from the 256 × 8 RAM locations to other RAM locations in a single clock pulse, as shown in Fig. 6a. A group of 8 multiplexers (vector MUXs) transfers data from the 256 RAM locations to the 8 data lines, selected by the value of the source operand ‘‘Adr1’’, while a group of 8 demultiplexers (vector DEMUXs) transfers data from the 8 data lines to the 256 RAM locations, selected by the value of the destination operand ‘‘Adr2’’ (Fig. 6a). After the VHDL code is synthesized and downloaded into the FPGA chip (using the ISE12.31 software package from Xilinx Corporation) [8], lines 257 to 268 are configured as eight additional internal data lines, each a group of eight wires (buses 1 to 8), as shown in Fig. 6b. This shows that when ‘‘MOVBK Adr2, Adr1’’ is activated, any of the 256 RAM locations can be assigned as the starting address ‘‘Adr1’’ of the block (say Rs), which is selected by MUX(0), and the next seven consecutive RAM locations are activated simultaneously by incrementing Adr1 by 1, 2, ..., 7: (Rs + 1) is selected by MUX(1), (Rs + 2) by MUX(2), and so on up to (Rs + 7) by MUX(7), so that the whole 8-byte block is loaded onto the 8 data lines (1, 2, ..., 8) at the same time. Conversely, this block of data is stored into 8 successive RAM locations starting at address ‘‘Adr2’’ (say Rd), selected by DEMUX(0), with the next seven consecutive RAM locations activated simultaneously by incrementing Adr2 by 1, 2, ..., 7: (Rd + 1) is selected by DEMUX(1), (Rd + 2) by DEMUX(2), and so on up to (Rd + 7) by DEMUX(7), so that the whole 8-byte block is stored into the other 8 RAM locations at the same time.

Figure 6a Transferring data from 256 × 8 RAM locations into other RAM locations.
Figure 6b Block diagram for executing the modified instruction ‘‘MOVBK’’.

The second instruction, ‘‘GetMax Adr’’, is activated when both signal A5h and PSW(1) are set to ‘‘1’’. In the VHDL code shown in Fig. 7, lines 275 to 290 perform this modified instruction. Specifically, line 277 clears the buffer register; line 281 obtains the lowest address of the group of eight RAM locations; lines 284 to 287 calculate the maximum of the eight assigned data bytes by comparing each data byte with the buffer register; and line 288 stores the maximum byte in the μC’s accumulator.

Figure 7 VHDL code for executing the modified instruction ‘‘GetMax’’.

After the VHDL code is synthesized and downloaded into the FPGA chip, lines 284 to 287 are configured as eight additional internal data lines, each a group of eight wires, as shown in Fig. 8. This shows that when ‘‘GetMax Adr’’ is activated, any of the 256 RAM locations (each 8 bits wide) can be assigned as the starting address ‘‘Adr’’ of the block (according to the user source code, say Rs), which is selected by MUX(0); the following locations, from (Rs + 1) selected by MUX(1) up to (Rs + 7) selected by MUX(7), are loaded onto the 8 data lines (1, 2, ..., 8) at the same time.
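Because the vector MUXs place all eight source bytes on the data lines before the vector DEMUXs store them, MOVBK behaves like a read-all-then-write-all copy. The Python sketch below (all function names hypothetical) contrasts that with a byte-by-byte loop, lowest address first, as a sequence of individual MOV instructions would perform it. The paper does not discuss overlapping source and destination blocks; the overlapping example is only an illustration of where the two orders would differ, assuming the hardware behaves as described above.

```python
BLOCK = 8


def vector_mux(ram: list, adr1: int) -> list:
    """MUX(i) places RAM[adr1 + i] on data line i; all eight lines are driven together."""
    return [ram[adr1 + i] for i in range(BLOCK)]


def vector_demux(ram: list, adr2: int, lines: list) -> None:
    """DEMUX(i) stores data line i into RAM[adr2 + i]."""
    for i, value in enumerate(lines):
        ram[adr2 + i] = value


def movbk_parallel(ram: list, adr2: int, adr1: int) -> None:
    # Read phase (vector MUXs) completes before the write phase (vector DEMUXs).
    vector_demux(ram, adr2, vector_mux(ram, adr1))


def copy_byte_by_byte(ram: list, adr2: int, adr1: int) -> None:
    # One byte at a time, lowest address first, like a sequence of single MOVs.
    for i in range(BLOCK):
        ram[adr2 + i] = ram[adr1 + i]


# Hypothetical overlapping case: source 00h-07h, destination 02h-09h.
a = list(range(256))
b = list(range(256))
movbk_parallel(a, 0x02, 0x00)
copy_byte_by_byte(b, 0x02, 0x00)
print(a[2:10])  # [0, 1, 2, 3, 4, 5, 6, 7]: the original block, now at 02h-09h
print(b[2:10])  # [0, 1, 0, 1, 0, 1, 0, 1]: already-overwritten bytes are copied again
```

For non-overlapping blocks, such as 00h-07h to 20h-27h used in Section 3, both orders give the same result.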
The comparison process works as follows. There are 8 comparison stages, each containing one comparator (COMP-0 to COMP-7) attached to a 2 × 1 multiplexer (MUX0-b to MUX7-b).

The first comparator (COMP-0) outputs the maximum of Rs and 00h; this output is delivered via MUX0-b to the second comparator (COMP-1), which compares it with Rs + 1. The resulting maximum is delivered via MUX1-b to the third comparator (COMP-2), where it is compared with Rs + 2, and so on up to COMP-7, which obtains the maximum of location Rs + 7 and the output of the previous stage (COMP-6). This final maximum is loaded into the accumulator via MUX7-b.
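The ripple of comparators can be traced stage by stage. In this Python sketch (hypothetical name `comparator_chain`), each loop iteration stands for one stage COMP-k with its MUXk-b, which is assumed to pass on the larger of the previous stage’s output and the byte at Rs + k. The real circuit is combinational, so the loop models data flow, not timing. The test data are the eight bytes used in the ModelSim run of Section 3.

```python
def comparator_chain(block: list) -> list:
    """Return the outputs of COMP-0..COMP-7 for the 8 bytes at Rs..Rs+7."""
    if len(block) != 8:
        raise ValueError("GetMAX works on exactly eight bytes")
    outputs = []
    previous = 0x00                     # COMP-0's second input is 00h
    for byte in block:                  # stage k compares the byte at Rs + k ...
        larger = byte > previous        # ... with the previous stage's result
        previous = byte if larger else previous  # MUXk-b forwards the larger value
        outputs.append(previous)
    return outputs                      # outputs[7] is loaded into the accumulator


stages = comparator_chain([0x04, 0xE1, 0xE2, 0xC2, 0xF4, 0xA6, 0x55, 0x66])
print(" ".join(f"{v:02X}h" for v in stages))  # 04h E1h E2h E2h F4h F4h F4h F4h
```

Printing every stage, not just the last one, makes it easy to see where the running maximum stops changing (here from COMP-4 onward).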
The entire design was made from scratch using the block diagrams in the Intel and Atmel data sheets for the MCS-51 family, specifically for the 8051 chip [3].

3. Numerical results
From the simulation point of view, the execution scenario of the two developed instructions must be simulated using a well-known simulation package. The ‘‘Mentor Graphics ModelSim SE 6.5’’ simulator provides advanced debugging and simulation of VHDL code [8,9]. To illustrate the scenario of the first modified instruction, assume that the RAM locations of the modified μC (from address ‘‘00h’’ to address ‘‘18h’’) hold the initial data shown in Fig. 9. The PSW register (at address D0h) contains 00h, so it is in the clear state, i.e., PSW(1) = ‘‘0’’ (highlighted in Fig. 10). The task is to move a block of 8 data bytes (from address 00h to 07h, highlighted in Fig. 9). After execution of the modified instruction ‘‘MOVBK 20h, 00h’’, the 8 bytes are copied in sequence into the destination memory locations (from 20h to 27h, respectively), as shown in Fig. 10.

Figure 8 Block diagram for executing the modified instruction ‘‘GetMax’’.
Figure 9 Initial state of RAM locations for the modified μC.
Figure 10 RAM locations after execution of ‘‘MOVBK 20h, 00h’’ for modified μC.
Moreover, to illustrate the scenario of the second modified instruction, the PSW register (at address D0h) contains 02h, so it is in the set state, i.e., PSW(1) = ‘‘1’’ (highlighted in Fig. 11a). The task is to simulate how the maximum value of the 8-byte block of data from address 20h to 27h (highlighted in Fig. 11a) is obtained. After execution of the modified instruction ‘‘GetMax 20h’’, the result is the value ‘‘F4h’’ at address ‘‘24h’’ (highlighted in Fig. 11a), which is loaded into the accumulator at address E0h (highlighted in Fig. 11a). Furthermore, Fig. 11b is a screenshot of the ModelSim simulator; it shows that the accumulator held ‘‘FFh’’ at time ‘‘5907.8 ns’’ and, one clock pulse later at time ‘‘5927.8 ns’’, was loaded with the maximum byte ‘‘F4h’’ of the 8 memory locations holding (04h, E1h, E2h, C2h, F4h, A6h, 55h, 66h). Obtaining the maximum byte therefore took ‘‘20 ns’’.
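The figures reported for this run are mutually consistent, which the short Python check below makes explicit: the largest of the eight bytes is F4h, it sits at offset 4 from 20h (address 24h), and the 20 ns between the two trace timestamps equals one period of the 50 MHz clock quoted for the FPGA implementation. The variable names are illustrative only.

```python
# Values reported for the ModelSim run of "GetMax 20h" (Figs. 11a and 11b).
BASE = 0x20
block = [0x04, 0xE1, 0xE2, 0xC2, 0xF4, 0xA6, 0x55, 0x66]  # RAM 20h..27h
t_before_ns, t_after_ns = 5907.8, 5927.8                 # ACC = FFh, then F4h

acc = max(block)
addr_of_max = BASE + block.index(acc)  # first (here: only) location holding the maximum
f_cp_hz = 50e6                         # FPGA clock used in the speed-up comparison
t_clk_ns = 1e9 / f_cp_hz               # one clock period in ns
elapsed_ns = t_after_ns - t_before_ns

print(f"ACC = {acc:02X}h from {addr_of_max:02X}h; "
      f"{elapsed_ns:.1f} ns elapsed vs. {t_clk_ns:.1f} ns clock period")
# ACC = F4h from 24h; 20.0 ns elapsed vs. 20.0 ns clock period
```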
In this section we measure the performance of the modified μC and compare it with the conventional μC (the 8051 or any similar μC of the MCS-51 family) executing the same two modified instructions. The performance parameters are the number of bytes (in program memory) required to store the machine code, the number of machine cycles required to execute each modified instruction, and the total execution time of each modified instruction.

Figure 11a RAM locations after execution of ‘‘GetMax 20h’’ for modified μC.
Figure 11b ModelSim’s analyzer indicates the execution of ‘‘GetMax’’.

For the first modified instruction (‘‘MOVBK Adr2, Adr1’’), to evaluate both the number of program-memory bytes required to store the machine code and the number of machine cycles required for execution, assume the following scenario, using the advanced assembly software package ‘‘prog-studio’’ from ‘‘Batronix.com’’ [10]. A stream of data in eight consecutive memory locations starting at address Adr1 = ‘‘00h’’ must be transferred to another 8 memory locations starting at address Adr2 = ‘‘20h’’. The conventional code for this task is shown in Fig. 12. Each line of this code transfers one source location to the corresponding destination location; for example, the first line transfers the contents of RAM location 00h to RAM location 20h. Similarly, the next lines move the sequential RAM locations (01h, 02h, ..., 07h) to the corresponding RAM locations (21h, 22h, ..., 27h), respectively. Each code line is a single data transfer instruction (occupying 3 bytes of program memory and executing in 2 machine cycles), so the overall conventional code for transferring 8 data bytes occupies 3 × 8 = 24 bytes of program memory (flash memory), from address ‘‘0000000h’’ to ‘‘0000023h’’ as shown in Fig. 13, and needs 2 × 8 = 16 machine cycles to execute.

Figure 14 Modified instruction code for move block.
Figure 15 Modified machine code (5 bytes) for move block.
This conventional data-transfer code takes many execution cycles and occupies many bytes of program memory. It can instead be written as the single modified instruction ‘‘MOVBK 20h, 00h’’, as shown in Fig. 14. This modified instruction occupies only 5 bytes of program memory, from ‘‘0000000h’’ to ‘‘0000004h’’: 2 bytes for clearing PSW(1) (C2h and D1h, i.e., CLR PSW.1) [2] and 3 bytes for the reserved code ‘‘A5’’ and its operands (A5, 00 and 20), as shown in Fig. 15. Moreover, this modified code needs only 2 machine cycles in the SUB-ALU (one machine cycle for clearing PSW(1) and the other to move the data block), rather than the 16 machine cycles executed in the conventional ALU of the μC.
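The byte and cycle counts can be reproduced by assembling both versions by hand. The Python sketch below uses the standard MCS-51 encodings MOV direct,direct (85h, source, destination; 3 bytes, 2 machine cycles) and CLR bit (C2h, bit; 2 bytes, 1 machine cycle), with PSW.1 at bit address D1h; the operand order A5h, Adr1, Adr2 follows the text. The helper names are hypothetical, and the one-cycle cost of the A5h instruction is taken from the paper rather than from any Intel document.

```python
from __future__ import annotations

# Standard MCS-51 encodings used below.
MOV_DIRECT_DIRECT = 0x85  # MOV dest,src -> 85h, src, dest (3 bytes, 2 machine cycles)
CLR_BIT = 0xC2            # CLR bit      -> C2h, bit       (2 bytes, 1 machine cycle)
PSW_1 = 0xD1              # bit address of PSW.1 (PSW itself is at D0h)
A5H = 0xA5                # reserved opcode reused for MOVBK/GetMAX


def conventional_movbk(adr2: int, adr1: int) -> tuple[bytes, int]:
    """Eight MOV direct,direct instructions, one per byte (cf. Figs. 12-13)."""
    code = bytearray()
    for i in range(8):
        code += bytes([MOV_DIRECT_DIRECT, adr1 + i, adr2 + i])  # source byte precedes destination
    return bytes(code), 8 * 2


def modified_movbk(adr2: int, adr1: int) -> tuple[bytes, int]:
    """CLR PSW.1 followed by A5h, Adr1, Adr2 (cf. Figs. 14-15)."""
    code = bytes([CLR_BIT, PSW_1, A5H, adr1, adr2])
    return code, 1 + 1  # 1 cycle for CLR, 1 cycle in the SUB-ALU (per the text)


old_code, old_cycles = conventional_movbk(0x20, 0x00)
new_code, new_cycles = modified_movbk(0x20, 0x00)
print(len(old_code), old_cycles, old_code[:3].hex(" "))  # 24 16 85 00 20
print(len(new_code), new_cycles, new_code.hex(" "))      # 5 2 c2 d1 a5 00 20
```

Note that the 8051 places the source address before the destination address in the machine code of MOV direct,direct, the reverse of the assembly-language operand order.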
For the second modified instruction (‘‘GetMax Adr’’), to evaluate both the number of program-memory bytes required to store the machine code and the number of machine cycles required for execution, assume the following scenario: the maximum byte of a stream of 8 data bytes (8 consecutive RAM locations) starting at address Adr = ‘‘00h’’ must be found. The conventional code for this task is shown in Fig. 16. The first code line uses an ‘‘If-statement’’ to compare the contents of RAM location R1 with the contents of RAM location R0 (initially 00h), taking the maximum value and storing it in R0. Similarly, the second code line compares the contents of RAM location R2 with the contents of R0, again storing the maximum in R0, and so on for the others (R3, R4, R5, R6, and R7). The final result of these comparisons, the maximum value, is temporarily stored in RAM location R0. In the last code line (MOV ACC, R0), the contents of R0 are loaded into the accumulator.

So the overall conventional code to get the maximum value of such a data block occupies 2 + 7 × 12 = 86 bytes of program memory (flash memory), from address ‘‘0000000h’’ to ‘‘0000085h’’ as shown in Fig. 17, and needs 2 + 9 × 7 = 65 machine cycles to execute.

This task likewise takes many machine cycles and occupies many bytes of program memory. It can instead be written as the single modified instruction ‘‘GetMax 00h’’, as shown in Fig. 18. This modified instruction occupies only 5 bytes of program memory, from ‘‘0000000h’’ to ‘‘0000004h’’: 2 bytes for setting PSW(1) (D2h and D1h, i.e., SETB PSW.1) [2] and 3 bytes for the reserved code ‘‘A5’’ and its operands (A5, 00 and 00), as shown in Fig. 19. Moreover, this modified code needs only 2 machine cycles in the SUB-ALU (one machine cycle for setting PSW(1) and the other to get the maximum value), rather than 65 machine cycles in the conventional ALU of the μC.

Figure 12 Assembly code to transfer eight data bytes to another eight data RAM locations.
Figure 13 Compiled machine code for transferring 8 data bytes from 8 consecutive RAM locations to 8 other consecutive RAM locations.
Figure 16 Assembly code to get the maximum byte of 8 data bytes in RAM locations.
Figure 17 Compiled machine code to get the maximum byte of 8 data bytes in RAM locations.
Figure 18 Modified instruction code to get the maximum byte in a block.
Figure 19 Modified machine code (5 bytes) to get the maximum byte in a block.
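The totals for the conventional GetMax program can be tallied the same way. Since the contents of Figs. 16 and 17 are not reproduced here, this Python sketch only restates the paper’s expressions 2 + 7 × 12 and 2 + 9 × 7, reading them (as an assumption) as seven 12-byte, 9-cycle compare blocks plus the final MOV ACC, R0 (MOV direct,Rn, 2 bytes and 2 machine cycles). The modified program is SETB PSW.1 (D2h, D1h) followed by A5h, 00h, 00h, as stated above; the constant names are hypothetical.

```python
# Conventional GetMax program (Figs. 16-17): the paper gives 2 + 7*12 bytes and
# 2 + 9*7 machine cycles. Assumed reading: seven compare blocks (R1..R7 against the
# running maximum in R0) of 12 bytes / 9 cycles each, plus the final MOV ACC,R0.
COMPARE_BLOCKS = 7
BLOCK_BYTES, BLOCK_CYCLES = 12, 9
FINAL_MOV_BYTES, FINAL_MOV_CYCLES = 2, 2  # MOV direct,Rn is 2 bytes, 2 machine cycles

conv_bytes = FINAL_MOV_BYTES + COMPARE_BLOCKS * BLOCK_BYTES
conv_cycles = FINAL_MOV_CYCLES + COMPARE_BLOCKS * BLOCK_CYCLES
last_address = conv_bytes - 1  # program starts at address 0

# Modified program (Figs. 18-19): SETB PSW.1, then A5h with operands 00h, 00h.
SETB_BIT, PSW_1, A5H = 0xD2, 0xD1, 0xA5
new_code = bytes([SETB_BIT, PSW_1, A5H, 0x00, 0x00])
new_cycles = 1 + 1  # 1 cycle for SETB, 1 cycle in the SUB-ALU (per the text)

print(f"conventional: {conv_bytes} bytes (0..{last_address}), {conv_cycles} cycles")
print(f"modified:     {len(new_code)} bytes, {new_cycles} cycles")
```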
As a final result of the above discussion, Table 1 summarizes the overall number of storage bytes and machine cycles required by the conventional and the modified instructions. To perform the same tasks, the program memory of the developed μC needs only 10 bytes (5 + 5 = 10) to store the two modified instructions, rather than the 110 bytes (24 + 86 = 110) of the conventional μC, an improvement of more than 90% in program-memory storage. Likewise, the developed μC needs only 4 machine cycles (2 + 2 = 4) to execute the two modified instructions, rather than the 81 machine cycles (16 + 65 = 81) of the conventional μC, an improvement of about 95%.

Table 1 Number of storage bytes (in program memory) and machine cycles required for both conventional and modified instructions.
Tasks | Machine code bytes, μC 8051 | Machine code bytes, VHDL μC | Ratio of storage enhancement | Machine cycles, μC 8051 | Machine cycles, VHDL μC | Ratio of machine cycles enhancement
Move block of data (8 bytes) | 24 | 5 | | 16 | 2 |
Get max value for block of data (8 bytes) | 86 | 5 | | 65 | 2 |
Both | 110 | 10 | 90% | 81 | 4 | 95%

CPU speed is only one of many aspects of overall system performance. Amdahl’s law describes the impact of speeding up a single component (e.g., the CPU) of a complex system. To compare the speed-up of the modified VHDL μC with the conventional MCS-51 family, note that the period of one machine cycle, T_m, can be expressed as [3]:

$$T_m = \frac{N_p}{f_{cp}} \qquad (1)$$

where N_p is the number of clock pulses in one machine cycle (N_p = 12) and f_cp is the operating frequency of the specified μC.

The maximum operating frequency of the conventional 8051 μC is f_cp = 24 MHz, and one machine cycle has N_p = 12 clock pulses, so T_m = 12/24 MHz = 0.5 μs.

Similarly, the maximum operating frequency of the developed μC on the Xilinx FPGA is f_cp = 50 MHz, with N_p = 12 clock pulses per machine cycle, so T_m = 12/50 MHz = 0.24 μs.

The total execution time T_exe of one task can be expressed as:

$$T_{exe} = T_m \times M = \frac{M \times N_p}{f_{cp}} = \frac{N_{total}}{f_{cp}} \qquad (2)$$

where M is the number of machine cycles for one specific complete task, and N_total is the total number of clock pulses for that task.

Table 2 Enhanced speedup for modified instructions.
Modified instruction | M, μC 8051 | M, VHDL μC | N_p | N_total = N_p × M, μC 8051 | N_total, VHDL μC | f_cp, μC 8051 | f_cp, VHDL μC | T_exe = N_total/f_cp, μC 8051 (μs) | T_exe, VHDL μC (μs) | Speedup enhanced
MOVBK Adr2, Adr1 | 16 | 2 | 12 | 192 | 24 | 24 MHz | 50 MHz | 8 | 0.48 | 8/0.48 = 16.67
GetMax Adr | 65 | 2 | 12 | 780 | 24 | 24 MHz | 50 MHz | 32.5 | 0.48 | 32.5/0.48 = 67.7
Both | 81 | 4 | 12 | 972 | 48 | 24 MHz | 50 MHz | 40.5 | 0.96 | 40.5/0.96 = 42.2
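Equations (1) and (2) are simple enough to evaluate directly. The Python sketch below (function names hypothetical) computes T_m and T_exe from N_p, f_cp and M, using the machine-cycle counts of Table 1 and the clock rates given above; its output corresponds to the N_total and T_exe columns of Table 2.

```python
def machine_cycle_period(n_p: int, f_cp_hz: float) -> float:
    """Eq. (1): T_m = N_p / f_cp, in seconds."""
    return n_p / f_cp_hz


def execution_time(m: int, n_p: int, f_cp_hz: float) -> float:
    """Eq. (2): T_exe = T_m * M = N_total / f_cp with N_total = M * N_p, in seconds."""
    return machine_cycle_period(n_p, f_cp_hz) * m


N_P = 12
F_8051_HZ, F_FPGA_HZ = 24e6, 50e6
CYCLES = {"MOVBK": (16, 2), "GetMax": (65, 2), "Both": (81, 4)}  # (8051, VHDL uC), Table 1

for task, (m_old, m_new) in CYCLES.items():
    t_old_us = execution_time(m_old, N_P, F_8051_HZ) * 1e6
    t_new_us = execution_time(m_new, N_P, F_FPGA_HZ) * 1e6
    print(f"{task:6s} N_total {m_old * N_P:3d} -> {m_new * N_P:2d}, "
          f"T_exe {t_old_us:.2f} us -> {t_new_us:.2f} us")
```

Keeping N_p as a parameter makes explicit that the speed gain in this comparison comes from the higher clock rate and the smaller M, not from fewer clocks per machine cycle.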
From Amdahl’s law, the overall speedup is [11]:

$$\text{Overall speedup} = \frac{1}{(1 - F) + F/S} \qquad (3)$$

where F is the fraction enhanced (Fraction_enhanced), i.e., the fraction of computation time that can actually take advantage of the enhanced component, and S is the speedup of the enhanced fraction (Speedup_enhanced), i.e., the performance of the enhanced component compared with the original component it replaces. The new execution time (T_exe_new) compared with the old execution time (T_exe_old) can therefore be expressed as:

$$T_{exe\_new} = \frac{T_{exe\_old}}{\text{Overall speedup}} = T_{exe\_old} \times \left[(1 - F) + \frac{F}{S}\right] \qquad (4)$$

For execution of the whole task, Fraction_enhanced = F = 1, so from Eq. (4):

$$\text{Speedup}_{enhanced} = \frac{T_{exe\_old}}{T_{exe\_new}} \qquad (5)$$
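Equations (3) and (4) can be written as two small functions. In this Python sketch (names hypothetical), setting F = 1 gives back Eq. (5), i.e. the overall speedup equals the speedup of the enhanced part. The F = 0.5 case in the usage note is purely illustrative and is not a result from the paper.

```python
def overall_speedup(fraction_enh: float, speedup_enh: float) -> float:
    """Eq. (3): 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enh <= 1.0:
        raise ValueError("F must lie in [0, 1]")
    if speedup_enh <= 0.0:
        raise ValueError("S must be positive")
    return 1.0 / ((1.0 - fraction_enh) + fraction_enh / speedup_enh)


def new_execution_time(t_old: float, fraction_enh: float, speedup_enh: float) -> float:
    """Eq. (4): T_exe_new = T_exe_old * ((1 - F) + F / S)."""
    return t_old / overall_speedup(fraction_enh, speedup_enh)


s_movbk = 8 / 0.48                   # speedup of the enhanced part for MOVBK
print(overall_speedup(1.0, s_movbk))  # whole task enhanced: equals S (Eq. 5)
print(overall_speedup(0.5, s_movbk))  # hypothetical: only half the run time enhanced
```

For example, `overall_speedup(1.0, 8 / 0.48)` returns about 16.67, whereas if only half of a program’s run time were spent in such block moves (a hypothetical F = 0.5), the overall speedup would be only about 1.89.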
The above discussion can be used to compare the execution time of the conventional instructions that move a data block (Fig. 12) with that of the modified instruction (Fig. 14). The total CPU execution time of the conventional μC (T_exe_old) is:

$$T_{exe\_old} = \frac{M \times N_p}{f_{cp}} = \frac{16 \times 12}{24 \times 10^6} = 8\ \mu\text{s}$$

Similarly, the total CPU execution time of the modified μC (T_exe_new) is:

$$T_{exe\_new} = \frac{M \times N_p}{f_{cp}} = \frac{2 \times 12}{50 \times 10^6} = 0.48\ \mu\text{s}$$

Using Eq. (5), the speedup enhancement is:

$$\text{Speedup}_{enhanced} = \frac{T_{exe\_old}}{T_{exe\_new}} = \frac{8}{0.48} = 16.67$$

In the same way, the execution time of the conventional instructions that get the maximum value of a data block (Fig. 16) can be compared with that of the modified instruction (Fig. 18). Here T_exe_old = (65 × 12)/(24 × 10^6) = 32.5 μs, while T_exe_new is again 0.48 μs, so:

$$\text{Speedup}_{enhanced} = \frac{T_{exe\_old}}{T_{exe\_new}} = \frac{32.5}{0.48} = 67.7$$

Finally, all of the above computations are summarized in Table 2.
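As a cross-check of Tables 1 and 2, the following Python sketch recomputes the storage and machine-cycle reductions and the three speedups from the counts in Table 1, using Eq. (2) with N_p = 12 and the 24 MHz and 50 MHz clocks. The names are hypothetical; the results round to 90.9% and 95.1% (stated in the text as more than 90% and about 95%) and to the speedups 16.67, 67.7 and 42.2.

```python
N_P = 12
F_OLD_HZ, F_NEW_HZ = 24e6, 50e6  # conventional 8051 vs. FPGA-based VHDL uC


def t_exe_us(m: int, f_cp_hz: float) -> float:
    """Eq. (2) in microseconds: M * N_p / f_cp."""
    return m * N_P / f_cp_hz * 1e6


# Table 1 counts per task: (bytes 8051, bytes VHDL uC, cycles 8051, cycles VHDL uC)
TABLE1 = {"MOVBK": (24, 5, 16, 2), "GetMax": (86, 5, 65, 2)}
both = tuple(sum(column) for column in zip(*TABLE1.values()))  # (110, 10, 81, 4)

storage_reduction = 1 - both[1] / both[0]
cycle_reduction = 1 - both[3] / both[2]
speedups = {
    task: t_exe_us(c_old, F_OLD_HZ) / t_exe_us(c_new, F_NEW_HZ)  # Eq. (5)
    for task, (_, _, c_old, c_new) in {**TABLE1, "Both": both}.items()
}

print(f"storage -{storage_reduction:.1%}, cycles -{cycle_reduction:.1%}")
for task, s in speedups.items():
    print(f"{task}: speedup {s:.2f}")
```

The Both row is derived by summing the per-task columns rather than typed in, so any change to a single task’s counts propagates to the totals automatically.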
4. Conclusion
This paper presented two innovated macro operations for the well-known Intel CISC MCS-51 μC family, implemented with the FPGA technique. The two modified instructions can serve a wide range of application domains without any conflict with the μC’s main ISA and its characteristics. The first instruction, ‘‘MOVBK Adr2, Adr1’’, was designed to transfer a set of eight data bytes from consecutive μC memory locations starting at ‘‘Adr1’’ to eight other consecutive memory locations starting at ‘‘Adr2’’; the second instruction, ‘‘GetMAX Adr’’, was designed to get the maximum data byte of a set of eight data bytes starting at μC memory location ‘‘Adr’’. The two modified instructions execute in only 4 machine cycles, whereas the equivalent conventional instructions execute in 81 machine cycles; this greatly enhances processing speed and decreases the power consumption of any embedded system using the modified μC. Finally, comparing our two modified macro instructions against their two corresponding assembly programs, we conclude that an embedded system based on the VHDL μC with its modified instructions has many advantages, including shorter syntax, more compact machine code, faster execution, and hence lower total power consumption.
References
[1] Buhus Elena Roxana. A system-on-chip approach in designing a dedicated RISC microcontroller unit using the field-programmable gate array. In: Fifth international conference on systems. IEEE Computer Society; 2010.
[2] <http://www.maxim-ic.com/app-notes/index.mvp/id/2035> [accessed 25.10.11, 1:14:47 PM].
[3] Uday A, Mallikar MS. 8051 Microcontroller: hardware, software and applications. McGraw Publishing; 2009.
[4] Shiva Sajjan G. Computer organization, design, and architecture. Revised and expanded ed. New York: Marcel Dekker; 2000. p. 181–450.
[5] Oshana Robert. DSP software development techniques for embedded and real-time systems. Jordan Hill; 2006.
[6] Ray Joydeep. High-level modeling and FPGA prototyping of microprocessors. IEEE; 2003.
[7] Ashenden Peter, Lewis Jim. The designer’s guide to VHDL. 3rd ed. Elsevier Inc.; 2008.
[8] Chu Pong. FPGA prototyping by VHDL examples. John Wiley & Sons Inc.; 2008.
[9] Cofer RC, Harding Ben. Rapid system prototyping with FPGAs: accelerating the design process. Burlington, USA; 2006. p. 35–150.
[10] <http://www.Batronix.com> [accessed 09.05.09, 14:00].
[11] Null Linda, Lobur Julia. The essentials of computer organization and architecture. 2nd ed. Jones and Bartlett Publishers; 2006.
Abd-Elmoneim Fouda: B.S. degree in Electrical Engineering; M.S. and Ph.D. degrees from the Military Technical College (M.T.C), Egypt. Current position: staff member at the Modern Academy for Engineering and Technology in Cairo. Former member of the satellite communication group in the Army Forces. Former head of the ‘‘Applied Technical Researches’’ branch, Army Forces. Fields of interest and experience: digital communication, satellite communication, embedded systems (microprocessor- and microcontroller-based systems), speech analysis, speech recognition, signal processing, and wireless sensor networks (WSN).

Assem Badr Eldeen Abd-Almagid: B.S. degree in Electrical Engineering from the Military Technical College, Egypt; M.S. and Ph.D. degrees from Al-Azhar University in Cairo. Fields of interest and experience: computer architecture and organization, and embedded systems based on microcontrollers and VHDL.
