




























MIPS16: High-density MIPS for the Embedded Market
Kevin D. KISSELL
Silicon Graphics MIPS Group
kevink@acm.org
 Origins of MIPS® RISC Technology
The invention of RISC, or Reduced Instruction Set Computer technology, has been credited to
several people. Certainly a great deal of the credit must go to John Cocke of IBM’s Yorktown
research labs, where a processor called the 801 (named for the building in which Cocke and his
team worked) pioneered the concept of designing processors and compilers in tandem. In the days
when most applications were coded in assembly language, there was some programmer productivity
to be gained in providing powerful machine instructions that handled complex tasks, such as
single instructions that search memory for a particular value or even solve polynomial equations.
The earliest 16-bit microprocessors, such as the 8086, 68000, Z8000 and NS32016, inherited this
legacy from the minicomputers they were intended to replace.
By the late 1970’s, however, more and more application code was being written in higher-level
languages such as FORTRAN, Pascal, and C. Compilers for these languages rarely if ever used
these complex instructions, so the complexity and silicon area expended to implement them were
wasted. Cocke and his group at IBM conducted their project by designing a compiler, then designing
a CPU that executed only the instructions that the compiler would emit. This resulted in a
simple and very fast CPU. Cocke was later a visitor at the University of California at Berkeley,
where his ideas generated a great deal of excitement, particularly with a group of students working
under Professor David Patterson. They initiated a student project which they called the RISC-I,
contrasting their reduced instruction set computer with the previous tendency to build Complex
Instruction Set Computer (CISC) designs. Many of the students who worked on the Berkeley
RISC-I and RISC-II projects went on to Sun Microsystems, where they formed the core of the
SPARC design team. Many of the design features of the SPARC, such as register “windows”,
came directly from the RISC architecture, and RISC became a general term.
The University of California at Berkeley and Stanford University, both in the San Francisco Bay
Area, are great rivals in many things, from sports to science. Thus, it is hardly surprising that at the
same time as Professor Patterson’s students were designing the RISC processor, there was a similar
project at Stanford, under Professor John Hennessy. This project was called MIPS, which stands
for Microprocessor without Interlocked Pipeline Stages. Again, the principle of hardware/software
co-design was used, in this case by replacing some CPU pipeline control logic with intelligence
in the compiler that would schedule instructions to use the CPU’s resources correctly.
Hennessy and his team were pleased enough with the results that they founded MIPS Computer
Systems in 1984 to commercialize what was then becoming known as RISC technology. MIPS is now
a subsidiary of Silicon Graphics, Inc., and the MIPS architecture is perhaps the most widely known
and used RISC.











The MIPS instruction set has evolved gradually over the 12 years since it was first defined. The
original version is known as MIPS-I, and all MIPS processors are capable of correctly executing
MIPS-I code. MIPS-II added better multiprocessor synchronization instructions. MIPS-III added
support for 64-bit addressing and 64-bit integer data types, as well as a richer set of floating-point
instructions. MIPS-IV added a prefetch capability, as well as further refinements for high performance
floating-point calculation. MIPS-V was announced in October of 1996, and adds support
for SIMD (Single-Instruction, Multiple Data) operations on single precision floating point values.
These last enhancements are quite significant at the high end of the MIPS product line, in
supercomputing applications and for real-time 3D graphics. Most MIPS processors are MIPS-II or
MIPS-III.
 Characteristics of RISC Architecture
There has been, and continues to be, some debate as to exactly what constitutes a “RISC” processor.
It is generally agreed, however, that they can be characterized by having a load-store architecture
and a fixed instruction size, both of which substantially simplify the design. A load/store
machine can only read memory with a load instruction and can only write memory with a store
instruction. They cannot, for example, increment the contents of a memory location with a single
instruction: one must load the value into a register, increment the register, and store the new value
to memory. This need for intermediary storage makes it highly desirable to have a large number
of on-chip registers. A fixed instruction size means that all discrete operations that can be performed
by the CPU must be expressed in the same number of bits. Most RISC architectures are
designed around a 32-bit instruction word. They were developed at a time when memory technology
made 32-bit addresses (thus 32-bit registers, and thus 32-bit memory words) desirable. 32 bits
is sufficient to encode a reasonably rich set of operations, and to give them the ability to access a [...]












[Figure: MIPS instruction formats. Surviving labels: Immediate Value (16 bits); J-Type (Jump)]






























 RISC Code Density
As in most engineering decisions, RISC architecture involves certain trade-offs. RISC designs are 
easier to pipeline, and generally support a higher clock rate and higher performance, but they also 
generally require more instructions to do the same amount of useful work. This translates to a 
high instruction bandwidth requirement, which is usually satisfied by an instruction cache. Moreover,
since all instructions are the same size, a certain amount of both program memory and
instruction bandwidth is wasted for those simple instructions that could in theory be expressed in
fewer bits. Indeed, in CISC architectures, which have a variable instruction size, simple operations
have a compact expression. Thus RISC processors have historically, and by and large correctly,
had a reputation for having relatively poor code density.
For the manufacturers of workstations who were the first adopters of RISC CPUs, this “code
bloat” was a small price to pay for the advantage in performance to be gained. By the late [...],
every major workstation vendor had converted to one of the RISC CPU families. Largely due to
the performance advantage, and in part due to the intrinsically lower cost of a simpler CPU, RISC
also made rapid headway in the real-time sector. But for many embedded applications, RISC was
not an attractive solution. There was not a perceived need for 32-bit capability, and in mass
production applications the disadvantage in code density implied more memory and a more expensive
bill of materials. This situation has now changed.
 Evolution of the Embedded CPU Market
The embedded uses of microprocessors range from simple logic replacement functions to
complex applications that require what once would have been considered to be mainframe
performance. Simple 4- and 8-bit CPUs suffice for many logic replacement functions, but the
increasing capability and complexity of high-end embedded and real-time applications has started
a notable trend toward the use of 16- and 32-bit CPUs. See Figure 2.























Figure 2: Trends in Embedded Microprocessor Technology


























This reflects partly the need for more raw computing power, and partly the need to use more
advanced software tools and methodologies to achieve reliability and reduce the time to market of 
complex systems. The share of 16/32/64-bit processor based designs is expected to reach nearly a 
third of the total by the end of the century. 
Another factor in this evolution is the trend toward “system on a chip” solutions. The level of
integration attainable with today’s semiconductor technology allows most or all of the digital logic
required for an embedded product to be placed on a single chip. For cost-sensitive applications
with sufficiently high production volumes, this is a very attractive alternative. The premium on
silicon “real estate” makes RISC architecture the natural choice for an embeddable processor
core. One MIPS RISC core, the LSI Logic CW4003, measures only 1.8 square millimeters. This
sort of technology makes possible a huge variety of innovative solutions, but the ASIC environment
exacerbates the problems already mentioned with RISC instruction sets: the need for high
instruction bandwidth and the increase in code size.
Memory structures in ASIC processes and methodologies tend to be both slower and less dense
than is readily achieved in dedicated memory chips. This puts downward pressure on the size of
on-chip instruction caches. In addition, highly integrated designs put a premium on I/O interfaces.
A 32-bit memory interface may simply not be practical for a low-cost, highly integrated component,
yet making a 32-bit RISC wait two cycles for each instruction fetch from a 16-bit memory
interface can negatively impact performance.
 MIPS16: An Architecture Extension for Compressed RISC Code
MIPS16 is a solution to the code density and bandwidth issues for MIPS RISC designs. It is classified
by MIPS as an “architecture extension”, meaning that, while MIPS16 support is not mandatory
for all future MIPS implementations, it is the standard mechanism for code compression
across all suppliers of MIPS RISC CPUs. As the name suggests, MIPS16 is a 16-bit instruction
encoding. It has been designed to be 100% compatible with the existing 32-bit (MIPS-I/II) and
64-bit (MIPS-III) architecture and programming model. It will be available in multiple implementations
from multiple vendors of MIPS RISC microprocessors and cores, and supported by multiple
compiler and tool-chain vendors.
MIPS16 maps onto the established MIPS architecture by presenting a subset of the CPU to the
programmer. Each MIPS16 instruction corresponds to exactly one MIPS-III instruction. MIPS16
instructions can be translated into MIPS-III instructions on-the-fly using relatively simple translation
hardware. This can be done serially as a preprocessor between the I-cache and the standard
MIPS instruction decode, or in parallel in a 16/32-bit decoder. The reference model from MIPS
uses the serial model, as it is modular, portable, and easy to verify. Figure 3 shows the flow [...]










 The MIPS16 Design Trade-offs
MIPS16 can be thought of as a reduced reduced instruction set. To reduce the number of instruction
bits by half, one must attack all three components of the instruction word: opcodes, register
numbers, and immediate values.
The first step that was taken was to perform statistical analysis on a number of MIPS binaries
from a variety of applications from embedded, real-time, and workstation environments. This was
to determine the frequency distributions of opcode use, of the number of registers simultaneously
in use, and of the number of significant bits in immediate values. The results showed that, while
the opcode and function code fields could be reduced, and some instructions “thrown away”, the
MIPS instruction set was already very lean. While the base MIPS instruction set has 6 bits of
major opcode field, sometimes modified by a 6-bit function code, MIPS16 reduces the major
opcode field and the function modifier to 5 bits each, and defines a total of 79 instructions, of
which 24 are only required for MIPS-III implementations supporting 64-bit data words.
More leverage was to be had in reducing the size and number of register specifier fields in the
instructions. The analysis showed that, most of the time, compiler-generated code was using 8 or
fewer registers. Restricting MIPS16 to 8 registers allows register specifiers of 3 bits instead of 5.
The R-Format standard MIPS instructions support 3 operands, two inputs and an output. In most
cases, MIPS16 only permits two register specifiers per arithmetic instruction. One of the input
registers must also be used as the result register, overwriting that input.
Perhaps the biggest saving comes from restrictions on the size of immediate values expressed. In
the place of the 16-bit immediate field of the MIPS I-Format instructions, most MIPS16 immediate
fields are restricted to 5 bits. Some are restricted to 3 or 4. An example of the resulting mapping
is given in Figure 4.
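The field reductions described above can be checked with a little arithmetic. The following is a minimal sketch, not a definition of the actual encodings: the 32-bit layouts are the standard MIPS R-Format and I-Format field widths, while the two 16-bit layouts (and the names MIPS16_TWO_REG and MIPS16_REG_IMM) are hypothetical groupings built only from the widths quoted in the text (5-bit opcode, 5-bit function modifier, 3-bit register specifiers, 5-bit immediate).

```python
# Field widths in bits. The 32-bit layouts are the standard MIPS formats.
MIPS32_R_FORMAT = {"opcode": 6, "rs": 5, "rt": 5, "rd": 5, "shamt": 5, "funct": 6}
MIPS32_I_FORMAT = {"opcode": 6, "rs": 5, "rt": 5, "immediate": 16}

# Hypothetical 16-bit layouts assembled from the widths quoted in the text.
# The field names rx and ry are illustrative assumptions.
MIPS16_TWO_REG = {"opcode": 5, "rx": 3, "ry": 3, "funct": 5}
MIPS16_REG_IMM = {"opcode": 5, "rx": 3, "ry": 3, "immediate": 5}


def width(fmt):
    """Total number of bits occupied by an instruction format."""
    return sum(fmt.values())


if __name__ == "__main__":
    for name, fmt in [
        ("MIPS32 R-Format", MIPS32_R_FORMAT),
        ("MIPS32 I-Format", MIPS32_I_FORMAT),
        ("MIPS16 two-register", MIPS16_TWO_REG),
        ("MIPS16 register-immediate", MIPS16_REG_IMM),
    ]:
        print(name, width(fmt), "bits")
```

Under these assumptions, dropping one register specifier, narrowing the other two from 5 bits to 3, and trimming the opcode and function fields is just enough to fit a two-operand instruction into 16 bits.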
 Overcoming the Restrictions
The basic MIPS16 encodings restrict the register name space significantly and the immediate 





• Load/Store Offset Shifts
The EXTEND instruction prefix contains only an opcode and an immediate value. It does not 
generate a MIPS instruction on its own, but instead contributes 11 bits to be concatenated with the 
5 bits of immediate data carried in the following MIPS16 instruction. EXTENDing an instruction 
to 32 bits in this manner allows the same order of immediate value magnitude as is available in
the native MIPS instruction set.
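The concatenation performed by EXTEND can be sketched as follows. This is an illustration under a stated assumption: the 11 bits carried by the prefix are treated as the high-order bits of the result, in order. The text does not give the actual bit placement, and the real encoding may arrange the bits differently. The function names are hypothetical.

```python
def extended_immediate(extend_bits, inline_bits):
    """Join 11 bits from an EXTEND prefix with 5 bits from the next instruction.

    Assumption: the EXTEND bits form the upper part of the 16-bit result.
    """
    if not 0 <= extend_bits < (1 << 11):
        raise ValueError("EXTEND carries exactly 11 bits")
    if not 0 <= inline_bits < (1 << 5):
        raise ValueError("the basic immediate field carries 5 bits")
    return (extend_bits << 5) | inline_bits


def sign_extend16(value):
    """Interpret a 16-bit pattern as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value


if __name__ == "__main__":
    print(hex(extended_immediate(0x7FF, 0x1F)))
    print(sign_extend16(extended_immediate(0x7FF, 0x1F)))
```

The cost of the wider immediate is that the EXTENDed instruction occupies 32 bits, so it saves no space over the standard MIPS encoding.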
For immediate values greater than 16 bits, the standard MIPS instruction set uses the “Load Upper
Immediate” (LUI) instruction to load the upper 16 bits of a register, which can be followed with
an “Or” of an immediate value into the lower 16 bits. MIPS16 does not support the LUI instruction,
but instead introduces program counter relative addressing.
To exploit PC-relative addressing, 32-bit, or even 64-bit, constants can be embedded in the [...]





































within the nearby routines can reference this data with a single instruction. Even with the overhead
of the constant storage in the code space, this is more compact than the two 32-bit instructions
required by the basic MIPS instruction set. It is also possible to do arithmetic using the value
of the program counter as an input, which is useful for manipulating strings and data structures
embedded in the code space.
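The size comparison above can be worked through numerically. The sketch below assumes a 32-bit constant, an unextended 16-bit PC-relative load, one stored copy of the constant shared by every use within reach, and no alignment padding. All function names are hypothetical.

```python
INSTR32_BYTES = 4  # one standard MIPS instruction
INSTR16_BYTES = 2  # one unextended MIPS16 instruction
CONST32_BYTES = 4  # one 32-bit constant stored in the code space


def split_for_lui_or(value):
    """Split a 32-bit constant into the two 16-bit immediates used by
    Load Upper Immediate and the following Or-immediate."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def rebuild(upper, lower):
    """Recombine the two halves as the two-instruction sequence would."""
    return ((upper << 16) | lower) & 0xFFFFFFFF


def standard_mips_bytes(uses):
    """Each use of the constant costs two 32-bit instructions."""
    return uses * 2 * INSTR32_BYTES


def mips16_bytes(uses):
    """Each use costs one 16-bit PC-relative load; the constant is stored once."""
    return uses * INSTR16_BYTES + CONST32_BYTES


if __name__ == "__main__":
    upper, lower = split_for_lui_or(0x12345678)
    print(hex(upper), hex(lower), hex(rebuild(upper, lower)))
    for uses in (1, 3):
        print(uses, standard_mips_bytes(uses), mips16_bytes(uses))
```

With a single use the comparison is 8 bytes against 6. The advantage grows when several nearby instructions reference the same stored constant.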
The stack pointer is another special register in MIPS16. In the base MIPS instruction set, there is 
no hardware-designated stack pointer. The program stack pointer is by convention maintained in 
one of the general purpose registers, $29. MIPS16 refers to it implicitly through special opcodes 
in order to conserve precious register name space. Loads and stores may be done relative to the 
stack pointer and the program counter with 8 bits of immediate offset rather than the usual 5, since 
the base register is implicit in the opcode.
One further mechanism for getting the maximum advantage from the limited immediate value 
range available to MIPS16 code is the promotion of load-store offsets to aligned values. Loads 
from unaligned addresses are permitted in the base MIPS instruction set, and the immediate value 
associated with a load or store instruction is simply added to the contents of the base register to 
derive the effective address. In MIPS16, load and store offsets are shifted left until they are 
aligned to the data type being loaded or stored. In the case of memory operations on 32-bit words, 
the shift is two bit positions. In the case of 16-bit “halfwords”, the value is shifted by one bit.
Taken together with the PC or SP relative addressing modes, this makes possible the direct
addressing of relatively large amounts of data despite the constraints on immediate offsets. Word
loads relative to the stack pointer can span a range of 1K bytes of memory without the need to
EXTEND the instruction. If EXTEND is used, this shifting feature is unneeded and is disabled.
Thus unaligned accesses can still be generated if necessary by using the EXTEND prefix.
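The offset scaling described above can be sketched as follows. The code treats offset fields as unsigned and covers only byte, halfword, and word accesses; both are simplifying assumptions, and the function names are hypothetical.

```python
# Shift applied to an unextended offset, keyed by access size in bytes.
ALIGN_SHIFT = {1: 0, 2: 1, 4: 2}


def effective_address(base, offset_field, access_bytes, extended=False):
    """Compute a load/store address from a base register value and an offset field.

    Unextended offsets are shifted left to the alignment of the data type.
    With EXTEND the shift is disabled and the offset is used as given.
    """
    if extended:
        return base + offset_field
    return base + (offset_field << ALIGN_SHIFT[access_bytes])


def reach_in_bytes(offset_bits, access_bytes):
    """Span of memory addressable by an unextended offset field."""
    return (1 << offset_bits) * access_bytes


if __name__ == "__main__":
    # 8-bit offsets on word loads relative to the stack pointer
    print(reach_in_bytes(8, 4))
    # the usual 5-bit offset on word loads
    print(reach_in_bytes(5, 4))
    print(hex(effective_address(0x1000, 255, 4)))
```

The first figure reproduces the 1K byte range quoted for stack-pointer-relative word loads: 256 offset values, each scaled by 4 bytes.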
 Switching Between MIPS16 and 32-bit Modes
The MIPS16 architecture provides for the efficient run-time switching between compressed and
32-bit modes of operation through the JALX, or Jump And Link with eXchange, instruction. It
is like the MIPS-I JAL instruction in that it transfers control to the specified address while saving
the address of the instruction logically following the jump, the return address, in a link register.
The JAL is the MIPS mechanism for subroutine calling. The JALX extends the semantics by
toggling the state of the instruction decode logic between 32-bit MIPS mode and MIPS16 mode.
A 32-bit subroutine can call a 16-bit subroutine, and vice versa. The previous state is merged with
the return address, and restored automatically on return from the subroutine.
Similarly, on traps to the operating system, the instruction mode is merged with the exception program
counter, and is restored on return from the exception. Most operating system kernels will
require no modification to support MIPS16 CPUs and MIPS16 binaries.
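The mode exchange can be modelled in a few lines. The sketch below assumes that the mode is recorded in the least significant bit of the saved address, which is otherwise unused because instructions are at least 2-byte aligned. The text says only that the mode is "merged" with the return address, so the bit position, the mode constants, and the function names are assumptions made for illustration.

```python
MIPS32_MODE = 0
MIPS16_MODE = 1


def jalx(return_address, mode, target):
    """Model a Jump And Link with eXchange.

    Returns (new_pc, new_mode, link), where link is the return address
    with the caller's mode merged into bit 0 (an assumed encoding).
    """
    link = (return_address & ~1) | mode
    return target, mode ^ 1, link


def jump_to_link(link):
    """Model the subroutine return: recover both the address and the caller's mode."""
    return link & ~1, link & 1


if __name__ == "__main__":
    pc, mode, link = jalx(0x1008, MIPS32_MODE, 0x2000)
    print(hex(pc), mode, hex(link))
    print(jump_to_link(link))
```

The same merge-and-restore pattern applies to the exception program counter on traps, which is why a kernel that simply saves and restores that register needs no knowledge of the mode.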
 Which Registers Are Visible
Only 8 of the 32 general MIPS registers are normally visible to MIPS16 code. The choice of these 
registers was made so as to preserve the standard MIPS calling conventions used by compilers. 
This minimizes the work necessary for compiler writers to support MIPS16. The mapping can be 
seen in Figure 5.
The 24 general MIPS registers that are not directly visible to the MIPS16 instruction stream can 
nevertheless be accessed using the MOV32R and MOVR32 instructions, which copy between the 
specified MIPS general register and the specified MIPS16 register.
The MIPS16 instruction set specifically excludes coprocessor instructions, including floating
point instructions and those which reference the “system” coprocessor. These instructions must be [...]













MIPS Register    MIPS Calling Convention    MIPS16 Special Name
$16              s0
$17              s1
$2               v0
$3               v1
$4               a0
$5               a1
$6               a2
$7               a3
$29              sp                         SP
$24                                         T
Figure 5. MIPS16 Register Mapping
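The register mapping of Figure 5 can be written down as a lookup table. In the sketch below, the MIPS16 register numbers 0 through 7 are an assumed numbering that follows the order in which the figure lists the registers. The names VISIBLE, ABI_NAMES, and hidden_registers are hypothetical.

```python
# Assumed MIPS16 register number -> MIPS general register number.
VISIBLE = {0: 16, 1: 17, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7}

# Calling-convention names as given in Figure 5.
ABI_NAMES = {16: "s0", 17: "s1", 2: "v0", 3: "v1",
             4: "a0", 5: "a1", 6: "a2", 7: "a3", 29: "sp"}


def hidden_registers():
    """MIPS general registers that MIPS16 code cannot name directly."""
    return sorted(set(range(32)) - set(VISIBLE.values()))


if __name__ == "__main__":
    for m16, m32 in sorted(VISIBLE.items()):
        print(m16, "->", "$%d" % m32, ABI_NAMES[m32])
    print(len(hidden_registers()), "registers not directly visible")
```

The count of hidden registers matches the 24 quoted in the text. It includes the stack pointer $29, which MIPS16 reaches only implicitly through special opcodes.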















 Handling of Conditional Execution
The MIPS instruction set has no designated condition code bits or condition registers. The execution
of conditional branches is determined by the state of general registers. Since general registers
are a scarce resource in the compressed instruction set, MIPS16 has a more restricted repertoire
for conditional branching. Branches may always be made conditionally on any MIPS16 register
being equal to or not equal to zero. In addition, one of the MIPS general registers ($24) serves as
a special condition register, called “T”, for handling inequalities. “Set-on-less-than” instructions,
which in the base MIPS instruction set may specify any general register to hold the result, implicitly
use the T register as the destination. The base MIPS instruction set has no “compare” instruction
as such. Programmers are expected to perform subtraction explicitly and test for a zero result.
In MIPS16, the shortage of available registers makes a non-destructive compare more desirable,
and a non-destructive compare-immediate instruction is provided, which maps into an exclusive-or
of the specified register with an 8-bit immediate field, again using the T register as the implicit
destination. The BTEQZ and BTNEZ instructions then allow branches based on the zero or nonzero
state of the T register.
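The role of the T register can be illustrated with a toy register file. This is a sketch of the behaviour described above, not of any instruction encoding: registers are plain Python integers, signedness is ignored, and the function names set_on_less_than and compare_immediate are hypothetical. Only BTEQZ and BTNEZ are names taken from the text.

```python
T = 24  # the MIPS general register that serves as the condition register


def set_on_less_than(regs, rx, ry):
    """Write 1 to T if regs[rx] < regs[ry], otherwise 0."""
    regs[T] = 1 if regs[rx] < regs[ry] else 0


def compare_immediate(regs, rx, imm8):
    """Non-destructive compare: T receives rx XOR immediate, rx is untouched.

    T is zero exactly when the register equals the immediate.
    """
    regs[T] = regs[rx] ^ (imm8 & 0xFF)


def bteqz(regs):
    """Branch condition: taken when T is zero."""
    return regs[T] == 0


def btnez(regs):
    """Branch condition: taken when T is nonzero."""
    return regs[T] != 0


if __name__ == "__main__":
    regs = {2: 5, 3: 9, T: 0}
    set_on_less_than(regs, 2, 3)
    print("branch if less:", btnez(regs))
    compare_immediate(regs, 2, 5)
    print("branch if equal to 5:", bteqz(regs), "register still", regs[2])
```

Because the result always lands in T, neither operation consumes one of the eight visible registers or overwrites its input.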
 Effectiveness of MIPS16
MIPS16 instructions are, except when EXTENDed, half the size of their standard MIPS counterparts,
but also somewhat less expressive. The careful design of the compressed instruction set has
minimized the impact of this loss of expression. More instructions are required to perform some
operations, but with compilers optimized for MIPS16, a net code saving of 40% has been
achieved across a range of embedded and desktop codes. This makes MIPS16 code denser
than any conventional RISC or CISC instruction set.
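The relationship between the quoted saving and the growth in instruction count can be estimated with one line of arithmetic. The sketch below assumes every MIPS16 instruction is exactly half the size of a standard one, which ignores EXTENDed instructions and constants stored in the code space, so the result is only a rough bound. The function name is hypothetical.

```python
def instruction_count_ratio(code_saving, size_ratio=0.5):
    """Ratio of MIPS16 to standard MIPS instruction counts implied by a net
    code-size saving, if every instruction shrinks by size_ratio."""
    return (1.0 - code_saving) / size_ratio


if __name__ == "__main__":
    ratio = instruction_count_ratio(0.40)
    print("instruction count grows by about %.0f%%" % ((ratio - 1.0) * 100))
```

Under that simplification, a 40% net saving corresponds to roughly one fifth more instructions than the 32-bit version of the same program.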
The performance effects of MIPS16 are complex and variable. While the larger absolute number
of instructions can have a negative impact, the higher code density improves the hit ratio of the
instruction cache and reduces the off-chip instruction bandwidth requirements, which can,
depending on the application, more than make up for the increased instruction count.
 Conclusion
MIPS16 is a very efficient code compression mechanism that preserves architectural and binary
compatibility with the long-established MIPS RISC architecture. While other high-density
instruction sets have been proposed, no other scheme provides for 64-bit data or 16-bit
EXTENDed immediate fields.
 Acknowledgments
I would like to thank Earl Killian at MIPS and Hartvig Ekner of LSI Logic for defining the 
MIPS16 instruction set architecture and for their assistance in preparing this paper.
 References
For more complete information on MIPS16 see http://www.mips.com/arch/MIPS16
