A method for dynamically interleaving multiple MPEG-1 video streams by Diehl, William B., III
A METHOD FOR DYNAMICALLY INTERLEAVING 
MULTIPLE MPEG-1 VIDEO STREAMS 
BY 
WILLIAM B. DIEHL III 
B.S., University of Illinois, 1994
THESIS 
Submitted in partial fulfillment of the requirements 
for the degree of Master of Science in Electrical Engineering 
in the Graduate College of the 
University of Illinois at Urbana-Champaign, 1997 
Urbana, Illinois 
ACKNOWLEDGEMENTS 
The growth of this thesis would have been seriously stunted if not for the help of many 
people along the way. This group includes people who made large contributions such as the 
loan of test equipment, and individuals responsible for small contributions such as samples of 
inexpensive, yet invaluable parts. 
Specifically, my thanks go out to Professor Ricardo Uribe, my thesis advisor, who has 
given me all the support that I have ever requested. I would like to thank Jonathan Greenlaw 
for the use of his lab equipment and the helpful suggestions that I received each time that I 
found myself backed up against a wall. Wendy Arnott Bishop, whose last-minute collaboration 
in the design process helped me to maintain my sanity, receives my undying gratitude. I would 
also like to thank Darren Neuman for his willingness to share his wealth of MPEG knowledge 
with me. Finally, I thank SGS-Thomson Microelectronics, Inc., for their gracious donation of an 
MPEG decoder chip, and my family for always believing that I might, one day, graduate. 
TABLE OF CONTENTS 
1. INTRODUCTION 
1.1 Background 
1.2 Thesis Impetus 
1.3 Thesis Overview 
1.4 Thesis Roadmap 
2. INTERLEAVE ALGORITHM 
2.1 MPEG Background 
2.2 Implementation of Interleave Scheme 
3. SYSTEM DESIGN 
3.1 System Overview 
3.2 Bus Interface Unit 
3.3 Hardware Interleave 
3.4 Software Interleave 
4. MPEG PICTURE DECODER 
4.1 Overview 
4.2 Performance Calculations 
5. FUTURE DIRECTIONS 
5.1 System Overview 
5.2 MPEG Feeder Subsystem 
5.2.1 Serial input 
5.2.2 Parallel input 
5.3 MPEG Decoder Block 
5.4 Video Subsystem 
6. CONCLUSIONS 
APPENDIX A. ISA BUS INTERFACE 
APPENDIX B. HARDWARE INTERLEAVE DESIGN 
APPENDIX C. PATTERN DETECTOR 
APPENDIX D. SOFTWARE INTERLEAVE DESIGN 
REFERENCES 
REFERENCES NOT CITED 
1. INTRODUCTION
1.1 Background 
Originally, users interfaced to a computer using only a command line interface. The 
computer was used only for its raw, number-crunching ability. Thus, a low resolution monitor 
was perfectly adequate because data were exchanged between user and computer mainly with 
alphanumeric characters, although some bit-mapped graphics were available. At these 
resolutions, it was acceptable to require the computer's CPU (Central Processing Unit) to 
handle the mapping of pixels into the video memory. 
Soon people found graphics much more valuable. Not only could data be better 
represented by graphs and charts, but a more user-friendly interface to the computer could be 
created. Several independent command line interfaces could be run concurrently within 
separate windows, and icons could be used to represent abstract concepts such as directories in 
a manner more comprehensible to new users. 
These graphical entities quickly cluttered the screen and a substantially larger amount of 
data transfer was required to maintain and update the display. Higher resolutions were deemed 
necessary, as well as a more efficient method of displaying graphics. Maintenance of the 
display consumed such a large portion of CPU processing time that video processing came into 
being. Instead of sending raw pixel data to the display adapter, the CPU could transmit simple 
instructions to a video processor in order to perform functions such as drawing lines or 
windows on the display. This advancement greatly reduced the burden upon the CPU. 
As the computer evolved, it began to take on tasks more diverse than mathematical 
calculations and word processing. It became a device for playing games and presenting 
instructional material in a new, intriguing manner. Audio and video capabilities became an 
integral part of the computer. These media, inherently analog in nature, can consume an 
exorbitant amount of digital storage space if sampled at a reasonable rate. 
In an attempt to reduce the storage space requirements, algorithms have been developed 
to compress the digital data. Some efforts, such as the MPEG (Moving Picture Experts Group) 
compression standard, require several years of intense work to ensure that an efficient technique 
results. Despite all of the effort put into developing the best hardware and software possible for 
increasing the speed of multimedia applications, the demand for higher performance is ever 
present. 
1.2 Thesis Impetus 
This thesis addresses one of the many facets of the current demand: enhanced video. 
As the multimedia market grows more sophisticated, the use of multiple video streams becomes 
less of a novelty and more of a necessity. The uses range from texture mapping in video games 
to the concurrent display of several sequences to monitor a number of security cameras. A 
method is proposed that can increase performance without changing the compression algorithm, 
and requires only modest enhancements to the hardware currently available. The system 
proposed is capable of dynamically demultiplexing and decoding multiple MPEG-1 video 
streams in real time. 
The ability to dynamically interleave different video streams can eliminate some 
inconvenience and save digital storage space. Using the equipment currently available, one 
would need to encode a separate, interleaved video sequence for each combination of video 
streams. This intermediate step can be avoided using the method proposed here. A decoder 
system capable of dynamically interleaving video would make the process transparent to the 
user. 
1.3 Thesis Overview 
An algorithm is presented which is capable of interleaving MPEG-1 video streams in a 
manner that is decipherable at the decoder. No modification of the MPEG protocol is required. 
Instead, additional information is added to the data stream using the user data construct 
provided by the compression standard. 
The algorithm is illustrated in two different implementations as a peripheral of a 
personal computer. In a completely hardware oriented design, the host CPU has only the 
burden of sending the raw, encoded video to the board. The decoding hardware determines 
picture boundaries and communicates with the system delivering the data via interrupts to 
request a context switch. The resulting system has a very complicated front-end because of the 
necessary stream parsing circuitry and the need to preserve machine state between different 
streams. 
It is possible, though, to place more of the work upon the host CPU. Rather than simply 
transferring the video stream from system memory to the decoder board, the host processor 
performs the stream interleave as it sends the data. The outgoing bit-streams are searched for 
particular bytes that indicate picture boundaries, and the stream is separated into pictures from 
alternating streams without hardware assistance. 
1.4 Thesis Roadmap 
Utilizing the algorithm presented in this thesis, a context switching MPEG-1 video 
decoder could be constructed using a single MPEG decoder chip. The only requirement for the 
decoder chosen is that it can be controlled on each frame it decodes. A system built upon the 
proposed algorithm and single decoder chip would be cost-effective and could provide the 
benefits described above. The basic design of such a system is outlined after the discussion of 
the interleave algorithm. 
2. INTERLEAVE ALGORITHM
2.1 MPEG Background 
The MPEG-1 video bit-stream is organized into a hierarchy of data. The entire movie is 
termed a sequence and forms the highest level of the hierarchy. A sequence can be broken into 
five lower-level components, each of which encompasses all subsequent data in the video 
stream that occupy a lower tier of the hierarchy. These components are named group of 
pictures (GOP), picture, slice, macroblock, and block. The relationship between these 
components is shown in Figure 2.1. 
[Figure 2.1: nested diagram. A Sequence contains one or more GOPs and closes with End of 
Sequence; each GOP contains Pictures; each Picture contains Slices; each Slice contains 
Macroblocks; each Macroblock contains six Blocks, numbered 0 through 5.] 
Figure 2.1 MPEG-1 Bit-stream Hierarchy 
Within the sequence, each sub-component can be repeated within its parent as shown by 
the ellipsis. The only real restrictions on the number of repetitions are imposed upon the slice 
and the block. There can be only 175 slice divisions per picture and six blocks per macroblock. 
The other components can be repeated unconditionally, assuming that the sequence has not 
been defined as a constrained bit-stream. 
2.2 Implementation of Interleave Scheme 
The problem posed is how to separate the data from multiple, distinct video streams 
when they may become overlapped during any point of image display. A method was needed to 
maintain the separation between the data streams without corrupting the individual streams. 
The method employed by this decoder system is to add an additional layer of encapsulation, as 
well as to use the available user data section provided within a sequence header. 
The data distribution program that runs on the host computer breaks each data stream 
into a number of data blocks and interleaves blocks from alternating streams. Each of these 
blocks is prefaced with an identifying code (Stream ID) which uniquely specifies the source of 
the video stream. In a design that divides the work more evenly between the host CPU and the 
decoder board, the data blocks are broken into segments delineated by context switching points 
before they are transferred to the board. 
In a hardware intensive design, though, an interrupt is generated by the decoder board 
when an appropriate switching point is encountered to notify the host computer that it should 
transmit a new block of data; it may be a new video stream if one is available. Every video 
stream needs its own area of buffer memory in the hardware design because of the finite delay 
between the detection of a switch point and the time at which the host CPU discontinues the 
transfer of the current stream. If these intermediate data were discarded, the video stream could 
become corrupted. The manner in which the data blocks are organized is independent of 
system implementation choices and can be seen in Figure 2.2. 
[Figure 2.2: a column of repeated transfer blocks. Each block is labeled with a Stream ID, a 
Raw MPEG Bitstream, and a SEQ/GOP/PIC Header.] 
Figure 2.2 Structure of Data Transfer Blocks 
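
The framing of Figure 2.2 can be sketched in C as follows. This is a minimal illustration, not 
the code of Appendix D: the one-byte width of the Stream ID and the name frame_block are 
assumptions, since the text does not fix the size of the identifier. 

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical framing: a one-byte Stream ID followed by the raw bytes of
 * one context-switch segment. The ID width is an assumption of this sketch.
 * Returns the number of bytes placed in out, or 0 if out is too small. */
size_t frame_block(uint8_t stream_id, const uint8_t *payload, size_t len,
                   uint8_t *out, size_t out_cap)
{
    if (out_cap == 0 || len > out_cap - 1)
        return 0;                   /* not enough room: nothing written */
    out[0] = stream_id;             /* identifies the source stream */
    memcpy(out + 1, payload, len);  /* segment bytes follow unchanged */
    return len + 1;
}
```

The payload is copied untouched, so the individual streams are not corrupted by the extra 
layer of encapsulation. 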
Once multiple data streams have been detected, parsing circuitry is enabled. The data 
streams are processed before being sent to the core MPEG decoder chip so that the bit-streams 
are interleaved at picture boundaries. It is vital that the compressed data given to the decoder 
chip contain a regularly alternating pattern of pictures so that the display rate of each video 
sequence appears continuous. If the pictures decoded did not toggle from one stream to another 
on each boundary, one display might appear to visibly jump between images. 
The key to the algorithm is the data embedded within the user data section of the 
sequence header. Because any video sequence can have multiple sequence headers in order to 
redefine quantization values, the appearance of a sequence header does not necessarily 
announce a new stream. If, however, the decoder system reads a different stream identification 
number within the user data section of the sequence header, then the current data are known to 
belong to a different stream. After a new sequence has been detected, compressed data are 
assumed to be interleaved at picture boundaries as they are provided to the core decoder chip. 
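
A hedged C sketch of carrying a stream identifier in user data follows. The start code 
00 00 01 B2 is the MPEG-1 video user_data_start_code; the one-byte identifier and the names 
put_stream_id and get_stream_id are assumptions made for illustration only. 

```c
#include <stddef.h>
#include <stdint.h>

#define USER_DATA_START_CODE 0xB2u  /* fourth byte of 00 00 01 B2 */

/* Writes a user-data segment carrying a one-byte stream ID (assumed width).
 * Returns the number of bytes written, or 0 if out is too small or the ID
 * is zero. A zero ID is rejected in this sketch so that the payload never
 * lengthens a run of zero bytes ahead of the next start code. */
size_t put_stream_id(uint8_t id, uint8_t *out, size_t cap)
{
    if (cap < 5 || id == 0)
        return 0;
    out[0] = 0x00;
    out[1] = 0x00;
    out[2] = 0x01;
    out[3] = USER_DATA_START_CODE;
    out[4] = id;
    return 5;
}

/* Returns the stream ID found in the first user-data segment of buf,
 * or -1 if no user-data start code with a following byte is present. */
int get_stream_id(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 4 < len; i++) {
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01 &&
            buf[i + 3] == USER_DATA_START_CODE)
            return buf[i + 4];
    }
    return -1;
}
```

A decoder front-end could compare the value returned by get_stream_id against the identifier 
of the stream currently being decoded to decide whether a sequence header announces a new 
stream or merely redefines parameters of the current one. 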
The introduction of interleaved data into the decoder chip increases the amount of 
attention required for the actual decoding of the MPEG streams. The decoding parameters used 
by the MPEG chip must be switched for each picture decoded. This function is performed by a 
dedicated microcontroller whose sole job is to prepare the decoder chip for each picture that it 
must decode. A small bank of memory exists simply for the temporary storage of the 
parameters associated with each video stream. In the event that the system does not adhere to
the strict alternation of pictures from each stream, inappropriate values will be used in the 
decoding of at least one picture. Such an error could corrupt the remaining frames of all video 
streams. 
The only time that more than a single picture can be sent to be decoded before toggling 
to the other stream occurs when either a sequence header or group of pictures (GOP) header 
appears in the stream. In these instances, the higher level headers are part of the data packet 
sent to the decoder chip. Two possible scenarios are illustrated in Figure 2.3 below. 
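[Figure 2.3: two example orderings of interleaved transfer blocks. Pictures from stream 1 and 
stream 2 alternate; where a Sequence 2 header and a GOP 2 header occur, they travel in the same 
block as the Picture 2 that follows them.] 
Figure 2.3 Multiple Video Streams Interleaved 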
The algorithm can be implemented with the small state machine illustrated in Figure 
2.4. Each box represents the state of the machine between clock pulses. Transitions between 
the different states occur only on clock edges. It is important to note that the conditional 
transitions always occur in pairs. If the binary value of the incoming byte does not fit into the 
sequence of a start code, the machine is reset to the idle state and the start code search begins 
anew with the next byte. 
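[Figure 2.4: state diagram. The search states IDLE, FIRST '0', SECOND '0', and FOUND START CODE 
advance on the bytes '00H', '00H', and '01H'; any other byte returns the machine to IDLE. From 
FOUND START CODE, the bytes 'B3H', 'B7H', 'B8H', and '00H' lead to SEQUENCE HEAD, SEQUENCE END, 
GOP-1/GOP-2, and PICTURE-1/PICTURE-2, respectively; any other byte returns the machine to IDLE.] 
Figure 2.4 Data Interleave State Diagram 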
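
The start-code search of Figure 2.4 can be sketched in C as a byte-at-a-time function. This is 
an illustrative software rendering, not the AHDL of Appendix B; the type and function names are 
assumptions. The start-code values (B3 sequence header, B7 sequence end, B8 group of pictures, 
00 picture) are those of the MPEG-1 video syntax. 

```c
#include <stdint.h>

typedef enum { IDLE, FIRST_ZERO, SECOND_ZERO, FOUND_START_CODE } scan_state;
typedef enum {
    HDR_NONE, HDR_SEQUENCE, HDR_SEQUENCE_END, HDR_GOP, HDR_PICTURE
} header_kind;

/* Advances the search by one byte. Returns the kind of header when the byte
 * completes a start code of interest, otherwise HDR_NONE. As in the figure,
 * any byte that does not fit the pattern sends the machine back to IDLE. */
header_kind scan_byte(scan_state *s, uint8_t byte)
{
    switch (*s) {
    case IDLE:
        *s = (byte == 0x00) ? FIRST_ZERO : IDLE;
        return HDR_NONE;
    case FIRST_ZERO:
        *s = (byte == 0x00) ? SECOND_ZERO : IDLE;
        return HDR_NONE;
    case SECOND_ZERO:
        *s = (byte == 0x01) ? FOUND_START_CODE : IDLE;
        return HDR_NONE;
    case FOUND_START_CODE:
        *s = IDLE;                  /* the search always restarts here */
        switch (byte) {
        case 0xB3: return HDR_SEQUENCE;
        case 0xB7: return HDR_SEQUENCE_END;
        case 0xB8: return HDR_GOP;
        case 0x00: return HDR_PICTURE;
        default:   return HDR_NONE; /* slice or other start code */
        }
    }
    return HDR_NONE;
}
```

Because the sketch follows the figure literally, a third zero byte ahead of the '01H' resets 
the search. A variant that remains in SECOND_ZERO on an extra zero byte would tolerate zero 
stuffing before a start code, but that would be a departure from the diagram. 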
The first four states of the diagram above, IDLE, FIRST '0', SECOND '0', and FOUND 
START CODE, are merely placeholders in the search for a start code. Each of the final four 
states of the machine performs a function which can potentially modify the bit-stream. The 
behavior of these states is defined in Table 2.1. 
Table 2.1 State Definitions 

State Name      Function                                          Skip Mark 
--------------  ------------------------------------------------  ---------- 
SEQUENCE HEAD   Mark byte preceding header for context switch.    Set to '1' 
                Stuff Stream ID into User Data section. 
SEQUENCE END    If hardware implementation:                       Unaffected 
                  Notify system of sequence end. 
                If software implementation: 
                  No action taken. 
GOP-1           If Skip Mark = 1:                                 Unaffected 
                  Mark byte preceding header for switch. 
                Else: 
                  No action taken. 
GOP-2           No action other than setting Skip Mark.           Set to '1' 
PICTURE-1       If Skip Mark = 1:                                 Unaffected 
                  Mark byte preceding header for switch. 
                Else: 
                  No action taken. 
PICTURE-2       No action other than setting Skip Mark.           Set to '0' 

Three of the states perform a byte marking function. The specifics of this function 
differ between the hardware intensive design and the split hardware/software design. In the 
purely hardware system, the byte preceding the five header bytes is literally marked in some 
manner (e.g., using a 9-bit FIFO the extra bit can be employed as a context switch indicator 
rather than parity) to notify the remainder of the decoder system that another stream should be 
read in after the marked byte. An interrupt is also sent to the host CPU to indicate that the data 
transferred should now come from an alternate stream. In the split design, marking a byte 
simply involves noting the new header and switching to the next stream; no bytes are actually 
flagged. 
Several of the states modify the Skip Mark variable. This is the mechanism used to 
ensure that back-to-back sequence headers, GOPs, and picture headers are not split up, but 
remain within the same transfer block as depicted in Figure 2.3. In order to make the state 
machine as portable as possible, the GOP and PICTURE states were each split into two 
separate components. This allows predictable behavior in synthesized hardware designs. Using 
this structure there is no race condition between setting the SKIP MARK and testing its current 
value. 
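
The grouping rule of Figure 2.3 can be sketched in C. This is one possible reading, not the 
synthesized design: the flag named joined stands in for the role the text gives to Skip Mark, 
and both its name and its polarity (true means "do not switch before the next header") are 
assumptions of this sketch rather than values taken from Table 2.1. 

```c
#include <stdbool.h>

typedef enum { LVL_SEQUENCE, LVL_GOP, LVL_PICTURE } header_level;

/* joined is true while the most recent header was a sequence or GOP header,
 * that is, while the next GOP or picture header must stay in the same block. */
typedef struct { bool joined; } grouping;

/* Returns true if a context-switch point belongs before this header.
 * The decision is taken first and the flag updated afterwards, mirroring
 * the split of the GOP and PICTURE states into test and set halves. */
bool switch_before(grouping *g, header_level h)
{
    bool cut;
    switch (h) {
    case LVL_SEQUENCE:
        cut = true;          /* a sequence header always opens a block */
        g->joined = true;
        break;
    case LVL_GOP:
        cut = !g->joined;    /* stays attached to a preceding sequence header */
        g->joined = true;
        break;
    default: /* LVL_PICTURE */
        cut = !g->joined;    /* stays attached to a preceding GOP header */
        g->joined = false;
        break;
    }
    return cut;
}
```

For the header order sequence, GOP, picture, picture, GOP, picture, the function asks for a 
switch before the first, fourth, and fifth headers only, so each block carries exactly one 
picture together with any higher-level headers that introduce it. 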
SEQUENCE END is unlike the other states in that its function differs between 
implementations. Because the hardware/software design has no individual stream buffers, the 
last header in a sequence is simply appended to the preceding picture and treated as additional 
data. When there are separate buffers for each stream, though, this header can be used to 
inform the system that those resources can be freed for new streams because the current stream 
has terminated. 
3. SYSTEM DESIGN
3.1 System Overview 
This thesis was approached from two different angles. The first design implemented the 
interleave algorithm in hardware. The second implementation utilized software to interleave 
multiple MPEG-1 video streams into a single, unified bit-stream. Both prototype designs were 
built as peripheral boards on a personal computer (PC). In order to provide a means for the 
CPU to communicate with the decoder board, a bus interface unit (BIU) was designed to 
intercept data sent over the ISA bus. Both implementations of the interleave scheme and the 
design of the BIU are discussed in-depth below. 
3.2 Bus Interface Unit 
The decoder system interfaces to the PC through the ISA bus. All sixteen bits of the 
ISA bus were used in order to employ the largest possible transfer bandwidth. Because MPEG-1 
video was targeted at a bit-rate of 1.5 Mbits/s, a transfer rate of only 200 kHz using sixteen 
bits provides 3.2 Mbits/s, which is more than sufficient to support two bit-streams that conform 
to the original recommendations. 
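
The bandwidth arithmetic can be checked with a few lines of C. The function names are 
illustrative assumptions; the figures are those quoted in the paragraph above. 

```c
#include <stdint.h>

/* Raw transfer bandwidth, in bits per second, of a parallel bus. */
uint32_t bus_bits_per_second(uint32_t transfers_per_second,
                             uint32_t bus_width_bits)
{
    return transfers_per_second * bus_width_bits;
}

/* Number of whole streams of the given bit rate that fit in the bandwidth. */
uint32_t streams_supported(uint32_t bus_bps, uint32_t stream_bps)
{
    return stream_bps ? bus_bps / stream_bps : 0;
}
```

With 200,000 transfers per second and a sixteen-bit bus, bus_bits_per_second gives 3,200,000 
bits/s, and streams_supported at 1,500,000 bits/s per stream gives two streams with 200 kbits/s 
to spare. 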
The final interface design was very straightforward because programmed I/O was used 
for the data transfer. The only hardware required was a single PLD (Programmable Logic 
Device) to decode the address lines and IOR/IOW lines, bus transceivers to isolate the board's 
data lines from the ISA bus during those periods in which the decoder is not being accessed, 
and two latches used to store incoming data. The code used for programming the interface PLD 
is provided in Appendix A. The interface is represented by the block diagram in Figure 3.1. 
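[Figure 3.1: block diagram. The ISA bus connects to an MSB bus transceiver and an LSB bus 
transceiver, and its A9-A0, /IOW, and /IOR lines connect to the address PAL, which drives the 
control lines. The transceivers feed the MSB latch and the LSB latch, whose outputs go to the 
decoder system.] 
Figure 3.1 Bus Interface Unit 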
The latches in the BIU serve a dual purpose. The most obvious use for the latches is to 
grab the data from the ISA bus while it is valid, thereby ensuring that the data can be reliably 
processed long after the bus lines have been tri-stated. The latches can also be used to gate the 
data bytes onto the internal decoder board bus in any manner desired. This approach provides 
much more flexibility than cross-wiring. 
Using the enable lines on latches, the order of each pair of bytes read from the ISA bus 
can be swapped as shown in Figure 3.1. The bytes are shown transposed because of a memory 
storage paradigm chosen by Intel microprocessors. The PC uses the storage model known as 
Little Endian. This model places the least significant byte (LSB) at a smaller address than the 
most significant byte (MSB). A sixteen-bit quantity interpreted as an integer is assumed to be 
stored in memory with its LSB first. Different processors sometimes use the Big Endian model 
which stores data in exactly the opposite manner. Both models can be more easily understood 
by glancing at Figure 3.2. 

                Integer = 0x00010000            Integer = 0x12345678 
                Addr: 0        Addr: 1          Addr: 2        Addr: 3 
Little Endian   Data: 0x0000   Data: 0x0001     Data: 0x5678   Data: 0x1234 
Big Endian      Data: 0x0001   Data: 0x0000     Data: 0x1234   Data: 0x5678 

Figure 3.2 Memory Models 
The memory model is significant simply because it will define the order in which the 
bytes will be appearing on the bus. Because the binary file for an MPEG-1 video stream is 
essentially one contiguous string of bytes, it is important that the data are processed in the 
proper order. Attempting to decode bytes that are not presented in the proper order will 
produce unpredictable results. 
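
The byte ordering can be illustrated with a short C sketch. It assumes, as the discussion above 
implies, that the host reads two consecutive file bytes as one little-endian sixteen-bit word 
and that the board must emit the low byte first to restore file order. The function names are 
hypothetical. 

```c
#include <stdint.h>

/* Packs two consecutive file bytes the way a little-endian host sees them
 * when it reads a sixteen-bit word from memory: first byte in the low half. */
uint16_t pack_le(uint8_t first, uint8_t second)
{
    return (uint16_t)(first | ((uint16_t)second << 8));
}

/* Recovers stream order from a sixteen-bit bus word: the low byte (LSB
 * latch) goes out first, followed by the high byte (MSB latch). */
void serialize_word(uint16_t word, uint8_t out[2])
{
    out[0] = (uint8_t)(word & 0xFFu);
    out[1] = (uint8_t)(word >> 8);
}
```

Emitting the high byte first instead would swap every pair of bytes, so a start code such as 
00 00 01 B3 would reach the decoder as 00 00 B3 01 and never be recognized. 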
The signal used to latch data into the two registers of the bus interface unit is actually 
generated in another sub-block. The PLD responsible for latching the data also serializes the 
bytes on the internal bus in the correct order and sends these bytes to either the flow control 
FIFO (first-in, first-out) of the software design, or to the pattern detector of the hardware 
interleave implementation. 
Some interesting problems were encountered during the design of the BIU. After the 
BIU had proven to operate on an older PC system, the decoder board was moved to a new 
system. The new PC failed to boot properly with the decoder board installed. The problem was 
traced to the plug and play (PnP) protocol that is built into new PCs. Machines with PnP 
capability query the bus to determine what types of devices are present. The peripheral boards 
are expected to return an identification packet when their addresses are queried. The prototype 
board built for this thesis was designed only to read from the bus and had absolutely no built-in 
PnP support, although the internal bus was isolated via the bus transceivers. The isolation, 
though, was not sufficient. The fix to the problem was to use pull-up resistors between the ISA 
bus data lines and the bus transceivers so that the hexadecimal value OxFFFF appears when the 
decoder board is queried. In the PnP protocol, a query response of OxFFFF is considered an 
empty slot, and is, thus, ignored. 
Another interesting observation made during the BIU construction dealt with the 
utilization of PLDs. The first pass at the interface was made using an Altera EPM5016 and the 
Altera Hardware Description Language (AHDL). This chip was chosen because it had a logic 
array that was dense enough to incorporate the entire design, and little of the logic would be 
wasted. The EPM5016 also has enough latches to buffer all sixteen bits of the ISA bus. 
Unfortunately, the functionality of this particular design was intermittent. The reason for the 
unpredictable performance was never pinpointed, but a later design with an Altera part led me 
to suspect that utilizing over 70% of a chip's logic caused speed path problems in synthesized 
AHDL. 
3.3 Hardware Interleave
The state machine of Figure 2.4 was implemented on an Altera EPM5064 using AHDL. 
As shown in the diagram, one byte is analyzed to determine the direction of each possible 
branch. A state machine had been considered which made transitions based on two byte 
quantities in order to reduce processing time by one cycle, but the minuscule time savings was 
deemed insignificant in comparison to the complexity introduced into the evaluations. It makes 
more sense to march through the video stream one byte at a time because the MPEG-1 headers 
are guaranteed to be byte-aligned. The source code for the state machine can be found in 
Appendix B. 
Design of a hardware interleave system is a complex task which becomes more 
complicated for each additional video stream that can be potentially decoded. The system 
illustrated in Figure 3.3 can support only two streams and is the most rudimentary of all 
hardware systems. The mini-FIFO, state memory, pattern detector, interleave state machine, 
and bank of video stream-FIFOs comprise the five distinct blocks of this interleave 
implementation. 
[Figure 3.3: block diagram. One byte of the video stream, under BIU control, enters the 
mini-FIFO. The pattern detector exchanges Pattern Index and Pattern Match signals with the 
interleave state machine, which is backed by the state memory and drives the Control and 
Marker Bit lines. The mini-FIFO feeds the Stream 1 FIFO and the Stream 2 FIFO, whose outputs 
go to the decoder.] 
Figure 3.3 Hardware Interleave System 
The mini-FIFO serves as intermediate buffering between the incoming raw video stream 
data and the processed stream that indicates interleave boundaries with a 'marker' bit. The 
'marker' is actually just the ninth bit of a parity FIFO. When the state machine detects an 
appropriate header following the last byte of the current picture, the parity bit is set. If the 
correct condition is not met, the parity bit is not set. In order to set the parity bit on the correct 
byte, there must be a buffer zone between the incoming bytes and those that are stored in the 
stream-FIFO. Once a byte has been written to the stream-FIFO, it cannot be modified. The 
buffer zone is implemented as a five byte window between incoming data and the processed 
data in the stream-FIFO. These five bytes accommodate the last byte of the picture and the four 
bytes of the next header. When the context switching point has been detected, the remaining 
bytes of the mini-FIFO are flushed into the appropriate stream-FIFO. 
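
The five-byte window and the ninth-bit marker can be modeled in C as follows. This is a 
behavioral sketch under stated assumptions, not the hardware design: the structure and function 
names are hypothetical, and a sixteen-bit word stands in for each nine-bit FIFO entry. 

```c
#include <stddef.h>
#include <stdint.h>

#define WINDOW 5        /* last picture byte plus four start-code bytes */
#define MARKER 0x100u   /* ninth bit of a 9-bit FIFO word */

typedef struct {
    uint16_t slot[WINDOW];  /* slot[0] holds the oldest byte */
    size_t   count;         /* number of occupied slots */
} mini_fifo;

/* Pushes one byte. If the window was already full, the oldest word leaves
 * the window through *out (on its way to the stream-FIFO) and 1 is
 * returned; otherwise nothing is emitted and 0 is returned. */
int mini_push(mini_fifo *f, uint8_t byte, uint16_t *out)
{
    int emitted = 0;
    if (f->count == WINDOW) {
        *out = f->slot[0];
        for (size_t i = 1; i < WINDOW; i++)
            f->slot[i - 1] = f->slot[i];
        f->count--;
        emitted = 1;
    }
    f->slot[f->count++] = byte;
    return emitted;
}

/* Called right after the fourth byte of a qualifying start code has been
 * pushed: sets the marker on the byte just ahead of the start code. */
void mini_mark(mini_fifo *f)
{
    if (f->count == WINDOW)
        f->slot[0] |= MARKER;
}
```

The marker can only be applied while the byte is still inside the window, which is why the 
window must be at least as long as the start code plus one byte. 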
The state memory is necessary if the system is to properly execute the interleave 
algorithm after each context switch. Because the data in the mini-FIFO is flushed on a context 
switch, it is important that the system knows whether the extra bytes were significant. The state 
of the machine indicates the contents of the incoming stream up to the point of the context 
switch. Preserving the state of the machine allows the search for start codes to resume exactly 
at the point of the switch. 
The most pressing consideration for the implementation of the state machine was the 
I/O pins. Pins were necessary for controlling FIFOs, controlling the BIU latches, making the 
current state visible, marking the byte that ended a picture, and potentially indicating which 
stream-FIFO was currently active. The eight pins required to read a byte of data from the 
incoming video stream could not be spared. A separate pattern detector was implemented in a 
PALCE22V10 to alleviate the I/O problem with the EPM5064. Three pins were gained in this 
arrangement because the detector requires four bits to decide which pattern to search for and 
responds to the EPM5064 on a single bit line. Separating the pattern search from the state 
machine also aided in the debugging process. The complete design file for the pattern detector 
is presented in Appendix C. 
The size of the stream-FIFO block is directly proportional to the maximum number of 
streams that are to be supported. Each stream needs its own block of storage space so that the 
data bytes that are flushed at a context switch remain associated with the correct stream. The 
stream-FIFOs should have tri-state capability and have their outputs connected to a common 
bus. Whatever device is retrieving the interleaved data must determine which FIFO should be 
read and could, potentially, get that information from the interleave state machine PLD. It 
would be possible to maintain a single stream-FIFO that would hold the interleaved data, but 
that would require a separate storage facility for the bytes that follow the context switching 
point of the current stream. The control scheme necessary for handling the mini-FIFO refuse 
would be considerably more complex. 
A problem similar to the bus interface unit was encountered in the implementation of 
this state machine. The preliminary design was functional and ran at the clock frequency that 
had been chosen for the board. As more functionality was built into the PLD, the chip began to 
miss its timing edges and the clock had to be slowed. After pushing chip utilization over 75%, 
the clock frequency had to be cut once again. The resulting clock was barely over half of the 
desired frequency. AHDL was also used for this design, and the compiler was not the newest 
available. It is possible that the synthesizer simply was not able to efficiently place a design in 
a chip at high utilization. Manual layout may be required in high density PLDs to ensure an 
efficient implementation. However, the most recent version of the design software may be 
intelligent enough to accomplish the task. 
3.4 Software Interleave 
Performing the interleave in software is significantly less complex than attempting to 
track context switches in hardware. The diagram in Figure 3.4 should be compared to the block 
diagram of the hardware system in Figure 3.3. The software implementation requires only a 
unit to control the data flow on the decoder board. In this design, the incoming data has already 
been interleaved and can be input directly to the flow control FIFO. There is no need to 
preserve the state of any machine at the picture boundaries because all of the details are handled 
in the software. 
[Figure 3.4: block diagram. One byte of the video stream, under BIU control, enters the flow 
control FIFO, which is governed by the data transfer control block and whose output goes to 
the decoder.] 
Figure 3.4 Software Interleave Data Transfer 
The interleave program was written in C and is included in Appendix D. The source 
code in the appendix is designed to be an analysis program. It takes two MPEG-1 video files as 
input and produces an interleaved file which can be analyzed using a hexadecimal editor. The 
code can easily be converted to a program that will send the interleaved file to a port address by 
converting the 'fwrite(<data>, outfile)' statements to functions designed for I/O, such as 
'outpw(<data>, port_addr).'
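The picture-level interleave that such a program performs can be illustrated with a minimal Python sketch. It is not the Appendix D program: the function names are hypothetical, the strict alternation between streams is an assumption, and the sketch omits any stream-identification bytes. It relies only on the standard MPEG-1 start codes (00 00 01 followed by 00 for a picture, B3 for a sequence header, B8 for a group of pictures, and 01 through AF for slices).

```python
from itertools import zip_longest

START = b"\x00\x00\x01"              # MPEG-1 start-code prefix
PICTURE, SEQUENCE, GOP = 0x00, 0xB3, 0xB8


def split_pictures(stream: bytes) -> list[bytes]:
    """Cut a stream into units that each end with the slices of one picture."""
    cuts, in_slices = [], False
    i = stream.find(START)
    while i != -1 and i + 3 < len(stream):
        code = stream[i + 3]
        if 0x01 <= code <= 0xAF:                 # slice start codes
            in_slices = True
        elif in_slices and code in (PICTURE, SEQUENCE, GOP):
            cuts.append(i)                       # first header after slice data
            in_slices = False
        i = stream.find(START, i + 4)
    bounds = [0] + cuts + [len(stream)]
    return [stream[a:b] for a, b in zip(bounds, bounds[1:])]


def interleave(stream0: bytes, stream1: bytes) -> bytes:
    """Alternate picture units from two streams (assumed round-robin order)."""
    out = bytearray()
    units = zip_longest(split_pictures(stream0), split_pictures(stream1),
                        fillvalue=b"")
    for unit0, unit1 in units:
        out += unit0 + unit1
    return bytes(out)
```

Headers that precede a picture stay attached to that picture's unit, so every unit handed to the decoder begins with the headers needed to set up the frame.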
Although the software cannot operate as quickly as custom hardware, the rate at which 
interleaved data can be dispatched using the software method exceeds 3 Mbits/s, which is 
sufficient to support two video streams in real time display. Considering the limitations of 
MPEG decoders that operate at the slice-level, the bandwidth of the software interleave design 
is not a bottleneck. The details of a commercial MPEG decoder are discussed in the next 
chapter. 
4. MPEG PICTURE DECODER
4.1 Overview 
The STi3400 from SGS-Thomson is a commercially available MPEG-1 picture 
decoder. It was chosen for this thesis because it is capable of decoding an interleaved video 
stream when properly controlled. The chip is only smart enough to decode from the slice-level 
downward. What this means is that the decoder requires assistance in setting up each frame of 
the sequence, but can decode everything within the frame. Due to the fact that each frame is, 
essentially, an independent unit as far as the STi3400 is concerned, the input to the chip can be 
manipulated so as to interleave pictures from different streams. This data manipulation allows 
the display of multiple, independent video streams. 
Although very useful in facilitating the decode of multiple video streams, the chip's 
architecture imposes some constraints upon the manner in which multiple streams can be 
decoded. Namely, the frame size of all sequences concurrently decoded must be identical due 
to the manner in which the chip stores data. The method used for storing the luminance and 
chrominance blocks of the video stream is dependent upon the number of macroblocks in the 
video sequence. If the number of macroblocks changes from frame to frame, the data stored in 
memory will become corrupted and the output produced will be meaningless according to the 
information in [1]. 
4.2 Performance Calculations 
The feasibility of this thesis hinges on whether the actual decoder chip has enough 
bandwidth to decode two independent MPEG video streams. The formulae given in [1] will 
show that the STi3400 can easily support two streams. It is apparent from these calculations 
that the first bottleneck to be encountered will be the bandwidth limitations of the DRAM 
(Dynamic Random Access Memory). 
The minimum primary clock for the MPEG decoder chip is defined in Equation (4.1). 
The equation is a worst-case estimation which uses a processing time of 814 cycles for the 
decode of a B-macroblock and 515 cycles for a P-block, and omits the consideration of the 216 
cycles necessary for the decode of an I-block. The inclusion of I-blocks would reduce the 
calculated bandwidth and would not be indicative of video streams that use few I-blocks to 
achieve maximum compression. 
This bandwidth requirement is based upon the refresh rate of the DRAM (RFrate), the 
input rate of the compressed data (CDrate), the rate at which macroblocks must be decoded 
(MBrate), the number of B-pictures between two P-pictures (M - 1), and the average rate at 
which decoded pixels must be displayed (PIXrate). Each of these variables is defined in 
Equations (4.2)-(4.5) which can be found in [2]. 
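Bandwidth Constraint for Decoder 
\[ \frac{F_{primary}}{3} \ge 3\,\mathit{RFrate} + \frac{27}{128}\,\mathit{CDrate} + \mathit{MBrate}\,\frac{(M-1)\cdot 814 + 515}{M} + \frac{5}{4}\,\mathit{PIXrate} \tag{4.1} \]
Refresh Rate Definition 
\[ \mathit{RFrate} = \frac{\text{\# rows to refresh}}{\text{seconds between refreshes}} \tag{4.2} \]
Compressed Data Definition 
\[ \mathit{CDrate} = (\text{average rate of encoded data input per stream}) \times (X \text{ video streams}) \tag{4.3} \]
Macroblock Decode Rate Definition 
\[ \mathit{MBrate} = \left[ \frac{\text{pixels/frame}}{256} \times (\text{frames/second}) \right] \times (X \text{ video streams}) \tag{4.4} \]
Pixel Output Rate Definition 
\[ \mathit{PIXrate} = \left[ (\text{pixels/frame}) \times (\text{frames/second}) \right] \times (X \text{ video streams}) \tag{4.5} \]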
The equations above make it obvious that the bandwidth constraints imposed upon the 
decoding process are dependent not only upon system design, but also upon the properties of 
the video stream to be decoded because the CDrate and size of the frames are determined when 
the video stream is encoded. The bandwidth constraint will be calculated for a system using 
DRAM with 9 address bits which requires a refresh of all 512 rows every 8 ms. The video 
streams that were analyzed for this thesis, and shall be used in these calculations, have a frame 
size of 192x144 pixels, place two B-pictures between each pair of P-pictures (M = 3), and have 
a display rate of 30 Hz. A worst-case estimated average CDrate of 1.3 Mbits/s will be used for 
each stream, although the analysis of test MPEG files showed that most streams had an average 
requirement of less than 1 Mbit/s. 
Using the given configuration, it is found that RFrate has a value of 64,000 cycles/s, 
MBrate is 3,240*X macroblocks/s, and PIXrate is calculated to be 829,440*X pixels/s. Because 
the STi3400 can support a maximum F_primary of 50 MHz, Equation (4.1) can be solved for X to 
determine the maximum number of video streams that can be decoded in real time. Plugging in 
the calculated values for variables results in Equation (4.6), which shows that designing a 
system around the STi3400 permits the real time decode of up to four video streams. By 
removing the stipulation of real-time decode, i.e., reducing the frame rate, more video streams 
could be sent through the chip. 
Maximum Number of Video Streams 
\[ 16.67\times10^{6} \ge 192\times10^{3} + 274{,}218\,X + 2.3\times10^{6}\,X + 1.04\times10^{6}\,X; \quad X \le 4.5 \tag{4.6} \]
It can be seen from Equation (4.6) that the decoder chip can decode up to four video 
streams with a compressed data rate of about 1.3 Mbits/s and a display rate of 30 Hz. This 
value, of course, is the theoretical maximum throughput of the chip. Due to constraints of the 
chip's implementation, this level of interleave is not easily attained. 
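The arithmetic behind Equation (4.6) can be reproduced with a short Python sketch. The function and parameter names are hypothetical, and the left-hand side of Equation (4.1) is taken to be one third of the primary clock, which is what the 16.67 MHz figure in Equation (4.6) implies for a 50 MHz clock.

```python
def max_streams(f_primary_hz, rows, refresh_s, cd_rate_bps,
                width, height, frames_per_s, m):
    """Solve the decoder bandwidth constraint for X, the number of streams."""
    rf_rate = rows / refresh_s                        # row refreshes per second
    pixels = width * height
    mb_rate = pixels / 256 * frames_per_s             # macroblocks/s per stream
    pix_rate = pixels * frames_per_s                  # pixels/s per stream
    per_stream = (cd_rate_bps * 27 / 128
                  + mb_rate * ((m - 1) * 814 + 515) / m
                  + pix_rate * 5 / 4)
    # F_primary / 3 >= 3 * RFrate + X * per_stream
    return (f_primary_hz / 3 - 3 * rf_rate) / per_stream


# Configuration used in this chapter: 50 MHz clock, 512 rows every 8 ms,
# 1.3 Mbits/s per stream, 192x144 frames at 30 Hz, M = 3.
x = max_streams(50e6, 512, 8e-3, 1.3e6, 192, 144, 30, 3)
whole_streams = int(x)
```

With these inputs the per-stream cost is about 3.63 million cycles/s, so the bound falls between four and five streams, in agreement with the four-stream limit quoted above.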
The STi3400 begins display processes only upon receipt of a VSYNC pulse. This 
behavior imposes additional constraints on decoding performance. In order to produce a system 
that can decode the video streams in real time, the VSYNC signal, as seen by the decoder chip, 
must become faster as more video streams are interleaved. Decoding only two streams requires 
a VSYNC frequency of 60 Hz, which happens to be the refresh rate for the NTSC video 
standard. Thus, using progressive scan-out, an entire decoded frame can be obtained from the 
STi3400 on each VSYNC. Using external logic, each frame can be separated into odd and even 
fields, and it can then be given to an NTSC encoder. Thus, two 30 Hz streams can be displayed 
in real time. An additional stream would require an extra 30 Hz per refresh period giving a 
VSYNC frequency of 90 Hz. A computer monitor might be able to handle such a refresh rate, 
but an NTSC system would not. In fact, it would be necessary to completely decouple the 
display process from the chip such that the VSYNC seen by the chip becomes strictly a 
decoding signal. Then, a completely different subsystem would be required to store decoded 
information and ensure that the display process was in synchronism with the true VSYNC 
signal. 
However, if the sequences need not be displayed in real time, more than two streams 
can be accommodated at 60 Hz. It would be possible, for example, to decode four streams if 
each were displayed at 15 Hz. The required refresh rate in this case is, again, only 60 Hz. 
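The VSYNC requirement described in this section reduces to a sum of frame rates, because the decoder handles one frame per VSYNC pulse. The following Python sketch, with hypothetical function names, checks the three cases discussed above.

```python
NTSC_VSYNC_HZ = 60  # refresh rate quoted above for the NTSC standard


def required_vsync_hz(frame_rates_hz):
    """One decoded frame per VSYNC pulse: the rates of all streams add up."""
    return sum(frame_rates_hz)


def fits_ntsc(frame_rates_hz):
    """True when the interleaved streams need no more than 60 pulses/s."""
    return required_vsync_hz(frame_rates_hz) <= NTSC_VSYNC_HZ


two_real_time = required_vsync_hz([30, 30])            # two 30 Hz streams
three_real_time = required_vsync_hz([30, 30, 30])      # one stream too many
four_reduced = required_vsync_hz([15, 15, 15, 15])     # four 15 Hz streams
```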
5. FUTURE DIRECTIONS
5.1 System Overview 
Using either of the interleaving designs presented in Chapter 3 and a slice-level MPEG-
1 video decoder such as the STi3400, a complete context-switching decoder board can be 
assembled. The block diagram in Figure 5.1 outlines such a system. In addition to the auxiliary 
microcontroller required for the MPEG decoder, subsystems must be implemented to feed data 
to the core decoder and to interface the decoder to the video display. 
[Block diagram; labels: ISA Bus Interface (control lines, 16 data bits), Data Transfer/Interleave (8 bits), Flow Control, MPEG Feeder (1 bit / 8 bits), Core MPEG-1 Video Decoder, Video Display Subsystem (24 bits)]
Figure 5.1 Decoder Board 
5.2 MPEG Feeder Subsystem 
The STi3400 can accept the compressed MPEG data in either of two ports. The video 
can be sent to a dedicated serial input pin, or compressed data and control messages for the 
dedicated microcontroller can be multiplexed on the chip's sixteen bit I/O port. Both 
alternatives have their advantages and shall be described. 
5.2.1 Serial input 
Figure 5.2 shows a system designed to deliver single bits to the MPEG decoder. 
Software interleaving is assumed in order to simplify the block diagram. This subsystem 
consists of two control PLDs, a FIFO, and a shift register. One PLD is used to latch incoming 
data, gate the tri-stated latches onto the internal bus, and then write data into the FIFO. This is 
simply the data transfer system described earlier. The second PLD is used to control the shift 
register and implements flow control by handshaking with the decoder chip and sending 
interrupts to the host CPU. 
[Block diagram; labels: Data(7..0), MSB Latch, LSB Latch, Latch Enable, Output Enable, Steer Data PAL, Incoming Data Signal, Write FIFO, FIFO, Read FIFO, Flow Control Interrupt, MPEG Feeder PAL, Shift Control, Shift Register, Clock Control, Clock Enable, One bit to MPEG, Serial Clock]
Figure 5.2 Serial Input MPEG Feeder 
The output of this subsystem goes directly to the serial input of the core decoder. There 
is also a clock dedicated to running the serial port of the decoder chip. Each edge of the serial 
clock should correspond with the introduction of a new bit. The implication is that unless 
continuous input can be guaranteed, once the decoder leaves reset, external clock-enable 
circuitry will be required to hold the clock in a high state until the next group of data is 
prepared for delivery. Without this clock-enable mechanism, garbage will be introduced into 
the decoder on each clock in which valid data are not present on the bit line. 
The frequency of the serial clock is limited to 20 MHz by the STi3400. Because a 
VSYNC of 60 Hz can be used to decode a maximum of two 30 Hz streams in real time, 20 
MHz is more than sufficient bandwidth even if making the assumption that each stream has an 
average bandwidth of 8 Mbits/s. Some bandwidth margin should be left over to accommodate 
for periods in which the input requirements for a particular stream exceed the average. 
5.2.2 Parallel input 
The alternative to serial input is to fully utilize the communications port of the STi3400. 
Although not a dedicated port like the serial pin, the input bandwidth is much higher. 
Configuring the STi3400 to use the full sixteen bit interface, input bursts of up to 80 Mbit/s can 
be achieved according to [1]. The caveat to this approach is that data cannot be sent to the chip 
while the microcontroller is passing messages through the I/O port. This increases the 
complexity of the code required for controlling the decoder, but also reduces the hardware 
required as shown in Figure 5.3. 
In contrast to the diagram for the serial input design, the only hardware added to the data 
transfer system is the PLD used for controlling data flow between the video stream-FIFO and 
the decoder chip. The stream is no longer broken into individual bits by a shift register, and the 
rigid timing requirements imposed by the serial port are no longer a concern. Although 
constant input is not an option, as it is with the serial method, this design is both easier to 
physically assemble and more readily debugged because variables such as proper shift rate and 
ringing on the serial clock are eliminated. 
[Block diagram; labels: Data(7..0), MSB Latch, LSB Latch, Latch Enable, Output Enable, Steer Data PAL, Incoming Data Signal, Write FIFO, FIFO, Read FIFO, Control Lines to Microcontroller, MPEG Feeder PAL, 8 bits To MPEG Decoder]
Figure 5.3 Parallel Input MPEG Feeder 
5.3 MPEG Decoder Block 
This block is composed of the core decoder, a microcontroller, some program memory, 
and the bank of memory used for holding both raw compressed data and the decoded frames of 
all incoming video streams. The block diagram of Figure 5.4 shows the incoming data arriving 
at the serial input. If the incoming data stream was one byte wide instead of one bit wide, the 
serial clock and serial bit-stream of the diagram would be replaced with an 8-bit bus attached to 
the communications interface. 
The associated microcontroller is used to properly initialize decoding parameters for 
each picture and to watch for a context switch. When a new stream is detected, a few of the 
STi3400's decoding parameters have to be stored so that the first sequence can be properly 
decoded once more of its data are presented to the chip. These variable parameters are stored in 
the dedicated memory area shown in Figure 5.4, and the parameters for the new stream are used 
for decoding the next picture. Thereafter, the decoding parameters are swapped on each picture 
to be decoded. When an end of sequence header is read, the number of parameter storage areas 
is decremented by one. If the resulting number of streams is one, then the parameter swapping 
mechanism is disabled until a new stream is detected. The parameters that need to be preserved 
include the backward frame pointer (BFP), the displayed frame pointer (DFP), the forward 
frame pointer (FFP), and the current quantization matrix. 
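The parameter swapping described above can be sketched in Python as a small bookkeeping structure. This is only an illustration of the control flow, not microcontroller code for the STi3400: the class and method names are hypothetical, and the quantization matrix contents are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class StreamContext:
    """Per-stream parameters that must survive a context switch."""
    bfp: int = 0                      # backward frame pointer
    dfp: int = 0                      # displayed frame pointer
    ffp: int = 0                      # forward frame pointer
    quant_matrix: list = field(default_factory=lambda: [0] * 64)  # placeholder


class ParameterStore:
    """One storage area per active stream, swapped at picture boundaries."""

    def __init__(self):
        self.contexts = {}            # stream id -> StreamContext
        self.current = None           # stream whose parameters are loaded

    @property
    def swapping_enabled(self):
        # Swapping only matters while more than one stream is active.
        return len(self.contexts) > 1

    def start_picture(self, stream_id, live):
        """Save the outgoing stream's parameters, return those to load."""
        if self.current in self.contexts:
            self.contexts[self.current] = live
        self.current = stream_id
        return self.contexts.setdefault(stream_id, StreamContext())

    def end_of_sequence(self, stream_id):
        """An end of sequence header releases that stream's storage area."""
        self.contexts.pop(stream_id, None)
        if self.current == stream_id:
            self.current = None
```

In this sketch a new storage area is created the first time a stream identifier is seen, which stands in for the detection of a new stream.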
[Block diagram; labels: Microcontroller Memory (Program Store, MPEG Stream Parameters), Memory Bus, Microcontroller, Communications Interface, Serial Bit-stream, Serial Clock In, STi3400 MPEG Decoder, Memory Bus, Decoder Memory, Video Data Out, VSYNC In]
Figure 5.4 MPEG Decoder Block 
In a system where it is possible to display a frame with a different size than that of the 
sequence presently being decoded, the decoded frame size (DFS) would also need to be stored. 
Unfortunately, the architecture of the STi3400 does not allow for such flexibility. Therefore, 
values such as the DFS, the decoded frame width (DFW), the size of the bit buffer, and general 
configuration register assignments should be determined before the decoding process begins 
and maintained throughout. 
5.4 Video Subsystem 
Once the video stream has been decoded it must be displayed. This stage, like the 
others, requires enhancing the capability of the MPEG decoder chip. As far as the picture 
decoder is concerned, it is simply decoding one frame of a video stream on each VSYNC 
signal. The decoder is unaware of multiple video streams, so the video must be separated as it 
leaves the decoder. 
Assuming a VSYNC rate of 60 Hz, a typical display order for two 30 Hz MPEG-1 video 
streams is shown in Figure 5.5. In each 1/60 of a second, a frame is decoded from one video 
stream and the previously decoded frame from another stream is output. Because the video 
output of the decoder chip is dedicated to one video stream during each VSYNC period, it 
obviously cannot provide the data for the entire video display. Thus, the video data must have a 
temporary storage location so that each frame can be repeated as many times as possible. 
[Timeline of the frames decoded and displayed from Stream 1 and Stream 2 at TIME = 1/60, 2/60, 3/60, 4/60, 5/60]
Figure 5.5 Display Order 
The mechanism for the temporary storage is shown in Figure 5.6. The method shown in 
the block diagram is to separate the video streams into their own banks of memory. Using this 
effectively interleaved memory structure allows the system to write a frame from one video 
stream to its respective bank, while reading the information from another bank without access 
conflicts. 
Obviously, timing is critical for such a system. A display control must be used to 
arbitrate access to the memory banks based on the video timing signals. If the video adapter 
received the data from a particular video stream at the wrong time, the monitor would display a 
corrupted picture. 
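The bank arbitration can be illustrated with a Python sketch of two streams and two memory banks. The alternation rule used here (write the bank of the stream being decoded, read the other bank) is an assumption drawn from the description above rather than from the timing of an actual display control, and the function name is hypothetical.

```python
def bank_schedule(vsync_periods):
    """List (period, frame written, frame read) for two streams, two banks."""
    banks = [None, None]              # last complete frame held in each bank
    frame_count = [0, 0]
    schedule = []
    for t in range(vsync_periods):
        write_bank = t % 2            # stream decoded during this VSYNC period
        read_bank = 1 - write_bank    # the other bank feeds the video adapter
        shown = banks[read_bank]      # captured before any write takes place
        frame_count[write_bank] += 1
        banks[write_bank] = (write_bank + 1, frame_count[write_bank])
        schedule.append((t, banks[write_bank], shown))
    return schedule


# Each entry is (VSYNC period, (stream, frame) written, (stream, frame) read).
rows = bank_schedule(4)
```

Because the write bank and the read bank always differ, the two accesses never touch the same memory in the same VSYNC period.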
[Block diagram; labels: Memory Bank 1, Memory Bank 2, 24 bits (RGB), Video Adapter, Display Control, Video Timing Signals]
Figure 5.6 Video Block 
When the display is to be interlaced as in NTSC video, this task becomes more difficult. 
Due to the interlacing of the video, only half of the lines are displayed on each VSYNC. The 
display is separated into even and odd lines, each set being referred to as a field. If the target 
display is interlaced, it might be necessary to divide the memory banks into storage locations 
for each stream, and then subdivide each stream's memory block into an even field and odd 
field. Fortunately, chips like the STi3400 provide the option of generating video data in a 
progressive or interlaced order. 
6. CONCLUSIONS
Two methods of implementing the interleave algorithm of Chapter 2 have been 
presented. When deciding whether to use the hardware or software implementation, the 
intended application must be taken into consideration. The software design, although the 
slower of the two options, is sufficient to support two real-time video streams. If the target 
decoder is a commercial, slice-level decoder, then two streams are the most that can be 
expected in real time. The software interleave approach, however, requires a large amount of 
processing time on a CPU. 
If the decoder is not to be an accessory on a PC, but rather a stand-alone system, the 
hardware design may be the appropriate choice. Most likely, such an embedded application 
would use direct memory access transfers to move raw data to the decoder. The hardware 
system would then be responsible for interleaving the incoming data and redirecting the DMA 
to the next stream at picture boundaries. Another possibility is that the target decoder is a 
custom-built MPEG picture decoder capable of processing more than two streams in real time. 
Once again, the hardware system would be preferable, this time because of the higher transfer 
requirement of the additional real-time video sequences. 
Although each method has its own particular applications, they both effectively 
interleave MPEG-1 video data. In addition, either can be interfaced to an enhanced decoder 
system, such as that of Chapter 5, to produce an efficient, high performance video system. The result 
is a new direction in multimedia. 
APPENDIX A. ISA BUS INTERFACE 
This design file implements the BIU described in Chapter 3 with Advanced Micro 
Devices' PALASM programming language. To use this design file, one has to procure a 
PALASM compiler and a PLD burner. 
;PALASM Design Description
;---------------------------------- Declaration Segment ------------
TITLE    Address Decoder for MPEG Board
PATTERN
REVISION
AUTHOR   Bill Diehl
COMPANY  ADSL
DATE     8/12/96
CHIP _addr PALCE22V10
;---------------------------------- PIN Declarations ---------------
PIN 1   clk                              ; INPUT
PIN 2   a0                               ; INPUT
PIN 3   a1                               ; INPUT
PIN 4   a2                               ; INPUT
PIN 5   a3                               ; INPUT
PIN 6   a4                               ; INPUT
PIN 7   a5                               ; INPUT
PIN 8   a6                               ; INPUT
PIN 9   a7                               ; INPUT
PIN 10  a8                               ; INPUT
PIN 11  a9                               ; INPUT
; _IOW and _IOR are active low inputs
PIN 13  _IOW                             ; INPUT
PIN 14  _IOR                             ; INPUT
PIN 15  ISA_RESET                        ; INPUT
PIN 16  _MPG_RESET                       ; INPUT
PIN 17  EOT             REGISTERED       ; OUTPUT
PIN 18  clear_IRQ       REGISTERED       ; OUTPUT
PIN 19  /_bus_enable    REGISTERED       ; OUTPUT
PIN 20  /_isa_write     COMBINATORIAL    ; OUTPUT
PIN 21  /_isa_read      COMBINATORIAL    ; OUTPUT
PIN 22  /_reset         COMBINATORIAL    ; OUTPUT
PIN 23  _IO16           COMBINATORIAL    ; OUTPUT
PIN 12  GND                              ; INPUT
PIN 24  VCC                              ; INPUT
;---------------------------------- Boolean Equation Segment ------
EQUATIONS
_reset = ISA_RESET + /_MPG_RESET
; The signals _bus_enable, _isa_write, and _isa_read are associated with address 0x380
_bus_enable = (a9*a8*a7*/a6*/a5*/a4*/a3*/a2*/a1*/a0*(/_IOW + /_IOR))*/_reset
_isa_write = (a9*a8*a7*/a6*/a5*/a4*/a3*/a2*/a1*/a0*/_IOW)*/_reset
; _isa_read also serves as the direction bit for the bus transceivers
; when high, it indicates incoming data. A low signal means data
; goes out to the bus
_isa_read = (a9*a8*a7*/a6*/a5*/a4*/a3*/a2*/a1*/a0*/_IOR)*/_reset
; EOT is the end of transmission for a stream. It is associated with
; address 0x381
EOT = (a9*a8*a7*/a6*/a5*/a4*/a3*/a2*/a1*a0*/_IOW + EOT)*/_isa_write + _reset
clear_IRQ = (a9*a8*a7*/a6*/a5*/a4*/a3*/a2*/a1*a0*/_IOR) + _reset
_IO16 = GND
_IO16.TRST = _bus_enable
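The address decoding expressed by these equations can be checked with a small behavioral model in Python. The model is an illustration only: the function name is hypothetical, each signal is represented as True when asserted rather than by its pin polarity, and the equations are read literally, including the reset term that sets EOT.

```python
DATA_PORT = 0x380   # data transfers (_bus_enable, _isa_write, _isa_read)
CTRL_PORT = 0x381   # end of transmission and interrupt clear


def decode(addr, iow, ior, reset=False, eot_prev=False):
    """Evaluate the decoder outputs for one bus cycle.

    iow and ior are True while the corresponding strobe is asserted;
    eot_prev is the EOT value fed back from the previous evaluation.
    """
    a = addr & 0x3FF                                  # ten address lines a9..a0
    at_data = a == DATA_PORT
    at_ctrl = a == CTRL_PORT
    isa_write = at_data and iow and not reset
    isa_read = at_data and ior and not reset
    bus_enable = at_data and (iow or ior) and not reset
    eot = (((at_ctrl and iow) or eot_prev) and not isa_write) or reset
    clear_irq = (at_ctrl and ior) or reset
    return {"bus_enable": bus_enable, "isa_write": isa_write,
            "isa_read": isa_read, "EOT": eot, "clear_IRQ": clear_irq}


write_cycle = decode(0x380, iow=True, ior=False)
eot_cycle = decode(0x381, iow=True, ior=False)
```

A write to 0x381 sets EOT, and the feedback term holds it until the next data write to 0x380 clears it.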
APPENDIX B. HARDWARE INTERLEAVE DESIGN 
% INPUT Header_Found will be asserted during the recovery of /MFIFO_Write %
% INPUT Run_Machine is asserted the cycle before Header_Found can be asserted %
INCLUDE "74374";
CONSTANT WRITE    = B"0";
CONSTANT READ     = B"0";
CONSTANT RECOVER  = B"1";
CONSTANT OFF      = B"0";
CONSTANT ON       = B"1";
CONSTANT SEQ_TYPE = H"0";
CONSTANT GOP_TYPE = H"1";
CONSTANT PIC_TYPE = H"2";
CONSTANT END_TYPE = H"3";
DESIGN IS fifo_ctl
DEVICE fifo_ctl IS EPM5064;
SUBDESIGN fifo_ctl
(
    clock               : INPUT;
    Start_Fill          : INPUT;
    Run_Machine         : INPUT;
    Stream_ID           : INPUT;
    Header_Found        : INPUT;
    /RESET              : INPUT;
    EOT                 : INPUT;
    /MFIFO_empty        : INPUT;
    Header_Type[1..0]   : INPUT;
    Mark_bit_R          : OUTPUT;
    Udata_out[7..0]     : OUTPUT;
    FIFO_State[4..0]    : OUTPUT;
    Skip_State[1..0]    : OUTPUT;
    /MFIFO_Write        : OUTPUT;
    /MFIFO_Read         : OUTPUT;
    /FIFO_1_Write       : OUTPUT;
    /FIFO_0_Write       : OUTPUT;
    Error_LED           : OUTPUT;
)
% State2 of Pattern Detector --> NO_START %
% State3 of Pattern detector %
% State[1..0] of Pattern detector %
VARIABLE
    % The SeqX bits and Mark_bit are registered, therefore, changes
      applied do not appear at the outputs during the next state %
    Seq_R0, Seq_R1      : DFFE; % Flags for showing detection of SEQ header %
    Mark_bit_R          : DFF;  % Mark context switch point with 9th bit of FIFO %
    stuff_ctl           : DFF;  % Synchronize the tri-state control of the User Data (Udata) output %
    Dump_Flag           : DFF;
    Error_LED           : DFF;  % This wonderful synthesizer glitches in combo logic %
    FIFO_machine        : MACHINE
                          OF BITS (FIFO_State[4..0])
                          WITH STATES (
                              IDLE,
                              PACK_PIPE,      % Stream ID gone-write next byte %
                              GOT1,
                              GET2,           % Write next byte %
                              GOT2,
                              GET3,           % Write next byte %
                              GOT3,
                              GET4,           % Write next byte %
                              GOT4,
                              GET5,           % Write next byte %
                              PIPE_FULL,      % Recover from write %
                              MARK_STATE,     % Used to hold write low for context mark %
                              R/W_STATE,      % Simultaneously Write and %
                                              % Read Mini-Fifo %
                              FLUSH,          % Flush MFIFO into FIFO_0 or %
                                              % FIFO_1 %
                              FLUSH_RECOVER,  % Recovery time for MFIFO %
                                              % read and FIFO write %
                              % STUFF Stream_ID into bitstream between SEQ and GOP %
                              PRE_STUFF,
                              STUFF1,         % 0 %
                              STUFFED1,
                              STUFF2,         % 0 %
                              STUFFED2,
                              STUFF3,         % 1 %
                              STUFFED3,
                              STUFF4,         % B2 %
                              STUFFED4,
                              STUFF5,         % Stream_ID %
                              STUFFED5,
                              DUMP_BYTE,      % Extra byte in MFIFO %
                              ERROR );        % Should never go here!! %
    Skip_machine        : MACHINE
                          OF BITS (Skip_State[1..0])
                          WITH STATES (
                              NEITHER = H"0",
                              STREAM0 = H"1",
                              STREAM1 = H"2",
                              BOTH    = H"3");
    clk, master_reset   : GLOBAL;
    Switch_context      : NODE;
    SEQ_Flagged         : NODE;
    Stuff_ID            : NODE;
    Found_Problem       : NODE;
    Udata               : 74374;
    /MFIFO_Write        : DFF;
    /MFIFO_Read         : DFF;
    /FIFO_1_Write       : DFF;
    /FIFO_0_Write       : DFF;
BEGIN
    DEFAULTS
        /MFIFO_Write    = RECOVER;
        /MFIFO_Read     = RECOVER;
        /FIFO_1_Write   = RECOVER;
        /FIFO_0_Write   = RECOVER;
        Error_LED       = OFF;
        Mark_bit_R      = OFF;
        Udata.d[]       = H"CD";
        Seq_R0          = GND;
        Seq_R1          = GND;
        Dump_Flag       = GND;
        stuff_ctl       = VCC;
    END DEFAULTS;
    clk.in              = clock;
    master_reset.in     = NOT /RESET;
    Udata.clk           = clk;
    Udata_out[]         = Udata.q[];
    FIFO_machine.clk    = clk;
    FIFO_machine.reset  = master_reset;
    Skip_machine.clk    = clk;
    Skip_machine.reset  = master_reset;
    Mark_bit_R.clk      = clk;
    Mark_bit_R.clrn     = /RESET;
    Seq_R0.clk          = clk;
    Seq_R0.clrn         = /RESET;
    Seq_R0.ena          = Header_Found;
    Seq_R1.clk          = clk;
    Seq_R1.clrn         = /RESET;
    Seq_R1.ena          = Header_Found;
    stuff_ctl.clk       = clk;
    stuff_ctl.prn       = /RESET;
    Udata.oen           = stuff_ctl.q;
    Dump_Flag.clk       = clk;
    Dump_Flag.clrn      = /RESET;
    Error_LED.clk       = clk;
    Error_LED.clrn      = /RESET;
    /MFIFO_Write.clk    = clk;
    /MFIFO_Write.prn    = /RESET;
    /MFIFO_Read.clk     = clk;
    /MFIFO_Read.prn     = /RESET;
    /FIFO_1_Write.clk   = clk;
    /FIFO_1_Write.prn   = /RESET;
    /FIFO_0_Write.clk   = clk;
    /FIFO_0_Write.prn   = /RESET;
    CASE Skip_machine IS
        WHEN NEITHER =>
            IF (Header_Found == 1) THEN
                CASE Header_Type[] IS
                    WHEN SEQ_TYPE =>
                        IF (Stream_ID == 0) THEN
                            Skip_machine = STREAM0;
                            Seq_R0 = VCC; % Can't assign a number to a node %
                        ELSE
                            Skip_machine = STREAM1;
                            Seq_R1 = VCC;
                        END IF;
                    WHEN GOP_TYPE =>
                        IF (Stream_ID == 0) THEN
                            Skip_machine = STREAM0;
                            Seq_R0 = GND;
                        ELSE
                            Skip_machine = STREAM1;
                            Seq_R1 = GND;
                        END IF;
                    WHEN OTHERS =>
                        Seq_R1 = Seq_R1;
                        Seq_R0 = Seq_R0;
                        Skip_machine = NEITHER;
                END CASE;
            ELSE % SeqX registers not enabled if NOT Header_Found %
                Skip_machine = NEITHER;
            END IF;
        WHEN STREAM0 =>
            IF (Header_Found == 1) THEN
                CASE Header_Type[] IS
                    % Could have back to back Sequence headers %
                    WHEN SEQ_TYPE =>
                        IF (Stream_ID == 1) THEN
                            Skip_machine = BOTH;
                            Seq_R1 = VCC;
                        ELSE
                            Skip_machine = STREAM0;
                            Seq_R0 = VCC;
                        END IF;
                    WHEN GOP_TYPE =>
                        IF (Stream_ID == 1) THEN
                            Skip_machine = BOTH;
                            Seq_R1 = GND;
                        ELSE
                            Skip_machine = STREAM0;
                            Seq_R0 = GND;
                        END IF;
                    WHEN OTHERS =>
                        IF (Stream_ID == 0) THEN
                            % PIC header clears skip %
                            Skip_machine = NEITHER;
                        ELSE
                            % No skip set for Stream 1 %
                            Skip_machine = STREAM0;
                        END IF;
                        Seq_R1 = Seq_R1;
                        Seq_R0 = Seq_R0;
                END CASE;
            ELSE
                Skip_machine = STREAM0;
            END IF;
        WHEN STREAM1 =>
            IF (Header_Found == 1) THEN
                CASE Header_Type[] IS
                    WHEN SEQ_TYPE =>
                        IF (Stream_ID == 0) THEN
                            Skip_machine = BOTH;
                            Seq_R0 = VCC;
                        ELSE
                            Skip_machine = STREAM1;
                            Seq_R1 = VCC;
                        END IF;
                    WHEN GOP_TYPE =>
                        IF (Stream_ID == 0) THEN
                            Skip_machine = BOTH;
                            Seq_R0 = GND;
                        ELSE
                            Skip_machine = STREAM1;
                            Seq_R1 = GND;
                        END IF;
                    WHEN OTHERS =>
                        IF (Stream_ID == 1) THEN % PIC header clears skip %
                            Skip_machine = NEITHER;
                        ELSE
                            Skip_machine = STREAM1; % No skip set for Stream 0 %
                        END IF;
                        Seq_R1 = Seq_R1;
                        Seq_R0 = Seq_R0;
                END CASE;
            ELSE
                Skip_machine = STREAM1;
            END IF;
        WHEN BOTH =>
            IF (Header_Found == 1) THEN
                CASE Header_Type[] IS
                    WHEN SEQ_TYPE =>
                        IF (Stream_ID == 0) THEN
                            Seq_R0 = VCC;
                        ELSE
                            Seq_R1 = VCC;
                        END IF;
                        Skip_machine = BOTH;
                    WHEN GOP_TYPE =>
                        IF (Stream_ID == 0) THEN
                            Seq_R0 = GND;
                        ELSE
                            Seq_R1 = GND;
                        END IF;
                        Skip_machine = BOTH;
                    WHEN OTHERS =>
                        IF (Stream_ID == 0) THEN
                            Skip_machine = STREAM1; % Other stream still skips %
                        ELSE
                            Skip_machine = STREAM0;
                        END IF;
                        Seq_R1 = Seq_R1;
                        Seq_R0 = Seq_R0;
                END CASE;
            ELSE
                Skip_machine = BOTH;
            END IF;
    END CASE;
    % Because of the excessive propagation delay, the next state value %
    % is assigned within the conditional statement of the prev state %
    CASE FIFO_machine IS
        WHEN IDLE =>
            IF (Start_Fill == 1) THEN
                FIFO_machine = PACK_PIPE;
                /MFIFO_Write = WRITE;
            ELSE
                FIFO_machine = IDLE; % /MFIFO_Write defaults to RECOVER %
            END IF;
        WHEN PACK_PIPE =>
            IF (Run_Machine == 1) THEN
                FIFO_machine = GOT1; % RECOVER %
                /MFIFO_Write = RECOVER;
            ELSE
                FIFO_machine = PACK_PIPE;
                /MFIFO_Write = WRITE;
            END IF;
        WHEN GOT1 =>
            /MFIFO_Write = WRITE;
            FIFO_machine = GET2;
        WHEN GET2 =>
            IF (Run_Machine == 1) THEN
                % Next state output %
                FIFO_machine = GOT2; % RECOVER %
            ELSE
                FIFO_machine = GET2;
                /MFIFO_Write = WRITE;
            END IF;
        WHEN GOT2 =>
            /MFIFO_Write = WRITE;
            FIFO_machine = GET3;
        WHEN GET3 =>
            IF (Run_Machine == 1) THEN
                % Next state output %
                FIFO_machine = GOT3; % RECOVER %
            ELSE
                FIFO_machine = GET3;
                /MFIFO_Write = WRITE;
            END IF;
        WHEN GOT3 =>
            /MFIFO_Write = WRITE;
            FIFO_machine = GET4;
        WHEN GET4 =>
            % Next state output %
            IF (Run_Machine == 1) THEN
                FIFO_machine = GOT4; % RECOVER %
            ELSE
                FIFO_machine = GET4;
                /MFIFO_Write = WRITE;
            END IF;
        WHEN GOT4 =>
            /MFIFO_Write = WRITE; % Next state output %
            FIFO_machine = GET5;
        WHEN GET5 =>
            IF (Run_Machine == 1) THEN
                FIFO_machine = PIPE_FULL; % Next state outputs %
                /MFIFO_Read = READ;
                IF (Stream_ID == 0) THEN
                    /FIFO_0_Write = WRITE;
                ELSE
                    /FIFO_1_Write = WRITE;
                END IF;
            ELSE
                FIFO_machine = GET5;
                /MFIFO_Write = WRITE;
            END IF;
        WHEN PIPE_FULL => % Will know if header matches by this %
            IF (Header_Found == 1) THEN
                % state because we are recovering from %
                % the write %
                Switch_context = ((Stream_ID == 0) & (Skip_State0 == 0)) #
                                 ((Stream_ID == 1) & (Skip_State1 == 0));
                SEQ_Flagged = ((Seq_R0 == 1) & (Stream_ID == 0)) #
                              ((Seq_R1 == 1) & (Stream_ID == 1));
                Stuff_ID = SEQ_Flagged & (Header_Type[] == GOP_TYPE);
                Found_Problem = SEQ_Flagged & (Header_Type[] != GOP_TYPE) &
                                (Header_Type[] != SEQ_TYPE);
                IF Switch_context THEN
                    Mark_bit_R = ON; % registered -- effective at next clock %
                    FIFO_machine = MARK_STATE;
                    /MFIFO_Write = WRITE;
                    /MFIFO_Read = READ;
                    IF (Stream_ID == 1) THEN
                        /FIFO_1_Write = WRITE;
                    ELSE
                        /FIFO_0_Write = WRITE;
                    END IF;
                ELSIF (Stuff_ID == 1) THEN % Skipping enabled %
                    FIFO_machine = PRE_STUFF;
                    /MFIFO_Read = RECOVER; % Next state outputs %
                    /FIFO_1_Write = RECOVER; % These are default values %
                    /FIFO_0_Write = RECOVER;
                    /MFIFO_Write = WRITE;
                    Udata.d[] = H"00";
                ELSIF (Found_Problem == 1) THEN
                    FIFO_machine = ERROR;
                    /MFIFO_Write = RECOVER;
                ELSE
                    FIFO_machine = R/W_STATE; % Could be PIC_TYPE %
                    /MFIFO_Write = WRITE;
                END IF;
            ELSE
                FIFO_machine = R/W_STATE;
                /MFIFO_Write = WRITE;
            END IF;
        WHEN MARK_STATE => % Guarantee that MARK is %
                           % stable by FIFO write %
            FIFO_machine = R/W_STATE;
            /MFIFO_Write = WRITE;
        WHEN R/W_STATE =>
            IF (EOT == 1) THEN
                FIFO_machine = FLUSH;
                /MFIFO_Read = READ;
                IF (Stream_ID == 1) THEN
                    /FIFO_1_Write = WRITE;
                ELSE
                    /FIFO_0_Write = WRITE;
                END IF;
                /MFIFO_Write = WRITE;
            ELSIF (Run_Machine == 1) THEN
                FIFO_machine = PIPE_FULL; % Next state outputs %
                /MFIFO_Read = READ;
                IF (Stream_ID == 0) THEN
                    /FIFO_0_Write = WRITE;
                ELSE
                    /FIFO_1_Write = WRITE;
                END IF;
            ELSE
                FIFO_machine = R/W_STATE;
                /MFIFO_Write = WRITE;
            END IF;
        WHEN FLUSH =>
            FIFO_machine = FLUSH_RECOVER; % Next state outputs %
            /MFIFO_Read = RECOVER;
            /FIFO_0_Write = RECOVER;
            /FIFO_1_Write = RECOVER;
            /MFIFO_Write = WRITE; % Keep low %
        WHEN FLUSH_RECOVER =>
            IF (/MFIFO_empty == 0) THEN % Flush is done %
                FIFO_machine = IDLE;
                /MFIFO_Read = RECOVER;
                /FIFO_0_Write = RECOVER;
                /FIFO_1_Write = RECOVER;
            ELSE
                FIFO_machine = FLUSH;
                /MFIFO_Read = READ;
                IF (Stream_ID == 0) THEN
                    /FIFO_0_Write = WRITE;
                ELSE
                    /FIFO_1_Write = WRITE;
                END IF;
            END IF;
            /MFIFO_Write = WRITE; % Keep low %
'7c STUFF sequence is triggered from the PIPE_FULL state so if there 9c 
9'c is a Run_Machine sig:naL it will be in sync with the progression '7c 
'7c of these states. If not. the stuffing process will be completed '7c 
'7c before the next transmission from the HOST '7c 
WHEN PRE_STUFF =>
FIFO_machine = STUFF1;
Udata.d[] = H"00";
stuff_ctl = GND;
/MFIFO_Read = RECOVER;
/MFIFO_Write = WRITE;
IF (Stream_ID == 0) THEN
/FIFO_0_Write = WRITE;
ELSE
/FIFO_1_Write = WRITE;
END IF;
WHEN STUFF1 =>
FIFO_machine = STUFFED1;
/MFIFO_Read = RECOVER; % Next state outputs %
/FIFO_1_Write = RECOVER;
/FIFO_0_Write = RECOVER;
Udata.d[] = H"00";
stuff_ctl = GND;
% Shouldn't have any bytes to write after this ... if so, either
this board is running too slow, or the ISA bus is *WAY* fast! %
IF (Run_Machine == 1) THEN
/MFIFO_Write = RECOVER; % latch last byte %
Dump_Flag = VCC;
ELSE
/MFIFO_Write = WRITE;
END IF;
WHEN STUFFED1 =>
FIFO_machine = STUFF2;
Udata.d[] = H"00";
stuff_ctl = GND;
IF (Stream_ID == 0) THEN
/FIFO_0_Write = WRITE;
ELSE
/FIFO_1_Write = WRITE;
END IF;
/MFIFO_Write = WRITE; % Keep write line low %
Dump_Flag = Dump_Flag;
WHEN STUFF2 =>
FIFO_machine = STUFFED2;
/MFIFO_Write = WRITE;
/FIFO_1_Write = RECOVER;
/FIFO_0_Write = RECOVER;
Udata.d[] = H"00";
stuff_ctl = GND;
Dump_Flag = Dump_Flag;
WHEN STUFFED2 =>
FIFO_machine = STUFF3;
Udata.d[] = H"01";
stuff_ctl = GND;
IF (Stream_ID == 0) THEN
/FIFO_0_Write = WRITE;
ELSE
/FIFO_1_Write = WRITE;
END IF;
/MFIFO_Write = WRITE;
Dump_Flag = Dump_Flag;
WHEN STUFF3 =>
FIFO_machine = STUFFED3;
/FIFO_1_Write = RECOVER;
/FIFO_0_Write = RECOVER;
Udata.d[] = H"01";
stuff_ctl = GND;
/MFIFO_Write = WRITE;
Dump_Flag = Dump_Flag;
WHEN STUFFED3 =>
FIFO_machine = STUFF4;
Udata.d[] = H"B2";
stuff_ctl = GND;
IF (Stream_ID == 0) THEN
/FIFO_0_Write = WRITE;
ELSE
/FIFO_1_Write = WRITE;
END IF;
/MFIFO_Write = WRITE;
Dump_Flag = Dump_Flag;
WHEN STUFF4 =>
FIFO_machine = STUFFED4;
/FIFO_1_Write = RECOVER;
/FIFO_0_Write = RECOVER;
Udata.d[] = H"B2";
stuff_ctl = GND;
/MFIFO_Write = WRITE;
Dump_Flag = Dump_Flag;
WHEN STUFFED4 =>
FIFO_machine = STUFF5;
Udata.d[8..2] = H"00";
Udata.d[1] = Stream_ID;
stuff_ctl = GND;
IF (Stream_ID == 0) THEN
/FIFO_0_Write = WRITE;
ELSE
/FIFO_1_Write = WRITE;
END IF;
/MFIFO_Write = WRITE;
Dump_Flag = Dump_Flag;
WHEN STUFF5 =>
FIFO_machine = STUFFED5;
/FIFO_1_Write = RECOVER;
/FIFO_0_Write = RECOVER;
Udata.d[8..2] = H"00";
Udata.d[1] = Stream_ID;
stuff_ctl = VCC;
/MFIFO_Write = WRITE;
Dump_Flag = Dump_Flag;
WHEN STUFFED5 =>
IF (Dump_Flag == 1) THEN
FIFO_machine = DUMP_BYTE;
IF (Stream_ID == 0) THEN
/FIFO_0_Write = WRITE;
ELSE
/FIFO_1_Write = WRITE;
END IF;
/MFIFO_Read = READ;
ELSE
FIFO_machine = R/W_STATE; % Return to normal control %
END IF;
/MFIFO_Write = WRITE;
WHEN DUMP_BYTE =>
FIFO_machine = R/W_STATE;
/MFIFO_Write = WRITE;
/MFIFO_Read = RECOVER;
/FIFO_1_Write = RECOVER;
/FIFO_0_Write = RECOVER;
% Need to remove byte from MFIFO to %
% keep sync with Mark bit line %
WHEN ERROR => % All write lines should default to RECOVER %
FIFO_machine = ERROR;
Error_LED = ON;
WHEN OTHERS =>
FIFO_machine = ERROR;
END CASE;
END;
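The STUFF states in the listing above hold each value on Udata for two consecutive states (one with the stream FIFO write line asserted, one with it recovering), so the bytes latched into the selected stream FIFO appear to be 00 00 01 B2 followed by one byte whose low bit is the stream ID. The C fragment below is only a software sketch of that byte sequence under this reading; the function name build_stream_marker and its signature are hypothetical and do not appear in the design.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the thesis design.
   Fills out[] with the five bytes that the STUFF states appear to write:
   the MPEG-1 user data start code 00 00 01 B2, then one byte whose
   low bit carries the stream ID. Returns the number of bytes written. */
size_t build_stream_marker(unsigned char out[5], int stream_id)
{
    out[0] = 0x00;                               /* PRE_STUFF / STUFF1  */
    out[1] = 0x00;                               /* STUFFED1 / STUFF2   */
    out[2] = 0x01;                               /* STUFFED2 / STUFF3   */
    out[3] = 0xB2;                               /* STUFFED3 / STUFF4   */
    out[4] = (unsigned char)(stream_id & 0x01);  /* STUFFED4 / STUFF5   */
    return 5;
}
```

Under these assumptions, calling build_stream_marker(buf, 1) leaves the bytes 00 00 01 B2 01 in buf, which matches the USER_DATA code and STREAM1 value used by the software interleaver in Appendix D.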
APPENDIX C. PATTERN DETECTOR 
ENTITY patterns IS PORT (
clk: IN bit;
EOT: IN bit; -- master reset
clear_IRQ: IN bit;
bus_data: IN x01z_vector(7 DOWNTO 0);
BUS_enable_LO: IN bit;
run_parse: IN bit;
state: INOUT x01z_vector(3 DOWNTO 0);
stream_id: INOUT x01z;
irq_out: OUT x01z);
ATTRIBUTE pin_numbers OF patterns: ENTITY IS
"clk:1 " &
"EOT:2 " &
"clear_IRQ:3 " &
"bus_data(7):4 " &
"bus_data(6):5 " &
"bus_data(5):6 " &
"bus_data(4):7 " &
"bus_data(3):8 " &
"bus_data(2):9 " &
"bus_data(1):10 " &
"bus_data(0):11 " &
"BUS_enable_LO:13 " &
"run_parse:14 " &
"irq_out:15 " &
"state(3):17 " & -- indicates start code found
"state(2):18 " &
"state(1):19 " & -- 2 LSB state bits indicate
"state(0):20 " & -- start type
"stream_id:13";
END patterns;
USE work.rtlpkg.all;
USE work.int_math.all;
ARCHITECTURE arch_state OF patterns IS
SIGNAL irq: bit;
-- States arranged so that legal transitions due to changing ISA bus
-- conditions can occur with a single bit change
SUBTYPE state_var IS x01z_vector(3 DOWNTO 0);
CONSTANT idle:state_var := "0000"; -- states chosen to minimize
CONSTANT read_id:state_var := "0001"; -- logic
CONSTANT no_start:state_var := "0100";
CONSTANT first:state_var := "0101";
CONSTANT second:state_var := "0110";
CONSTANT third:state_var := "0111";
CONSTANT sequence:state_var := "1000";
CONSTANT GOP:state_var := "1001";
CONSTANT picture:state_var := "1010";
CONSTANT end_seq:state_var := "1011";
SUBTYPE input IS x01z_vector(7 DOWNTO 0);
CONSTANT byte1:input := x"00";
CONSTANT byte2:input := x"00";
CONSTANT byte3:input := x"01";
CONSTANT seq_head:input := x"B3";
CONSTANT end_head:input := x"B7";
CONSTANT gop_head:input := x"B8";
CONSTANT pic_head:input := x"00";
CONSTANT lo_ASSERTED:bit := '0';
CONSTANT lo_DEASSERTED:bit := '1';
CONSTANT hi_ASSERTED:bit := '1';
CONSTANT hi_DEASSERTED:bit := '0';
BEGIN
moore: PROCESS (clk, EOT, clear_IRQ, BUS_data, run_parse)
VARIABLE incoming: boolean;
VARIABLE got1,got2,got3: boolean;
VARIABLE got_seq,got_gop: boolean;
VARIABLE got_pic,got_end: boolean;
BEGIN
IF (EOT='1') THEN
state <= idle;
irq <= '0';
stream_id <= '0';
ELSIF (clk'event AND clk='1') THEN
incoming := run_parse='1';
got1 := run_parse='1' AND BUS_data = byte1;
got2 := run_parse='1' AND BUS_data = byte2;
got3 := run_parse='1' AND BUS_data = byte3;
got_seq := run_parse='1' AND BUS_data = seq_head;
got_gop := run_parse='1' AND BUS_data = gop_head;
got_pic := run_parse='1' AND BUS_data = pic_head;
got_end := run_parse='1' AND BUS_data = end_head;
CASE state IS
WHEN idle =>
IF incoming THEN
state <= read_id;
stream_id <= bus_data(0);
ELSE
state <= idle;
stream_id <= stream_id;
END IF;
irq <= irq;
WHEN read_id =>
IF incoming THEN
state <= no_start;
ELSE
state <= read_id;
END IF;
irq <= irq;
stream_id <= stream_id;
WHEN no_start =>
IF got1 THEN
state <= first;
ELSE
state <= no_start; -- no pattern
END IF;
irq <= irq;
stream_id <= stream_id;
WHEN first =>
IF got2 THEN
state <= second;
ELSIF incoming THEN
state <= no_start; -- pattern broken
ELSE
state <= first; -- wait for a byte
END IF;
irq <= irq;
stream_id <= stream_id;
WHEN second =>
IF got3 THEN
state <= third;
ELSIF incoming THEN
state <= no_start; -- pattern broken
ELSE
state <= second; -- wait for a byte
END IF;
irq <= irq;
stream_id <= stream_id;
-- check rules for popping the IRQ ... may be using feedback
WHEN third =>
IF got_seq THEN
state <= sequence;
irq <= '1';
ELSIF got_gop THEN
state <= GOP;
irq <= '1';
ELSIF got_pic THEN
state <= picture;
irq <= '1';
ELSIF got_end THEN
state <= end_seq;
irq <= '1';
ELSIF incoming THEN
state <= no_start;
irq <= irq;
ELSE
state <= third; -- wait for a byte
irq <= irq;
END IF;
stream_id <= stream_id;
WHEN sequence =>
state <= no_start;
irq <= irq;
stream_id <= stream_id;
WHEN GOP =>
state <= no_start;
irq <= irq;
stream_id <= stream_id;
WHEN picture =>
state <= no_start;
irq <= irq;
stream_id <= stream_id;
WHEN end_seq =>
state <= no_start;
irq <= irq;
stream_id <= stream_id;
WHEN OTHERS =>
state <= idle;
irq <= irq;
stream_id <= stream_id;
END CASE;
IF (clear_IRQ = '1') THEN
irq <= '0';
END IF;
END IF;
END PROCESS;
-- should only interrupt after current outpw() completes
-- otherwise this outpw() may be repeated after ISR
-- thus, irq_out is tristated
-- enabled when !BUS_enable
u0: triout PORT MAP (irq, BUS_enable_LO, irq_out);
END arch_state;
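The no_start, first, second and third states of the pattern detector above can be modeled in software as a small per-byte state machine. The C fragment below is a hedged sketch of that idea, not a translation of the PLD source: it collapses the clocking and run_parse handshake into one step per byte, and the names det_state, start_type, detect_step and count_pictures are hypothetical.

```c
#include <stddef.h>

/* Hypothetical software model of the start-code detector; all names
   here are illustrative and do not come from the thesis. */
enum det_state  { NO_START, FIRST, SECOND, THIRD };
enum start_type { NONE = 0, SEQUENCE, GROUP, PICTURE, END_SEQ };

/* Feed one byte. Returns the start code type completed by this byte,
   or NONE if no start code ends here. */
enum start_type detect_step(enum det_state *st, unsigned char byte)
{
    switch (*st) {
    case NO_START:                       /* waiting for the first 00 */
        if (byte == 0x00) *st = FIRST;
        return NONE;
    case FIRST:                          /* need the second 00 */
        *st = (byte == 0x00) ? SECOND : NO_START;
        return NONE;
    case SECOND:                         /* need 01 */
        *st = (byte == 0x01) ? THIRD : NO_START;
        return NONE;
    case THIRD:                          /* classify the code byte */
        *st = NO_START;
        switch (byte) {
        case 0xB3: return SEQUENCE;
        case 0xB8: return GROUP;
        case 0x00: return PICTURE;
        case 0xB7: return END_SEQ;
        default:   return NONE;
        }
    }
    *st = NO_START;                      /* unknown state: start over */
    return NONE;
}

/* Example use: count the picture start codes in a buffer. */
size_t count_pictures(const unsigned char *buf, size_t len)
{
    enum det_state st = NO_START;
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (detect_step(&st, buf[i]) == PICTURE)
            n++;
    return n;
}
```

As in the listing, this model drops back to no_start whenever the expected byte does not arrive, so in this sketch a start code preceded by extra zero bytes would not be recognized.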
APPENDIX D. SOFTWARE INTERLEAVE DESIGN 
The following program implements the interleave algorithm discussed in
Chapter 2. Given two validly encoded MPEG input files, it generates an output file that is
interleaved on picture boundaries. This program is not designed to perform the run-time
generation of interleaved bit-streams, but could be modified to do so if support were added to
handle interrupts indicating new, incoming data streams.
/* BINARY MPG interleave program
Given two MPG files, it will interleave them on picture boundaries, and
insert a user data code to identify the particular stream after each
sequence header
*/
/******************************************************************************/
#include <stdio.h> 
#include <stdlib.h> 
#include <string.h> 
#define BUFFER_SIZE 512 
#define CLEAR_FIFO 0xFFFFFFFF
#define HEADER 0x000001
#define SEQ 0x000001B3
#define SEQ_END 0x000001B7
#define GOP 0x000001B8
#define PIC 0x00000100
#define USER_DATA 0x000001B2
#define STREAM1 0x01
#define STREAM2 0x00
#define SWITCH 0
#define MAINTAIN 1
void one_block (FILE *,unsigned char *,int *,unsigned long *,int *,
unsigned long *, int *, int, FILE *, int *);
int main (int argc,char *argv[]) {
FILE *raw1, *raw2, *outfile;
int skip1,skip2,byte_count1,byte_count2,context;
int index1,index2;
int Initial_SEQ1,Initial_SEQ2;
/* 8 bits each */
unsigned char video1[BUFFER_SIZE], video2[BUFFER_SIZE];
unsigned long chars_read1,chars_read2;
/* 4 byte FIFO */
unsigned long fifo1,fifo2;
unsigned char in_byte,write_byte;
fprintf(stdout,"\n\n This program outputs a binary MPG file \n \n");
if (argc != 4) {
fprintf(stderr,"\nNeed 3 arguments!\n\n*mpg1* *mpg2* *mpg-out*\n\n");
exit(0);
}
if ( (raw1 = fopen(argv[1], "rb")) == NULL)
{
printf("**File %s not opened\n",argv[1]);
exit(0);
}
if ( (raw2 = fopen(argv[2], "rb")) == NULL)
{
printf("**File %s not opened\n",argv[2]);
exit(0);
}
if ( (outfile = fopen(argv[3], "wb+")) == NULL)
{
printf("**File %s not opened\n",argv[3]);
exit(0);
}
printf("Interleaving %s &\n\t%s\n",argv[1],argv[2]);
/* Initialize variables */
byte_count1 = 0;
fifo1 = 0;
skip1 = 0;
chars_read1 = 1;
index1 = 1;
Initial_SEQ1 = 1;
byte_count2 = 0;
fifo2 = 0;
skip2 = 0;
chars_read2 = 1;
index2 = 1;
Initial_SEQ2 = 1;
context = MAINTAIN;
while ( (chars_read1 > 0) || (chars_read2 > 0) ) {
one_block(raw1, video1, &byte_count1, &fifo1, &skip1, &chars_read1,
&index1, 1, outfile, &Initial_SEQ1);
one_block(raw2, video2, &byte_count2, &fifo2, &skip2, &chars_read2,
&index2, 2, outfile, &Initial_SEQ2);
}
byte_count2 = 0;
}
void one_block (FILE *input, unsigned char *array, int *byte_count,
unsigned long *fifo, int *skip, unsigned long *chars_read, int *index,
int ID, FILE *outfile, int *initial_flag)
{
int context;
unsigned long write_long;
unsigned char write_byte,in_byte;
context = MAINTAIN;
while ((context == MAINTAIN) && (*chars_read > 0))
{
/** Fill array if the current array has been exhausted **/
if (*index == *chars_read) {
*chars_read = fread(array,sizeof(char),BUFFER_SIZE,input);
*index = 0;
}
/** Process until array needs to be refreshed or context switch point **/
/** is located **/
while ((*index < *chars_read) && (context == MAINTAIN))
{
if (*byte_count == 0) { /* load header code--3 bytes into fifo */
*fifo = *fifo | (unsigned long)array[(*index)++];
*fifo = *fifo << 8;
*fifo = *fifo | (unsigned long)array[(*index)++];
*fifo = *fifo << 8;
*fifo = *fifo | (unsigned long)array[*index];
*byte_count = 3;
} else { /* now go byte by byte */
/* If a header code is found, then there is some extra work to
be done. If not, then pop the head of the *fifo and write it */
if ( (*fifo & 0x00FFFFFF) == HEADER) {
/* If this is the beginning of the stream, the fifo will be
holding one dummy byte */
if (*initial_flag != 1) {
write_byte = (unsigned char)(*fifo >> 24);
fwrite(&write_byte,sizeof(char),1,outfile);
}
in_byte = array[*index];
*fifo = *fifo << 8;
*fifo = *fifo | (unsigned long)in_byte;
switch(*fifo) {
case SEQ:
if (*initial_flag) {
*initial_flag = 0;
} else {
context = SWITCH;
}
*skip = 1;
break;
case SEQ_END:
fwrite(fifo,sizeof(long),1,outfile);
*fifo = CLEAR_FIFO;
break;
case GOP:
/* stuff stream ID into user data */
if (*skip == 1) {
write_long = USER_DATA;
/* fwrite emits the long in host byte order */
fwrite(&write_long,sizeof(long),1,outfile);
if (ID == 1) {
write_byte = STREAM1;
fwrite(&write_byte,sizeof(char),1,outfile);
} else {
write_byte = STREAM2;
fwrite(&write_byte,sizeof(char),1,outfile);
}
} else { /* switch */
context = SWITCH;
}
*skip = 1;
break;
case PIC:
if (*skip == 0) { /* no GOP, no SEQ */
context = SWITCH;
}
*skip = 0;
break;
default:
break;
}
} else {
write_byte = (unsigned char)(*fifo >> 24);
fwrite(&write_byte,sizeof(char),1,outfile);
in_byte = array[*index];
*fifo = *fifo << 8;
*fifo = *fifo | (unsigned long)in_byte;
}
}
(*index)++;
}
}
}
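The switch statement in the listing above decides where the output changes from one stream to the other: at a sequence header other than the first, at a GOP header not preceded by a sequence header, and at a picture header not preceded by a GOP or sequence header. The fragment below isolates that rule as a minimal sketch; the type and function names (code_type, switch_state, should_switch) are hypothetical and were introduced only for illustration.

```c
/* Hypothetical model of the context-switch rule; names are illustrative
   and do not appear in the thesis program. */
enum code_type { CODE_SEQ, CODE_GOP, CODE_PIC };

struct switch_state {
    int skip;     /* 1 after a sequence or GOP header, 0 after a picture */
    int initial;  /* 1 until the first sequence header has been seen */
};

/* Returns 1 if the interleaver should switch streams at this start
   code, 0 if it should keep copying the current stream. */
int should_switch(struct switch_state *s, enum code_type code)
{
    int sw = 0;
    switch (code) {
    case CODE_SEQ:
        if (s->initial)
            s->initial = 0;   /* first header of the stream: no switch */
        else
            sw = 1;
        s->skip = 1;
        break;
    case CODE_GOP:
        if (s->skip != 1)     /* GOP not preceded by a sequence header */
            sw = 1;
        s->skip = 1;
        break;
    case CODE_PIC:
        if (s->skip == 0)     /* picture with no GOP or SEQ before it */
            sw = 1;
        s->skip = 0;
        break;
    }
    return sw;
}
```

Starting from skip = 0 and initial = 1, the header order SEQ, GOP, PIC, PIC yields a switch only at the second picture, which is the picture-boundary interleaving the program is meant to produce.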
REFERENCES 
[1] STi3400 MPEG/H.261 Video Decoder. SGS-Thomson Microelectronics, January 1996.
[2] STi3240 MPEG Video Decoder. SGS-Thomson Microelectronics, April 1994.
REFERENCES NOT CITED 
Information technology - Coding of moving pictures and associated audio for digital storage 
media at up to about 1.5 Mbits/s. International Standard ISO/IEC 11172-2, Part 2, 1993. 
K. Jack, Video Demystified. Solana Beach, CA: High Text Publications, Inc., 1993.
D. Le Gall, MPEG: A Video Compression Standard for Multimedia Applications,
Communications of the ACM, April 1991.
