A CONTROL MECHANISM TO THE ANYWHERE PIXEL ROUTER by Krishnan, Subhasri
University of Kentucky 
UKnowledge 
University of Kentucky Master's Theses Graduate School 
2007 
A CONTROL MECHANISM TO THE ANYWHERE PIXEL ROUTER 
Subhasri Krishnan 
University of Kentucky, skris0@engr.uky.edu 
Recommended Citation 
Krishnan, Subhasri, "A CONTROL MECHANISM TO THE ANYWHERE PIXEL ROUTER" (2007). University of 
Kentucky Master's Theses. 441. 
https://uknowledge.uky.edu/gradschool_theses/441 
This Thesis is brought to you for free and open access by the Graduate School at UKnowledge. It has been accepted 
for inclusion in University of Kentucky Master's Theses by an authorized administrator of UKnowledge. For more 
information, please contact UKnowledge@lsv.uky.edu. 
ABSTRACT OF THESIS 
 
 
 
A CONTROL MECHANISM TO THE ANYWHERE PIXEL ROUTER 
 
Traditionally, large-format displays have been achieved using software. A new technique
of using hardware-based ‘anywhere pixel routing’ is explored in this thesis. Information
stored in a Look Up Table (LUT) in the hardware can be used to tile two image streams
to produce a seamless image display. This thesis develops a 1 input-image, 1 output-image
system that implements arbitrary image warping on the image, based on a LUT stored
in memory. The developed system control mechanism is first validated using simulation
results. It is next validated via implementation on a Field Programmable Gate Array
(FPGA) based hardware prototype and appropriate experimental testing. It was validated
by changing the contents of the LUT and observing that the resulting changes in the
pixel mapping were always correct.
 
KEYWORDS: Large format displays, Look up table, FPGA, Image warping, Pixel 
mapping 
 
 
 
 
__Subhasri. Krishnan__ 
(Authors Signature) 
 
_____5 – 8 – 07________ 
(Date) 
 
A CONTROL MECHANISM TO THE ANYWHERE PIXEL ROUTER 
 
 
 
 
By 
 
 
Subhasri Krishnan 
 
 
 
 
 
 
 
 
 
 
 
 
_____Dr. Robert J Heath__________ 
(Co-director of Thesis Signature) 
 
_____Dr. Ruigang Yang___________ 
(Co-director of Thesis Signature) 
 
_____Dr. Yuming Zhang___________ 
(Director of Graduate Studies Signature) 
 
____ _____5 – 8 – 07_______________ 
(Date) 
 
 
RULES FOR THE USE OF THESES 
 
 
Unpublished theses submitted for the Master’s degree and deposited in the University of 
Kentucky Library are as a rule open for inspection, but are to be used only with due 
regard to the rights of the authors. Bibliographical references may be noted, but 
quotations or summaries of parts may be published only with the permission of the 
author, and with the usual scholarly acknowledgments. 
 
Extensive copying or publication of the thesis in whole or in part also requires the 
consent of the Dean of the Graduate School of the University of Kentucky. 
 
A library that borrows this thesis for use by its patrons is expected to secure the signature 
of each user. 
 
 
THESIS 
 
 
 
 
 
 
 
 
 
 
Subhasri Krishnan 
 
 
 
 
 
 
 
 
 
 
The Graduate School 
 
University of Kentucky 
 
2007 
 
 
 
 
 
 
 
 
 
A CONTROL MECHANISM TO THE ANYWHERE PIXEL ROUTER 
 
 
 
 
_______________________________________ 
 
THESIS 
_______________________________________ 
 
A thesis submitted in partial fulfillment of the 
requirements for the degree of Master of Science in the 
College of Engineering 
at the University of Kentucky 
 
 
 
By 
 
Subhasri Krishnan 
 
Lexington, Kentucky 
 
Co-Directors: Dr. J. Robert Heath, Associate Professor of Electrical and Computer 
Engineering  
and Dr. Ruigang Yang, Assistant Professor of Computer Science 
 
Lexington, Kentucky 
 
 
2007 
 
Dedicated to 
Lord Vishnu and Thayar 
 
 
 
ACKNOWLEDGEMENTS 
I would like to thank Dr. Ruigang Yang for providing me with the opportunity to work on
this project. I am also indebted to him for his kindness to me throughout my time at the
graphics lab. I thank Dr. J. R. Heath for his guidance and support while I was writing this
thesis. I also thank Dr. Elias for having agreed to serve on my committee.

I am extremely thankful to my mom, dad and my brother for their support
throughout my Master’s degree. I am grateful to them for keeping me inspired,
motivated and focused in my research and coursework. I couldn’t have made it this far
without their constant enthusiasm and complete belief in me.

I am also thankful to all my friends.
 
 
TABLE OF CONTENTS 
 
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 : Introduction
1.1 Background
1.2 Motivation for Research
1.3 Positioning of Research
1.4 Design Approach and Project Goal
Chapter 2 : Concepts Relating to Pixel Router
2.1 Basic Terms
2.2 Image Transformations
2.3 Research and Development
2.4 Research, Development and Implementation Goal
Chapter 3 : VGA Timing Parameters
3.1 Basics of Frames
3.2 DVI and VGA Timing Signals
3.3 Memory Constraints
Chapter 4 : Implementation Details
4.1 Hardware Implementation Platform
4.2 Need for External Memory
4.3 Direct Mapping Vs Reverse Mapping
4.4 Example of Image Transformation
4.5 Calculation of Required Bandwidth of Memory Controller
4.6 Calculation of Actual Bandwidth of Memory Controller
4.7 Overall System Inside FPGA
4.8 Dynamic Controller State Diagram
4.9 Design Capture, Synthesis and Implementation
Chapter 5 : Display Controller
5.1 Function of Display Controller
5.2 Overall Description
Chapter 6 : Memory Controller and Design Aspects
6.1 SDR-SDRAM
6.2 Basic Memory Terminology
6.3 Memory Retention and Refresh
6.4 Memory Initialization
6.5 Mode Register Contents
6.6 Isolated Memory Architecture
6.7 Core Controller
6.8 Detailed View of the Memory Controller
Chapter 7 : Image Warping
7.1 Warping Algorithm
7.2 State Machine Controller
7.3 Detailed View of the Image Warping Controller
7.4 Small Functional Units and Circuits Used in the Design
7.4.1 Address Generation Module
7.4.2 FIFO
7.4.3 Multiplexing rows, columns, data and valid signals
7.4.4 Tracking Data
Chapter 8 : Simulation and Image Results
8.1 Validation Tools
8.2 Test Conditions
8.3 Initialization Sequence
8.4 LUT Storage
8.5 Validation of Values Written into LUT
8.6 Rapid Operations During Non-Active Display
8.7 Validation of Image Warping Stages
8.8 Validation of Memory Operations
8.9 Simulation Validation of Overall System Organization, Architecture, Design and Performance
8.10 User Constraint File
8.11 Validation Using Image Results
Chapter 9 : Conclusion and future work
9.1 Summary
9.2 Conclusion
9.3 Future Work
9.3.1 Real-Time Images
9.3.2 Resolution
9.3.3 Speed
9.3.4 Scalability
APPENDIX
REFERENCES
VITA
 
 
LIST OF FIGURES 
 
Figure 1-1 Example of Seamless Projector Display
Figure 2-1 Image Display
Figure 2-2 Identity Transform
Figure 2-3 Rotation Transform
Figure 2-4 Alpha Blending in Multi-Projectors
Figure 2-5 Overall Diagram
Figure 2-6 Hooking Multiple Boards
Figure 2-7 High-Level Schematic of Design on Board
Figure 3-1 Progressive Scanning
Figure 3-2 Horizontal and Vertical Active and Blanking Signals
Figure 3-3 Vertical Synchronization signal
Figure 3-4 Horizontal Synchronization Signal
Figure 4-1 Direct Mapping
Figure 4-2 Reverse Mapping
Figure 4-3 Example of transformation
Figure 4-4 Concept of Image Transformation
Figure 4-5 Overall System Inside FPGA
Figure 4-6 State Machine of Overall Dynamic Controller
Figure 5-1 Connection of VGA Port to the FPGA
Figure 5-2 Functional View of Display Controller
Figure 6-1 Overall Memory Architecture
Figure 6-2 Inside a Memory Bank
Figure 6-3 Mode Register Contents
Figure 6-4 SDR-SDRAM Chip Power-Up or Initialization Sequence
Figure 6-5 Non-Pipelined Operations
Figure 6-6 Pipelined Operations
Figure 6-7 Address Pipelining Stage
Figure 6-8 State machine for the Memory Controller
Figure 6-9 Detailed View of the Memory Controller
Figure 7-1 State Machine for Image Warping
Figure 7-2 Inside the Image Warping Controller
Figure 8-1 Power-Up Sequence for SDRAM
Figure 8-2 First Warping Cycle
Figure 8-3 LUT Frame Validation
Figure 8-4 LUT Frame Validation after Interruption by Refresh Operation
Figure 8-5 Multiplexing Data, Row, Column and Valid Signals
Figure 8-6 Transition from Writing LUT to Writing Image Frame
Figure 8-7 Operations during Active Vs Blank Display Time
Figure 8-8 Transition from Warping to Writing Input Frame and Scan Out Enabled
Figure 8-9 The LUT Read Stage in Image Warping Cycle
Figure 8-10 The Image Read Stage in the Image Warping Cycle
Figure 8-11 The Image Write Stage in the Image Warping Cycle
Figure 8-12 Memory Operations Involved in Writing an Input Frame
Figure 8-13 Memory Operations during Image Warping
Figure 8-14 Post Place and Route Simulation Waveform for 6 Seconds of Operation
Figure 8-15 (a) Simulated Input Image (b) Identical Transformed Output Image
Figure 8-16 (a) Shifted Output Image (b) Output Image Rotated 45° with respect to Origin
Figure 8-17 (a) Output Image Rotated 45° with respect to Center (240, 320) (b) Output Image Rotated -45° with respect to Center
Figure 9-1 Hardware Used for Testing Design
LIST OF TABLES 
 
Table 3.1 Resolutions and Corresponding Pixel Clock
Table 6.1 Memory Commands and their Description
Table 6.2 Burst Type
Table 6.3 Burst Length
Table 6.4 CAS Latency
Table 6.5 Operating Mode
Table 6.6 Write Burst Mode
Table 7.1 Configuration Option While Using the Core Generator
Table 7.2 Signals Used in FIFO Module
Table 8.1 Binary Representation of sd_row_wr_lut Signal
Table 8.2 Decimal Representation of LUT Rows
Table 9.1 Device Utilization Summary
Chapter 1 :  Introduction 
 
This chapter introduces large-format displays and multi-projector systems. It provides the 
motivation for the research. It also describes the problem at hand and discusses the 
current status of this project. 
 
1.1 Background 
Displays are the visual interface to electronic machines. A display is necessary to view
video. At higher resolutions, video has better quality [refer to Section 2.1 for a
definition of resolution]. Large-format displays are high-resolution displays with a screen
size that is generally greater than 30 inches. Typically, two types of displays are used:
flat panels and projectors. In recent years, projectors have become more popular as their
price has steadily decreased.
 
On the scientific front, large-format displays are used in visually intensive applications
like virtual reality and immersive environments [17]. Lately, much effort has been
directed into using a cluster of projectors to achieve large-format displays with higher 
resolutions. Figure 1-1 illustrates a large-format display created by using an arrangement 
of projectors. In this figure, the image represents the tiled final image. The final image is 
constructed from four split images, with two red screens on top and two blue screens at 
the bottom. It is to be noted that the split images have some common regions. This 
redundancy is required so that while tiling the images, some region can be overlapped. In 
this project, the term ‘input image’ usually refers to the split images. The output image is 
the final tiled image. 
 
1.2 Motivation for Research 
Clearly, large-format displays are very useful. Examples include the display of larger images
and 3D displays [16]. Traditionally, such displays are implemented using software.
Software algorithms written for large-format displays use a Graphics Processing Unit
(GPU) to process the images. This is an inexpensive approach, as special hardware is not
required. However, since GPUs are developed with more generic applications in mind, the
results are slow. The alternative is to use hardware that is developed for this special purpose.
The development costs incurred are higher in this case. The process of tiling two images
together can be referred to as an image transformation [Section 2.2]. While it is easy to
implement transformations that can be described using equations, it is not
easy to implement other transformations that are non-algorithmic.
 
 
Figure 1-1 Example of Seamless Projector Display (courtesy: image taken from the project report of Sifang Li, University of Kentucky, www.vis.uky.edu)

1.3 Positioning of Research
As mentioned earlier, multi-projector displays have been achieved before, but this has
chiefly been carried out using software. To the author’s knowledge, only one
multi-projector system exists in which hardware has been used to achieve large-format displays.

The Lightning-2 [6] system developed at the graphics lab at Stanford University again
uses multiple projectors to achieve large-format displays. However, it is targeted towards 
3D display. A comparison can be made without involving the mechanism of 3D display. 
Lightning-2 uses hardware to perform the image transformation (commonly referred to as 
warping). However, image warping is performed using a ‘forward mapping technique’ 
[Section 4.3]. Headers transmitted in sets of 2 pixels in the input-space specify the 
transformation and the number of pixels that the transformation is used for. Every time 
such headers are encountered, the pixels are transformed accordingly. This is designed 
specifically for systems with coarser granularity or systems designed with block-based 
warping in mind. A block is just a region of pixels. It implies that several pixels, usually 
neighbors, undergo similar transformation. However, if every pixel were to be mapped 
completely arbitrarily the design would suffer excessive overhead.  
 
A second system was designed but not implemented. The Metabuffer [7] was designed at
the University of Texas at Austin. It composites images from Commercial Off-the-Shelf
(COTS) rendering engines and is again targeted at 3D display and designed to support
multi-resolution. Rendering is the process of generating an image from a model using
computer programs. The model is a description of three-dimensional objects in a strictly
defined language [8]. The Metabuffer claims a constant time for image warping irrespective
of the geometry, although latencies are involved.
 
In general, warping is carried out on blocks of pixels, and non-block based warping
is difficult. Neither of the approaches described above is meant for arbitrary, non-block
based warping, where any pixel in the input space can be mapped to any output pixel.
 
1.4 Design Approach and Project Goal 
This thesis develops a novel idea to achieve large-format displays where the 
transformation is specified using a LUT. 
 
Routing the pixels from their positions in the small images to the large image requires
knowledge of where the pixels should end up. However, the input images are not always
tiled the same way. For example, assume an image, I1(a, b), is always routed to the left
side of the output image O(x1, y1) and another image, I2(c, d), is always routed to the
right side of the larger output image O(x1, y1). There is a region of overlap between the
images so that the larger image doesn’t appear tiled. Therefore, for large-format
displays, given an output image location, O(x_i, y_j), the input location, I1(a_m, b_n) or
I2(c_s, d_t), can be readily calculated. These routing values are pre-computed and stored in
a table in the hardware. During warping, they are easily looked up from the LUT.
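
The pre-computation of routing values can be illustrated with a minimal software sketch. It assumes two input images of equal size placed side by side with a fixed horizontal overlap; the function name `build_tiling_lut`, the tuple layout and the overlap geometry are hypothetical choices made for illustration and are not taken from the hardware design.

```python
def build_tiling_lut(in_width, height, overlap):
    """Pre-compute, for every output pixel, which input pixels feed it.

    Assumption: image 1 covers output columns [0, in_width) and image 2
    covers output columns [in_width - overlap, 2 * in_width - overlap).
    Each LUT entry is a list of (image_index, src_x, src_y) tuples; entries
    in the overlap region contain two tuples.
    """
    out_width = 2 * in_width - overlap
    right_start = in_width - overlap      # first output column fed by image 2
    lut = []
    for y in range(height):
        row = []
        for x in range(out_width):
            sources = []
            if x < in_width:              # output pixel lies inside image 1
                sources.append((1, x, y))
            if x >= right_start:          # output pixel lies inside image 2
                sources.append((2, x - right_start, y))
            row.append(sources)
        lut.append(row)
    return lut


# Two 4-pixel-wide, 1-row images with a 2-pixel overlap give a 6-pixel row.
lut = build_tiling_lut(in_width=4, height=1, overlap=2)
```

In this sketch the table is computed once, and the warping step only performs look-ups, which mirrors the idea of storing routing values in hardware.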
 
Although the primary application in using a cluster of projectors lies in creating large-
format displays, a secondary application is in displaying 3D images. One of the stages 
prior to display of 3D images is warping. By using LUTs as described above, to specify 
the warping function, this can be achieved. Also, using LUTs to describe the 
transformation introduces a new range of possibilities where the designer is not limited 
by the complexity of the function or the non-algorithmic nature. 
 
Ultimately, the concept of an ‘Anywhere Pixel-Router’ can be used to implement the idea
developed in this thesis. However, as the development of the ‘Anywhere Pixel-Router’ is 
still under way, in this project, a single input-image single output-image system that 
implements arbitrary image warping on the input image, based on the contents of the 
LUT, is developed and tested. This thesis is developed with large-format displays in 
mind. The display of 3D images and other applications are beyond the scope of this 
document. 
 
Chapter 2 :  Concepts Relating to Pixel Router 
 
This chapter covers some general terminology used in graphics. An advanced reader can 
skip this chapter except the last section where the overall architecture of the design 
developed in this thesis is discussed. 
 
2.1 Basic Terms 
In Figure 2-1, a region of the image containing the alphabetic character ‘A’ is zoomed
into. As can be seen from the zoomed image, smaller dots constitute the character. Each
round dot is referred to as a pixel. Alternatively, the pixel is the smallest area that can be
illuminated on the monitor. Processing images means working with image frames.
Frames are two-dimensional arrays of pixels and vary in size. This size depends on the
resolution of the image. An image with a resolution of 640 x 480 has an
array of pixels with 640 columns and 480 rows. Each pixel contains an intensity value.
This indicates the color present on the screen at one location. The pixel has a 9-bit value
with 3 bits each for red, green and blue. The maximum number of colors
possible is then 2^9, or 512.
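
The 9-bit pixel format can be made concrete with a short sketch. The bit ordering (red in the most significant bits, blue in the least significant bits) and the helper names are assumptions made for illustration; the text above specifies only that each of the three colors uses 3 bits.

```python
BITS_PER_CHANNEL = 3
BITS_PER_PIXEL = 3 * BITS_PER_CHANNEL    # 9 bits per pixel
NUM_COLORS = 2 ** BITS_PER_PIXEL         # 512 distinct colors


def pack_rgb333(red, green, blue):
    """Pack three 3-bit channel values (0..7) into one 9-bit pixel value.

    Assumed layout: bits 8..6 red, bits 5..3 green, bits 2..0 blue.
    """
    for value in (red, green, blue):
        if not 0 <= value <= 7:
            raise ValueError("each channel must fit in 3 bits (0..7)")
    return (red << 6) | (green << 3) | blue


def unpack_rgb333(pixel):
    """Split a 9-bit pixel value back into its (red, green, blue) channels."""
    return (pixel >> 6) & 0b111, (pixel >> 3) & 0b111, pixel & 0b111


def frame_bits(columns, rows):
    """Number of bits needed to hold one frame at 9 bits per pixel."""
    return columns * rows * BITS_PER_PIXEL
```

For a 640 x 480 frame, `frame_bits(640, 480)` evaluates to 2,764,800 bits under these assumptions.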
 
                     
[Figure: image showing the characters A to I and the digits 1 to 0, with one region zoomed in to show an individual pixel]
 
Figure 2-1 Image Display 
 
In real time, images are transferred through the ports and conform to certain timing
specifications. The timing depends on the type of port, namely VGA or DVI. The timing
information is discussed in detail in Chapter 3. In this project, an image stream refers to the
flow of pixels.
 
2.2 Image Transformations 
For most applications the image streams can be processed or transformed using 
transformation functions. These functions map a pixel from the input space to a pixel in 
the output space. Figure 2-2 shows a simple example. An input image frame is shown on 
the left side of the figure. The right side shows the warped image, which is identical to the
input image. The transformation can be represented as
I_warp(x, y) = I(x, y)
where the x coordinate represents the column and the y coordinate represents the row.
 
[Figure: input frame showing A B C D E F G H I and 1 2 3 4 5 6 7 8 9 0 on the left; identical warped frame on the right]
 
Figure 2-2 Identity Transform 
 
Figure 2-3 shows another example. A similar input image frame is seen. Here the output
shows a rotated version of the input image. The input has been rotated so that the first
few columns are shifted to the end. The transformation function can be written as
I_warp(x, y) = I(x + 210, y)  for 0 ≤ x < 480
I_warp(x, y) = I(x - 480, y)  for 480 ≤ x < 640
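
A column rotation of this kind can be sketched in software as follows. The sketch assumes that the shift wraps around the frame width, so that the two cases of the piecewise definition collapse into a single modulo expression; the function name `shift_columns` and the small 9-column test frame are illustrative only.

```python
def shift_columns(frame, shift):
    """Rotate the columns of a frame to the left by `shift` positions.

    Output pixel (x, y) is taken from input column (x + shift) mod width,
    so columns pushed off the left edge reappear on the right. A shift of 0
    gives the identity transform.
    """
    width = len(frame[0])
    return [[row[(x + shift) % width] for x in range(width)] for row in frame]


# A one-row frame holding the characters used in the figures.
frame = [list("ABCDEFGHI")]
rotated = shift_columns(frame, 2)
print("".join(rotated[0]))   # prints CDEFGHIAB
```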
 
Both figures show transformations that can be described using equations. There are
other transformations that cannot be described by equations. The approach used in this
thesis can be easily extended to any transformation, irrespective of whether it can be
described by equations or not.
 
[Figure: input frame showing A B C D E F G H I and 1 2 3 4 5 6 7 8 9 0; output frame showing C D E F G H I A B and 3 4 5 6 7 8 9 0 1 2]
 
Figure 2-3 Rotation Transform 
 
Figure 2-4 illustrates how a transformation function can be carried out on two images
instead of a single image. Here two images from two projectors are tiled to form a single
larger image. The outputs of the two projectors are overlapped to form the original larger
image. However, as seen in the top part of the figure, the overlapped region is brighter
because it has higher intensity. Alpha
blending is performed so that the overlapped region has the same intensity as those
regions which are not overlapped. An opacity value, 'α', is used to control the intensity
values of the two input pixels that end up at a single output pixel. For a detailed study of
alpha blending please refer to [18]. After blending, the image has uniform intensity
throughout, as shown in the bottom part of the figure. Here tiling and alpha blending
determine the LUT. In general, two input images can be warped depending on the
contents of the LUT.
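
The effect of alpha blending on an overlapped pixel can be sketched as below. The weighted-sum form used here, with weights α and 1 - α, is the common textbook formulation and is an assumption; the exact blending function used for a given display is determined by the LUT contents, and the function name `alpha_blend` is hypothetical.

```python
def alpha_blend(pixel_1, pixel_2, alpha):
    """Blend two input intensities that land on the same output pixel.

    alpha weights pixel_1 and (1 - alpha) weights pixel_2, so the result
    never exceeds the larger of the two inputs.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * pixel_1 + (1.0 - alpha) * pixel_2


# Both projectors show intensity 6 in the overlap region.
unblended = 6 + 6                      # intensities add, region looks brighter
blended = alpha_blend(6, 6, 0.5)       # equal weights restore intensity 6.0
```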
 
The easiest way to perform warping is to fill in the pixels of the output frame and 
determine the input pixel location. This technique is called reverse mapping [Section 
4.3]. This is the technique that will be used in the controller developed in this thesis. 
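
Reverse mapping driven by a LUT can be sketched in a few lines of software. This is an illustrative model only, not the hardware controller: the names `warp_with_lut` and `identity_lut`, the use of `None` for unmapped output pixels and the background value are all assumptions.

```python
def identity_lut(width, height):
    """LUT in which every output pixel (x, y) reads input pixel (x, y)."""
    return [[(x, y) for x in range(width)] for y in range(height)]


def warp_with_lut(frame, lut, background=0):
    """Fill the output frame pixel by pixel using reverse mapping.

    lut[y][x] holds the input location (src_x, src_y) for output pixel
    (x, y), or None if no input pixel maps there.
    """
    height = len(lut)
    width = len(lut[0])
    output = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            source = lut[y][x]
            if source is not None:
                src_x, src_y = source
                output[y][x] = frame[src_y][src_x]
    return output


frame = [[1, 2, 3],
         [4, 5, 6]]
same = warp_with_lut(frame, identity_lut(3, 2))
# A horizontal mirror, expressed purely as LUT contents.
mirror_lut = [[(2 - x, y) for x in range(3)] for y in range(2)]
mirrored = warp_with_lut(frame, mirror_lut)
```

Because every output pixel is visited exactly once, no output location is left unfilled, and changing the transformation only requires changing the LUT contents.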
 
2.3 Research and Development 
Ideally, it would be desirable to have a functional unit like the ‘Anywhere Pixel-Router’ 
shown in Figure 2-5. The pixel router would be an image compositor with DVI input 
and output ports, memory chips to store the large frame buffers and a processor to 
perform image transformations. The inputs are connected to commercial off-the-shelf 
(COTS) Graphics Processing Units (GPUs) inside a computer and the outputs are 
connected to projectors. The input image streams from the DVI input ports are 
processed by the processor and can be temporarily stored in the memory. The processed 
images are then scanned out to the projectors, which display them. Similarly, a 
goal is that Pixel Routers can be configured as shown in Figure 2-6 to 
handle four input images and four output images.  
 
 
 
Figure 2-4 Alpha Blending in Multi-Projectors. 

Figure 2-5 Overall Diagram 

Figure 2-6 Hooking Multiple Boards 
 
However, the current single input-image single output-image system developed in this 
thesis has no provision for an image input. Figure 2-7 shows the high-level 
schematic representing the design of the system of this thesis and its mapping to a 
commercial prototyping board. 
 
The components present on the board include a single VGA output port for providing the 
display, an SDRAM chip required for image storage and a Field Programmable Gate 
Array (FPGA) device which processes the images and controls the other components. 
The architecture is dealt with in detail in the subsequent chapters. 
 
 
[Figure 2-7 block labels: FPGA, VGA Controller, SDR-SDRAM Memory Controller, Warping 
Controller, 256 Mbit SDR-SDRAM, VGA output port] 

Figure 2-7 High-Level Schematic of Design on Board 
 
2.4 Research, Development and Implementation Goal  
This thesis verifies the theory of non-block based image warping based on the LUT 
approach. A single input image frame is transformed based on a LUT stored in the 
memory and displayed. As there is no input port to the system of Figure 2-7, the input 
images are simulated. For a single input image, different warping functions are applied 
to confirm that arbitrary image warping is indeed possible. There are two 
functions that need to be developed and implemented. One is the warping of the input 
image. The second is the continuous display of the image as it is being warped, at a scan 
out rate of 60 frames per second (fps).  
Chapter 3 :  VGA Timing Parameters 
 
A lot of the design deals with VGA timing signals and the scanning out of images to be 
displayed. Therefore a chapter is necessary to provide the background related to VGA 
timing signals. 
 
3.1 Basics of Frames 
Image frames are scanned out at a high frequency, typically 60 Hz (60 frames per 
second), to produce video. It is imperative that this rate is met. If the frame rate 
is any lower, the human eye can detect the changes between frames as flicker, and the 
display may also shut off. Although frame rates could be 
higher, the nominal frame rate of 60 Hz suffices. 
  
In the case of computers, scanning is progressive. Progressive scanning, as the name 
indicates, is sequential in nature. The image is scanned out in the same way a page of 
text is read, i.e. from left to right and top to bottom. This is shown in Figure 3-1, where 
the direction of scanning is indicated by arrows.  
 
There are two processes involved in scanning. One is the process of going to the next 
pixel position and the second is the illumination of the pixel. Scanning is carried out as 
indicated in Figure 3-2. Lines A, C, E etc. are scanned out one after the other. The 
horizontal line is scanned (A-B) and pixels are displayed. The next trace is meant to start 
at the next scan line C. So the position is changed from B to C by retracing and moving to 
the right position. During this period (B-C), no pixel is displayed. This retrace is known 
as the horizontal blanking period. At the end of the frame, the position is retraced from 
the last line to the first line (O-A). This is called the vertical blanking period. The period 
during which pixels are scanned out is known as the active period. 
 
Figure 3-1 Progressive Scanning 

[Figure 3-2 labels: scan line end points A, B, C, D, E, F, ..., L, M, N, O; Horizontal 
Retrace; Vertical Retrace] 

Figure 3-2 Horizontal and Vertical Active and Blanking Signals 
 
During the blanking interval black pixels are transmitted. In the middle of the horizontal 
blanking interval, a horizontal sync pulse is transmitted. The part of the blanking interval 
before the sync pulse is known as the front porch, and the part after the sync pulse is known 
as the back porch. The timing of these porches is fixed, like the timing of the other 
parameters, and is specified in the VGA standard [13]. Since there are both horizontal and 
vertical blanking intervals, the porches are referred to as horizontal and vertical porches 
respectively. 
 
In the waveforms for the vertical (Figure 3-3) and horizontal (Figure 3-4) synchronization 
signals, ‘A’ represents the active video part, ‘B’ represents the front porch, ‘C’ represents 
the synchronization pulse and ‘D’ represents the back porch. 
 
 
[Figure 3-3 labels: Hsync, Vsync, regions A B C D] 

Figure 3-3 Vertical Synchronization signal 

[Figure 3-4 labels: Hsync, regions A B C D] 

Figure 3-4 Horizontal Synchronization Signal 
 
3.2 DVI and VGA Timing Signals 
A cable connects the video output of a computer to a monitor or projector. 
The cable plugs into a DVI or a VGA port. The timing signals of the two are very similar 
in nature, although VGA timings are stricter. The main difference is that VGA ports use an 
analog standard while DVI ports support a digital standard. In DVI, each color component, 
Red, Green and Blue (RGB), can be represented by 8 bits. That is, DVI can carry 24-bit 
color, so in all 2^24 or 16,777,216 colors can be displayed. The screen resolution can range 
from 640 x 480 up to 1280 x 1024. The VGA output used in this work supports up to 9 bits 
per pixel, or 512 colors in all. The DVI port also has more pins than the VGA port. Besides, 
it is easy to obtain real images from DVI input ports. However, a VGA output port is more 
readily available on existing starter video prototyping boards and is hence used in this 
design.  
 
The pixel clock frequency indicates the speed at which pixels are scanned to the 
display. The frame refresh rate is fixed at 60 Hz, so the higher the resolution, the 
more pixels have to be scanned out per frame and hence the faster the clock. A list of 
resolutions commonly supported by graphics cards and the corresponding pixel clock rates 
is shown in Table 3.1. 
Table 3.1 Resolutions and Corresponding Pixel Clock 

Resolution           Pixel clock (MHz) 
640 x 480, 60 Hz      25.175 
800 x 600, 60 Hz      40.000 
1024 x 768, 60 Hz     65.000 
1280 x 1024, 60 Hz   108.000 
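
The relation between pixel clock and refresh rate can be checked with a short calculation. The pixel clock must cover the blanking intervals as well as the active pixels, so the frame size used is the total size. The total of 800 x 525 for the 640 x 480 mode is the commonly published VGA timing and is an assumption here, not a figure quoted in this chapter; the function name is illustrative.

```python
def refresh_rate_hz(pixel_clock_hz, h_total, v_total):
    """Frames per second implied by a pixel clock and the total (active + blanking) frame size."""
    return pixel_clock_hz / (h_total * v_total)

# 640 x 480 active pixels inside an assumed 800 x 525 total frame.
rate = refresh_rate_hz(25.175e6, 800, 525)
active_fraction = (640 * 480) / (800 * 525)
```

Under these assumptions the 25.175 MHz clock gives a refresh rate just under 60 Hz, and roughly a quarter of each frame period is spent in blanking.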
 
 
This project works at 640 x 480, the lowest resolution in Table 3.1, at which the image 
quality can still be considered acceptable.  
 
3.3 Memory Constraints 
Memory used to store image frames is referred to as a frame buffer. The bottleneck in 
the overall design of this project lies in the operation of the memory. If the resolution is 
640 by 480, 307,200 (640 * 480) pixel values are stored. Each entry holds an intensity 
value. Clearly, the required speed of operation of the memory depends on the 
resolution and pixel clock: as the pixel clock frequency increases, the memory has 
to read and write pixels faster. To give an approximate idea of real-time operation, 
consider that an input image frame is being written into the memory. At the same time the 
image frame is also read from the memory and scanned out to the display through the VGA 
port. So at the minimum resolution, with the pixel clock working at 25 MHz, two operations 
are required per pixel: writing the frame to the memory and reading the same from 
the memory. The memory therefore has to operate at 50 MHz or more to make sure that there 
is no flickering or distortion in the display. This is only an 
approximation; since there are overheads involved, the memory has to operate at a 
higher speed than 50 MHz.  
 
Chapter 4 :  Implementation Details 
 
The background essential for understanding the idea behind the project has been 
established so far. In this chapter the reasons for choosing the current video system 
prototyping board are presented. Also, the theoretical bandwidth required to perform 
memory operations at the full speed of the overall design is calculated and compared with 
the actual bandwidth that can be achieved with the current hardware and software. The 
overall state diagram of the control algorithm implemented in the FPGA is included at the 
end of this chapter. 
 
4.1 Hardware Implementation Platform 
Traditional large-format displays implemented with software are slow, as mentioned in 
Chapter 1. The best solution is to use programmable logic, which is available in the 
market and is easier to work with than Application Specific Integrated Circuits (ASICs), 
yet can perform processing faster than software versions.  

Spartan 3 FPGAs from Xilinx Inc. are low cost FPGAs which can be used for initial test 
purposes. Many starter kits with peripheral devices are available. The Xess XSA-1000 
[1] prototype board was chosen as it has a VGA interface and approximately 256 Mb of 
memory on board.  
 
4.2 Need for External Memory 
Block RAMs are configurable, synchronous blocks of memory available inside the 
FPGA. Relatively large amounts of data can be stored here. In the Spartan 3 chip, 
XC3S1000 [2], the amount of memory available is 432 Kb. This memory can be 
accessed very rapidly, that is, the output is obtained in the clock cycle immediately after 
the one in which the input address is issued. 
 
Throughout this project an image resolution of 640 x 480 (VGA) is used, and each pixel can 
display up to 512 colors. Each pixel therefore requires 9 bits. 
Total memory required to store an image frame is, 
= Number of lines x Number of columns x Bits per location 
= 480 x 640 x 9 bits 
= 2700 Kb. 
Even the largest FPGA chip in the Spartan 3 family contains only 1872 Kb of Block RAM, 
still a lot less than required. Also, as the 
resolution of the images increases, the memory required will also increase. As the Block 
RAM is too small to hold a single image frame, memory chips of the type Single Data Rate 
– Synchronous Dynamic Random Access Memory (SDR-SDRAM) are used as frame 
buffers.  
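
The frame size arithmetic above can be reproduced with a few lines. This sketch assumes that "Kb" means 1024 bits, which is the interpretation that yields exactly 2700 Kb; the names are illustrative.

```python
BITS_PER_KB = 1024   # assumption: "Kb" taken as 1024 bits

def frame_bits(lines, columns, bits_per_pixel):
    """Number of bits needed to hold one image frame."""
    return lines * columns * bits_per_pixel

frame_kb = frame_bits(480, 640, 9) / BITS_PER_KB
fits_xc3s1000 = frame_kb <= 432    # Block RAM quoted for the XC3S1000
fits_largest = frame_kb <= 1872    # Block RAM quoted for the largest Spartan 3
```

Neither comparison holds, which is why an external frame buffer is needed even for a single 640 x 480 frame.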
 
4.3 Direct Mapping Vs Reverse Mapping 
There are two ways in which the input image can be transformed to the output image.  
In direct mapping, line after line of the input image is mapped to output locations as 
determined by the LUT. That is, the stored input image is read out sequentially and 
directed to the output frame. The output location is based on the LUT and can be 
anywhere in the image, so output locations can be overwritten. Each input pixel is 
mapped to at most one output pixel, as it is referred to implicitly and only once.  
 
 
[Figure 4-1 panels: INPUT IMAGE with three blocks (A B C / 1 2 3, D E F / 4 5 6, 
G H I / 7 8 9); LUT, which refers to output image addresses, with one block labeled 
"Copy to image within 4 corners (0,0), (212,0), (0,479) and (212,479)" and two blocks 
labeled "Copy to image within 4 corners (213,0), (425,0), (213,479) and (425,479)"; 
TRANSFORMED IMAGE showing the blocks A B C / 1 2 3 and G H I / 7 8 9] 

Figure 4-1 Direct Mapping 
In Figure 4-1, three image frames are seen. The leftmost represents the input image. The 
middle image shows the LUT, which contains the addresses of the output pixels. The 
address of the input image is implicit. That is, the pixel from the first location, (0,0), in 
the input image is mapped to the output location specified by the data contained in location 
(0,0) of the LUT. The LUT is roughly divided into three blocks. The first block in 
the LUT specifies that the corresponding block in the input image is mapped to the 
first block in the output image. The second block in the LUT specifies that the 
corresponding block in the input image is mapped to the second block in the output image. 
The third block in the LUT specifies that the corresponding block in the input image is 
also mapped to the second block in the output image. In this case, the third block of the 
input image overwrites what was previously written there. That is, the DEF block is 
overwritten by the GHI block. Also, nothing is written to the third block of the output 
image. It is important to note that the warping 
occurs in the same order as scan out. That is, every line is warped from left to 
right first, and then the next line below is warped, until the end of the frame is 
reached. 
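
A minimal sketch of direct mapping on a single line of nine pixels shows the overwrite behavior described for Figure 4-1. The function name `direct_map`, the one-dimensional layout and the `background` value for unwritten locations are assumptions made for illustration.

```python
def direct_map(input_pixels, lut, background=None):
    """Forward mapping: input pixel i is written to output location lut[i]."""
    output = [background] * len(input_pixels)
    for i, pixel in enumerate(input_pixels):   # input is read in scan order
        output[lut[i]] = pixel                 # later writes overwrite earlier ones
    return output

inp = list("ABCDEFGHI")
lut = [0, 1, 2, 3, 4, 5, 3, 4, 5]   # third block targets the second block's locations
out = direct_map(inp, lut)
```

The D E F pixels are written first and then replaced by G H I, while the last three output locations are never written.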
 
In reverse mapping, the output frame is written line after line and the input location is 
determined by the LUT. So, the same input pixel can be mapped to many output pixels. 
However each output location is written into only once and cannot be overwritten as it is 
referred to only once implicitly.  
 
[Figure 4-2 panels: INPUT IMAGE (A B C D E F G H I / 1 2 3 4 5 6 7 8 9 0); LUT, which 
refers to input image addresses, with three blocks each labeled "Copy from image within 
4 corners (0,0), (0,213), (479,0), (479,213)"; TRANSFORMED IMAGE showing the block 
A B C / 1 2 3 three times] 

Figure 4-2 Reverse Mapping 
Figure 4-2 is very similar to Figure 4-1 except that the LUT contains the address location 
of the input pixels. The addressing of the output image is implicit in this case. That is, the 
pixel at the first location, (0,0), in the output image is taken from the input location 
specified by the data contained in location (0,0) of the LUT. The LUT is divided 
into three blocks. All three blocks specify that the input is taken from the first block of 
the input image. This shows that the same input pixel can be referred to more than once. 
Since all the frames are stored in the memory, the memory becomes the bottleneck. The 
faster the values are written into and read from the memory, the faster the other 
operations are. In this project reverse mapping is followed. From now on, whenever warping 
is mentioned, it is implemented using reverse mapping. 
 
4.4 Example of Image Transformation 
Figure 4-3 shows an example of image transformation. In this figure, a 3 x 3 block of 
pixels from the input frame is shown which is transformed into the output. The first two 
columns in the input are shifted over to the next two columns in the output.  
 
 
 
In Figure 4-4, the 3 x 3 block shown as ‘Input Addressing’ is the input block and is 
labeled I00 to I22. The 3 x 3 block shown as ‘Output Addressing’ is the transformed output 
block labeled O00 to O22. The 3 x 3 block labeled L00 to L22 contains the LUT address. The 
first two columns in the input are shifted over to the next two columns in the output just 
like in the example. The final 3 x 3 output block is labeled with the input co-ordinates I00 
to I22 and shows the input mapping. Here the first column in the output is filled with the 
pixel in the first row, first column in the input image. 
[Figure 4-3 panels: Input image, Output image] 

Figure 4-3 Example of transformation 

Figure 4-4 Concept of Image Transformation 

The LUT is also shown. The output pixel at O00 is found by reading the corresponding 
entry from the LUT, i.e. L00. L00 contains the column and row address 
of the input pixel I00, stored in that order. This address is then read from the input frame 
and yields pixel I00. Similarly L01 contains the column and row address of input pixel 
I00. However, L02 contains the address of I01. In this way the entire 3 x 3 block is 
transformed from the input to the output frame. The entire transformation is carried out in 
the scan order of the output image. That is, each pixel in the output image is filled in 
from left to right and from top to bottom. 
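
The lookup just described can be sketched for the 3 x 3 block. Each LUT entry holds a (column, row) pair, in that order, as stated in the text. Row 0 of the LUT follows the text (L00 and L01 point to I00, L02 points to I01); applying the same pattern to rows 1 and 2 is an assumption, and the function name is illustrative.

```python
def reverse_map(input_frame, lut):
    """Fill the output in scan order; lut[r][c] = (column, row) of the source pixel."""
    rows, cols = len(lut), len(lut[0])
    output = [[None] * cols for _ in range(rows)]
    for r in range(rows):            # top to bottom
        for c in range(cols):        # left to right
            src_col, src_row = lut[r][c]
            output[r][c] = input_frame[src_row][src_col]
    return output

inp = [["I00", "I01", "I02"],
       ["I10", "I11", "I12"],
       ["I20", "I21", "I22"]]
# Assumed LUT: every row uses the pattern given in the text for row 0.
lut = [[(0, r), (0, r), (1, r)] for r in range(3)]
out = reverse_map(inp, lut)
```

Every output location is written exactly once, while the input pixel in column 0 of each row is read twice.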
 
4.5 Calculation of Required Bandwidth of Memory Controller 
The approximate speed at which the memory must operate, if scan out happens at 
60 fps and the input image has to be warped every time a new frame is scanned out, can 
be calculated as follows. A new frame is displayed every 1/60th of a second, that is, every 
16.67 ms. Assume that it takes 10 cycles to open and close a memory row [5] and that the 
warping is done in blocks of 256 values from the same memory row. Then, 

Total Number of cycles of operations to be done in 16.67 ms 
= No. of cycles required to do scan out + No. of 
cycles required to do warping 

In terms of memory operations this can be written as, 
= read values from scanout frame + (read LUT 
values + read corresponding image values + write 
transformed values into new frame) 

The 640 columns by 480 rows of a frame (307,200 values) are stored in the memory as 
600 rows of 512 columns (also 307,200 values). Pixels are accessed in blocks of 256 
values, so a frame consists of 307200 / 256 = 1200 blocks. Each of 
these can be transformed uniquely. The block size of 256 is chosen because the intermediate 
storage is not large enough to transform an entire frame in one go. 
 
 
= (1200 times read blocks of 256 values) + (1200 
times read blocks of 256 values + 640 x 480 times 
read input values based on LUT + 1200 times write 
blocks of 256 values) 
 
Assuming that read and write delays for blocks have almost the same number of clock 
cycles, 
= 3*{1200 *(number of clock cycles for reading a 
256 value block + delay associated with opening 
and closing a block)} + (640 x 480 x delay 
associated with opening and closing a row along 
with reading a value) 
 
If the delay associated with opening and closing a block is 10 clock cycles and reading a 
value out is 1 cycle, then 
= 3600*(256 + 10) + (640x480x11) 
= 3600(266) + (640 x 480 x 11) 
= 4,336,800 clock cycles. 
 
Total Number of cycles of operations to be done in 16.67 ms  
= 4,336,800 clock cycles 
Clock cycle period  
= 16.67 ms / 4336800 
= 3.84ns 
 
That is, a clock speed of about 260 MHz is required. This is just a preliminary number and 
doesn’t take into account the bandwidth required for performing refresh. The maximum speed 
supported by the memory chip in the hardware used is 166 MHz. So this project doesn’t 
attempt to optimize speed. It merely introduces and tests an algorithm to perform 
arbitrary image warping based on a LUT. All operations are run at 25 MHz for ease 
of operation. This slows down the warping operation considerably. As the system needs a 
dedicated scan out, image warping is accomplished during the time the memory 
controller is not required to fetch pixels for scan out.  
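
The cycle count of this section can be reproduced as a short calculation. It uses only the figures stated in the text (blocks of 256 values, 10 cycles of row open and close overhead, 1 cycle per value read); the variable names are illustrative.

```python
FRAME_PERIOD_S = 1 / 60
PIXELS = 640 * 480
BLOCK = 256
ROW_OVERHEAD = 10    # cycles to open and close a memory row, as assumed in the text

blocks = PIXELS // BLOCK                              # blocks per frame
block_cycles = 3 * blocks * (BLOCK + ROW_OVERHEAD)    # scan out read, LUT read, frame write
random_cycles = PIXELS * (ROW_OVERHEAD + 1)           # one row open/close per warped pixel
total_cycles = block_cycles + random_cycles
required_mhz = total_cycles / FRAME_PERIOD_S / 1e6
```

The randomly addressed input reads account for most of the cycles, which is why the worst case is dominated by the row open and close overhead.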
 
4.6 Calculation of Actual Bandwidth of Memory Controller 
The LUT contains the location of the input pixel in terms of row and column. Each 
memory location can only store 16 bits and as 22 bits are needed to completely address a 
single location, two LUT entries are used to store the entire address. So, instead of 1200 
blocks of length 256, 1200 blocks of length 512 are read from the LUT to form complete 
image addresses.  
 
At 25 MHz (40 ns period), the time spent on reading pixels for scan out is 1200 blocks of 
(256 + 10) clock cycles each, where the 10 cycles are the row open and close overhead 
assumed in Section 4.5.  
Time spent on scan out  
= 1200 x 266 x 40 ns 
= 12.768 ms 
 
Available warping time  
= Time taken to display a frame – Time taken to 
read pixels for scan out 
= 16.67ms – 12.768 ms 
= 3.902 ms. 
 
The total time needed to arbitrarily warp an image depends on the spatial location in the 
memory. In the worst case scenario, every sequential output location needs input pixels 
from different rows. Assuming this condition,  
 
Total time required for image warping  
= (read LUT values + read corresponding image 
values + write new frame) 
= [(1200 x 522) + (640 x 480 x 11) + (1200 x 266)] 
clock cycles 
≈ [3*(1200 x 266) + (640 x 480 x 11)] clock cycles 
≈ 3*(1200 x 266 x 40) + (640 x 480 x 11 x 40) ns 
≈ 173.4 ms 

Since only 3.902 ms of each frame period is available for warping, this takes about 
173.4/3.902 or 45 frame periods. Ideally the warping should be performed in a 
single frame. It is assumed here that the warping is completely arbitrary, that is, the worst 
case scenario. 
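
The timing budget at 25 MHz can be checked with the following sketch. It assumes that each scan out block costs 256 + 10 cycles, which is the count that reproduces the 12.768 ms figure, and it divides the warping time by the time left over in each frame period. Variable names are illustrative.

```python
CLOCK_NS = 40                  # 25 MHz
FRAME_MS = 16.67
PIXELS = 640 * 480
BLOCKS = PIXELS // 256         # 1200 blocks per frame

scan_out_ms = BLOCKS * (256 + 10) * CLOCK_NS / 1e6
available_ms = FRAME_MS - scan_out_ms

# LUT read (two 16-bit entries per address), worst-case input reads, frame write
warp_cycles = BLOCKS * (512 + 10) + PIXELS * 11 + BLOCKS * (256 + 10)
warp_ms = warp_cycles * CLOCK_NS / 1e6
frames_needed = warp_ms / available_ms
```

With these numbers a worst-case warp spans a little over 44 frame periods, that is, it completes during the 45th.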
 
4.7 Overall System Inside FPGA 
The Spartan 3 FPGA interfaces to the VGA port and the DRAM memory on the board as 
shown in Chapter 2. 
 
Figure 4-5 shows the Overall Block Diagram of the system described in the FPGA. There 
are three main sections to this block diagram. The input image is warped continuously 
using the image warping controller. The warped images are displayed using the display 
controller. The data needed by these two controllers is furnished from the memory using 
the memory controller. Dashed arrows (red color) represent address buses. Solid arrows 
(green color) represent data buses. Double lined arrows (blue color) form an interface to 
the display device. Dotted arrows (black color) represent control by the memory. 
 
As seen in the figure, the memory controller reads requests from two Row, Column and Data 
(RCD) FIFOs: the scan out FIFO (SCANOUT FIFO) and the miscellaneous FIFO (MISC 
FIFO). FIFO stands for First In First Out and is dealt with in detail in Section 7.4.2. The 
scan out FIFO contains exactly what the name indicates: requests from the display 
sub-system (DISPLAY SYS) to read scan out data from memory. The miscellaneous FIFO 
contains data that may be written into the LUT, which comes from the FIFO that holds write 
requests to the LUT (WR_LUT_FIFO); input pixels to be written into the input frame, which 
come from the FIFO that holds write requests to the input frame (WR_IMG_FIFO); and pixels 
that are to be written into the final frame together with read requests to get LUT data 
from the memory, which are present in the read or write FIFO (RDWR_LUT_FIFO). The 
transformation of the input frame into the output frame by reading out the LUT values is 
performed by the image warping controller. Depending on the stage of operation [Section 
4.8], the corresponding request gets stored in one of the FIFOs.  
 
There are four frames present in the memory as frame buffers: the LUT, the input image 
frame and the two working buffers (Section 7.4.1). All of these frames fit into one of 
the memory banks, bank 0 (shown here on top of the other 3 banks). All of the 
other memory banks are unused in this system.  
 
[Figure 4-5 block labels: Input Generation, LUT Generation, Image warping, WR LUT FIFO, 
WR_IMG_FIFO, RDWR LUT FIFO, MISC FIFO (RCD), SCANOUT FIFO (RCD), Memory Controller, 
SCAN OUT DATA TRACKING CIRCUIT, DISPLAY SYS. (To Display), Memory Banks 0 - 3 with LUT, 
INPUT IMAGE, WORKING BUFFER 1 and WORKING BUFFER 2] 

Figure 4-5 Overall System Inside FPGA 
 
Once the memory controller is initialized (powered on), it is in the idle state. If there are 
outstanding requests to fetch scan out data from the memory, they are serviced 
immediately. When the scan out FIFO is empty, the write/read requests from the 
miscellaneous FIFO are serviced. Any data read from the memory belongs either to the display 
system or to the warping controller. This is decided by the scan out data tracking circuit. 
The scan out data tracking circuit is discussed in detail in Section 7.4.4. 
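
The priority rule between the two request FIFOs can be sketched in software as follows. This is only a behavioral illustration: the actual design is written in Verilog, and the function name and the request strings are hypothetical.

```python
from collections import deque

def next_request(scanout_fifo, misc_fifo):
    """Return (source, request) for the next memory operation, or None when idle."""
    if scanout_fifo:                       # scan out requests always win
        return ("scanout", scanout_fifo.popleft())
    if misc_fifo:                          # serviced only when the scan out FIFO is empty
        return ("misc", misc_fifo.popleft())
    return None

scanout = deque(["rd row 1800"])
misc = deque(["wr LUT", "rd LUT"])
order = [next_request(scanout, misc) for _ in range(4)]
```

Tagging each serviced request with its source mirrors the role of the scan out data tracking circuit, which decides where returned data belongs.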
4.8 Dynamic Controller State Diagram    
The state machine in Figure 4-6 controls the overall module flow. The state diagram 
should be interpreted as follows. Any arrow with no signal written near it doesn’t have 
conditional control flow. Any arrow with a signal next to it that starts with the signal 
name indicates an ‘if’ condition. Finally, any arrow with an ‘@ELSE’ next to it indicates 
what happens when the if condition is not satisfied.  
 
Initially the controller is idle, as indicated by the CONTR_IDLE stage. After memory 
initialization, the LUT is stored (LUT_STORE). The input image is stored next 
(IMAGE_STORE). This is followed by image warping (IMAGE_WARP). Once the first 
warping is done, a counter is enabled which starts the process of scan out. In the final 
stage, the TRACK_CNTR stage, scan out, once enabled, has the highest priority. If scan out 
isn’t busy then the next state is carried out. The LUT is stored only once, prior to image 
storage and warping. After that, only input image storage and warping are carried out, in 
turn. 
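
The sequence of stages can be sketched as a small transition table. The state names come from the text, but the exact transitions of Figure 4-6 are not reproduced here: in particular, the return from TRACK_CNTR to IMAGE_STORE is an assumption based on the statement that only image storage and warping repeat. The function name and the `scanout_busy` flag are illustrative.

```python
NEXT_STATE = {
    "CONTR_IDLE": "LUT_STORE",     # after memory initialization
    "LUT_STORE": "IMAGE_STORE",    # the LUT is stored only once
    "IMAGE_STORE": "IMAGE_WARP",
    "IMAGE_WARP": "TRACK_CNTR",
}

def step(state, scanout_busy=False):
    """Advance one transition; TRACK_CNTR waits while scan out is busy."""
    if state == "TRACK_CNTR":
        # Assumption: control returns to IMAGE_STORE, never to LUT_STORE.
        return "TRACK_CNTR" if scanout_busy else "IMAGE_STORE"
    return NEXT_STATE[state]

trace = ["CONTR_IDLE"]
for _ in range(7):
    trace.append(step(trace[-1]))
```

In this sketch `trace` visits LUT_STORE once and then cycles through IMAGE_STORE, IMAGE_WARP and TRACK_CNTR.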
 
4.9 Design Capture, Synthesis and Implementation  
The entire design is described using the Verilog Hardware Description Language (HDL) 
[20]. Verilog HDL is more flexible than the other commonly used HDL, Very High Speed 
Integrated Circuit HDL (VHDL). As this particular design involves coding of controllers 
and complex circuitry, Verilog is chosen. The code is large and is broken down into 
several modules in such a way that as few signals as possible need to be communicated 
between the individual, smaller modules. Hence a structural coding style is followed. 
 
 
Figure 4-6 State Machine of Overall Dynamic Controller 
 
Each individual module is designed at the Register Transfer Level (RTL) or at the 
behavioral level, depending on its complexity. The general hierarchy of the 
modules follows the overall block diagram shown in Figure 4-5 and is best understood 
with reference to it. The Verilog code of the entire design is listed 
in the Appendix. 
 
The design was synthesized using Xilinx XST, the synthesis tool of the 
Xilinx ISE CAD tool set. The results of the synthesis are not shown because of the large 
size of the resulting image. The FPGA configuration bit stream is loaded into the FPGA 
using the parallel port of the hardware.  
 
Chapter 5 :  Display Controller 
 
The overall block diagram and the overall controller state diagram have been described in 
detail and shown in Figure 2-7 and Figure 4-5. The three main sections of the overall 
system, namely the display controller, the memory controller and the image warping 
controller, are examined one by one in the next three chapters. This chapter discusses the 
function of the display controller and presents a detailed view of it. 
 
5.1 Function of Display Controller 
The display controller is the smallest controller in the design. The main 
purpose of the display controller is to facilitate the scan out from the final frame to the 
display device. Display of an output image has the highest priority. The VGA port is 
connected to the FPGA as shown in Figure 5-1. 
 
 
Figure 5-1 Connection of VGA Port to the FPGA 
 
As can be seen from the figure, the bits of the RGB components are converted to analog 
signals, and the 3 resistors corresponding to the 3 bits of each color component are 
tied together. The resistor values are chosen so that they form a certain ratio with each 
other.  
 
5.2 Overall Description 
An overall description of the display controller is presented along with the description of 
the different functional units. The functional view of the controller is shown in Figure 
5-2. The timing control circuit is the heart of the display controller. The output buffer 
address generation circuit generates the address to be read from the memory. The request 
FIFO stores these addresses and issues them to the memory. The result FIFO stores the 
pixel data read out from the memory. These pixels are then fed to the display system 
when required.  
 
The display controller implements the following mechanism for display. Whenever the 
number of pixels in the result FIFO is less than the number of pixels that might be used 
by the display system, more pixels are requested from the memory. This prevents FIFO 
underflow (Section 7.4.2). Whenever the number of pixels in the result FIFO nears the 
maximum amount of pixels that the FIFO can hold, pixels are no longer requested. This 
prevents FIFO overflow (Section 7.4.2). 
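
The request mechanism described above amounts to a fill-level check with two thresholds, which can be sketched as follows. The threshold values and the behavior between the two thresholds (keeping the previous decision) are assumptions; the text only specifies what happens near the empty and full ends.

```python
def memory_read_request(fifo_count, requesting, low_mark, high_mark):
    """Hysteresis on the result FIFO fill level; returns the new request flag."""
    if fifo_count < low_mark:      # too few pixels buffered: risk of underflow
        return True
    if fifo_count >= high_mark:    # nearly full: risk of overflow
        return False
    return requesting              # in between, keep the previous decision (assumed)

# Hypothetical thresholds for a FIFO that holds 256 pixels.
LOW, HIGH = 64, 240
start = memory_read_request(10, False, LOW, HIGH)
stop = memory_read_request(250, True, LOW, HIGH)
```

Keeping the previous decision between the thresholds avoids toggling the request on every pixel, which is one common reason for using two marks instead of one.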
 
The timing generator has four counters, Horizontal Synchronization Counter (HSC), 
Horizontal Active Counter (HAC), Vertical Synchronization Counter (VSC) and Vertical 
Active Counter (VAC). The timing generator generates the timing control information for 
the image to be displayed. Recalling from the third chapter (Section 3.1), pixels are 
displayed continuously during the active region. During the blanking period, pixels are 
not displayed. Whenever the timing control circuit indicates that it is in the active region, 
pixels present in the result FIFO are read out and scanned to the display system. During 
the blanking region, no pixel values are read out from the result FIFO.  
 
The address generation module selects one of the frame buffers. At any 
time there are two working frame buffers. While one of them is being warped, the other 
gets displayed. The ‘address select’ shown in the figure is a 13 bit number which is 
added to the initial address to identify the proper frame buffer. The initial address, 
produced by the address generator shown in the figure, runs from row 0 to row 599. 
The working buffers start at physical rows 1800 (rows 1800 to 2399) and 2400 (rows 2400 
to 2999). If the first buffer is to be selected, 1800 is added, and if the second buffer is 
to be selected, 2400 is added. The address select is changed every time a frame is 
completely warped and ready for scan out. The validation of this module is covered in 
Chapter 8. The resulting address is stored in the request FIFO to be sent to the memory 
controller, where the pixel value at that address is read out and returned as data.  
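
The buffer selection can be sketched as an offset added to the frame-relative row. The base rows 1800 and 2400 and the 600 rows per frame come from the text; the function names and the use of a 0/1 index for the displayed buffer are illustrative.

```python
ROWS_PER_FRAME = 600
BUFFER_BASE_ROW = (1800, 2400)   # physical start rows of the two working buffers

def physical_row(row, display_buffer):
    """Map a frame-relative row (0..599) to a physical memory row."""
    if not 0 <= row < ROWS_PER_FRAME:
        raise ValueError("row out of range")
    return row + BUFFER_BASE_ROW[display_buffer]

def swap(display_buffer):
    """Toggle the displayed buffer once a frame is fully warped."""
    return 1 - display_buffer
```

For example, `physical_row(599, 0)` gives the last row of the first working buffer and `physical_row(0, 1)` gives the first row of the second.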
 
The pixels that are read from the memory in response to the requests in the request FIFO 
end up in the scan out result FIFO. The memory read request signal depends on the number 
of pixels present in the result FIFO (FIFO count). When this signal is ‘high’ the result 
FIFO needs pixels from the memory. When the signal is ‘low’, the result FIFO doesn’t need 
pixels from the memory. 
 
 
[Figure 5-2 block labels: Output Buffer Address Generation, Address select, Address 
Generator, Frame done, Frame done signal, Request FIFO, Result FIFO, Memory read request, 
FIFO count, Interface to memory controller, Address to memory, Data from memory, Timing 
Control Circuit, VSC, HSC, HAC, VAC, HSC Dly, datarequest, Display System, DEMUX, Data, 
Red, Green, Blue, hsync, vsync, To VGA display port] 

Figure 5-2 Functional View of Display Controller 
 
The scan out frame done signal is generated by the ‘Frame done circuitry’. This signal 
indicates the completion of scan out of a frame. This signal is used for synchronizing the 
end of scanning out one frame buffer and beginning to scan out another frame buffer.  
 
 
The display system receives data from the result FIFO. This system generates the output 
signals that interface with the display device. The timing information generated by the 
timing control circuit is also used. Only when both the horizontal and vertical timing 
signals are ‘high’ is the display in the active region and the ‘datarequest’ signal high. 
Otherwise the display is in the blanking interval and the ‘datarequest’ signal is low. The 
‘datarequest’ signal acts as the read enable signal for the result FIFO. A demultiplexer 
(DEMUX) at the output splits the pixel data into the Red, Green and Blue (RGB) 
values and sends them out to the display port along with the horizontal and vertical 
synchronization signals.  
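
The two functions of the display system can be sketched behaviorally. The read enable is the logical AND of the two active signals, as stated above. The bit layout of the 9-bit pixel (Red in the top three bits, Blue in the bottom three) is an assumption, since the text does not give the ordering; the function names are illustrative.

```python
def data_request(h_active, v_active):
    """Read enable for the result FIFO: high only inside the active region."""
    return h_active and v_active

def demux_rgb(pixel):
    """Split a 9-bit pixel into 3-bit R, G, B (assumed layout: R in bits 8..6, B in bits 2..0)."""
    return (pixel >> 6) & 0x7, (pixel >> 3) & 0x7, pixel & 0x7

rgb = demux_rgb(0b101_010_111)
```

Each 3-bit component can then drive the three resistors of its color channel on the VGA port.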
 
Chapter 6 :  Memory Controller and Design Aspects 
 
In this chapter the Memory Controller of Figure 2-7 and Figure 4-5 is described. The 
memory controller interfaces the FPGA to the off chip memory. To understand the 
working of the controller, certain design aspects are presented first. The type of memory 
and its terminology are discussed. The different operations that can be performed in the 
memory are then described. The power-up sequence is also explained, along with other 
tasks that the controller needs to perform to ensure proper operation of the memory. 
Finally, the memory controller is described and a detailed view inside the 
controller is shown. 
 
6.1 SDR-SDRAM 
This acronym stands for Single Data Rate – Synchronous Dynamic Random Access
Memory. ‘Single Data Rate’ indicates that all inputs and outputs are sampled on only
one of the clock edges, i.e. either the positive or the negative edge. The word synchronous
implies that all operations are performed with respect to a clock edge (positive in this
case). Although an operation could be performed within a few nanoseconds, the next input
or output can be given or taken only once every clock period. For this reason, the memory
is usually operated at a higher clock rate. SDRAM is used as it is inexpensive and can
store large volumes of data.
 
In general, Dynamic Random Access Memories (DRAM) can store more data than Static
Random Access Memories (SRAM) per unit volume. This has to do mainly with the
internal architecture. In addition, the DRAM doesn’t require the row address and the
column address during the same clock, so both are multiplexed onto the same address
pins. Pin contacts on a chip take up much space, mainly due to the size of the bonding
pad. This multiplexing saves a number of pins, especially as the size of the memory
increases.
 
 
6.2 Basic Memory Terminology 
Memory is used to store data and this data is stored in the form of bits; either a ‘1’ or a ‘0’.
Each of these bits is stored in a cell. The memory chip consists of an array of cells, and
each location is identified by specifying an address. If the memory is configured as a
1-bit memory then a single address refers to the value of 1 cell. If it is configured as a
4-bit memory then the address refers to the value of 4 cells. The current configuration uses
a 16-bit memory, so 16 bits can be stored in or read from “one location”. SDRAMs have
multiple banks of memory (2 or 4) and this particular chip has 4 banks (Figure 6-1), each
consisting of 8192 rows with each row having 512 columns (Figure 6-2).
 
 
[Figure 6-1 diagram: four banks, Bank 0 to Bank 3, each with 8192 rows and 512 columns.]
 
Figure 6-1 Overall Memory Architecture 
 
Some of the operations that can be performed in a memory are listed in Table 6.1. With
memory operations the term “command issue” is more appropriate than “command
execution”, as the memory need not yield the result of the operation in the clock cycle
immediately following the issue. The time taken to finish the operation is expressed as a
latency, which is measured in clock cycles and therefore depends on the clock period.
Suppose a particular operation takes 60ns. With a 10ns clock period (100 MHz), 6 clock
cycles have to be waited out, and with a 40ns clock period (25 MHz), 1.5 clock cycles
have to be waited out. This is rounded up to 2 clock cycles.
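
The conversion from a delay in nanoseconds to a whole number of wait cycles can be
written out as below. This is a minimal sketch in Python; the function name is
hypothetical and is not taken from the design.

```python
import math


def wait_cycles(delay_ns, clock_period_ns):
    """Number of whole clock cycles that must elapse to cover delay_ns."""
    # A partial cycle cannot be waited out, so the ratio is rounded up.
    return math.ceil(delay_ns / clock_period_ns)
```

For the 60ns operation used as the example, `wait_cycles(60, 10)` gives 6 and
`wait_cycles(60, 40)` gives 2.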

[Figure 6-2 diagram: one memory bank drawn as a grid, with ROWS numbered 0 to 8191 (rows
1198 and 1199 are marked) and COLUMNS numbered 0 to 511.]
 
Figure 6-2 Inside a Memory Bank 
 
6.3 Memory Retention and Refresh 
The SDRAM is a volatile memory whose content is retained for only 64ms unless it is
refreshed. Refreshing is similar to reading out a value and storing it again in the same
location before it is lost. The entire memory has to be refreshed, and this can be done in
burst mode or in distributed mode. In burst mode, all 8192 rows are refreshed
continuously. In distributed mode, a refresh operation is carried out once every 7.81 µs
or less (64ms divided by 8192 rows), so that all rows are refreshed within the
retention time. The refresh command doesn’t need the specification of an address. The
address is generated by an internal counter. During refresh, no other commands can be
issued. There are two types of refresh, auto refresh and self refresh. Auto refresh is used
only during the normal mode of operation, whereas self refresh can be used even when the
SDRAM is in power-down mode. In power-down mode all the input and output buffers
except CKE are inactive. This mode is used to reduce the power dissipation.
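
The distributed refresh interval follows directly from the retention time and the number
of rows (64ms / 8192 rows = 7.8125 µs). The sketch below, in Python with hypothetical
function names, also converts that interval into a count for a refresh counter. Rounding
the count down is an assumption made here so that a refresh is never requested late.

```python
import math


def refresh_interval_us(retention_ms=64.0, rows=8192):
    """Longest allowed spacing between row refreshes in distributed mode."""
    return retention_ms * 1000.0 / rows


def refresh_interval_cycles(clock_mhz, retention_ms=64.0, rows=8192):
    """Refresh spacing in clock cycles, rounded down so refresh is never late."""
    return math.floor(refresh_interval_us(retention_ms, rows) * clock_mhz)
```

At 100 MHz this gives 781 cycles between refresh requests and at 25 MHz it gives 195.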
 
Table 6.1 Memory Commands and their Description of Operation

Name                  Description
NOP                   No Operation. The operation being performed is not interrupted.
                      Also no new operation is issued. Is just an idle cycle.
ACTIVE                Opens a new row in a bank. The row is specified with the address
                      pins. A time period of tRCD must be waited before issuing next
                      command. An active command can be issued to a bank only when
                      all the rows are in closed and idle state.
READ                  Reads out the value stored from the column address specified. The
                      row is the current active row. Read Result appears after a Delay.
WRITE                 Stores the input value at the column address specified. The row is
                      the current active row. Write operation is completed in a single
                      clock cycle.
PRECHARGE             Closes the current active row. A time period of tRP must be waited
                      before issuing next command. Precharge can be either auto or not.
                      With auto precharge, the row being accessed is automatically closed
                      after all accesses. Without auto precharge, the command must be
                      issued.
AUTO REFRESH          Refreshes Memory contents during normal mode. In distributed
                      mode, must wait tRFC before issuing subsequent command.
LOAD MODE REGISTER    Programs the mode of operation. Must wait tMRD before issuing next
                      command. Can only be issued when all banks are idle.
 
6.4 Memory Initialization
Once the bit file is configured into the FPGA, the memory should be initialized. The
process of initialization is meant to “wake-up” the memory and should not be
confused with setting the memory content. The power-up sequence is as follows and
is shown in Figure 6-4. NOP represents no operation stages. As seen from the figure,
1) Clock is issued to the clock pin of chip once the power is stable. Now the memory
enters the idle stage (INIT_NOP_IDLE).
2) A period of 200µs is waited out during which NOP (No operation) signals alone
are issued.
3) At the end of this period, all the banks are precharged (INIT_PRE_ALL). The
precharge period is waited out with two NOP cycles namely RP1 and RP2.
4) Two auto-refresh cycles are issued (INIT_REFRESH) and tRFC is waited out
during the INIT_NOP_WAIT stage.
5) The mode register is programmed during INIT_LOAD and the SDRAM is ready
for use after waiting the mode register NOP cycles, MRD1 and MRD2.

6.5 Mode Register Contents
The memory mode register value M12 – M0 (Figure 6-3) is set through the address pins
(A12 – A0). The Bank Address is to be programmed as 0 to ensure compatibility with
devices in the future. Such values that are not yet defined are also said to be ‘reserved’.

M12 – M10   M9   M8 M7     M6 M5 M4      M3   M2 M1 M0
RFU         WB   Op Mode   CAS Latency   BT   Burst Length
(RFU - Reserved for future use)
Figure 6-3 Mode Register Contents
 
 
Figure 6-4  SDR-SDRAM Chip Power-Up or Initialization Sequence 
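
The power-up sequence of Section 6.4 can be summarized as an ordered list of stages with
the number of clock cycles spent in each. The sketch below is a Python model, not the
Verilog state machine of the design. The function name is hypothetical, the tRFC wait
length `t_rfc_cycles` is a placeholder value, and it is assumed that each of the two
auto-refresh commands is followed by its own INIT_NOP_WAIT period.

```python
import math


def init_sequence(clock_mhz, t_rfc_cycles=2):
    """Return the start-up stages as (state name, cycles spent) pairs."""
    # 200 us expressed in clock cycles (clock_mhz cycles elapse per microsecond).
    power_up_wait = math.ceil(200 * clock_mhz)
    sequence = [
        ("INIT_NOP_IDLE", power_up_wait),  # NOPs while the 200 us elapse
        ("INIT_PRE_ALL", 1),               # precharge all banks
        ("RP1", 1),                        # precharge period, first NOP
        ("RP2", 1),                        # precharge period, second NOP
    ]
    for _ in range(2):                     # two auto-refresh cycles
        sequence.append(("INIT_REFRESH", 1))
        sequence.append(("INIT_NOP_WAIT", t_rfc_cycles))  # assumed tRFC length
    sequence += [("INIT_LOAD", 1), ("MRD1", 1), ("MRD2", 1)]
    return sequence
```

At a 25 MHz clock the first entry is `("INIT_NOP_IDLE", 5000)`, i.e. 5000 idle cycles
before the precharge is issued.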
 
 
The register contents thus programmed are chosen depending on the desired mode of
operation. An important term in SDRAMs is the CAS Latency. It is the number of clock
cycles taken to produce a result after a read operation is issued. This value determines the
delay in several cases and is dependent on the speed of operation of the memory. The
minimum CAS Latency at 25 MHz is 1 clock cycle. Table 6.2 through Table 6.6 list the
values to be programmed into the mode register to set the various modes of
operation.
 
 
Table 6.2 Burst Type 
M3 Burst Type 
0 Sequential 
1 Interleaved 
 
 
 
Table 6.3 Burst Length
M2 M1 M0    Burst Length (M3 = 0)    Burst Length (M3 = 1)
0  0  0     1                        1
0  0  1     2                        2
0  1  0     4                        4
0  1  1     8                        8
1  0  0     Reserved                 Reserved
1  0  1     Reserved                 Reserved
1  1  0     Reserved                 Reserved
1  1  1     Full Page                Reserved
 
 
 
 
 
 
Table 6.4 CAS Latency 
M6 M5 M4 CAS Latency 
0 0 0 Reserved 
0 0 1 Reserved 
0 1 0 2 
0 1 1 3 
1 0 0 Reserved 
1 0 1 Reserved 
1 1 0 Reserved 
1 1 1 Reserved 
 
Table 6.5 Operating Mode 
M8 M7 M6 – M0 Operating Mode 
0 0 Defined Standard Operation 
- - - All other states reserved 
 
 
Table 6.6 Write Burst Mode 
M9 Write Burst Mode 
0 Programmed Burst Length 
1 Single Location Access 
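
As an illustration of how the fields of Table 6.2 through Table 6.6 combine into one
mode register word, a small Python sketch is given below. The function and table names
are hypothetical. The field positions follow Figure 6-3, with the CAS Latency field
taken to occupy M6 – M4 as drawn there, and the reserved bits M12 – M10 left at 0.

```python
BURST_LENGTH_CODES = {1: 0b000, 2: 0b001, 4: 0b010, 8: 0b011, "full": 0b111}
CAS_LATENCY_CODES = {2: 0b010, 3: 0b011}


def encode_mode_register(burst_length=1, interleaved=False,
                         cas_latency=2, single_write=False):
    """Build the 13 bit mode register value M12..M0 from the table entries."""
    if burst_length == "full" and interleaved:
        raise ValueError("full page burst is reserved for interleaved bursts")
    value = BURST_LENGTH_CODES[burst_length]        # M2..M0: burst length
    value |= int(interleaved) << 3                  # M3: burst type
    value |= CAS_LATENCY_CODES[cas_latency] << 4    # M6..M4: CAS latency
    # M8..M7 stay 00 (standard operation); M12..M10 stay 0 (reserved).
    value |= int(single_write) << 9                 # M9: write burst mode
    return value
```

For example, a sequential burst of length 1 with a CAS Latency of 2 encodes to 0x020.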
 
6.6 Isolated Memory Architecture 
One of the main parts of this project is the design of a memory controller. It is
designed to be generic and to work as efficiently as possible. At the heart of the memory
controller is a core controller which is built around a generic state machine. When a
value is to be written into the memory, the address and data are sent to the memory
controller along with a request for a write operation. The controller handles the details
and writes the value into the memory. Similarly, when a value is to be read from the
memory, the address is sent along with a read request. The controller then reads out the
value along with a read valid signal. This isolated architecture helps in interfacing with
the other modules (such as image warping or scanning out an image).
 
Before plunging into the depths of the controller model, a small analogy is presented.
The operation of a memory may be compared to the way a notebook is used. To write in
the notebook, first the notebook is chosen, then it is opened to the required page and a
line is filled in. There is a small delay after a line is filled, while the writer moves to
the next line. Once the page is filled, the writer flips to the next page. When a notebook is
kept open, the writer may choose to write on the left side or the right side, so an open
notebook always has 2 banks. Now consider the operation of a memory. First the memory is
enabled. Then a bank is chosen (a page in the notebook), a row is opened (a line in the
notebook) and then the column is chosen. Within the same row, the memory can be
accessed at full speed if the latency is pipelined. If a new row is to be accessed then the
current row is closed (precharge) and the new row is selected. There is a latency involved
here. At any time, there is only one active row in a bank, but each bank may have its own
active row.
 
6.7 Core Controller 
The memory operations can be carried out one after the other with each delay being
waited out completely. Certain operations can be pipelined. To explain pipelined
operation, operations that are not pipelined are explained first. At any time, in a single
memory bank, only one row can be kept open or active. Imagine that two values have to
be read from the memory in the same row and that the CAS Latency is two clock
cycles. Assuming that the row is already active, the first read command is issued. After
two clock cycles, the data is obtained. Now the second read command is issued and again
after the two cycle delay the data is obtained. This is shown in Figure 6-5, where Dout1
and Dout2 are separated by a time period of 3 clock cycles.
 
 
[Figure 6-5 timing diagram: a row of READ and NOP commands, with the addresses Add. 1 and
Add. 2 and the outputs Dout 1 and Dout 2.]
Figure 6-5 Non-Pipelined Operations 
 
If the operation is pipelined then the two reads can be issued continuously. This is shown 
in Figure 6-6 where the two addresses are issued in adjacent cycles. It is obvious from 
comparing the figures that the operation can be finished two cycles earlier. So, pipelining 
can help in improving the efficiency. Some of the memory operations that are pipelined 
here are the CAS delay for each read operation and the tWR for each write operation. 
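
The saving can be checked with a small timing model of reads to an already open row.
This is a sketch in Python with a hypothetical function name; it models only the CAS
delay and ignores every other timing parameter.

```python
def read_schedule(num_reads, cas_latency=2, pipelined=False):
    """Return (issue cycle, data cycle) for each read to an already open row."""
    # Non-pipelined: the next READ is issued only after the previous data appears.
    # Pipelined: a READ is issued on every cycle.
    step = 1 if pipelined else cas_latency + 1
    return [(i * step, i * step + cas_latency) for i in range(num_reads)]
```

With two reads and a CAS Latency of 2, the non-pipelined schedule returns data on cycles
2 and 5 (3 cycles apart), while the pipelined schedule returns data on cycles 2 and 3,
so the second result arrives two cycles earlier.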
 
If two consecutive read or write operations are to the same row in the same bank, then the
latency associated with closing and opening the active row is avoided (Figure 6-6). This
speeds up the controller considerably. At any time, each bank can have its own active
row, provided the maximum time for which a row can be kept active (tRAS) is not
exceeded.
 
It is possible to provide more pipelining, but the idea here is to develop the system as a
whole rather than to focus on optimization techniques. As pipelining becomes deeper, the
code becomes much more complex.

[Figure 6-6 timing diagram: a row of READ and NOP commands, with the addresses Add. 1 and
Add. 2 and the outputs Dout 1 and Dout 2.]
Figure 6-6 Pipelined Operations 
 
For pipelining to be possible it is essential to be able to analyze more than a single set of
inputs. To enable this, the whole operation is commenced after a minimum number of
clock cycles, so the controller can look ahead at the values. Figure 6-7 represents this
idea. The addresses from two consecutive read commands are compared to see if they are
issued to the same open row. This delay of a couple of clock cycles doesn’t really affect
the efficiency of the process because it is insignificant compared to the time saved by
pipelining. During the wait time, the values have to be stored, so FIFOs are used to store
them.
 
A command FIFO is used to store the row address, column address and input data. For
write operations the corresponding input data is stored and for read operations zeroes are
stored as input data. The most significant bit in the row address is used to identify
whether the operation is a read or a write. The command FIFO can be imagined as a single
FIFO in which each entry is wide enough to accommodate the row address, column address
and data.
 
[Figure 6-7 timing diagram. Rows labelled in the figure: CLK, Operation, Address (Add. 1
to Add. 4), Comparison of Addresses (Cmp 1 to Cmp 3) and Command Issued (READ), with a
‘2 cycles delay’ marked.]
Figure 6-7 Address Pipelining Stage 
 
Now that all the basic terminology has been discussed, the actual algorithm used in the
controller core is explained. The core is implemented as a finite state machine (FSM),
using the one hot encoding technique. The state machine, which is based on a state
diagram, uses control signals to control the order in which the commands are executed.
It is referred to as one hot encoded because each state has its own register bit and, at
any time during the execution, exactly one of these bits (the active state) is high. The
FSM is thus used to execute a series of sequential commands.
 
The state diagram is shown in Figure 6-8. As seen from the figure, the state machine
is in the idle state (C_NOP_IDLE) as long as there are no commands issued. If any
command is available then an active command is issued to the bank (C_ACTIVE). The
active-to-command delay tRCD is then waited out (RCD1 and RCD2). Depending on the
command type the next command is a read (C_READ) or a write (C_WRITE). Because the
controller is pipelined, the row and bank addresses of the current and the next available
command are compared along with the type of command. If the next command is of the same
type and accesses the same row then the next state is the same command. Hence
consecutive reads or writes can be issued to the same row.

However reads and writes cannot be mixed. If the command type is ‘read’ then the CAS
delay, which is programmed in the mode register as 2 clock cycles in this case, is waited
out as two NOP cycles (CAS1 and CAS2). If the command type is ‘write’
then tWR is waited out (C_NOP_WR1). After the read or write is issued the next command
issued is the precharge (C_PRECHARGE). The precharge time tRP is waited out
(C_NOP_PRE1, C_NOP_PRE2 and C_NOP_PRE3). At the end of the precharge it is
checked if a refresh is due. If yes then the refresh command is issued (C_REFRESH). Here
tRFC is waited out (C_NOP_REF1 and C_NOP_REF2). After refresh the control is again
transferred to precharge where it is checked if there are commands available. If there is
no available command then the state machine again switches to the idle state. If any
command is available the active command is issued and the cycle continues.
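
The order of states visited for a given list of commands can be traced with a short
behavioral model. The Python sketch below is not the Verilog state machine of the design;
the function name and the `(type, bank, row, column)` command tuple are hypothetical. As
a simplification, the return through the precharge check after a refresh is collapsed,
and a refresh that is due is serviced after the first precharge only.

```python
def controller_trace(commands, refresh_due=False):
    """List the states visited while servicing (type, bank, row, column) commands."""
    trace = ["C_NOP_IDLE"]
    i = 0
    while i < len(commands):
        key = tuple(commands[i][:3])            # (type, bank, row) of this run
        trace += ["C_ACTIVE", "RCD1", "RCD2"]   # open the row, wait out tRCD
        # Consecutive commands of the same type to the same open row are
        # issued back to back.
        while i < len(commands) and tuple(commands[i][:3]) == key:
            trace.append("C_READ" if key[0] == "read" else "C_WRITE")
            i += 1
        trace += ["CAS1", "CAS2"] if key[0] == "read" else ["C_NOP_WR1"]
        trace += ["C_PRECHARGE", "C_NOP_PRE1", "C_NOP_PRE2", "C_NOP_PRE3"]
        if refresh_due:                         # refresh is checked after precharge
            trace += ["C_REFRESH", "C_NOP_REF1", "C_NOP_REF2"]
            refresh_due = False
    if commands:
        trace.append("C_NOP_IDLE")              # nothing left to service
    return trace
```

Two reads to the same row share one C_ACTIVE and one C_PRECHARGE in this model, whereas
a read followed by a write to the same row opens the row twice, since reads and writes
are not mixed.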
 
6.8 Detailed View of the Memory Controller 
Figure 6-9 provides a detailed view into the register level of the memory controller. It
is still a somewhat high level view, as not all the registers are shown. The interested
reader is referred to the Verilog code in the Appendix for further details. The dotted
block at the top is the initialization algorithm and the dotted block at the bottom is the
core controller.
 
The memory is initialized using the start-up sequence which is realized using the state 
machine shown in Figure 6-4. A reference counter and a wait counter are used to count 
the number of clock cycles that need to be waited out for proper execution of the memory 
operations.  
 
The core controller has a multiplexer (MUX) at the input which selects the memory 
service request, address, data and valid signals from either the LUT store algorithm, input 
image store algorithm or the image warping algorithm. The select signal depends on the 
state of the dynamic controller shown in Figure 4-6. 
 
 
Figure 6-8 State machine for the Memory Controller 
 
The values thus selected are written into a FIFO (the miscellaneous FIFO). A second MUX
selects values from either the Misc. FIFO or the scan out FIFO depending on the ‘empty’
signals from the two FIFOs. Once the values are selected the memory controller operates
on them in the same way irrespective of the origin of the command. The command is now
pipelined for execution. The figure shows a set of registers named ‘row’, ‘cmd’, RC, WC,
‘col’ and ‘data’, which contain the row, command, read command valid, write command
valid, column and data values respectively.
 
 
 
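[Figure 6-9 block diagram. Labels in the figure: reset, Clock, BUFFER, IO control, IOB,
Select, SM, Init done, Refresh Counter, Ref. Counter, Wait Counter, FIFO, Scan out FIFO,
three MUX blocks, two ‘==’ comparators, the registers PE, CE, PH and PV, State, two sets
of the registers row, RC, WC, cmd, col and data (the second set marked Dly), and the
outputs Data, Address, Bank Address and Command.]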
Figure 6-9 Detailed View of the Memory Controller 
 
PE and CE stand for the Page Equal and Command Equal registers. The first command in the
pipeline is carried out. The second command is compared with the previous one to check
whether it uses the same row (result stored in PE) and whether it is the same type of
command (result stored in CE). If both registers hold a true result then the PH or page
hit register stores a hit value. The PV or page valid register holds a marker to a valid
page. When both the PH and the PV registers hold a positive result the second command is
executed on the cycle after the first one is executed. The same is carried out for all the
operations until a command comes through which requires a new row to be opened. The
second set of registers, shown with the name ‘Dly’, holds a delayed copy of the first set
of values. Both sets of registers take output signals from the State Machine (SM) block as
inputs. This state machine is shown in Figure 6-8.
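
The page hit decision can be expressed compactly as below. This is a Python sketch with
hypothetical names; the `Command` record is introduced only for illustration. The bank
is compared together with the row, on the assumption that ‘same row’ means the same row
of the same bank, as in the address comparison of Section 6.7.

```python
from collections import namedtuple

# Hypothetical record for one pipelined command.
Command = namedtuple("Command", ["op", "bank", "row", "col"])


def back_to_back(previous, current, page_valid=True):
    """True if 'current' may be issued on the cycle after 'previous'."""
    page_equal = (previous.bank, previous.row) == (current.bank, current.row)  # PE
    command_equal = previous.op == current.op                                  # CE
    page_hit = page_equal and command_equal                                    # PH
    return page_hit and page_valid                                             # PH and PV
```

A False result corresponds to the case where the open row has to be closed and an active
command issued again.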
 
The MUX at the extreme right end of the block selects the address, bank address and
command from either the initialization or the core controller depending on the ‘init_done’
signal. When the init_done signal is low, the initialization is under way and the MUX
selects the signals from the upper block. When the init_done signal is high, the start-up is
complete and the memory controller can now service read and write requests.
 
 
Chapter 7 :  Image Warping 
 
This chapter describes the image warping controller shown in Figure 2-7 and Figure 4-5 
in detail. The different stages in the warping algorithm are seen. A detailed view of the 
warping algorithm is presented and the state machine controller is explained. In the final 
section of this chapter, some of the small, important circuits in this design are discussed. 
The words “frame” and “image” are used interchangeably throughout this chapter. 
 
7.1 Warping Algorithm 
Image Warping is the actual stage where warping occurs. The warping operation involves 
three sub stages. 
1. Reading out the LUT values. These values contain pixel locations on the input 
image frame (referred to as input pixel). Two LUT values are used to store the 
entire address of the input pixel. 
2. Reading out the pixel values from the input image after forming the complete
address specified by the LUT.
3. Writing these values into the transformed image frame (referred to as output 
pixel). 
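
The three sub stages listed above amount to a table driven copy, which can be sketched
in a few lines. This is a software illustration in Python, not the hardware
implementation. The function name is hypothetical, and it is assumed here that the first
LUT value of each pair holds the input row and the second the input column; the actual
split of the address over the two LUT values is not specified in this section.

```python
def warp_frame(input_frame, lut):
    """Produce the output pixels in scan order from a flat LUT.

    input_frame: list of rows, each a list of pixel values.
    lut: flat list; entries 2k and 2k+1 locate the input pixel for output pixel k.
    """
    output = []
    for k in range(len(lut) // 2):
        src_row, src_col = lut[2 * k], lut[2 * k + 1]  # 1. read the two LUT values
        pixel = input_frame[src_row][src_col]          # 2. read the input pixel
        output.append(pixel)                           # 3. write the output pixel
    return output
```

For a 2 by 2 input `[[10, 11], [12, 13]]`, the LUT `[1, 1, 1, 0, 0, 1, 0, 0]` yields
`[13, 12, 11, 10]`, i.e. the image rotated by 180 degrees.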
 
7.2 State Machine Controller 
A state machine is used to keep track of the operations and its state graph is shown in
Figure 7-1. If the input image frame or LUT is being stored then the warp state machine
is idle. Once warping is enabled the control is transferred to the stage where LUT values
are read (LUT_RD). During this stage, an address is generated to read the LUT values. If
N-2 read operations have been requested, where N is the number of LUT read operations in
the stage, or if the address FIFO overflows, in which case new read requests cannot be
stored, then control is transferred to a no operation stage (NOP_LUT_RD). If the cause
was FIFO overflow, then once the traffic eases, control is transferred back to the LUT_RD
stage where more read operations are requested. If the cause was completion of N-2
operations, then just one more LUT_RD operation is performed, and the control is
transferred back to the NOP stage. The reason for performing N-2 operations instead of N
operations is as follows. There is a single cycle delay between the state machine and the
status check for the number of requests, and a two cycle delay between the state machine
and the address storage in the FIFO. A compulsory wait stage is therefore placed between
the last couple of operations so that the number of issued requests can be tracked
properly.
 
Once the control is transferred to the image read stage (IMG_RD), the image values are
read similarly. This is the only stage in the entire project where the memory access is
not sequential, as the LUT values stored in the data FIFO are the source of the row and
column. It is therefore necessary to make sure that data exists to supply the addresses.
 
Once the reads have been issued the control is then transferred to the image write stage 
(IMG_WR), where the transformed image is written into a new frame. The address is 
generated sequentially by an output pixel address generator and the data to be written are 
the read results from the IMG_RD stage. A mechanism similar to LUT_RD stage is 
employed in the IMG_RD and IMG_WR stages to keep the overflow problem in check, 
and also to track the correct number of requests for input pixel reads and the output pixels 
written. The no operation stage corresponding to the image read stage is NOP_IMG_RD 
and the no operation stage corresponding to the image write stage is NOP_IMG_WR. 
 
Finally all registers are cleared for a new warping cycle (CLR_ALL). During any stage, if
the warping of an entire frame is completed, the control is transferred to the
WARP_TRAP stage. The three stages LUT_RD, IMG_RD and IMG_WR constitute a single
warping cycle, and this cycle is performed repeatedly until an entire frame is
transformed, at the end of which all values are reset and a new input image is scanned in.
 
7.3 Detailed View of the Image Warping Controller 
The image warping controller, just like most of the other parts in this design, is very
sequential in nature. The block diagram seen in Figure 7-2 contains a state machine,
shown in a dotted block at the top. The state machine and its details are explained in
Section 7.2. The state outputs are very important: they form inputs into the counters
shown in the figure. There are counters for each stage of the image algorithm. The
counters are used to track accurately the total number of cycles in a stage. The counter
values are registered and the registered values are compared against preset values to test
whether a given count has been reached. The results of these comparisons form inputs for
the state machine. At the end of a complete cycle of warping, the counters are reset to a
specified value.

Figure 7-1 State Machine for Image Warping
 
 
[Figure 7-2 block diagram. Labels in the figure: State, Counter 1, Counter 2, Counter 3,
REG, ‘==’ comparators, FRAME DONE CHECK, AG1, AG2, LUT DATA + Input DATA, Counter, MUX,
Select, DEMUX, R/W, ROW FIFO, COL FIFO, DATA FIFO and Track Count Change.]
 
Figure 7-2 Inside the Image Warping Controller 
 
 
At the bottom left of Figure 7-2, a block is observed with the tag ‘LUT data + input data’.
The data read out from the memory during the read cycles can be pixel data (if read from
the input frame or scan out frame) or address locations (if read from the LUT frame). To
make a complete LUT address two adjacent data values are required, whereas each pixel
data value is complete by itself. Both kinds of data are stored in a 32 bit FIFO, so
some logic is required to determine if the data values represent pixel values or address
locations. Every warping cycle has one LUT_RD stage in which 512 address values are read
out and an IMG_RD stage in which 256 pixel values are read out. Also the LUT_RD stage
is completed prior to the IMG_RD stage. Since these numbers are predetermined, a
counter is used to keep track of the type of data. The first 512 valid data are grouped in
pairs to form addresses and the next 256 values are padded with leading zeroes to form
32 bit data values.
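
The counting rule can be illustrated as follows. The Python sketch below uses a
hypothetical function name and assumes 16 bit memory words. It also assumes that the
first value of each LUT pair forms the upper half of the 32 bit address word; the order
of the two halves is not stated here.

```python
def group_read_data(values, lut_count=512, pixel_count=256):
    """Split one warping cycle of read data into 32 bit FIFO entries.

    Returns (addresses, pixels): the first lut_count words are paired into
    addresses, the next pixel_count words are zero extended pixel values.
    """
    lut_words = values[:lut_count]
    pixel_words = values[lut_count:lut_count + pixel_count]
    addresses = [
        ((lut_words[i] & 0xFFFF) << 16) | (lut_words[i + 1] & 0xFFFF)
        for i in range(0, lut_count, 2)
    ]
    pixels = [word & 0xFFFF for word in pixel_words]  # upper 16 bits are zero
    return addresses, pixels
```

With the default counts, 512 LUT words become 256 addresses, one for each of the 256
pixel values read in the same warping cycle.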
 
A MUX is used to determine the row and column address in each stage of the image 
warping circuit. In case of LUT_RD and IMG_WR stages, address generators generate 
the rows and columns whereas in the IMG_RD stage the addresses are obtained from the 
data FIFO. The select signal again depends on the state outputs.  
 
All the commands are sent to the memory. Depending on the type of command (read or
write) the signals are demultiplexed. The read/write bit is the MSB of the row address. If
the MSB indicates a read then the command is sent to the memory as a read request. If the
MSB indicates a write then the data value is read out of the FIFO and sent to the memory
as a write request.
 
At all times, the warping controller checks to see if the final warping operation for a 
particular frame is completed. If it is then the warping operation for a frame is over and 
the warping controller goes to idle state and the values are reset. Otherwise, normal 
operation occurs. 
 
 
7.4 Small Functional Units and Circuits Used in the Design 
7.4.1 Address Generation Module 
Addresses are generated as follows. Each frame with a resolution of 640 by 480 pixels
(307,200 pixels) is stored in a memory array of 600 rows and 512 columns (600 × 512 =
307,200 locations). All frames other than the LUT use this addressing. Since each pixel
has two LUT entries, the LUT takes up 1200 rows [Section 0].
 
The input frame stores the input image. The LUT frame stores the LUT. Then there are
two frame buffers. The first one is called the working buffer. It is the frame being warped
currently. The other frame is called the scan out buffer and is the frame being scanned
out. At the end of a complete frame warp these two buffers swap names. The frame that
was most recently warped becomes the new scan out frame, and the other becomes the work
buffer where a transformed image is written. The addressing for the 600*512 frames is
generated by instantiating the main address generator module, and a constant is added to
its output to refer to the different frames. The values generated for rows are from 0 to
599 and for columns are from 0 to 511. For instance, the input frame is stored from rows
0 to 599, so nothing has to be added to it. The work buffer and the scan out buffer are
from 1800 to 2399 and 2400 to 2999. So constant values of 1800 or 2400 are added
depending on whether the frame is currently a scan out buffer or a working buffer.
 
The LUT frame is stored from rows 600 to 1799. So the number 600 is added to the 
generated values. However a different address generator generates 1200 rows by 512 
columns for the LUT frame. 
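
The frame offsets can be collected into a small address calculation. The Python sketch
below is illustrative only: the function name and the frame names ‘buffer_a’ and
‘buffer_b’ are hypothetical, and it is assumed that locations are filled in scan order,
so that a linear index maps to a row and column by division by 512.

```python
COLUMNS = 512

# First memory row and number of rows of each frame.
FRAMES = {
    "input": (0, 600),
    "lut": (600, 1200),
    "buffer_a": (1800, 600),   # work buffer or scan out buffer
    "buffer_b": (2400, 600),   # scan out buffer or work buffer
}


def frame_address(frame, index):
    """Map the index-th location of a frame to a (memory row, column) pair."""
    base_row, rows = FRAMES[frame]
    if not 0 <= index < rows * COLUMNS:
        raise ValueError("index outside the frame")
    row, column = divmod(index, COLUMNS)
    return base_row + row, column
```

The last pixel of a 640 by 480 input frame, index 307199, lands at row 599 and column
511, and the last LUT entry lands at row 1799.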
 
7.4.2 FIFO 
 
FIFOs are queues [11] with first in first out capability. A certain amount of memory is
required for storing the words. This depends on the width and depth of the FIFO. There
are two ways to implement a FIFO. The first option is to use the core generator modules
available for the FPGA [2]. The core generator is available along with the integrated
environment. The second option is to use a template from Xilinx [2]. Table 7.1 lists the
configuration options available with the core generator module.
 
Table 7.1 Configuration Option While Using the Core Generator
FIFO Parameter          Description
Width of input          Number of bits in the input signal.
Depth of FIFO           Maximum number of input words that may have to be
                        stored at any time.
Type of memory used     FIFO can be implemented in Block RAM [Section 4.2]
                        or distributed RAM. Distributed RAM uses the logic
                        blocks distributed over the entire area of the FPGA.
 
 
FIFOs can be either synchronous or asynchronous. Synchronous FIFOs take in a single
clock and this clock signal is used as both the read and the write clock. Asynchronous
FIFOs have two separate clock ports, one for reading and one for writing. In this project
only synchronous FIFOs are used. Table 7.2 lists some of the signals used.
 
Table 7.2 Signals Used in FIFO Module
FIFO signals          Description
Clock                 Clock signal used for read and write operations.
Data Input            This is the input signal bus. The maximum number of
                      bits is set using the ‘input width’ parameter.
Read Enable           This represents the read request. If high, the value at
                      the head of the FIFO is read out onto the output bus.
Write Enable          This represents the write request. If high, the value on
                      the input bus is written into the FIFO.
Data Out              This is the output signal bus, which is ‘width’ number
                      of bits long.
Full Flag             The FIFO is filled to the maximum depth. No
                      additional writes are valid.
Empty Flag            The FIFO is empty. No reads are valid.
Data Count            Indicates the number of words present in the FIFO.
Read Acknowledge      Indicates that the read requested on the previous cycle
                      was fulfilled. Also this signal is used to indicate that
                      the value on the output bus is valid.
Write Acknowledge     Indicates that the write requested on the previous cycle
                      was fulfilled.
Read Error            This flag is set when a read is requested from an empty
                      FIFO. It is also known as Underflow condition.
Write Error           This flag is set when a write is requested of a FIFO
                      which is full. It is also known as Overflow condition.
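
The behaviour of the flags in Table 7.2 can be illustrated with a software model of a
synchronous FIFO. The Python class below is a sketch, not the core generator module. The
class and method names are hypothetical, the clock is not modelled, and the error flags
are assumed to stay set once raised.

```python
from collections import deque


class SyncFifo:
    """Software model of a FIFO with the status flags of Table 7.2."""

    def __init__(self, depth):
        self.depth = depth
        self._words = deque()
        self.read_error = False    # underflow seen
        self.write_error = False   # overflow seen

    @property
    def data_count(self):
        return len(self._words)

    @property
    def full(self):
        return len(self._words) == self.depth

    @property
    def empty(self):
        return len(self._words) == 0

    def write(self, word):
        """Write enable: returns the write acknowledge."""
        if self.full:
            self.write_error = True
            return False
        self._words.append(word)
        return True

    def read(self):
        """Read enable: returns (read acknowledge, data out)."""
        if self.empty:
            self.read_error = True
            return False, None
        return True, self._words.popleft()
```

Words leave the model in the order in which they were written, and a write to a full
FIFO or a read from an empty one is rejected and recorded in the corresponding error
flag.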
 
7.4.3 Multiplexing rows, columns, data and valid signals 
Large multiplexers are used to select the row address, column address, data
value and valid signal depending on whether the memory operation is a read or a write.
The MSB of the stored row address is ‘0’ in case of reads and ‘1’ in case of writes. The
next two most significant bits contain the bank address. So, the MSB is used to determine
whether the memory operation requested for the specified row and column address is a
read or a write. In case of reads only the addresses are supplied to the memory queue. If
the operation is a write, the data at the head of the FIFO is read out along with the row
and column addresses.
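
The layout of the stored row address word can be made concrete with a pack and unpack
pair. The Python sketch below uses hypothetical function names. The 16 bit total width
is an inference made here from one read/write bit, two bank bits and the 13 bits needed
to address 8192 rows; the thesis text does not state the word width.

```python
ROW_BITS = 13                    # 8192 rows per bank
BANK_SHIFT = ROW_BITS            # bank address sits above the row bits
RW_SHIFT = ROW_BITS + 2          # read/write flag is the MSB


def pack_row_word(is_write, bank, row):
    """Pack the read/write flag, bank and row into one stored word."""
    if not 0 <= bank < 4 or not 0 <= row < (1 << ROW_BITS):
        raise ValueError("bank or row out of range")
    return (int(is_write) << RW_SHIFT) | (bank << BANK_SHIFT) | row


def unpack_row_word(word):
    """Recover (is_write, bank, row) from a stored word."""
    is_write = bool((word >> RW_SHIFT) & 1)
    bank = (word >> BANK_SHIFT) & 0b11
    row = word & ((1 << ROW_BITS) - 1)
    return is_write, bank, row
```

A read of row 5 in bank 1 is stored as 0x2005 in this layout, and the same location
written is stored as 0xA005.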
 
7.4.4 Tracking Data 
There are two queues which store the command. One is meant to store scan out requests 
alone. The other contains requests for storing the LUT or storing the input frame or 
warping the frame. Whenever the scan out queue is empty, the requests from the other 
queue are serviced. The presence of even a single request in the scan out queue indicates 
 57
 
that it be serviced first. Write requests generate no output from the memory. However, 
when the memory is read from, the output is used somewhere. There arises the issue of 
identifying where the value read out by the memory is used. Storing values inside the 
LUT or input frame only requires write operations. Image Warping requires both read 
and write operations. Scan out requires only read operations. A very important albeit 
small circuit is used to identify whether the data read out by the memory is meant for 
scan out or for image warping. 
 
Inside the memory controller it is difficult to find out which module the data is 
intended for. However, as and when a request is read out of either one of the two 
command FIFOs, it is easy to identify the source. Also, image warping and scan out 
always operate on different frames, so when a row is opened, the values read out belong 
either all to scan out or all to warping. The number of requests read out from each 
FIFO is tracked using counters. When a request is read out from a FIFO, its count 
increases by one. The circuit also keeps track of which count was increased first. As 
values are read out from memory, the count that increased first is decremented first. 
As soon as this count reaches zero, the other starts getting decremented, and so on. 
The two FIFO counts are mutually exclusive: they never change at the same time. 
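
The counter scheme described above can be modelled in software as follows. This is a simplified Python sketch, not the Readdata_Tracking circuit itself. The class and method names are hypothetical, and the model assumes, as the text does, that requests from the two sources arrive in bursts rather than finely interleaved. The default value of 1 for track_data follows Section 8.4.

```python
class ReadDataTracker:
    """Decide whether returned read data belongs to scan out (0) or warping (1)."""

    SCANOUT = 0
    WARP = 1

    def __init__(self):
        self.pending = [0, 0]          # outstanding reads per source
        self.track_data = self.WARP    # default value until scan out is active

    def request(self, source):
        """Record that a read request from `source` was issued to the memory."""
        if self.pending[0] == 0 and self.pending[1] == 0:
            self.track_data = source   # this count is the one that increased first
        self.pending[source] += 1

    def data_returned(self):
        """Record one value coming back from memory and return its owner."""
        owner = self.track_data
        if self.pending[owner] == 0:
            raise RuntimeError("data returned with no pending read request")
        self.pending[owner] -= 1
        other = 1 - owner
        if self.pending[owner] == 0 and self.pending[other] > 0:
            self.track_data = other    # switch once the first count reaches zero
        return owner
```

For example, two scan out requests followed by three warping requests yield the owner sequence 0, 0, 1, 1, 1 as the five values return from memory.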
 
Chapter 8 :  Simulation and Image Results 
 
This chapter discusses some of the techniques used to validate the function and 
performance of the system. It also shows several waveforms, generated during different 
stages of system operation, that validate specific aspects of the system's function and 
performance. 
 
8.1 Validation Tools 
ModelSim, a popular HDL simulator [19], was used to run the simulations and allowed 
extensive debugging. During debugging, error signals were used to identify problems and 
fix them. Some of the waveforms in this chapter are the results of post-translate 
simulations and some of post-place-and-route simulations. The best method of validation 
for an image processing application is to configure the hardware with the .bit file and 
then check that proper image frames are displayed. Several input images and Look Up 
Tables were tested in this way. Most errors can be identified and fixed using ModelSim. 
However, errors due to timing parameters in the memory and due to interruption of 
memory operations are not directly reflected; such errors were always corrected using 
memory simulation models. The simulation models played a significant role in the 
identification of timing issues. 
 
8.2 Test Conditions 
On a Hewlett Packard machine with a Pentium 4 processor running at 2 GHz, with 1 GB of 
RAM and no other applications running, the longest simulation (6.5 s of simulated time) 
took about 10 days. All signals are italicized in this chapter to avoid confusion. 
8.3 Initialization Sequence 
Figure 8-1 shows the power-up sequence for the memory. All banks are first precharged. 
This is followed by two refresh commands. Then the mode register is programmed with a 
CAS latency of 2 clock cycles [3,4,5]. 
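
The power-up sequence can be written down as an ordered list of commands. The Python sketch below is illustrative only. The command names are hypothetical labels, and the mode register field positions follow the usual SDR SDRAM layout (bits [2:0] burst length, bit [3] burst type, bits [6:4] CAS latency). The burst length code of 000 and the sequential burst type are assumptions, since the text specifies only the CAS latency.

```python
def mode_register(cas_latency, burst_length_code=0b000, sequential=True):
    """Encode a mode register value from its three lowest fields.

    burst_length_code and sequential are assumed defaults, not values
    taken from the design.
    """
    if cas_latency not in (2, 3):
        raise ValueError("this sketch only handles CAS latency 2 or 3")
    burst_type = 0 if sequential else 1
    return (cas_latency << 4) | (burst_type << 3) | burst_length_code


def power_up_sequence(cas_latency=2):
    """Commands issued at power-up, in order, as (name, argument) pairs."""
    return [
        ("PRECHARGE_ALL", None),          # all banks are first precharged
        ("AUTO_REFRESH", None),           # first refresh command
        ("AUTO_REFRESH", None),           # second refresh command
        ("LOAD_MODE_REGISTER", mode_register(cas_latency)),
    ]
```

Under these assumptions a CAS latency of 2 gives a mode register value of 0x20.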
 
 
Figure 8-1 Power-Up Sequence for SDRAM 
Figure 8-2 First Warping Cycle 
 
 
 
8.4 LUT Storage 
The waveform in Figure 8-2 shows the LUT being stored. lut_store, image_store and 
image_warp are the signals that enable the different modules. The transitions in the 
lut_store and image_store stages are marked with an ellipse. Similarly the transitions 
in the image_store and image_warp stages are also seen. From the figure, the time taken 
to store the LUT is roughly twice the time taken to store an image frame. This is 
expected because the LUT is twice the size of an image frame [Section 7.4.1]. 
 
Several events occur around 100 ms. Chief among them are the transitions in the 
image_warp and image_store signals. Since this is the first frame for which warp_done 
goes high, scan out is enabled, and it uses the frame that was most recently warped. A 
new frame starting at row 2400 is then selected as the working buffer. The transition 
in vsync_n indicates that the display is in use. track_data dictates whether a value 
read from memory is a pixel meant for scan out or a value used in image warping. When 
it is 0 the value is meant for scan out, and when it is 1 the value belongs to the 
image warping module. The track_data circuitry is enabled only when scan out is active; 
until then it has a default value of 1 [Section 7.4.4]. 
  
8.5 Validation of Values Written into LUT 
Figure 8-3 shows that the LUT values are written at the correct addresses. LUT data 
values 511, 324 and 126 need to be stored in columns 216, 217 and 218 respectively; this 
is shown inside the circle just after 869 µs. The memory controller writes these values 
into the correct columns, as shown inside the ellipse after 870 µs. The multiplexing 
operations are shown in the code snippet in Figure 8-5. 
 
 
Figure 8-3 LUT Frame Validation 
Figure 8-4 LUT Frame Validation after Interruption by Refresh Operation 
 
 
 
 
When dealing with images it is necessary to get every single pixel correct. If even a 
single pixel is lost, there is significant distortion in the output image. Refresh 
operations, which are performed periodically, interrupt the writing or reading of 
memory values. Figure 8-4 shows that pixels are not lost during refresh. LUT data 201 
and 553 need to be stored in columns 220 and 221 respectively. LUT data 201 is stored 
in column 220 and then the refresh is carried out. After the refresh operation 
completes, data value 553 is correctly stored in column 221. Also, during refresh, the 
generation of new data is stopped because the memory queue is full. 
Figure 8-5 Multiplexing Data, Row, Column and Valid Signals 
 
 
 
Figure 8-6 Transition from Writing LUT to Writing Image Frame
 
Figure 8-6 shows the transition from LUT storage to scan in. As described earlier, 
lut_frame_detect goes high once the LUT frame has been completely written. The 
sd_row_wr_lut signal carries the address of the row to be written. Immediately after 
the end of the frame is detected, it goes from 34567 to 33368. The MSB indicates a 
‘write’ operation. The two bits after the MSB represent the bank. The actual row is 
represented only by the last 13 bits. Table 8.1 lists the binary representation of the 
sd_row_wr_lut signal and Table 8.2 lists the row numbers, in decimal, held in its last 
13 bits. 
 
Table 8.1 Binary Representation of sd_row_wr_lut Signal 
Decimal number Binary number 
34567 1000 0111 0000 0111 
33368 1000 0010 0101 1000 
 
Table 8.2 Decimal Representation of LUT Rows 
Binary number Decimal Number 
1000 0111 0000 0111 1799 
1000 0010 0101 1000 600 
 
This shows that the address generator works as expected. The LUT, which occupies 1200 
rows of 512 columns, is stored from row 600 to row 1799, both inclusive. 
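
The row range can be cross-checked with a few lines of arithmetic. The Python sketch below is only a worked example. It assumes one memory word per pixel and a LUT of two words per pixel (the LUT is twice the size of a 640 x 480 image frame); the function names are hypothetical.

```python
COLUMNS_PER_ROW = 512        # columns in one memory row
WIDTH, HEIGHT = 640, 480     # output resolution used in this design
LUT_WORDS_PER_PIXEL = 2      # assumption: the LUT is twice the size of a frame
LUT_FIRST_ROW = 600          # first row occupied by the LUT
ROW_MASK = 0x1FFF            # the low 13 bits of sd_row_wr_lut hold the row


def rows_needed(words):
    """Number of 512-column rows needed to hold `words` memory words."""
    return -(-words // COLUMNS_PER_ROW)   # ceiling division


def lut_row_range():
    """Return (first_row, last_row) occupied by the LUT, both inclusive."""
    rows = rows_needed(WIDTH * HEIGHT * LUT_WORDS_PER_PIXEL)
    return LUT_FIRST_ROW, LUT_FIRST_ROW + rows - 1


def row_field(word):
    """Extract the 13-bit row number from a 16-bit row word."""
    return word & ROW_MASK
```

Under these assumptions an image frame needs 600 rows and the LUT needs 1200 rows, which agrees with the values 34567 and 33368 of sd_row_wr_lut decoding to rows 1799 and 600.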
 
8.6 Rapid Operations During Non-Active Display 
Image warping is carried out as quickly as possible, i.e. whenever the scan out module 
does not need pixels. During the blank regions and porches (Figure 8-7), marked by 
ellipses on hsync_n and vsync_n, the last four signals in the waveform change faster. 
In the figure the blank regions and porches cannot be distinguished from each other. 
This is as expected. Figure 8-7 can be divided into three parts. In the first part, 
image warping is completed and scan out is enabled. The display controller now needs 
pixels and reads them from the memory. 
 
 
 
Figure 8-7 Operations during Active Vs Blank Display Time 
 
 
 
As the display initially has vertical blank regions and porches, the scan out queue 
gets full. The vertical sync signals and porch are still not active. The next part 
involves writing a new input image frame into the memory. During the non-active region 
of the display, the write operations are carried out as fast as possible. During the 
final part, the display becomes active and pixels are regularly supplied to the scan 
out queue, so the writing of the input frame occurs only during horizontal blank and 
porch regions. 
 
Irrespective of whether an input frame is being written or the input is being warped, 
these operations are performed rapidly while the display is not active. hsync_n and 
vsync_n are active-low sync signals. The rest of the signals are active-high. 
 
The first part of Figure 8-7 is shown zoomed in in Figure 8-8. From Figure 8-8, it can 
be observed that the sd_row_rd_scanout, sd_col_rd_scanout and rd_addr_valid_scanout 
signals are all generated only when the sdram_read_request_scanout signal is high. The 
request goes low once the scan out queue becomes almost full. 
 
8.7 Validation of Image Warping Stages 
One of the main stages to be validated is image warping. There are two important things 
to verify in the image warping stage.  
i) Order of the warping stages. 
a. The order should always be LUT_RD to IMG_RD to IMG_WR.  
ii) Exact number of operations performed. 
a. This parameter has a value of 256 in this design. 
Three figures, one for each warping stage, are shown in this section. 
 
Figure 8-9 shows the change from the LUT_RD stage to the IMG_RD stage. The transition 
is orderly. Also, 512 LUT values are read out (256 values with 2 LUT entries per 
value). Here, the first 511 values are requested first. This is shown in the first 
rectangle in the figure. The 512th value is then read out separately [Section 7.4.1], 
as shown in the second rectangle. 
 
 
 
Figure 8-8 Transition from Warping to Writing Input Frame and Scan Out Enabled 
 
 
Slowing down the operations in this way helps in keeping track of the exact number of 
requests. Also, the delay between the registration of the LUT_RD stage and the 
registration of the read request (rc_fifo_en signal) is 2 clock cycles. Figure 8-10 
shows the change from the IMG_RD stage to the IMG_WR stage. The transition is again 
correct. Here 256 image pixels are read out from the input image frame. The first 
rectangle shows the 255th value being requested and the second rectangle shows the 
registration of the 256th value. 
 
Similarly, Figure 8-11 shows that the number of image values written into the 
transformed image is 256. The IMG_WR stage is the last stage in one warping cycle. 
After the completion of this stage the warping cycle is reset so that the next warping 
cycle may begin. The reset_cntr signal goes high for one cycle, indicating the end of 
the current warping cycle. At this time, all the values are reset. The resetting of the 
lut_requests, img_rd_requests and img_wr_requests signals is seen inside the last 
rectangle. 
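
The two properties validated in this section, the stage order and the exact request counts, can be expressed as a small reference model. The Python sketch below is illustrative only. The stage names and counts (512 LUT reads, 256 image reads, 256 image writes) are taken from the text; the function names are hypothetical.

```python
PIXELS_PER_CYCLE = 256        # operations per warping cycle in this design
LUT_ENTRIES_PER_PIXEL = 2     # two LUT entries are read per value

STAGES = (
    ("LUT_RD", PIXELS_PER_CYCLE * LUT_ENTRIES_PER_PIXEL),
    ("IMG_RD", PIXELS_PER_CYCLE),
    ("IMG_WR", PIXELS_PER_CYCLE),
)


def warp_cycle():
    """Yield (stage, request_index) for one complete warping cycle, in order."""
    for stage, count in STAGES:
        for index in range(count):
            yield stage, index


def check_cycle(events):
    """Return True if the events follow LUT_RD -> IMG_RD -> IMG_WR with exact counts."""
    expected = [stage for stage, count in STAGES for _ in range(count)]
    return [stage for stage, _ in events] == expected
```

A stage trace extracted from a waveform could be passed to check_cycle; a trace with a missing or extra request in any stage is rejected.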
 
In all three figures the count_change signals are seen. These signals were very helpful 
in debugging: they kept track of any changes in the number of requests, especially when 
the warping state machine was in a NOP stage. 
 
8.8 Validation of Memory Operations 
The correct operation of the memory controller is vital to the project. The controller 
should always write to and read from the correct address. Figure 8-12 shows the input 
frame being written into the memory. scanout_enable is low and hence scan out is not 
being carried out simultaneously. During this period the display controller is inactive 
and the entire memory bandwidth is allocated to writing and warping of the input frame. 
rd_en_misc is high for a short time, followed by a low period and then high again. 
Whenever this read enable is high, values are read out from the ‘miscellaneous’ queue. 
Every time a write is performed, it is checked whether the operation is a hit, i.e. 
whether a write was performed to the same row in the previous cycle. 
 
 
 
Figure 8-9 The LUT Read Stage in Image Warping Cycle 
 
 
 
 
 
Figure 8-10 The Image Read Stage in the Image Warping Cycle 
Figure 8-11 The Image Write Stage in the Image Warping Cycle 
 
The hit signal indicates this. page_eq checks whether the previous operation was to the 
same page and cmd_eq checks whether the previous operation was of the same type (read 
or write). Since this particular frame performs operations that are very sequential, 
the memory controller continuously issues write commands to these locations. dly_2_row 
and dly_2_col hold the row and column addresses. 
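
The hit check can be sketched in software as follows. This Python model is an assumption-laden illustration: it takes hit to be the logical AND of page_eq and cmd_eq, which the text suggests but does not state explicitly, and the class name PageHitDetector is hypothetical.

```python
class PageHitDetector:
    """Compare each memory operation with the previous one."""

    def __init__(self):
        self.prev = None   # (bank, row, is_write) of the previous operation

    def access(self, bank, row, is_write):
        """Return (page_eq, cmd_eq, hit) for the current operation."""
        if self.prev is None:
            page_eq = cmd_eq = False      # nothing to compare against yet
        else:
            prev_bank, prev_row, prev_write = self.prev
            page_eq = (bank, row) == (prev_bank, prev_row)   # same open page
            cmd_eq = is_write == prev_write                  # same command type
        self.prev = (bank, row, is_write)
        # Assumption: a hit needs both the same page and the same command type.
        return page_eq, cmd_eq, page_eq and cmd_eq
```

For a sequential frame write, every access after the first one in a row reports a hit, which is consistent with the continuous stream of write commands described above.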
 
Figure 8-13 shows a sequence of operations. Here, image pixels have been read out of 
the memory. These pixels are then written into the transformed image. Two things are 
validated by looking at the figure. 
i) The values are written into the correct columns in the correct order. The 
memory does not lose pixels during opening or closing of rows or during 
refresh operations. 
ii) Also, track_data is high, indicating that the value read out belongs to the 
image warping module. It is important to note that track_data changes only 
during or after read operations, as it is needed to track values that are read out. 
Figure 8-8 shows a case where track_data is low. 
 
8.9 Simulation Validation of Overall System Organization, Architecture, Design and 
Performance 
Figure 8-14 shows a total post-implementation HDL simulation run of 6 seconds. This is 
marked at the bottom right in a rectangle. In real use the display can be kept powered 
on for several hours. The initialization operations occur in a small fraction of the 
time and are not clearly visible here. They are explained later on. A few points of 
interest in this waveform are as follows. 
 
1) It is seen that image_store and image_warp have alternating transitions from time 
0.5s to 6s. This shows that a new input frame is written (image_store high and 
image_warp low) and then this input is transformed (image_store low and 
image_warp high).  
 
2) After scan out is enabled (indicated by the first change in the 
workbuffer_addr_select signal), every time image_store goes high, the 
scan_addr_select and workbuffer_addr_select signals swap values. This shows that 
after an entire frame is warped, the newly warped frame gets displayed and the old 
one is used as a working buffer for a new transformation. 
 
3) During the entire time period following initialization, vsync_n goes low about 
360 times in 6 seconds. That is 60 times per second, which corresponds to 60 
fps. During image warping, scan out continues uninterrupted. 
 
4) Finally, the error signals dt_rd_err, dt_wr_err, rd_err_out, wr_err_out 
and scanout_sync_error are always low, indicating that no errors are 
detected in the modules. The first two error signals show that the data FIFO in 
image warping never underflows or overflows and that exactly 256 requests are 
warped in every small warp cycle. The next two show that the output FIFO for scan 
out is never read while empty or written while full. So whenever the display 
controller requests data, the FIFO serves as a source, and it is never so full 
that pixels are lost while writing into the FIFO itself. The low 
scanout_sync_error signal shows that the row and column FIFOs always hold an 
equal number of entries. 
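
The frame rate quoted in point 3 follows directly from the vsync_n count, and it can be compared with the rate expected from the pixel clock. The Python sketch below is a worked example. The 25 MHz pixel clock is taken from the clock generator in the Appendix; the totals of 800 clocks per line and 525 lines per frame are the commonly used 640 x 480 timing and are an assumption, since the exact porch and sync widths are not listed in this chapter.

```python
def frames_per_second(vsync_pulses, seconds):
    """Frame rate observed in a waveform from the number of vsync_n pulses."""
    return vsync_pulses / seconds


def refresh_rate(pixel_clock_hz, h_total=800, v_total=525):
    """Frame rate implied by the pixel clock and the total line and frame sizes.

    h_total and v_total are assumed values, not figures from the design.
    """
    return pixel_clock_hz / (h_total * v_total)


observed = frames_per_second(360, 6)      # about 360 pulses in 6 seconds
expected = refresh_rate(25_000_000)       # 25 MHz pixel clock
```

Under these assumptions the expected rate is slightly below 60 frames per second, which is consistent with the count of about 360 pulses in 6 seconds.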
 
8.10 User Constraint File 
The user constraint file is used to map output signals to pin locations on the board. 
The period and duty cycle of the clock signal are also specified in it, which enforces 
stricter timing. The file is found in the Appendix. 
 
 
Figure 8-12 Memory Operations Involved in Writing an Input Frame 
 
 
 
 
 
 
 
 
 
 
Figure 8-13 Memory Operations during Image Warping. 
 
 
 
 
 
Figure 8-14 Post Place and Route Simulation Waveform for 6 Seconds of Operation 
 
 
 
 
 
8.11 Validation Using Image Results. 
As mentioned at the beginning of this chapter, the best validation technique is to 
verify the design using the results from the display device. This is illustrated in 
Figure 8-15. In this figure, (a) shows the simulated input image. Simulation here means 
that the image is generated in the hardware and not taken from a display input port, as 
one is not available on the board. This image is generated in such a way that, 
horizontally, every pixel differs from its neighbor, while vertically each line has the 
same pixel in a given column as the previous line. As each line is scanned from right 
to left, even a single faulty pixel can be easily spotted because it produces distorted 
lines. The first transformation tested is the identity. The LUT is also generated in 
the hardware and not given as an input to the design. The LUT maps every output pixel 
to the pixel in the exact same location in the input image. The output image is seen in 
(b). The identity-transformed output image has no distortion. To debug further, the 
expected output image was also generated for this transform and subtracted from the 
output image produced by the design to see if there was any difference. All the pixels 
were generated correctly. 
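
The identity test and the subtraction check can be reproduced in software. The Python sketch below is a minimal model, not the hardware generator. The test pattern (pixel value equal to the column index modulo 512, repeated on every line) is a hypothetical pattern that satisfies the description above, and all function names are assumptions.

```python
def make_test_image(width, height):
    """Every pixel differs from its horizontal neighbor; all lines are identical."""
    line = [x % 512 for x in range(width)]
    return [list(line) for _ in range(height)]


def identity_lut(width, height):
    """LUT mapping each output pixel (y, x) to the same location in the input."""
    return [[(y, x) for x in range(width)] for y in range(height)]


def warp(image, lut):
    """Build the output image: output[y][x] = image[src_y][src_x]."""
    return [[image[src_y][src_x] for (src_y, src_x) in lut_row] for lut_row in lut]


def count_differences(expected, actual):
    """Number of pixels that differ between two images of the same size."""
    return sum(pe != pa
               for row_e, row_a in zip(expected, actual)
               for pe, pa in zip(row_e, row_a))
```

For example, warping make_test_image(640, 480) with identity_lut(640, 480) and comparing the result with the input using count_differences should report zero differing pixels.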
 
(a) (b) 
Figure 8-15 (a) Simulated Input Image (b) Identical Transformed Output Image. 
 
The image results validate that the overall operation is indeed performed correctly. 
That is, the input image is stored correctly, the image is warped correctly using the 
LUT and the display controller displays the proper image. Also, the memory controller 
reads and writes proper values from the memory. 
 
(a) (b) 
Figure 8-16 (a) Shifted Output Image (b) Output Image Rotated 45° with respect to 
Origin. 
 
Four other transformations are also tested. Figure 8-16 (a) shows the translation 
transformation. The output image is a shifted version of the input image: the input of 
Figure 8-15 (a) is shifted 160 pixels to the right. Figure 8-16 (b) and Figure 8-17 
show results from rotational transformations. In Figure 8-16 (b), the input image is 
rotated 45° with respect to the origin (the top left corner is the origin) to produce 
the output image. In the second set of figures, Figure 8-17 (a) shows the input image 
rotated 45° with respect to the center of the image, whereas Figure 8-17 (b) shows the 
input image rotated -45° with respect to the center of the image. 
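
LUTs for such transformations can be generated by inverse mapping: for every output pixel, the LUT holds the input location it is taken from. The Python sketch below is one possible way to build them and is not necessarily how the LUTs in this design were produced. The function names, the use of None for output pixels whose source falls outside the input frame, nearest-neighbor rounding and the sign convention of the rotation are all assumptions.

```python
import math


def translation_lut(width, height, dx):
    """Output pixel (y, x) takes input pixel (y, x - dx), i.e. a shift right by dx."""
    return [[(y, x - dx) if 0 <= x - dx < width else None
             for x in range(width)]
            for y in range(height)]


def rotation_lut(width, height, angle_deg, center=(0, 0)):
    """Rotation about center = (row, col), built by inverse mapping."""
    cy, cx = center
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    lut = []
    for y in range(height):
        lut_row = []
        for x in range(width):
            dx, dy = x - cx, y - cy
            # Inverse rotation: find the input pixel that lands on (y, x).
            src_x = round(cx + c * dx + s * dy)
            src_y = round(cy - s * dx + c * dy)
            inside = 0 <= src_x < width and 0 <= src_y < height
            lut_row.append((src_y, src_x) if inside else None)
        lut.append(lut_row)
    return lut
```

With these helpers, translation_lut(640, 480, 160) corresponds to the shift of Figure 8-16 (a), and rotation_lut(640, 480, 45, center=(240, 320)) corresponds to a rotation about the image center as in Figure 8-17 (a), up to the assumed sign convention.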
 
 
(a) (b) 
Figure 8-17 (a) Output Image Rotated 45° with respect to Center (240, 320)  (b) 
Output Image Rotated -45° with respect to Center. 
 
 
 
Chapter 9 :  Conclusion and Future Work 
9.1 Summary 
The goal was to develop a system and architecture to perform non-block based warping 
on an input image using a specified LUT. The system takes a single input image and 
produces, via transformation, a single output image. The LUT is written into a bank of the 
memory. The image warping algorithm, specified by the LUT, is performed on the image 
to produce the warped final image. The final image is scanned out to a display device 
using a display controller.  
 
9.2 Conclusion 
The LUT-based system organization and architecture thus designed was able to perform 
arbitrary image warping. The individual modules were validated in Chapter 8. Identity, 
translation (shift) and rotation transforms were tested on simulated input images and 
the image results were shown in Section 8.11. The memory controller designed for this 
purpose is fully functional. The system organization and functionality were 
experimentally verified on a commercially available prototype board, shown in Figure 
9-1. The output image was displayed at a resolution of 640 x 480. As seen in Chapter 8, 
the image warping algorithm based on the LUT worked correctly. Table 9.1 lists the 
amount of resources used in the FPGA. 
 
Table 9.1 Device Utilization Summary 
Logic Utilization                                  Used     Available   Utilization 
Number of Slice Flip Flops:                        1,079    15,360      7% 
Number of 4 input LUTs:                            1,524    15,360      9% 
Logic Distribution: 
Number of occupied Slices:                         1,604    7,680       20% 
Number of Slices containing only related logic:    1,604    1,604       100% 
Number of Slices containing unrelated logic:       0        1,604       0% 
Total Number 4 input LUTs:                         2,680    15,360      17% 
Number used as logic:                              1,524 
Number used as a route-thru:                       274 
Number of bonded IOBs:                             53       173         30% 
Number of Block RAMs:                              1        24          4% 
Number of MULT18X18s:                              2        24          8% 
Number of GCLKs:                                   2        8           25% 
Number of DCMs:                                    1        4           25% 
 
 
 
Figure 9-1 Hardware Used for Testing Design 
 
 
9.3 Future Work 
9.3.1 Real-Time Images 
The input images here are only simulated. To get real-time input frames, a video input 
port along with a decoder is necessary. 
 
9.3.2 Resolution 
The images could be run at a higher resolution of 1024 x 768. More memory is required 
to store these larger frames.  
 
9.3.3 Speed 
Image warping is currently too slow to produce a new output frame for every displayed 
frame: in the worst-case scenario, where every pixel is accessed from a different row 
in the memory, the output does not change for 45 frames. This can be improved by 
increasing the speed of operation of the memory, by increasing the level of parallel 
activity and by using a higher-speed target technology. Additional logic can also be 
added to optimize the design. 
 
9.3.4 Scalability 
The design could be tested for scalability. The number of input and output frames could 
be changed and warping could be carried out on these frames to test for functionality. 
 
Dr. Ruigang Yang, Assistant Professor of Computer Science at the University of 
Kentucky, is currently investigating new hardware architecture to facilitate higher 
performance and to support multiple pixel streams. He is also involved in developing 
optimization techniques for the overall design. 
 
 
APPENDIX 
 
Memory_Controller_Top module: 
`timescale 1ns / 1ps 
module Memory_Controller_Top(clk_in, reset, dvi_clkout, sd_clkout, 
     red, blue, green, 
hsync_n, vsync_n, 
sd_cke, sd_cs_n, sd_dqm, command, addr, 
baddr, data); 
 
 //Inputs 
 input clk_in; //50 Mhz 
 input reset; 
 
 //Outputs 
 output sd_clkout; 
 output dvi_clkout; 
 
 //dvi output 
 output [2:0] red; 
 output [2:0] green; 
 output [2:0] blue; 
 output hsync_n; 
 output vsync_n; 
  
 
 wire clk; 
 
 // sdram connections 
 output sd_cke, sd_cs_n; 
 output [1:0] sd_dqm; // UDQM LDQM 
 output [2:0] command; 
 output [12:0] addr; 
 output [1:0]  baddr; 
 inout [15:0] data; 
 
 
   
//wire and reg 
wire [12:0] workbuffer_addr_select; 
wire [12:0] scan_addr_select; 
wire [15:0] sd_row_wr_ff; 
wire [8:0] sd_col_wr_ff; 
wire [15:0] data_to_sdram_wr_ff; 
wire [15:0] data_to_sdram_wr_warp; 
wire [15:0] sd_row_wr_warp; 
wire [8:0] sd_col_wr_warp; 
wire [15:0]  sd_row_rd_warp; 
wire [8:0] sd_col_rd_warp; 
wire [15:0] sd_row_wr_lut; 
wire [8:0] sd_col_wr_lut; 
wire [15:0] data_to_sdram_wr_lut; 
wire [15:0] data_from_sdram_rd; 
wire [9:0] CountY, CountX; 
reg sdram_write_request;//Mem inputs 
reg sdram_read_request; 
reg wr_addr_valid; 
reg rd_addr_valid; 
reg [15:0] sd_row_wr; 
reg [15:0] sd_row_rd; 
reg [8:0] sd_col_wr; 
reg [8:0] sd_col_rd; 
reg [15:0] data_to_sdram_wr; 
wire [15:0] sd_row_rd_scanout; 
wire [8:0] sd_col_rd_scanout; 
wire [15:0] sd_data_rd_scanout; 
wire [15:0] out_row_misc; 
wire [8:0] out_col_misc; 
wire [7:0] image_warping_count; 
 
wire [1:0] sd_dqm; 
wire reset, reset_init; 
wire locked, init_done; 
reg init_done_reg; 
wire warp_done; 
 
 
/********************************************************************/ 
//REGISTERED RESETS 
/*******************************************************************/ 
assign reset_init = ~init_done_reg; 
//AFTER LOCK 
reg lock_reg; 
always @ (posedge clk) 
 if(locked) 
  lock_reg <= 1; 
 else 
  lock_reg <= 0; 
 
//AFTER SDRAM DONE 
always @ (posedge clk) 
 if(init_done) 
  init_done_reg <= 1; 
 else 
  init_done_reg <= 0; 
 
//AFTER LUT FRAME STORE 
assign lut_frame_detect = (sd_row_wr_lut[12:0] == 13'b0011100000111) && 
(sd_col_wr_lut == 9'b111111111);//13'b0010010101111 
 
//AFTER FRAME CAPTURE 
assign image_frame_detect = (sd_row_wr_ff[12:0] == 13'b0001001010111) 
&& (sd_col_wr_ff == 9'b111111111); //13'b0001001010111 
 
/*******************************************************************/ 
//CLOCK GENERATION 
/*******************************************************************/ 
//DCM 
dcm_clkgen CLK_GEN ( 
  //input 
  .CLKIN(clk_in), 
  //output 
  .clk25_int(clk), 
  .clk100_int(), 
  .clk100_ext(sd_clkout),  
  .clk25_ext(dvi_clkout), 
  .locked(locked) 
  ); 
 
/*******************************************************************/ 
//MAIN_MEMORY_CONTROLLER   DO NOT MESS WITH 
/*******************************************************************/ 
Memory_Controller MEM_CNTRL ( 
  //inputs 
  .clk(clk), 
  .reset(~lock_reg), 
  .reset_mod(reset_init), 
  .sdram_write_request(sdram_write_request), 
  .sdram_read_request(sdram_read_request), 
  .wr_addr_valid(wr_addr_valid), 
.rd_addr_valid(rd_addr_valid), 
  .scanout_rcbd_empty(scanout_rcbd_empty), 
  .scanout_rcbd_full(scanout_rcbd_full), 
  .sd_row_rd_scanout(sd_row_rd_scanout), 
  .sd_col_rd_scanout(sd_col_rd_scanout), 
  .sd_data_rd_scanout(sd_data_rd_scanout), 
  .rd_addr_valid_scanout(rd_addr_valid_scanout), 
  .reset_ref(reset_init), 
  .sd_row_wr(sd_row_wr), 
  .sd_col_wr(sd_col_wr), 
  .data_to_sdram_wr(data_to_sdram_wr), 
  .sd_row_rd(sd_row_rd), 
  .sd_col_rd(sd_col_rd), 
  //outputs 
  .sd_cke(sd_cke), 
  .sd_cs_n(sd_cs_n), 
  .sd_dqm(sd_dqm), 
  .command(command), 
  .addr(addr), 
  .baddr(baddr), 
  .data(data), 
  .mem_wr_issue(mem_wr_issue), 
  .mem_rd_issue(mem_rd_issue), 
  .rd_cmd(rd_cmd), 
  .rd_valid(mem_rd_valid), 
  .data_from_sdram_rd(data_from_sdram_rd), 
  .page_en(page_en), 
  //Debug Signals 
  .cmd_full(cmd_full), 
  .cmd_empty(cmd_empty), 
  .wr_cmd(wr_cmd), 
  .cmd_avail(cmd_avail), 
  .out_row(out_row), 
  .out_col(out_col), 
  .out_row_misc(out_row_misc[15:0]), 
  .out_col_misc(out_col_misc[8:0]), 
  .page_hit(page_hit), 
  .hit(hit), 
  .fifo_wr_en(fifo_wr_en), 
  .page_eq(page_eq), 
  .dly2_row(dly2_row), 
  .dly2_col(dly2_col), 
  .dly_cmd_avail(dly_cmd_avail), 
  .state(state), 
  .fifo_empty_n(fifo_empty_n), 
  .init_done(init_done), 
  .cmd_eq(cmd_eq), 
  .page_valid(page_valid), 
  .rd_en(rd_en), 
  .rd_en_scanout(mem_rd_en_scanout), 
  .sync_error(sync_error),   
  .in_row_mux(in_row_mux), 
  .in_col_mux(in_col_mux), 
  .row_data_count(row_data_count), 
  .col_data_count(col_data_count), 
  .dt_data_count(dt_data_count), 
  .almost_full(almost_full) 
  ); 
 
/*********************************************************************/ 
//DYNAMIC CONTROLLER 
/*********************************************************************/ 
Dynamic_Controller OVERALL_CNTRLR( 
  //inputs 
  .clk(clk), 
  .reset(reset_init), 
  .lut_frame_detect(lut_frame_detect), 
  .image_frame_detect(image_frame_detect), 
  .warp_done(warp_done), 
  .scanout_frame_done(scanout_frame_done), 
  //outputs 
  .scan_addr_select(scan_addr_select), 
  .workbuffer_addr_select(workbuffer_addr_select), 
  .scanout_enable(scanout_enable), 
  .lut_store(lut_store), 
  .image_store(image_store), 
  .image_warp(image_warp) 
  ); 
 
/*********************************************************************/ 
//READDATA TRACKING - ROUTE DATA FROM MEMORY 
/*********************************************************************/ 
Readdata_Tracking TRACKER( 
  //inputs 
  .clk(clk), 
  .reset(~scanout_enable), 
  .scanout_request(mem_rd_en_scanout), 
  .image_warping_request(rd_en && (~out_row_misc[15])), 
  .scanout_read(mem_rd_valid && (~track_data)), 
  .image_warping_read(mem_rd_valid && track_data), 
  .page_en(page_en), 
  //outputs 
  .track_data(track_data), 
  //debug signals 
  .scanout_count(scanout_track_count), 
  .image_warping_count(image_warp_track_count) 
  ); 
 
/*********************************************************************/ 
//LUT STORE - Only Writes 
/*********************************************************************/ 
Lut_Store LUT_FRAME_STORE ( 
//inputs 
  .clk(clk), 
  .reset(~lut_store), 
  .mem_wr_issue(mem_wr_issue_lut), 
  //outputs 
  .sd_col_wr(sd_col_wr_lut), 
  .sd_row_wr(sd_row_wr_lut), 
  .wr_addr_valid(wr_addr_valid_lut), 
  .sdram_write_request(sdram_write_request_lut), 
  .data_to_sdram_wr(data_to_sdram_wr_lut), 
  //debug signals 
  .sd_row_img(sd_row_lut_img), 
  .sd_col_img(sd_col_lut_img), 
  .sd_row_reg(sd_row_reg), 
  .sd_col_reg(sd_col_reg), 
  .sd_carry(sd_carry), 
  .sd_row(sd_row), 
  .sd_col(sd_col) 
  ); 
 
/*********************************************************************/ 
//DVI SCAN IN AND MEMORY WRITE DATA AND ADDRESS GENERATION 
/*********************************************************************/ 
Dvi_Scan_In_Top SCAN_IN_TOP ( 
  //inputs 
  .clk(clk), 
  .reset(~image_store), 
  .mem_wr_issue(mem_wr_issue_ff), 
  .frame_detect_reg(image_frame_detect), 
  //outputs 
  .sd_col_wr(sd_col_wr_ff), 
  .sd_row_wr(sd_row_wr_ff), 
  .wr_addr_valid(wr_addr_valid_ff), 
  .sdram_write_request(sdram_write_request_ff), 
  .data_to_sdram_wr(data_to_sdram_wr_ff) 
  ); 
    
/*********************************************************************/ 
//IMAGE WARPING  
/*********************************************************************/ 
Image_Warping_Top IMAGE_WARP ( 
  //inputs 
  .clk(clk), 
  .reset(~image_warp), 
  .mem_wr_issue(mem_wr_issue_warp), 
  .data_from_capture(data_from_sdram_rd), 
  .data_valid(mem_rd_valid && track_data), 
  .mem_rd_issue(mem_rd_issue_warp), 
  .addr_select(workbuffer_addr_select), 
  //outputs 
  .sd_col_wr(sd_col_wr_warp), 
  .sd_row_wr(sd_row_wr_warp), 
  .wr_addr_valid(wr_addr_valid_warp), 
  .sdram_write_request(sdram_write_request_warp), 
  .data_to_sdram_wr(data_to_sdram_wr_warp), 
  .sd_row_rd(sd_row_rd_warp), //14bits 
  .sd_col_rd(sd_col_rd_warp), 
  .sdram_read_request(sdram_read_request_warp), 
  .rd_addr_valid(rd_addr_valid_warp), 
  .warp_done(warp_done), 
  //Debug Signals 
  .row_data_count(warp_row_count), 
  .col_data_count(warp_col_count), 
  .dt_data_count(warp_dt_count), 
  .lut_rd(lut_rd), 
  .img_rd(img_rd), 
  .img_wr(img_wr), 
  .dly_img_wr(dly_img_wr), 
  .dt_empty(dt_empty), 
  .dt_going_empty(dt_going_empty), 
  .dt_full(dt_full), 
  .rc_empty(rc_empty), 
  .rc_full(rc_full), 
  .rc_fifo_en(rc_fifo_en), 
  .data_en_fifo(data_en_fifo), 
  .max_lut_read_requests(max_lut_read_requests), 
  .dly_max_lut_read_requests(dly_max_lut_read_requests), 
  .requests_pending(requests_pending), 
  .max_img_write_requests(max_img_write_requests), 
  .dly_max_img_write_requests(dly_max_img_write_requests), 
  .reset_cntr(reset_cntr), 
  .sd_row_mem(sd_row_mem), 
  .sd_col_mem(sd_col_mem), 
  .lut_requests(lut_requests), 
  .img_rd_requests(img_rd_requests), 
  .img_wr_requests(img_wr_requests), 
  .sd_row_lut(sd_row_lut), 
  .sd_row_img(sd_row_img), 
  .sd_col_lut(sd_col_lut), 
  .sd_col_img(sd_col_img), 
  .almost_full(rc_almost_full), 
  .sd_row_fifo(sd_row_fifo), 
  .sd_col_fifo(sd_col_fifo), 
  .lut_count_change(lut_count_change), 
  .img_wr_count_change(img_wr_count_change), 
  .img_rd_count_change(img_rd_count_change), 
  .dt_rd_err(dt_rd_err), 
  .dt_wr_err(dt_wr_err), 
  .row_out(row_out), 
  .col_out(col_out), 
  .warp_state(warp_state) 
  ); 
 
 
/*********************************************************************/ 
//DVI SCAN OUT AND MEMORY READ ADDRESS GENERATION 
/*********************************************************************/ 
wire [3:0] fifocount_out; 
Dvi_Scan_Out_Top SCAN_OUT_TOP ( 
  //inputs 
  .clk(clk), 
  .reset(~scanout_enable), 
  .data_from_sdram(data_from_sdram_rd), 
  .mem_rd_en(mem_rd_en_scanout), 
  .mem_rd_valid(mem_rd_valid && (~track_data)), 
  .addr_select(scan_addr_select), 
  //outputs 
  .hsync_n(hsync_n),  
  .vsync_n(vsync_n), 
  .red(red), 
  .green(green), 
  .blue(blue), 
  //outputs to sdram 
  .sd_row_rd(sd_row_rd_scanout), //14bits 
  .sd_col_rd(sd_col_rd_scanout), 
  .sd_data_rd(sd_data_rd_scanout), 
  .dly_mem_rd_en(rd_addr_valid_scanout), 
  .scanout_frame_done(scanout_frame_done), 
  //debug signals 
  .mem_read_request(sdram_read_request_scanout), 
  .datarequest(datarequest), 
  .fifocount_out(fifocount_out), 
  .CountX(CountX), 
  .CountY(CountY), 
  .full_out(full_out), 
  .empty_out(empty_out), 
  .wr_err_out(wr_err_out), 
  .rd_err_out(rd_err_out), 
  .row_count(scanout_row_count), 
  .col_count(scanout_col_count), 
  .dt_count(scanout_dt_count), 
  .rcbd_full(scanout_rcbd_full), 
  .rcbd_empty(scanout_rcbd_empty), 
  .sync_error(scanout_sync_error), 
  .almost_full(scanout_almost_full) 
  ); 
 
/*********************************************************************/ 
// HUGE MUXES TO SELECT PIXEL FRAME CAPTURE/LUT STORAGE/WARP AND 
// WARP/SCANOUT 
// VALUES THAT ARE NEEDED BY THE MEMORY CONTROLLER 
/*********************************************************************/ 
always @ (posedge clk)
if(reset_init)
begin
 sdram_write_request <= 0;
 sdram_read_request <= 0;
 wr_addr_valid  <= 0;
 rd_addr_valid  <= 0;
 sd_row_wr[15:0]  <= 0;
 sd_col_wr[8:0]  <= 0;
 data_to_sdram_wr[15:0]  <= 0;
 sd_row_rd[15:0]  <= 0;
 sd_col_rd[8:0]  <= 0;
end
else
begin
 case({image_warp, image_store, lut_store})
 3'b000:    begin
      sdram_write_request <= 0;
      sdram_read_request <= 0;
      wr_addr_valid  <= 0;
      rd_addr_valid  <= 0;
      sd_row_wr[15:0]  <= 0;
      sd_col_wr[8:0]  <= 0;
      data_to_sdram_wr[15:0]  <= 0;
      sd_row_rd[15:0]  <= 0;
      sd_col_rd[8:0]  <= 0;
      end

 3'b001:    begin
      sdram_write_request <= sdram_write_request_lut;
      sdram_read_request <= 0;
      wr_addr_valid  <= wr_addr_valid_lut;
      rd_addr_valid  <= 0;
      sd_row_wr[15:0]  <= sd_row_wr_lut[15:0];
      sd_col_wr[8:0]  <= sd_col_wr_lut[8:0];
      data_to_sdram_wr[15:0]  <= data_to_sdram_wr_lut[15:0];
      sd_row_rd[15:0]  <= 0;
      sd_col_rd[8:0]  <= 0;
      end

 3'b010:    begin
      sdram_write_request <= sdram_write_request_ff;
      sdram_read_request <= 0;
      wr_addr_valid  <= wr_addr_valid_ff;
      rd_addr_valid  <= 0;
      sd_row_wr[15:0]  <= sd_row_wr_ff[15:0];
      sd_col_wr[8:0]  <= sd_col_wr_ff[8:0];
      data_to_sdram_wr[15:0]  <= data_to_sdram_wr_ff[15:0];
      sd_row_rd[15:0]  <= 0;
      sd_col_rd[8:0]  <= 0;
      end

 3'b100:    begin
      sdram_write_request <= sdram_write_request_warp;
      sdram_read_request <= sdram_read_request_warp;
      wr_addr_valid  <= wr_addr_valid_warp;
      rd_addr_valid  <= rd_addr_valid_warp;
      sd_row_wr[15:0]  <= sd_row_wr_warp[15:0];
      sd_col_wr[8:0]  <= sd_col_wr_warp[8:0];
      data_to_sdram_wr[15:0]  <= data_to_sdram_wr_warp[15:0];
      sd_row_rd[15:0]  <= sd_row_rd_warp[15:0];
      sd_col_rd[8:0]  <= sd_col_rd_warp[8:0];
      end

 default:    begin
      sdram_write_request <= 0;
      sdram_read_request <= 0;
      wr_addr_valid  <= 0;
      rd_addr_valid  <= 0;
      sd_row_wr[15:0]  <= 0;
      sd_col_wr[8:0]  <= 0;
      data_to_sdram_wr[15:0]  <= 0;
      sd_row_rd[15:0]  <= 0;
      sd_col_rd[8:0]  <= 0;
      end

 endcase
end
  
/*********************************************************************/ 
// HUGE MUXES TO SELECT VALUES OUTPUTTED BY THE MEMORY CONTROLLER 
/*********************************************************************/ 
assign mem_wr_issue_lut = mem_wr_issue && lut_store; 
assign mem_wr_issue_ff = mem_wr_issue && image_store; 
assign mem_wr_issue_warp = mem_wr_issue && image_warp; 
assign mem_rd_issue_warp = mem_rd_issue && image_warp; 
 
 
endmodule 
 
 
CLK_GEN module: 
 
`timescale 1ns / 1ps 
/*///////////////////////////////////////////////////////////////////////
// Generates clocks of 25 MHz from the 50 MHz input clock.
// Uses the Spartan 3 DCM to generate the required clocks. Initially used
// the 25 and 100 MHz clocks. Now only uses the 25 MHz clock.
/////////////////////////////////////////////////////////////////////// */
 
module dcm_clkgen(CLKIN,clk25_int,clk100_int,clk100_ext,clk25_ext, 
locked); 
 
input CLKIN; 
output clk25_int; 
output clk100_int; 
output clk100_ext; 
output clk25_ext; 
output locked; 
 
wire CLK0, CLKFB; 
wire CLKDV_w, CLK2X_w, CLK2X180_w; 
wire clk25_int,clk100_int,clk100_ext; 
wire logic0; 
 
assign logic0 = 1'b0; 
 
//***********INSTANTIATIONS BEGIN******************** 
 DCM #(
      .CLKDV_DIVIDE(2.0),
      // Divide by: 1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0,5.5,6.0,6.5
      //   7.0,7.5,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0 or 16.0
      .CLKFX_DIVIDE(1),    // Can be any integer from 1 to 32
      .CLKFX_MULTIPLY(2),  // Can be any integer from 2 to 32
      .CLKIN_DIVIDE_BY_2("FALSE"), // TRUE/FALSE to enable CLKIN divide by two feature
      .CLKIN_PERIOD(20.0),  // Period of input clock in ns (20 ns for the 50 MHz input)
      .CLKOUT_PHASE_SHIFT("NONE"), // Specify phase shift of NONE, FIXED or VARIABLE
      .CLK_FEEDBACK("1X"),  // Specify clock feedback of NONE, 1X or 2X
      .DESKEW_ADJUST("SYSTEM_SYNCHRONOUS"), // SOURCE_SYNCHRONOUS, SYSTEM_SYNCHRONOUS or
                                            //   an integer from 0 to 15
      .DFS_FREQUENCY_MODE("LOW"),  // HIGH or LOW frequency mode for frequency synthesis
      .DLL_FREQUENCY_MODE("LOW"),  // HIGH or LOW frequency mode for DLL
      .DUTY_CYCLE_CORRECTION("TRUE"), // Duty cycle correction, TRUE or FALSE
      .FACTORY_JF(16'hC080),   // FACTORY JF values
      .PHASE_SHIFT(0),     // Amount of fixed phase shift from -255 to 255
      .STARTUP_WAIT("TRUE")   // Delay configuration DONE until DCM LOCK, TRUE/FALSE
   ) DCM1 (
      .CLK0(CLK0),     // 0 degree DCM CLK output
      .CLK180(CLKDV180_w), // 180 degree DCM CLK output
      .CLK270(), // 270 degree DCM CLK output
      .CLK2X(CLK2X_w),   // 2X DCM CLK output
      .CLK2X180(CLK2X180_w), // 2X, 180 degree DCM CLK out
      .CLK90(),   // 90 degree DCM CLK output
      .CLKDV(CLKDV_w),   // Divided DCM CLK out (CLKDV_DIVIDE)
      .CLKFX(),   // DCM CLK synthesis out (M/D)
      .CLKFX180(), // 180 degree CLK synthesis out
      .LOCKED(locked), // DCM LOCK status output
      .PSDONE(), // Dynamic phase adjust done output
      .STATUS(), // 8-bit DCM status bits output
      .CLKFB(CLKFB),   // DCM clock feedback
      .CLKIN(CLKIN),   // Clock input (from IBUFG, BUFG or DCM)
      .PSCLK(),   // Dynamic phase adjust clock input
      .PSEN(),     // Dynamic phase adjust enable input
      .PSINCDEC(), // Dynamic phase adjust increment/decrement
      .RST(logic0)        // DCM asynchronous reset input
   );
 BUFG   dcm1fbbuf (.I(CLK0),  .O(CLKFB)); 
 BUFG   int100buf (.I(CLKDV_w), .O(clk100_int)); 
 BUFG   int25buf (.I(CLKDV_w), .O(clk25_int)); 
 OBUF   out100buf (.I(CLKDV_w), .O(clk100_ext)); 
 OBUF   out25buf (.I(CLKDV_w), .O(clk25_ext)); 
   
 //***********INSTANTIATIONS END********************** 
 
endmodule 
 
Memory_Controller module: 
 
`timescale 1ns / 1ps 
module Memory_Controller(
    clk, reset, reset_mod, reset_ref,
    sdram_write_request, sdram_read_request,
    data_to_sdram_wr, wr_addr_valid, rd_addr_valid,
    sd_col_wr, sd_col_rd, sd_row_wr, sd_row_rd,
    //sdram connections
    sd_cke, sd_cs_n, sd_dqm, command,
    addr, baddr, data, mem_wr_issue, mem_rd_issue,
    rd_valid, data_from_sdram_rd, rd_en_scanout, page_en,
    //Debug Signals
    cmd_full, cmd_empty, rd_cmd, wr_cmd, cmd_avail,
    out_row, out_col, page_hit, hit, fifo_wr_en, page_eq,
    dly2_row, dly2_col, dly_cmd_avail, state,
    fifo_empty_n, init_done, cmd_eq, page_valid,
    rd_en, in_row_mux, in_col_mux, sync_error,
    row_data_count, col_data_count, dt_data_count, almost_full,
    scanout_rcbd_empty, sd_row_rd_scanout,
    sd_col_rd_scanout, sd_data_rd_scanout,
    rd_addr_valid_scanout, scanout_rcbd_full,
    out_row_misc, out_col_misc
    );
 
// inputs 
input clk; 
input reset; 
input reset_mod; 
input reset_ref; 
input sdram_write_request; 
input sdram_read_request; 
input [15:0] data_to_sdram_wr; 
input wr_addr_valid; 
input rd_addr_valid; 
input [15:0] sd_row_rd; 
input [8:0] sd_col_rd; 
input [15:0] sd_row_wr; 
input [8:0] sd_col_wr; 
input scanout_rcbd_empty; 
input scanout_rcbd_full; 
input [15:0] sd_row_rd_scanout; 
input [8:0] sd_col_rd_scanout; 
input [15:0] sd_data_rd_scanout; 
input rd_addr_valid_scanout; 
// outputs 
 
// sdram connections 
output sd_cke, sd_cs_n; 
output [1:0] sd_dqm; 
output [2:0] command; 
output [12:0] addr; 
output [1:0]  baddr; 
inout [15:0] data; 
output mem_wr_issue, mem_rd_issue; 
output [15:0] data_from_sdram_rd; 
output rd_valid; 
output page_en; 
 
//Debug Signals 
output cmd_full, cmd_empty; 
output rd_cmd, wr_cmd; 
output cmd_avail; 
output [15:0] out_row; 
output [8:0] out_col; 
output page_hit; 
output hit; 
output fifo_wr_en; 
output page_eq; 
output [15:0] dly2_row; 
output [8:0] dly2_col; 
output dly_cmd_avail; 
output [13:1] state; 
output fifo_empty_n; 
output init_done; 
output cmd_eq; 
output page_valid; 
output rd_en; 
output [15:0] in_row_mux; 
output [8:0] in_col_mux; 
output sync_error; 
output [2:0] row_data_count; 
output [2:0] col_data_count; 
output [2:0] dt_data_count; 
output almost_full; 
output rd_en_scanout; 
output [15:0] out_row_misc; 
output [8:0] out_col_misc; 
 
 
wire logic0, logic1; 
assign logic0 = 1'b0; 
assign logic1 = 1'b1; 
 
// hardcode cke, cs_n, and dqm 
assign sd_cke = logic1; 
assign sd_cs_n = logic0; 
assign sd_dqm = {(~init_done),(~init_done)}; 
 
//wires and reg 
wire [15:0] sd_row_wr, sd_row_rd; 
wire [8:0] sd_col_wr, sd_col_rd; 
wire [15:0] data_to_sdram_wr; 
wire [15:0] data_from_sdram_rd; 
 
 
 
sdram_control MAIN_CONTROLLER( 
   .clk(clk), 
  .reset(reset), 
  .reset_mod(reset_mod), 
  .reset_ref(reset_ref), 
  .sdram_write_request(sdram_write_request), 
  .sdram_read_request(sdram_read_request), 
  //mem read and write address & data 
  .sd_row_wr(sd_row_wr), 
  .sd_col_wr(sd_col_wr), 
  .data_to_sdram_wr(data_to_sdram_wr), 
  .sd_row_rd(sd_row_rd), 
  .sd_col_rd(sd_col_rd), 
  .wr_addr_valid(wr_addr_valid), 
  .rd_addr_valid(rd_addr_valid), 
  .scanout_rcbd_empty(scanout_rcbd_empty), 
  .scanout_rcbd_full(scanout_rcbd_full), 
  .sd_row_rd_scanout(sd_row_rd_scanout), 
  .sd_col_rd_scanout(sd_col_rd_scanout), 
  .sd_data_rd_scanout(sd_data_rd_scanout), 
  .rd_addr_valid_scanout(rd_addr_valid_scanout), 
  //outputs 
  .command(command), 
  .addr(addr), 
  .baddr(baddr), 
  .data(data), 
  .mem_wr_issue(mem_wr_issue), 
  .mem_rd_issue(mem_rd_issue), 
  .rd_valid(rd_valid), 
  .rddata_out(data_from_sdram_rd), 
  .page_en(page_en), 
  //Debug Signals 
  .cmd_full(cmd_full), 
  .cmd_empty(cmd_empty), 
  .rd_cmd(rd_cmd), 
  .wr_cmd(wr_cmd), 
  .cmd_avail(cmd_avail), 
  .out_row(out_row), 
  .out_col(out_col), 
  .out_row_misc(out_row_misc[15:0]), 
  .out_col_misc(out_col_misc[8:0]), 
  .page_hit(page_hit), 
  .hit(hit), 
  .fifo_wr_en(fifo_wr_en), 
  .page_eq(page_eq), 
  .dly2_row(dly2_row), 
  .dly2_col(dly2_col), 
  .dly_cmd_avail(dly_cmd_avail), 
  .state(state), 
  .fifo_empty_n(fifo_empty_n), 
  .init_done(init_done), 
  .cmd_eq(cmd_eq), 
  .page_valid(page_valid), 
  .rd_en(rd_en), 
  .rd_en_scanout(rd_en_scanout), 
  .sync_error(sync_error), 
  .in_row_mux(in_row_mux), 
  .in_col_mux(in_col_mux), 
  .row_data_count(row_data_count), 
  .col_data_count(col_data_count), 
  .dt_data_count(dt_data_count), 
  .almost_full(almost_full) 
  ); 
      
 
endmodule 
 
 
sdram_control module: 
 
 
`timescale 1ns / 1ps 
module sdram_control(//inputs
      clk, reset, reset_mod, reset_ref, sdram_write_request,
      sdram_read_request, data_to_sdram_wr, sd_row_wr,
      sd_col_wr, sd_row_rd, sd_col_rd, wr_addr_valid,
      rd_addr_valid, scanout_rcbd_full,
      //outputs
      command, addr, baddr, data, rd_en_scanout, page_en,
      mem_wr_issue, mem_rd_issue, rd_valid, rddata_out,
      //Debug Signal
      cmd_full, cmd_empty, rd_cmd, wr_cmd, cmd_avail, out_row,
      out_col, page_hit, hit, fifo_wr_en, page_eq,
      dly2_row, dly2_col, dly_cmd_avail, state, fifo_empty_n,
      init_done, cmd_eq, page_valid, rd_en, in_row_mux, in_col_mux,
      sync_error, row_data_count, col_data_count, dt_data_count,
      almost_full, scanout_rcbd_empty, sd_row_rd_scanout, sd_col_rd_scanout,
      sd_data_rd_scanout, rd_addr_valid_scanout, out_row_misc, out_col_misc
      );
 
//inputs 
input clk; 
input reset; 
input reset_mod; 
input reset_ref; 
 
//mem write and read address 
input sdram_write_request; 
input sdram_read_request; 
input [15:0] sd_row_rd; 
input [8:0] sd_col_rd; 
input [15:0] sd_row_wr; 
input [8:0] sd_col_wr; 
input [15:0] data_to_sdram_wr; 
input wr_addr_valid; 
input rd_addr_valid; 
input scanout_rcbd_empty; 
input [15:0] sd_row_rd_scanout; 
input [8:0] sd_col_rd_scanout; 
input [15:0] sd_data_rd_scanout; 
input rd_addr_valid_scanout; 
input scanout_rcbd_full; 
 
//outputs to sdram 
output [2:0] command; 
output [12:0] addr; 
output [1:0]  baddr; 
inout [15:0] data; 
 
//outputs 
output mem_wr_issue; 
output mem_rd_issue; 
output rd_valid; 
output [15:0] rddata_out; 
output page_en; 
 
//Debug Signals 
output cmd_full, cmd_empty; 
output rd_cmd, wr_cmd; 
output cmd_avail; 
output [15:0] out_row; 
output [8:0] out_col; 
output page_hit; 
output hit; 
output fifo_wr_en; 
output page_eq; 
output [15:0] dly2_row; 
output [8:0] dly2_col; 
output dly_cmd_avail; 
output [13:1] state; 
output fifo_empty_n; 
output init_done; 
output cmd_eq; 
output page_valid; 
output rd_en; 
output [15:0] in_row_mux; 
output [8:0] in_col_mux; 
output sync_error; 
output [2:0] row_data_count; 
output [2:0] col_data_count; 
output [2:0] dt_data_count; 
output almost_full; 
output rd_en_scanout; 
output [15:0] out_row_misc; 
output [8:0] out_col_misc; 
 
 
// these are intended to be IOB registers 
reg [15:0] rddata_in; 
reg [2:0]  command; 
reg [12:0] addr; 
reg [1:0]  baddr; 
reg [15:0] wrdata_out; 
//synthesis attribute IOB rddata_in "true" 
//synthesis attribute IOB command    "true" 
//synthesis attribute IOB addr       "true" 
//synthesis attribute IOB baddr      "true" 
//synthesis attribute IOB wrdata_out "true" 
 
//reg and wires 
wire [15:0] data_to_sdram_wr; 
wire [2:0] sd_cmd; 
wire [1:0] sd_baddr; 
wire [15:0] sd_wrdata_out; 
wire [12:0] sd_addr; 
wire [2:0] init_sd_cmd; 
reg rd_valid; 
reg [15:0] rddata_out; 
 
//0 and 1 
wire logic0; 
assign logic0 = 1'b0; 
 
//Register initdone for the dqm signal in the Memory_Controller module 
wire last_init_done; 
reg init_done1, init_done; 
always @ (posedge clk) 
 if(reset) 
  begin 
  init_done1 <= 0; 
  init_done  <= 0; 
  end 
 else 
  begin 
  init_done1 <= last_init_done; 
  init_done  <= init_done1; 
  end   
 
// Sdram Power up sequence.  
sdram_init SDRAM_STARTUP( 
  .clk(clk), 
  .reset(reset), 
  //outputs 
  .sdram_rdy(last_init_done), 
  .ras_n(init_sd_cmd[2]), 
  .cas_n(init_sd_cmd[1]), 
  .we_n(init_sd_cmd[0]), 
  .a5(a5_init), 
  .a10(a10_init), 
  .refdone(refresh_done), //backup 
  .ref_cnt(refresh_count) //backup 
  ); 
 
//Sdram pipelined core controller.  
core_control CORE_UNIT( 
  //inputs 
  .clk(clk), 
  .reset(reset_mod), //send to mem should be reached; init should be done
  .reset_ref(reset_ref), 
  .sdram_write_request(sdram_write_request), 
  .sdram_read_request(sdram_read_request), 
  //mem read and write address & data 
  .sd_row_wr(sd_row_wr[15:0]), 
  .sd_col_wr(sd_col_wr[8:0]), 
  .data_to_sdram(data_to_sdram_wr[15:0]), 
  .sd_row_rd(sd_row_rd[15:0]), 
  .sd_col_rd(sd_col_rd[8:0]), 
  .wr_addr_valid(wr_addr_valid), 
  .rd_addr_valid(rd_addr_valid), 
  .scanout_rcbd_empty(scanout_rcbd_empty), 
  .scanout_rcbd_full(scanout_rcbd_full), 
  .sd_row_rd_scanout(sd_row_rd_scanout), 
  .sd_col_rd_scanout(sd_col_rd_scanout), 
  .sd_data_rd_scanout(sd_data_rd_scanout), 
  .rd_addr_valid_scanout(rd_addr_valid_scanout), 
  //outputs 
  .wr_cmd_issue(mem_wr_issue), 
  .rd_cmd_issue(mem_rd_issue), 
  .sync_error(sync_error), 
  .addr(sd_addr[12:0]), 
  .baddr(sd_baddr[1:0]), 
  .command(sd_cmd[2:0]),  
  .wrdata_out(sd_wrdata_out[15:0]), 
  //Debug Signals 
  .full(cmd_full), 
  .empty(cmd_empty), 
  .rd_cmd(rd_cmd), 
  .wr_cmd(wr_cmd), 
  .cmd_avail(cmd_avail), 
  .out_row(out_row[15:0]), 
  .out_col(out_col[8:0]), 
  .out_row_misc(out_row_misc[15:0]), 
  .out_col_misc(out_col_misc[8:0]), 
  .page_hit(page_hit), 
  .hit(hit), 
  .fifo_wr_en(fifo_wr_en), 
  .page_eq(page_eq), 
  .page_en_reg(page_en), 
  .dly2_row(dly2_row[15:0]), 
  .dly2_col(dly2_col[8:0]), 
  .dly_cmd_avail(dly_cmd_avail), 
  .state(state), 
  .fifo_empty_n(fifo_empty_n), 
  .cmd_eq(cmd_eq), 
  .page_valid(page_valid), 
  .rd_en_misc(rd_en), 
  .rd_en_scanout(rd_en_scanout), 
  .in_row_mux(in_row_mux[15:0]), 
  .in_col_mux(in_col_mux[8:0]), 
  .row_data_count(row_data_count[2:0]), 
  .col_data_count(col_data_count[2:0]), 
  .dt_data_count(dt_data_count[2:0]), 
  .almost_full(almost_full) 
  ); 
 
 
//*** 
  // Bidirectional I/O for Data Bus 
  //*** 
 
  // from Xilinx Answer Record #11658: to get the synthesizer
  // to use the IOB tristate enable register, create one flip-flop
  // and place the IOB attribute on it (may also have to turn on
  // the "Pack I/O Registers into IOBs" synthesis option)
  reg reg_oe_n; 
  //synthesis attribute IOB reg_oe_n "true" 
 
  // iobufs 
  wire [15:0] rddata_pins; 
  my_iobuf16 SD_DATA_BUS(.I(wrdata_out), .T(reg_oe_n), .O(rddata_pins), .IO(data));
 
 
// logic to determine when incoming read data is valid 
  reg q_rd_cmd1, q_rd_cmd2, q_rd_cmd3, rd_data_val; 
  wire read_cmd = (sd_cmd == 3'b101); 
  always @(posedge clk) 
    if (reset) 
      begin 
        q_rd_cmd1 <= 0; 
        q_rd_cmd2 <= 0; 
        q_rd_cmd3 <= 0; 
        rd_data_val <= 0; 
      end 
    else 
      begin 
        q_rd_cmd1 <= read_cmd; 
        q_rd_cmd2 <= q_rd_cmd1; 
        q_rd_cmd3 <= q_rd_cmd2; 
        rd_data_val <= q_rd_cmd3; 
      end 
 
  // clock in read data 
  always @(posedge clk) 
    if (reset) rddata_in <= 0; 
    else if (q_rd_cmd3) rddata_in[15:0] <= rddata_pins[15:0]; 
 
  // only drive data bus during write cycles 
  wire oe_n = ~(sd_cmd == 3'b100); 
  reg [15:0] wrdata1_out; 
  reg reg1_oe_n; 
  always @(posedge clk) 
    if (reset) 
      begin 
        wrdata1_out <= 0; 
        reg1_oe_n <= 1; 
      end 
    else  
      begin 
        wrdata1_out[15:0] <= sd_wrdata_out[15:0]; 
        reg1_oe_n <= oe_n; 
      end 
 
  // send off chip on negative edge 
  always @(negedge clk) 
    if (reset) 
      begin 
        wrdata_out <= 0; 
        reg_oe_n <= 1; 
      end 
    else  
      begin 
        wrdata_out[15:0] <= wrdata1_out[15:0]; 
        reg_oe_n <= reg1_oe_n; 
      end 
 
//OUTPUTS      
  
//command, addr and data mux 
wire [2:0] cmd_mux; 
wire [15:0] rddata_mux; 
wire [12:0] addr_mux; 
wire [1:0] baddr_mux; 
 
//Mux addr, command and baddr. If the sdram is initialized then select the
//sdram core command. Otherwise select the sdram initialization commands.
assign addr_mux[12:0] = (init_done) ? sd_addr[12:0] :
       {logic0,logic0,a10_init,logic0,logic0,logic0,logic0,a5_init,
        logic0,logic0,logic0,logic0,logic0};
assign baddr_mux[1:0] = (init_done) ? sd_baddr[1:0] : 2'b00; 
assign cmd_mux[2:0] = (init_done) ? sd_cmd[2:0] : init_sd_cmd[2:0]; 
 
 // register the read data mux 
  always @(posedge clk) 
    if (reset) 
      begin 
        rd_valid <= 0; 
        rddata_out <= 0; 
      end 
    else 
      begin 
        rd_valid <= rd_data_val; 
        rddata_out[15:0] <= rddata_in[15:0]; 
      end 
 
// register the other mux outputs 
  reg [2:0] command1; 
  reg [12:0] addr1; 
  reg [1:0]  baddr1; 
  always @(negedge clk) 
    if (reset) 
      begin 
        command1 <= 3'b111;   // nop 
        addr1 <= 0; 
        baddr1 <= 0; 
      end 
    else 
      begin 
        command1[2:0] <= cmd_mux[2:0]; 
        addr1[12:0] <= addr_mux[12:0]; 
        baddr1[1:0] <= baddr_mux[1:0]; 
      end 
 
  // send off chip on negative edge 
  always @(negedge clk) 
    if (reset) 
      begin 
        command <= 3'b111;   // nop 
        addr <= 0; 
        baddr <= 0; 
      end 
    else 
      begin 
        command[2:0] <= command1[2:0]; 
        addr[12:0] <= addr1[12:0]; 
        baddr[1:0] <= baddr1[1:0]; 
      end 
 
endmodule 
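
The read-valid logic in sdram_control delays the read command through four flip-flops (q_rd_cmd1, q_rd_cmd2, q_rd_cmd3, rd_data_val) before rd_valid is registered. The same delay could be expressed as a parameterized shift register. The following is only a sketch of that idea; the module name valid_delay and the parameter STAGES are hypothetical and do not appear in the original design.

```verilog
`timescale 1ns / 1ps
// Hypothetical helper (not part of the original design): delays a
// one-bit strobe by STAGES clock cycles using a shift register.
// STAGES must be at least 2 because of the slice sr[STAGES-2:0].
module valid_delay(clk, reset, d, q);
  parameter STAGES = 4;        // 4 matches q_rd_cmd1..3 plus rd_data_val

  input  clk;
  input  reset;                // synchronous, active high
  input  d;                    // strobe in, e.g. read_cmd
  output q;                    // strobe out, e.g. rd_data_val

  reg [STAGES-1:0] sr;

  always @(posedge clk)
    if (reset) sr <= {STAGES{1'b0}};
    else       sr <= {sr[STAGES-2:0], d};   // shift towards the MSB

  assign q = sr[STAGES-1];
endmodule
```

With STAGES = 4, sr[2] corresponds to q_rd_cmd3 (the cycle on which rddata_in is captured) and sr[3] to rd_data_val.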
 
sdram_init module:
 
`timescale 1ns/10ps 
 
`define  INIT_NOP_IDLE  1  // start here 
`define  INIT_PRE_ALL   2  // precharge all 
`define  INIT_NOP_RP1   3 
`define  INIT_NOP_RP2   4 
`define  INIT_NOP_RP3   5 
`define  INIT_REFRESH   6  // refresh command 
`define  INIT_NOP_WAIT  7 
`define  INIT_LOAD      8  // load mode register command 
`define  INIT_NOP_MRD1  9 
`define  INIT_NOP_MRD2  10 
`define  INIT_NOP_TRAP  11 
 
module 
sdram_init(clk,reset,sdram_rdy,ras_n,cas_n,we_n,a5,a10,refdone,ref_cnt); 
  input clk, reset; 
  output sdram_rdy; 
  output ras_n, cas_n, we_n; 
  output a5, a10; 
  output refdone; 
  output [4:0] ref_cnt;  // width matches the reg [4:0] ref_cnt declaration below
 
  reg [11:1]   state, next_state; 
 
  wire  init_pre_all = state[`INIT_PRE_ALL]; 
  wire  init_refresh = state[`INIT_REFRESH]; 
  wire  init_load = state[`INIT_LOAD]; 
  wire  init_nop_trap = state[`INIT_NOP_TRAP]; 
 
  // state machine outputs 
  assign sdram_rdy = init_nop_trap; 
  assign ras_n = ~(init_load || init_pre_all || init_refresh); 
  assign cas_n = ~(init_load || init_refresh); 
  assign we_n = ~(init_load || init_pre_all); 
  assign a5 = init_load; 
  assign a10 = init_pre_all; 
    
  wire waitdone; 
 
  // reference counter
  reg [4:0] ref_cnt;
  wire refdone;
  assign refdone = ref_cnt[4];
  // non-blocking assignments are used because this is a clocked block
  always @ (posedge clk)
    if (reset) ref_cnt <= 0;
    else begin
      if (refdone) ref_cnt <= 0;
      else ref_cnt <= ref_cnt + 1;
    end
 
  // wait counter 
  reg [10:0] wait_cnt; 
  assign waitdone = (wait_cnt == 11'b10011101111); 
  always @(posedge clk) 
    if (reset) wait_cnt <= 0; 
    else if (refdone) wait_cnt <= wait_cnt + 1;  
  else if (waitdone) wait_cnt <= 0; 
 
  //assign wait_done1 = wait_cnt[3]; 
  assign ref_wait_done = wait_cnt[0]; 
 
  // state_intialization 
  always @ (posedge clk or posedge reset) 
    if (reset)   state <= 11'b00000000001; 
    else         state <= next_state; 
 
 
  // state transitions 
  always @ (state or reset or refdone or ref_wait_done or waitdone)
    begin
      // has default values for outputs so synthesis tool doesn't infer latches
      next_state = 11'b00000000000;

      casex (1'b1)
        state[`INIT_NOP_IDLE]:
          if (!waitdone)                 next_state[`INIT_NOP_IDLE] = 1;
          else                           next_state[`INIT_PRE_ALL] = 1;

        state[`INIT_PRE_ALL]:            next_state[`INIT_NOP_RP1] = 1;

        state[`INIT_NOP_RP1]:            next_state[`INIT_NOP_RP2] = 1;

        state[`INIT_NOP_RP2]:            next_state[`INIT_NOP_RP3] = 1;

        state[`INIT_NOP_RP3]:            next_state[`INIT_REFRESH] = 1;

        state[`INIT_REFRESH]:            next_state[`INIT_NOP_WAIT] = 1;

        state[`INIT_NOP_WAIT]:
          if (!refdone)                  next_state[`INIT_NOP_WAIT] = 1;
          else if (!ref_wait_done)       next_state[`INIT_REFRESH] = 1;
          else                           next_state[`INIT_LOAD] = 1;

        state[`INIT_LOAD]:               next_state[`INIT_NOP_MRD1] = 1;

        state[`INIT_NOP_MRD1]:           next_state[`INIT_NOP_MRD2] = 1;

        state[`INIT_NOP_MRD2]:           next_state[`INIT_NOP_TRAP] = 1;

        state[`INIT_NOP_TRAP]:           next_state[`INIT_NOP_TRAP] = 1;

        default:                         next_state[`INIT_NOP_IDLE] = 1;
      endcase
    end
 
endmodule 
 
my_iobuf16  module: 
 
`timescale 1ns/10ps 
 
module my_iobuf16(I,T,O,IO); 
  input T; 
  input [15:0] I; 
  output [15:0] O; 
  inout [15:0] IO; 
 
  IOBUF iobuf15(.I(I[15]), .O(O[15]), .T(T), .IO(IO[15])); 
  IOBUF iobuf14(.I(I[14]), .O(O[14]), .T(T), .IO(IO[14])); 
  IOBUF iobuf13(.I(I[13]), .O(O[13]), .T(T), .IO(IO[13])); 
  IOBUF iobuf12(.I(I[12]), .O(O[12]), .T(T), .IO(IO[12])); 
  IOBUF iobuf11(.I(I[11]), .O(O[11]), .T(T), .IO(IO[11])); 
  IOBUF iobuf10(.I(I[10]), .O(O[10]), .T(T), .IO(IO[10])); 
  IOBUF iobuf09(.I(I[9]),  .O(O[9]),  .T(T), .IO(IO[9])); 
  IOBUF iobuf08(.I(I[8]),  .O(O[8]),  .T(T), .IO(IO[8])); 
  IOBUF iobuf07(.I(I[7]),  .O(O[7]),  .T(T), .IO(IO[7])); 
  IOBUF iobuf06(.I(I[6]),  .O(O[6]),  .T(T), .IO(IO[6])); 
  IOBUF iobuf05(.I(I[5]),  .O(O[5]),  .T(T), .IO(IO[5])); 
  IOBUF iobuf04(.I(I[4]),  .O(O[4]),  .T(T), .IO(IO[4])); 
  IOBUF iobuf03(.I(I[3]),  .O(O[3]),  .T(T), .IO(IO[3])); 
  IOBUF iobuf02(.I(I[2]),  .O(O[2]),  .T(T), .IO(IO[2])); 
  IOBUF iobuf01(.I(I[1]),  .O(O[1]),  .T(T), .IO(IO[1])); 
  IOBUF iobuf00(.I(I[0]),  .O(O[0]),  .T(T), .IO(IO[0])); 
 
endmodule 
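
The sixteen IOBUF instances in my_iobuf16 differ only in their bit index, so they could also be written with a Verilog-2001 generate loop. The following is a minimal sketch, assuming the synthesis tool supports generate blocks; the module name my_iobuf16_gen and the block label iobuf_bit are hypothetical and are not part of the original design.

```verilog
`timescale 1ns/10ps
// Hypothetical alternative to my_iobuf16 (not part of the original
// design): one IOBUF primitive per data bit, created in a generate loop.
module my_iobuf16_gen(I, T, O, IO);
  input         T;    // shared tristate control, passed to every IOBUF
  input  [15:0] I;    // data driven towards the pins
  output [15:0] O;    // data read back from the pins
  inout  [15:0] IO;   // bidirectional pins

  genvar i;
  generate
    for (i = 0; i < 16; i = i + 1) begin : iobuf_bit
      IOBUF iobuf_i (.I(I[i]), .O(O[i]), .T(T), .IO(IO[i]));
    end
  endgenerate
endmodule
```

The port list is the same as that of my_iobuf16, so the sketch could be instantiated in its place; the explicit per-bit form in the listing has the advantage of fixed, readable instance names.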
 
core_control module: 
 
`timescale 1ns / 1ps 
module core_control (//inputs
      clk, reset, reset_ref, sdram_read_request, sdram_write_request,
      sd_row_wr, sd_col_wr, wr_addr_valid,
      sd_row_rd, sd_col_rd, rd_addr_valid,
      data_to_sdram,
      //outputs
      wr_cmd_issue, rd_cmd_issue, sync_error, rd_en_scanout,
      addr, baddr, command, wrdata_out, page_en_reg,
      //Debug Signals
      full, empty, rd_cmd, wr_cmd, cmd_avail, out_row,
      out_col, page_hit, hit, fifo_wr_en, page_eq,
      dly2_row, dly2_col, dly_cmd_avail, state, fifo_empty_n,
      cmd_eq, page_valid, rd_en_misc, in_row_mux, in_col_mux,
      row_data_count, col_data_count, dt_data_count,
      almost_full, scanout_rcbd_empty, sd_row_rd_scanout,
      sd_col_rd_scanout, sd_data_rd_scanout, rd_addr_valid_scanout,
      scanout_rcbd_full, out_row_misc, out_col_misc
      );
      
  
//inputs 
input clk; 
input reset; 
input reset_ref; 
input sdram_write_request; 
input sdram_read_request; 
 
//mem write and read address 
input [15:0] sd_row_rd; 
input [8:0] sd_col_rd; 
input [15:0] sd_row_wr; 
input [8:0] sd_col_wr; 
input [15:0] data_to_sdram; 
input wr_addr_valid; 
input rd_addr_valid; 
input scanout_rcbd_empty; 
input scanout_rcbd_full; 
input [15:0] sd_row_rd_scanout; 
input [8:0] sd_col_rd_scanout; 
input [15:0] sd_data_rd_scanout; 
input rd_addr_valid_scanout; 
 
//output 
output wr_cmd_issue, rd_cmd_issue; 
output sync_error; 
output rd_en_scanout; 
 
//outputs to sdram 
output [12:0] addr; 
output [1:0] baddr; 
output [2:0] command; 
output [15:0] wrdata_out; 
 
//Debug Signals 
output full, empty; 
output rd_cmd, wr_cmd; 
output cmd_avail; 
output [15:0] out_row; 
output [8:0] out_col; 
output page_hit; 
output hit; 
output fifo_wr_en; 
output page_eq; 
output [15:0] dly2_row; 
output [8:0] dly2_col; 
output dly_cmd_avail; 
output [13:1] state; 
output fifo_empty_n; 
output cmd_eq; 
output page_valid; 
output rd_en_misc; 
output [15:0] in_row_mux; 
output [8:0] in_col_mux; 
output [2:0] row_data_count; 
output [2:0] col_data_count; 
output [2:0] dt_data_count; 
output almost_full; 
output [15:0] out_row_misc; 
output [8:0] out_col_misc; 
output page_en_reg; 
 
 
 
wire logic0; 
assign logic0 = 1'b0; 
 
//reg and wires 
wire [15:0] in_row_mux; 
wire [8:0] in_col_mux; 
wire [15:0] in_wrdata_mux; 
wire [15:0] out_row_misc; 
wire [8:0] out_col_misc; 
wire [15:0] out_wrdata_misc; 
reg [12:0] addr; 
reg [1:0] baddr; 
reg [2:0] command; 
reg [15:0] wrdata_out; 
wire almost_full; 
wire rd_en; 
reg dly_scanout_rcbd_empty; 
wire page_en; 
reg page_en_reg; 
 
wire full; 
wire [2:0] sd_cmd; 
 
//REFRESH COUNTER 
//Refresh every 15.36 us so that the data is not lost. 
refresh_counter REFRESH ( 
  .clk(clk), 
  .reset(reset_ref), 
  .ack(refresh_ack), 
  .refresh(refresh), 
  .tc() // no need to connect 
  ); 
 
//HANDLE DATA READ AND WRITE REQUEST  
//Mux the read and write inputs. Also, the memory will issue a write or read
//to the misc modules only if the misc memory queue is not full or almost full.
assign fifo_wr_en = wr_addr_valid || rd_addr_valid;
//Default row value (bit 15 set, all other bits 0) when neither address is valid
assign in_row_mux[15:0] = (wr_addr_valid) ? sd_row_wr[15:0] :
       (rd_addr_valid ? sd_row_rd[15:0] : 16'b1000000000000000);
assign in_col_mux[8:0] = (wr_addr_valid) ? sd_col_wr[8:0] :
       (rd_addr_valid ? sd_col_rd[8:0] : 9'b000000000);
assign in_wrdata_mux[15:0] = (wr_addr_valid) ? data_to_sdram[15:0] : 16'h0000;
assign wr_cmd_issue = (~full && (~almost_full)) ? sdram_write_request : 1'b0;
assign rd_cmd_issue = (~full && (~almost_full)) ?
       sdram_read_request & (~sdram_write_request) : 1'b0;
 
wire [15:0] out_row; 
wire [8:0] out_col; 
wire [15:0] out_wrdata; 
reg fifo_empty_n; 
//DATA INTO CMD_FIFO 
rcbd_fifo COMMAND_FIFO( 
  //inputs 
  .clk(clk), 
  .reset(reset), 
  .row_in(in_row_mux[15:0]), 
  .col_in(in_col_mux[8:0]), 
  .data_in(in_wrdata_mux[15:0]), 
  .wr_en(fifo_wr_en), 
  .rd_en(rd_en_misc), 
  //outputs 
  .full(full), 
  .empty(empty), 
  .row_out(out_row_misc[15:0]), 
  .col_out(out_col_misc[8:0]), 
  .data_out(out_wrdata_misc[15:0]), 
  .sync_error(sync_error), 
  .rd_ack(rd_ack_misc), 
  .row_data_count(row_data_count[2:0]), 
  .col_data_count(col_data_count[2:0]), 
  .dt_data_count(dt_data_count[2:0]) 
  ); 
//Make an almost full signal using the data count.
assign almost_full = (row_data_count[2] == 1'b1);
 
//Now both fifos have data in them. Read data from one of them.
//If the scanout request fifo is not empty then read the data from there.
//If the scanout request fifo is empty and the misc fifo has some
//command then read from there.
assign rd_en_scanout = (~scanout_rcbd_empty) && rd_en; 
assign rd_en_misc = scanout_rcbd_empty && (~empty) && rd_en; 
 
//capture page_en 
always @ (posedge clk) 
 if(reset) 
  page_en_reg <= 0; 
 else 
  page_en_reg <= page_en; 
 
//If Fifo not empty then Read/Write Available. Lookahead to see if the
//fifo will become empty.
always @ (posedge clk) 
 if(reset) 
  fifo_empty_n <= 0; 
 else if(~scanout_rcbd_empty) 
  fifo_empty_n <= 1; 
 else if(rd_en) 
  fifo_empty_n <= ~empty || (~scanout_rcbd_empty); 
 
 
reg dly_scanout_begin; 
always @ (posedge clk) 
 if(reset) 
  dly_scanout_begin <= 0; 
 else if(~scanout_rcbd_empty) 
  dly_scanout_begin <= 1; 
 
 
//COMBINE INTO OUT ROW 
//Out row multiplexes the scanout and the misc data.  
assign out_row[15:0]    = (~scanout_rcbd_empty && dly_scanout_begin) ?
                          sd_row_rd_scanout[15:0] : out_row_misc[15:0];
assign out_col[8:0]     = (~scanout_rcbd_empty && dly_scanout_begin) ?
                          sd_col_rd_scanout[8:0] : out_col_misc[8:0];
assign out_wrdata[15:0] = (~scanout_rcbd_empty && dly_scanout_begin) ?
                          sd_data_rd_scanout[15:0] : out_wrdata_misc[15:0];
 
reg cmd_avail; 
reg wr_cmd, rd_cmd; 
reg dly_wr_cmd, dly_rd_cmd; 
reg [15:0] dly1_row, dly2_row; 
reg [8:0] dly1_col, dly2_col; 
reg [15:0] dly1_wrdata, dly2_wrdata; 
//Register Command 
always @ (posedge clk) 
 if(reset) 
  begin 
  dly1_row <= 0; 
  dly1_col <= 0; 
  dly1_wrdata <= 0; 
  cmd_avail <= 0; 
  wr_cmd <= 0; 
  rd_cmd <= 0; 
  end 
 else if(rd_en) 
  begin 
  wr_cmd <= out_row[15]; 
  rd_cmd <= ~out_row[15]; 
  dly1_row[15:0] <= out_row[15:0]; 
  dly1_col[8:0] <= out_col[8:0]; 
  dly1_wrdata[15:0] <= out_wrdata[15:0]; 
  cmd_avail <= fifo_empty_n; 
  end 
 
//PAGE HIT AND VALID LOGIC 
//page_eq and cmd_eq check if the same command is issued to the same row.
reg page_hit, page_valid;
reg dly_cmd_avail;
wire page_rst;
//Check Row
assign page_eq = (out_row[12:0] == dly1_row[12:0]);
//Check Command
assign cmd_eq = (out_row[15] == wr_cmd) || (~out_row[15] == rd_cmd);
always @ (posedge clk)
 if(reset)
  page_hit   <= 0;
 else  begin
  if(!cmd_avail || !fifo_empty_n)
   page_hit   <= 0;
  else if(rd_en) //other than first write/read commands
   page_hit   <= page_eq && cmd_eq; //same command to same page
 end
 
//Page valid tracks whether a page is open. It is set when a page is
//activated (page_en) and cleared when the page is closed by a precharge (page_rst).
always @ (posedge clk) 
 if(reset) 
  page_valid  <= 0; 
 else begin 
  if(page_rst) page_valid <= 0; 
  else if(page_en) page_valid <= 1; 
 end 
 
//Hit is high only if same command is issued to same row 
assign hit = page_hit & page_valid;  
 
//DELAY HIT AND COMMAND 
reg dly_page_hit; 
always @ (posedge clk) 
 if(reset) 
  begin 
  dly2_row <= 0; 
  dly2_col <= 0; 
  dly2_wrdata <= 0; 
  dly_cmd_avail <= 0; 
  dly_wr_cmd <= 0; 
  dly_rd_cmd <= 0; 
  end 
 else if(rd_en) 
  begin 
  dly_wr_cmd <= wr_cmd; 
  dly_rd_cmd <= rd_cmd; 
  dly2_row[15:0] <= dly1_row[15:0]; 
  dly2_col[8:0] <= dly1_col[8:0]; 
  dly2_wrdata[15:0] <= dly1_wrdata[15:0]; 
  dly_cmd_avail <= cmd_avail; 
  dly_page_hit <= page_hit; 
  end 
 
assign dly_hit = dly_page_hit & page_valid; 
 
//State Machine Controller 
sdram_state STATE_MACHINE ( 
   //inputs 
  .clk(clk), 
  .reset(reset), 
  .wr_cmd(wr_cmd), 
  .rd_cmd(rd_cmd), 
  .dly_wr_cmd(dly_wr_cmd), 
  .dly_rd_cmd(dly_rd_cmd), 
  .hit(hit), 
  .dly_cmd_avail(dly_cmd_avail), 
  .cmd_avail(cmd_avail), 
  .refresh(refresh), 
  //outputs 
  .rd_en(rd_en), 
  .page_rst(page_rst), 
  .page_en(page_en), 
  .row_add(row_add),   
  .ras_n(sd_cmd[2]), 
  .cas_n(sd_cmd[1]), 
  .we_n(sd_cmd[0]), 
  .a10(a10), 
  .refresh_ack(refresh_ack), 
  .state(state) 
  ); 
 
//outputs 
wire [12:0] addr_mux; 
assign addr_mux[12:0] = (row_add) ? dly2_row[12:0] :
                        {logic0, logic0, a10, logic0, dly2_col[8:0]};

//Finally register the address, bank address, command (ras_n, cas_n and
//we_n) and data.
always @(posedge clk) 
    if (reset) 
      begin 
        command <= 3'b111;   // nop 
        addr <= 0; 
        baddr <= 0; 
        wrdata_out <= 0; 
      end 
    else 
      begin 
        command[2:0] <= sd_cmd; 
        addr[12:0] <= addr_mux[12:0]; 
        baddr[1:0] <= dly2_row[14:13]; 
        wrdata_out[15:0] <= dly2_wrdata[15:0]; 
      end 
 
endmodule 
 
 
refresh counter module: 
 
`timescale 1ns/10ps 
 
module refresh_counter(clk,reset,ack,refresh,tc); 
  input clk, reset, ack; 
  output refresh, tc; 
  reg refresh, tc; 
 
  // counter 
  reg [8:0] refresh_cnt; 
  wire refresh_tc = refresh_cnt[8] & refresh_cnt[7]; 
  always @(posedge clk) 
    if (reset) refresh_cnt <= 0; 
    else 
      begin 
        if (refresh_tc) refresh_cnt <= 0; 
        else refresh_cnt <= refresh_cnt + 1; 
      end 
 
  // refresh request output reg 
  wire refresh_req = refresh_cnt[8] & refresh_cnt[7]; 
  always @(posedge clk) 
    if (reset) refresh <= 0; 
    else 
      begin 
        if (ack) refresh <= 0; 
        else if (refresh_req) refresh <= 1; 
      end 
 
  // tc output reg 
  always @(posedge clk) 
    if (reset) tc <= 0; 
    else tc <= refresh_cnt[8] & refresh_cnt[7]; 
 
endmodule 
 
rcbd fifo module: 
 
`timescale 1ns / 1ps 
module rcbd_fifo( //inputs
      clk, reset, row_in, col_in, data_in, wr_en, rd_en,
      //outputs
      full, empty, row_out, col_out, data_out, sync_error,
      row_data_count, col_data_count, dt_data_count,
      rd_ack
      );
 
//inputs 
input clk; 
input reset; 
input [15:0] row_in; 
input [8:0] col_in; 
input [15:0] data_in; 
input wr_en; 
input rd_en; 
//outputs 
output full; 
output empty; 
output [15:0] row_out; 
output [8:0] col_out; 
output [15:0] data_out; 
output rd_ack; 
output sync_error; 
output [2:0] row_data_count; 
output [2:0] col_data_count; 
output [2:0] dt_data_count; 
 
wire [2:0] row_data_count; 
wire [2:0] col_data_count; 
wire [2:0] dt_data_count; 
 
//Row Fifo  
row_fifo_32_14bit ROW_FIFO ( 
    .clk(clk), 
    .sinit(reset), 
    .din(row_in[15:0]), 
    .wr_en(wr_en), 
    .rd_en(rd_en), 
    .dout(row_out[15:0]), 
    .full(full), 
    .empty(empty), 
  .rd_ack(rd_ack), 
    .data_count(row_data_count[2:0])); 
 
 
//Column Fifo 
col_fifo_32_8bit COL_FIFO ( 
    .clk(clk), 
    .sinit(reset), 
    .din(col_in[8:0]), 
    .wr_en(wr_en), 
    .rd_en(rd_en), 
    .dout(col_out[8:0]), 
    .full(), 
    .empty(), 
    .data_count(col_data_count[2:0])); 
 
 
//Data Fifo  
data_fifo_32_24bit DATA_FIFO ( 
    .clk(clk), 
    .sinit(reset), 
    .din(data_in[15:0]), 
    .wr_en(wr_en), 
    .rd_en(rd_en), 
    .dout(data_out[15:0]), 
    .full(), 
    .empty(), 
    .data_count(dt_data_count[2:0])); //data_data_count just sounds so wrong dunnit?
 
 
//Sync Error 
reg sync_error; 
wire sync_error_w = (row_data_count[2:0] != col_data_count[2:0]) || 
(row_data_count[2:0] != dt_data_count[2:0]); 
always @(posedge clk) 
    if(reset) 
   sync_error <= 0; 
    else 
   if(sync_error_w) 
  sync_error <= 1; 
 
endmodule 
 
sdram state module: 
 
`timescale 1ns/10ps 
 
`define  C_NOP_IDLE   1 
`define  C_ACTIVE      2    
`define  C_NOP_RCD1   3 
`define  C_NOP_RCD2   4 
`define  C_WRITE       5 
`define  C_NOP_WR1     6 
`define  C_REFRESH     7 
`define  C_READ   8 
`define  C_NOP_CAS1   9 
`define  C_NOP_CAS2   10 
`define  C_PRECHARGE    11 
`define  C_NOP_PRE1    12 
`define  C_NOP_PRE2    13 
`define  C_NOP_PRE3    14 
`define  C_NOP_TRAS   15 
`define  C_NOP_REF1    16 
`define  C_NOP_REF2    17 
`define  C_NOP_REF3    18 
`define  C_NOP_TRAS1   19 
`define  C_NOP_TRAS2   20 
 
module sdram_state ( //inputs
      clk, reset, refresh,
      wr_cmd, rd_cmd, dly_wr_cmd, dly_rd_cmd,
      hit, dly_cmd_avail, cmd_avail,
      //outputs
      page_rst, page_en, rd_en, row_add,
      ras_n, cas_n, we_n, a10, refresh_ack,
      //Debug Signals
      state
      );
 
  input clk; 
  input reset; 
  input refresh; 
  input wr_cmd; 
  input rd_cmd; 
  input dly_wr_cmd; 
  input dly_rd_cmd; 
  input hit; 
  input dly_cmd_avail; 
  input cmd_avail; 
   
  output page_rst, page_en, rd_en, row_add; 
  output ras_n, cas_n, we_n; 
  output a10; 
  output [20:1] state;   // width matches the 20-bit one-hot state register
  output refresh_ack; 
   
 
  //debug signals 
   
  reg [20:1]   state, next_state; 
 
  wire  c_nop_idle = state[`C_NOP_IDLE]; 
  wire   c_active  = state[`C_ACTIVE]; 
  wire   c_read  = state[`C_READ]; 
  wire   c_write  = state[`C_WRITE]; 
  wire  c_precharge  = state[`C_PRECHARGE];
  wire  c_refresh = state[`C_REFRESH]; 
     
  // state machine outputs 
  assign rd_en = c_nop_idle || c_write || c_read; 
  assign page_en = c_active; 
  assign page_rst = c_precharge; 
  assign row_add = c_active; 
  assign ras_n = ~(c_active || c_precharge || c_refresh); 
  assign cas_n = ~(c_read || c_write || c_refresh); 
  assign we_n = ~(c_write || c_precharge); 
  assign a10 = 1'b0; 
  assign refresh_ack = (c_refresh); 
  
    
  // state initialization
  always @ (posedge clk or posedge reset) 
    if (reset)   state <= 20'b00000000000000000001; 
    else         state <= next_state; 
 
  
  // state transitions 
  always @ (state or reset or cmd_avail or wr_cmd or rd_cmd or hit or
            dly_cmd_avail or refresh or dly_wr_cmd or dly_rd_cmd)
    begin
      // has default values for outputs so synthesis tool doesn't infer latches
      next_state = 20'b00000000000000000000;

      casex (1'b1)
        state[`C_NOP_IDLE]:
          if (cmd_avail) next_state[`C_ACTIVE] = 1;
          else next_state[`C_NOP_IDLE] = 1;

        state[`C_ACTIVE]:
          next_state[`C_NOP_RCD1] = 1;

        state[`C_NOP_RCD1]:
          next_state[`C_NOP_RCD2] = 1;

        state[`C_NOP_RCD2]:
          if (dly_wr_cmd) next_state[`C_WRITE] = 1;
          else if (dly_rd_cmd) next_state[`C_READ] = 1;
          else if (wr_cmd) next_state[`C_WRITE] = 1;
          else if (rd_cmd) next_state[`C_READ] = 1;

        state[`C_WRITE]:
          if (hit && wr_cmd && cmd_avail && (~refresh)) next_state[`C_WRITE] = 1;
          else next_state[`C_NOP_WR1] = 1;

        state[`C_NOP_WR1]:
          next_state[`C_NOP_TRAS] = 1;

        state[`C_READ]:
          if (hit && rd_cmd && cmd_avail && (~refresh)) next_state[`C_READ] = 1;
          else next_state[`C_NOP_CAS1] = 1;

        state[`C_NOP_CAS1]:
          next_state[`C_NOP_CAS2] = 1;

        state[`C_NOP_CAS2]:
          next_state[`C_NOP_TRAS] = 1;

        state[`C_NOP_TRAS]:
          next_state[`C_PRECHARGE] = 1;

        state[`C_PRECHARGE]:
          next_state[`C_NOP_PRE1] = 1;

        state[`C_NOP_PRE1]:
          next_state[`C_NOP_PRE2] = 1;

        state[`C_NOP_PRE2]:
          next_state[`C_NOP_PRE3] = 1;

        state[`C_NOP_PRE3]:   //trp = precharge time
          if (refresh) next_state[`C_REFRESH] = 1;
          else if (dly_cmd_avail) next_state[`C_ACTIVE] = 1;
          else next_state[`C_NOP_IDLE] = 1;

        state[`C_REFRESH]:
          next_state[`C_NOP_REF1] = 1;

        state[`C_NOP_REF1]:
          next_state[`C_NOP_REF2] = 1;

        state[`C_NOP_REF2]:
          next_state[`C_PRECHARGE] = 1;

        default:
          next_state[`C_NOP_IDLE] = 1;
      endcase
    end 
 
endmodule 
 
Dynamic Controller module: 
 
`timescale 1ns / 1ps 
 
`define CONTR_IDLE  1 
`define LUT_STORE   2 
`define IMAGE_STORE  3 
`define IMAGE_WARP 4 
`define TRACK_CNTR 5 
`define NOP_IMAGE_STORE 6 
`define NOP_IMAGE_WARP  7 
 
module Dynamic_Controller(// inputs
      clk, reset, lut_frame_detect, warp_done,
      image_frame_detect, scanout_frame_done,
      //outputs
      scan_addr_select, scanout_enable, lut_store,
      image_store, image_warp, workbuffer_addr_select
      );
input clk; 
input reset; 
input lut_frame_detect; 
input warp_done; 
input image_frame_detect; 
input scanout_frame_done; 
 
output [12:0] scan_addr_select; 
output [12:0] workbuffer_addr_select; 
output scanout_enable; 
output lut_store; 
output image_store; 
output image_warp; 
 
reg [7:1] state, next_state; 
 
reg scanout_count; 
reg scanout_enable; 
reg [12:0] scan_addr_select; 
reg [12:0] workbuffer_addr_select; 
 
assign lut_store = state[`LUT_STORE]; 
assign image_store = state[`IMAGE_STORE]; 
assign image_warp = state[`IMAGE_WARP]; 
assign track_cntr = state[`TRACK_CNTR]; 
 
 
//STATE MACHINE to select the active module 
// state initialization
always @ (posedge clk or posedge reset) 
 if (reset)   state <= 7'b0000001; 
   else         state <= next_state; 
 
// state transitions 
  always @ (state or reset or lut_frame_detect or image_frame_detect or
            warp_done)
     begin
      // has default values for outputs so synthesis tool doesn't infer latches
      next_state = 7'b0000000;

      casex (1'b1)
        state[`CONTR_IDLE]:
          if (~reset) next_state[`LUT_STORE] = 1;
          else next_state[`CONTR_IDLE] = 1;

        state[`LUT_STORE]:
          if (lut_frame_detect) next_state[`IMAGE_STORE] = 1;
          else next_state[`LUT_STORE] = 1;

        state[`IMAGE_STORE]:
          if (image_frame_detect) next_state[`NOP_IMAGE_STORE] = 1;
          else next_state[`IMAGE_STORE] = 1;

        state[`NOP_IMAGE_STORE]:
          next_state[`IMAGE_WARP] = 1;

        state[`IMAGE_WARP]:
          if (warp_done) next_state[`TRACK_CNTR] = 1;
          else next_state[`IMAGE_WARP] = 1;

        state[`NOP_IMAGE_WARP]:
          next_state[`TRACK_CNTR] = 1;

        state[`TRACK_CNTR]:
          next_state[`IMAGE_STORE] = 1;

        default:
          next_state[`CONTR_IDLE] = 1;
      endcase
    end 
 
// Scanout Count Track. 
//Choose between the two buffers. If new frame fully warped,  
//then choose the new frame 
always @ (posedge clk)
 if(reset)
  scanout_count <= 0;
 else if(track_cntr)
  scanout_count <= scanout_count + 1;
 
//Set and hold scanout_enable once the first image warp has completed.
always @ (posedge clk) 
 if(reset) 
  scanout_enable <= 0; 
 else if(track_cntr) 
  scanout_enable <= 1; 
 
//addr_select for this memory is just a constant number.
// Once a new frame is chosen, do not start scanout until the start of the
// next scanout frame, i.e. if a frame is being scanned out, do not interrupt
// it. Wait to scan out the new frame.
//A VERY HUGE ASSUMPTION HERE IS THAT THE NEXT SCANOUT FRAME WILL BE DETECTED
//BEFORE THE NEXT IMAGE FRAME IS WRITTEN. THIS WAY THE NEW ADDRESS WILL BE
//SELECTED BY THE TIME IMAGE WARP IS ATTEMPTED FOR THE NEW WORKING BUFFER.
//This will be true unless an image can be stored while scanning out one frame.
always @ (posedge clk)
 if(reset)
  begin
  scan_addr_select <= 1800; //Initially addr select is 1800
  workbuffer_addr_select <= 1800;
  end
 else if(track_cntr && (~scanout_enable))
  begin
  scan_addr_select <= 1800;
  workbuffer_addr_select <= 2400;
  end
 else if(scanout_frame_done)
  begin
  casex(scanout_count)
   1'b0 :
    begin
    scan_addr_select <= 2400;
    workbuffer_addr_select <= 1800;
    end
   1'b1 :
    begin
    scan_addr_select <= 1800;
    workbuffer_addr_select <= 2400;
    end
  endcase
  end
 
endmodule 
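
The buffer selection described in the comments above can be illustrated in isolation. The following is a minimal sketch only, not the controller used in this design. The module name pingpong_select and the ports warp_complete and frame_boundary are hypothetical; the base rows 1800 and 2400 are taken from the listing. It assumes that a freshly warped working buffer may be swapped in only at a scanout frame boundary.

```verilog
`timescale 1ns / 1ps
// Hypothetical, simplified ping-pong buffer selector (illustration only).
module pingpong_select(clk, reset, warp_complete, frame_boundary,
                       scan_base, work_base);
  parameter BUF_A = 13'd1800;   // first frame buffer base row (from listing)
  parameter BUF_B = 13'd2400;   // second frame buffer base row (from listing)

  input clk;
  input reset;
  input warp_complete;          // pulse: working buffer holds a full frame
  input frame_boundary;         // pulse: a scanout frame has just finished
  output [12:0] scan_base;
  output [12:0] work_base;

  reg sel;      // 0: scan A, work on B.  1: scan B, work on A.
  reg pending;  // a finished frame is waiting to be displayed

  always @ (posedge clk)
    if (reset)
      begin
      sel <= 1'b0;
      pending <= 1'b0;
      end
    else if (frame_boundary && pending)
      begin
      sel <= ~sel;                 // swap only between scanned frames
      pending <= warp_complete;    // keep a completion arriving this cycle
      end
    else if (warp_complete)
      pending <= 1'b1;

  assign scan_base = sel ? BUF_B : BUF_A;
  assign work_base = sel ? BUF_A : BUF_B;
endmodule
```

Unlike Dynamic_Controller, this sketch has no scanout_enable gating, so it would scan from row 1800 even before the first warp has completed.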
 
Readdata tracking module: 
 
`timescale 1ns / 1ps 
module Readdata_Tracking(//inputs
      clk, reset, scanout_request, image_warping_request,
      scanout_read, image_warping_read, page_en,
      //outputs
      track_data,
      //debug signals
      scanout_count, image_warping_count
      );
input clk; 
input reset; 
input scanout_request; 
input image_warping_request; 
input scanout_read; 
input image_warping_read; 
input page_en; 
 
output track_data; 
 
//debug signals 
output [7:0] scanout_count; 
output [7:0] image_warping_count; 
 
wire [7:0] image_warping_count; 
wire [7:0] scanout_count; 
reg track_data; 
 
//Up Down counter module. If upcount then increase count. 
//If downcount then decrease count. More like a counter with 
//enable because count need not vary every clock cycle.
counter_up_down SCANOUT_CNT ( 
  //inputs 
  .clk(clk), 
  .reset(reset), 
  .upcount(scanout_request), 
  .downcount(scanout_read), 
  .count(scanout_count) 
  ); 
 
counter_up_down IMGWARPING_CNT ( 
  //inputs 
  .clk(clk), 
  .reset(reset), 
  .upcount(image_warping_request), 
  .downcount(image_warping_read), 
  .count(image_warping_count) 
  ); 
 
//By default, i.e. until the first image warping is done and scanout enable
// is active, the default read result ends up in image warping.
always @ (posedge clk) 
 if(reset) 
  track_data <= 1; 
 else if(page_en) 
  begin 
  if(scanout_count == 8'b00000001) 
   track_data <= 1; 
  else if(image_warping_count == 8'b00000000) 
   track_data <= 0; 
  end 
 
endmodule 
 
Counter up down module: 
 
`timescale 1ns / 1ps 
module counter_up_down(clk, reset, upcount, downcount, count); 
parameter C = 8; 
 
input clk; 
input reset; 
input upcount; 
input downcount; 
output [C-1:0] count; 
 
reg [C-1:0] count; 
 
 
always @ (posedge clk) 
 if(reset) 
  count <= 0;   
 else  
  begin 
  if(upcount && (~downcount)) 
   count <= count + 1; 
  else if((~upcount) && downcount) 
   count <= count - 1; 
  end 
 
 
endmodule 
 
Lut Store module: 
 
`timescale 1ns / 1ps 
module Lut_Store(//inputs
      clk, reset,
      mem_wr_issue,
      //outputs
      sd_col_wr, sd_row_wr, wr_addr_valid,
      sdram_write_request, data_to_sdram_wr,
      sd_row_img, sd_col_img, sd_row_reg, sd_col_reg,
      sd_row, sd_col, sd_carry
//      u, v, dly_sd_row, dly_sd_col,
//      dly_wr_addr_valid_lut, dly_wr_addr_valid_img,
//      alt_counter, dly_data_active
      );
 
//inputs 
input clk; 
input reset; 
input mem_wr_issue; 
 
//outputs 
output [8:0] sd_col_wr; 
output [15:0] sd_row_wr; 
output wr_addr_valid; 
output sdram_write_request; 
output [15:0] data_to_sdram_wr; 
 
//debug signals 
output [12:0] sd_row_img; // x .. 0 to 479 
output [9:0] sd_col_img; // y .. 0 to 639 
//output [22:0] u , v; 
output  [22:0] sd_row_reg; // u 
output  [22:0] sd_col_reg; // v 
output  [12:0] sd_row; 
output [8:0] sd_col; 
output sd_carry; 
//output  [12:0] dly_sd_row; 
//output [8:0] dly_sd_col; 
//output dly_wr_addr_valid_lut; 
//output dly_wr_addr_valid_img; 
//output alt_counter; 
//output dly_data_active; 
// 
wire [12:0] sd_row_img; 
wire [9:0] sd_col_img; 
wire [12:0] sd_row; 
wire [8:0] sd_col; 
wire [22:0] pixel; 
wire [22:0] u, v; 
wire [1:0] sd_badd_wr_img; 
wire wr_addr_valid_img; 
reg [12:0] dly_sd_row; 
reg [8:0] dly_sd_col; 
reg [22:0] sd_row_reg; 
reg [22:0] sd_col_reg; 
reg [22:0] sd_row_reg_p; 
reg [22:0] sd_col_reg_p; 
reg dly_wr_addr_valid_img; 
reg alt_counter; 
wire wr_addr_valid_lut; 
wire [12:0] sd_row_600; 
wire [15:0] sd_row_wr; 
wire [8:0] sd_col_wr; 
wire [15:0] data_to_sdram_wr; 
reg dly_wr_addr_valid_lut; 
wire wr_addr_valid; 
wire sdram_write_request; 
 
//A pulse every alternate write issue.. 
always @ (posedge clk) 
 if(reset) 
  alt_counter <= 0; 
 else if(mem_wr_issue && sdram_write_request) 
  alt_counter <= alt_counter + 1; 
 
//GENERATE 640 X 480 LUT ADDRESS 
sdram_image_add_gen PAT_ADD_GEN( 
  //inputs 
  .clk(clk), 
  .reset(reset), 
  .data_active(mem_wr_issue && sdram_write_request 
&& (~alt_counter)),//mem_wr_issues used for 
  //outputs 
  .sd_row(sd_row_img), 
  .sd_col(sd_col_img), 
  .bank_add(sd_badd_wr_img), 
  .wr_en(wr_addr_valid_img) 
  ); 
 
//*************FIRST GENERATE LUT PHYSICAL ADDRESS 
//Dly Active Reg.  
reg dly_data_active_reg; 
wire dly_data_active = dly_data_active_reg; 
always @ (posedge clk) 
 if(reset) 
  dly_data_active_reg <= 0; 
 else 
  dly_data_active_reg <= mem_wr_issue && 
sdram_write_request; 
   
 
//LUT ADDRESS HAS 1200 X 512 LOCATIONS.. 
sdram_lut_add_gen IN_ADD_GEN( 
  //inputs 
  .clk(clk), 
  .reset(reset), 
  .data_active(dly_data_active),//mem_wr_issues used for
  //outputs 
  .sd_row(sd_row), 
  .sd_col(sd_col), 
  .bank_add(), // unconnected right now 
  .wr_en(wr_addr_valid_lut) 
  ); 
 
always @ (posedge clk) 
 if(reset) 
  begin 
  dly_sd_row <= 0; 
  dly_sd_col <= 0; 
  end 
 else if(wr_addr_valid_lut) 
  begin 
  dly_sd_row <= sd_row; 
  dly_sd_col <= sd_col; 
  end 
 
//*************AT THE SAME TIME GENERATE LUT TRANSFORM. CALL THIS SD_ROW/COL_REG
assign u = ((181 * (560 - sd_col_img - sd_row_img)) >> 8) + 320; 
assign v = ((181 * (sd_col_img - sd_row_img - 80)) >> 8) + 240; 
 
//assign u = ((181 * (sd_col_img - sd_row_img - 80)) >> 8) + 320; 
//assign v = ((181 * (sd_col_img + sd_row_img - 560)) >> 8) + 240; 
 
// Transforms 
always @ (posedge clk) 
 if(reset) 
 begin 
  sd_row_reg = 0; 
  sd_col_reg = 0; 
  dly_wr_addr_valid_img = 0; 
 end 
 else  
 begin 
  if( ((v<0)||(u<0)||(v>479)||(u>639)) && 
wr_addr_valid_img) //(if(x <= y) 
  begin 
  sd_row_reg = 0; 
  sd_col_reg = 0; 
  end   
  else if(wr_addr_valid_img)   
  begin 
  sd_row_reg = v; //row.. 0 to 479 
  sd_col_reg = u; //col.. 0 to 639 
  end 
  dly_wr_addr_valid_img = wr_addr_valid_img; 
 end 
 
//**************GENERATE PHYSICAL IMAGE ADDRESS
assign pixel = (sd_row_reg[9:0] * 640) + sd_col_reg[12:0]; //row*640 + col.
 
// Transforms 
always @ (posedge clk) 
 if(reset) 
  begin 
  sd_row_reg_p = 0; 
  sd_col_reg_p = 0; 
  end 
 else if(dly_wr_addr_valid_img)   
  begin 
  sd_row_reg_p = pixel >> 9; //row.. 0 to 599
  sd_col_reg_p = pixel - (sd_row_reg_p << 9); //col.. 0 to 511
  end 
 
//**************DELAY THE LUT ADDRESS VALID
always @ (posedge clk) 
 if(reset) 
  dly_wr_addr_valid_lut <= 0; 
 else 
  dly_wr_addr_valid_lut <= wr_addr_valid_lut; 
 
//Muxes to select the correct value to be stored. If the LSB of sd_col
// is 1 then store the row and if it is zero then store the column address.
 
assign data_to_sdram_wr[15:0] = (dly_sd_col[0]) ? {3'b000, 
sd_row_reg_p[12:0]} : {7'b0000000, sd_col_reg_p[8:0]}; 
assign wr_addr_valid = dly_wr_addr_valid_lut; 
assign sd_row_600[12:0] = dly_sd_row[12:0] + 600; 
assign sd_row_wr[15:0] = {1'b1, 2'b00, sd_row_600[12:0]}; 
assign sd_col_wr[8:0] = dly_sd_col[8:0]; 
assign sdram_write_request = 1'b1; //is this ok? 
 
endmodule 
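
A note on the LUT transform above: the constant 181/256 is approximately 0.707, so the two expressions appear to describe a 45 degree rotation about the image centre (320, 240). In the listing, u and v are unsigned wires, so the tests v<0 and u<0 cannot be true, and a negative intermediate difference wraps around to a large value. The following is a minimal sketch, not the implementation used in this thesis, of the same arithmetic done with signed values. The module name lut_transform_signed and the in_range output are hypothetical and introduced only for illustration.

```verilog
`timescale 1ns / 1ps
// Hypothetical helper (not part of the thesis design): the Lut_Store
// transform recomputed with signed arithmetic.
module lut_transform_signed(sd_row_img, sd_col_img, u, v, in_range);
  input  [12:0] sd_row_img;      // source row, 0 to 479
  input  [9:0]  sd_col_img;      // source column, 0 to 639
  output signed [23:0] u;        // transformed column
  output signed [23:0] v;        // transformed row
  output in_range;               // 1 when (u, v) lies inside 640 x 480

  // Zero-extend the unsigned coordinates into signed 24-bit values.
  wire signed [23:0] row_s = {11'b0, sd_row_img};
  wire signed [23:0] col_s = {14'b0, sd_col_img};

  // Same differences as in the listing, but allowed to go negative.
  wire signed [23:0] du = 24'sd560 - col_s - row_s;
  wire signed [23:0] dv = col_s - row_s - 24'sd80;

  // >>> is an arithmetic shift, so the sign survives the divide by 256.
  assign u = ((24'sd181 * du) >>> 8) + 24'sd320;
  assign v = ((24'sd181 * dv) >>> 8) + 24'sd240;

  assign in_range = (u >= 0) && (u <= 639) && (v >= 0) && (v <= 479);
endmodule

// Small simulation-only check of three hand-computed points.
module lut_transform_signed_tb;
  reg  [12:0] row;
  reg  [9:0]  col;
  wire signed [23:0] u, v;
  wire in_range;

  lut_transform_signed DUT(.sd_row_img(row), .sd_col_img(col),
                           .u(u), .v(v), .in_range(in_range));

  initial
    begin
    row = 240; col = 320; #1;   // image centre maps to itself
    $display("u=%0d v=%0d in_range=%b", u, v, in_range); // u=320 v=240 in_range=1
    row = 240; col = 400; #1;   // negative du, still a valid pixel
    $display("u=%0d v=%0d in_range=%b", u, v, in_range); // u=263 v=296 in_range=1
    row = 0;   col = 0;   #1;   // corner rotates out of the frame
    $display("u=%0d v=%0d in_range=%b", u, v, in_range); // u=715 v=183 in_range=0
    $finish;
    end
endmodule
```

The second test point is the interesting one: its column difference is -80, which the signed version maps to a valid pixel (263, 296).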
 
sdram image add gen module: 
 
`timescale 1ns / 1ps 
module sdram_image_add_gen(clk, reset, data_active, sd_row, sd_col, 
bank_add, wr_en); 
 
 //Inputs and Outputs 
 input clk; 
 input reset; 
 input data_active; 
 output [12:0] sd_row; 
 output [9:0] sd_col; 
 output [1:0] bank_add; 
 output wr_en; 
 
 reg dly_data_active; 
 reg [9:0] col_count, dly_col_count; 
 reg [1:0] bank_add_reg; 
 reg [12:0] row_count; 
 wire [12:0] sd_row; 
 wire [9:0] sd_col; 
  
 //Delay the Data Enable 
 always @ (posedge clk) 
  if(reset) 
   dly_data_active <=  0; 
  else 
   dly_data_active <= data_active; 
  
 //Column Counter - use post increment 
 assign col_count_done = (col_count == 10'b1001111111); 
//639..was 511 
 always @ (posedge clk) 
  if(reset) 
   col_count <= 10'b0000000000; 
  else if(dly_data_active) 
   begin 
   if (~col_count_done) 
   col_count <= col_count + 1; 
   else 
   col_count <= 0; 
   end 
     
 //Delay by 1 
 always @ (posedge clk) 
  if(reset) 
   dly_col_count <= 10'b0000000000; 
  else 
   dly_col_count <= col_count; 
 
  
 // Row Counter 
 assign row_count_done = (row_count == 13'b0000111011111); 
//479..was 599 
 always @ (posedge clk) 
  if(reset) 
   row_count <= 13'b0000000000000; 
  else if(dly_data_active) 
   begin 
   if(row_count_done && col_count_done) 
   row_count <= 13'b0000000000000; 
   else if(col_count_done) 
   row_count <= row_count + 1; 
   end 
 
 //Every time a frame completes 
 assign framedone = row_count_done && col_count_done; 
 
 //Bank Address 
 always @ (posedge clk) 
  if(reset) 
   bank_add_reg <= 2'b00; 
  else if(framedone) 
   bank_add_reg <= ~bank_add_reg; 
  
 //Data Enable and Address 
 assign wr_en = dly_data_active; 
 assign bank_add = bank_add_reg; 
 assign sd_row = row_count; 
 assign sd_col = col_count; 
 
endmodule 
 
sdram lut add gen module: 
 
`timescale 1ns / 1ps 
module sdram_lut_add_gen(clk, reset, data_active, sd_row, sd_col, 
bank_add, wr_en); 
 
 //Inputs and Outputs 
 input clk; 
 input reset; 
 input data_active; 
 output [12:0] sd_row; 
 output [8:0] sd_col; 
 output [1:0] bank_add; 
 output wr_en; 
 
 reg dly_data_active; 
 reg [8:0] col_count; 
 reg [1:0] bank_add_reg; 
 reg [12:0] row_count; 
 wire [12:0] sd_row; 
 wire [8:0] sd_col; 
  
 //Delay the Data Enable 
 always @ (posedge clk) 
  if(reset) 
   dly_data_active <=  0; 
  else 
   dly_data_active <= data_active; 
  
 //Column Counter - use post increment 
 assign col_count_done = (col_count == 9'b111111111); 
 always @ (posedge clk) 
  if(reset) 
   col_count <= 9'b000000000; 
  else 
   if(dly_data_active) 
   col_count <= col_count + 1; 
     
   
 // Row Counter 
 assign row_count_done = (row_count == 13'b0010010101111); 
//1199 
 always @ (posedge clk) 
  if(reset) 
   row_count <= 13'b0000000000000; 
  else if(dly_data_active) 
   begin 
   if(row_count_done && col_count_done) 
   row_count <= 13'b0000000000000; 
   else if(col_count_done) 
   row_count <= row_count + 1; 
   end 
 
 //Every time a frame completes 
 assign framedone = row_count_done && col_count_done; 
 
 //Bank Address 
 always @ (posedge clk) 
  if(reset) 
   bank_add_reg <= 2'b00; 
  else if(framedone) 
   bank_add_reg <= ~bank_add_reg; 
  
 //Data Enable and Address 
 assign wr_en = dly_data_active; 
 assign bank_add = bank_add_reg; 
 assign sd_row = row_count; 
 assign sd_col = col_count; 
 
endmodule 
 
Dvi Scan In Top module: 
 
`timescale 1ns / 1ps 
module Dvi_Scan_In_Top(//inputs
      clk, reset, frame_detect_reg,
      mem_wr_issue,
      //outputs
      sd_col_wr, sd_row_wr, wr_addr_valid,
      sdram_write_request, data_to_sdram_wr
      );
 
//inputs 
input clk; 
input reset; 
input frame_detect_reg; 
input mem_wr_issue;  
//outputs 
output [8:0] sd_col_wr; 
output [15:0] sd_row_wr; 
output wr_addr_valid; 
output sdram_write_request; 
output [15:0] data_to_sdram_wr; 
 
wire [12:0] sd_row; 
wire [8:0] sd_col; 
wire [15:0] sd_row_wr; 
wire [8:0] sd_col_wr; 
//wire [15:0] data_to_sdram_wr; 
reg [15:0] data_to_sdram_wr; 
wire [1:0] sd_badd_wr; 
wire [12:0] img_row; 
wire [9:0] img_col; 
wire sdram_write_request; 
reg frame; 
 
//The sdram_image_gen generates an address every data_active.
//The addressing is up to 640 x 480. Whenever the output address is valid,
// wr_en is high.
sdram_image_add_gen PAT_GEN( 
  //inputs 
  .clk(clk), 
  .reset(reset), 
  .data_active(mem_wr_issue), 
  //outputs 
  .sd_row(img_row), 
  .sd_col(img_col), 
  .bank_add(), 
  .wr_en(img_addr_valid) 
  ); 
 
//This is just a test case, used to alternate between two input images.
always @ (posedge clk)
 if(reset)
  frame <= 0;
 else if(frame_detect_reg)
  frame <= frame + 1;
 
 
//The two input images depend only on the column or only on the row, for easy
//generation. The column based image helps in debugging because every pixel
//differs from its neighbour along a row, so the pixels arrange into vertical
//stripes. Any problem can be easily spotted this way.
always @ (posedge clk) 
 if(reset) 
  begin 
  data_to_sdram_wr <= 16'h0000; 
  end 
 else if(img_addr_valid) 
  begin 
  if(~frame) 
  data_to_sdram_wr <= {7'b0000000, {3{img_row[2]}}, 
{3{img_row[1]}},{3{img_row[0]}}}; 
  else 
  data_to_sdram_wr <= {7'b0000000, {3{img_col[2]}}, 
{3{img_col[1]}},{3{img_col[0]}}}; 
  end 
 
//This generates the physical address. Every time a valid address is obtained,
//the physical address is also generated. This address could also be generated
//using multiplication as done for the look up table. But because this is
//always sequential, it is convenient to use a counter.
sdram_add_gen IN_ADD_GEN( 
  //inputs 
  .clk(clk), 
  .reset(reset), 
  .data_active(img_addr_valid), 
  //outputs 
  .sd_row(sd_row), 
  .sd_col(sd_col), 
  .bank_add(sd_badd_wr), 
  .wr_en(wr_addr_valid) 
  ); 
 
assign sd_row_wr[15:0] = {1'b1, 2'b00, sd_row[12:0]}; //First bit 1 represents a write to memory.
      //The next two bits are the bank, for compatibility. In
      //this project only bank 0 is ever used.
assign sd_col_wr[8:0] = sd_col[8:0]; 
 
//This signal served as a request. Later on, the memory controller handled
//serving the requests, so this is redundant.
assign sdram_write_request = 1'b1; 
 
endmodule 
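
The comments above mention that the physical address could be produced by multiplication instead of a counter. As a minimal sketch of that alternative, assuming the same layout as the rest of the listing (a 640 x 480 image packed linearly into SDRAM rows of 512 columns), the mapping is pixel = row * 640 + column, with the SDRAM row given by pixel / 512 and the SDRAM column by pixel % 512. The module name pixel_to_sdram_addr and its port names are hypothetical and are not part of the design.

```verilog
`timescale 1ns / 1ps
// Hypothetical helper (not part of the thesis design): combinational
// image-coordinate to SDRAM row/column mapping.
module pixel_to_sdram_addr(img_row, img_col, sd_row, sd_col);
  input  [8:0]  img_row;   // image row, 0 to 479
  input  [9:0]  img_col;   // image column, 0 to 639
  output [12:0] sd_row;    // SDRAM row, 0 to 599
  output [8:0]  sd_col;    // SDRAM column, 0 to 511

  // Linear pixel index. The largest value, 479*640 + 639 = 307199,
  // fits in 19 bits.
  wire [18:0] pixel = img_row * 10'd640 + img_col;

  assign sd_row = {3'b000, pixel[18:9]};  // pixel / 512
  assign sd_col = pixel[8:0];             // pixel % 512
endmodule
```

For example, image row 1, column 0 gives pixel 640, which lands in SDRAM row 1, column 128. The counter used in the listing avoids the multiplier, which is reasonable when addresses are always visited in order.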
 
Image Warping Top module: 
 
`timescale 1ns / 1ps 
module Image_Warping_Top(//inputs
      clk, reset, data_from_capture,
      data_valid, mem_wr_issue,
      mem_rd_issue, addr_select,
      //outputs
      sd_col_wr, sd_row_wr, wr_addr_valid,
      sdram_write_request, data_to_sdram_wr,
      sd_row_rd, sd_col_rd, sdram_read_request, rd_addr_valid,
      warp_done,
      //debug signals
      row_data_count, col_data_count,
      dt_data_count, lut_rd, img_rd, img_wr, dt_empty, dly_img_wr,
      dt_full, rc_empty, rc_full, rc_fifo_en, data_en_fifo,
      max_lut_read_requests, requests_pending, max_img_write_requests,
      reset_cntr, sd_row_mem, sd_col_mem,
      dly_max_img_write_requests, dly_max_lut_read_requests,
      lut_requests, img_rd_requests, img_wr_requests,
      sd_row_lut, sd_row_img, sd_col_lut, sd_col_img,
      sd_row_fifo, sd_col_fifo, almost_full,
      img_wr_count_change, lut_count_change,
      dt_rd_err, dt_wr_err, row_out, col_out,
      img_rd_count_change, dt_going_empty, warp_state,
      warp_cnt, dly2_warp_cnt, alt_counter, dly2_in_data_valid,
      in_data_fifo, dly1_in_data_fifo, dly2_in_data_fifo,
      in_data_valid, data_out_fifo
      );
 
parameter FIFO_LENGTH = 256;  
parameter LUT_LENGTH = 512; 
parameter NBITS = 10; 
 
//inputs 
input clk; 
input reset; 
input [15:0] data_from_capture; 
input data_valid; 
input mem_wr_issue; 
input mem_rd_issue; 
input [12:0] addr_select; 
 
//outputs 
output [8:0] sd_col_wr; 
output [15:0] sd_row_wr; 
output [15:0] data_to_sdram_wr; 
output wr_addr_valid; 
output sdram_write_request; 
output [8:0] sd_col_rd; 
output [15:0] sd_row_rd; 
output rd_addr_valid; 
output sdram_read_request; 
output warp_done; 
 
//Debug Signals 
output [4:0] row_data_count; 
output [4:0] col_data_count; 
output [8:0] dt_data_count; 
output lut_rd; 
output img_rd; 
output img_wr; 
output dly_img_wr; 
output dt_empty, dt_full; 
output dt_going_empty; 
output rc_empty, rc_full; 
output rc_fifo_en; 
output data_en_fifo; 
output max_lut_read_requests; 
output dly_max_lut_read_requests; 
output requests_pending; 
output max_img_write_requests; 
output dly_max_img_write_requests; 
output reset_cntr; 
output [15:0] sd_row_mem; 
output [8:0] sd_col_mem; 
output [9:0] lut_requests; 
output [8:0] img_rd_requests; 
output [8:0] img_wr_requests; 
output [12:0] sd_row_lut; 
output [12:0] sd_row_img; 
output [8:0] sd_col_lut; 
output [8:0] sd_col_img; 
output almost_full; 
output [15:0] sd_row_fifo; 
output [8:0] sd_col_fifo; 
output lut_count_change; 
output img_wr_count_change; 
output img_rd_count_change; 
output dt_rd_err, dt_wr_err; 
output [15:0] row_out; 
output [8:0] col_out; 
output [10:1] warp_state; 
output [9:0] warp_cnt; 
output [9:0] dly2_warp_cnt; 
output alt_counter; 
output dly2_in_data_valid; 
output [31:0] in_data_fifo; 
output [15:0] dly1_in_data_fifo; 
output [15:0] dly2_in_data_fifo; 
output in_data_valid; 
output [31:0] data_out_fifo; 
 
//wires and regs 
wire [15:0] sd_row_wr, sd_row_rd; 
wire [15:0] row_out; 
reg [15:0] sd_row_mem, sd_row_fifo; 
wire [12:0] sd_row1, sd_row2; 
wire [12:0] sd_row_lut, sd_row_img; 
wire [8:0] sd_col1, sd_col2; 
wire [8:0] col_out; 
wire almost_full; 
reg [8:0] sd_col_mem, sd_col_fifo; 
wire [8:0] sd_col_wr, sd_col_rd; 
wire [8:0] sd_col_lut, sd_col_img; 
wire [1:0] sd_b_add1, sd_b_add2; 
wire [4:0] row_data_count, col_data_count; 
wire [8:0] dt_data_count; 
wire [31:0] data_out_fifo; 
wire [15:0] data_from_capture; 
reg max_lut_read_requests; 
reg requests_pending; 
reg max_img_write_requests; 
reg rc_fifo_en; 
reg dly_max_lut_read_requests; 
reg dly_max_img_write_requests; 
reg dly_requests_pending; 
reg dly2_max_lut_read_requests; 
reg dly2_max_img_write_requests; 
reg dly2_requests_pending; 
reg lut_count_change, img_wr_count_change, img_rd_count_change; 
reg dly_fifo_addr_valid1, dly_fifo_addr_valid, dly_fifo_addr_valid2; 
wire dt_going_empty; 
reg dly_lut_rd, dly_img_rd, dly_img_wr; 
reg alt_counter; 
reg dly1_in_data_valid, dly2_in_data_valid; 
reg [15:0] dly1_in_data_fifo, dly2_in_data_fifo; 
reg [31:0] in_data_fifo; 
reg in_data_valid; 
reg [NBITS-1 : 0] dly1_warp_cnt, dly2_warp_cnt; 
 
//Address Generation for reading from the LUT - LUT is 1200 x 512 values.
// The physical address is from row 600 to row 1799, so add 600 to the
// generated row.
sdram_lut_add_gen READ_ADDR_GEN_LUT( 
  //inputs 
  .clk(clk), 
  .reset(reset), 
  .data_active(lut_rd), //wire  Vs Reg 
  //outputs 
  .sd_row(sd_row1), 
  .sd_col(sd_col1), 
  .bank_add(sd_b_add1), 
  .wr_en(rd_addr_valid1) 
  ); 
assign sd_row_lut[12:0] = sd_row1[12:0] + 600; 
assign sd_col_lut[8:0] = sd_col1[8:0];// + 8; 
      
//Address Generation for writing the warped image to the sdram. Write to rows
//1800 to 2399 or 2400 to 2999.
sdram_add_gen WRITE_ADDR_GEN_IMG( 
  //inputs 
  .clk(clk), 
  .reset(reset), 
  .data_active(img_wr), //wire  Vs Reg 
  //outputs 
  .sd_row(sd_row2), 
  .sd_col(sd_col2), 
  .bank_add(sd_b_add2), 
  .wr_en(rd_addr_valid2) 
  ); 
assign sd_row_img[12:0] = sd_row2[12:0] + addr_select[12:0]; //Image can be stored in either buffer
assign sd_col_img[8:0] = sd_col2[8:0]; 
 
 
//State Machine for image warping 
sdram_warp WARP_STATE_MACHINE ( //inputs 
  .clk(clk), 
  .reset(reset), 
  .dly_lut_rd(dly_lut_rd), 
  .dly_img_rd(dly_img_rd), 
  .dly_img_wr(dly_img_wr), 
  .dt_empty(dt_empty), 
  .dt_going_empty(dt_going_empty), 
  .rc_almost_full(almost_full), 
  .max_lut_read_requests(max_lut_read_requests), //LUT Read Count
  .requests_pending(requests_pending), //Image Split Count
  .max_img_write_requests(max_img_write_requests), //Warp Frame Write Count
  .dly_max_lut_read_requests(dly_max_lut_read_requests), //LUT Read Count
  .dly_max_img_write_requests(dly_max_img_write_requests), //Warp Frame Write Count
  .dly_requests_pending(dly_requests_pending),
  .dly2_max_lut_read_requests(dly2_max_lut_read_requests),
  .dly2_max_img_write_requests(dly2_max_img_write_requests),
  .dly2_requests_pending(dly2_requests_pending),
  .lut_count_change(lut_count_change), 
  .img_wr_count_change(img_wr_count_change), 
  .img_rd_count_change(img_rd_count_change), 
  .warp_done(warp_done), 
  //outputs 
  .lut_rd(lut_rd), 
  .img_rd(img_rd), 
  .img_wr(img_wr), 
  .reset_cntr(reset_cntr), 
  .state(warp_state) 
  ); 
 
 
//Delay all signals.  
reg [8:0] dly_img_wr_requests, dly_img_rd_requests; 
reg [9:0] dly_lut_requests; 
//DELAY STATE 
always @ (posedge clk) 
 if(reset ) 
  begin 
  dly_lut_rd <= 0; 
  dly_img_rd <= 0; 
  dly_img_wr <= 0; 
  dly_max_lut_read_requests <= 0; 
  dly_max_img_write_requests <= 0; 
  dly_requests_pending <= 0; 
  dly2_max_lut_read_requests <= 0; 
  dly2_max_img_write_requests <= 0; 
  dly2_requests_pending <= 0; 
  dly_lut_requests <= 0; 
  dly_img_wr_requests <= 0; 
  dly_img_rd_requests <= FIFO_LENGTH - 1; 
  end 
 else  
 begin 
   if(reset_cntr) 
   begin 
   dly_lut_rd <= 0; 
   dly_img_rd <= 0; 
   dly_img_wr <= 0; 
   dly_max_lut_read_requests <= 0; 
   dly_max_img_write_requests <= 0; 
   dly_requests_pending <= 0; 
   dly2_max_lut_read_requests <= 0; 
   dly2_max_img_write_requests <= 0; 
   dly2_requests_pending <= 0; 
   dly_lut_requests <= 0; 
   dly_img_wr_requests <= 0; 
   dly_img_rd_requests <= FIFO_LENGTH - 1; 
   end 
   else 
   begin 
   dly_lut_rd <= lut_rd; 
   dly_img_rd <= img_rd; 
   dly_img_wr <= img_wr; 
   dly_max_lut_read_requests <= 
(lut_requests == LUT_LENGTH); 
   dly_max_img_write_requests <= 
(img_wr_requests == FIFO_LENGTH); 
   dly_requests_pending <= 
(img_rd_requests == 511); // not generic.. 
   dly2_max_lut_read_requests <= 
dly_max_lut_read_requests; 
   dly2_max_img_write_requests <= 
dly_max_img_write_requests; 
   dly2_requests_pending <= 
dly_requests_pending; 
   dly_lut_requests <= lut_requests; 
   dly_img_wr_requests <= img_wr_requests; 
   dly_img_rd_requests <= img_rd_requests; 
   end 
  end 
//*********************************************************************************
//2 LUT values have to be stored to generate an entire location. Logic to generate
//address valid every time 2 LUT values have been read out. These LUT values
//[32 bits] are then written into the data fifo.
//MULTIPLEX THE INCOMING READ DATA INTO THE DATA FIFO
//HAVE A COUNTER TRACK THE VALUES
wire [NBITS-1 : 0] warp_cnt; 
counter_ce #(NBITS,(FIFO_LENGTH + LUT_LENGTH))  WARP_CNT(clk, reset, 
data_valid, ,warp_cnt); 
   
//If due to a LUT_RD result then combine two data. Else write directly
//into the fifo
always @ (posedge clk)
 if(reset)
  alt_counter <= 0;
 //9 - not generic..change later NBITS - 1??
 else if(data_valid && (~warp_cnt[9]))
  alt_counter <= alt_counter + 1;
  
//Delay the data valid signal 
always @ (posedge clk) 
 if(reset) 
  begin 
  dly1_in_data_valid <= 0; 
  dly2_in_data_valid <= 0; 
  end 
 else 
  begin 
  dly1_in_data_valid <= data_valid; 
  dly2_in_data_valid <= dly1_in_data_valid; 
  end 
 
//Delay the data in..so a single word can be formed before entering
//data fifo
always @ (posedge clk) 
 if(reset) 
  begin 
  dly1_in_data_fifo[15:0] <= 0; 
  dly2_in_data_fifo[15:0] <= 0; 
  end 
 else if(data_valid) 
  begin 
  dly1_in_data_fifo[15:0] <= data_from_capture[15:0]; 
  dly2_in_data_fifo[15:0] <= dly1_in_data_fifo; 
  end 
 
//and..delay the warp_cnt signal 
always @ (posedge clk) 
 if(reset) 
  begin 
//  dly1_warp_cnt <= 0; 
  dly2_warp_cnt <= 0; 
  end 
 else 
  begin 
//  dly1_warp_cnt <= warp_cnt; 
  dly2_warp_cnt <= warp_cnt; 
  end 
  
//If a LUT_RD value then write into the data fifo. If a data result then the
//data is stored as the least significant 16 bits of a 32 bit word.
always @ (posedge clk) 
 if(reset) 
  begin 
  in_data_fifo[31:0] <= 0; 
  in_data_valid <= 0; 
  end 
 else if(~(dly2_warp_cnt[9])) //if(LUT_RD) value 
  begin 
  in_data_fifo[31:0] <= 
{dly2_in_data_fifo[15:0],dly1_in_data_fifo[15:0]}; 
  in_data_valid <= (warp_cnt[0] ^ dly2_warp_cnt[0]) 
&& (~alt_counter); 
  end 
 else 
  begin 
  in_data_fifo[31:0] <= {16'h0000, 
dly1_in_data_fifo[15:0]}; 
  in_data_valid <= dly1_in_data_valid; 
  end 
  
//*********************************************************************************
//FIFO STATUS
//Keep track of the number of requests that have been serviced by the warp
//controller. There is a delay involved. So if there are BURST_REQUESTS number of
//burst requests then make sure that BURST_REQUESTS-2 are done at full speed. The
//last one or two must be done individually with nop cycles to ensure no error.
//Although this introduces a certain amount of inefficiency, it ensures that there
//will be no error.
reg [8:0] img_rd_requests, img_wr_requests; 
reg [9:0] lut_requests; 
always @ (posedge clk)
 if(reset)
  begin
  lut_requests           <= 0;
  img_rd_requests        <= FIFO_LENGTH - 1;
  img_wr_requests        <= 0;
  max_lut_read_requests  <= 0;
  requests_pending       <= 0;
  max_img_write_requests <= 0;
  end
 else if(reset_cntr)
  begin
  lut_requests           <= 0;
  img_rd_requests        <= FIFO_LENGTH - 1;
  img_wr_requests        <= 0;
  max_lut_read_requests  <= 0;
  requests_pending       <= 0;
  max_img_write_requests <= 0;
  end
 else
  begin
  casex({img_wr, img_rd, lut_rd})
  3'b001: begin
          lut_requests <= lut_requests + 1; //0 to 255
          //should it be fifo_length - 1
          max_lut_read_requests <= (lut_requests >= LUT_LENGTH-3);
          end

  3'b010: begin
          img_rd_requests <= img_rd_requests - 1; //255 to 0
          requests_pending <= (img_rd_requests <= 2);
          end

  3'b100: begin
          img_wr_requests <= img_wr_requests + 1; //0 to 255
          max_img_write_requests <= (img_wr_requests >= FIFO_LENGTH-3);
          end
  endcase
  end
 
//COUNT CHANGE - Track and check if the count has changed, i.e. it keeps
//track of residual requests being served.
always @ (posedge clk) 
 if(reset) 
  begin 
  lut_count_change <= 0; 
  img_wr_count_change <= 0; 
  img_rd_count_change <= 0; 
  end 
 else 
  begin 
  lut_count_change <= (lut_requests != 
dly_lut_requests); //0 - no change 
  img_wr_count_change <= (img_wr_requests != 
dly_img_wr_requests); //0 - no change 
  img_rd_count_change <= (img_rd_requests != 
dly_img_rd_requests); //0 - no change 
  end 
 
//AFTER IMAGE WARPING
//The 2 rows indicate that it could be either of the two scanout buffers:
//13'b0101110110111 = 2999 and 13'b0100101011111 = 2399.
assign warp_done = ((sd_row_wr[12:0] == 13'b0101110110111) ||
                    (sd_row_wr[12:0] == 13'b0100101011111))
                   && (sd_col_wr[8:0] == 9'b111111111); //POSSIBLE ERROR
 
//MUXES TO SELECT RC CONTENT - GENERATE WRITES INTO FIFO
// During Lut read cycle the row and column are generated using address
// generator.
// During Image read cycle the row and column are read out from the
// Look Up Table whose results are stored in the data fifo.
// During Image Write cycle the row and column are generated using the
// image address generator
always @ (posedge clk)
 if(reset)
  begin
  sd_col_fifo[8:0]  <= 0;
  sd_row_fifo[15:0] <= 0;
  rc_fifo_en        <= 0;
  end
 else
  begin
  case({dly_img_wr, dly_img_rd, dly_lut_rd})
  3'b001: begin
          sd_row_fifo[15:0] <= {1'b0, 2'b00, sd_row_lut[12:0]};
          sd_col_fifo[8:0]  <= sd_col_lut[8:0];
          rc_fifo_en        <= 1;
          end

  3'b010: begin
          //make sure it outputs at the right time
          sd_row_fifo[15:0] <= {1'b0, 2'b00, data_out_fifo[12:0]};
          sd_col_fifo[8:0]  <= data_out_fifo[24:16];
          rc_fifo_en        <= 1; // dly_data_en_fifo
          end

  3'b100: begin
          sd_row_fifo[15:0] <= {1'b1, 2'b00, sd_row_img[12:0]};
          sd_col_fifo[8:0]  <= sd_col_img[8:0];
          rc_fifo_en        <= 1;
          end

  default: begin
          sd_row_fifo[15:0] <= 0;
          sd_col_fifo[8:0]  <= 0;
          rc_fifo_en        <= 0;
          end
  endcase
  end
 
//READ SIGNAL FOR DATA FIFO - Read out RC from Lut or pixel from image 
wire dt_empty; 
assign data_en_fifo = img_rd || (row_out[15] && dly_fifo_addr_valid1); 
 
 
//If addr valid then queue address - if mem rd issued then read out
//address and data
//if new dest address produced write it into queue.
//Destination Row Fifo  
row_fifo_255_14bit DEST_ROW_FIFO ( 
    .clk(clk), 
    .sinit(reset), 
    .din(sd_row_fifo[15:0]), //write from 600 to 1200 
    .wr_en(rc_fifo_en), 
    .rd_en(mem_wr_issue || mem_rd_issue), 
    .dout(row_out[15:0]), 
    .full(rc_full), 
    .empty(rc_empty), 
    .data_count(row_data_count[4:0])); 
 
assign rc_fifo_sync_error = (row_data_count[4:0] != 
col_data_count[4:0]); //check != Vs ~= 
 
//Destination Column Fifo 
col_fifo_255_8bit DEST_COL_FIFO ( 
    .clk(clk), 
    .sinit(reset), 
    .din(sd_col_fifo[8:0]), 
    .wr_en(rc_fifo_en), 
    .rd_en(mem_wr_issue || mem_rd_issue), 
    .dout(col_out[8:0]), 
    .full(), 
    .empty(), 
    .data_count(col_data_count[4:0])); 
 
assign almost_full = (row_data_count[4:0] == 5'b11111); 
      
      
       
 
// Destination Data Fifo  
data_fifo_255_24bit SOURCE_DATA_FIFO ( 
    .clk(clk), 
    .sinit(reset), 
    .din(in_data_fifo[31:0]), 
    .wr_en(in_data_valid), 
    .rd_en(data_en_fifo), 
    .dout(data_out_fifo[31:0]), 
    .full(dt_full), 
    .empty(dt_empty), 
  .rd_err(dt_rd_err), 
    .wr_err(dt_wr_err), 
    .data_count(dt_data_count[8:0])); 
 
//If one value is left, and there is a read issued and there is no write
//issued, then the next state will be empty.
assign dt_going_empty = (dt_data_count == 9'b000000001) &&
                        (data_en_fifo) && (~in_data_valid);
 
//Delay one cycle so that the MSB of row_out can be used to identify
//whether the operation is a read or a write. If a write then the
//corresponding data would also be read out.
always @ (posedge clk) 
 if(reset) 
  begin 
  dly_fifo_addr_valid1 <= 0; 
  dly_fifo_addr_valid  <= 0; 
  sd_row_mem    <= 0; 
  sd_col_mem    <= 0; 
  end 
 else  
  begin 
  dly_fifo_addr_valid1 <= mem_wr_issue || 
mem_rd_issue; 
  dly_fifo_addr_valid  <= dly_fifo_addr_valid1; 
  sd_row_mem    <= 
row_out[15:0]; 
  sd_col_mem    <= 
col_out[8:0]; 
  end    
      
 
//Wr and Rd Addr Valid can be taken from 1 bit of row op 
//Large Muxes. 
//Fifo Communication with Memory Queue - if rc fifo is not empty, talk
//to memory queue
//see what happens when both read and write requests are 1
assign sdram_read_request = (~rc_empty) && (~sd_row_mem[15]);
assign sdram_write_request = (~rc_empty) && sd_row_mem[15]; 
assign sd_col_wr[8:0] = (sd_row_mem[15]) ? sd_col_mem[8:0] : 0; 
assign sd_row_wr[15:0] = (sd_row_mem[15]) ? sd_row_mem[15:0] : 0;  
assign wr_addr_valid = dly_fifo_addr_valid && sd_row_mem[15]; 
assign data_to_sdram_wr[15:0] = (sd_row_mem[15]) ? data_out_fifo[15:0] : 
0;  
assign sd_col_rd[8:0] =  (~sd_row_mem[15]) ? sd_col_mem[8:0] : 0; 
assign sd_row_rd[15:0] = (~sd_row_mem[15]) ? sd_row_mem[15:0] : 0; 
assign rd_addr_valid = dly_fifo_addr_valid && (~sd_row_mem[15]); 
 
endmodule       
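
The comments in the warp controller above say that two 16-bit LUT values are combined into one 32-bit word, and the image read cycle then takes the row from bits [12:0] and the column from bits [24:16] of that word. The following is a minimal sketch of that packing only, under the assumption that the word read first ends up in the upper half (as the {dly2_in_data_fifo, dly1_in_data_fifo} concatenation suggests). The module name lut_entry_unpack and its port names are hypothetical and do not appear in the design.

```verilog
`timescale 1ns / 1ps
// Hypothetical helper, not part of the original design.
// Combines two consecutive 16-bit LUT words into one 32-bit entry and
// extracts the source row/column using the same bit fields as the
// image read cycle of the warp controller.
module lut_entry_unpack (
    input  wire [15:0] first_word,   // LUT word read first (older)
    input  wire [15:0] second_word,  // LUT word read second (newer)
    output wire [31:0] entry,        // combined 32-bit LUT entry
    output wire [12:0] src_row,      // SDRAM row of the source pixel
    output wire [8:0]  src_col       // SDRAM column of the source pixel
);

    // Older word in the upper half, newer word in the lower half.
    assign entry   = {first_word, second_word};

    // Same slices as data_out_fifo[12:0] and data_out_fifo[24:16].
    assign src_row = entry[12:0];
    assign src_col = entry[24:16];

endmodule
```

Under this assumption the column travels in the low 9 bits of the first LUT word and the row in the low 13 bits of the second; the remaining bits are unused by the address path.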
 
sdram add gen module: 
 
`timescale 1ns / 1ps 
module sdram_add_gen(clk, reset, data_active, sd_row, sd_col, bank_add, 
wr_en); 
 
 //Inputs and Outputs 
 input clk; 
 input reset; 
 input data_active; 
 output [12:0] sd_row; 
 output [8:0] sd_col; 
 output [1:0] bank_add; 
 output wr_en; 
 
 reg dly_data_active; 
 reg [8:0] col_count; 
 reg [1:0] bank_add_reg; 
 reg [12:0] row_count; 
 wire [12:0] sd_row; 
 wire [8:0] sd_col; 
  
 //Delay the Data Enable 
 always @ (posedge clk) 
  if(reset) 
   dly_data_active <=  0; 
  else 
   dly_data_active <= data_active; 
  
 //Column Counter - use post increment 
 assign col_count_done = (col_count == 9'b111111111); 
 always @ (posedge clk) 
  if(reset) 
   col_count <= 9'b000000000; 
  else 
   if(dly_data_active) 
   col_count <= col_count + 1; 
     
   
 // Row Counter 
 assign row_count_done = (row_count == 13'b0001001010111); 
//599 
 always @ (posedge clk) 
  if(reset) 
   row_count <= 13'b0000000000000; 
  else if(dly_data_active) 
   begin 
   if(row_count_done && col_count_done) 
   row_count <= 13'b0000000000000; 
   else if(col_count_done) 
   row_count <= row_count + 1; 
   end 
 
 //Every time a frame completes 
 assign framedone = row_count_done && col_count_done; 
 
 //Bank Address 
 always @ (posedge clk) 
  if(reset) 
   bank_add_reg <= 2'b00; 
  else if(framedone) 
   bank_add_reg <= ~bank_add_reg; 
  
 //Data Enable and Address 
 assign wr_en = dly_data_active; 
 assign bank_add = bank_add_reg; 
 assign sd_row = row_count; 
 assign sd_col = col_count; 
 
endmodule 
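
The address generator above counts 512 columns per row and 600 rows per frame, which gives 307,200 locations, exactly the number of pixels in a 640 x 480 frame. Assuming pixels reach the generator in raster order, pixel number n lands at row n / 512 and column n mod 512. The sketch below expresses that mapping combinationally; the module name pixel_to_rowcol and its ports are hypothetical and are given only to illustrate the arithmetic.

```verilog
`timescale 1ns / 1ps
// Hypothetical illustration, not part of the original design.
// Maps a pixel position in a 640 x 480 frame to the SDRAM row/column
// that a raster-order 512-column counter would assign to it.
module pixel_to_rowcol (
    input  wire [9:0]  x,    // horizontal position, 0..639
    input  wire [8:0]  y,    // vertical position, 0..479
    output wire [12:0] row,  // SDRAM row, 0..599
    output wire [8:0]  col   // SDRAM column, 0..511
);

    // Linear pixel index; the maximum is 479*640 + 639 = 307199,
    // which fits in 19 bits.
    wire [18:0] linear = y * 10'd640 + x;

    // Dividing by 512 is a shift by 9 bits; the remainder is the low 9 bits.
    assign row = {3'b000, linear[18:9]};
    assign col = linear[8:0];

endmodule
```

In the design itself the buffer base (600, or the value of addr_select) is added to the row afterwards; that offset is left out of this sketch.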
 
sdram warp module: 
`timescale 1ns / 1ps 
 
`define  INIT_NOP_IDLE  1  // start here 
`define  LUT_RD     2  // precharge all 
`define  NOP_LUT_RD   3 
`define  IMG_RD     4 
`define  IMG_WR     5 
`define NOP_IMG_WR  6 
`define CLR_ALL   7 
`define WARP_TRAP  8 
`define  NOP_IMG_RD   9 
`define NOP_CLR_ALL  10 
 
module sdram_warp(//inputs
      clk, reset, max_lut_read_requests, requests_pending,
      rc_almost_full, dt_empty, dt_going_empty,
      max_img_write_requests, dly_max_img_write_requests,
      dly_max_lut_read_requests, dly_requests_pending,
      warp_done, dly_img_wr, dly_img_rd, dly_lut_rd,
      //outputs
      lut_rd, img_rd, img_wr, reset_cntr,
      lut_count_change, img_wr_count_change,
      dly2_max_img_write_requests, dly2_max_lut_read_requests,
      dly2_requests_pending, img_rd_count_change, state
      );
 
input clk; 
input reset; 
input dly_img_wr; 
input dly_img_rd; 
input dly_lut_rd; 
input max_lut_read_requests; 
input dly_max_lut_read_requests; 
input dly2_max_lut_read_requests; 
input lut_count_change; 
input max_img_write_requests; 
input dly_max_img_write_requests; 
input dly2_max_img_write_requests; 
input img_wr_count_change; 
input requests_pending; 
input dly_requests_pending; 
input dly2_requests_pending; 
input img_rd_count_change; 
input dt_empty; 
input dt_going_empty; 
input rc_almost_full; 
input warp_done; 
output lut_rd; 
output img_rd; 
output img_wr; 
output reset_cntr; 
output [10:1] state; 
 
reg [10:1] state, next_state; 
 
wire lut_rd  = state[`LUT_RD]; 
wire img_rd  = state[`IMG_RD]; 
wire img_wr  = state[`IMG_WR]; 
wire reset_cntr = state[`CLR_ALL]; 
 
 
 // state_intialization 
  always @ (posedge clk or posedge reset) 
    if (reset)   state <= 10'b0000000001; 
    else         state <= next_state; 
 
 
  // state transitions 
  always @ (state or reset or max_lut_read_requests or requests_pending or
            max_img_write_requests or dly_max_lut_read_requests or
            dly_max_img_write_requests or rc_almost_full or lut_count_change or
            img_wr_count_change or dly2_max_lut_read_requests or
            dly2_max_img_write_requests or img_rd_count_change or
            dly2_requests_pending or dt_empty or dt_going_empty or
            dly_requests_pending or warp_done or dly_img_wr or
            dly_img_rd or dly_lut_rd)

    begin
      // has default values for outputs so synthesis tool doesn't infer latches
      next_state = 10'b0000000000;

      casex (1'b1)
        state[`INIT_NOP_IDLE]:
          if(~reset) next_state[`LUT_RD] = 1;
          else next_state[`INIT_NOP_IDLE] = 1;

        state[`LUT_RD]:
          if(warp_done) next_state[`WARP_TRAP] = 1;
          else if(rc_almost_full || max_lut_read_requests)
            next_state[`NOP_LUT_RD] = 1;
          else next_state[`LUT_RD] = 1;

        state[`NOP_LUT_RD]:
          if(warp_done) next_state[`WARP_TRAP] = 1;
          else if(dly_max_lut_read_requests || dly2_max_lut_read_requests)
            next_state[`NOP_IMG_RD] = 1; //WAIT STAGE
          else if(rc_almost_full || lut_count_change || dly_lut_rd)
            next_state[`NOP_LUT_RD] = 1;
          else next_state[`LUT_RD] = 1;

        state[`IMG_RD]:
          if(warp_done) next_state[`WARP_TRAP] = 1;
          else if(rc_almost_full || requests_pending || dt_empty ||
                  dt_going_empty)
            next_state[`NOP_IMG_RD] = 1; //wait stage
          else next_state[`IMG_RD] = 1;

        state[`NOP_IMG_RD]:
          if(warp_done) next_state[`WARP_TRAP] = 1;
          else if(dly_requests_pending || dly2_requests_pending)
            next_state[`NOP_IMG_WR] = 1; //WAIT STAGE
          else if(rc_almost_full || img_rd_count_change || dt_empty ||
                  dt_going_empty || dly_img_rd)
            next_state[`NOP_IMG_RD] = 1;
          else next_state[`IMG_RD] = 1;

        state[`IMG_WR]:
          if(warp_done) next_state[`WARP_TRAP] = 1;
          else if(rc_almost_full || max_img_write_requests || dt_empty ||
                  dt_going_empty)
            next_state[`NOP_IMG_WR] = 1;
          else next_state[`IMG_WR] = 1;

        state[`NOP_IMG_WR]:
          if(warp_done) next_state[`WARP_TRAP] = 1;
          else if(dly_max_img_write_requests || dly2_max_img_write_requests)
            next_state[`CLR_ALL] = 1;
          else if(rc_almost_full || img_wr_count_change || dt_empty ||
                  dt_going_empty || dly_img_wr)
            next_state[`NOP_IMG_WR] = 1;
          else next_state[`IMG_WR] = 1;

        state[`CLR_ALL]:
          if(warp_done) next_state[`WARP_TRAP] = 1;
          else next_state[`NOP_LUT_RD] = 1;

        state[`WARP_TRAP]:
          if(reset) next_state[`INIT_NOP_IDLE] = 1;
          else next_state[`WARP_TRAP] = 1;

        default: next_state[`INIT_NOP_IDLE] = 1;
      endcase
    end
 
 
endmodule 
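
The warp state machine above is one-hot: each state owns one bit of the state register, the `define values are bit positions, and the casex (1'b1) statement selects the branch whose state bit is set. The sketch below shows the same idiom on a reduced three-state machine so that the structure is easier to follow. The module name onehot_fsm_sketch, the state names and the ports start, finished and busy are hypothetical and are not taken from the design.

```verilog
`timescale 1ns / 1ps
// Hypothetical three-state example of the one-hot coding style used by
// sdram_warp. The macro values are bit positions in the state register.
`define S_IDLE 1
`define S_RUN  2
`define S_DONE 3

module onehot_fsm_sketch (
    input  wire clk,
    input  wire reset,
    input  wire start,     // request to leave the idle state
    input  wire finished,  // work completed
    output wire busy       // high while in the run state
);

    reg [3:1] state, next_state;

    // Outputs are decoded directly from single state bits.
    assign busy = state[`S_RUN];

    // State register: reset selects the idle bit only.
    always @ (posedge clk or posedge reset)
        if (reset) state <= 3'b001;
        else       state <= next_state;

    // Next-state logic: clear all bits first so no latch is inferred,
    // then set exactly one bit in the branch of the active state.
    always @ (state or start or finished)
        begin
            next_state = 3'b000;
            casex (1'b1)
                state[`S_IDLE]: if (start) next_state[`S_RUN]  = 1;
                                else       next_state[`S_IDLE] = 1;

                state[`S_RUN]:  if (finished) next_state[`S_DONE] = 1;
                                else          next_state[`S_RUN]  = 1;

                state[`S_DONE]: next_state[`S_IDLE] = 1;

                default:        next_state[`S_IDLE] = 1;
            endcase
        end

endmodule
```

One consequence of this style is that control outputs such as lut_rd or img_wr need no decoding logic, at the cost of one flip-flop per state.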
 
Dvi Scan Out Top module: 
 
`timescale 1ns / 1ps 
module Dvi_Scan_Out_Top(//inputs
      clk, reset,// clk_dvi,
      data_from_sdram, mem_rd_en, mem_rd_valid,
      addr_select,
      //outputs
      mem_read_request, hsync_n, vsync_n,
      red, green, blue,
      //outputs to sdram
      sd_row_rd, sd_col_rd, sd_data_rd,
      dly_mem_rd_en, scanout_frame_done,
      //debug signals
      datarequest, fifocount_out, CountX, CountY,
      full_out, empty_out, wr_err_out, rd_err_out,
      rcbd_full, rcbd_empty,
      sync_error, row_count,
      col_count, dt_count, almost_full
      );
 
//inputs 
input clk; 
//input clk_dvi; 
input reset; 
input [15:0] data_from_sdram; 
input mem_rd_en; //mem asks for address 
input mem_rd_valid; //mem says data is valid 
input [12:0] addr_select; //TO SELECT BETWEEN THE OUTPUT BUFFERS 
 
//outputs 
output mem_read_request; 
output hsync_n; 
output vsync_n; 
output [2:0] red; 
output [2:0] green; 
output [2:0] blue; 
 
//outputs to sdram 
output [15:0] sd_row_rd; 
output [8:0] sd_col_rd; 
output [15:0] sd_data_rd; 
 
//Signals to identify when memory read data and address are valid 
output dly_mem_rd_en; 
output scanout_frame_done; 
 
//Debug Signals 
output datarequest; 
output [3:0] fifocount_out; 
output [9:0] CountX, CountY; 
output full_out, empty_out; 
output wr_err_out, rd_err_out; 
output rcbd_full; 
output rcbd_empty; 
output sync_error; 
output [2:0] row_count; 
output [2:0] col_count; 
output [2:0] dt_count; 
output almost_full; 
 
 
wire [15:0] data_to_scanout; 
wire [15:0] data_from_sdram; 
wire [15:0] sd_row_rd; 
wire [8:0] sd_col_rd; 
wire [15:0] sd_row_rd_in; 
wire [8:0] sd_col_rd_in; 
wire [12:0] sd_row_gen; 
wire [12:0] sd_row_gen_1200; 
wire [8:0] sd_col_gen; 
wire [1:0] sd_b_add_gen; 
reg mem_read_request; 
reg dly_mem_rd_en; 
wire [3:0] fifocount_out; 
wire [2:0] red, green, blue; 
wire [9:0] CountX, CountY; 
reg scanout_frame_done; 
 
 
/**********************************************************************
************/ 
//Dvi Scan out - To generate vga timing and the output pulses. 
//DVI SCAN OUT 
Dvi_Scan_Out SCAN_OUT ( 
  //inputs 
  .clk(clk), 
  .reset(reset),  //make this 0 or 1 to turn this module on or off
  .rddata(data_to_scanout[15:0]),//to_scanout), 
  //outputs 
  .datarequest(datarequest), 
  .hsync(hsync),  
  .vsync(vsync), 
  .red(red[2:0]), 
  .blue(blue[2:0]), 
  .green(green[2:0]), 
  //debug signals   
       
  .hactive(hactive), 
  .vactive(vactive), 
  .hcnt(hcnt), 
  .vcnt(vcnt) 
  ); 
//hsync and vsync are negative 
assign hsync_n = ~hsync; 
assign vsync_n = ~vsync; 
 
/**********************************************************************
************/ 
//ADDRESS GENERATOR - Generate location at which data is to be read from
//640 x 480 can be stored in 600 x 512 physical locations. 
sdram_add_gen READ_ADDR_GEN( 
  //inputs 
  .clk(clk), 
  .reset(reset), 
  .data_active(mem_read_request && (~almost_full)), 
//mem read asks for address 
  //outputs 
  .sd_row(sd_row_gen), 
  .sd_col(sd_col_gen), 
  .bank_add(sd_b_add_gen), 
  .wr_en(rd_addr_valid_in) 
  ); 
 
assign sd_row_gen_1200[12:0] = sd_row_gen[12:0] + addr_select;// 
 
//outputs 
assign sd_row_rd_in[15:0] = {1'b0, 2'b00/*sd_b_add_gen*/, 
sd_row_gen_1200[12:0]};//sd_row_gen[12:0]}; 
assign sd_col_rd_in[8:0] = sd_col_gen[8:0];//9'b000110101; // 
 
//Scanout frame is done when row = 2399 or 2999 (the last row of either
//output buffer) and the column = 510
always @ (posedge clk)
 if(reset)
  scanout_frame_done <= 0;
 else
  scanout_frame_done <= (((sd_row_rd_in[12:0] == 13'b0101110110111) ||
                          (sd_row_rd_in[12:0] == 13'b0100101011111))
                         && (sd_col_rd_in == 9'b111111110)); //if 510 and a wr request???
 
/**********************************************************************
************/ 
//INPUT FIFO - copy comment from laptop program 
//This is the command fifo and stores the row/column/bank and the data. 
//For reads the data is 0. 
rcbd_fifo INPUT_FIFO( 
  //inputs 
  .clk(clk), 
  .reset(reset), 
  .row_in(sd_row_rd_in[15:0]), 
  .col_in(sd_col_rd_in[8:0]), 
  .data_in(16'h0000), //all reads have data value of 0
  .wr_en(rd_addr_valid_in), 
  .rd_en(mem_rd_en), 
  //outputs 
  .full(rcbd_full), 
  .empty(rcbd_empty), 
  .row_out(sd_row_rd), 
  .col_out(sd_col_rd), 
  .data_out(sd_data_rd), 
  .sync_error(sync_error), 
  .row_data_count(row_count), 
  .col_data_count(col_count), 
  .dt_data_count(dt_count) 
  ); 
 
assign almost_full = (row_count[2:1] == 2'b11); 
/**********************************************************************
************/ 
//This fifo stores the read data. The width is 36 by default. 
//Here the higher 20 bits are 0. 511 elements can be stored.  
//pick out the 0's from here...use to grab the data enable 
wire [35:0] data_from_sdram_36, data_to_scanout_36; 
//assign dvi_pulse = (clk_count[1:0] == 2'b00); 
//256 elements exist to supply pixels for writing.
assign fifo_256_elements = (fifocount_out[3:0] == 4'b1000);
//(fifocount_out == 4'b0001); // at least 0-32 elements present
assign fifo_16_elements = (fifocount_out[3:2] == 2'b00);
assign data_from_sdram_36[35:0] = {20'h00000,data_from_sdram[15:0]}; 
assign data_to_scanout[15:0] = data_to_scanout_36[15:0]; 
//DOUBLE FIFO BUFFER INPUT FRAME - Shock Absorber 
fifoctlr_cc_v2 OUTPUT_FIFO( 
  //inputs  
  .clock_in(clk), 
  .read_enable_in(datarequest), 
  .write_enable_in(mem_rd_valid), 
  .write_data_in(data_from_sdram_36[35:0]), 
  .fifo_gsr_in(reset), // what should be gsr here? 
  .read_data_out(data_to_scanout_36[35:0]), 
  .full_out(full_out), 
  .empty_out(empty_out), 
  .fifocount_out(fifocount_out[3:0]), 
  .rd_err_out(rd_err_out), 
  .wr_err_out(wr_err_out) 
  ); 
 
//send out memory write request..flush out all remaining pixels in the
//input fifos
//If number of fifo contents less than 32 then issue a memory read
//request for scanout
//else if more than 256 data present then no need to issue memory
//read requests.
//Here and in several places a latch has been inferred.
always @ (posedge clk) 
 if(reset) 
  mem_read_request <= 1'b0; 
 else 
  if (fifo_16_elements) 
   mem_read_request <= 1'b1; 
  else if(fifo_256_elements) //254 now.. 
   mem_read_request <= 1'b0; 
 
always @ (posedge clk) 
 if(reset) 
  dly_mem_rd_en <= 1'b0; 
 else 
  dly_mem_rd_en <= mem_rd_en; 
 
 
endmodule 
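
The mem_read_request logic in the module above behaves as a request with hysteresis: it is raised when the output fifo runs low, held while the fifo refills, and dropped once the fifo reaches an upper mark. The following is a minimal sketch of that idea on a plain fill count. The module name refill_hysteresis, the parameters LOW and HIGH and their default values are assumptions chosen for illustration, and the sketch stops on "greater than or equal" where the design compares the upper four count bits for equality.

```verilog
`timescale 1ns / 1ps
// Hypothetical illustration of a refill request with hysteresis.
// LOW and HIGH are assumed watermarks, not values taken from the design.
module refill_hysteresis #(
    parameter LOW  = 128,   // start requesting below this fill level
    parameter HIGH = 256    // stop requesting at or above this fill level
) (
    input  wire       clk,
    input  wire       reset,
    input  wire [8:0] fill_count,  // number of words currently in the fifo
    output reg        request      // high while a refill is wanted
);

    always @ (posedge clk)
        if (reset)
            request <= 1'b0;
        else if (fill_count < LOW)
            request <= 1'b1;        // running low: start (or keep) refilling
        else if (fill_count >= HIGH)
            request <= 1'b0;        // enough data buffered: stop
        // between LOW and HIGH the previous value is held

endmodule
```

Because every assignment sits in a clocked block, holding the previous value between the two marks synthesizes to a flip-flop with enable, not a latch.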
 
Dvi Scan out module: 
 
`timescale 1ns / 1ps 
 
module Dvi_Scan_Out(clk, reset, rddata,
                    datarequest,//hactive,vactive,
                    hsync, vsync, red, blue, green,
                    //debug signals
                    hactive, vactive, hcnt, vcnt
                    );
 
  // parameters for horizontal 
  parameter N1       = 10;    // number of counter bits 
  parameter HCOUNT   = 800;   // total pixel count 
  parameter HS_START = 2;     // start of hsync pulse //WAS 8 NOW 2 
  parameter HS_LEN   = 96;    // length of hsync pulse 
  parameter HA_START = 127;   // start of active video 
  parameter HA_LEN   = 640;   // length of active video 
 
  // parameters for vertical 
  parameter N2 = 10;          // number of counter bits 
  parameter VCOUNT = 525;     // total line count 
  parameter VS_START = 2;     // start of vsync pulse 
  parameter VS_LEN   = 2;     // length of vsync pulse 
  parameter VA_START = 24;    // start of active video 
  parameter VA_LEN   = 480;   // length of active video 
 
  input clk, reset; 
 
  // from sdram controller   
  input [15:0] rddata; 
   
  // dvi signals 
  output hsync, vsync; 
  
  // to sdram controller 
  output datarequest; 
 
  //debug signals 
  output hactive, vactive; 
  output [9:0] hcnt; 
  output [9:0] vcnt; 
  output [2:0] red; 
  output [2:0] green; 
  output [2:0] blue; 
   
    
  //*** 
  // Sync pulse stuff ... 
  //*** 
  wire vce; 
  wire [N1-1:0] hcnt; 
  wire [N2-1:0] vcnt; 
 
  // horizontal 
  counter #(N1,HCOUNT)             H_CNT(clk, reset, hcnt); 
  pulse_gen #(N1,HS_START,HS_LEN)  H_SYNC(clk, reset, 1'b1, hcnt, 
hsync); 
  pulse_gen #(N1,HA_START,HA_LEN)  H_ACTIVE(clk, reset, 1'b1, hcnt, 
hactive); 
 
  // vertical 
  pulse_high_low                   V_CNT_CE(clk, reset, 1'b1,   hsync, 
vce); 
  counter_ce #(N2,VCOUNT)          V_CNT(clk, reset, vce, ,vcnt); 
  pulse_gen #(N2,VS_START,VS_LEN)  V_SYNC(clk, reset, 1'b1, vcnt, 
vsync); 
  pulse_gen #(N2,VA_START,VA_LEN)  V_ACTIVE(clk, reset, 1'b1, vcnt, 
vactive); 
 
 
  //*** 
  // RGB stuff ... 
  //*** 
 
   
  //reg dly_hactive1, dly_vactive1; 
  reg dly_hactive, dly_vactive; 
  always @(posedge clk) 
    if (reset) 
      begin 
    //    dly_hactive1 <= 0; 
      //  dly_vactive1 <= 0; 
    dly_hactive <= 0; 
        dly_vactive <= 0; 
      end 
    else 
      begin 
       // dly_hactive1  <= hactive; 
        //dly_vactive1  <= vactive; 
    dly_hactive  <= hactive; 
        dly_vactive  <= vactive; 
      end 
 
 //reg vsync, hsync; 
 reg [2:0] red; 
   reg [2:0] green; 
   reg [2:0] blue; 
   
 always @(posedge clk) 
    if (reset) 
      begin 
        red      <= 0; 
    green   <= 0; 
    blue   <= 0; 
      end 
    else //if(dvi_pulse) 
      begin 
    // gate rgb with active signals
    red   <= rddata[8:6] & {3{dly_hactive}} & {3{dly_vactive}};
    green <= rddata[5:3] & {3{dly_hactive}} & {3{dly_vactive}};
    blue  <= rddata[2:0] & {3{dly_hactive}} & {3{dly_vactive}};
      end 
 
  
  reg datarequest; 
  always @(posedge clk) 
    if (reset)  datarequest <= 0; 
    else  
   begin 
  datarequest <= hactive & vactive; 
  end 
  
 
endmodule 
 
Fifoctlr cc v2 module: 
 
 
/************************************************************************\
*                                                                        *
*  Module      : fifoctlr_cc_v2.v               Last Update: 09/04/02    *
*                                                                        *
*  Description : FIFO controller top level.                              *
*                Implements a 511x36 FIFO with common read/write clocks. *
*                                                                        *
*  The following Verilog code implements a 511x36 FIFO in a Virtex2      *
*  device.  The inputs are a Clock, a Read Enable, a Write Enable,       *
*  Write Data, and a FIFO_gsr signal as an initial reset.  The outputs   *
*  are Read Data, Full, Empty, and the FIFOcount outputs, which          *
*  indicate how full the FIFO is.                                        *
*                                                                        *
*  Designer    : Nick Camilleri                                          *
*                                                                        *
*  Company     : Xilinx, Inc.                                            *
*                                                                        *
*  Disclaimer  : THESE DESIGNS ARE PROVIDED "AS IS" WITH NO WARRANTY     *
*                WHATSOEVER AND XILINX SPECIFICALLY DISCLAIMS ANY        *
*                IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR      *
*                A PARTICULAR PURPOSE, OR AGAINST INFRINGEMENT.          *
*                THEY ARE ONLY INTENDED TO BE USED BY XILINX             *
*                CUSTOMERS, AND WITHIN XILINX DEVICES.                   *
*                                                                        *
*                Copyright (c) 2002 Xilinx, Inc.                         *
*                All rights reserved                                     *
*                                                                        *
\************************************************************************/
 
`timescale 1ns / 10ps 
 
`define DATA_WIDTH 35:0 
`define ADDR_WIDTH 8:0 
 
module fifoctlr_cc_v2 (clock_in, read_enable_in, write_enable_in,  
                       write_data_in, fifo_gsr_in, read_data_out,  
                       full_out, empty_out, fifocount_out, rd_err_out, 
      
   wr_err_out ); 
 
input clock_in, read_enable_in, write_enable_in, fifo_gsr_in; 
input  [`DATA_WIDTH] write_data_in; 
output [`DATA_WIDTH] read_data_out; 
output full_out, empty_out; 
output [3:0] fifocount_out; 
output rd_err_out, wr_err_out; 
 
wire read_enable = read_enable_in; 
wire write_enable = write_enable_in; 
wire fifo_gsr = fifo_gsr_in; 
wire [`DATA_WIDTH] write_data = write_data_in; 
wire [`DATA_WIDTH] read_data; 
assign read_data_out = read_data; 
reg full, empty; 
assign full_out = full; 
assign empty_out = empty; 
 
reg [`ADDR_WIDTH] read_addr, write_addr, fcounter; 
reg read_allow, write_allow; 
reg rd_err, wr_err; 
assign rd_err_out = rd_err; 
assign wr_err_out = wr_err; 
 
wire fcnt_allow; 
 
wire [`DATA_WIDTH] gnd_bus = 'h0; 
wire gnd = 0; 
wire pwr = 1; 
 
/**********************************************************************\
*                                                                      *
* A global buffer is instantiated to avoid skew problems.              *
*                                                                      *
\**********************************************************************/
  
//BUFG gclk1 (.I(clock_in), .O(clock)); 
wire clock = clock_in; 
 
/**********************************************************************\
*                                                                      *
* BLOCK RAM instantiation for FIFO.  Module is 512x36, of which one    *
* address location is sacrificed for the overall speed of the design.  *
*                                                                      *
\**********************************************************************/
 
RAMB16_S36_S36 bram1 ( .ADDRA(read_addr), .ADDRB(write_addr),  
                       .DIA(gnd_bus[35:4]), .DIPA(gnd_bus[3:0]),  
                       .DIB(write_data[35:4]), .DIPB(write_data[3:0]),  
                       .WEA(gnd), .WEB(pwr), .CLKA(clock), .CLKB(clock),  
                       .SSRA(gnd), .SSRB(gnd), .ENA(read_allow),  
                       .ENB(write_allow), .DOA(read_data[35:4]), 
                       .DOPA(read_data[3:0]), .DOB(), .DOPB() ); 
 
/***********************************************************\ 
*                                                           * 
*  Set allow flags, which control the clock enables for     * 
*  read, write, and count operations.                       * 
*                                                           * 
\***********************************************************/ 
 
wire [3:0] fcntandout; 
wire ra_or_fcnt0, wa_or_fcnt0, emptyg, fullg; 
 
always @(posedge clock or posedge fifo_gsr) 
   if (fifo_gsr) read_allow <= 0; 
   else read_allow <= read_enable && ! emptyg; 
 
always @(posedge clock or posedge fifo_gsr) 
   if (fifo_gsr) write_allow <= 0; 
   else write_allow <= write_enable &&  ! fullg; 
 
assign fcnt_allow = write_allow ^ read_allow; 
 
/***********************************************************\ 
*                                                           * 
*  Empty flag is set on fifo_gsr (initial), or when on the  * 
*  next clock cycle, Write Enable is low, and either the    * 
*  FIFOcount is equal to 0, or it is equal to 1 and Read    * 
*  Enable is high (about to go Empty).                      * 
*                                                           * 
\***********************************************************/ 
 
assign ra_or_fcnt0 = (read_allow || ! fcounter[0]); 
 
fifoand4b4 emptyand1 (.data(fcounter[4:1]), .out(fcntandout[0])); 
fifoand4b4 emptyand2 (.data(fcounter[8:5]), .out(fcntandout[1])); 
fifoand4b1 emptyand3 
(.in1(fcntandout[0]), .in2(fcntandout[1]), .in3(ra_or_fcnt0), 
                      .in4(write_allow), .out(emptyg)); 
  
always @(posedge clock or posedge fifo_gsr) 
   if (fifo_gsr) empty <= 1; 
   else empty <= emptyg; 
 
/***********************************************************\ 
*                                                           * 
*  Full flag is set on fifo_gsr (but it is cleared on the   * 
*  first valid clock edge after fifo_gsr is removed), or    * 
*  when on the next clock cycle, Read Enable is low, and    * 
*  either the FIFOcount is equal to 1FF (hex), or it is     * 
*  equal to 1FE and the Write Enable is high (about to go   * 
*  Full).                                                   * 
*                                                           * 
\***********************************************************/ 
 
assign wa_or_fcnt0 = (write_allow || fcounter[0]); 
 
fifoand4   fulland1 (.data(fcounter[4:1]), .out(fcntandout[2])); 
fifoand4   fulland2 (.data(fcounter[8:5]), .out(fcntandout[3])); 
fifoand4b1 fulland3 
(.in1(fcntandout[2]), .in2(fcntandout[3]), .in3(wa_or_fcnt0), 
                     .in4(read_allow), .out(fullg)); 
 
always @(posedge clock or posedge fifo_gsr) 
   if (fifo_gsr) full <= 1; 
   else full <= fullg; 
 
/***********************************************************\ 
*                                                           * 
*  Read error flag is set if an attempt is made to read     * 
*  from the fifo when it is empty.                          * 
*                                                           * 
\***********************************************************/ 
 
always @(posedge clock or posedge fifo_gsr) 
   if (fifo_gsr) rd_err <= 0; 
   else if (read_enable && empty) rd_err <= 1; 
 
/***********************************************************\ 
*                                                           * 
*  Write error flag is set if an attempt is made to write   * 
*  to the fifo when it is full.                             * 
*                                                           * 
\***********************************************************/ 
 
always @(posedge clock or posedge fifo_gsr) 
   if (fifo_gsr) wr_err <= 0; 
   else if (write_enable && full) wr_err <= 1; 
 
/************************************************************\ 
*                                                            * 
*  Generation of Read and Write address pointers.  They now  * 
*  use binary counters because it is simpler in simulation,  * 
*  and the previous LFSR implementation wasn't in the        *    
*  critical path.                                            * 
*                                                            * 
\************************************************************/ 
 
always @(posedge clock or posedge fifo_gsr) 
   if (fifo_gsr) read_addr <= 'h0; 
   else if (read_allow) read_addr <= read_addr + 1; 
  
always @(posedge clock or posedge fifo_gsr) 
   if (fifo_gsr) write_addr <= 'h0; 
   else if (write_allow) write_addr <= write_addr + 1; 
  
/************************************************************\ 
*                                                            * 
*  Generation of FIFOcount outputs.  Used to determine how   * 
*  full FIFO is, based on a counter that keeps track of how  * 
*  many words are in the FIFO.  Also used to generate Full   * 
*  and Empty flags.  Only the upper four bits of the counter * 
*  are sent outside the FIFO module.                         * 
*                                                            * 
\************************************************************/ 
 
always @(posedge clock or posedge fifo_gsr) 
   if (fifo_gsr) fcounter <= 'h0; 
   else if (fcnt_allow) 
      fcounter <= fcounter + { read_allow, read_allow, read_allow,  
                               read_allow, read_allow, read_allow,  
                               read_allow, read_allow, pwr }; 
 
assign fifocount_out = fcounter[8:5]; 
 
endmodule 
 
/************************************************************\ 
*                                                            * 
*  The logic modules below are coded explicitly, to ensure   * 
*  that the logic is implemented in a minimum of levels.     *    
*                                                            * 
\************************************************************/ 
 
module fifoand4b1 (in1, in2, in3, in4, out); 
 
input in1, in2, in3, in4; 
output out; 
 
assign out = (in1 && in2 && in3 && ! in4); 
 
endmodule 
 
module fifoand4 (data, out); 
 
input [3:0] data; 
output out; 
 
assign out = (data[3] && data[2] && data[1] && data[0]); 
 
endmodule 
 
module fifoand4b4 (data, out); 
 
input [3:0] data; 
output out; 
 
assign out = (! data[3] && ! data[2] && ! data[1] && ! data[0]); 
 
endmodule 
 
Counter module: 
 
`timescale 1ns/10ps 
 
module counter(clk, reset, q); 
  parameter N = 8;       // number of bits 
  parameter TCNT = 256;  // desired terminal count 
 
  input  clk, reset; 
  output [N-1:0] q; 
  reg    [N-1:0] q; 
 
  // check for one less than what you want ... 
  wire tc_tmp; 
  assign tc_tmp = (q==TCNT-2); 
 
  // ... then register (causes 1-cycle delay) 
  reg tc; 
  always @(posedge clk) 
    if (reset) tc <= 0; 
    else tc <= tc_tmp; 
 
  // counter 
  always @(posedge clk) 
    if (reset) q <= 0; 
    else 
      begin 
        if (tc) q <= 0; 
        else q <= q + 1; 
      end 
endmodule 
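
The registered terminal count above can be exercised with a small self-checking testbench. The following is only a sketch and is not part of the thesis sources: the module name counter_tb and the parameter values N = 3, TCNT = 5 are chosen here for illustration, and the counter module listed above must be compiled alongside it. Because tc is registered one cycle after q reaches TCNT-2, q is expected to step through 0 to TCNT-1 and then wrap to 0.

```verilog
`timescale 1ns/10ps

// Hypothetical testbench (illustrative, not from the thesis sources).
// Compile together with the counter module listed above.
module counter_tb;
  reg clk, reset;
  wire [2:0] q;
  integer i;

  // Small assumed parameters so the wrap-around is visible quickly.
  counter #(.N(3), .TCNT(5)) dut (.clk(clk), .reset(reset), .q(q));

  always #5 clk = ~clk;   // 10 ns clock period

  initial begin
    clk = 0;
    reset = 1;            // synchronous reset: hold it across one rising edge
    @(posedge clk);
    #1 reset = 0;

    // q starts at 0 after reset; each later edge should advance it modulo 5.
    for (i = 0; i < 12; i = i + 1) begin
      if (q !== (i % 5))
        $display("MISMATCH at step %0d: q=%0d", i, q);
      @(posedge clk);
      #1;                 // sample after the nonblocking updates settle
    end
    $display("counter_tb finished");
    $finish;
  end
endmodule
```

If the counter behaves as described, no MISMATCH lines are printed before the final message.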
 
 
Counter ce module: 
 
`timescale 1ns / 1ps 
module counter_ce(clk, reset, enable, tc, q); 
  parameter N = 8;       // number of bits 
  parameter TCNT = 256;  // desired terminal count 
 
  input  clk, reset, enable; 
  output tc; 
  output [N-1:0] q; 
 
  reg    tc; 
  reg    [N-1:0] q; 
 
  // check for one less than what you want ... 
  wire tc_tmp; 
  assign tc_tmp = (q==TCNT-2); 
 
  // ... then register (causes 1-cycle delay) 
  always @(posedge clk) 
    if (reset) tc <= 0; 
    else if (enable) tc <= tc_tmp; 
 
  // counter 
  always @(posedge clk) 
    if (reset) q <= 0; 
    else if (enable) 
      begin 
        if (tc) q <= 0; 
        else q <= q + 1; 
      end 
endmodule 
 
Pulse gen module: 
 
`timescale 1ns / 1ps 
module pulse_gen(clk,reset,enable,count,pulse); 
  parameter N = 8;      // number of counter bits 
  parameter START = 0;  // start count 
  parameter LENGTH = 1; // pulse length (number of cycles) 
 
  input clk, reset, enable; 
  input [N-1:0] count; 
  output pulse; 
  reg pulse; 
 
  always @(posedge clk) 
    if (reset) pulse <= 0; 
    else if (enable) 
      begin 
        if ((count>=START)&&(count<START+LENGTH)) pulse <= 1; 
        else pulse <= 0; 
      end 
endmodule 
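
A minimal sketch of how pulse_gen responds to a running count is given below. It is not taken from the thesis: the testbench name pulse_gen_tb and the values N = 4, START = 2, LENGTH = 3 are assumptions made for illustration, and the pulse_gen module above must be compiled with it. Since pulse is a registered output, it reflects the count value that was present at the preceding clock edge.

```verilog
`timescale 1ns / 1ps

// Hypothetical testbench (illustrative, not from the thesis sources).
// Compile together with the pulse_gen module listed above.
module pulse_gen_tb;
  reg clk, reset, enable;
  reg [3:0] count;
  wire pulse;

  // Assumed window: pulse asserted for count values 2, 3 and 4.
  pulse_gen #(.N(4), .START(2), .LENGTH(3)) dut
    (.clk(clk), .reset(reset), .enable(enable), .count(count), .pulse(pulse));

  always #5 clk = ~clk;   // 10 ns clock period

  initial begin
    clk = 0; reset = 1; enable = 1; count = 0;
    @(posedge clk);
    #1 reset = 0;

    repeat (10) begin
      @(posedge clk);
      // pulse was just registered from the count value printed here
      #1 $display("count=%0d pulse=%b", count, pulse);
      count = count + 1;
    end
    $finish;
  end
endmodule
```

With these assumed parameters the printed pulse value is 1 only on the lines for count 2, 3 and 4.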
 
Pulse high low module: 
 
`timescale 1ns / 1ps 
module pulse_high_low(clk,reset,enable,din,pulse); 
  input clk, reset, enable, din; 
  output pulse; 
  reg pulse; 
  reg din_reg; 
 
  // 1-cycle delay reg 
  always @(posedge clk) 
    if (reset) din_reg <= 0; 
    else if (enable) din_reg <= din; 
 
  // check for old value high, current value low 
  always @(posedge clk) 
    if (reset) pulse <= 0; 
    else if (enable) 
      begin 
        if (!din && din_reg) pulse <= 1; 
        else pulse <= 0; 
      end 
endmodule 
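
The falling-edge detection above can be checked with a short testbench such as the following sketch. It is illustrative only: the name pulse_high_low_tb and the stimulus timing are assumptions, and the pulse_high_low module above must be compiled with it. The input din is raised for a few cycles and then lowered, and pulse is expected to go high for a single clock cycle after the falling edge and not after the rising edge.

```verilog
`timescale 1ns / 1ps

// Hypothetical testbench (illustrative, not from the thesis sources).
// Compile together with the pulse_high_low module listed above.
module pulse_high_low_tb;
  reg clk, reset, enable, din;
  wire pulse;

  pulse_high_low dut
    (.clk(clk), .reset(reset), .enable(enable), .din(din), .pulse(pulse));

  always #5 clk = ~clk;   // 10 ns clock period

  initial begin
    clk = 0; reset = 1; enable = 1; din = 0;
    @(posedge clk);
    #1 reset = 0;

    @(posedge clk);
    #1 din = 1;                 // rising edge on din: no pulse expected
    repeat (3) @(posedge clk);
    #1 din = 0;                 // falling edge on din

    repeat (4) begin
      @(posedge clk);
      #1 $display("t=%0t din=%b pulse=%b", $time, din, pulse);
    end
    $finish;
  end
endmodule
```

Only the first printed line after din falls should show pulse=1; the following lines should show pulse=0.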
 
Test fixture for post-synthesis implementation: 
 
 
`timescale 1ns / 1ps 
module Full_Frame_lut_TF_v_v_v; 
 
 // Inputs 
 reg clk_in; 
 reg reset; 
 
 // Outputs 
 wire dvi_clkout; 
 wire sd_clkout; 
 wire [2:0] red; 
 wire [2:0] blue; 
 wire [2:0] green; 
 wire hsync_n; 
 wire vsync_n; 
 wire sd_cke; 
 wire sd_cs_n; 
 wire [1:0] sd_dqm; 
 wire [2:0] command; 
 wire [12:0] addr; 
 wire [1:0] baddr; 
 wire image_frame_detect; 
 wire lut_frame_detect; 
 wire scanout_frame_done; 
 wire sdram_write_request; 
 wire sdram_read_request; 
 wire wr_addr_valid; 
 wire rd_addr_valid; 
 wire [15:0] data_to_sdram_wr; 
 wire [15:0] sd_row_wr; 
 wire [15:0] sd_row_rd; 
 wire [8:0] sd_col_wr; 
 wire [8:0] sd_col_rd; 
 wire mem_wr_issue_lut; 
 wire mem_wr_issue_ff; 
 wire mem_wr_issue_warp; 
 wire mem_rd_issue_warp; 
 wire [12:0] scan_addr_select; 
 wire [12:0] workbuffer_addr_select; 
 wire scanout_enable; 
 wire lut_store; 
 wire image_store; 
 wire image_warp; 
 wire track_data; 
 wire [7:0] scanout_track_count; 
 wire [7:0] image_warp_track_count; 
 wire mem_wr_issue; 
 wire mem_rd_issue; 
 wire mem_rd_valid; 
 wire [15:0] data_from_sdram_rd; 
 wire cmd_full; 
 wire cmd_empty; 
 wire rd_cmd; 
 wire wr_cmd; 
 wire cmd_avail; 
 wire [15:0] out_row; 
 wire [8:0] out_col; 
 wire [15:0] out_row_misc; 
 wire [8:0] out_col_misc; 
 wire page_hit; 
 wire hit; 
 wire fifo_wr_en; 
 wire page_eq; 
 wire [15:0] dly2_row; 
 wire [8:0] dly2_col; 
 wire dly_cmd_avail; 
 wire [13:1] state; 
 wire fifo_empty_n; 
 wire init_done; 
 wire cmd_eq; 
 wire page_valid; 
 wire rd_en; 
 wire mem_rd_en_scanout; 
 wire sync_error; 
 wire [15:0] in_row_mux; 
 wire [8:0] in_col_mux; 
 wire [2:0] row_data_count; 
 wire [2:0] col_data_count; 
 wire [2:0] dt_data_count; 
 wire almost_full; 
 wire page_en; 
 wire sdram_write_request_lut; 
 wire [15:0] data_to_sdram_wr_lut; 
 wire [15:0] sd_row_wr_lut; 
 wire [8:0] sd_col_wr_lut; 
 wire wr_addr_valid_lut; 
 wire sdram_write_request_ff; 
 wire [15:0] data_to_sdram_wr_ff; 
 wire wr_addr_valid_ff; 
 wire [15:0] sd_row_wr_ff; 
 wire [8:0] sd_col_wr_ff; 
 wire sdram_write_request_warp; 
 wire sdram_read_request_warp; 
 wire [15:0] sd_row_wr_warp; 
 wire [15:0] sd_row_rd_warp; 
 wire [8:0] sd_col_wr_warp; 
 wire [8:0] sd_col_rd_warp; 
 wire rd_addr_valid_warp; 
 wire wr_addr_valid_warp; 
 wire warp_done; 
 wire [4:0] warp_row_count; 
 wire [4:0] warp_col_count; 
 wire [8:0] warp_dt_count; 
 wire lut_rd; 
 wire img_rd; 
 wire img_wr; 
 wire dly_img_wr; 
 wire dt_empty; 
 wire dt_full; 
 wire rc_empty; 
 wire rc_full; 
 wire dt_going_empty; 
 wire rc_fifo_en; 
 wire data_en_fifo; 
 wire max_lut_read_requests; 
 wire dly_max_lut_read_requests; 
 wire requests_pending; 
 wire max_img_write_requests; 
 wire dly_max_img_write_requests; 
 wire reset_cntr; 
 wire [15:0] sd_row_mem; 
 wire [8:0] sd_col_mem; 
 wire [8:0] lut_requests; 
 wire [8:0] img_rd_requests; 
 wire [8:0] img_wr_requests; 
 wire [12:0] sd_row_lut; 
 wire [12:0] sd_row_img; 
 wire [8:0] sd_col_lut; 
 wire [8:0] sd_col_img; 
 wire rc_almost_full; 
 wire [15:0] sd_row_fifo; 
 wire [8:0] sd_col_fifo; 
 wire lut_count_change; 
 wire img_wr_count_change; 
 wire img_rd_count_change; 
 wire dt_rd_err; 
 wire dt_wr_err; 
 wire [15:0] row_out; 
 wire [8:0] col_out; 
 wire [10:1] warp_state; 
 wire [15:0] sd_row_rd_scanout; 
 wire [8:0] sd_col_rd_scanout; 
 wire rd_addr_valid_scanout; 
 wire sdram_read_request_scanout; 
 wire datarequest; 
 wire [3:0] fifocount_out; 
 wire [9:0] CountY; 
 wire [9:0] CountX; 
 wire empty_out; 
 wire full_out; 
 wire rd_err_out; 
 wire wr_err_out; 
 wire [2:0] scanout_row_count; 
 wire [2:0] scanout_col_count; 
 wire [2:0] scanout_dt_count; 
 wire scanout_rcbd_full; 
 wire scanout_rcbd_empty; 
 wire scanout_sync_error; 
 wire scanout_almost_full; 
 
 // Bidirs 
 wire [15:0] data; 
 
 // Instantiate the Unit Under Test (UUT) 
 Memory_Controller_Top uut ( 
  .clk_in(clk_in),  
  .reset(reset),  
  .dvi_clkout(dvi_clkout),  
  .sd_clkout(sd_clkout),  
  .red(red),  
  .blue(blue),  
  .green(green),  
  .hsync_n(hsync_n),  
  .vsync_n(vsync_n),  
  .sd_cke(sd_cke),  
  .sd_cs_n(sd_cs_n),  
  .sd_dqm(sd_dqm),  
  .command(command),  
  .addr(addr),  
  .baddr(baddr),  
  .data(data),  
  .image_frame_detect(image_frame_detect),  
  .lut_frame_detect(lut_frame_detect),  
  .scanout_frame_done(scanout_frame_done),  
  .sdram_write_request(sdram_write_request),  
  .sdram_read_request(sdram_read_request),  
  .wr_addr_valid(wr_addr_valid),  
  .rd_addr_valid(rd_addr_valid),  
  .data_to_sdram_wr(data_to_sdram_wr),  
  .sd_row_wr(sd_row_wr),  
  .sd_row_rd(sd_row_rd),  
  .sd_col_wr(sd_col_wr),  
  .sd_col_rd(sd_col_rd),  
  .mem_wr_issue_lut(mem_wr_issue_lut),  
  .mem_wr_issue_ff(mem_wr_issue_ff),  
  .mem_wr_issue_warp(mem_wr_issue_warp),  
  .mem_rd_issue_warp(mem_rd_issue_warp),  
  .scan_addr_select(scan_addr_select),  
  .workbuffer_addr_select(workbuffer_addr_select),  
  .scanout_enable(scanout_enable),  
  .lut_store(lut_store),  
  .image_store(image_store),  
  .image_warp(image_warp),  
  .track_data(track_data),  
  .scanout_track_count(scanout_track_count),  
  .image_warp_track_count(image_warp_track_count),  
  .mem_wr_issue(mem_wr_issue),  
  .mem_rd_issue(mem_rd_issue),  
  .mem_rd_valid(mem_rd_valid),  
  .data_from_sdram_rd(data_from_sdram_rd),  
  .cmd_full(cmd_full),  
  .cmd_empty(cmd_empty),  
  .rd_cmd(rd_cmd),  
  .wr_cmd(wr_cmd),  
  .cmd_avail(cmd_avail),  
  .out_row(out_row),  
  .out_col(out_col), 
  .out_row_misc(out_row_misc[15:0]), 
  .out_col_misc(out_col_misc[8:0]), 
  .page_hit(page_hit),  
  .hit(hit),  
  .fifo_wr_en(fifo_wr_en),  
  .page_eq(page_eq),  
  .dly2_row(dly2_row),  
  .dly2_col(dly2_col),  
  .dly_cmd_avail(dly_cmd_avail),  
  .state(state),  
  .fifo_empty_n(fifo_empty_n),  
  .init_done(init_done),  
  .cmd_eq(cmd_eq),  
  .page_valid(page_valid),  
  .rd_en(rd_en),  
  .mem_rd_en_scanout(mem_rd_en_scanout),  
  .sync_error(sync_error),  
  .in_row_mux(in_row_mux),  
  .in_col_mux(in_col_mux),  
  .row_data_count(row_data_count),  
  .col_data_count(col_data_count),  
  .dt_data_count(dt_data_count),  
  .almost_full(almost_full), 
  .page_en(page_en), 
  .sdram_write_request_lut(sdram_write_request_lut),  
  .data_to_sdram_wr_lut(data_to_sdram_wr_lut),  
  .sd_row_wr_lut(sd_row_wr_lut),  
  .sd_col_wr_lut(sd_col_wr_lut),  
  .wr_addr_valid_lut(wr_addr_valid_lut),  
  .sdram_write_request_ff(sdram_write_request_ff),  
  .data_to_sdram_wr_ff(data_to_sdram_wr_ff),  
  .wr_addr_valid_ff(wr_addr_valid_ff),  
  .sd_row_wr_ff(sd_row_wr_ff),  
  .sd_col_wr_ff(sd_col_wr_ff),  
  .sdram_write_request_warp(sdram_write_request_warp),  
  .sdram_read_request_warp(sdram_read_request_warp),  
  .sd_row_wr_warp(sd_row_wr_warp),  
  .sd_row_rd_warp(sd_row_rd_warp),  
  .sd_col_wr_warp(sd_col_wr_warp),  
  .sd_col_rd_warp(sd_col_rd_warp),  
  .rd_addr_valid_warp(rd_addr_valid_warp),  
  .wr_addr_valid_warp(wr_addr_valid_warp),  
  .warp_done(warp_done),  
  .warp_row_count(warp_row_count),  
  .warp_col_count(warp_col_count),  
  .warp_dt_count(warp_dt_count),  
  .lut_rd(lut_rd),  
  .img_rd(img_rd),  
  .img_wr(img_wr),  
  .dly_img_wr(dly_img_wr),  
  .dt_empty(dt_empty),  
  .dt_full(dt_full),  
  .rc_empty(rc_empty),  
  .rc_full(rc_full),  
  .dt_going_empty(dt_going_empty),  
  .rc_fifo_en(rc_fifo_en),  
  .data_en_fifo(data_en_fifo),  
  .max_lut_read_requests(max_lut_read_requests),  
  .dly_max_lut_read_requests(dly_max_lut_read_requests),  
  .requests_pending(requests_pending),  
  .max_img_write_requests(max_img_write_requests),  
  .dly_max_img_write_requests(dly_max_img_write_requests),  
  .reset_cntr(reset_cntr),  
  .sd_row_mem(sd_row_mem),  
  .sd_col_mem(sd_col_mem),  
  .lut_requests(lut_requests),  
  .img_rd_requests(img_rd_requests),  
  .img_wr_requests(img_wr_requests),  
  .sd_row_lut(sd_row_lut),  
  .sd_row_img(sd_row_img),  
  .sd_col_lut(sd_col_lut),  
  .sd_col_img(sd_col_img),  
  .rc_almost_full(rc_almost_full),  
  .sd_row_fifo(sd_row_fifo),  
  .sd_col_fifo(sd_col_fifo),  
  .lut_count_change(lut_count_change),  
  .img_wr_count_change(img_wr_count_change),  
  .img_rd_count_change(img_rd_count_change),  
  .dt_rd_err(dt_rd_err),  
  .dt_wr_err(dt_wr_err),  
  .row_out(row_out),  
  .col_out(col_out),  
  .warp_state(warp_state), 
  .sd_row_rd_scanout(sd_row_rd_scanout),  
  .sd_col_rd_scanout(sd_col_rd_scanout),  
  .rd_addr_valid_scanout(rd_addr_valid_scanout),  
  .sdram_read_request_scanout(sdram_read_request_scanout),  
  .datarequest(datarequest),  
  .fifocount_out(fifocount_out),  
  .CountY(CountY),  
  .CountX(CountX),  
  .empty_out(empty_out),  
  .full_out(full_out),  
  .rd_err_out(rd_err_out),  
  .wr_err_out(wr_err_out),  
  .scanout_row_count(scanout_row_count),  
  .scanout_col_count(scanout_col_count),  
  .scanout_dt_count(scanout_dt_count),  
  .scanout_rcbd_full(scanout_rcbd_full),  
  .scanout_rcbd_empty(scanout_rcbd_empty),  
  .scanout_sync_error(scanout_sync_error), 
  .scanout_almost_full(scanout_almost_full) 
 ); 
   
 initial begin 
  // Initialize Inputs 
  clk_in = 0; 
  reset = 0; 
 
  // Wait 100 ns for global reset to finish 
  #100; 
         
  // Add stimulus here 
   forever 
   begin 
   #10 clk_in = 0; 
   #10 clk_in = 1; 
   end 
 
 end 
       
endmodule 
 
User Constraint File 
 
NET "clk_in" TNM_NET = "clk_in"; 
TIMESPEC "TS_clk_in" = PERIOD "clk_in" 20 ns HIGH 50 %; 
 
NET "clk" TNM_NET = "clk"; 
 
TIMEGRP "rising_ffs" = RISING "clk"; 
TIMEGRP "falling_ffs" = FALLING "clk"; 
TIMESPEC "TS_pos_to_neg" = FROM "rising_ffs" TO "falling_ffs" 20 ns ; 
 
#Constraints on Instances 
INST "CLK_GEN/DCM1" CLKIN_PERIOD = 20  ; 
 
#Pin locations 
#All clock parameters 
NET "clk_in" LOC = "P8"; 
NET  "reset"     LOC="E11"; 
#dvi output 
NET "hsync_n" LOC = "B7"; 
NET "vsync_n" LOC = "D8"; 
NET "dvi_clkout"  LOC = "N8"  ; 
NET "red<0>"  LOC = "C8"  ; 
NET "red<1>"  LOC = "D6"  ; 
NET "red<2>"  LOC = "B1"  ; 
NET "green<0>"  LOC = "A8"  ; 
NET "green<1>"  LOC = "A5"  ; 
NET "green<2>"  LOC = "C3"  ; 
NET "blue<0>"  LOC = "C9"  ; 
NET "blue<1>"  LOC = "E7"  ; 
NET "blue<2>"  LOC = "D5"  ; 
 
 
#SDRAM 
NET "addr<0>"  LOC = "B5" ; 
NET "addr<10>"  LOC = "B6" ; 
NET "addr<11>"  LOC = "C5" ; 
NET "addr<12>"  LOC = "C6" ; 
NET "addr<1>"  LOC = "A4" ; 
NET "addr<2>"  LOC = "B4" ; 
NET "addr<3>"  LOC = "E6" ; 
NET "addr<4>"  LOC = "E3" ; 
NET "addr<5>"  LOC = "C1" ; 
NET "addr<6>"  LOC = "E4" ; 
NET "addr<7>"  LOC = "D3" ; 
NET "addr<8>"  LOC = "C2" ; 
NET "addr<9>"  LOC = "A3" ; 
NET "baddr<0>"  LOC = "A7" ; 
NET "baddr<1>"  LOC = "C7" ; 
NET "command<0>"  LOC = "B10" ; 
NET "command<1>"  LOC = "A10" ; 
NET "command<2>"  LOC = "A9" ; 
NET "sd_cke" LOC = "D7"; 
NET "sd_clkout"  LOC = "E10" ; 
NET "sd_cs_n" LOC = "B8" ; 
NET "sd_dqm<0>" LOC = "D9" ; 
NET "sd_dqm<1>" LOC = "C10" ; 
NET "data<0>"  LOC = "C15" ; 
NET "data<10>"  LOC = "C12" ; 
NET "data<11>"  LOC = "B14" ; 
NET "data<12>"  LOC = "D14" ; 
NET "data<13>"  LOC = "C16" ; 
NET "data<14>"  LOC = "F12" ; 
NET "data<15>"  LOC = "F13" ; 
NET "data<1>"  LOC = "D12" ; 
NET "data<2>"  LOC = "A14" ; 
NET "data<3>"  LOC = "B13" ; 
NET "data<4>"  LOC = "D11" ; 
NET "data<5>"  LOC = "A12" ; 
NET "data<6>"  LOC = "C11" ; 
NET "data<7>"  LOC = "D10" ; 
NET "data<8>"  LOC = "B11" ; 
NET "data<9>"  LOC = "B12" ; 
 
REFERENCES 
[1] Xess XSA – 3S1000, prototyping board with Spartan 3 device, datasheet, 
(See at: http://www.xess.com/manuals/xsa-3S-manual-v1_0.pdf). 
[2] Xilinx Spartan 3 Low cost FPGA, complete datasheet, (See at: 
http://direct.xilinx.com/bvdocs/publications/ds099.pdf). 
[3] SDR-SDRAM 256Mb E-die datasheet, Samsung Electronics, (See at: 
http://www.samsung.com/Products/Semiconductor/Sync_AsyncDRAM/SDRSDRAM/Module/RegisteredDIMM/M390S6450ETU/ds_sdr_rdimm_based_on_256mb_e_die_rev16.pdf). 
[4] SDR-SDRAM device operations datasheet, Samsung Electronics, (See at: 
http://www.samsung.com/Products/Semiconductor/Sync_AsyncDRAM/download/sdr_device_operation_jul_06.pdf). 
[5] SDR-SDRAM timing diagrams, Samsung Electronics, (See at: 
http://www.samsung.com/Products/Semiconductor/Sync_AsyncDRAM/download/sdr_timing_diagram_feb_04.pdf). 
[6] Gordon Stoll, Matthew Eldridge, Dan Patterson, Art Webb, Steven 
Berman, Richard Levy, Chris Caywood, Milton Taveira, Stephen Hunt, 
and Pat Hanrahan, “Lightning 2 – A high performance display subsystem 
for PC clusters”, Proceedings of the 28th annual conference on Computer 
graphics and interactive techniques, p.141-148, Los Angeles, California, 
SIGGRAPH 2001. 
[7] William Blank, Chandrajit Bajaj, Donald Fussell, and Xiaoyu Zhang, “The 
MetaBuffer: A Scalable Multiresolution Multidisplay 3-D Graphics 
System Using Commodity Rendering Engines”, Technical Report 
TR2000-16, Department of Computer Science, University of Texas at 
Austin, 2000. 
[8] Steven Molnar, John Eyles, John Poulton, “PixelFlow: high-speed 
rendering using image composition”, Proceedings of the 19th annual 
conference on Computer graphics and interactive techniques, p.231-240, 
Chicago, Illinois, July 1992. 
[9] Ruigang Yang, Aditi Majumder, and Michael Brown, “Camera Based 
Calibration Techniques for Seamless Multi-Projector Displays”, IEEE 
Transactions on Visualization and Computer Graphics, Vol. 11, No. 2, 
March-April, 2005. 
[10] Ruigang Yang, Shunnan Chen, Xinyu Huang, Sifang Li, Liang Wang, 
and Chris Jaynes, “Towards the Light Field Display”, VR 2005 workshop 
on Emerging Display Technologies, Bonn, Germany.  
[11] Xilinx Application note on Synchronous FIFO, XAPP256, (See at: 
http://direct.xilinx.com/bvdocs/appnotes/xapp258.pdf). 
[12] Xilinx Application note on using block ram for Virtex – 2 device, 
XAPP258, (See at: 
http://direct.xilinx.com/bvdocs/appnotes/xapp258.pdf).  
[13] VGA timing specifications, (See at: 
http://www.epanorama.net/documents/pc/vga_timing.html).  
[14] Alpha blending tutorial, Wikipedia, (See at: 
http://en.wikipedia.org/wiki/Alpha_transparency). 
[15] Jason Stewart. Memory Controller design, (See at: 
http://www.cs.unc.edu/~stewart/comp290-ghw/sdram.html). 
[16] Thomas Funkhouser, Kai Li, “Large-Format displays”, IEEE Computer 
Graphics and Applications, July/August 2000, p.20-21, (See at: 
www.cs.princeton.edu/~funk/cgaintro.pdf). 
[17] Introduction to Virtual Environments for beginners from the University 
of Michigan (See at: http://www-vrl.umich.edu/intro/index.html). 
[18] Wikipedia tutorial on Digital Compositing (See at: 
http://en.wikipedia.org/wiki/Digital_compositing). 
[19] Modelsim Datasheet (See at: 
http://www.model.com/products/pdf/datasheets/se.pdf). 
[20] Michael D. Ciletti, “Starter’s Guide to Verilog 2001”, Sept 2003, 
Published by Prentice Hall. 
 
VITA 
Subhasri Krishnan was born on June 11th, 1983 in Vellore, Tamil Nadu, India. She 
received her Bachelor of Engineering degree with distinction from the Electronics and 
Communications Department of the University of Madras in May 2004. She received a 
scholarship during her sophomore year at SSN College of Engineering. She joined the 
University of Kentucky in Fall 2004. 
 
