FPIX is a pixel architecture designed for colliding-beam experiments at the Tevatron. Its most important application to date is the BTeV experiment. PreFPIX2 is a chip designed to test the FPIX Core, i.e. the pixel control and readout architecture. This FPIX Core will be mated to a Periphery specific to a particular experiment.
I. INTRODUCTION
The FPIX architecture has been under development at Fermilab for the last three years. At present, the driving force behind this work is the BTeV experiment [1], though significant effort has been made in the design process to broaden the applicability of this architecture to fit any Tevatron experiment. In BTeV, the pixel detector will be employed for on-line track finding for the lowest level trigger system [2], and therefore the pixel chips are required to read out all hit data from every crossing [3]. Given the expected distance from the beam (6 mm for the nearest chips) and the expected luminosity (2×10^32 cm^-2 s^-1), this means that the chip closest to the beam line will be expected to read out an average of 1.25 pixels per beam crossing, with statistical fluctuations often much higher.
PreFPIX2 is a developmental step in the evolution of the final FPIX architecture [4]. After the completion of FPIX1, it became apparent that certain functions, such as pixel control and readout, were independent of the requirements of any given experiment, but that other functions, like data packing and communication with data acquisition hardware, changed for every experiment. Those functions independent of experimental requirements were organized into the FPIX Core.
* This work is sponsored by the U.S. Department of Energy under contract No.
Experimentally dependent requirements were organized into the Periphery. PreFPIX2 represents the completion of the FPIX Core architecture. The FPIX Periphery specific to the BTeV experiment is under development.
The chip was developed in a 0.25 µm process, using radiation tolerant design techniques such as enclosed transistors [5]. Recent tests show that this will enable FPIX to achieve the desired radiation tolerance (25-30 Mrad) without resorting to a rad-hard process [6]. PreFPIX2 has been developed to test a number of algorithmic and electrical modifications to the original read-out control system developed in FPIX1 [4]. Most notably, the programmable ability to operate in either an externally triggered or a self-triggered mode has been eliminated. The FPIX Core itself is now purely self-triggered.
In Section II, the FPIX Core architecture will be described. Section III will cover major developments to the FPIX architecture. Finally, Section IV will discuss the detailed simulation technique employed in preFPIX2B development and the results obtained thus far.
II. FPIX ARCHITECTURE

A. Core versus Periphery
The basic job of the FPIX Core is to convert the various pixel hits into a predictable data stream (see Figure 1). Whenever coreTalking is active, the Core will output a new data word (coreData) with every rising edge of the Readout clock. Moreover, coreData will be stable by the falling edge of the Readout clock. When coreTalking is inactive, coreData is zero.
CoreData itself is a 24-bit wide word that consists of three bits for hit magnitude from a 3-bit FADC located in each Pixel Cell, five bits for column location, eight bits for row location, and eight bits for time stamp. The Periphery communicates with the Core by providing the clocks and two control signals, RejectHits and SendData. When RejectHits is active, the Core is instructed to stop accepting new hits. Pixels already hit can still be read out. When SendData is active, the Periphery is telling the Core that it is ready to accept data. When SendData is inactive, the Core will send no more data, but it can continue to accept hits.
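The coreData word layout can be illustrated with a short Python sketch. The field widths (3 + 5 + 8 + 8 = 24 bits) come from the text; the ordering of the fields within the word and the function names are assumptions made only for illustration, since the text does not specify them.

```python
# Field widths are taken from the text; the bit ordering is an assumption.
MAG_BITS, COL_BITS, ROW_BITS, BCO_BITS = 3, 5, 8, 8


def pack_core_data(magnitude, column, row, bco):
    """Pack one hit into a 24-bit word (assumed order: magnitude, column, row, BCO)."""
    fields = (
        ("magnitude", magnitude, MAG_BITS),
        ("column", column, COL_BITS),
        ("row", row, ROW_BITS),
        ("bco", bco, BCO_BITS),
    )
    word = 0
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
        word = (word << bits) | value
    return word


def unpack_core_data(word):
    """Split a 24-bit word back into (magnitude, column, row, bco)."""
    bco = word & ((1 << BCO_BITS) - 1)
    word >>= BCO_BITS
    row = word & ((1 << ROW_BITS) - 1)
    word >>= ROW_BITS
    column = word & ((1 << COL_BITS) - 1)
    word >>= COL_BITS
    magnitude = word & ((1 << MAG_BITS) - 1)
    return magnitude, column, row, bco
```

Under this assumed layout, an all-zero word corresponds to the idle value that coreData takes when coreTalking is inactive.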
Using the structure in Figure 2, it is possible to imagine a wide range of Periphery cells customized to an experiment's needs.
B. Core Organization
The FPIX Core is a column-based architecture that uses an indirect addressing scheme to associate pixel hits with a time stamp. It is best understood as consisting of three mutually dependent functional blocks as shown in Figure 3 . These three blocks are the Core Logic, the End-of-column Logic and the Pixel Cell.
The Core Logic understands time. At the rising edge of the beam clock, it stamps every time slice with an eight-bit beam crossing (BCO) number. The End-of-column Logic blocks are considerably more complicated. They also understand time in that, whenever there is a hit, they store the BCO numbers broadcast by the Core Logic. They also understand hits, which are driven to them from the Pixel Cells via the HFastOR signal, a distributed OR-gate with a pull-down transistor in every Pixel Cell in the column. The End-of-column Logic blocks also understand the existence of the Pixel Cells because they communicate with those pixels through a series of commands and tokens. Finally, the End-of-column Logic blocks understand output. When the Core is in the Talking state, a particular End-of-column Logic block has the Horizontal Token, and that block has hit pixels to output, then it outputs those pixels.
The Pixel Cells themselves know nothing of time. They only understand hits and commands from their End-of-column Logic block. These commands are Idle (do nothing), Listen (listen for new hits), Reset (reset your contents), and Output (output your contents). There are four sets of such commands coming from the End-of-column Logic block. If a pixel is Empty and it receives a hit, it associates itself with whichever command set is issuing the Listen command. From that point until the Pixel Cell is reset, it obeys only the commands from the associated command set. The Pixel Cells also communicate back to the End-of-column Logic block via the fast ORs. The HFastOR communicates hits in response to a Listen command, and the RFastOR communicates the presence of Pixel Cells as yet unread in response to an Output command. Rights to the column output bus are arbitrated by a Column Token issued by the End-of-column Logic block.
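The command-set association described above can be sketched as a small behavioral model in Python. This is only an illustration: the class and method names are hypothetical, the Column Token arbitration is left out, and the choice to drop a hit that arrives when no command set is issuing Listen is an assumption.

```python
from enum import Enum


class Command(Enum):
    IDLE = 0
    LISTEN = 1
    RESET = 2
    OUTPUT = 3


class PixelCell:
    """Behavioral sketch of one Pixel Cell (hypothetical model, not the chip logic)."""

    def __init__(self):
        # Index of the associated command set, or None while the cell is Empty.
        self.owner = None

    def step(self, commands, hit):
        """Apply one set of four commands; return True if the cell asks to be read."""
        if self.owner is None:
            if hit:
                # Associate with whichever command set is issuing Listen.
                for index, command in enumerate(commands):
                    if command is Command.LISTEN:
                        self.owner = index
                        break
                # Assumption: with no listening set, the hit is simply lost.
            return False
        command = commands[self.owner]  # obey only the associated set
        if command is Command.RESET:
            self.owner = None
            return False
        return command is Command.OUTPUT
```

A cell that latched a hit under command set 1 ignores an Output command on set 0 and responds only once set 1 issues Output, which mirrors the indirect addressing scheme: the cell never stores a time stamp itself.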
The original FPIX architecture included the ability to switch between an externally triggered and a self-triggered mode. In the externally triggered mode, an external source provided the chip with a BCO number, which was compared to the BCO numbers latched in the End-of-column Logic blocks. In the self-triggered mode, a second BCO counter broadcast requested BCO numbers. If there was a match with any stored BCO number in any End-of-column Logic block, then the counter would be stopped and all hit pixels associated with that BCO number would be read out. This constant need to compare requested BCO numbers to stored BCO numbers reduced the efficiency of the readout scheme. In preFPIX2, there are no such BCO comparisons. Instead, if any End-of-column Logic block has any data to output, it immediately alerts the Core Logic, which then switches to the Talking state. Unlike the original output scheme, this method does not guarantee that hit pixels will be output in time stamp order. However, the new output scheme dramatically improves readout efficiency.
III. DEVELOPMENTS IN PREFPIX2
A. End-of-column Logic
The original FPIX architecture utilized four Command State Machines, one for each command set. The state machines were simple, with only two states, Empty and Full. However, since the state machines made their transitions at the rising edge of the BCO clock, great pains were necessary to ensure that information synchronous to the Readout Clock, such as the completion of an output, arrived at these state machines with enough setup and hold time. Moreover, the original architecture required a priority encoder state machine to determine which command set would be the next to issue the Listen command. This required a substantial amount of room and created some timing problems of its own.
In the new FPIX Core, these problems are solved as shown in Figure 4.
Since it is possible for more than one Command State Machine to be in the Full state at the same time, a second priority encoder is required. This Output Priority Encoder is necessarily more complicated than the Hit Priority Encoder. For example, if the DAQ system can read a chip faster than hits are input to it, then low priority Command State Machines may never enter the Listen state, and this would have no effect on our efficiency. High priority Command State Machines would do all the work. However, if a state machine enters the Full state, then it must reach the Output state as quickly as possible or that data will be lost. In other words, all machines in the Full state must have equal priority while, at the same time, something must distinguish them so that a choice can be made. Finally, to minimize the transistor count and to maintain the isolation of the Readout and BCO clocks, the Output Priority Encoder must also be purely combinatorial. The solution is to rely on the states of the Command State Machines and to use a circular scheme, as shown in the accompanying figure.
The Column State Machine starts in the "Nothing to Say" or Nothing state, where it remains until it sees an Output (Read) command issued by any of the Command State Machines. At the next rising edge of the Read Clock, the Column State Machine makes the transition to the "Something to Say" or Something state. At this point, the Core Logic is alerted to the fact that there is data to output. Simultaneously, the stored BCO number (which associates the hit pixels with the time they were hit) is driven onto the bus. The last pixel is being read out when the RFastOR goes away. This signals the completion of the read cycle. At the next rising edge of the Read Clock, the Column State Machine makes the transition to the Silent state.
This sets the Output Done signal informing the Command State Machine that it can make its own transition from the Output to the Empty State. The Output Done signal is reset when the Command State Machine reaches the Empty state. When the entire array has been read out, the Horizontal Token drops out of the last End-of-column Logic Block, and the Core Logic makes its transition from Talking to Silent. This signals to all the Column State Machines that they can make their own transition back to the Nothing State.
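The Column State Machine transitions described above can be summarized in a simplified Python sketch. The function and argument names are hypothetical, each call stands for one rising edge of the Read Clock, and the handshake details (Output Done, token passing) are deliberately omitted, so this is an illustration of the state sequence rather than the actual logic.

```python
from enum import Enum


class ColumnState(Enum):
    NOTHING = "nothing to say"
    SOMETHING = "something to say"
    SILENT = "silent"


def next_column_state(state, output_command, rfast_or, core_talking):
    """Return the state after one rising edge of the Read Clock (simplified)."""
    if state is ColumnState.NOTHING:
        # Wait for any Command State Machine to issue Output (Read).
        return ColumnState.SOMETHING if output_command else ColumnState.NOTHING
    if state is ColumnState.SOMETHING:
        # The read cycle is complete once the RFastOR has gone away.
        return ColumnState.SOMETHING if rfast_or else ColumnState.SILENT
    # SILENT: wait until the Core Logic has left the Talking state.
    return ColumnState.SILENT if core_talking else ColumnState.NOTHING
```

Stepping this function through a read cycle gives the sequence Nothing, Something, Silent, Nothing, with the final transition held off until the whole array has been read out and the Core Logic has gone Silent.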
B. Token Passing Logic
Experience with earlier versions of FPIX has revealed that there are two limiting factors in readout speed related to the Column Token.
First, speed is limited by how fast the token can be passed from one hit pixel to another. Once a hit pixel has been ordered to Output, it waits for the Column Token, grabs it, and then drives its data onto the bus at the rising edge of the Read Clock. Simultaneously, it releases the token to the next hit pixel in the column. Under worst case conditions, that token must travel almost the entire length of the column before it reaches the next hit pixel, and it must do this before the next rising edge of the Read Clock. The rate at which the token can pass through empty pixels is called the skip frequency. Therefore, the maximum readout speed is approximately the skip frequency divided by the number of pixels in a column.
Second, speed is limited by how long it takes an entire column to restore itself after a read has been completed. This determines how rapidly successive read cycles can be made.
The token passing architecture shown in Figure 6 has been optimized in preFPIX2 to permit skip frequencies between 7 and 8 GHz, yielding readout clock frequencies in excess of 40 MHz. It is also resettable, allowing for maximum speed in successive read cycles.
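As a rough check of these numbers, the worst case can be worked through in a few lines of Python. The sketch assumes that the token must skip nearly a full column within one Read Clock period, so the readout frequency is bounded by the skip frequency divided by the column length. The column length of 160 pixels is an assumption (the text gives only the total of 2880 front-ends), as are the function and constant names.

```python
def max_readout_frequency_hz(skip_frequency_hz, pixels_per_column):
    """Worst-case bound: the token crosses a whole column in one Read Clock period."""
    if pixels_per_column <= 0:
        raise ValueError("pixels_per_column must be positive")
    return skip_frequency_hz / pixels_per_column


# Assumption: 160 pixels per column; this figure is not stated in the text.
ASSUMED_PIXELS_PER_COLUMN = 160

for skip_hz in (7e9, 8e9):
    bound_mhz = max_readout_frequency_hz(skip_hz, ASSUMED_PIXELS_PER_COLUMN) / 1e6
    print(f"skip {skip_hz / 1e9:.0f} GHz -> readout bound {bound_mhz:.2f} MHz")
```

With the assumed column length, skip frequencies of 7 and 8 GHz give bounds of 43.75 MHz and 50 MHz, which is consistent with the quoted readout clock frequencies in excess of 40 MHz.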
IV. MONTE CARLO-VERILOG SIMULATION
A unique and very comprehensive method of design and simulation was used on the FPIX Core. First, individual digital subcircuits including nand gates, nor gates, inverters, CMOS transmission gates, SR-flip-flops and D-flip-flops were simulated using SPICE to determine their best, worst and typical propagation delays. Next, all critical drivers such as Command Drivers and Address Drivers were similarly evaluated under their expected loads. All of these delays were transferred into the Verilog hardware description language. Then the readout architecture and control system were constructed in a bottom-up fashion from those basic digital components. No behavioral modeling was permitted in the FPIX Core, and great attention was given to ensuring that gates were not excessively loaded with capacitance. The net result was a Verilog model of the FPIX Core accurate to the gate level and, in many places, the transistor level. It modeled the entire data path from the output of each of the 2880 analog front-ends to the pads of the chip.
This procedure yielded a number of benefits. First, through software, purely structural Verilog code can be converted into schematics. Therefore, layout-versus-schematic (LVS) comparisons became, in effect, layout-versus-Verilog comparisons. Second, this design procedure ultimately produced a Verilog model of the Core that was extremely accurate with respect to timing. SPICE simulations were performed regularly at higher and higher levels of the hierarchy to ensure this.
Next, the analog front end of each pixel was modeled behaviorally. The model describes both the time walk and the dead time of the analog front end. Moreover, the model includes the way time walk and dead time change as a function of charge magnitude. The information necessary for this model was determined experimentally from prototype versions of the front end.
Monte Carlo analysis of 5000 beam crossover periods in the interaction chamber was done using MCFAST, making geometric cuts around the region that would be occupied by the hottest chip. This produced a list of hit pixels with their associated charge magnitudes in each of the 5000 time slices. Different analyses were made assuming the expected luminosity of the beam, half that luminosity and twice that luminosity. The results of these analyses were converted into Verilog and used as input to the FPIX Core model.
Finally, a rudimentary DAQ system was modeled to capture the output of the FPIX Core. This output was reconstructed into hit pixels, their hit magnitude and time stamp. The input and output lists were compared and additional lists were made of matches, missing members of the input list, and extra members of the output list (see Table 1).
V. CONCLUSIONS
The FPIX Core architecture has been completed. Substantial improvements were made to its architecture, resulting in readout efficiencies greater than 99.6%. Rigid adherence to bottom-up design techniques, with great attention paid at the start of the project to propagation delays in low-level digital cells, resulted in a Verilog model of the architecture that was accurate at the gate level to the final design. Therefore, the Verilog model could be compared to the final layout using standard CAD software. The model's timing was also very accurate even at the highest levels. Monte Carlo simulations of the interaction region performed by the physicists on the BTeV project were used as inputs to the model of the FPIX Core. This enabled the designers to exhaustively test the design. It also permitted the chip designers to present to the system designers a description of the expected data stream coming from the chip.
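The list comparison used in Section IV to evaluate the simulation output can be sketched in Python. The hit representation (a tuple of column, row, time stamp and magnitude), the function name, and the definition of efficiency as the matched fraction of injected hits are assumptions made for illustration; the sample data in the usage note are made up and are not simulation results.

```python
from collections import Counter


def compare_hit_lists(injected, reconstructed):
    """Return (matches, missing, extra, efficiency) for two lists of hit tuples."""
    injected_counts = Counter(injected)
    reconstructed_counts = Counter(reconstructed)
    matches = injected_counts & reconstructed_counts  # hits present in both lists
    missing = injected_counts - reconstructed_counts  # injected but never read out
    extra = reconstructed_counts - injected_counts    # read out but never injected
    total_injected = sum(injected_counts.values())
    # Assumed definition: efficiency is the matched fraction of injected hits.
    efficiency = sum(matches.values()) / total_injected if total_injected else None
    return matches, missing, extra, efficiency
```

For example, with three injected hits of which two reappear in the output along with one spurious word, the function reports two matches, one missing hit, one extra hit, and an efficiency of 2/3.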
VI. ACKNOWLEDGMENTS
