



CERN/DRDC 93-22 RD12/Status Report 5 May 1993

# Status Report on the RD-12 Project

J. Altaber, S. Cittolin<sup>\*</sup>, M. Demoulin, A. Fucci, W. Jank, B. Lofstedt, H. Muller, J.-P. Porte, A. Racz, D. Samyn, B.G. Taylor
CERN, Geneva, Switzerland

S. Inkinen, E. Pietarinen
Research Institute for HEP, SEFT, Helsinki, Finland

K. Kaski, K. Melkas, J. Niittylahti, H. Raittinen
Tampere University of Technology, Finland

X. Ricadonna, J. Colas, J. Lecoq, M. Moynot, J.M. Thenard
Laboratoire de Physique des Particules, LAPP, Annecy

S. Centro, F. Dal Corso, R. Martinelli, D. Pascoli, A.J. da Ponte Sancho
Dipartimento di Fisica dell' Universita' e Sezione INFN di Padova, Italy

G. Goggi
Dipartimento di Fisica Nucleare e Teorica dell' Universita' e Sezione INFN di Pavia, Italy

Ph. Busson, G. Fouque, D. Lecouturier, P. Matricon
Ecole Polytechnique, Paris, France

\* Spokesman

## 1. Introduction

During the period since RD-12 started in May 1991 we have developed and installed the necessary hardware and software tools for several personal computer-based laboratory test benches. All the modules foreseen in the initial project proposal have been prototyped and tested, and test benches have been created and exploited for the evaluation of front-end electronics timing and control distribution, ADCs, digital pipelines, high-speed adders and multipliers, and new VLSI developments carried out in collaboration with the CERN microelectronics group and external institutions.

The work completed in 1992-93 and currently in progress in RD-12 includes the development of basic tools and the implementation of test benches as follows:

#### Tools developments

- Timing and Control components evaluation
- Programmable-phase clock generator production
- Fast Dual-Port Memory and VME board production
- VLSI, CMOS and GaAs developments (Padova)

- Evaluation of real time simulation tools
- Digital signal processing algorithms and methods

#### Completed test benches

- Timing and Control distribution (RD-27)
- Memory Address Generator MAG VLSI test (Padova)
- HDTV Digital pipeline and VLSI study (LAPP)
- Readout network simulation (RD-24, Nebulas RD-31)
- ADC test bench and DSP algorithms (RD-16)

#### Test benches in preparation

- Boolean Neural Nets EPLD implementation
- ITT-DAVIS evaluation board (RD-11)
- Fiber Channel OLC link
- MicroDAQ-MEC3 and Frontend Bus (ECP-MIC)
- Event builder test (RD-24, Nebulas RD-31)
- 1.5 Gb/s transceivers (Rome INFN)
- L-Neuro 2.0 processor (Ecole Polytechnique, Philips)

During the coming year it is proposed to continue this work in collaboration with other projects and institutions, completing the current tests and pursuing new milestones as follows:

- Timing and control single-crate subsystem prototype (RD-27)
- Timing and control receiver ASIC specification
- Timing and control receiver ASIC design study (Padova)
- MAG evolution for sparse readout controller (Padova)
- CMOS and GaAs VLSI developments (Padova)
- LabVIEW 3 multiplatform (Mac, Sun and PC) portable libraries
- System simulation
  - VHDL and graphics compilers evaluation
  - Experiment readout system model in VHDL
- Digital signal processing
  - VHDL model, EPLD implementations (RD-16)
  - ITT and Philips processors
- Simulation and design of 16-bit digital filters with CNET (LAPP)
- Nonlinear ADC with auto-calibration test bench (LAPP)
- Boolean Neural Net pattern recognition EPLD and Anna chip
- ITT evaluation board for digital filter and data formatter applications
- OLC fiber channel link
- Philips L-Neuro 2.0 processor specification (Ecole Polytechnique)
- MEC3 MicroDAQ system test bench (CERN ECP/MIC)
- Frontend bus studies and multiMEC3 implementation
- Event builder evaluation based on SCI and ATM (RD-24 and RD-31)

(Milestones for 1993-94 are indicated by •)

## 2. Timing and Control

The timing and control test bench which was set up in the first phase of the project has been used to pursue the development of a system for the simultaneous distribution of timing reference, level-1 trigger decision, calibration and control signals from a few sources to be located near the central trigger processors to large numbers of front-end electronics destinations on the LHC detectors. A small series of programmable-phase clock generators has also been made for use in other test benches.

Biconic-taper tree couplers have been evaluated which would allow the transmission of the signals over an entirely passive all-glass optical distribution network. Their price has already decreased by one half since the start of the project, reinforcing our optimism that the components required for this approach will become affordable in quantity during the LHC timescale.

A primary VCXO/PLL has been developed which generates 160.32 MHz or 267.2 MHz from the LHC clock with very low phase jitter and offset drift. It employs a linear phase detector and high-gain active loop filter to achieve an rms phase tracking error of 5.6 ps for an input frequency deviation of  $\pm$ 75 ppm. Different encoding schemes have been evaluated to explore the tradeoff between timing precision and data rate for efficiencies between 25% and 80%. The 267.2 MBaud encoder can transmit PRBS data with total (systematic plus random) output jitter of less than 9 ps rms.

The current prototype implements a scheme in which two data channels are time-division multiplexed and biphase-mark encoded. One channel transmits the broadcast trigger-accept (one bit per bunch-crossing), while the other is used to send dataless broadcast commands such as bunch-counter reset and also addressed, formatted data to individual front-end destinations. An important use of the second channel is the transmission of the deskew parameters and virtual pipeline address offsets by which the system will be synchronized.
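The two-channel scheme described above can be illustrated with a minimal sketch. It assumes simple bit-by-bit interleaving of the two channels and the textbook definition of biphase mark coding (a transition at the start of every bit cell, plus a mid-cell transition for a 1). The function names and the interleaving order are illustrative assumptions, not details of the RD-12 encoder.

```python
def tdm_interleave(chan_a, chan_b):
    """Time-division multiplex two equal-length bit streams as A0, B0, A1, B1, ..."""
    if len(chan_a) != len(chan_b):
        raise ValueError("both channels must carry the same number of bits")
    out = []
    for a, b in zip(chan_a, chan_b):
        out.extend((a, b))
    return out


def biphase_mark_encode(bits, level=0):
    """Encode bits as two half-cell line levels per bit (biphase mark)."""
    halves = []
    for bit in bits:
        level ^= 1          # transition at the start of every bit cell
        halves.append(level)
        if bit:
            level ^= 1      # extra mid-cell transition encodes a 1
        halves.append(level)
    return halves


def biphase_mark_decode(halves):
    """A bit is 1 when the two halves of its cell differ, else 0."""
    return [int(halves[i] != halves[i + 1]) for i in range(0, len(halves), 2)]


# Hypothetical example data: channel A as trigger-accept, channel B as commands.
trigger = [1, 0, 0, 1]
command = [0, 1, 1, 0]
line = biphase_mark_encode(tdm_interleave(trigger, command))
```

Because every cell boundary carries a transition regardless of the data, the receiver can recover a timing reference from the encoded stream, which is the property exploited by the clock recovery techniques discussed later in this section.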

A variety of laser diode sources has been studied, covering a range of optical power output (up to 0.2 W), modulation characteristics, sensitivity to optical feedback noise, efficiency, wavelength (830 nm and 1315 nm) and price. Modulators have been made which can supply over 1 W of broadband RF. The higher power diodes operate in multiple longitudinal modes and exhibit mode-hopping, chirping and turn-on delay under modulation, but sources (e.g. Mitsubishi ML7781A) have been identified which can transmit the signals with adequate precision to groups of over 1000 destinations each. This evaluation will continue as new laser devices are introduced.

The length of optical fibre in each distribution path is not expected to exceed 100 m, and measurements have been made of the dispersion in graded-index multimode fibres over this distance at different wavelengths. The range is from 11 ps rms in 50/125 $\mu$m fibre at 1322 nm to 31 ps in 100/140 $\mu$m fibre at 845 nm. At present, 50 $\mu$m multimode fibre is being considered for the distribution backbone outside the detectors and pure silica core monomode fibre for feeders entering regions of higher radiation level.

For fibre to be deployed to large numbers of front-end electronics units, it will be essential that inexpensive optoelectronic receivers can be used. Devices based on silicon photodiodes and pinfets, InGaAs diodes and a germanium avalanche photodiode have been evaluated. The bandgap of InGaAsP varies with the composition of the material and some devices specified for 1300 nm have been found to outperform Si at 830 nm.

Following the change in the planned LHC bunch-crossing interval from 15 ns to 25 ns, attention moved from components optimised for fibre channel FC-0 use at 265.625 MBaud to those manufactured for Sonet OC-3 (CCITT SDH STM-1) use at 155.52 MBaud. The telecommunications application is expected to generate the larger volume, and pre-production samples of new low-cost plastic pinfets are under test. They require a postamplifier for which the GaAs devices being developed by Padova may prove appropriate.

The initial test bench incorporated components and instruments of varied origin and history using 905 and 906 SMA, SC, ST and FC/PC optical connectors with plastic, ceramic and stainless steel ferrules. As a result of our experience with these types and in accord with our emerging preference for Sonet telecommunications parts, we have decided to standardise on ST/PC connectors which exist in both MMF and SMF versions and are reliable, convenient and multisourced. High-performance versions are available for laser interfacing to reduce modal noise, while lower-cost versions can be used for coupling to the receivers where alignment tolerances are less stringent.

A number of techniques are available for the recovery of the timing reference from the encoded signal. A transition-tracking clock synchronizer implemented with gate delays was developed for use in the test bench prototype as it has stable absolute phase shift and a sharp threshold useful for testing. Clock extractors based on SAW filters or other resonators, which are common in Sonet applications, may be more appropriate for production receivers. These devices would have to be custom-made for our special frequency and discussions with manufacturers are in progress. With the two-channel encoding described, there is a fundamental ambiguity in the phase of the recovered clock. A technique has been developed which resolves the ambiguity by monitoring formatting constraints imposed on the data stream.

Once all the prototype components of the distribution system had been completed, measurements could be made of the overall performance from the LHC clock input to the received timing reference output while transmitting random data. For the two-channel system transmitting PRBS data at 267.2 MBaud, with transmission at 1315 nm over 100 m of multimode fibre and through two levels of 1:32 tree coupler (1024 destinations), an overall rms jitter performance of 35 to 70 ps has been obtained, according to the encoding system and optoelectronic receivers used.

This result is compatible with the estimated requirement of a few hundred ps rms provisionally specified by detector electronics developers. The rms collision length for proton bunches in the LHC is expected to be 0.053 m, corresponding to an initial spread of about 180 ps relative to the LHC clock.
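The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below converts the rms collision length into a time spread, computes the fan-out of a two-level 1:32 coupler tree, and combines the collision spread with the worst measured jitter in quadrature. The quadrature sum assumes the two contributions are independent, which is an assumption made here for illustration and not a statement from the measurements.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def length_to_time_ps(length_m):
    """Time taken by light to travel length_m, in picoseconds."""
    return length_m / C_LIGHT * 1e12


def tree_fanout(split_ratio, levels):
    """Number of destinations served by a passive tree of 1:split_ratio couplers."""
    return split_ratio ** levels


def quadrature_sum(*terms):
    """Combine independent rms contributions."""
    return math.sqrt(sum(t * t for t in terms))


collision_spread_ps = length_to_time_ps(0.053)         # rms collision length 0.053 m
destinations = tree_fanout(32, 2)                      # two levels of 1:32 coupler
combined_ps = quadrature_sum(collision_spread_ps, 70)  # worst measured jitter, 70 ps
```

Under these assumptions the distribution jitter adds only modestly to the spread already set by the bunch length.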

The present timing and control test bench occupies four Eurochassis, plus external laser diodes, coolers, amplifiers and controllers. In the following phase of this work, it is proposed to build a single-crate subsystem prototype implementing the most promising techniques so far developed with the test bench. Parallel developments will continue, in particular with new laser sources (RD-27), isolators, encoders, modulators and cheap optoelectronic receivers, and the modular nature of the prototype crate will allow relatively smooth upgrading as these technologies evolve.

Additional logic prototyping of the timing receiver will be carried out and the specifications for a general-purpose timing receiver ASIC defined in conjunction with Padova (RD-12). Collaboration with the LPNHE (Univ. Paris VI & VII) members of RD-16 will be pursued to allow the development of a compatible service module ASIC for FERMI.

In conjunction with other studies in RD-27, the interface with the central trigger processor will be defined and the integration of the system in the overall first-level trigger architecture studied. Initial timing system simulation trials based on Excel have proved promising and this activity will be pursued using the more specialised simulation tools Extend and Foresight.

## 3. FDPM and VME boards for test support

The Fast Dual Port Memory (FDPM) design is a basic component of the test benches operating at high speed. The family of boards comprises:

- the complete FDPM, intended as a data pattern generator or as a data recorder
- the VME-FE, which implements only the VME protocol and contains a user area to accommodate chips to be tested
- the FDUB, which contains the VME front end as well as the high speed protocol.

Several boards have been produced and used both by in-house designers and by other projects and laboratories. ADCs, interfaces and optical links have already been tested using the FDPM. New users are being registered for training in the use of the boards.

The FDPM has found a new field of application in connection with a neural net VLSI chip. Philips is developing a digital neural net which will have 128 data lines as input. We are exploring the possibility of expanding the 32-bit FDPM to a 64-bit or 128-bit parallel path. Since the MAG is now available, we have implemented a prototype FDPM using that chip, and a PCB may be made at a later stage.

Our focus in the coming months will be on the front-end bus, with the involvement of the ITT chip and a close study of possible protocols. We also plan to participate in the collaboration on the Philips chip, since it could be used in trigger areas where more parallel processing is required.

#### 3.1 Front-end bus study

In the front-end bus (FE-BUS) study we intend to extend the family of VME boards for test benches. The overall aim is to understand the parameters involved in a design for data collection from detectors, eventually simulate the behaviour, build the relevant prototypes, design a suitable protocol and test the correctness of the assumptions in a mini data acquisition environment. At the end the possible VLSI implementation of the critical circuits will be defined. As data source we have decided to follow the MEC3 implementation. Since the chip is not yet available, we have simulated the data protocol part using our favourite EPLD (ALTERA) and we have implemented it using a VME-FE.

As data collector/distributor we are currently evaluating two parallel paths: one involving the use of the MAG (chip developed by INFN/Padova for the FDPM), the other using the DAVIS chip developed by ITT. A few samples of the MAG chip have been produced and it has been tested at CERN using the FDPM. The MAG can control RAM memory and execute multi-DMA functions. The DAVIS chip by ITT contains a programmable processor, a part of an efficient data link mechanism, allowing real-time protocol handling and memory control. We are studying the possibility of using such an implementation as a protocol engine for the front-end bus. Since the chip is still not available we have implemented its simulation in ALTERA in order to test the test bench chain of hardware interface and software drivers.

For the front-end protocol we are following a large spectrum of industrial and commercial data protocols: from the SCI subsystem CMOS implementation to the specialized chips intended for use in cars or Hi-Fi/TV data and control communication. We are seeking the right performance/price mix.

#### 3.2 EPLDs study and expertise

Intended as a cheap VLSI implementation of prototypes, EPLDs are mentioned here because they represent an important investment in money and manpower. The new generation reaches the respectable size of 10,000 equivalent gates and speeds of the order of 90 MHz, making them attractive for large combinatory projects. Simulation is available and reprogramming is possible tens of times. We are studying the combination of Boolean neural net algorithms for network generation followed by EPLD implementation to control the complete chain of production of specialised circuits for cluster detectors. We are following the study for a possible trigger device for the muon detector for CMS.

#### 3.3 Optical link for data transmission

The development of an optical link for data transmission was initiated last December. The project aims to implement a full-duplex data link between two VME crates located up to 2 km from each other using IBM's fiber-channel compatible OLC 266 evaluation kits. The transfer speed is 26.56 MBytes/sec and the pre-encoded data (8B/10B encoding scheme) are fed to the optical transmitter from an FDPM.
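The quoted transfer speed follows from the line rate and the coding overhead. The short sketch below assumes the OLC 266 link runs at the 265.625 MBaud fibre channel rate mentioned in Section 2 and that 8B/10B coding carries 8 data bits in every 10 line bits; the function name is illustrative.

```python
def payload_rate_mbytes(line_rate_mbaud, data_bits=8, code_bits=10):
    """Payload throughput in MBytes/s for a block-coded serial link."""
    payload_mbits = line_rate_mbaud * data_bits / code_bits
    return payload_mbits / 8  # 8 bits per byte


link_rate = payload_rate_mbytes(265.625)
```

The result reproduces the 26.56 MBytes/sec figure given above.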

A very simple protocol based on a data-producer/data-consumer scheme is running to control the data flow: when the DTR bit is set, the data producer sends a DTR word; if the RTS bit is set, the data consumer answers with an RTS word; then the data producer emits a data stream starting with a START word and ending with a STOP word. This protocol is supported by a Xilinx chip (4K family) controlling also the laser. Figure 3 shows the principle of the link.



Figure 3 Link block diagram
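The handshake described above can be sketched as a small trace generator. This is a minimal model under stated assumptions: the flag names `dtr_bit` and `rts_bit`, the string representation of the control words and the function name are all hypothetical, and the real protocol is implemented in the Xilinx chip rather than in software.

```python
def run_transfer(payload, dtr_bit, rts_bit):
    """Return the (sender, word) sequence exchanged for one transfer attempt.

    dtr_bit: set on the producer side when it wants to send (assumed meaning).
    rts_bit: set on the consumer side when it can accept data (assumed meaning).
    """
    trace = []
    if not dtr_bit:
        return trace                      # producer has nothing to announce
    trace.append(("producer", "DTR"))
    if not rts_bit:
        return trace                      # consumer does not answer; no data flows
    trace.append(("consumer", "RTS"))
    trace.append(("producer", "START"))   # data stream framed by START ... STOP
    trace.extend(("producer", word) for word in payload)
    trace.append(("producer", "STOP"))
    return trace


words = [w for _, w in run_transfer([0xA, 0xB], dtr_bit=True, rts_bit=True)]
```

Keeping the handshake to two control words before the framed data stream makes the flow control simple enough to fit in programmable logic alongside the laser control.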

The Xilinx chip has been implemented and simulated and the remaining hardware is under development. The next step is to evaluate the TriQuint chipset in order to have a compact data link using only one VME slot. This solution would have the same performance but would include the 8B/10B encoding and the error correction (CRC and parity).

#### 3.4 MEC3 and readout simulations

In order to evaluate the performance of the event-synchronized readout of the MEC3 design, a simulation model has been developed using Foresight, a time domain simulator running under UNIX which allows a graphical description of the model. Hence, before receiving the hardware, we will be able to see the behaviour of such a readout in different conditions of triggering. We shall also evaluate a time domain simulator running on Macintosh supporting a graphical description of the model.

## 4. VLSI development (Padova)

Padova INFN is involved in the VLSI design of both digital and analog electronics. A Memory Address Generator (MAG) chip has been developed and prototypes have been successfully tested. This component will be used to control access to the dual-port memory systems which it is planned to employ in the front-end readout. Other developments include the implementation of a fast GaAs preamplifier usable in the inner detector front end and for optical receivers. In addition, a 100 MHz CMOS comparator has been simulated as part of the design of a low-power, fast 8-bit ADC.

#### 4.1 MAG - A fast Memory Address Generator

A full descriptive document can be obtained from "Dipartimento di Fisica e INFN, Università di Padova": reference DFPD 93/EI/30.

MAG is a chip in 1.0 $\mu$m CMOS technology capable of addressing 1 to 4 memory banks as programmed in the internal instruction register file. The 20-bit address word covers up to 1 M ($2^{20}$) locations. The output address bus is three-state, allowing multiple memory control. The VME-like I/O protocol allows chip programming and control through an 8-bit data bus. The 512-byte internal addressing space is multiplexed on an 8-bit address bus.



Figure 4.1 MAG block diagram

General features are:

- 20-bit three-state output memory address bus
- 4 strobe output lines for memory bank selection
- two external maskable interrupts
- 16 instruction words of 72 bits each
- 16 status words of 42 bits each
- 24-bit control register
- 8-bit bidirectional data bus
- one data strobe
- 8-bit input address bus
- two address strobes (for address demultiplexing)

- hardware reset
- chip select line
- TTL compatible inputs
- 3 operating modes: programming, running and test
- software clock program debugging
- circular buffer instruction
- software START, STOP, BREAK, RESTART and RESET commands
- HALT and WAIT instructions

The 16,000-stage (equivalent 8000 transistors) project was implemented by ES2 on 25 mm<sup>2</sup> of silicon in an 84-pad package.
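As a consistency check on the figures listed above, the sketch below computes the memory range reached by a 20-bit address, the number of address bits needed for a 512-byte internal space, and the storage taken by the register files. It assumes that each internal word is stored in whole bytes, which is an illustrative assumption and not taken from the MAG documentation.

```python
def bytes_for_bits(bits):
    """Whole bytes needed to hold a word of the given width (assumed packing)."""
    return (bits + 7) // 8


memory_locations = 2 ** 20                       # reach of the 20-bit address bus
internal_address_bits = (512 - 1).bit_length()   # bits needed to address 512 bytes

instruction_bytes = 16 * bytes_for_bits(72)      # 16 instruction words of 72 bits
status_bytes = 16 * bytes_for_bits(42)           # 16 status words of 42 bits
control_bytes = bytes_for_bits(24)               # 24-bit control register
register_bytes = instruction_bytes + status_bytes + control_bytes
```

The internal space needs one more address bit than the 8-bit address bus provides, which is consistent with the address being multiplexed using two address strobes, and the listed registers fit within the 512-byte space under the packing assumed here.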

#### 4.2 100 MHz pipeline CMOS comparator

Padova is also developing a switched-capacitor pipeline CMOS comparator implemented in 1.2 $\mu$m technology, operating from a single supply voltage (5 V). The main performance features are:

- Sampling rate of 100 MHz (typical)
- Input resolution of 2 mV over an input range of 2 V
- Low power dissipation (about 6 mW)
- Expected die area of 300 microns<sup>2</sup>.

The architecture of the comparator, which is shown in Figure 4.2, comprises a pipeline cascade of an input sampling network, a regenerative auto-calibrated sense amplifier and a latch.



Figure 4.2 Comparator architecture

Figure 4.3 Comparator performance

The operations in the pipeline are controlled by a single external clock with 50% duty cycle, which is inverted and delayed by a logic network to generate a reset phase, a regeneration phase and a strobe signal for the latch. In the reset phase the input sampling network samples the two analog input signals with respect to a reference bias voltage. During the regeneration phase it generates a differential output proportional to the input voltage difference and cancels the sampled common-mode input voltage.

The sense amplifier consists of a cascode differential pair with a positive feedback loop. During the reset phase the input and the feedback transistors are in the steady state storing the bias voltage and DC offset in four capacitors; in the regeneration time the differential input, supplied by the input sampling network, triggers the amplification stage.

Figure 4.3 shows the output of the sense amplifier and the latch signals with 2 mV differential input. At the end of the regeneration time, the sense amplifier provides a 600 mV differential output and the latch can reach the digital logical levels.

#### 4.3 GaAs amplifier

A GaAs amplifier prototype has been realized in order to investigate the possibility of using this technology in front-end applications. Its high radiation tolerance and gain-bandwidth product make it a good choice in inner detector applications.

The greatest disadvantages of GaAs relative to Si are its higher flicker noise and its lower intrinsic gain. The amplifier prototype has been designed specifically to measure its flicker noise coefficients and to identify the parameters that affect its performance.

The prototype amplifier features are:

- programmable power with a minimum $\approx 1$ mW
- radiation tolerance for $\approx 100$ MRad and $10^{14}$ n/cm<sup>2</sup>
- unity gain stable
- gain-bandwidth product > 1.5 GHz
- maximum gain $\approx 5$ mV/fC

We expect the best noise performance for shaper output peak times up to $\approx 100$ ns, where the flicker noise should be lower than the series noise. At these peak times its noise should be lower than that of a CMOS or JFET amplifier because, for the same power consumption, the transconductance of a GaAs MESFET is higher.

The first measurements on the prototype confirm that:

- the low intrinsic gain is not a problem in front-end amplifier applications of GaAs
- the flicker noise is dominant for peak times $> 0.5\ \mu$s
- it is possible to achieve very fast output shapes (in the region of 10 ns) at low power (< 10 mW)

## 5. HDTV digital front-end for calorimeter read-out (LAPP)

A classical read-out chain for a calorimeter (e.g. a liquid argon calorimeter) comprises a preamplifier, an analog shaper to reduce the bandwidth and hence the electronics noise, and an ADC to digitize the signal peak value. In an LHC environment, the optimal shaper parameters (such as peaking time) depend on the accelerator characteristics, for example luminosity and background conditions. Digital filters whose coefficients can be programmed to follow the machine conditions could thus be well adapted. This is discussed more fully in the next section. Such filters (FIR filters) have been developed for HDTV, and a first test of these circuits was reported in the previous status report. The thrust of the work done so far at LAPP is to build, from available circuits, the basic blocks of a digital read-out chain for calorimeters, to gain expertise and to evaluate the benefits and drawbacks of such a digital solution.

As discussed below, efforts have been concentrated on the design, building and testing of a 16-bit digital filter (MULAC16) and of a 60 MHz digitizer (ADC16) with all the necessary look-up table and calibration logic to be able, using the analog compressor board under construction, to input 16-bit dynamic range analog signals and output 16 bit words to the MULAC16 board for processing.

#### 5.1 The ADC board

The ADC16 board, used together with the analog range compressor board, is a 60 MHz digitizer accepting analog input signals with a 16-bit dynamic range. The compressed analog signal is digitized with a 10-bit ADC (AD 9020); the ADC output addresses a look-up table previously loaded with the inverse of the analog compressor transfer function; the corresponding 16-bit data are then loaded into a 1K deep memory for local autonomous testing purposes, or sent onto the fast output bus to the digital shaper board (MULAC16). A "slow" calibration circuit with a 16-bit DAC is used to calibrate the system and load the look-up table.



Figure 5.1 Block diagram of the ADC16 board showing the 10-bit ADC, the 16-bit calibration DAC, the look-up table, the test memory and the fast bus output port

As shown in Figure 5.1, a VME/VXI mother board holds all the logic circuitry. A separate daughter board supports the fast 10-bit ADC and the 16-bit calibration DAC.
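The role of the look-up table can be illustrated with a minimal sketch of a calibration pass: a 16-bit DAC value is stepped through its range, the resulting 10-bit ADC code is recorded, and each table entry is set to the mean DAC value that produced that code. The square-root compressor law used here is purely hypothetical, standing in for the real analog compressor, and the function names are illustrative.

```python
import math

ADC_CODES = 1 << 10   # 10-bit ADC
DAC_CODES = 1 << 16   # 16-bit calibration DAC


def hypothetical_compressor_adc(dac_value):
    """Stand-in for compressor + ADC: square-root law onto 0..1023 (assumed)."""
    return round((ADC_CODES - 1) * math.sqrt(dac_value / (DAC_CODES - 1)))


def calibrate_lut(transfer):
    """Build lut[adc_code] = mean 16-bit input that yields that ADC code."""
    sums = [0] * ADC_CODES
    counts = [0] * ADC_CODES
    for dac_value in range(DAC_CODES):
        code = transfer(dac_value)
        sums[code] += dac_value
        counts[code] += 1
    lut, last = [], 0
    for total, count in zip(sums, counts):
        if count:                 # codes never produced inherit the previous entry
            last = round(total / count)
        lut.append(last)
    return lut


lut = calibrate_lut(hypothetical_compressor_adc)
```

At run time the table is simply indexed by the ADC code, so the inverse transfer function costs one memory access per sample, which is what allows it to keep up with the 60 MHz sampling rate.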

The ADC input voltage ranges from -2 V to +2 V, i.e. the LSB of the 10-bit ADC is about 4 mV.

The acquisition can run on the 60 MHz internal clock asynchronous with the analog input signals or can use an external clock synchronized to the signals as will be the case in an LHC experiment.

The main technical goal was to investigate the noise problems associated with fast digital circuits switching near tiny analog inputs.

The board has been designed and built, and is now under detailed investigation both in standalone mode and connected to the MULAC16 board. The analog compressor board is not yet ready, so this part of the chain has not yet been tested.

#### 5.1.1 Noise measurements

An estimate of the pick-up noise from the logic can be deduced from Figure 5.2. The R.M.S. of the distribution of the ADC outputs is shown versus input voltage (0 to 65000 corresponds to -2 to +2 V). Most of the time, a sigma of 0.5 LSB is observed. However, at well-defined bit pattern transitions, the output distribution is much broader. Sometimes the least significant bits are lost during the memory write cycle. This effect, not yet fully understood, is under investigation. Nevertheless, the results are already quite encouraging: with a careful board design, noise from fast logic can be kept to an acceptable level.



Figure 5.2 R.M.S. of the ADC output versus the DC input voltage from -2 to +2 V

#### 5.1.2 Linearity measurements

For this measurement, the look-up table is loaded with a linear transfer function with two parameters: offset and gain. The gain is normalized such that the output reads directly in millivolts. Figure 5.3 shows, for 1000 input values (0 to 65000 corresponds to -2 to +2 V), the difference between the average reading and the input value. Quite good linearity is observed over three quarters of the range; above that, the output value is slightly smaller than expected (note the expanded scale). This has been traced to a bad setting of the top reference voltage of the ADC, which unfortunately could not be adjusted on the present prototype. The AD 9020 has four reference voltages, of which only the top one was badly set; this explains why only the top part of the linearity curve exhibits a problem. This effect could have been absorbed in the look-up table but is shown here for completeness.



Figure 5.3 Differences in millivolts between ADC output and expected value assuming perfect linearity versus the DC input voltage from -2 to +2 V

#### 5.2 16-bit digital filters

In the previous status report [CERN/DRDC/92-10], the successful operation of a 60 MHz digital filter using available circuits (MULAC8) from CNET was reported. The data path on these circuits is only 8 bits wide. This is clearly not enough for application to calorimeter signals where a 16-bit dynamic range is needed. A two-pronged approach has been followed to investigate the possibilities:

- In collaboration with CNET in Grenoble, M. Aqachmar (an electronics student) is designing the microelectronics and software tools to produce programmable digital filters with user-selectable characteristics (number of bits for the data, number of taps...). A new architecture for the multipliers has already been implemented and simulated. It offers advantages in terms of speed and ease of cascading to any number of bits. This work should converge by mid-1994.
- To start learning with a complete read-out chain, a 16-bit digital filter board, using two MULAC8 and a fast 24-bit wide adder has been designed and built (Figure 5.4). This board can work standalone using its input and output memories or in pipeline mode taking its data from the fast bus input port, i.e. data coming from the ADC board. The board can run at the full 60 MHz clock rate.
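One way in which two 8-bit filter circuits and an adder can process 16-bit data is to exploit the linearity of an FIR filter: each sample is split into a high and a low byte, each byte stream is filtered with the same coefficients, and the two partial results are recombined with the high part weighted by 256. The sketch below illustrates this principle for unsigned samples. Whether the MULAC16 board partitions the data in exactly this way is an assumption, and the finite width of the 24-bit adder is not modelled.

```python
def fir(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum_k coeffs[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def fir16_from_two_8bit(samples, coeffs):
    """Filter unsigned 16-bit samples using two 8-bit-wide data paths."""
    high = [s >> 8 for s in samples]      # upper byte, handled by the first filter
    low = [s & 0xFF for s in samples]     # lower byte, handled by the second filter
    y_high = fir(high, coeffs)
    y_low = fir(low, coeffs)
    # The adder recombines the partial sums with the high part shifted by 8 bits.
    return [(h << 8) + l for h, l in zip(y_high, y_low)]


samples = [0, 65535, 1234, 40000, 255, 256]   # hypothetical test data
coeffs = [3, -2, 1]                           # hypothetical coefficients
combined = fir16_from_two_8bit(samples, coeffs)
```

Because the split and recombination are exact, the result is identical to filtering the 16-bit samples directly, provided the adder is wide enough to hold the full sum.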



Figure 5.4 Block diagram of the MULAC16 board showing the two MULAC8 chips and the 24-bit adder



Figure 5.5 Content of the output memory of the MULAC16 board when a 1 MHz sine wave signal is input to the ADC16 board. The MULAC8 chips are configured to be transparent.

In Figure 5.5, a 1 MHz sine wave signal digitized by the ADC board was transmitted onto the fast bus flat cables to the MULAC16 board. The filter coefficients were set such that the output equalled the input and a plot of the output memory content is shown. No problems can be detected.

A quick test of deconvolution properties is presented in Figure 5.6. As above, Figure 5.6a shows the contents of the output memory when a burst of three nearly exponential signals is sent into the ADC-MULAC16 system with the filter programmed to be transparent. The clock is not synchronized with the pulses as can be seen from the apparent varying rise-time. Figure 5.6b shows the MULAC16 output memory content when the same three pulses go through the system but this time two of the filter coefficients are set as indicated in the figure.
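The principle of such a deconvolution can be sketched with a two-coefficient filter. For an ideal sampled exponential with ratio `r` between successive samples, the filter y[n] = x[n] - r x[n-1] cancels the tail and leaves a single sample per pulse, even when pulses pile up. The pulse parameters and coefficients below are hypothetical and are not the values used for Figure 5.6.

```python
def exp_pulse_train(starts, amplitude, ratio, length):
    """Sum of sampled exponential pulses amplitude * ratio**(n - start)."""
    x = [0.0] * length
    for start in starts:
        for n in range(start, length):
            x[n] += amplitude * ratio ** (n - start)
    return x


def two_tap_fir(x, c0, c1):
    """y[n] = c0 * x[n] + c1 * x[n - 1], with x[-1] taken as zero."""
    return [c0 * x[n] + (c1 * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


starts = (2, 6, 10)                                   # three overlapping pulses
signal = exp_pulse_train(starts, 100.0, 0.8, 16)
deconvolved = two_tap_fir(signal, 1.0, -0.8)          # cancels the exponential tails
```

With real signals the pulses are only nearly exponential and the sampling clock is not synchronized with them, so the cancellation is approximate rather than exact.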





#### 5.3 The dynamic range compressor

Calorimeter signals have a large dynamic range of 16 bits, but the resolution at the high end is limited to typically 0.5%. As discussed for instance in [Digital Front-end Electronics at LHC, Aachen vol III p 190], it is possible to map the input signal range (16 bit, 0.5%) to the input range of an ADC (10 bit, 1 LSB precision) without loss of precision. This requires the use of a non-linear amplifier with a "local" gain varying from high to low according to the signal height.
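The idea of a signal-dependent local gain can be illustrated numerically. The sketch below uses a hypothetical compression law that is close to linear for small signals and logarithmic for large ones, with an assumed knee at 200 counts of the 16-bit scale, and evaluates the size of one ADC step at both ends of the range. Neither the law nor the knee value is taken from the FERMI circuit.

```python
import math

FULL_SCALE = 65535      # 16-bit input range
TOP_CODE = 1023         # 10-bit ADC
KNEE = 200.0            # assumed transition between linear and logarithmic regions
LOG_SPAN = math.log1p(FULL_SCALE / KNEE)


def compress(x):
    """Hypothetical compressor: 16-bit input value to (fractional) ADC code."""
    return TOP_CODE * math.log1p(x / KNEE) / LOG_SPAN


def expand(code):
    """Inverse law, as it would be stored in a look-up table."""
    return KNEE * math.expm1(code * LOG_SPAN / TOP_CODE)


def step_at(code):
    """Input-referred size of one ADC step ending at the given code."""
    return expand(code) - expand(code - 1)


low_end_step = step_at(1)                               # in 16-bit counts
top_relative_step = step_at(TOP_CODE) / FULL_SCALE      # fraction of full scale
top_rms_error = top_relative_step / math.sqrt(12)       # uniform quantisation rms
```

With these assumed parameters the step at the low end stays close to one count of the 16-bit scale, while the rms quantisation error at the top of the range remains below the 0.5% resolution quoted above.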

A practical circuit has been developed by the FERMI collaboration and is under test. Contacts have been made with FERMI and an agreement has been reached which will allow us to use this chip to build a board that conditions the analog signal before it is sent to the ADC for digitization. This board should be ready by the autumn of 1993. A complete calorimeter read-out channel using digital techniques could then be evaluated.

## 6. Digital signal processing algorithms and tools

The development of digital signal processing algorithms has concentrated on new nonlinear and linear-nonlinear hybrid operations. The applications were baseline estimation, timing extraction of exponential pulses and very high precision feature-extraction filters. The research work was performed with simulation programs such as Matlab 4.0 and a new filter design tool, SAFIR, for SUN workstations. In future, three hardware platforms will be used to test the different algorithms in real-time conditions:

1. The first and second filter of the FERMI chip. These filters have programmable parameters to enable the testing of the feature extraction and time-tagging operations.
2. The DataWave video signal processor and a prototype board.
3. A board featuring a programmable gate array chip (Xilinx). The filters will be described in the VHDL language and from this description a downloadable code for the gate array device can be generated.

#### 6.1 Base line estimation using sparse median operations

The time domain fluctuations of a measured detector signal are normally removed by subtracting a signal average from the original sequence. If the arithmetic mean operator is used in the baseline estimation, the signal pulses have a strong effect on the operator output. The median operator, in which the samples inside the analysis window are sorted and the middle value is selected as output, performs better: if the analysis window is large enough, the pulses do not affect the estimation result. However, the required analysis window covers so many samples that the sorting task needed to obtain the sequence median is computationally very demanding. To reduce the number of samples a sparse structure can be used: instead of taking every consecutive sample, the *sparse median* operator takes only every $d$-th sample from the input sequence. This operator is slightly more sensitive than the standard median to high-frequency noise components present in the measured signal, due to the reduced correlation of operator output samples. If the signal-to-noise ratio is low, better results are obtained using the *sparse recursive median* operator, given by

$$y(i) = \mathrm{median}\,[\,y(i-n),\ldots,y(i-1),\ x(i),\ x(i+\delta),\ldots,x(i+n\delta)\,] \tag{6.1}$$

The structure will be extended by introducing a trainable filter structure and an antibias filter (an adaptive linear FIR filter which forms one input of the sparse median operator) to further improve the suppression of the effect of signal pulses on the estimation result.
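The difference between mean-based and sparse-median-based base line estimation can be illustrated with a minimal sketch. It implements the plain (non-recursive) sparse median only. The symmetric window of 2n+1 samples, the clamping of indices at the sequence edges and the synthetic test signal are assumptions made for the illustration, not details taken from the RD-12 implementation.

```python
from statistics import mean, median

def sparse_window(x, i, n, delta):
    """Return 2n+1 samples spaced delta apart, centred on index i.

    Indices outside the sequence are clamped to the nearest edge
    (an assumption made for this sketch).
    """
    last = len(x) - 1
    return [x[min(max(i + k * delta, 0), last)] for k in range(-n, n + 1)]

def sparse_median(x, n, delta):
    """Base line estimate using the median of every delta-th sample."""
    return [median(sparse_window(x, i, n, delta)) for i in range(len(x))]

def sparse_mean(x, n, delta):
    """Same window, but with the arithmetic mean, for comparison."""
    return [mean(sparse_window(x, i, n, delta)) for i in range(len(x))]

# Synthetic signal: flat base line of 10.0 with two 3-sample pulses on top.
signal = [10.0] * 60
for start in (20, 40):
    for k, amplitude in enumerate((50.0, 80.0, 30.0)):
        signal[start + k] += amplitude

baseline_median = sparse_median(signal, n=3, delta=4)
baseline_mean = sparse_mean(signal, n=3, delta=4)
```

Because the sample spacing (4) exceeds the pulse length (3), each window contains at most one sample per pulse, so the median stays on the base line while the mean is pulled upwards near the pulses.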

### 6.2 Trainable FIR / order statistic hybrid filters

Pulse height measurement with very high precision is required for event triggering in calorimeters. In most cases the sample timing jitter is the dominant source of error. The task is to measure the pulse height from a small number of samples (typically four or five non-zero values for bipolar shaping) when the sampling positions have a Gaussian distribution around the expected position. A digital filter can be trained for this application by using a training set containing sequences with different jitter conditions, the number of sequences with a specific jitter condition being Gaussian-distributed. The performance is greatly improved if a trained FIR filter is replaced by a hybrid structure of linear and nonlinear filters. This filter class, which uses a bank of linear FIR filters as sub-filters and an order statistic operator to select one of the FIR outputs for each sampling position, is called the trainable FIR / order statistic (FIR-OS) hybrid filter. A FIR-OS filter with three FIR sub-filters is illustrated in Figure 6.1. During the training process the FIR tap coefficients and the order statistic operator are adapted so that the total error function is minimized. An example of FIR-OS filter training for a 1-D signal is illustrated in Figures 6.3 to 6.6.



Figure 6.1 General FIR-OS filter structure.
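The FIR-OS structure can be sketched in a few lines: a bank of FIR sub-filters is evaluated at each sampling position, the outputs are sorted, and the order statistic operator picks the output of a given rank. The tap values, the rank and the zero padding at the start of the sequence are illustrative assumptions; in a trained filter they would result from the training process.

```python
def fir(x, taps, i):
    """Output of one causal FIR sub-filter at position i (zero-padded)."""
    return sum(t * x[i - k] for k, t in enumerate(taps) if 0 <= i - k < len(x))

def fir_os(x, bank, rank):
    """At each position, sort the sub-filter outputs and select one by rank."""
    out = []
    for i in range(len(x)):
        outputs = sorted(fir(x, taps, i) for taps in bank)
        out.append(outputs[rank])
    return out

# Hypothetical bank of three sub-filters with three taps each.
# With pure delays and rank 1 (the middle value) the structure
# reduces to a 3-point running median.
bank = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
x = [0.0, 0.0, 5.0, 0.0, 0.0, 1.0, 1.0, 1.0]
y = fir_os(x, bank, rank=1)
```

In this degenerate configuration the isolated spike at position 2 is rejected while the step at the end of the sequence is preserved, which shows how the order statistic stage adds nonlinear behaviour to the linear sub-filters.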

An example of pulse height measurement is shown in Figure 6.2. A FIR filter and two FIR-OS filters were trained using a training set with Gaussian-distributed ($\sigma$ = 3 ns) jitter. The FIR-OS filters with three and five sub-filters with seven taps each perform very well, especially for the central area where the jitter is close to zero. These two filters are considered as reasonable compromises between computational performance and hardware complexity.



This FIR-OS filter structure is not the only possible combination of linear and nonlinear filters: two other solutions have been reported in the literature, and the performance of these algorithms will be compared to find the best operator for precision pulse height measurement. As part of this comparison different training methods will be tested.

### 6.3 SAFIR tool for filter design

A novel software package, called SAFIR, has been developed for designing trainable FIR and FIR-OS filters. It consists of programs written in C and provides tools for adapting the filter parameters to a given example signal. This design approach has been chosen for two reasons: first, it gives a physicist without deep knowledge of signal processing easy access to the most powerful design algorithms and filter structures, and second, it is the only possible way to design filters that are truly optimal for the given signal. These two features, together with the careful implementation, make this software package the "state of the art" in this area of research.

The design concept is the following. The user prepares a file of training examples. These examples are input-output pairs that describe the desired filtering results in the time domain. The examples are fed to the chosen filter, which initially has random parameters. The parameters are then adjusted in such a way that the error sum between the actual filter outputs and the desired filter outputs is minimized. The error norm can be selected to be the Mean Absolute Error (MAE) norm or the Mean Squared Error (MSE) norm.
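As a minimal sketch of this design concept, the following code adapts FIR tap coefficients to a set of input-output pairs under the MSE norm. It uses plain batch gradient descent, which is a simplification chosen for brevity and not the search method used in SAFIR. The training set, the target coefficients, the learning rate and the function names are all hypothetical.

```python
import random

def fir_output(taps, window):
    """FIR output for one window of input samples."""
    return sum(t * s for t, s in zip(taps, window))

def mse(taps, examples):
    """Mean squared error between actual and desired outputs."""
    return sum((fir_output(taps, w) - d) ** 2 for w, d in examples) / len(examples)

def train_mse(examples, n_taps, steps=2000, rate=0.05, seed=1):
    """Start from random taps and follow the MSE gradient downhill."""
    rng = random.Random(seed)
    taps = [rng.uniform(-1.0, 1.0) for _ in range(n_taps)]
    for _ in range(steps):
        grad = [0.0] * n_taps
        for window, desired in examples:
            err = fir_output(taps, window) - desired
            for k in range(n_taps):
                grad[k] += 2.0 * err * window[k] / len(examples)
        taps = [t - rate * g for t, g in zip(taps, grad)]
    return taps

# Hypothetical training file: 50 input windows with desired outputs
# produced by a known 3-tap filter.
target = [0.5, 0.3, 0.2]
data_rng = random.Random(0)
examples = []
for _ in range(50):
    window = [data_rng.uniform(-1.0, 1.0) for _ in range(3)]
    examples.append((window, fir_output(target, window)))

taps = train_mse(examples, n_taps=3)
final_error = mse(taps, examples)
```

Since the desired outputs here come from a linear filter of the same length, the trained taps should approach the target coefficients; with real detector signals the minimum error would in general be non-zero.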

The minimization of the error function can be done with stochastic or analytic algorithms. The stochastic algorithms are the Simulated Annealing method, the Monte Carlo method, and a Fine Tuning method. These methods are especially suitable for solving complex nonlinear multivariable optimization problems. The analytic methods are the Least Mean Squared (LMS) and the Least Mean Absolute (LMA) algorithms with the conjugate gradient search. The LMA algorithm with soft approximation is the first algorithm to find the optimal FIR coefficients under the MAE criterion. The FIR-OS design algorithm is currently implemented for MATLAB, and it is to be integrated into this software package. In the future it is also planned to include a VHDL code generator as part of the filter design package. This generator will produce code which takes the limitations of gate array programming into account, and thus enables the translation of the filter VHDL code directly to gate array chip code.
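The stochastic approach can be sketched with a generic simulated annealing loop that minimizes the MAE of a FIR filter. This is a textbook Metropolis scheme and not the SAFIR implementation: the geometric cooling schedule, the step size tied to the temperature, the training set and all names are assumptions made for the illustration.

```python
import math
import random

def fir_output(taps, window):
    """FIR output for one window of input samples."""
    return sum(t * s for t, s in zip(taps, window))

def mae(taps, examples):
    """Mean absolute error between actual and desired outputs."""
    return sum(abs(fir_output(taps, w) - d) for w, d in examples) / len(examples)

def anneal(examples, n_taps, steps=5000, t_start=1.0, t_end=1e-3, seed=2):
    """Return (best taps, best error, initial error) after annealing."""
    rng = random.Random(seed)
    current = [rng.uniform(-1.0, 1.0) for _ in range(n_taps)]
    current_err = mae(current, examples)
    initial_err = current_err
    best, best_err = list(current), current_err
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = list(current)
        candidate[rng.randrange(n_taps)] += rng.gauss(0.0, temp)
        cand_err = mae(candidate, examples)
        # Metropolis rule: always accept improvements, sometimes accept
        # a worse candidate so the search can leave local minima.
        accept = cand_err <= current_err or \
            rng.random() < math.exp((current_err - cand_err) / temp)
        if accept:
            current, current_err = candidate, cand_err
            if current_err < best_err:
                best, best_err = list(current), current_err
    return best, best_err, initial_err

# Hypothetical training set, generated from a known 3-tap filter.
target = [0.5, 0.3, 0.2]
data_rng = random.Random(0)
examples = []
for _ in range(50):
    window = [data_rng.uniform(-1.0, 1.0) for _ in range(3)]
    examples.append((window, fir_output(target, window)))

best_taps, best_err, initial_err = anneal(examples, n_taps=3)
```

The same loop works unchanged for error functions that are not differentiable, which is one reason stochastic methods suit nonlinear structures such as FIR-OS filters.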

All the design programs are stand-alone C-language programs with command-line parameter feed. This enables high portability. The stochastic design is supported with a graphic Xview user interface (safir.c), and a special graphic curve illustrator (xcap.c) is provided for illustrating the results. The software and the algorithms are to be reported in publications. The source code and the user's manual are available through anonymous ftp.

### 6.4 ITT Intermetall DataWave processor

Several algorithms were tested with the DataWave video signal processor. In addition to the nonlinear and hybrid filter algorithms, a line-fitting program was developed. The program takes four hit coordinate values as input and calculates the line parameters: the position and the slope relative to the coordinate system. As an extension a more complicated algorithm also computes the mean squared error (MSE) value for the fitted line. The performances of these two algorithms were 62.5 and 25 million lines fitted per second, respectively. As a continuation the programs will be tested using real hardware when the prototype chips and a suitable test board are available.
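The calculation performed by such a line-fitting program can be sketched as an ordinary least-squares fit. The sketch assumes that the four hits are measured on detector planes at known positions `z`; the plane positions, the hit values and the function name are hypothetical, and the code says nothing about how the algorithm was mapped onto the DataWave processor.

```python
def fit_line(z, x):
    """Least-squares fit x = position + slope * z; also returns the MSE."""
    n = len(z)
    z_mean = sum(z) / n
    x_mean = sum(x) / n
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szx = sum((zi - z_mean) * (xi - x_mean) for zi, xi in zip(z, x))
    slope = szx / szz
    position = x_mean - slope * z_mean
    mse = sum((xi - (position + slope * zi)) ** 2 for zi, xi in zip(z, x)) / n
    return position, slope, mse

# Four equally spaced planes and four hit coordinates on a straight line.
planes = [0.0, 1.0, 2.0, 3.0]
hits = [1.0, 3.0, 5.0, 7.0]
position, slope, mse = fit_line(planes, hits)
```

For fixed plane positions the quantities `z_mean` and `szz` are constants, so a hardware implementation would reduce to a few multiply-accumulate operations per fitted line.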

## 7. Neural Net investigation

The study of track reconstruction with binary (Boolean) neural nets, started in Tampere, has been continued at CERN (track elements in a 7x7 hit matrix). The original study used a very time-consuming algorithm for neural net optimization (learning), even in the simplistic case used as an example, and the learning set was necessarily restricted to a few hundred tracks, which is insufficient statistics for accurate learning.

We have investigated the use of a different, much faster, learning algorithm for "Adaptive Logic Networks", ATREE, developed by W.W. Armstrong and his collaborators at the University of Alberta, Canada. We were then able to use learning sets of several thousand cases.

A success rate of 99.9% on tracks satisfying the trigger criteria was achieved, while the percentage of background noise misidentified as acceptable tracks was 0.5 %, using about 1000 gates organized in 9 levels.

Due to the very simple characteristics of the "tracks" that we have considered, it was easy to determine the proper decision tree exactly by hand, which of course gives perfect results, and to compare this ideal neural net with the result of the optimization procedure. By doing so we have learned a lot about the peculiarities of the optimization algorithm. The output of ATREE is translated into VHDL with a view to hardware implementation.
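A hand-written Boolean decision tree of this kind can be sketched as a binary tree of two-input AND and OR gates whose leaves are hit bits. The tuple encoding, the six-bit toy pattern and the function name are assumptions made for the illustration; they do not reproduce the 7x7 hit matrix or the ATREE output format.

```python
def evaluate(node, bits):
    """Evaluate a binary tree of two-input gates on a tuple of hit bits.

    A node is ("IN", index) for a leaf, or (gate, left, right)
    with gate equal to "AND" or "OR".
    """
    kind = node[0]
    if kind == "IN":
        return bits[node[1]]
    left = evaluate(node[1], bits)
    right = evaluate(node[2], bits)
    if kind == "AND":
        return left and right
    if kind == "OR":
        return left or right
    raise ValueError("unknown gate: " + kind)

def all_of(i, j, k):
    """Sub-tree that is true when the three given hit bits are all set."""
    return ("AND", ("AND", ("IN", i), ("IN", j)), ("IN", k))

# Toy pattern: accept a track if the three hits of column A (bits 0-2)
# or the three hits of column B (bits 3-5) are all present.
tree = ("OR", all_of(0, 1, 2), all_of(3, 4, 5))

accepted = evaluate(tree, (True, True, True, False, False, False))
rejected = evaluate(tree, (True, False, True, False, True, True))
```

Each gate of such a tree maps directly onto a combinatorial logic element, which is what makes a translation into a hardware description straightforward.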

### 7.1 Hardware for Boolean Neural Network

A board has been designed and built which allows binary trees to be implemented in hardware, making use of a Xilinx Programmable Gate Array.

The board contains 256 input/output registers, writable and readable from the VME bus, and a daughter board housing a programmable device (Xilinx or Altera). This board could also be used to implement other types of design, especially algorithms for digital filtering and on-line processing.

The VHDL description of the BNN is translated by the PGA vendor software to the programmable device format for implementation and validated by the PGA simulation software.

The network described in the previous paragraph has been successfully implemented in a 4010PG191-5 Xilinx chip and is under test. About 50% of the chip is occupied and the worst-case propagation time is about 45 ns. A pipelined implementation of the BNN could reduce this delay by a factor of 4-5. The design of a daughter board for Altera is under development.

## 8. Application of L-Neuro 2.0 Philips processor (Ecole Polytechnique)

The Ecole Polytechnique laboratory is engaged, together with the "Laboratoire d'électronique Philips" (LEP), in the joint development of a dedicated digital neuro-chip (pipelined 16-bit arithmetic).

Outside the NN domain, the computational capabilities of this chip are interesting for data processing in a read-out system. We plan to design and build a board to test the chip on our test bench.

## 9. LabVIEW


LabVIEW is an icon-based programming system for building software modules called virtual instruments (VIs), developed by National Instruments since 1986. It is a general-purpose programming tool for data acquisition, data analysis and instrument control, based on graphical, data-flow, object-oriented programming technology.

RD-12 adopted LabVIEW as the main software system environment for the test benches running on Macintosh. Following the development of appropriate VIs in RD-12, LabVIEW is becoming widely used in the laboratory and by external institutions as the standard tool for electronics tests, small data acquisition, signal processing and equipment control applications.

The sets of virtual instruments developed by RD-12 include:

- LabVEE. VME and CAMAC general data acquisition VIs for multiple industry interfaces (MacVEE, CES-M7212 and NI-MXI)
- LabHIL. General purpose histogramming package
- LabEPIO. Tape I/O VI with standard EPIO format
- LabADC. Comprehensive ADC evaluation library according to the IEEE ADC test specifications
- LabFDPM. Fast Dual Port Memory data acquisition VIs

The development of this software has been carried out in collaboration with National Instruments, for whom RD-12 is a beta tester of new LabVIEW software. In particular we are now working with version 3 of LabVIEW, which allows programs to be ported to several platforms (Mac, Sun and PC). The current work consists of rewriting and updating the existing libraries in order to permit portability of the code to Mac, Sun and PC with different VME and CAMAC interfaces.

## 10. System simulation

In our study of readout architecture structures for future LHC experiments, two major problems have to be resolved:

- the transition from specification of an architecture to its implementation in hardware
- the comparison of different architectures.

To resolve these problems, there is a need for a system development tool which has the following characteristics:

- the tool has to support a methodology that ensures specification compliance across all levels of design abstraction,

- the tool has to generate an executable specification of the system requirements. An executable specification has important characteristics: it is graphical, it can be animated and it can be simulated. A graphical (animated) representation of system requirements has powerful advantages over the traditional written specification. Its executable nature provides a formal method for removing ambiguities and contradictions from the specification.

- the graphical representation has to support the concept of structural replication: recurrent structures or components in a design have to be elegantly handled.

In order to make statistical studies of different architectures, powerful methods for analyzing the resulting models are necessary.

We are seeking a tool environment that supports efficient high-level design of information technology systems combining the three main implementation domains: commercially available ICs (standard parts), application specific ICs (ASICs) and software (firmware, ...).

The environment would support top-down design space exploration across the boundaries of all three implementation technologies, including tradeoffs between hardware and software.

### 10.1 ObjecTime (Bell Northern Research)

ObjecTime is an object-oriented CASE tool targeted at event-driven systems. The tool is based on the Real-Time Object-Oriented Modelling (ROOM) methodology. It provides an integrated development environment: the modelling concepts apply to all stages of the development, thereby eliminating discontinuities.

Systems are described at two levels: the structure model and the behaviour model.

- at the structure level, the system is described as a set of concurrent active objects (Actors) with their interfaces (Ports, messages)

- at the behaviour level, the behaviour of the actors is described by hierarchical state diagrams.

The tool is object-oriented: the system is expressed as networks of cooperating components. It supports the standard OO aspects: encapsulation, inheritance, genericity, polymorphism, ... It also supports graphical replication, a practical way of dealing with systems that have large replication factors.

ObjecTime does not allow an easy transition from specification to hardware: the C++ code produced by ObjecTime cannot be handled by hardware implementors.

Furthermore, comparison of different architectures is difficult in ObjecTime, due to the inefficient executable specification generated by the tool.

### 10.2 VHDL

An easy transition from specification to hardware implementation is provided only by tools working in a VHDL environment. VHDL (VHSIC Hardware Description Language) is a language for describing digital electronic systems. VHDL covers several needs in the design process:

- the language allows a description of the structure of the design (the system is described in terms of components and their interconnections)
- the language allows description of the behaviour of the design (the behaviour of each component is described in a familiar programming language form)
- it allows a design to be simulated before its implementation: different architectures can be compared and tested without the need for hardware prototyping
- a VHDL environment gives access to models of standard parts
- software modelling in VHDL is straightforward.

In VHDL, large systems can be described and verified with a high level of abstraction. High level behaviour simulation enables the concepts to be verified before implementing them.

VHDL is now becoming the standard system specification language, providing a coherent communication channel between designers and implementors. Commercial models for standard parts are available.

Most of the VHDL environments available at CERN (Cadence, ViewLogic, ...) are register-transfer model oriented: they allow ports and gates to be modelled but they are not easy to use to model the high-level behaviour of a system.

A tool where the VHDL high-level simulation is an integral part of the top-down methodology has been evaluated:

### 10.3 Express VHDL tool (i-Logix)

Express VHDL is an interactive environment for system analysis. It allows the creation of a graphical model of the system function and behaviour which can be validated via model execution.

Systems are described using three graphical languages: module-charts, activity-charts and state-charts:

- the module-chart represents the structural view of the system, a hierarchical system block diagram that includes data flow between system components.
- the purpose of the activity-chart is to define system processes or functions. This allows the user to define data and control flows between the resident functions.
- the state-chart view defines the behaviour of these processes or functions. It represents the modes, or states, that a system may be in. State-charts are an extension of state transition diagrams, with the added notions of hierarchy and concurrency.

Once state-charts are created to describe the system control, the state-chart model may be validated using model execution. During model development, the system designer continually verifies that the state-chart model reflects the true system requirements by performing a series of interactive simulations. Once model validation is achieved, test drivers may be created to simulate the application of test data. Test drivers consist of additional state-charts or simulation control programs. The final result is an executable specification: a description of system requirements that can be simulated for system validation.

After model validation, VHDL may be generated in either a behaviour or register-transfer level (RTL) form. The generation of behaviour VHDL may provide the framework for further system development via the enhancement of these behaviour models. The generation of RTL VHDL enables the transition to detailed design via logic synthesis. The technology independence of the model allows the exploration of various target technologies at the synthesis phase.

Express VHDL offers an integrated top-down solution to the generation of FPGAs and ASICs from specifications. However, Express VHDL suffers from the fact that its graphical representation does not support replication, which makes the graphical design of large systems impossible.

No tools are available for analyzing the generated models.

The tool does not provide a VHDL simulator.

### 10.4 Voyager (IKOS)

Voyager is a VHDL development system including a VHDL source analyzer, a debugger, a simulator and a monitor. The complete system is Motif-based and is of high quality. It supports all VHDL design levels (from high-level behaviour to register-transfer model).

Particular attention has to be given to the efficiency of the simulator. The overhead in event-driven simulators for managing the event queue and dispatching the models (dynamic scheduling) is well known.
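The origin of this overhead can be seen in a minimal sketch of an event-driven simulation kernel: every event has to be inserted into a time-ordered queue, extracted again and dispatched to the model it belongs to. The sketch is generic and does not describe the internals of Voyager or any other product; the class, the clock period and the event names are hypothetical.

```python
import heapq

class Simulator:
    """Minimal event-driven kernel with a time-ordered event queue."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._count = 0  # tie-breaker so equal times never compare actions

    def schedule(self, delay, action):
        """Queue an action to run `delay` time units from now."""
        heapq.heappush(self._queue, (self.now + delay, self._count, action))
        self._count += 1

    def run(self):
        """Pop events in time order and dispatch them until none remain."""
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action(self)

def make_clock(period, cycles, log):
    """Model of a clock that reschedules itself a fixed number of times."""
    def tick(sim, remaining=cycles):
        log.append(("tick", sim.now))
        if remaining > 1:
            sim.schedule(period, lambda s: tick(s, remaining - 1))
    return tick

log = []
sim = Simulator()
sim.schedule(0.0, make_clock(25.0, 3, log))
sim.schedule(30.0, lambda s: log.append(("trigger", s.now)))
sim.run()
```

Each scheduled event costs one heap insertion and one extraction, so for models with many fine-grained events the queue management can dominate the time spent in the models themselves.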

In conclusion, in order to provide an easy transition from specification to hardware and to be able to compare several readout architectures, VHDL is the only choice possible.

Our intention is to provide a full VHDL description of possible readout architectures for future LHC experiments. We have close collaborations with R&D projects which are studying event-builder techniques (ATM switch event-builder, SCI event-builder) in order to get VHDL models of the proposed architectures. VHDL models of the front-end and of the event-builder will provide a uniform environment in which to study and compare different solutions and to prepare final hardware implementations.

A tool supporting all our requirements in a VHDL environment has not yet been found. Furthermore, the tools which have been evaluated are expensive (around 50 KCHF for a single licence).

## 11. Budget for 1993-1994

In 1993-94 we plan to pursue the activities presented in the introduction.

For this programme we require CERN funding of 50 KCHF for the continuation of the test benches, 50 KCHF for the implementation of the prototype timing and control system and 50 KCHF for the acquisition of licences for VHDL simulation.

## RD-12 internal notes and conference contributions

S. Inkinen, Digital Signal Processing Methods for Event Triggering in Particle Detectors. Master's Thesis Tampere University of Technology, 16.10.1991.

M. Kuusisto, Fast Dual-Port Memory for the Large Hadron Collider's Readout System Test Benches. Master's Thesis Tampere University of Technology, 13.11.1991.

B.G. Taylor, RD12 Timing and Control R&D summary. RD12-TN December 1991.

B.G. Taylor, PPCG User Notes. Programmable Phase Clock Generator. RD12 internal note.

B.G. Taylor, Multichannel optical fibre distribution system for LHC detector timing and control signals, Proc IEEE Nuclear Science Symposium, Orlando, Florida, 25-31 October 1992.

S. Centro, R. Martinelli, D. Pascoli, F. Dal Corso and A.J. da Ponte Sancho, MAG Memory Address Generator. DFPD 93/EI/30.

A. Fucci, G. Heguesi, J. Kolbinger, M. Lomo, H. Masuch, C. Pirotte, Fast Dual Port Memory Module, CERN/ECP/93-xx

S. Cittolin, LabRSTB LabVIEW libraries for VME, CAMAC and histograms. RD12-TN February 1993.

M. Demoulin, LabVIEW Fast Dual Port Memory library. RD12-TN February 1992.

M. Hansen, O. Ledorz, ADC Test and Evaluation System. RD12-TN March 1992.

JP. Gros, W. Jank, Tape EPIO driver for SUN and LabVIEW. RD12-TN February 1992.

R. Bonino, JP. Gros, L. Pollet, Analog memory test bench. RD12-TN February 1992.

JP. Gros, L. Pollet, MEC2-VME system test board. RD12-TN February 1992.

J.-P. Nurro, Applications of fiber optic links (in Finnish for engineer's degree) Espoon-Vantaa technical school, 1991.

N. Seppänen, Fiber optic links (in Finnish for engineer's degree) Espoon-Vantaa technical school, 1991.

E. Pietarinen, VMEbus cross interface processor module with high speed fiber optic links: VMExi, HU-SEFT-1991-14.

J. Tanskanen, P. Kotilainen, K. Melkas, J. Niittylahti, K. Kaski, *Track Finding with Boolean Nets*, Proceedings of the Second International Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics, at l'Agelonde, January 13-18 1992, World Scientific, 1992.

K. Melkas, J. Niittylahti, K. Kaski, Neural Nets for Signal and Data Processing in Particle Physics Experiments, Report 4-92, Microelectronics Laboratory, Tampere University of Technology, 1992.

S. J. Inkinen, Nonlinear Digital Signal Processing Using the DataWave Processor. In New Computing Techniques in Physics Research II, edited by D. Perret-Gallix, World Scientific Publishing Co., 1992.

S. J. Inkinen and Y. Neuvo, Base Line Normalization of High Energy Physics Detector Signals Using Sparse Median Operations. In Proc. IEEE Winter Workshop on Nonlinear Digital Signal Processing, January 17-20, 1993, Tampere, Finland, pp. 4.1:3.1-4.1:3.6.

S. J. Inkinen and Y. Neuvo, Base Line Normalization of High Energy Physics Detector Signals Using Median-Based Operations. Manuscript under preparation.

S. J. Inkinen, J. Niittylahti and P. Jarske, A Hybrid Nonlinear Deconvolution Operator for Pulse Position Extraction in Noisy Conditions. Submitted to European Conference on Circuit Theory and Design ECCTD-93, August 30 - September 3, 1993, Davos, Switzerland.

S. J. Inkinen and J. Niittylahti, *Trainable FIR - Order Statistic Hybrid Filters*. Submitted to IEEE Transactions on Circuits and Systems - II: Analog and Digital Signal Processing, April 1993.

S. J. Inkinen, High Energy Physics Detector Signal Processing Using a Video Signal Processor. Manuscript under preparation.

J. Niittylahti, H. Raittinen, J. Tanskanen, and K. Kaski, General Purpose Simulated Annealing on Hardware. In Proc. IEEE Winter Workshop on Nonlinear Digital Signal Processing, January 17-20, 1993, Tampere, Finland.

J. Niittylahti, H. Raittinen, and K. Kaski, Dynamically Configurable Combinatory Logic Array as Boolean Neural Network. Submitted to The 5th International Conference on Tools With Artificial Intelligence, Boston, Massachusetts. November 8-11, 1993.

J. Niittylahti and S. J. Inkinen, Software Tool for Stochastic Design of Linear-Nonlinear Hybrid Filters. To be submitted to The Third International Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics, Oberammergau, Germany, October 4-8, 1993.

J. Niittylahti and S. J. Inkinen, FIR Filter Design Under the Mean Absolute Error Criterion Using Sigmoidal Derivative Approximation. To be submitted to IEEE transactions on Circuits and Systems.

J. Niittylahti and S. J. Inkinen, FIR and Hybrid Filter Design Tools. A User's Manual, RD12 technical report.

J. Niittylahti, H. Raittinen, and K. Kaski, Software Tool for Designing Boolean Neural Networks. To be submitted to IEEE transactions on Neural Networks.

J. Niittylahti, H. Raittinen, and K. Kaski, *Simulated Annealing Hardware Tool*. To be submitted to IEEE transactions on Circuits and Systems.