The purpose of this paper is to present recent developments in the off-detector electronics of a PET (Positron Emission Tomography) system for mammography imaging. In particular, problems and solutions associated with the integration of its Data Acquisition Electronics (DAE) are targeted. Synchronism is a critical issue in the DAE system. A resynchronization module is proposed to solve communication problems on the internal asynchronous buses with limited degradation of the communication rate, as compared with fully synchronous solutions. Data processing covers 64 dual channels, 11 bits per channel, at a frequency of 100 MHz. The maximum DAE output rate is 220 MB/s, corresponding to 1 MCoincidence/s. The robustness of the proposed solutions has been validated with software simulation and hardware implementation. Results of test and validation on FPGAs, boards and buses are presented.
INTRODUCTION
PET (Positron Emission Tomography) systems are among the most effective imaging-based technologies for medical diagnosis. In this contribution, the final development of a PET system for mammography, PEM (Positron Emission Mammography), dedicated to the analysis of the female breast, is described. Emphasis is given to the integration aspects of the Data Acquisition Electronics (DAE), which is responsible for the off-detector acquisition and processing of the PEM system.
The main aspects of the development of the entire system have been reported earlier, namely, the high-level architecture, the study of the crystal detectors (Lecoq, 2002) (Abreu, 2006), the transducers that sense the light and transform it into energy pulses, the associated Front-End Electronics (Albuquerque, 2006), the off-detector electronics (Bento, 2006) (Leong, 2006) (Varela, 2005), the imaging reconstruction procedures and, of course, the mechanical aspects of the robot. In the identification of system requirements, software models have been used and extensive Monte Carlo simulations (Agostinelli, 2003) (Rodrigues, 2004) (Trindade, 2004) have been performed. This paper focuses on the difficulties associated with system integration, and on how those difficulties have been overcome. The main problems have been found in implementing adequate communication infrastructures that satisfy synchronism and performance requirements. In fact, the correct functionality and performance of the individual sub-systems were verified in stand-alone tests. Nevertheless, several modifications were required when integration took place, specifically on subsystems that communicate across different clock domains. A typical case is that of subsystems that access the buses. The purpose of this paper is to present innovative solutions which have been designed and implemented to solve those problems.
The paper is organized as follows. In section 2, the generic architecture of the PEM system is provided to highlight the communication infrastructure. In section 3, the DAE functionality and architecture are described, highlighting the sub-modules that are critical in terms of the communication environment. Section 3 also contains the proposal of new architectures for the modified modules that are responsible for the communication between the DAE and the Front-End electronics and between the DAE and the imaging reconstruction computer. In section 4, details of the physical implementation are provided. Section 5 presents the hardware validation results. Finally, in section 6, the main conclusions and future work are outlined.
PEM SYSTEM
The main objective of the system is to identify the presence of cancer cells in the female breast. Image reconstruction is used for this purpose. As is well known, image reconstruction requires large amounts of data. Therefore, the main characteristics of the PEM system are high data volumes and high data rates.
A key aspect is the need to guarantee that meaningful data is unequivocally identified. In order to understand what we refer to as meaningful data, it is necessary to understand the physics underlying PEM.
Human cells emit γ rays when a radioactive substance is injected into the human blood stream. When this occurs, 2 γ ray photons are emitted in opposite directions over a linear trajectory. Emission sources are detected by the intersection of trajectories. 2 'planes' of scintillating crystals detect the emitted γ ray photons. The crystals emit light that is afterwards converted into electric signals by Avalanche Photodiodes (APDs). Crystal arrays in the PEM scanner are organized in modules and submodules in a hierarchical structure (Amaral, 2007) (Matela, 2004). Data are captured through 12288 readout channels, organized in 296 identical detector modules and distributed over the two crystal planes.
Potentially meaningful data correspond to the simultaneous detection of γ ray photons in both crystal planes. In this case, we consider that a coincidence has occurred. The underlying principle of PEM system behavior is the identification of coincidences. The PEM scanner is a high-resolution system, capable of detecting breast tumors with diameters down to 2 mm (Albuquerque, 2006).
In Figure 1, the top-level architecture of the Clear-PEM system is depicted. Three main subsystems can be identified: the scanner, constituted by crystals and the associated Front-End (FE) electronics; the off-detector Data Acquisition Electronics (DAE); and finally, an external computer (PC), or array of computers, performing data storage and image reconstruction.
The FE electronics is an analogue/mixed signal system, responsible for first conditioning of the signal generated by the APDs, for the analogue/digital conversion and for the communication with the off-detector DAE system. DAE is responsible for digital data processing, in order to identify coincidences and to send meaningful data to the computer where image reconstruction takes place. Data integrity must be kept during processing and communication phases. Otherwise, the diagnosis result would be unreliable, leading to false positive or false negative results. This justifies the critical relevance of the problems tackled in this contribution.
DAE SYSTEM

DAE Requirements and Functionality
The main functionality of the DAE is the identification of relevant data and the transmission of that data to the image reconstruction computer. DAE specific requirements are as follows. The system should support a data acquisition rate of 1 million events per second, under a total single photon background rate of 10 MHz (Albuquerque, 2006). An event or hit (a photoelectric or Compton event, according to the associated energy) is defined as the interaction of a γ ray with a crystal.
Data to be analyzed and processed correspond to the hitting energy in the different crystals, as a consequence of these interactions. Relevant data is associated with relevant events (coincidences). Hence, a relevant event is characterized by the simultaneous occurrence of hits in both crystal planes.
Communication Challenges
Data is transmitted from the FE to the DAE by a large number of connecting cables, which introduce various delays, noise, and possibly, signal degradation. Therefore, physical interconnections pose a challenge to system integration.
Another challenge is to guarantee the correctness of data communication between DAE and the data storage and image reconstruction computer, PC.
A third challenge is to guarantee error-free communication among DAE boards and chips through the DAE internal buses, while not degrading system performance. In the DAE, the minimum required transmission rate is 1 MCoincidence/s at 100 MHz, which corresponds to a data rate of 220 MB/s.
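The relation between the two figures quoted above can be checked with a short calculation. The sketch below only divides the numbers given in the text; the interpretation of 1 MB as 10^6 bytes and all variable names are assumptions made for illustration, and the resulting per-coincidence size is implied by the quoted rates rather than stated in the text.

```python
# Figures quoted in the text.
COINCIDENCE_RATE_PER_S = 1_000_000      # 1 MCoincidence/s
OUTPUT_RATE_BYTES_PER_S = 220_000_000   # 220 MB/s, assuming 1 MB = 10**6 bytes
CLOCK_HZ = 100_000_000                  # 100 MHz system clock

# Average data volume that each coincidence may occupy on the output link.
bytes_per_coincidence = OUTPUT_RATE_BYTES_PER_S / COINCIDENCE_RATE_PER_S

# Average number of clock cycles available to move one coincidence.
cycles_per_coincidence = CLOCK_HZ / COINCIDENCE_RATE_PER_S

# Average number of bytes that must be moved per clock cycle.
bytes_per_cycle = OUTPUT_RATE_BYTES_PER_S / CLOCK_HZ

print(bytes_per_coincidence, cycles_per_coincidence, bytes_per_cycle)
```

Under these assumptions, each coincidence corresponds on average to 220 bytes and 100 clock cycles, that is, slightly more than 2 bytes per cycle on the internal buses.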
DAE Architecture
A comprehensive description of the DAE system (Figure 2) has been published elsewhere. However, for the sake of clarity, a brief description of its constituent elements is provided here. The DAE maps the organization of the crystals in the scanner. As a consequence, the DAE is constituted by 4 DAQ (Data Acquisition) boards and 1 TGR/DCC (Trigger/Data Concentrator) board, communicating with each other through buses.
The DAQ boards carry out the first data filtering to identify potentially useful data out of all the data that arrives from the FE. As mentioned, the criterion for this classification is the detection of a coincidence, that is, interactions of photons with crystal pairs within a given discrete time interval, to which an identifying Time Tag is associated. Data is classified either as (1) useful, in which case it is stored, or as (2) noise, in which case it is discarded.
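The classification criterion can be illustrated with a minimal software sketch. It is not the trigger algorithm implemented in the DAE FPGAs, which is not detailed here; it only shows the principle of keeping the Time Tags for which both crystal planes report a hit. The tuple layout `(time_tag, plane, energy)` and the function name `find_coincidences` are hypothetical.

```python
from collections import defaultdict

def find_coincidences(hits):
    """Group hits by Time Tag and keep the tags seen in both planes.

    hits: iterable of (time_tag, plane, energy), with plane equal to 0 or 1.
    Returns {time_tag: (energies_plane0, energies_plane1)}.
    """
    by_tag = defaultdict(lambda: ([], []))
    for time_tag, plane, energy in hits:
        by_tag[time_tag][plane].append(energy)
    # A tag is a coincidence candidate only if both planes recorded a hit.
    return {tag: planes for tag, planes in by_tag.items()
            if planes[0] and planes[1]}

# Illustrative input values (not measured data).
example_hits = [
    (10, 0, 511), (10, 1, 498),   # same Time Tag in both planes: kept
    (11, 0, 300),                 # single hit: discarded as noise
    (12, 1, 505),                 # single hit: discarded as noise
]
coincidences = find_coincidences(example_hits)
print(coincidences)
```

In this example only Time Tag 10 survives the filter, since it is the only tag with hits in both planes.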
From the technological point of view, the main functionality is implemented using reconfigurable devices, namely FPGA technology, due to its flexibility. Each DAQ board houses two 4-million gate DAQ FPGAs, each containing a read-out controller, the DAQ ROC module. At present, the DAE houses 4 DAQ boards and one TGR/DCC board.
The TGR/DCC board is responsible for the identification of coincidences. When a coincidence is detected, the TGR/DCC generates a trigger signal that notifies the DAQ boards of the situation, picks up the corresponding data and concentrates it according to a given protocol, to be sent to the external PC for image reconstruction. The DAE/PC communication is based on a commercial bus.
The 4 DAQ boards and the Trigger board communicate through buses, namely, the GBUS (Generic Bus) and the DBUS (Dedicated Bus) (Figure 2).
Synchronism between FE and DAE
To guarantee that a detected coincidence is effectively a coincidence, it is mandatory to know, without ambiguity, at what time a given data item has been generated. For this, it is mandatory to guarantee system synchronism. The PEM system is a GALS (Globally Asynchronous, Locally Synchronous) system, although the FE and the DAE are driven by a system clock.
Data that leaves the FE, at the same time, refer to the same temporal mark, that is, the same Time Tag. Moreover, all clock signals driving the system behavior should have the same frequency and phase.
However, channel links introduce delays. In order to deal with this situation, data is transmitted from the FE with the corresponding clocks.
Although identical cables and components have been used in identical paths, there is no guarantee that the clock phase is identical for all local components where the clock is restored. In fact, small physical differences in the cables and in the local PLLs, delays induced by variable thermal maps, clock jitter and clock skew may result in small phase differences in the clocks arriving through the different cables. If these clock signals are very close to the internal clock edge, it may occur that one is captured in a given clock cycle and another in the following clock cycle.
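The effect described above can be made concrete with a small numerical sketch. It assumes an idealized receiver that captures an incoming edge at the first internal clock edge at or after its arrival, and it ignores setup and hold times and metastability. The 10 ns period corresponds to the 100 MHz clock mentioned in the text; the arrival times and the function name `capture_cycle` are hypothetical.

```python
def capture_cycle(arrival_ps, period_ps):
    """Index of the first internal clock edge at or after the arrival time.

    Times are integers in picoseconds to avoid floating-point rounding.
    """
    return -(-arrival_ps // period_ps)   # ceiling division

PERIOD_PS = 10_000                       # 100 MHz internal clock

# Two signals launched together that arrive 200 ps apart, on either side
# of the internal clock edge at 20 ns (illustrative values).
cycle_early = capture_cycle(19_900, PERIOD_PS)
cycle_late = capture_cycle(20_100, PERIOD_PS)

print(cycle_early, cycle_late)
```

A phase difference of only 200 ps is enough to place the two signals in consecutive clock cycles (2 and 3 in this example), which is why the phase must be realigned before Time Tags can be compared.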
To solve the FE/DAE synchronization problem, the DAE uses a specific signal, Sync, to guarantee data synchronism, and an auxiliary synchronization module to guarantee that the synchronism is restored. Sync behaves as a feedback loop that makes it possible to detect and to synchronize all the active inputs. The description of this sub-system is outside the scope of this paper, as it has been published elsewhere.
Internal Buses Communication
As mentioned, another challenge in the integration phase has been the need to guarantee robust communication among DAE sub-systems. In fact, each sub-system (e.g., an FPGA) is a clock domain with its own local clock. Many solutions regarding GALS systems may be found in the literature (Xin, 2005) (Beigne, 2006) (Ogras, 2007).
In our system, the data transmission rate is at least 220 MB/s. An adequate solution for dealing with high data rates would be to use a synchronous bus (Lee, 2007). However, to guarantee robust data transfer, the asynchronous solution is preferable. Therefore, a trade-off between speed and robustness is required.
In the PEM system, a slightly lower data transfer rate is acceptable in order to achieve the required robustness. This means that a small performance loss (when compared with standard synchronous solutions), in conjunction with a small increase in silicon area, is acceptable.
Therefore, in the PEM system, subsystems communicate through asynchronous buses, which guarantee correct communication at the highest data rate among these multiple clock domains.
Resynchronization Module
In the DAE, the DBus (Dedicated Bus) and the GBus (Generic Bus), shown in Figure 2, are asynchronous buses. In this case (Figure 3), two pipeline stages are used to guarantee data synchronization, constituted by 2 chains of 2 flip-flops each, namely D1a, D1b and D2a, D2b. Additional flip-flops are used at the input of each clock domain in order to assess whether arriving data is reliable. When it is, an event is generated and data communication is started. The bus resynchronization timing diagrams (Figure 4) show that the communication rate is not decreased as significantly as it would be in a fully asynchronous solution. This comes from the fact that each data item carries with it the corresponding 'clock'/strobe, avoiding the need to wait for the handshake protocol associated with asynchronous communication.
Moreover, clock domains can work in parallel with the communication procedure, since they are locally synchronous domains. By doing so, the speed of synchronous communication is almost reached. As in an asynchronous solution, communication is correctly performed regardless of the physical length of the bus (limited by the line drivers' capability). This is a considerable advantage of our methodology.
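The principle of resynchronizing a strobe that accompanies the data can be sketched with a simple behavioral model. This is a generic two-flip-flop synchronizer followed by a rising-edge detector, written under simplifying assumptions; it is not the circuit of Figure 3, it does not model metastability, and it assumes the sender holds the data stable while the strobe is asserted. The names `TwoFlopSynchronizer` and `receive` are hypothetical.

```python
class TwoFlopSynchronizer:
    """Chain of two flip-flops clocked by the receiving domain."""

    def __init__(self):
        self.d_a = 0   # first flip-flop (may go metastable in real hardware)
        self.d_b = 0   # second flip-flop (output used by the receiver logic)

    def clock(self, async_in):
        # On one clock edge both flip-flops update together:
        # d_b takes the old value of d_a, d_a samples the input.
        self.d_b, self.d_a = self.d_a, async_in
        return self.d_b


def receive(samples):
    """Latch the data on each rising edge of the resynchronized strobe.

    samples: list of (strobe, data) as seen on the bus at each edge of the
    receiver clock; data must be held stable while strobe is 1.
    """
    sync = TwoFlopSynchronizer()
    previous = 0
    received = []
    for strobe, data in samples:
        current = sync.clock(strobe)
        if current and not previous:   # rising edge: data is now reliable
            received.append(data)
        previous = current
    return received


# Two words sent with their own strobe (illustrative values).
bus_samples = [
    (0, None), (1, 0xA5), (1, 0xA5), (1, 0xA5), (0, None),
    (1, 0x3C), (1, 0x3C), (1, 0x3C), (0, None), (0, None),
]
words = receive(bus_samples)
print([hex(w) for w in words])
```

In this model no acknowledge signal travels back to the sender, so each transfer costs only the delay through the flip-flop chain. This is the reason given in the text for the rate staying close to that of a synchronous bus.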
DAE/PC Interconnection
Three solutions have been analyzed and tested for the communication link between the DAE and the external PC in order to meet speed and data rate requirements, namely, PCI, USB and the SLINK Fed Kit (CERN). Finally, a mixed solution based on the USB and SLINK protocols has been adopted.
The USB solution is based on the Cypress CY7C68013A microcontroller. Although the bandwidth is limited to 60 MB/s (maximum theoretical data rate of the USB 2.0 protocol), the USB link provides bidirectional communication, able to handle command requests and the consequent replies (all of which are limited to 64-bit packets).
The SLINK (CERN S-LINK Fed Kit) is a proprietary unidirectional protocol that, theoretically, achieves up to 800 MB/s at full speed. The SLINK guarantees raw data transmission from the DAE to the PC.
This USB+SLINK solution allows simultaneous communication through both links. Separating data and command communication channels minimizes the resources needed on the acquisition PC.
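The split between the two links follows from comparing the bandwidth figures quoted above with the required DAE output rate. The sketch below only restates that comparison using theoretical maxima; sustained rates on real hardware are lower, and the helper name `can_carry` is hypothetical.

```python
USB2_SIGNALING_MBIT_S = 480                  # USB 2.0 high-speed signaling rate
USB2_MAX_MB_S = USB2_SIGNALING_MBIT_S / 8    # 60 MB/s theoretical maximum
SLINK_MAX_MB_S = 800                         # theoretical maximum quoted in the text
DAE_OUTPUT_MB_S = 220                        # required DAE output rate

def can_carry(link_mb_s, required_mb_s):
    """True if the theoretical link bandwidth covers the required rate."""
    return link_mb_s >= required_mb_s

usb_ok = can_carry(USB2_MAX_MB_S, DAE_OUTPUT_MB_S)
slink_ok = can_carry(SLINK_MAX_MB_S, DAE_OUTPUT_MB_S)

print(USB2_MAX_MB_S, usb_ok, slink_ok)
```

With these figures, USB alone cannot carry the event data but is sufficient for short command packets, while the SLINK has headroom for the raw data stream.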
DAE IMPLEMENTATION
Transmission between the FE and the DAE is carried out at a frequency of 300 MHz (3.3 ns period) using 32 cables. The correction range is ±1.7 ns (±1/2 cycle).
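The timing figures above follow directly from the link frequency, as the short calculation below shows. The function name `period_ns` is hypothetical, and the values are rounded to one decimal place as in the text.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

link_period = period_ns(300)       # about 3.33 ns
half_cycle = link_period / 2       # about 1.67 ns

print(round(link_period, 1), round(half_cycle, 1))
```

A correction range of half a cycle in each direction therefore covers one full period of the 300 MHz link clock.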
Some electrical problems have been faced in the choice of the interconnecting cables. The ideal would be to use twisted-pair, individually shielded cables. However, the cross section and the flexibility of such a set of cables are incompatible with the mechanical robot. Therefore, a trade-off between electrical characteristics and mechanical requirements was necessary.
The main electrical problems that have been faced are related to the physical distance between the FE electronics and the DAE (around 6 meters). The most relevant electrical problems are skew and DC balance.
Only Amphenol SpectraStrip cables meet the system requirements and therefore have been used in all tests. Nevertheless, some modifications have still been made to this cable in order to improve its electromagnetic characteristics.
DAQ functionality is implemented with 8 Xilinx™ xc2v4000-4bf957 FPGAs (2 FPGAs per DAQ board), i.e., FPGAs with 4 million equivalent gates and 957 pins. The TGR/DCC functionality is implemented with one Xilinx™ xc2v3000-4bg728 FPGA, with 3 million equivalent gates and 728 pins.
Communication between the FE and DAE subsystems is carried out through LVDS (Low Voltage Differential Signaling) channel links. The LVDS channel link de-serializers convert the high-speed, serialized, long-distance communication lines into the standard LVTTL (Low Voltage Transistor-Transistor Logic) electrical signals at the input/output of the DAQ boards' logic components.
The DAE system is shown in Figure 5. As mentioned, the DAE is constituted by 5 boards: 4 DAQ boards and 1 TGR/DCC board, communicating among themselves through 2 internal buses (DBus and GBus). Transceivers are used as the FPGAs' gateways to the internal buses. Transceivers also serve as buffers for these buses.
VALIDATION RESULTS
Validation conditions are as follows. Input data (input events) are provided by a previously developed system that emulates the Front End (FE) functionality. Output data has been obtained with the DAE hardware.
The test strategy is as follows. The electrical test of the DAE boards has been carried out prior to the FPGAs functional test. The functionality of the FPGAs has been validated in silicon (DAE system and board test).
In Figure 6, the event rate of the different buses is represented. It is worth noting that these results have been obtained with a working frequency of 50 MHz (limited by the current FE hardware version). This means that if we work at 100 MHz, the event rate doubles. Therefore, validation results are beyond system specifications. The GBus shows a linear dependence of the event rate on the input rate until almost 700 kevent/s. After this value, the curve starts to level off until reaching a maximum transmission rate of 800 kevent/s. After that, it starts dropping. This is caused by the inefficiency of the arbiter, which is now under revision. In the linear region, this inefficiency is masked by the fact that the required bandwidth is below the maximum.
It is expected that, with a more efficient arbiter, the GBus will remain linear up to higher input rates and its event rate will remain almost constant after that.
SLINK performance in this test is only limited by the GBus throughput.
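The claim that the results exceed the specification can be checked by scaling the measured GBus maximum to the nominal clock frequency. The sketch below assumes, as stated in this section, that the event rate scales linearly with the working frequency; the function name `scale_rate` is hypothetical.

```python
MEASURED_MAX_EVENTS_PER_S = 800_000   # GBus maximum measured at 50 MHz
TEST_FREQ_MHZ = 50                    # limited by the current FE hardware
NOMINAL_FREQ_MHZ = 100                # DAE design frequency
SPEC_EVENTS_PER_S = 1_000_000         # required acquisition rate

def scale_rate(rate, from_mhz, to_mhz):
    """Scale an event rate linearly with clock frequency (assumption)."""
    return rate * to_mhz / from_mhz

projected = scale_rate(MEASURED_MAX_EVENTS_PER_S, TEST_FREQ_MHZ, NOMINAL_FREQ_MHZ)
margin = projected / SPEC_EVENTS_PER_S

print(projected, margin)
```

Under this assumption the projected maximum at 100 MHz is 1.6 million events per second, that is, 1.6 times the specified rate.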
CONCLUSIONS AND FUTURE WORK
In this paper, some problems faced during the integration of the PEM DAE system have been reported. Emphasis has been put on communication-related problems, particularly those related to synchronism. Some aspects of the communication between the DAE and the FE electronics, and between the DAE and the imaging reconstruction computer, have been described.
More detail has been given on the solution proposed for guaranteeing the correctness of the communication through the DAE internal buses. With this solution, both the robustness of asynchronous communication and the speed of synchronous solutions are almost reached.
Experimental results show that the achieved performance surpassed system specifications.
Additional performance improvements will be obtained with the review of the GBus arbiter, which is currently under development.
The system architecture is also under review in order to allow the integration of more data acquisition boards. This will allow more flexibility in the scanner geometry, thus extending the applicability of the system to medical imaging of other regions of the human body. Results will be reported in the future.
