Introduction
The Pierre Auger Observatory is a ground-based detector located near Malargüe, Argentina (Auger South), at 1400 m above sea level. It is dedicated to the detection of ultra-high-energy cosmic rays with energies above 10^18 eV with unprecedented statistical and systematic accuracy. The main goal of cosmic-ray investigations in this energy range is to determine the origin and nature of the particles produced at these enormous energies, as well as their energy spectrum. These cosmic particles carry information complementary to that of neutrinos, photons and even gravitational waves. They also provide an extremely energetic beam for the study of particle interactions at energies orders of magnitude above those reached at terrestrial accelerators (Abraham J. et al., 2004). The flux of cosmic rays above 10^19 eV is extraordinarily low: on the order of one event per square kilometer per century. Only detectors of exceptional size, thousands of square kilometers, can acquire a significant number of events. The nature of the primary particles must be inferred from the properties of the associated extensive air showers (EAS). The Pierre Auger Observatory consists of a surface detector (SD) array spread over 3000 km^2, which measures the charged particles of EAS and the lateral density profile of the muon and electromagnetic components in the shower front at ground level, and of 24 wide-angle Schmidt telescopes installed at 4 locations on the boundary of the ground array, which measure the fluorescence light associated with the evolution of air showers: their growth and subsequent decline during development. Such "hybrid" measurements allow cross-calibration between the different experimental techniques, controlling and reducing the systematic uncertainties. Very inclined showers are different from the ordinary vertical ones.
At large zenith angles the slant atmospheric depth to ground level is enough to absorb the part of the shower that follows from the standard cascading interactions, both of electromagnetic and hadronic type. Only penetrating particles such as muons and neutrinos can traverse the atmosphere at large zenith angles to reach the ground or to induce secondary showers deep in the atmosphere and close to an air shower detector.
The ability to analyze inclined showers with zenith angles larger than 60°, induced by neutrinos or photons, substantially increases the acceptance of the surface array and opens a part of the sky that was previously inaccessible to the detector. These showers provide a new tool for the interpretation of ultra-high-energy cosmic rays because they probe muons of significantly higher energies than vertical showers do. Spectral triggers, which offer pattern recognition in the frequency domain, may improve on the standard detection technique based on coincidences of signals above some thresholds from several PMT channels in the time domain. The "old" muon shower fronts have only a small longitudinal extension, which leads to detector signals that are also short in time. To identify these showers in the presence of "young" showers with a large electromagnetic component, the trigger may need a very good spectral sensitivity to the fast muon component. The main advantage of the spectral trigger is its scaling feature: the set of DCT coefficients normalized to the 1st harmonic depends only on the shape of the signals, not on their amplitudes. Triggers sensitive to the shape of the FADC traces may detect events with the expected characteristics, i.e. the quickly attenuated, very short peaks related to the flat muonic fronts of very inclined showers. Independence of the amplitude is especially promising for Auger North, where the coincidence technique cannot be used because each surface detector has a single PMT. In order to keep a reasonable rate for the 1st level trigger (ca. 100 Hz), the threshold there would have to be much higher than, for example, in the Pierre Auger Observatory, where 3-fold coincidences suppress the noise.
Triggers
Two different triggers are currently implemented at the 1st level. The first is a single-bin trigger generated as a 3-fold coincidence of the 3 PMTs at a threshold equivalent to 1.75 vertical equivalent muons. The estimated current for a Vertical Equivalent Muon (I_VEM) is the reference unit for the calibration of FADC traces and corresponds to ca. 50 ADC-counts. This trigger has a rate of about 100 Hz. It is used mainly to detect fast signals, which also correspond to the muonic component generated by horizontal showers. The single-bin trigger is generated when the input signal is above fixed thresholds calculated in the micro-controller during the calibration process. It is the simplest trigger and is useful for high-level signals. The second is the Time over Threshold (ToT) trigger, which requires at least 13 time bins above a threshold of 0.2 I_VEM. A pre-trigger (a "fired" time bin) is generated if a coincidence of any two channels appears in a sliding time window of 120 × 25 ns. This trigger has a relatively low rate of about 1.6 Hz, which is the expected rate for two muons crossing the Auger surface detector. It is designed mainly to select small signals that are spread in time, typical of distant high-energy EAS or of low-energy showers, while ignoring the single-muon background (Abraham J. et al., 2010). Cherenkov light generated by very inclined showers crossing the Auger surface detector can reach the PMT directly, without reflections from the Tyvek liner. Especially for "old" showers the muonic front is very flat. Together these effects produce a very short direct light pulse on the PMT and, in consequence, a very short rise time of the PMT response. For vertical or weakly inclined showers, where the geometry does not allow the Cherenkov light to reach the PMT directly, the light pulse is collected after many reflections from the tank walls. Additionally, showers that have developed through a smaller slant depth are relatively thick.
These effects give a PMT signal that is spread in time and rises relatively slowly. Hadron-induced showers with a dominant muon component give an early peak with a typical rise time of mostly 1 to 2 time bins (at 40 MHz sampling) and a decay time of the order of 80 ns (Aglietta et al., 2005). The estimation of the rise time of the front on the basis of one or two time bins is rather rough. A rise time calculated from two time bins may be overestimated because of the low sampling rate and the quantization error in time; a higher time resolution would be favorable. The expected shape of the FADC traces suggests using a spectral trigger instead of a pure threshold analysis in order to recognize the trace shape characteristic of very inclined showers. The monitoring of the shape would include the analysis of both the rising edge and the exponentially attenuated tail. A very short rise time together with a relatively quickly attenuated tail could be a signature of very inclined showers. We observe numerous very inclined showers that cross the full array but "fire" only a few surface detectors (Fig. 1). For those showers many more detectors should have been hit. The muonic front probably produces PMT signals that are not high enough to generate 3-fold coincidences, since some of the signals are below the thresholds (see Fig. 2). This may be the reason for the "gaps" in the array of activated surface detectors.
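The two 1st level triggers described at the start of this section can be sketched in a few lines. This is only an illustration under stated assumptions: the traces are taken to be pedestal-subtracted ADC values, a time bin counts as "fired" when any two of the three PMTs are above the ToT threshold, and the function names are hypothetical rather than taken from the Auger firmware.

```python
from collections import deque

I_VEM = 50  # ADC-counts per vertical equivalent muon (approximate value quoted in the text)


def single_bin_trigger(pmt_bins, threshold=1.75 * I_VEM):
    """3-fold coincidence: all three PMTs above the threshold in the same time bin."""
    return all(adc > threshold for adc in pmt_bins)


def tot_trigger(traces, threshold=0.2 * I_VEM, window=120, min_bins=13):
    """Time over Threshold (assumed reading of the text).

    Returns the index of the first time bin at which at least `min_bins`
    of the last `window` bins were "fired", or None if that never happens.
    """
    fired = deque(maxlen=window)          # sliding window of 120 bins (120 x 25 ns)
    for i, bins in enumerate(zip(*traces)):
        n_above = sum(adc > threshold for adc in bins)
        fired.append(n_above >= 2)        # coincidence of any two channels
        if sum(fired) >= min_bins:
            return i
    return None


# A small but long signal in two PMTs fires the ToT, a single sharp spike does not.
flat = [0] * 20 + [15] * 13 + [0] * 20
spike = [0] * 20 + [200] + [0] * 20
tot_long = tot_trigger([flat, flat, [0] * len(flat)])
tot_spike = tot_trigger([spike, spike, spike])
```

The spike, on the other hand, would satisfy the single-bin trigger, which is the complementary behavior the text describes.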
Discrete Fourier Transform vs. Discrete Cosine Transform
There are several variants of the DCT with slightly modified definitions. The DCT-I is exactly equivalent (up to an overall scale factor of 2) to a DFT of 2N-2 real numbers with even symmetry. The most commonly used form of the Discrete Cosine Transform is the DCT-II.
(Figure caption fragment: for all very inclined showers the rising edge corresponds to one or two time bins.)
for k ≥ 1. The DCT-III form is sometimes simply referred to as "the inverse DCT" (IDCT). A variant of the DCT-IV, in which data from different transforms are overlapped, is called the Modified Discrete Cosine Transform (MDCT). The DCT is a Fourier-related transform similar to the DFT, but using only real numbers. DCTs are equivalent to DFTs of roughly twice the length operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even), where in some variants the input and/or output data are shifted by half a sample. The DCT-II and DCT-IV are considered an alternative approach to the FFT. In fact, the FFT routine can be fed in an interleaved mode, with even samples treated as real data and odd samples as imaginary data. A trigger based on the Discrete Fourier Transform (DFT, Radix-2 FFT) (Szadkowski, 2006) has already been implemented in the 3rd generation of the Front-End Board (FEB) based on the Altera® Cyclone™ chip (Szadkowski, 2005b). However, for real signal
Pedestal independence
The analog section of the FEB has been designed to have a pedestal of ca. 10 % of the full FADC range in order to investigate undershoots. However, the pedestal is relatively sensitive to temperature; its daily variation may reach 5 ADC-counts. A pedestal-independent trigger is therefore very welcome. Let us consider a signal with a constant pedestal:
Due to symmetry and parity of the cosine, we get for odd and even indices respectively:
By a recursion, repeating (5) we get finally
Consequently, for k > 0 the DCT coefficients are independent of the pedestal.
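The pedestal independence can be checked numerically. The sketch below assumes the unnormalized DCT-II definition (any overall normalization would give the same conclusion) and uses a hypothetical 16-bin trace with two pedestal bins, a sharp rising edge and an exponential tail; the trace parameters are illustrative only.

```python
import math


def dct2(x):
    """Unnormalized DCT-II: X_k = sum_n x_n * cos(pi * (2n + 1) * k / (2N))."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts)) for n in range(n_pts))
        for k in range(n_pts)
    ]


# Hypothetical trace: two bins at the pedestal level, then an exponentially attenuated pulse.
signal = [0.0, 0.0] + [100.0 * math.exp(-0.3 * n) for n in range(14)]
pedestal = 40.0  # constant offset in ADC-counts (illustrative value)

plain = dct2(signal)
shifted = dct2([s + pedestal for s in signal])

# Only X_0 moves (by N * pedestal); every coefficient with k > 0 is unchanged.
max_diff = max(abs(a - b) for a, b in zip(plain[1:], shifted[1:]))
```

The reason is that the sum of cos(π(2n+1)k/(2N)) over n = 0, ..., N-1 vanishes for every k between 1 and N-1, so a constant offset contributes only to k = 0.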
Scaling
The DCT algorithm has a significant advantage over the FFT. The structure of the DCT coefficients is much simpler to interpret and to use in a trigger implementation than the structure of the real and imaginary FFT coefficients (compare the 4th row, with the FFT data, and the 2nd row, with the DCT coefficients, in Fig. 3). For the exponentially attenuated signals from the PMTs, the higher DCT coefficients (scaled to the 1st harmonic) are almost negligible, while both the real and imaginary parts of the FFT (scaled to the modulus of the 1st harmonic) give relatively significant contributions and are not suitable for triggering.
When a peak appears in the purely attenuated signal (last column in Fig. 3), the structure of the DCT changes dramatically and the trigger condition immediately expires, while the moduli of the FFT components hardly change. The structure of the FFT harmonics for the last graph in Fig. 3 would be more suitable for a trigger (an almost negligible imaginary part for the higher harmonics and also relatively low real harmonics); however, it corresponds precisely to the situation in which the purely attenuated signal is distorted by a peak on the tail and the trigger condition has been violated.
Fig. 3. [...] absolute values of the DFT (3rd row) and the corresponding real (Re) and imaginary (Im) parts (4th row). The 1st column shows the pulse when two time bins are at the pedestal level (shape A), the 2nd when only one time bin is still at the pedestal level (shape B), and the 3rd when the pulse fills the whole range of the investigated shift registers (shape C). For a signal shape corresponding to exponential attenuation (shape C), the contribution of the higher DCT coefficients is small and suitable for a trigger. When a peak appears in the declining signal (last column, shape D), the DCT coefficients immediately exceed the assumed, relatively narrow acceptance range for triggers. The DFT coefficients (Re and Im in the 4th row) have a structure similar to that of the DCT; however, for the purely exponentially declining signal the higher real DFT harmonics have relatively high values and are not suitable for triggering. The absolute values of the DFT components (3rd row) are clearly insensitive to the conditions discussed.
The plot in the 4th row and 3rd column of Fig. 3 shows the contribution of the DFT coefficients relative to the absolute value of the 1st harmonic. For an exponentially attenuated signal (with attenuation factor β) the contributions of both the real and imaginary coefficients decrease monotonically, with a significant value for all real coefficients. From the DFT definition we get:
where φ = 2πk/N. Calculating (8) for the boundary factors β = (0.28, 0.42) (from the Auger database) and for k = N/2 (the lowest value in the monotonically decreasing chain), we obtain for N = 16: ξ = 24% and 28%, respectively. These values are too large to be used for triggering. Even an extension of the DFT size does not help very much: for N = 32 we still get large values, ξ = 17% and 23%. The almost vanishing higher DCT coefficients provide much more natural trigger conditions. A 32-point FFT (roughly equivalent to a 16-point DCT) does not offer better stability.
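The quoted values of ξ can be checked with a direct DFT. The sketch below rests on two assumptions that are not spelled out in the surviving text: the signal is a pure exponential x_n = exp(-βn) starting in the first bin, and ξ is the modulus of the k = N/2 harmonic divided by the modulus of the 1st harmonic. The helper names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) discrete Fourier transform, sufficient for N = 16 or 32."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def xi_half(beta, n_pts):
    """|X_{N/2}| / |X_1| for the assumed signal x_n = exp(-beta * n), n = 0..N-1."""
    spectrum = dft([math.exp(-beta * n) for n in range(n_pts)])
    return abs(spectrum[n_pts // 2]) / abs(spectrum[1])


# Evaluate the ratio for the boundary attenuation factors and both DFT sizes.
ratios = {(n_pts, beta): xi_half(beta, n_pts) for n_pts in (16, 32) for beta in (0.28, 0.42)}
```

Under these assumptions the ratios round to the 24%, 28%, 17% and 23% quoted in the text, which suggests that this is the quantity meant by ξ there.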
General DCT algorithm
The DCT of a real signal x_n gives independent spectral coefficients for k = 0, 1, ..., N-1, with f_k also ranging from zero to f_sampl/2 but on a grid of f_sampl/(2N). Compared with the DFT, the DCT thus gives twice the frequency resolution. Splitting the sum (1) and redefining the indices we get:
Due to symmetry of the cosine function
We can introduce the new set of variables:
DCT coefficients can be separated for even and odd indices respectively:
Let us notice that (13) for even indices has the same structure as (1). After the 1st step of minimization, the terms of the sum (13) for odd indices depend only on odd multiples of the fractional angle
Using the following trigonometric identity
the fractional angles can be increased by a factor of 2 for β = kπ/(2N). Thus:
Let us notice that: 1) cos(kπ) = (−1)^k for n = N-1, hence a pure A_n coefficient survives; 2) cos(kπ/2) = 0 for n = N/2, because k is odd; 3) the remaining indices appear in the cosine terms twice, in the A_{n+1} and A_n coefficients, which allows introducing the new set of variables
The range of the B_n indices is continuous and can again be split into even and odd parts. The above procedure can be repeated recursively.
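The recursive even/odd splitting can be illustrated with a short reference implementation. Note the assumptions: the first step (sums and differences of mirrored samples, with the even-index coefficients given by a half-length DCT of the sums) follows the derivation above, but the odd half is handled here with a Lee-type recursion, which divides the differences by 2cos(...) before recursing and then adds neighbouring outputs. The text instead adds neighbouring inputs and keeps the scaling factors at the outputs, so this is a related variant, not the factorization used in the FPGA. The transform is unnormalized.

```python
import math


def dct2_direct(x):
    """Reference: X_k = sum_n x_n * cos(pi * (2n + 1) * k / (2N))."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts)) for n in range(n_pts))
        for k in range(n_pts)
    ]


def dct2_fast(x):
    """Recursive DCT-II for a power-of-two length (Lee-type even/odd split)."""
    n_pts = len(x)
    if n_pts == 1:
        return list(x)
    half = n_pts // 2
    # Sums of mirrored samples feed the even-index coefficients.
    sums = [x[i] + x[n_pts - 1 - i] for i in range(half)]
    # Scaled differences of mirrored samples feed the odd-index coefficients.
    diffs = [
        (x[i] - x[n_pts - 1 - i]) / (2.0 * math.cos(math.pi * (i + 0.5) / n_pts))
        for i in range(half)
    ]
    even = dct2_fast(sums)
    odd = dct2_fast(diffs)
    out = []
    for i in range(half - 1):
        out.append(even[i])              # X_{2i}
        out.append(odd[i] + odd[i + 1])  # X_{2i+1}
    out.append(even[-1])                 # X_{N-2}
    out.append(odd[-1])                  # X_{N-1}
    return out


# Deterministic 16-point test vector (arbitrary values).
samples = [float((7 * n * n + 3 * n) % 23) for n in range(16)]
worst = max(abs(a - b) for a, b in zip(dct2_fast(samples), dct2_direct(samples)))
```

A software recursion like this is convenient for cross-checking a hardware pipeline: both should agree with the direct definition up to rounding.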
8-point DCT algorithm
For N = 8, according to formulae (12) and (17), we get:
For even indices the DCT coefficients are expressed as follows:
where
For odd indices, with the support of (15), we get:
A direct approach from the classical definition requires a single multiplication for even indices (20) and 5 multiplications for odd indices (22). The scaled coefficients S_{1,7,3,5} X̄_{1,7,3,5} in (22) can be expressed in an equivalent way introduced by Arai, Agui and Nakajima (AAN, 1988), which reduces the number of multiplications from 5 to only 4.
Fig. 4. A fast DCT algorithm developed in 1988 by Arai, Agui and Nakajima.
Minimizing the number of multiplications is one of the fundamental goals in long-running numerical calculations. Reducing the number of product terms significantly speeds up sophisticated calculations, because a single multiplication requires several processor clock cycles. In powerful FPGA chips, however, multiplications can be performed in very fast dedicated DSP blocks in a single clock cycle. Signals processed in parallel threads in a hardware implementation of a pipeline design have to be synchronized with each other. The pipeline approach requires additional shift registers for synchronization, also for signals that are not currently being processed, and such synchronization needs additional resources. Fig. 5 shows the part of the pipeline chain corresponding to the odd indices of the DCT coefficients (lower part of Fig. 4). A direct implementation of the pure AAN algorithm requires 7 pipeline stages, which use additional shift-register resources for synchronization in operations such as X(t+1) = X(t). In a numerical calculation in a processor, the data simply wait for the next execution cycle. The D_64 block contains a cascade of a sum and a multiplication. Implementing the cascade in a single clock cycle of FPGA logic significantly reduces the speed. Additionally, the lpm_add_sub mega-function from the Altera® library of parameterized modules (LPM) does not support the inversion of a sum, i.e.
These operations would have to be performed in cascade by an adder and a sign inversion. Cascaded operations performed in the same clock cycle significantly lower the overall registered performance. A simple redefinition of the nodes removes the difficulties mentioned above. The B_4 node, defined as the sum of the A_{4,5} nodes, requires only a simple lpm_add_sub mega-function. The D_4 node, now with an inverted sign, allows using lpm_add_sub in E_4 to perform a subtraction. The D_64 node from Fig. 5 can be split into the subtraction C_64 and the multiplication D_64 in the next clock cycle (Fig. 6). The classical approach reduces the length of the chain from 6 to only 5 stages, at the cost of one additional multiplier. Shortening the pipeline chain, and consequently reducing the number of shift registers needed for synchronization, saves a significant number of logic blocks, especially for a wide data bus. In order to reduce approximation errors, the data bus in the intermediate stages is widened.
16-point DCT algorithm
The 16-point DCT algorithm is implemented according to the classical approach, with the number of pipeline stages optimized at the cost of using embedded multipliers (Szadkowski, 2009). The 1st and 2nd pipeline stages use the sets of variables (12) and (17), respectively. For N = 16 the fractional angle of the twiddle factor in the 1st step of minimization equals to β = π . The same fractional angle corresponds to the 2nd step of minimization for the even indices corresponding to A_n.
B_{0,1,2,3} = A_{0,1,2,3} + A_{7,6,5,4}
B_{4,5,6,7} = A_{3,2,1,0} − A_{4,5,6,7}
The scaling procedure used for the odd indices of X̄_k with the fractional angles β = 
The coefficients X̄_k for even indices can be expressed by the variables (24) and the scaling factors (21)
Let us notice that the structure of the right vector in (29) is exactly the same as in (22), but the structures of the 6×4 matrices are different. In (22) the matrix comes from the transformation for the odd indices supported by (21), while in (29) the matrix comes from the transformation of the even indices. Scaled coefficients corresponding to odd indices
can be expressed by variables (25) and scaling factors (21) as follows:
Matrix (32) can be factorized as follows:
[...]
C_{9,13} = B_{9,13} S_{2,6},   C_{11} = B_{11} S_4   (34)
In the 4th pipeline step directly from (32) we can introduce new variables:
The remaining variables require 10 more multipliers, 3 adders/subtractors and 3 shift registers:
However, the 5th pipeline stage requires only a single multiplier for the E_2 variable:
The 6th stage does not require any multiplier, only 10 adders/subtractors and 6 shift registers for synchronization:
[...] = E_{3,5,7,9,13} ± E_{2,4,6,8,12}
F_{0,1,9,11,13,15} = E_{0,1,9,11,13,15}   (39)
In the 7th pipeline stage 12 signals are delayed only for synchronization and 4 are scaled for the following (n,k) pairs: (14,1), (12,7), (10,3), (8,5):
In the 8th pipeline stage only plain registers for synchronization are implemented for the even indices of X̄_{0,2,4,6,8,10,12,14} and 
The last stage contains all scaling multipliers:
for the following (k,m) pairs: (1,15), (15,14), (7,13), (9,12), (3,11), (13,10), (5,9), (11,8), (14,7), (2,6), (6,5), (10,4), (4,3), (12,2).
Implementation of the code into a FPGA
The spectral trigger should be generated if the DCT coefficients normalized to the 1st harmonic lie in an arbitrarily narrow range:
where Thr_L_k and Thr_H_k are the lower and upper thresholds for each spectral index k, respectively. The Altera® Library of Parameterized Modules (LPM) contains the lpm_divide routine, which supports the division of fixed-point variables. However, this routine needs a huge number of logic elements and it is slow (the calculation requires 14 clock cycles in order to keep a sufficiently high registered performance). The DSP blocks do not support this routine either. A simple conversion to the multiplicative form (44) allows the use of fast multipliers from the DSP blocks and the calculation of the products in a single clock cycle. θ_L_k and θ_H_k are the lower and upper scaled thresholds, respectively, which are set as external parameters. According to (44), the calculation of a sub-trigger needs two multipliers, two comparators and an AND gate. The multiplier stage of an embedded multiplier block supports 9 × 9 or 18 × 18 bit multipliers. Depending on the data width or the operational mode of the multiplier, a single embedded multiplier can perform one or two multiplications in parallel. Because of the wide data buses, the embedded multiplier blocks do not use the 9 × 9 mode in any multiplication, so each multiplier uses two embedded 9-bit multiplier elements. The full DCT procedure, with the calculation of all coefficients, needs 70 DSP blocks. However, the scaling of X̄_k in the last pipeline stage is no longer needed: it is moved to the thresholds according to (44). Removing the last pipeline stage reduces the number of DSP blocks to 40. The sub-trigger routines (Fig. 9) need 2 DSP blocks each. The chip EP3C40F324I7 selected for the 4th generation of the 1st level SD trigger contains 252 9-bit DSP multipliers. So, for 3-fold coincidences and an implementation of 3 "engines", a single DCT "engine" can support only 11 independent DCT coefficients (Szadkowski, 2011). Separate sub-triggers are generated for the patterns A_k, B_k, C_k and D_k (k = 2, 4, 6) from Fig. 3. The sub-triggers are synchronized with each other in shift registers so that they arrive simultaneously at an AND gate (Fig. 11). In order to keep the trigger rate below the limit imposed by the radio bandwidth, the amplitude of the jump is additionally verified. If the jump is too weak, a veto comparator disables the AND gate. Thus, the final trigger is generated if the spectral coefficients ξ_k match the pattern ranges selected by the multiplexer in each of 4 consecutive time bins and if the veto circuit does not disable the AND gate. The delay of the veto signal depends on the type of shape under investigation. For a rising edge of a single time bin the veto is delayed by 3 clock cycles; for the investigated pattern corresponding to a rising edge of three time bins the maximal ADC value appears 2 clock cycles later than in the previous case, so the veto should be delayed by a single clock cycle only.
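The division-free sub-trigger can be sketched as follows. This is a minimal illustration of the idea behind (44), assuming that the condition "ratio within thresholds" is rewritten as a comparison of X̄_k against the 1st harmonic multiplied by scaled thresholds. The function name and the handling of a negative or zero 1st harmonic are assumptions made for the sketch, not details taken from the firmware.

```python
def sub_trigger(x_k, x_1, theta_low, theta_high):
    """True when theta_low <= x_k / x_1 <= theta_high, evaluated without a division."""
    low = theta_low * x_1     # first multiplier
    high = theta_high * x_1   # second multiplier
    if x_1 < 0:
        # Assumption: for a negative 1st harmonic the two bounds swap roles.
        low, high = high, low
    # Two comparators and an AND; a vanishing 1st harmonic never triggers.
    return x_1 != 0 and low <= x_k <= high


# Illustrative values: a coefficient at 30% of the 1st harmonic, lane 25% to 35%.
inside = sub_trigger(30, 100, 0.25, 0.35)
outside = sub_trigger(40, 100, 0.25, 0.35)
```

Replacing the division by two multiplications costs one extra multiplier per sub-trigger but removes the multi-cycle latency of a divider, which is the trade-off the text describes for the DSP blocks.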
Signals between those thresholds (two comparators + AND gate) generate preliminary sub-triggers, which are then summed and compared with an arbitrary Occupancy level. If the number of "fired" preliminary sub-triggers is above the selected Occupancy, the final sub-trigger is generated for the subsequent processing. It is enabled or disabled depending on the veto variable, which verifies the minimal amplitude of the input signals in order to keep the trigger rate at a reasonable level and to prevent saturation of the transmission channel.
Fig. 10. Simulation of the 1-fold spectral trigger simultaneously with the 3-fold threshold trigger. The length of the shift registers = 16. Data in the Ext_ADC0 channel correspond to a muon signal with a 1-time-bin rising edge, an 11-time-bin attenuation tail and a constant pedestal = 40 ADC-counts. Together with the beginning of the muon peak (at 23.075 µs), the two neighboring channels Ext_ADC1,2 are driven artificially to 150 ADC-counts to generate the standard threshold trigger based on the 3-fold coincidence. The internal PLL clock = 80 MHz. The internal standard threshold trigger appears 5 clock cycles later (+62.5 ns). The nodes lpm_ff:$00000|dffs - lpm_ff:$00030|dffs correspond to the shift register x_15, ..., x_0. The system is tuned for Shape_A recognition (the first two time bins at the pedestal level). Ena_A_reg is generated (+200 ns = 16 clock cycles) because the amplitude of the signal (140 ADC-counts) is above the veto threshold. It is delayed by a further 15 cycles to be synchronized with SUB_TRIG_Occ. Sub-triggers are generated 27 clock cycles (+337.5 ns) after the rising edge. The calculation of the Occupancy takes two more clock cycles. 29 clock cycles after the rising edge, the SUB_TRIG is generated due to a coincidence of the Occupancy and Ena_DCT_del (the inversion of the veto). Finally it appears in the same position as the 3-fold coincidence threshold trigger 31 clock cycles later. The Final_DCT trigger corresponds to the possible coincidence with the neighboring DCT "engines". If the standard threshold trigger (based on the 3-fold coincidence) appears, any subsequent triggers are ignored for 768 clock cycles.
The 16-point DCT with a 16-stage shift register can cover a 150 ns time window at 100 MHz sampling. For horizontal or very inclined showers this interval is sufficient for the analysis. However, for a higher sampling frequency, when the time window may turn out to be too short, the shift register may be extended from 16 to 24 stages, and the eight samples for the higher indices may be taken from the last 16 shift register nodes as shown in Fig. 11. The samples with higher indices correspond to the exponentially attenuated tail, and the analysis of the tail is less critical than that of the rising edge, where the samples are analyzed at full speed.
Fig. 11. A scheme of the final spectral trigger. The shift register presented here has an extended length of 24 stages to cover a longer time window. However, for sampling frequencies f_s ≤ 100 MHz, 16 stages (T ≥ 150 ns) give a window wide enough for the analysis of horizontal showers. The trigger condition is that the signal shifted through the register chain matches the expected patterns in 4 consecutive time bins, i.e. those corresponding to the ADC shapes in Fig. 3 (1st row, first 3 graphs). The 4th pattern is exactly the same as the 3rd one: the amplitude of the signal decreases, but the DCT coefficients remain the same (still an exponential attenuation).
3 DCT trigger "engines" have been successfully merged with the Auger code working with 100 MHz sampling. The final code utilizes only 38[...], which gives an opportunity to add new, sophisticated algorithms. The slack reported by the compiler corresponds to a maximal sampling frequency of 112 MHz, which gives a sufficient safety margin for stable operation of the system. For sufficiently high amplitudes of the ADC samples the Threshold trigger will be generated 32 clock cycles earlier than the spectral trigger (24 clock cycles of propagation in the shift registers + 8 clock cycles of processing in the DCT chain). If the Threshold trigger has already been generated, the next triggers are inhibited for the 768 time bins necessary to fill the memory buffers (see Fig. 7 in (Szadkowski, 2005a)). Because the Threshold trigger (sensitive to bigger signals) has a higher priority than the spectral trigger, the ADC samples will not be delayed for the Threshold trigger in order to synchronize it with the spectral one. The system uses 10-bit resolution (the Auger standard). A compilation for 12-bit resolution for the current chip EP3C40F324I7 failed due to a lack of DSP blocks; a 12-bit system requires the bigger chip EP3C55. The slack times are at the same level as for the EP3C40. All pipeline routines shown in Fig. 8 are implemented in a direct mode (not in a pipeline mode as, e.g., in the 2nd generation of the FEB based on the ACEX family (see Fig. 2 in (Szadkowski, 2005a)) or in the FFT implementation in the Cyclone family (Fig. 2 in (Szadkowski, 2005b))), so the processing of a signal requires only a single clock cycle. All routines are fast enough to work with 100 MHz sampling without additional pipeline stages and they do not introduce additional latency.
Accuracy
The 10-bit resolution of the FADC in the high-gain channels (responsible for the trigger generation) implies the ranges of the X̄_k coefficients given in the 2nd column of Table 1. Multiplications of integer values N by real scaling factors sf give floating-point results. To keep the calculation as fast as possible and to avoid wasting resources, a fixed-point processing algorithm has been chosen: N×sf is approximated again by an integer value at each pipeline stage. For almost all scaling factors sf ≤ 1, and N×sf is represented by the same or a smaller number of bits. For sf ≥ 1, N×sf extends the representation by 1 or 2 bits. This approximation introduces errors. However, the width of the data in the internal pipeline stages is extended from N at the shift register x_15, ..., x_0 to N+1, N+2, N+3, N+4, N+5, N+7, N+8 in routines A, B, C, D, E, F, G, respectively (Fig. 8). This reduces the approximation errors mostly to the LSB, apart from X̄_15. This coefficient will not be used for a trigger.

Table 1. Ranges of the X̄_k coefficients and relative errors in the least significant bits of X̄_k. For k ≤ 14 the errors appear practically only in the LSB.

  k    range of X̄_k    LSB      2nd bit   3rd bit and more
  0    0...4092         0.0%     0.00%     0.00%
  1    ± 2521           13.1%    0.00%     0.00%
  2    ± 2581           8.7%     0.00%     0.00%
  3    ± 2914           13.1%    0.00%     0.00%
  4    ± 2348           4.8%     0.00%     0.00%
  5    ± 4019           15.1%    0.00%     0.00%
  6    ± 3045           8.6%     0.00%     0.00%
  7    ± 10032          23.1%    1.10%     0.00%
  8    ± 2041           0.0%     0.00%     0.00%
  9    ± 12224          23.8%    1.55%     0.00%
  10   ± 4557           12.8%    0.00%     0.00%
  11   ± 7519           17.7%    0.00%     0.00%
  12   ± 5671           11.5%    0.00%     0.00%
  13   ± 9605           24.3%    2.00%     0.00%
  14   ± 12978          26.9%    2.86%     0.00%
  15   ± 25597          30.9%    25.08%    6.83%
According to the above estimations, the configuration with 3 "engines" does not support all ξ_k sub-triggers because of the limited number of DSP blocks. However, for the next generation of the water Cherenkov detector array, where probably only a single PMT will be used, 3 "engines" will be implemented to investigate and detect 3 different shapes of FADC traces corresponding, e.g., to different rise times of the rising edge.
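The effect of widening the data bus on the rounding error can be illustrated with a toy fixed-point multiplication. This is only a sketch of the general principle: it models extra bits purely as fractional precision, whereas the bit growth listed above for routines A to G also has to accommodate the growth of the sums. The scale factor 0.7071 and the function names are hypothetical.

```python
def fixed_mul(value, scale, extra_bits=0):
    """Multiply an integer by a real scale factor and round to an integer.

    The result is expressed in units of 2**-extra_bits, i.e. `extra_bits`
    additional low-order bits are kept below the original LSB.
    """
    return round(value * scale * (1 << extra_bits))


def rounding_error(value, scale, extra_bits=0):
    """Absolute error of the fixed-point product, in units of the original LSB."""
    exact = value * scale
    approx = fixed_mul(value, scale, extra_bits) / (1 << extra_bits)
    return abs(approx - exact)


# A full-scale 10-bit sample multiplied by an illustrative scale factor.
err_0_bits = rounding_error(1023, 0.7071, extra_bits=0)
err_3_bits = rounding_error(1023, 0.7071, extra_bits=3)
```

Each extra bit halves the worst-case rounding error of a single stage (at most 0.5 / 2**extra_bits of the original LSB), which is why errors accumulated over several stages can be confined mostly to the LSB.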
Preliminary tests
Analysis of the Auger ADC traces of very inclined showers shows that the maximum of the signal is mostly reached within a single time bin. The attenuation factor of the tail is in the range β = (0.2 - 0.5). Fig. 12 shows signal shapes with various attenuation factors and with the first two time bins at the pedestal level. For simplicity the pedestal has been set to zero. This does not reduce the generality of the analysis, because the pedestal is irrelevant for the DCT (k ≥ 1). The corresponding DCT coefficients are shown in the upper part of Fig. 3 (Shape_A). After a single clock cycle, when the data are shifted in the register chain, the shifted signal with only one time bin at the pedestal level determines a new set of DCT coefficients, shown in the lower part of Fig. 3 (Shape_B). The pattern to be recognized can be selected by setting the DCT coefficients in the DCT engines. Every signal with its first two time bins at the pedestal level will certainly have only one time bin at the pedestal level in the next clock cycle, but not vice versa. A signal with only a single time bin at the pedestal level before a sharp rising edge can have a significant contribution in the 2nd time bin before the rising edge, and it will not be recognized by a pattern recognition procedure tuned to Shape_A. A procedure recognizing Shape_A is more restrictive and gives a lower trigger rate than one for Shape_B. Because of the limited number of DSP blocks, only 11 DCT coefficients can be analyzed simultaneously. For Shape_A, X̄_4 and X̄_10 are ignored, and for Shape_B, X̄_6 and X̄_14, as weakly sensitive to changes of the signal shape. A trigger based only on the DCT pattern recognition gives too high a rate, due to the contribution of very weak signals that also have the appropriate shape but are usually treated as noise. In order to reduce and control the trigger rate, the veto threshold has been introduced.
The calculation of the DCT coefficients in the pipeline chain, followed by the calculation of the sub-triggers in the multiplier and comparator block, takes 12 clock cycles. The signal is delayed by the same time so that it is compared with the veto threshold simultaneously with the generated DCT sub-triggers. If the signal is above the sum of the veto threshold and the pedestal, the sub-triggers are enabled to generate the final spectral trigger. The condition that all 11 DCT coefficients lie inside the acceptance lane is too strong: the shapes are not ideal and noise introduces additional shape distortions. As in the ToT trigger, only a part of the "fired" sub-triggers (Occupancy ≤ 11 = max. number of sub-triggers) is enough to generate the final spectral trigger. Although the spectral trigger is being developed for the future and for a single detection channel (a single PMT), the DCT trigger in the Auger surface detector has been tested in 2-fold coincidences of any of the 3 PMTs, to be as close as possible to the standard Auger data for a comparison of the results. Fig. 14 shows the trigger rates for Occupancy = 6 and 7, respectively. The T1 trigger rate is calibrated to ca. 100 Hz. Generally, the trigger rate for Occ = 6 is too high. In order not to saturate the microcontroller and the transmission chain, the total (standard Auger + spectral) trigger rate should not exceed 150 Hz. This leaves at most 40 - 50 Hz for the spectral trigger alone. Occupancy = 7 with the range of attenuation factors limited to β = (0.20 - 1.14) gives a trigger rate at a reasonable level. Occupancy = 8 reduces the trigger rate below 1 Hz and seems to be too restrictive. The FPGA contains internal counters that count the trigger rate and the contribution of the DCT sub-triggers to the final trigger. The required trigger rate range can be set remotely from the Central Data Acquisition System (CDAS).
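The combination of Occupancy and veto described above can be summarized in a few lines. This is a sketch under assumptions: the Occupancy condition is taken as "at least `occupancy` sub-triggers fired" (the text can also be read as a strict inequality), the veto is applied to a single peak ADC value, and the function and parameter names are hypothetical.

```python
def spectral_trigger(sub_triggers, peak_adc, pedestal, veto_threshold, occupancy=7):
    """Final spectral trigger for one channel (assumed logic).

    sub_triggers   -- iterable of booleans, one per analyzed DCT coefficient (up to 11)
    peak_adc       -- ADC value compared with the veto, synchronized with the sub-triggers
    pedestal       -- current pedestal in ADC-counts
    veto_threshold -- minimal amplitude above the pedestal, in ADC-counts
    """
    enough_fired = sum(bool(s) for s in sub_triggers) >= occupancy
    above_veto = peak_adc > pedestal + veto_threshold
    return enough_fired and above_veto


# 8 of 11 coefficients inside their acceptance lanes, 140 ADC-count peak on a 40-count pedestal.
fired_8 = [True] * 8 + [False] * 3
accepted = spectral_trigger(fired_8, peak_adc=140, pedestal=40, veto_threshold=60)
too_weak = spectral_trigger(fired_8, peak_adc=90, pedestal=40, veto_threshold=60)
```

Raising `occupancy` tightens the shape requirement, while raising `veto_threshold` removes small signals with the right shape; the text uses both knobs to bring the rate into the allowed range.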
The FPGA automatically tunes the veto threshold to obtain the required trigger rate. If the veto threshold is above 60 ADC-counts (ca. 1.2 VEM), the acceptance lane is modified: the attenuation factor β at the left side of the range is increased or decreased within (0.20 - 0.40), with the right boundary fixed at β = 1.3. Fig. 15 shows three calibration processes, in which the initial parameters either have been set ideally (B) or have to be tuned to obtain the required trigger rate (A and C). The tuning process typically does not exceed 3 minutes. In contrast to the standard Auger tuning procedure, in which the thresholds for the Threshold trigger are calculated by the external microcontroller located on the Unified Board (UB), the thresholds for the DCT acceptance lane are calculated in advance, stored in the ROM inside the FPGA, and only multiplexed. This allows a fully autonomous FPGA calibration process without support from any external microcontroller.
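One possible reading of this auto-calibration is sketched below as a single tuning step. Only the 60 ADC-count limit and the β range (0.20 - 0.40) come from the text. The step of 0.05 between precomputed lanes, the veto step of one ADC-count, the veto floor of zero and the rule that the lane is widened when the veto reaches this floor are assumptions, as is the function name tune_step.

```python
# Left-boundary attenuation factors of the precomputed lanes (the ROM
# contents in the text); the 0.05 granularity is an assumption.
BETA_LEFT_ROM = (0.20, 0.25, 0.30, 0.35, 0.40)
VETO_LIMIT = 60  # ADC-counts, from the text
VETO_FLOOR = 0   # assumed lower limit of the veto threshold


def tune_step(rate_hz, rate_min, rate_max, veto, lane_idx):
    """Return the updated (veto, lane_idx) after one rate measurement."""
    if rate_hz > rate_max:
        # Rate too high: once the veto exceeds the limit, narrow the lane
        # by selecting the next ROM entry; otherwise raise the veto.
        if veto > VETO_LIMIT and lane_idx < len(BETA_LEFT_ROM) - 1:
            return veto, lane_idx + 1
        return veto + 1, lane_idx
    if rate_hz < rate_min:
        # Rate too low: lower the veto, and widen the lane at the floor.
        if veto <= VETO_FLOOR and lane_idx > 0:
            return veto, lane_idx - 1
        return max(veto - 1, VETO_FLOOR), lane_idx
    return veto, lane_idx  # rate inside the required range


step_raise_veto = tune_step(80, 40, 50, veto=30, lane_idx=0)
step_narrow_lane = tune_step(80, 40, 50, veto=61, lane_idx=0)
step_in_range = tune_step(45, 40, 50, veto=30, lane_idx=2)
step_widen_lane = tune_step(10, 40, 50, veto=0, lane_idx=2)
```

Selecting the lane by an index into a fixed table mirrors the multiplexing of precomputed thresholds: no lane boundaries have to be recalculated during the tuning.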
The new Front-End Board samples the analog signals at 80 MHz. Data are written via the left port of the dual-port RAM, and the stored data are next read via the right port at 40 MHz. The new board is seen by the rest of the electronics as the standard one; only an additional flag informs the system of the type of the trigger. Internal FPGA counters allow counting the contribution of the DCT coefficients to the final spectral trigger. Fig. 15d shows the relative contribution of the DCT coefficients for Shape_A. The contribution of X̄_5 and X̄_9 is slightly lower than that of the others. For X̄_9 the acceptance lane (compare Fig. 13A) is relatively narrow, so the lower contribution is not surprising. X̄_5 is probably more sensitive to signal noise and possible signal distortions. Graph A (Fig. 15) shows a process in which the trigger rate is initially too high and the attenuation factor has to be increased (the acceptance lane is narrowed down). Graph B shows a process in which the initial parameters are optimal and the acceptance lane is not modified (only the veto threshold is tuned). Graph C shows a process in which the initial parameters give too low a trigger rate and the acceptance lane is changed three times.
[Figure caption fragment: A comparison of the contribution of "fired" DCT coefficients generating the sub-triggers for all three PMT channels and for various trigger rate requirements (bottom right). There are no significant differences in the contribution of a given coefficient between different PMTs and different configurations of trigger rate requirements.]
Conclusion
The pattern recognition technique implemented in parallel with the standard threshold detection may improve the efficiency of registration of rare events, especially for a single PMT in the surface detector, when the coincidence technique can no longer be used. The optimized algorithm of the spectral trigger based on the Discrete Cosine Transform, with veto and auto-calibration procedure, has been successfully implemented in the FPGA and operated stably in the real detector. Measurements in the test detector confirmed the assumptions made for the selection of a limited number of DCT coefficients and the stability of the algorithm for an arbitrarily selected acceptance lane of the spectral trigger rate. Although 6 surface detectors of the Pierre Auger Observatory have been used for the tests, the spectral trigger is being developed more generally for future ground EAS arrays using, unlike the present Pierre Auger Observatory, only one PMT per station.
Acknowledgement
The author would like to thank the Pierre Auger Collaboration for permission to use the PAO infrastructure and the test hexagon, and for making the data available.
