Abstract-This paper describes the implementation of a Digital Signal Processing (DP) subsystem for a single Long Wavelength Array (LWA) station.
INTRODUCTION
The Long Wavelength Array (LWA) is a radio telescope that will consist of 53 receiving stations located throughout southwest New Mexico [1] and operating in the frequency range 10 to 88 MHz. About 15 of the stations will form a compact core in a log-spiral design [2].
Key science drivers for the LWA include the study of cosmic evolution and the high redshift universe, acceleration of relativistic particles, plasma physics (including the Earth's ionosphere) and space science, and exploration science [3]. Other low frequency radio astronomy arrays nearing completion or under construction include the Low Frequency Array (LOFAR) [10], the Murchison Widefield Array (MWA) [15], and the Precision Array Probing the Epoch of Reionization (PAPER) [16]. Of these, only LOFAR covers frequencies below 70 MHz, and its collecting area is substantially less than that of the LWA. Much larger arrays are proposed, including the Square Kilometer Array [11] and the Lunar Radio Array [17].
The overall range of baseline lengths will be 200 m to 400 km. Signals from the stations will be combined by cross-correlating all pairs at a central facility, allowing detailed images of the sky to be formed. When completed, the LWA will provide unprecedented effective aperture and thus sensitivity, as well as unprecedented angular resolution, in its frequency range.
Each station is a phased array of 256 pairs of dipole-like elements. The elements are distributed in a pseudo-random manner within a circle of approximately 100 m diameter, giving a beam width of 2° at 80 MHz [3, 4]. The element gain at zenith is approximately 5 dB [12], which leads to an effective area for each station that varies from approximately 6400 m² at 30 MHz to 750 m² at 88 MHz, and thus the entire 53-station array provides about 340,000 m² to 40,000 m², respectively.
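The effective-area figures quoted above can be checked with the standard relation between gain and effective area, A = Gλ²/(4π), applied to each element and summed over the 256 elements of one polarization. The sketch below is only a rough consistency check: it assumes the elements contribute independently (no mutual coupling or ground losses), and the function names are illustrative rather than taken from the LWA design.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light


def element_effective_area(gain_db: float, freq_hz: float) -> float:
    """Effective area of one element from A = G * lambda^2 / (4 * pi)."""
    wavelength_m = C_M_PER_S / freq_hz
    gain_linear = 10 ** (gain_db / 10)
    return gain_linear * wavelength_m ** 2 / (4 * math.pi)


def station_effective_area(freq_hz: float, n_elements: int = 256,
                           gain_db: float = 5.0) -> float:
    """Station area, assuming the elements add independently (an idealization)."""
    return n_elements * element_effective_area(gain_db, freq_hz)


def array_effective_area(freq_hz: float, n_stations: int = 53) -> float:
    """Total area of the full array of identical stations."""
    return n_stations * station_effective_area(freq_hz)


if __name__ == "__main__":
    for f in (30e6, 88e6):
        print(f"{f / 1e6:.0f} MHz: station {station_effective_area(f):.0f} m^2, "
              f"array {array_effective_area(f):.0f} m^2")
```

Under these assumptions the result agrees with the quoted station and array values to within about one percent.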
Figure 1-LWA-1, August 2010
Signals from the elements are combined locally so as to form four beams simultaneously, each rapidly and independently steerable to any direction in the sky. The beamformers are broad-band, covering the entire 10 to 88 MHz RF range. A portion of each combined signal is then selected for further processing. Bandwidths of 250 kHz to 19.6 MHz are available, independently tunable across the RF range.
The Earth's ionosphere is a major impediment to low-frequency radio astronomy, and the challenges increase as frequency is reduced. It is necessary to track and correct for the ionospheric signal delay as a function of direction. To accomplish this, one beam of the LWA will normally be devoted to calibration. There are engineering challenges as well, including making simple and low-cost antennas which are efficient over a large fractional frequency range; providing signal processing electronics with sufficient flexibility at an affordable cost; and transmitting the signals to the central correlator.
At present, the first LWA station ("LWA-1") is under construction. It is located near the core of the National Radio Astronomy Observatory's Expanded Very Large Array (EVLA) [8]. All of the antennas have been installed, and a significant portion of the electronics, including the initial modules of the hardware and software described here, is in operation.
This paper is organized as follows: In Section 2, the overall system is described. In Section 3, details of the digital signal processing (DP) design are presented. Section 4 describes the implementation of the DP. In Section 5, some early test results are presented. Finally, in Section 6, conclusions are reached.
SYSTEM DESCRIPTION
Each LWA station includes several subsystems which work together to form beams from its antenna array, as shown in Figure 2. The DP subsystem digitizes the dual polarization signals at 196 MHz, adjusts the delay and amplitude of each signal, and forms four independent beams over the full 10 to 88 MHz band. Each beam has a digital receiver which can select two independent sub-bands. The sub-bands are processed by a filter bank into 4096 frequency channels and the data is sent to the Data Recorder subsystem for later analysis. The DP subsystem also provides two separate outputs intended to support detection of transient signals, known as the transient wideband buffer (TBW) and the transient narrowband buffer (TBN). These buffers are described in more detail below. The Monitor and Control System (MCS) subsystem monitors and controls the DP and other subsystems via 1 Gigabit Ethernet links [14].
DIGITAL PROCESSING SUBSYSTEM: ARCHITECTURE
Key design considerations of the DP subsystem include modularity, expandability, and cost-effectiveness. A modular design allows individual submodules of the DP subsystem to be designed and tested individually. It is desirable for the system to be expandable so that the design may be implemented, validated, and utilized to obtain useful scientific results at LWA-1 and duplicated as additional stations are constructed. Cost effectiveness is crucial in enabling implementation of the system with available funding.
The system must also be able to meet the LWA timing requirements. An observation is defined by the complete set of parameters that specify a beam, including the center frequency, bandwidth, delays, and gains. The time between the beginning of one observation and the beginning of the next observation is required to be 55 ms or less. A short minimum observation duration allows calibration of the system to be accomplished more quickly.
Other requirements include the ability to schedule events with a time resolution of 50 ms, sample timing accuracy of 709 ps, and the capability to initiate 106 observations per second [4] . Another important consideration is the minimization of board-level interconnections.
The digital processing subsystem architecture for a pair of dual polarization signals is shown in the block diagram in Figure 3. The signals are digitized at 196 MHz and 12-bit resolution by a pair of analog-to-digital converters. The samples are distributed to four beam forming units (BFU), a TBW unit, and two TBN units [6].
The beam forming units adjust the delay and amplitude of the signals to form four beams. Each beam forming unit adds the processed samples to the partial sums from the previous antenna pair, outputting a new partial sum. The final sums are sent to four digital receivers (DRX). Each digital receiver provides two independent sub-bands with user-selectable center frequency and bandwidth.
Figure 3-Digital Processing Subsystem block diagram. There are 256 identical processing units, one for each antenna pair, the last of which is shown. Beams are formed by passing partial sums from one unit to the next. The final sums are further processed by Digital Receivers.
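The partial-sum chain described above can be illustrated with a minimal sketch. It is not the FPGA implementation: it uses whole-sample delays, one scalar weight per antenna, and a single polarization, and the names `beam_forming_unit` and `form_beam` are hypothetical.

```python
def beam_forming_unit(partial_sum, samples, delay, weight):
    """One unit in the chain: delay and weight this antenna's samples,
    then add them to the partial sum received from the previous unit.

    `delay` is a non-negative whole number of samples (a simplification).
    """
    delayed = [0.0] * delay + list(samples[:len(samples) - delay])
    return [p + weight * d for p, d in zip(partial_sum, delayed)]


def form_beam(antenna_samples, delays, weights):
    """Pass the partial sum through every unit; the last output is the beam."""
    partial = [0.0] * len(antenna_samples[0])
    for samples, delay, weight in zip(antenna_samples, delays, weights):
        partial = beam_forming_unit(partial, samples, delay, weight)
    return partial


def impulse(length, position):
    """Test signal: a unit impulse at the given sample index."""
    return [1.0 if i == position else 0.0 for i in range(length)]


if __name__ == "__main__":
    # A pulse reaches four antennas at different sample times.
    arrivals = [3, 1, 0, 2]
    signals = [impulse(16, a) for a in arrivals]
    # Delay each antenna so that all pulses line up with the latest arrival.
    delays = [max(arrivals) - a for a in arrivals]
    beam = form_beam(signals, delays, [1.0] * 4)
    print(beam)  # the aligned pulses add coherently at index 3
```

Because each unit only needs the running sum from its neighbour, the chain adds antennas without any unit ever holding the samples of more than one antenna pair.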
The transient wideband buffer consists of a memory that can be filled with raw data samples from all antennas upon receipt of a trigger signal from the monitor-control subsystem. A 32-MB RAM module is provided for each of the 2x256 signals, enough for 57 ms of recording. This data is then packetized and transmitted more slowly to the data recording subsystem.
Each transient narrowband buffer unit consists of a digital down converter and filters that select a band with a specified center frequency and bandwidth and decimate the samples accordingly.
Beamforming-Each pair of signals is distributed to four beam forming units, which adjust the delay and amplitude of the signals and add them to the results from the previous antenna pair, so that in the end we have the delayed and weighted sum of all signals for each polarization. A block diagram of a beam forming unit is shown in Figure 4. The delays are programmed so that a signal from the desired beam direction is time-aligned among all antennas, compensating for geometrical and cable delays. Coarse delay to the nearest sample is implemented using a first-in-first-out buffer.
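As an illustration of the coarse delay stage, the sketch below splits a required delay into a whole number of 196 MHz samples plus a sub-sample remainder, and models the coarse part as a first-in-first-out buffer. The function and class names are hypothetical, and the rounding convention (nearest sample, remainder within half a sample) is an assumption consistent with the description above rather than a documented detail of the hardware.

```python
import math
from collections import deque

SAMPLE_RATE_HZ = 196e6  # ADC sample rate stated in the text


def split_delay(delay_s, sample_rate_hz=SAMPLE_RATE_HZ):
    """Split a delay into (coarse whole samples, fine remainder in samples)."""
    delay_samples = delay_s * sample_rate_hz
    coarse = math.floor(delay_samples + 0.5)  # nearest whole sample
    fine = delay_samples - coarse             # lies in [-0.5, 0.5)
    return coarse, fine


class CoarseDelayFifo:
    """FIFO that delays a sample stream by a fixed whole number of samples."""

    def __init__(self, delay_samples):
        # Pre-fill with zeros so the first outputs are silent.
        self._fifo = deque([0] * delay_samples)

    def push(self, sample):
        """Insert the newest sample and return the one delayed by the FIFO depth."""
        self._fifo.append(sample)
        return self._fifo.popleft()


if __name__ == "__main__":
    coarse, fine = split_delay(100e-9)  # a 100 ns delay is 19.6 samples
    print(coarse, round(fine, 3))
    fifo = CoarseDelayFifo(3)
    print([fifo.push(s) for s in [1, 2, 3, 4, 5]])
```

The sub-sample remainder returned by `split_delay` is the part that a whole-sample buffer cannot provide and that has to be handled by a separate interpolating stage.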
Fine fractional-sample delay is implemented using a finite impulse response (FIR) interpolation filter [5]. Each pair of signals is then multiplied by a 2 x 2 matrix. This provides programmable amplitude weighting for beamforming and also allows correction of any misalignment of polarization among the antennas.
Transient Wideband Buffer-The TBW can capture the raw output of each ADC. Each TBW can record up to 12,000,000 samples of 12-bit data to a 32 MB RAM. When the buffer is full, its contents are read out, formatted into UDP packets, and sent to the Data Recorder subsystem via a 1 Gigabit Ethernet link. Each packet contains data from a particular antenna and both polarizations (X and Y). The packets include a header with the antenna number and time tag. Each packet contains 1200 bytes of data (400 samples of 12-bit data for each of the two polarizations). Although the TBW modules for the various antenna pairs are physically separate, they are synchronized under software control so that data from all antennas is captured simultaneously.
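The TBW payload size follows from the sample format: 400 samples x 2 polarizations x 12 bits = 9600 bits = 1200 bytes. The sketch below packs and unpacks such a payload. The bit layout (X and Y interleaved, most significant bits first, three bytes per X/Y pair) is an assumption made for illustration; the text specifies only the sizes, and the packet header is not modeled.

```python
SAMPLES_PER_PACKET = 400  # per polarization, as stated in the text


def pack_tbw_payload(x_samples, y_samples):
    """Pack 400 X and 400 Y 12-bit samples into a 1200-byte payload.

    Layout (assumed): each X/Y pair occupies three bytes,
    XXXXXXXX XXXXYYYY YYYYYYYY, most significant bits first.
    """
    if len(x_samples) != SAMPLES_PER_PACKET or len(y_samples) != SAMPLES_PER_PACKET:
        raise ValueError("expected 400 samples per polarization")
    payload = bytearray()
    for x, y in zip(x_samples, y_samples):
        x &= 0xFFF  # keep only the 12 data bits
        y &= 0xFFF
        payload += bytes((x >> 4, ((x & 0xF) << 4) | (y >> 8), y & 0xFF))
    return bytes(payload)


def unpack_tbw_payload(payload):
    """Inverse of pack_tbw_payload; returns (x_samples, y_samples)."""
    xs, ys = [], []
    for i in range(0, len(payload), 3):
        b0, b1, b2 = payload[i:i + 3]
        xs.append((b0 << 4) | (b1 >> 4))
        ys.append(((b1 & 0xF) << 8) | b2)
    return xs, ys


if __name__ == "__main__":
    x = list(range(400))
    y = [4095 - v for v in x]
    payload = pack_tbw_payload(x, y)
    print(len(payload))  # 1200 bytes, matching the stated payload size
```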
Transient Narrowband Buffer-Each pair of signals has a Transient Narrowband Buffer submodule, where the signal is digitally down converted, filtered using a two-stage low pass filter, and decimated to the specified reduced Nyquist frequency. A block diagram of a transient narrowband submodule is shown in Figure 6. Each submodule contains two downconverter/filter circuits, with independently settable center frequency and bandwidth. The first stage of each circuit is a CIC filter and the second stage is a FIR filter. The center frequency is specified by the user and has a range of 10-88 MHz. The bandwidth is also specified by the user and may be set to 1 kHz, 3.125 kHz, 6.250 kHz, 12.50 kHz, 25.0 kHz, 50 kHz, or 100 kHz. This bandwidth reduction allows continuous capture of the signal from every ADC. TBN data is 16 bits (8 bits each, I & Q). The data is formatted as UDP packets and sent to the Data Recorder subsystem via the same 1 Gigabit Ethernet link as TBW. Each packet contains data from a particular antenna and polarization. The packets include a header with the antenna number and time tag. Each packet contains 500 samples or 1000 bytes of data. Although the TBN modules are physically separate for each antenna pair, software ensures that all are set to the same center frequency and bandwidth.
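The down-conversion chain can be sketched in a few lines. The example below is a simplified floating-point model, not the FPGA design: the first stage is a block average (equivalent to a first-order CIC with unit differential delay), the second is a Hamming-windowed sinc FIR, and the split of the overall decimation by 1960 (196 MHz to 100 kHz) into 196 x 10 is an arbitrary choice. All function names are hypothetical.

```python
import cmath
import math

FS_IN_HZ = 196e6  # ADC sample rate stated in the text


def mix_to_baseband(samples, center_hz, fs_hz=FS_IN_HZ):
    """Multiply by a complex exponential to shift center_hz down to 0 Hz."""
    step = -2j * math.pi * center_hz / fs_hz
    return [s * cmath.exp(step * n) for n, s in enumerate(samples)]


def boxcar_decimate(samples, factor):
    """Average each block of `factor` samples (first-order CIC equivalent)."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]


def fir_decimate(samples, factor, num_taps=63):
    """Low-pass with a Hamming-windowed sinc FIR, then keep every factor-th output."""
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        x = (k - mid) / factor
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    taps = [t / gain for t in taps]  # unity gain at 0 Hz
    return [sum(t * samples[i + k] for k, t in enumerate(taps))
            for i in range(0, len(samples) - num_taps + 1, factor)]


def tbn_downconvert(samples, center_hz, cic_decim=196, fir_decim=10):
    """Mix, then decimate in two stages (196 MHz -> 1 MHz -> 100 kHz by default)."""
    baseband = mix_to_baseband(samples, center_hz)
    return fir_decimate(boxcar_decimate(baseband, cic_decim), fir_decim)


def mean_phase_step(iq):
    """Average phase advance per output sample, in radians."""
    return cmath.phase(sum(b * a.conjugate() for a, b in zip(iq, iq[1:])))


if __name__ == "__main__":
    # A tone 10 kHz above a 38 MHz center frequency, 1 ms long.
    tone = [math.cos(2 * math.pi * 38.01e6 / FS_IN_HZ * n) for n in range(196_000)]
    iq = tbn_downconvert(tone, 38e6)
    offset_hz = mean_phase_step(iq) * 100e3 / (2 * math.pi)
    print(len(iq), round(offset_hz))
```

In the demonstration, a tone offset from the tuned center frequency reappears in the complex output at the same offset from zero frequency, which is the behaviour expected of the TBN channel.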
DIGITAL PROCESSING SUBSYSTEM: IMPLEMENTATION
The DP subsystem is implemented using custom printed circuit boards: a Digitizer board which digitizes the signals and a Processor board which performs signal processing, arranged as shown in Figure 7. The Processor board includes five Xilinx Virtex-5 field-programmable gate arrays (FPGAs). This is sufficient to provide beamformer, TBW and TBN modules for 10 antennas (20 signals). All processing for a signal pair is kept together in order to minimize board-level interconnections. Each Processor board mates with a Digitizer board, which carries the corresponding 20 analog-to-digital converters. Thus, 26 of these board pairs are needed to implement the 256-antenna system.
An advantage of using FPGAs is that they may be reprogrammed to allow the Processor boards to have different capabilities. Two additional Processor boards are used to implement the four Digital Receiver modules. These boards are identical to the 26 that are used for the antenna-by-antenna processing except for programming.
The DP subsystem is housed in two 14-slot commercially available Advanced Telecom Computing Architecture (ATCA) chassis with high speed backplanes. Each Processor board has a 1-Gigabit Ethernet link to the 1/10 Gigabit Ethernet switch that is used to communicate with the DP subsystem computer for monitor and control and to send the TBW and TBN data. Each chassis contains 13 Digitizer boards, connected to 13 Processor boards, each of which handles 10 pairs of signals. The partial sums are daisy chained together via the ATCA backplane for each chassis, and via cables to the other chassis. The final sums are routed to another Processor board in each chassis; each of those boards provides two of the Digital Receivers. Each dual-polarization beam is sent from the latter boards to the Data Recorder subsystem via a 10 Gigabit Ethernet link [5] .
Figure 7-DP Subsystem Block Diagram
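The board and chassis counts given above follow from simple arithmetic, sketched below. The parameter values come from the text; the assumption that each Processor board (with its mated Digitizer board, where present) occupies one chassis slot is ours, and the function name is illustrative.

```python
import math


def board_budget(antennas=256, antennas_per_board=10,
                 drx_boards=2, slots_per_chassis=14):
    """Count the boards and chassis needed for one station.

    Assumes one chassis slot per Processor board (an assumption, not a
    statement from the text).
    """
    board_pairs = math.ceil(antennas / antennas_per_board)
    processor_boards = board_pairs + drx_boards
    return {
        "board_pairs": board_pairs,
        "adcs_per_digitizer": 2 * antennas_per_board,
        "spare_antenna_inputs": board_pairs * antennas_per_board - antennas,
        "processor_boards": processor_boards,
        "chassis": math.ceil(processor_boards / slots_per_chassis),
    }


if __name__ == "__main__":
    for name, value in board_budget().items():
        print(f"{name}: {value}")
```

With the stated values this gives 26 Digitizer/Processor board pairs and 28 Processor boards in total, which fill two 14-slot chassis exactly and leave four unused antenna inputs.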
Clock Synthesizer Board-The DP subsystem's clock synthesizer box accepts as inputs 10 MHz and 1 Hz clocks from the Timebase and Clock Distribution (TCD) subsystem. The connection between the TCD subsystem and the DP subsystem consists of one Category 7 cable. This cable has four individually shielded pairs, two of which are used. A phase locked loop and voltage controlled crystal oscillator are used to generate the 196-MHz sampling clock, which is phase locked to the 10-MHz and 1-Hz signals. The 196-MHz and 1-Hz clocks are copied 28 times using distribution buffers. Each pair of output signals is provided to a Digitizer board, which passes these clocks to a Processor board. Clock distribution is shown in Figure 8.
Figure 8-Clock distribution block diagram
Digitizer Board-The Digitizer boards are custom printed circuit boards with 16 layers. They sample signals from the Analog Receiver (ARX) of the Analog Signal Processing (ASP) subsystem at a clock rate of 196 MHz with 12-bit resolution, using Analog Devices AD9230 analog-to-digital converters (ADC). The connection between the ASP and the Digitizer boards consists of 130 Category 7 cables with shielded RJ-45 connectors. Each cable includes four individually shielded twisted copper wire pairs, which carry the differential signals for both polarizations of two pairs of signals [12]. Each Digitizer board digitizes 10 pairs of signals, so five of these cables provide its 20 input signals. The Digitizer output data is distributed to the TBW, TBN, and BFU submodules on the Processor board. Gray code data format is used for the ADC output [5].
Embedded Software-The Processor boards utilize the Debian Linux operating system, which is hosted by the DP subsystem computer. When a Processor board is first powered on, the U-boot bootloader in its flash memory downloads a custom version of the Debian Linux DENX kernel from the DP subsystem computer. Each Processor board's file system is also provided by the DP subsystem computer and accessed using Network File System. Several Linux device drivers provide an interface for the software to the Processor board hardware. These device drivers are designed to be modular. The EBC device driver configures the EBC bank parameters and allows the embedded processor to communicate with the Xilinx FPGAs as memory mapped devices. The general purpose input/output (GPIO) device driver provides read/write access to the embedded processor's GPIO pins.
The I2C device driver provides read/write access to any device on the I2C bus, including an I2C to Serial Peripheral Interface (SPI) converter chip that is used to implement the interface between each Processor board and Digitizer board [13].
RESULTS
As of December
Figure 11 is a plot of the theoretical and measured magnitude response of a TBN channel using a bandwidth and output sample rate of 100 kHz. The complex passband is symmetrical about zero frequency, with a Nyquist range of -50 kHz to +50 kHz. The 3 dB bandwidth is 2/3 of the sample rate. The filter suppresses signals outside the Nyquist band by at least 40 dB.
CONCLUSION
We have described the design of a signal processing system for an LWA station. Although this is only one station of the full array, its 256 dual-polarization elements make it a large-number-of-antennas ("large N") telescope by current standards. The processing system provides beamforming for four independent, rapidly-steerable beams. It also provides narrow-bandwidth continuous streaming outputs for all element signals, and full bandwidth (10 to 88 MHz) capturing of short segments of all element signals. The architecture was chosen to make efficient use of available technologies, including modern FPGAs and high-speed backplanes, so as to minimize interconnections. The result is a compact and cost-efficient implementation.
ACKNOWLEDGEMENTS
The authors would like to acknowledge the contributions of Robert Proctor, Gerald Crichton, Leslie White, and Charles Goodhart in the implementation and delivery of the DP subsystem for LWA-1.
The LWA is being developed by a consortium of institutions including the University of New Mexico, Naval Research Laboratory, University of Iowa, Virginia Tech, Los Alamos National Laboratory, and JPL. Many scientists have been involved in establishing its concept and specifications (e.g., [3]-[7]). Steve Ellingson of Virginia Tech has designed many parts of the LWA, and in particular he is responsible for the Monitor and Control subsystem and the Data Recorder subsystem, both of which connect to the Digital Signal Processing subsystem described here; we are grateful for his collaboration on those interfaces [14]. Joseph Craig of the University of New Mexico has designed the Analog Signal Processing subsystem and has also been responsible for much of the system-level design (e.g., [6]); we are grateful for his collaboration as well. Direct support of our work by Gregory Taylor and Lee J. Rickard of the University of New Mexico is also gratefully acknowledged.
This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology. Government sponsorship acknowledged. Construction of the LWA has been supported by the Office of Naval Research under Contract N00014-07-C-0147.
