I. INTRODUCTION
The increasing complexity and capability of programmable logic devices is opening a wide range of difficult high-dimensional problems to real-time hardware implementation. Examples include five-dimensional (5D) motion estimation and four-dimensional (4D) segmentation in computed tomography (CT) and volumetric ultrasound imagery [1]-[3], 4D seismic monitoring [4], and 4D computer vision applications such as depth filtering and distractor removal [5]-[8].
Within these applications, a large number of problems can be addressed by linear filters which can be elegantly implemented in digital hardware. This work is concerned with the formulation of digital hardware architectures appropriate to these higher-dimensional problems.
Although the range of potential applications is broad, for the purpose of evaluating our technique we focus on the problem of depth filtering from light field camera arrays. In particular, we propose a novel architecture for hardware implementation of the 4D 1st-order frequency-hyperplanar digital filter proposed in [9]. We implement the filter on a single Xilinx Virtex-4 SX35-10FF668 FPGA device, demonstrating that what would normally be a processor-intensive and possibly slow operation can be implemented on specialized hardware, allowing accelerated or even real-time operation [10].
II. BACKGROUND: FILTERING OVER 4D LIGHT FIELDS
Light fields first came about as an image-based approach to computer graphics [11], [12], foregoing geometric models for a collection of images describing the behaviour of light permeating a scene. They have since gathered attention in image processing, allowing linear filtering techniques to accomplish complex tasks such as depth filtering and distractor isolation from moving cameras [8], [9].
The light permeating a static scene is most completely described in terms of the continuous plenoptic function $C(\mathbf{x}, \mathbf{r})$, which varies with position $\mathbf{x} = (X, Y, Z)$ and direction $\mathbf{r} = (r_x, r_y, r_z)$, $\|\mathbf{r}\| = 1$ [13]. Note that direction has only two degrees of freedom, and so the total dimensionality of the plenoptic function is five. All cameras sample subsets of this function, and a pinhole camera in particular measures a pencil of rays passing through a fixed position $\mathbf{x}$.
A light field camera measures a larger subset of the plenoptic function by placing multiple apertures on a 2D grid, yielding a 4D subset of the plenoptic function in two angular and two spatial dimensions. It may seem that constraining the apertures to a plane represents a loss of information in the third spatial dimension. However, when operating in a non-attenuating medium such as air, and in the absence of occlusion, rays do not change in value along their direction of propagation, and so the third spatial dimension can be discarded [11]. We operate under the assumption that occlusion accounts for relatively little of a scene's energy.
A convenient way to parametrize the 4D light field is using the two-plane parametrization (2PP) depicted in Fig. 1, in which each of the measured rays is described by its points of intersection with two reference planes: the $(s, t) \in \mathbb{R}^2$ plane given by $z = 0$, and the $(u, v) \in \mathbb{R}^2$ plane which is parallel to the $(s, t)$ plane at some positive separation $z = D$. One can imagine the $(s, t)$ coordinate as fixing the position of a ray, with $(u, v)$ fixing its direction. Alternatively, the $(s, t)$ plane can be imagined as a grid of pinhole cameras facing the $(u, v)$ plane. Fixing $(s, t)$ selects a specific pinhole camera, and $(u, v)$ act as pixel coordinates for that camera, skewed such that all cameras share common $(u, v)$ coordinates.
978-1-61284-857-0/11/$26.00 ©2011 IEEE
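We denote the continuous-domain 4D light field by $L(s, t, u, v)$, which, following uniform sampling at $(\Delta s, \Delta t, \Delta u, \Delta v) \in \mathbb{R}^4$, leads to a discrete-domain 4D input tensor $x(n_1 \Delta s, n_2 \Delta t, n_3 \Delta u, n_4 \Delta v) \leftrightarrow X(\mathbf{z})$, $\mathbf{z} \equiv (z_1, z_2, z_3, z_4) \in \mathbb{C}^4$. The output of a 4D IIR filter is another light field $y(n_1 \Delta s, n_2 \Delta t, n_3 \Delta u, n_4 \Delta v) \leftrightarrow Y(\mathbf{z})$.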
In [9], a 1st-order frequency-hyperplanar filter was proposed using the resonant properties of resistively terminated passive 4D inductance-resistance networks [14]. Ideal frequency-hyperplanar filters are planar resonant on the 4D passband hyperplane (1) [9] and have the 4D Laplace transform input-output transfer function (2). This hyperplanar filter has the desirable property of selectively filtering a light field for a prescribed depth. That is, one may selectively extract scene elements which lie within a plane parallel to and at a prescribed distance from the light field camera, attenuating all scene elements which are outside of the selected plane. It is this filter (2) which motivates our hardware implementation.
III. SCANNED-ARRAY 4D IIR FILTER ARCHITECTURE
We propose a 4D IIR filter architecture starting from a one-dimensional (1D) direct-form (type II) signal flow graph (SFG). Direct-form II SFGs result in canonical structures that consume the lowest memory resources, leading to the smallest number of first-in-first-out (FIFO) buffers in the digital filter circuit. This important advantage is retained in the proposed 4D filter architecture. The authors believe the proposed novel 4D filter architecture to be the first and only such digital hardware architecture in the current literature.
A. 1st-order 4D IIR frequency-hyperplanar filters from network resonance
Fig. 2. Block definitions for the raster-scanned 4D frequency-hyperplanar filter architecture (top). Raster-scanned first-order 4D IIR frequency-hyperplanar digital filter architecture in direct-form (type II) (bottom) [10].
The 4D IIR frequency-hyperplanar z-domain transfer function corresponding to (2) is given by
where
The 4D input-output tensors x(n) and y(n) are related to the corresponding 4D z-transforms via
where $\mathbf{n} = (n_1, n_2, n_3, n_4) \in \mathbb{Z}^4$. Here, $b_{ijkl}$ are 4D filter feedback coefficients. The first-order practical-BIBO stable 4D IIR frequency-hyperplanar digital filter is implemented using the 4D recursive difference equation under zero initial conditions (ZICs),
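As an illustration of how a first-order 4D recursive difference equation is evaluated under ZICs, the following is a minimal software sketch. It assumes a generic first-order form with feedforward coefficients `a[t]` and feedback coefficients `b[t]` indexed by tap offsets `t` in {0,1}^4; the function name, the dictionary-based data layout, and the coefficient values used to exercise it are hypothetical and are not the coefficients of the filter in [9].

```python
from itertools import product


def filter_4d_first_order(x, shape, a, b):
    """Generic first-order 4D recursion (illustrative assumption, not the paper's exact equation).

    x     : dict mapping (n1, n2, n3, n4) -> input sample
    shape : (N1, N2, N3, N4)
    a, b  : dicts mapping tap offsets (i, j, k, l) in {0,1}^4 -> coefficient
    """
    N1, N2, N3, N4 = shape
    taps = list(product((0, 1), repeat=4))
    origin = (0, 0, 0, 0)
    y = {}
    # Raster order (n1 fastest) guarantees every delayed output already exists.
    for n4 in range(N4):
        for n3 in range(N3):
            for n2 in range(N2):
                for n1 in range(N1):
                    n = (n1, n2, n3, n4)
                    acc = 0.0
                    for t in taps:
                        m = tuple(ni - ti for ni, ti in zip(n, t))
                        if min(m) < 0:
                            continue  # zero initial condition: sample lies outside the tensor
                        acc += a[t] * x[m]
                        if t != origin:
                            acc -= b[t] * y[m]
                    y[n] = acc / b[origin]
    return y
```

With all feedback coefficients except `b[(0, 0, 0, 0)]` set to zero and a unit feedforward tap at the origin, the sketch reduces to an identity, which is a convenient sanity check before trying other coefficient sets.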
B. Raster-scanned 4D volume arrays or light field tensors
In array processing, the required raster-scanned input stream is obtained by uniformly raster-scanning a 3D $N_1 \times N_2 \times N_3$ volume sensor-array, first along columns, then rows, and finally depth. Alternatively, for light fields, the input tensor $x(\mathbf{n})$ is of size $N_1 \times N_2 \times N_3 \times N_4$. The sampled stream of data, denoted $w_{SCAN}(k) = x(\mathbf{n})$, is used for 4D filtering operations, which produce a corresponding filtered output stream $y_{SCAN}(k)$. The raster-scanned input stream and filtered output stream are obtained under the raster-scanned sampling relation $k = N_1 N_2 N_3 n_4 + N_1 N_2 n_3 + N_1 n_2 + n_1$.
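The raster-scanned sampling relation can be sketched in software as a pair of index conversions. The function names below are hypothetical; the forward mapping is the relation stated above, and the inverse simply undoes it with successive integer divisions.

```python
def raster_index(n, shape):
    """Map a 4D index n = (n1, n2, n3, n4) to the stream index k."""
    n1, n2, n3, n4 = n
    N1, N2, N3, _ = shape
    return N1 * N2 * N3 * n4 + N1 * N2 * n3 + N1 * n2 + n1


def raster_coords(k, shape):
    """Inverse mapping: recover (n1, n2, n3, n4) from the stream index k."""
    N1, N2, N3, _ = shape
    k, n1 = divmod(k, N1)   # n1 varies fastest
    k, n2 = divmod(k, N2)
    n4, n3 = divmod(k, N3)  # whatever remains after n3 is n4
    return (n1, n2, n3, n4)
```

For example, with a tensor of size 4 x 3 x 2 x 5, the index (1, 2, 1, 3) maps to k = 24*3 + 12*1 + 4*2 + 1 = 93.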
The circuit clock frequency is $F_{CLK} = 1/\Delta T_{AS}$, where $\Delta T_{AS} = \Delta T_S / (N_1 N_2 N_3)$ and $\Delta T_S$ is the uniform volume frame sample-time (i.e. temporal inter-sample time for each sensor).
C. 4D direct-form (type II) structure
The proposed architecture [10] (see Fig. 2) is a novel 4D direct-form (type II) structure. ZICs, being essential for practical-BIBO stability, are obtained using spatial-delay processor (SDP) blocks. The architecture requires 15 parallel multipliers, 19 two-input adders including the 16-input adder tree shown in Fig. 2, one volume-array clocked FIFO buffer $r$ of length $N_1 N_2 N_3$ (which requires the most memory resources), three $\mathrm{SDP}_D$, five $\mathrm{SDP}_R$, and nine $\mathrm{SDP}_C$ ZIC circuits.
Column- and row-wise ZICs are achieved using column and row SDP circuits, denoted $\mathrm{SDP}_C$ and $\mathrm{SDP}_R$, respectively [15]. A novel depth-wise spatial delay processor [10], $\mathrm{SDP}_D$, which is a 4D extension of the SDPs in [15], is proposed in Fig. 3. SDPs present a zero value at their output when a ZIC is required, and pass the input signal unchanged otherwise.
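The behaviour of an SDP on a raster-scanned stream can be sketched in software as a gate applied to a delayed sample. This is only a behavioural model under stated assumptions: the delays 1, N1, N1 N2, and N1 N2 N3 are assumed to realize unit shifts along n1 through n4, a ZIC is assumed to be required whenever the corresponding coordinate is zero, and the names `sdp` and `gated_tap` are hypothetical. It does not describe the circuits of Fig. 3.

```python
def sdp(value, coord):
    """Behavioural SDP: output zero when a ZIC is required (coord == 0), else pass through."""
    return 0.0 if coord == 0 else value


def gated_tap(stream, k, shape, tap):
    """Return the stream sample delayed by the tap offset, zeroed where a ZIC applies.

    stream : list of samples in raster-scan order
    k      : current stream index
    tap    : offsets (i, j, k, l) in {0,1}^4
    """
    N1, N2, N3, _ = shape
    strides = (1, N1, N1 * N2, N1 * N2 * N3)  # assumed delay per unit shift
    coords = (k % N1, (k // N1) % N2, (k // (N1 * N2)) % N3, k // (N1 * N2 * N3))
    delay = sum(t * s for t, s in zip(tap, strides))
    value = stream[k - delay] if k - delay >= 0 else 0.0
    for t, c in zip(tap, coords):
        if t:
            value = sdp(value, c)  # without this gate the delay would wrap across a boundary
    return value
```

The gate matters at the boundaries: for a 3 x 2 x 2 x 2 tensor, the sample at k = 3 begins a new line in n1, so a one-sample delay would otherwise fetch the last sample of the previous line.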
D. VLSI resource consumption and critical path delay
If the VLSI hardware requirements for $W_{mul}$-bit multipliers and adders are $\gamma_M$ and $\gamma_{A/S}$, respectively, then the total VLSI resource consumption of a circuit is approximated by [10] $\gamma_T \approx 15\gamma_M + 19\gamma_{A/S} + W_q r K_0 + K_1 + K_2$, where $r = N_1 N_2 N_3$, $K_0$ is the total VLSI resource requirement for a one-bit delay buffer, $K_1$ is the VLSI resource requirement for the ZIC circuits, and $K_2$ is the additional resource requirement for other delay buffer circuits and quantizers. The minimum critical path delay (CPD) of the circuit, following pipelining, is given by $T_{CPD} \approx T_M + T_{A/S} + T_{MUX}$, where $T_M$, $T_{A/S}$ and $T_{MUX}$ are the propagation delays of a parallel multiplier, adder/subtractor, and two-input $W$-bit multiplexer, respectively. Therefore, the maximum clock frequency is $F_{CLK,Max} = 1/T_{CPD}$.
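The two estimates above are simple enough to evaluate directly. The sketch below restates them as functions; the function and argument names are hypothetical, and any numbers passed in are placeholders in arbitrary resource units rather than measured values.

```python
def total_resources(gamma_m, gamma_as, w_q, shape, k0, k1, k2):
    """Approximate total resource use: 15 multipliers, 19 adders, one W_q-bit FIFO, plus extras."""
    n1, n2, n3 = shape[:3]
    fifo_len = n1 * n2 * n3  # length of the largest FIFO buffer
    return 15 * gamma_m + 19 * gamma_as + w_q * fifo_len * k0 + k1 + k2


def max_clock_hz(t_m, t_as, t_mux):
    """Maximum clock frequency from the pipelined critical path delay (delays in seconds)."""
    t_cpd = t_m + t_as + t_mux
    return 1.0 / t_cpd
```

Because the FIFO term scales with N1 N2 N3, it tends to dominate the total for any realistically sized volume, which is consistent with the remark that this buffer requires the most memory resources.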
E. Estimated real-time throughput
The real-time throughput is $15 F_{CLK}$ fixed-point multiplications and $19 F_{CLK}$ additions/subtractions per second, corresponding to the volume rate $F_S = F_{CLK}/(N_1 N_2 N_3)$ Hz. For a light field having dimensions $N_1 \times N_2 \times N_3 \times N_4$, the architecture processes one sample per clock cycle and therefore completes the filtering operation in $T_0 = N_1 N_2 N_3 N_4 / F_{CLK}$ seconds.
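A short numerical sketch of these throughput figures follows. It assumes one stream sample is consumed per clock cycle, so that a light field of N1 N2 N3 N4 samples takes N1 N2 N3 N4 / F_CLK seconds; the function names and the example clock frequency are hypothetical.

```python
def ops_per_second(f_clk):
    """Return (multiplications/s, additions-or-subtractions/s) for 15 multipliers and 19 adders."""
    return 15 * f_clk, 19 * f_clk


def volume_rate_hz(f_clk, shape):
    """Volume rate F_S = F_CLK / (N1 * N2 * N3)."""
    n1, n2, n3 = shape[:3]
    return f_clk / (n1 * n2 * n3)


def light_field_time_s(f_clk, shape):
    """Time to stream a full N1 x N2 x N3 x N4 light field at one sample per clock."""
    n1, n2, n3, n4 = shape
    return (n1 * n2 * n3 * n4) / f_clk
```

For instance, at an assumed 100 MHz clock a 64 x 64 x 8 x 8 light field contains 262144 samples and would stream through in roughly 2.6 ms.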
IV. PROTOTYPES ON XILINX VIRTEX-4 FPGAs
Two examples of the 4D IIR frequency-hyperplanar filter were physically implemented on a Xilinx Virtex-4 XC4VSX35-10FF668 FPGA device installed on a Xilinx XtremeDSP Kit-4 development board. The two designs are summarized in Table I, in which $W$ and $D$ define precision as input word size and binary point position, $W_{out}$, $D_{out}$ define output precision, $W_{mul}$, $D_{mul}$ define coefficient precision, and $W_q$, $D_q$ define the precision of the quantizer immediately preceding the largest FIFO buffer, having depth $N_1 N_2 N_3$.
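To make the word-size and binary-point notation concrete, the sketch below quantizes a real value to a W-bit fixed-point number with D fractional bits. The signed two's-complement format, round-to-nearest behaviour, and saturation on overflow are all assumptions made for illustration; the text does not state which conventions the prototypes use, and the function name is hypothetical.

```python
def quantize(value, w, d):
    """Quantize to an assumed signed fixed-point format: w total bits, d fractional bits.

    Rounds to the nearest representable code (Python's round, ties to even)
    and saturates at the limits of the w-bit two's-complement range.
    """
    scale = 1 << d
    lowest = -(1 << (w - 1))
    highest = (1 << (w - 1)) - 1
    code = round(value * scale)
    code = max(lowest, min(highest, code))  # saturate instead of wrapping
    return code / scale
```

With w = 8 and d = 4 the representable range is -8.0 to 7.9375 in steps of 0.0625, which shows how a smaller word size trades dynamic range and resolution for a smaller circuit.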
Example 1 is a self-contained FPGA implementation fully operational on a single SX35 device. FPGA resource consumption and CPD both increase linearly with fixed-point precision levels. The CPD must be reduced as much as possible for high speeds of operation, thereby leading to the need for smaller circuits (lower precision). Therefore, the finite word sizes of the hardware design have been scaled to various fixed-point precision levels in order to meet both place-and-route and timing requirements. Example 2 is a similar design but supports a larger light field. For this reason, its largest buffer (a 64 x 64 x 8 17-bit FIFO) is located on the host computer, interfaced through a 32-bit PCI interface.
The PCI cores, glue logic and related drivers needed for on-FPGA verification are provided transparently via the hardware co-simulation (HCS) facility of the XtremeDSP Kit-4. The 4D input-output frequency response of each prototype was measured using HCS by taking the 4D fast Fourier transform (FFT) of the unit impulse response $h(\mathbf{n}) \leftrightarrow H(\mathbf{z})$, which was measured and confirmed on-chip using the XtremeDSP Kit-4 for a unit impulse input $w_{SCAN}(k \Delta T_{AS}) = \delta(k)$. In Fig. 4 (top), we show a 2D slice of the ideal 4D frequency response (for $\omega_{3,4} = 0$) associated with the ideal 4D unit impulse response $h_{ideal}(\mathbf{n})$ as defined by (6). Fig. 4 (mid) and (bottom) correspond to the hardware-derived 4D unit impulse responses, obtained from bit-true cycle-accurate logic simulation, for Examples 1 and 2, respectively.
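The relationship between a measured impulse response and the frequency response can be sketched as follows. Rather than a 4D FFT, this sketch evaluates the 4D discrete-time Fourier sum directly at a single frequency point, which is slower but easier to read; the function name and the dictionary representation of the impulse response are hypothetical.

```python
import cmath


def frequency_response(h, omega):
    """Evaluate H at one 4D frequency point from a finite impulse response record.

    h     : dict mapping (n1, n2, n3, n4) -> impulse response sample
    omega : four normalized frequencies in radians per sample
    """
    total = 0j
    for n, value in h.items():
        phase = sum(w * ni for w, ni in zip(omega, n))
        total += value * cmath.exp(-1j * phase)
    return total
```

Sweeping the first two frequencies over a grid while holding the last two at zero yields a 2D slice of the magnitude response of the kind described for Fig. 4.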
V. CONCLUSIONS
A novel hardware architecture was proposed for the implementation of 4D IIR frequency-hyperplanar digital filters, with applications in 3D video processing, volumetric-array beamforming, and light field processing. The proposed 4D IIR digital filter structure is, to our knowledge, the first of any kind to be proposed for realizing 1st-order 4D IIR frequency-hyperplanar transfer functions. Two 4D raster-scanned architectures have been designed, simulated, and implemented on FPGA chips using Xilinx programmable logic technology, and their correct operation has been verified using stepped on-chip hardware co-simulation, where test vectors were routed to the physical FPGA circuit implementation using MATLAB-based HCS hardware-in-the-loop test features.
Reducing the computational complexity using 4D fast algorithms, minimizing CPD, removing quantization effects, estimating power consumption, and trials with real-world data remain for future work.
