Optimal periodical memory allocation for logic-in-memory image processors by 亀山 充隆
Optimal Periodical Memory Allocation for Logic-in-Memory Image Processors
Masanori Hariyama and Michitaka Kameyama
Graduate School of Information Sciences
Tohoku University
Aoba 6-6-05, Aramaki, Aoba, Sendai, 980-8579, Japan
{hariyama,kameyama}@ecei.tohoku.ac.jp
Yasuhiro Kobayashi
Oyama National College of Technology
Nakakuki 771, Oyama, Tochigi, 323-0806, Japan
y-kobayashi@oyama-ct.ac.jp
Abstract
One major issue in designing image processors is to de-
sign a memory system that supports parallel access with a
simple interconnection network. This paper presents a de-
sign methodology for a logic-in-memory architecture where
each of memory modules is connected to its dedicated pro-
cessing element(PE). An efﬁcient memory allocation to min-
imize the number of memory modules and PEs under a time
constraint is proposed based on regularity.
1 Introduction
Highly-parallel image processors requires a complex in-
terconnection network between memory modules and pro-
cessing elements(PEs) for parallel memory access. The
complex interconnection network causes signiﬁcant over-
head in delay and power in deep-submicron and more ad-
vanced technologies since the delay and the power of in-
terconnection units are more dominant than those of logic
units.
This paper presents a design methodology for a logic-
in-memory architecture where each of memory modules is
connected to its dedicated PE. The advantage of the logic-
in-memory architecture is its simple interconnection net-
work between memory modules and PEs. Its disadvantage
is that it may require a large number of PEs when a large
number of memory modules are required for parallel access.
To solve this problem, this paper also presents a memory
allocation technique to enable parallel access with the min-
imum number of memory modules. From a practical point,
a memory allocation should have a simple address function
that calculates which memory module a pixel is stored in.
Hence, we consider a periodical memory allocation where
a memory allocation for a whole image is given by repeat-
ing a memory allocation for a partial image. A periodical
memory allocation has a simple address function because
of its regularity. The memory allocation problem is usually
mapped onto the clique partitioning problem that is known
as an NP-complete problem[1]. An efﬁcient search method
is presented based on the regularity of a periodical memory
allocation. As a result, an optimal memory allocation for a
practical image size is found in much less than 1 second on
Pentium4@2GHz.
The proposed method is used for a road-extraction VLSI
for highly-safe vehicles and a stereo-matching VLSI[2]. In
the former and latter cases, the number of memory modules
is reduced to 91.7% and 19.4%, respectively.
2 Problem formulation
2.1 Target algorithms
We consider window-type algorithms. In these algo-
rithms, the output/intermediate output depends on a small
neighborhood of an input image, where the neighborhood
size is ﬁxed and given as a window as shown in Fig. 1.
Algorithms of this type frequently appear in practical situ-
ations: spatial ﬁlter, morphology, and image matching, and
so on. Moreover, they usually have high degree of paral-
lelism, and are suited for VLSI implementations.
We use window-serial-and-pixel-parallel scheduling as
shown in Fig. 2. In this scheduling, operations are per-
formed in parallel with pixels in a window, whereas opera-
tions are performed in a serial manner with windows. Fig-
ure 2(a) shows the location of the window at each step. The
thick line denotes a window. Figure 2(b) shows the sched-
uled data-ﬂow graph(SDFG) corresponding to Fig. 2(a). A
node in the SDFG denotes a operation. The labels A and
B on operations denote the operation types of the nodes.
There is an edge from node V1 to node V2 when the out-
put of V1 is used as an input of V2. At Step 1 in Fig. 2,
pixels:(0, 0), (0, 1), (0, 2) and (1, 1) are used as inputs of
operations of type A. The pixels must be accessed in par-
allel since the A-type operations are performed in parallel.
Their results are used as inputs of aB-type operation. As the
location of the window changes, the input pixels change.
Proceedings of the 2006 Emerging VLSI Technologies and Architectures (ISVLSI’06) 
0-7695-2533-4/06 $20.00 © 2006 IEEE 
Authorized licensed use limited to: TOHOKU UNIVERSITY. Downloaded on February 5, 2009 at 21:16 from IEEE Xplore.  Restrictions apply.
Figure 1. Window.
(0,0) (0,1) (0,2) (1,1)
(1,0) (1,1) (1,2) (2,1)
(0,1) (0,2) (0,3) (1,2)
(1,1) (1,2) (1,3) (2,2)
Figure 2. Data-ﬂow graph of a pixel-parallel-
and-window-serial schedule.
2.2 Target architecture
Figure 3 shows the basic structure of the logic-in-
memory architecture. A single dedicated PE is connected
to each memory modules. Pixels of input images are dis-
tributed among the memory module. The PEs performs
operations of type A according to the window-serial-and-
pixel-parallel scheduling shown in Fig. 2(b). Their outputs
are used as inputs of the unit for type-B operations. In order
to extend the architecture model, you can add inputs to PEs
as required.
2.3 Periodical memory allocation
From a practical point, a memory allocation should have
a simple address function to map the coordinates of a pixel
onto the memory module number. If the address function is
complex, its overhead in area and delay become serious. For
example, a look-up table is required for address mapping
of all pixels in the worst case. Therefore, we consider a
periodical memory allocation where a memory allocation
Figure 3. Logic-in-memory architecture.
for a whole image is given by repeating a memory allocation
for a partial image. The periodical memory allocation has
a simple address function because of its regularity. Let K
be the number of memory modules. Let Ni be the number
of pixels that are allocated to the memory module Mi for
1 ≤ i ≤ K. Let (xji , yji ) be the coordinates of a pixel
that is allocated to the memory module Mi for 1 ≤ j ≤ Ni.
Then, we exactly deﬁne a periodical memory allocation as a
memory allocation where the coordinates (xji , y
j
i ) of pixels
allocated to Mi are expressed as
[
xji
yji
]
= sji
[
Ux
Uy
]
+ tji
[
Vx
Vy
]
+
[
x0i
y0i
]
(1)
where U = [Ux Uy]T and V = [Vx Vy]T are vectors to rep-
resent periods(called period vectors); the variables sji and
tji are integers; the coordinates
(
x0i , y
0
i
)
are those of the ref-
erence pixel allocated to Mi. Note that you can select an
arbitrary pixel as the reference one from the ones allocated
to the same memory module.
Fig. 4(a) shows an example of a periodical memory al-
location for the SDFG shown in Fig. 2. The label on a pixel
denotes the module number where the pixel is allocated. For
example, the pixels: (0, 0),(0, 1),(1, 0) and (1, 1) are allo-
cated to M1, M2, M3 and M4, respectively. From Fig. 4
(b), the coordinates of the pixels allocated to M1 are given
by
[
xj1
yj1
]
= sj1
[
1
2
]
+ tj1
[
2
0
]
+
[
0
0
]
(2)
where the coordinates of the reference pixel and the period
vectors are [0 0]T , U = [1 2]T and V=[2 0]T , respectively.
Each of the pixels with label 1 in Fig. 4 (b) are given by
(sj1, t
j
1)=(0, 0), (1, 0), (2, 0), (0, 1), (1, 1) or (2, 1). Figures
4-(c), 4-(d) and 4-(e) show the pixels allocated to M2, M3
and M4, respectively. These ﬁgures show that the same pe-
riod vectors as M1 are used for M2, M3 and M4. They also
show that the coordinates of the reference pixels for M2,
M3 and M4 are [0 1]T , [1 0]T and [1 1]T , respectively. The
memory allocation shown in Fig. 4 satisﬁes Eq.( (1)), that
is, the deﬁnition of a periodical memory allocation.
Proceedings of the 2006 Emerging VLSI Technologies and Architectures (ISVLSI’06) 
0-7695-2533-4/06 $20.00 © 2006 IEEE 
Authorized licensed use limited to: TOHOKU UNIVERSITY. Downloaded on February 5, 2009 at 21:16 from IEEE Xplore.  Restrictions apply.
yx
y
x
4 4 4
x
y
x
y
x
y
4 4 4
Figure 4. Example of a periodical memory al-
location.
As mentioned at the beginning of this section, the advan-
tage of the periodical memory allocation is its simple ad-
dress function. For the periodical memory allocation shown
in Fig. 4(a), the address function that maps the coordinates
(x, y) of a pixel to the module number is given by
f(x, y) = (2x + y) mod 4 + 1 (3)
2.4 Optimal memory allocation
Let us minimize the hardware amount when the SDFG
is speciﬁed. In the logic-in-memory architecture, the hard-
ware amount is determined by a memory allocation task
since the number of PEs is determined by the number of
memory modules. The SDFG imposes the degree of paral-
lelism in memory access. For example, in the SDFG shown
in Fig. 2, the degree of parallelism in memory access is 4.
From these observations, minimizing the hardware amount
is formulated as ﬁnding a memory allocation that minimizes
the number of memory modules under a speciﬁed degree of
parallelism in memory access. Given a window, the optimal
memory allocation is deﬁned as the memory allocation that
satisﬁes the following conditions:
C1: For an arbitrary location of the window, pixels in the
window can be retrieved in parallel. In other words, the
pixels in the window are allocated to different memory
modules.
C2: Each pixel is allocated to a single memory module.
This condition ensures that the total memory capacity
is minimized.
C3: The number of memory module is minimized. This
condition ensures that the hardware amount is mini-
mized in the logic-in-memory architecture.
C4: The memory allocation is a periodical one. This condi-
tion ensures that the hardware for the address function
is small.
For example, Fig. 4(a) is the optimal memory allocation
for the window shown in Fig. 1 from the following reasons.
The condition C1 is satisﬁed since pixels in the window are
allocated to different pixels for an arbitrary location of the
window. The condition C2 is satisﬁed since each pixel has a
single label. The condition C3 is satisﬁed since the number
of memory module is 4 and is equal to the minimum num-
ber of memory modules required for parallel access, i.e. the
number of pixels in the window. The condition C4 is satis-
ﬁed since the memory allocation is periodical as mentioned
in section 2.3.
2.5 Overview of search method
In the periodical memory allocation, the period vectors
U and V determine which pixels are allocated to the same
memory module as shown in Fig. 4. The period vectors
make a parallelogram. Given the period vectors, the num-
ber of memory modules is estimated by the area of the par-
allelogram. The area S of the parallelogram is given by
S = |Ux · Vy − Uy · Vx|. (4)
For example shown in Fig. 4,
S = |1 · 0− 2 · 2| = 4,
and S is exactly same as the number of memory modules.
This is because the memory allocation for a whole image
is given by repeating the one for the parallelogram, and the
parallelogram must be ﬁlled with pixels allocated to differ-
ent memory modules. From these observations, ﬁnding the
optimal memory allocation is reduced to ﬁnding period vec-
tors that make the minimum parallelogram still satisfying
the parallel access condition.
Basically, the search for the optimal period vectors for
Umin and Vmin is performed by repeating following steps
for U and V within possible coordinates of pixels.
Step 1 Check the parallel access condition, that is, whether
the window includes the pixels allocated to the same
memory module.
Step 2 If the parallel access condition is satisﬁed, calculate
the area S by (4). Otherwise, go to Step 1.
Proceedings of the 2006 Emerging VLSI Technologies and Architectures (ISVLSI’06) 
0-7695-2533-4/06 $20.00 © 2006 IEEE 
Authorized licensed use limited to: TOHOKU UNIVERSITY. Downloaded on February 5, 2009 at 21:16 from IEEE Xplore.  Restrictions apply.
Figure 5. Road extraction.
Figure 6. Window for road extraction .
Step 3 If S is smaller than current minimum area Smin,
then Smin ← S, Umin ← U, Vmin ← V.
To reduce the computational amount, we can limit the
reference pixel to [0 0]T as shown in Fig. 4(b). Moreover,
we can limit the search space using the branch-and-bound
method such that
S = |Ux · Vy − Uy · Vx| < Smin,
where Smin is the current minimum area of the parallelo-
gram made by the current period vectors.
3 Design examples
3.1 Road extraction VLSI
Let us design a road extraction VLSI for highly-safe ve-
hicles. The “road” is deﬁned as a set of regions that a ve-
hicle can climb over. We assume that the 3-D information
of the land surface is obtained from pre-processing such as
stereo vision. Given the 3-D coordinates of the land surface
and the size of the wheel of a vehicle, let us examine a 3-D
pixel to see if it is a 3-D pixel on the “road”. Let R be the
radius of the wheel. As shown in Fig. 5, let A be the contact
between the wheel and the land. Let B be an arbitrary 3-D
pixel within the radius from A. The 3-D pixel A is a part
of the “road” if the vertical interval |Hb − Ha| is less than
acceptable range Hmax. This condition is given by
−Hmax ≤ Hb −Ha ≤ Hmax (5)
where Ha and Hb are the z-coordinates of A and B, respec-
tively. Considering the traveling direction of the vehicle,
1 2 3 4 5 6 7 8 9 12 13 14 15 16 17 18 1910 11 20 21 22
32 33 34 35 36 37 38 3931 40 4123 24 25 26 27 28 29 30 42 43 44
52 53 54 55 56 57 58 5951 60 6145 46 47 48 49 50 62 6463
72 73 74 75 76 77 78 7910 71 80 81 85 86
65 66
67 68 69 7 82 8483
92 93 94 95 96 97 98 9991 F0 F1 F5 F6 F7 F8
87 88
89 90 F2 F4F3 F9 G0
15 16 17 18 19 20 21 22
37 38 39 40 41 42 43 44
59 60 61 62 6463
81 85 86
65 66
82 8483
F5 F6 F7 F8
87 88
F4F3 F9 G0
1 2 3 4 5 6 7 8 9 12 13 1410 11
32 33 34 35 363123 24 25 26 27 28 29 30
52 53 54 55 56 57 585145 46 47 48 49 50
72 73 74 75 76 77 78 7971 8067 68 69 70
92 93 94 95 96 97 98 9991 F0 F189 90 F2
7 8 9 12 13 14 15 16 17 18 1910 11 20 21 22
32 33 34 35 36 37 38 3931 40 4129 30 42 43 44
52 53 54 55 56 57 58 5951 60 61 62 6463
73 74 75 76 77 78 79 80 81 85 86
65 66
82 8483
95 96 97 98 99 F0 F1 F5 F6 F7 F8
87 88
F2 F4F3 F9 G0
1 2 3 4 5 6
23 24 25 26 27 28
45 46 47 48 49 50
727167 68 69 70
92 93 949189 90
7 8 9 12 13 14 15 16 17 18 1910 11 20
32 33 34 35 36 37 38 3931 40 4129 30 42
52 53 54 55 56 57 58 5951 60 61 62 6463
73 74 75 76 77 78 79 80 81 85 8682 8483
95 96 97 98 99 F0 F1 F5 F6 F7 F8F2 F4F3
21 22
43 44
65 66
87 88
F9 G0
1 2 3 4 5 6
23 24 25 26 27 28
45 46 47 48 49 50
727167 68 69 70
92 93 949189 90
Fk:100+k, Gk: 110+k
Figure 7. Optimal allocation for Fig. 6.
Figure 8. Logic-in-memory architecture for
road extraction.
Eq. (5) is checked for B within a front half circle centered
on A as shown in Fig. 6.The black pixel and gray pixels
denote A and B, respectively.
Let us ﬁnd the optimal memory allocation for the win-
dow shown in Fig. 6. Suppose that the area for road ex-
traction is 10[m] × 50[m] ahead and that the area is di-
vided into a set of squares of 5[cm] × 5[cm]. Then, the
3-D information of the land can be considered as an image
of 200 × 1000 pixels. Figure 7 shows the optimal mem-
ory allocation. It shows the result for a partial image since
the space is limited. The labels Fk and Gk mean mem-
ory modules M100+k and M110+k, respectively. The period
vectors are U = [14 5]T and V = [6 10]T . The number of
memory modules is 110. The search time for the optimal al-
location is 40ms on a PC(Pentium4@2GHz, 2G-byte main
memory, OS:Window XP). Figures 8 and 9 show the logic-
in-memory architecture for road extraction and the block di-
agram of the PE, respectively. Note that the interconnection
between memory modules and PEs is extended for road ex-
traction. The road extraction VLSI is designed in a 0.5μm
CMOS technology. Its performance is 1200 times higher
than that of Pentium4@2GHz.
Let us compare the proposed method with the conven-
tional approach from the point of the number of memory
Proceedings of the 2006 Emerging VLSI Technologies and Architectures (ISVLSI’06) 
0-7695-2533-4/06 $20.00 © 2006 IEEE 
Authorized licensed use limited to: TOHOKU UNIVERSITY. Downloaded on February 5, 2009 at 21:16 from IEEE Xplore.  Restrictions apply.
Figure 9. Block diagram of the PE.
Figure 10. Rectangular memory allocation.
modules. For the window-type image processing, one ef-
ﬁcient memory allocation method is a rectangular memory
allocation[3] as shown in Fig. 10. A minimum bounding
rectangle approximates a given window(Fig. 10(a)). The
optimal memory allocation for the rectangle window is ob-
tained by the rectangular memory allocation(Fig. 10(b)).
The number of memory modules is K × L for a rectangle
window of size K × L. The window shown in Fig. 6 is ap-
proximated by a rectangle window of size 8× 15. The rect-
angular memory allocation requires 8×15(= 120) memory
modules. As a result, the number of memory modules is re-
duced to 91.7%(110/120×100).
3.2 Stereo matching VLSI processor using
multi-resolution images
A coarse-to-ﬁne approach using multi-resolution images
is one of important techniques in image processing such as
pattern matching. This section describes the optimal mem-
ory allocation for stereo matching using multi-resolution
images. The method proposed in Section 2 is extended to
the case with several different windows in the following.
Stereo matching is one efﬁcient method to obtain 3-D in-
formation of real scene. It uses two images taken from two
different cameras at the same time. After correspondence
between the two images is established, the 3-D information
of the scene is computed based on the triangular method.
The correspondence search is time-consuming. Figures 11
and 12 show the correspondence search and the DFG, re-
spectively. Figure 14 shows the logic-in-memory architec-
ture for stereo matching. As a similarity measure between a
reference window and a candidate window, a sum of abso-
lute differences(SAD) is used. Given a reference window,
UR
VR
oUL
VL
o
Left image Right image
Object
Candidate windows
Reference
windows
Epipolar
line
Q
L R
Figure 11. Search for a corresponding point.
AD AD AD
AD AD AD
AD AD AD
MVD
MVD
Q Q ADs
Q Q ADs
Q Q ADs
S 1
S 2
S 3
S n
S n+1
Time
MVD
AD : Absolute Difference
: Minimum Value Detection
Figure 12. Data ﬂow graph of stereo matching
using multi-resolution images.
SADs are computed for all the possible candidate windows.
The candidate window with the minimum SAD is deter-
mined to be the corresponding pixel.
Multi-resolution images are used to reduce the compu-
tational amount. Given sampling periods SR = SRmax,
SRmax−1,..., 2, 1, reduced images are made by sampling
the original images every SR pixels. Beginning with the
lowest-resolution image (SR = SRmax), the resolution is
iteratively increased until SR = 1. The possible location
of candidate windows at a higher resolution are limited by
using the matching result at a lower resolution. Hence, the
computational amount is reduced[2].
Figure 13 shows the windows at multi-resolution images
for window size Q = 3. As shown in Fig. 13(a), the win-
dow of 3×3 at SR = 1 corresponds to the window of 3×3
on the original image. This is because the image at SR = 1
is exactly same as the original image. As shown in Fig.
13(b), the window of 3 × 3 at SR = 2 corresponds to the
window of 5×5 on the original image. Note that only 3×3
gray pixels must be accessed in parallel. In order to ﬁnd
the optimal memory allocation for such different windows,
we make a single window by the union of the windows of
Q × Q atSR = SRmax, SRmax−1, ..., 2, 1 on the original
image.
Figure 15 shows the resulting optimal memory alloca-
tion for Q = 3 and SRmax = 3. The period vectors are
U = [5 0]T and V = [0 5]T . The number of memory mod-
ules is 25. Figures 15 (a), (b) and (c) corresponds to SR=1,
Proceedings of the 2006 Emerging VLSI Technologies and Architectures (ISVLSI’06) 
0-7695-2533-4/06 $20.00 © 2006 IEEE 
Authorized licensed use limited to: TOHOKU UNIVERSITY. Downloaded on February 5, 2009 at 21:16 from IEEE Xplore.  Restrictions apply.
Figure 13. Window.
ADD
ADD
ADD
M0 M1 MK
REG
AD
REG
REG
AD
REG
REG
AD
REG
REG REG
SAD
Mi (i=0,1,..,K):
Memory modules for  a candidate image
Figure 14. Logic-in-memory architecture for
stereo matching.
2 and 3, respectively. The label (x, y) and k(= 1, 2, · · · 25)
on a pixel denote the coordinates of the pixel on the origi-
nal image and the memory module number where the pixel
is allocated, respectively. You can see that all the pixels in
a 3 × 3 window of an arbitrary location are distributed be-
tween different memory modules at each sampling period.
In other words, parallel memory access for a 3× 3 window
is achieved at each sampling period. Table 1 shows com-
parison between the proposed and the rectangular memory
allocation for Q = 4 and SRmax = 8, i.e. a more practical
parameters. The number of memory modules (AD units)
are reduced to 19.4%. The search time is 30ms on the PC.
4 Conclusion
This paper presents an interconnection-aware design
methodology for image processors. A key to success is a
optimal memory allocation for a logic-in-memory architec-
ture. The method is also useful for FPGA implementations
where interconnection overhead in delay is signiﬁcant large.
(a) Original image(SR=1)
(b) Reduced image
(SR=2)
(c) Reduced image
(SR=3)
(0,0)
(0,0)
(0,0)
(0,1)
(1,0)
(1,1)
(2,0)
(2,0)
(2,1)
(3,0)
(3,0)
(3,1)
(4,0)
(4,0)
(4,1)
(5,0)
(5,1)
(6,0)
(6,0)
(6,0)
(6,1)
(7,0)
(7,1)
(8,0)
(8,0)
(8,1)
(9,0)
(9,0)
(9,1)
6 7 8 9 10
11 12
12
13 14
14
15
16 17
17
18 19 20
20
1 2 3 4 5
6
66
7 8
8
9
9
10
10
11 12 13 14 15
16
16
17 18
18
19 20
20
21
21
22 23 24
24
25
1 2 3 4 5
6 7
77
8 9
9
10
10
11 12 13 14 15
16 17
17
18 19
19
20
21 22
22
23 24 25
25
1
1
1
2 3
3
4
4
5
5
6 7 8 9 10
11
11
12 13
13
14 15
15
16
16
17 18 19
19
20
21
21
22 23
23
24 25
25
21 22
22
23 24
24
25
1 2
2
2
3 4
4
5
5
(0,2)
(0,2)
(1,2) (2,2)
(2,2)
(3,2) (4,2)
(4,2)
(5,2) (6,2)
(6,2)
(7,2) (8,2)
(8,2)
(9,2)
(0,9)
(0,9)
(1,9) (2,9) (3,9)
(3,9)
(4,9) (5,9) (6,9)
(6,9)
(7,9) (8,9) (9,9)
(9,9)
(0,3)
(0,3)
(1,3) (2,3) (3,3)
(3,3)
(4,3) (5,3) (6,3)
(6,3)
(7,3) (8,3) (9,3)
(9,3)
(0,4)
(0,4)
(1,4) (2,4)
(2,4)
(3,4) (4,4)
(4,4)
(5,4) (6,4)
(6,4)
(7,4) (8,4)
(8,4)
(9,4)
(0,5) (1,5) (2,5) (3,5) (4,5) (5,5) (6,5) (7,5) (8,5) (9,5)
(0,6)
(0,6)(0,6)
(1,6) (2,6)
(2,6)
(3,6)
(3,6)
(4,6)
(4,6)
(5,6) (6,6)
(6,6)(6,6)
(7,6) (8,6)
(8,6)
(9,6)
(9,6)
(0,7) (1,7) (2,7) (3,7) (4,7) (5,7) (6,7) (7,7) (8,7) (9,7)
(0,8)
(0,8)
(1,8) (2,8)
(2,8)
(3,8) (4,8)
(4,8)
(5,8) (6,8)
(6,8)
(7,8) (8,8)
(8,8)
(9,8)
Figure 15. Optimal memory allocation for
multi-resolution images.
Table 1. Comparison between the proposed
method and the rectangular allocation.
Proposed Rectangular
Allocation
Number of modules 121 625
Number of AD units 121 625
Number of adders 123 624
Acknowledgment
This work was supported in part by industrial technology
research grant program from New Energy and Industrial
Technology Development Organization (NEDO) of Japan.
References
[1] D. Gajski, N. Dutt, A. Wu, and S. Lin, “HIGH-LEVEL
SYNTHESIS -Introduction to Chip and System De-
sign,” Kluwer Academic Publishers, pp.277-278(1992).
[2] M. Hariyama, H. Sasaki, and M. Kameyama, ”Archi-
tecture of a Stereo Matching VLSI Processor Based on
Hierarchically Parallel Memory Access”, IEICE Trans.
Info. and Syst., Vol. E88-D, No.7,pp.1486-1491(2005).
[3] M. Hariyama, S. Lee, and M. Kameyama, “Highly-
Parallel Stereo Vision VLSI Processor Based on an Op-
timal Parallel Memory Access Scheme”, IEICE Trans.
Electron., Vol.E84-C, No.5, pp.382-389(2003).
Proceedings of the 2006 Emerging VLSI Technologies and Architectures (ISVLSI’06) 
0-7695-2533-4/06 $20.00 © 2006 IEEE 
Authorized licensed use limited to: TOHOKU UNIVERSITY. Downloaded on February 5, 2009 at 21:16 from IEEE Xplore.  Restrictions apply.
