A study on design scheme for self-timed pipelined systems by 三宮, 秀次

2.1 (p. 7)
2.2 (p. 8)
2.2.1 (p. 11)
2.2.2 (p. 13)
2.3 (p. 19)
2.3.1 (p. 23)
2.3.2 (p. 25)
2.4 (p. 28)
3 (p. 29)
3.1 (p. 29)
3.2 (p. 29)
3.3 (p. 37)
3.4 (p. 39)
4 (p. 41)
4.1 (p. 41)
4.2 (p. 42)
4.3 (p. 43)
4.3.1 I (p. 43)
4.3.2 II-1 (p. 44)
4.3.3 II-2 (p. 45)
4.4 (p. 47)
4.5 (p. 47)
4.6 (p. 48)
5 (p. 49)
5.1 (p. 49)
5.2 (p. 50)
5.3 (p. 54)
5.4 (p. 58)
5.4.1 (p. 59)
5.4.2 (p. 63)
5.5 (p. 66)
5.6 (p. 73)
6 (p. 75)
6.1 (p. 75)
6.2 (p. 76)
6.2.1 (p. 76)
6.2.2 (p. 78)
6.2.3 (p. 80)
6.3 (p. 83)
6.3.1 (p. 85)
6.3.2 (p. 89)
6.4 (p. 89)
6.4.1 (p. 90)
6.4.2 (p. 92)

A.1 (p. 107)
A.2 (p. 107)
A.3 (p. 109)

$F$: compound $f_i$ (2.2)

SIMD: single instruction, multiple data
STP: self-timed pipeline

2. C send$_{i-1}$ ack$_i$ $DL_i$

5. Steps 2 to 4
2

[Figure: sideward one-way interaction (Ab, Bb, Am)]

Table 2.1 (A, B)

Interaction name               From   Active paths
(based on static data path)           to Am   to Bm   to Bb
Divide                         Ab     ○       ○       -
                               Bb     -       -       -
Merge                          Ab     -       ○       -
                               Bb     -       ○       -
One-way                        Ab     ○       ○       -
                               Bb     -       ○       -
Bi-directional                 Ab     ○       ○       -
                               Bb     ○       ○       -
Sideward one-way               Ab     ○       -       ○
                               Bb     ○       ○       -

Divide             Ab → Am
                   Ab → Bm
Merge              Ab → Bm
                   Bb → Bm
One-way            Ab → Am, Bb → Bm
                   Ab → Bm
                   Bb → Bm
Bi-directional     Ab → Bm, Bb → Am
Sideward one-way   Ab → Am, Bb → Bm
                   Bb → Am
                   Ab → Am

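The active-path table (Table 2.1) and the transfer list above can be cross-checked mechanically. The Python sketch below is an illustration and not part of the thesis. It transcribes both tables and confirms that every listed transfer uses an active path. The check also enforces a rule that a buffer sends, and a stage receives, at most one packet per transfer. That rule is an assumption added here for the check. The names `ACTIVE_PATHS`, `TRANSFERS`, `check_row`, and `check_all` are hypothetical.

```python
# Cross-check of Table 2.1 (active paths) against the listed transfer
# combinations. Data transcribed from the tables; the checks are illustrative.

# Active paths of Table 2.1 as (source, destination) pairs.
ACTIVE_PATHS = {
    "Divide": {("Ab", "Am"), ("Ab", "Bm")},
    "Merge": {("Ab", "Bm"), ("Bb", "Bm")},
    "One-way": {("Ab", "Am"), ("Ab", "Bm"), ("Bb", "Bm")},
    "Bi-directional": {("Ab", "Am"), ("Ab", "Bm"), ("Bb", "Am"), ("Bb", "Bm")},
    "Sideward one-way": {("Ab", "Am"), ("Ab", "Bb"), ("Bb", "Am"), ("Bb", "Bm")},
}

# Transfer combinations as listed: one tuple of simultaneous transfers per row.
TRANSFERS = {
    "Divide": [(("Ab", "Am"),), (("Ab", "Bm"),)],
    "Merge": [(("Ab", "Bm"),), (("Bb", "Bm"),)],
    "One-way": [(("Ab", "Am"), ("Bb", "Bm")), (("Ab", "Bm"),), (("Bb", "Bm"),)],
    "Bi-directional": [(("Ab", "Bm"), ("Bb", "Am"))],
    "Sideward one-way": [(("Ab", "Am"), ("Bb", "Bm")), (("Bb", "Am"),), (("Ab", "Am"),)],
}


def check_row(interaction, row):
    """Return a list of problems for one row of simultaneous transfers."""
    problems = []
    for path in row:
        if path not in ACTIVE_PATHS[interaction]:
            problems.append(f"{interaction}: {path} is not an active path")
    sources = [src for src, _ in row]
    dests = [dst for _, dst in row]
    # Assumption: one packet leaves a buffer, and one enters a stage, per transfer.
    if len(set(sources)) != len(sources):
        problems.append(f"{interaction}: source used twice in {row}")
    if len(set(dests)) != len(dests):
        problems.append(f"{interaction}: destination used twice in {row}")
    return problems


def check_all():
    """Check every listed row; return {interaction: [problems]}."""
    return {name: [p for row in rows for p in check_row(name, row)]
            for name, rows in TRANSFERS.items()}


if __name__ == "__main__":
    for name, problems in check_all().items():
        print(f"{name:18s} {'ok' if not problems else problems}")
```

Every listed row passes. A deliberately wrong row, such as `check_row("Merge", (("Ab", "Am"),))`, is reported as using an inactive path.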
$V(t)$ [packet/sec.]
$V(t)$ STP
$D(t)$ [sec.] $V(t)$ $D(t)$

[Figure 2.9: STP. Legend: pipeline stage, packet, $D(t)$, collision. Panel (f): after $T_f + T_r$ from the resume.]
3 $(T_f + T_r)$ $T_f$ $T_r$
send ack
2.9 $D(t)$ $V(t)$
$(T_f + T_r)$ $D(t)$ $V(t)$
$D(t) \ge T_f + T_r$ send ack $T_r$
2.4 $D(t) \ge T_f + T_r$

$D(t) < T_f + T_r$

STP $(T_f + T_r)$ $T_{\max}$ $D(t) \ge$

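The condition above can be checked stage by stage. Reading Figure 2.9 as showing a collision whenever the packet spacing $D(t)$ is shorter than a stage's $T_f + T_r$ gives a simple test. The minimal Python sketch below applies it, assuming a constant spacing `d` and per-stage forward and reverse latencies `tf[i]` and `tr[i]`. The function names and the example numbers (integer picoseconds) are hypothetical.

```python
# Per-stage check of D >= Tf_i + Tr_i for a constant packet spacing D.
# Times are integers (e.g. picoseconds) so the comparisons are exact.

def t_max(tf, tr):
    """Slowest stage cycle: max over i of Tf_i + Tr_i."""
    if len(tf) != len(tr) or not tf:
        raise ValueError("tf and tr must be non-empty and equally long")
    return max(f + r for f, r in zip(tf, tr))


def colliding_stages(d, tf, tr):
    """Indices of stages whose cycle Tf_i + Tr_i exceeds the spacing d."""
    t_max(tf, tr)  # validates the inputs
    return [i for i, (f, r) in enumerate(zip(tf, tr)) if d < f + r]


if __name__ == "__main__":
    tf = [300, 300, 500, 300]   # hypothetical forward latencies
    tr = [200, 200, 300, 200]   # hypothetical reverse (ack) latencies
    print(t_max(tf, tr))                    # 800
    print(colliding_stages(600, tf, tr))    # [2]
    print(colliding_stages(800, tf, tr))    # []
```

Any spacing $D(t) \ge T_{\max}$ leaves the list empty. This is one way to read the role of $T_{\max}$ in the condition above.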
[30] STP STP DDP
$V(t)$ 2.10
$V(t)$ $V(t)$
2.11 2.10 2.11

1. $i$ C $C_i$ $C_{i+1}$ send-out$_{i+1}$ $DL_i$
2. send-out$_{i+1}$ $C_{i+1}$ $C_i$ ack-out$_{i+1}$
3. ack-out$_{i+1}$ $C_i$ send-out$_{i+1}$ $DL_i$
4. send-out$_{i+1}$ $C_{i+1}$ ack-out$_{i+1}$

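The four steps above describe a request/acknowledge handshake between neighbouring C elements $C_i$ and $C_{i+1}$. As a hedged illustration, the sketch below simulates the textbook Muller C-element pipeline used in Sutherland's micropipeline control [9]. In that pipeline, stage $i$ computes C(output of stage $i-1$, NOT output of stage $i+1$).

This is not a model of the thesis's circuit, for three reasons:
- the delay lines ($DL_i$) are ignored;
- all enabled stages fire in lock-step rounds;
- each transition at the last stage is counted as one token (a two-phase reading).

The function names are hypothetical.

```python
# Round-based simulation of a Muller C-element pipeline.
# c[0] is the sender's request wire, c[1..n] are the C-element outputs,
# and c[n+1] is the receiver's acknowledge wire.

def c_element(a, b, previous):
    """Muller C-element: follow the inputs when they agree, otherwise hold."""
    return a if a == b else previous


def step(c):
    """Evaluate every C element once on the old state.

    Stage i can change only if c[i-1] != c[i] == c[i+1], so two neighbouring
    stages are never enabled together and lock-step evaluation is safe.
    """
    new = list(c)
    for i in range(1, len(c) - 1):
        new[i] = c_element(c[i - 1], 1 - c[i + 1], c[i])
    return new


def run(n_stages, n_tokens, rounds):
    """Send n_tokens through n_stages; return how many reached the receiver."""
    c = [0] * (n_stages + 2)
    sent = received = 0
    for _ in range(rounds):
        # Sender: issue the next token once stage 1 has latched the previous one.
        if sent < n_tokens and c[1] == c[0]:
            c[0] ^= 1
            sent += 1
        c = step(c)
        # Receiver: acknowledge every new transition at the last stage.
        if c[-1] != c[-2]:
            c[-1] = c[-2]
            received += 1
    return received


if __name__ == "__main__":
    print(run(n_stages=4, n_tokens=3, rounds=20))  # 3
```

A four-phase reading of the same wires would count a rising/falling pair per packet rather than each single transition.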
ack-out$_{i+1}$ $C_i$
C
low ('0') high ('1')
3.3
ack-out send-out ack-out
C XBM 4
[Figure 3.4: XBM (extended burst-mode) machine; states $S_0$ and $S_1$; transitions marked + and -]

                                                        Control
Interaction        AtoB  BtoA  Transfer            cam  cbm  inflow  sideward
Divide             0     *     Ab → Am             00   *1   0       0
                   1     *     Ab → Bm             *1   10   1       0
Merge              *     *     Ab → Bm             *1   10   0       0
                               Bb → Bm             *1   00   1       0
One-way            0     *     Ab → Am, Bb → Bm    00   00   0       0
                   1     *     Ab → Bm             *1   10   1       0
                               Bb → Bm             *1   00   0       0
Bi-directional     0     0     Ab → Am, Bb → Bm    00   00   0       0
                   0     1     Bb → Am             10   *1   0       0
                               Ab → Am             00   *1   0       0
                   1     0     Ab → Bm             *1   10   1       0
                               Bb → Bm             *1   00   0       0
                   1     1     Ab → Bm, Bb → Am    10   10   1       0
Sideward one-way   0     0     Ab → Am, Bb → Bm    00   00   0       0
                   0     1     Bb → Am             10   *1   1       1
                               Ab → Am             00   *1   0       0
                   1     0     Ab → Bb, Bb → Bm    *1   00   0       1
                   1     1     Ab → Bb             *1   *1   0       1

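The cam and cbm columns appear to select what each destination stage latches. The sketch below encodes one reading that is consistent with every cam/cbm entry in the table above:
- a second bit of 1 (written *1) means nothing is latched into that stage;
- otherwise, a first bit of 0 selects the stage's own buffer (Ab for Am, Bb for Bm) and 1 selects the other pipeline's buffer.

Three caveats apply. First, this encoding is inferred from the table rather than stated in the text. Second, the Ab → Bb sideward transfers lie outside cam/cbm; they seem tied to the sideward signal, which this sketch does not model. Third, the function name `decode` is hypothetical.

```python
# Decode the cam/cbm MUX controls of the transfer-control table.
# ASSUMED encoding (inferred from the table):
#   "*1"          -> no packet is latched into the stage
#   "00" for Am   -> Ab -> Am      "10" for Am -> Bb -> Am
#   "00" for Bm   -> Bb -> Bm      "10" for Bm -> Ab -> Bm

def decode(cam, cbm):
    """Return the set of (source, destination) transfers into Am and Bm."""
    transfers = set()
    for code, dest, own, other in ((cam, "Am", "Ab", "Bb"),
                                   (cbm, "Bm", "Bb", "Ab")):
        if len(code) != 2 or code[1] not in "01" or code[0] not in "01*":
            raise ValueError(f"bad control code {code!r}")
        if code[1] == "1":
            continue                      # stage keeps its current contents
        if code[0] == "*":
            raise ValueError(f"don't-care select with an active transfer: {code!r}")
        transfers.add((other if code[0] == "1" else own, dest))
    return transfers


# (cam, cbm, transfers listed in the table); rows with Ab -> Bb omitted.
TABLE_ROWS = [
    ("00", "*1", {("Ab", "Am")}),
    ("*1", "10", {("Ab", "Bm")}),
    ("*1", "00", {("Bb", "Bm")}),
    ("00", "00", {("Ab", "Am"), ("Bb", "Bm")}),
    ("10", "*1", {("Bb", "Am")}),
    ("10", "10", {("Ab", "Bm"), ("Bb", "Am")}),
]

if __name__ == "__main__":
    for cam, cbm, expected in TABLE_ROWS:
        assert decode(cam, cbm) == expected, (cam, cbm)
    print("all rows consistent")
```

For the sideward row "1 0" (cam = *1, cbm = 00), the decoder yields only Bb → Bm. That matches the cam/cbm part of the row; the accompanying Ab → Bb transfer is left to the sideward signal.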
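$d_i$: $2d_{\min} + d_s + d_a + 3d_{\mathrm{mux}}$
$d_{\min}$ [sec.]; $d_{\mathrm{mux}}$: MUX delay [sec.]; $d_s$: send-out delay [sec.]; $d_a$: ack-out delay [sec.]
C

$$2d_{\min} + d_s + d_a + 3d_{\mathrm{mux}} > 2d_{\max} + d_f \qquad (3.1)$$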

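Condition (3.1) is a plain inequality on delays, so a design's margin can be tabulated directly. Below is a minimal Python sketch with hypothetical delays in integer picoseconds. $d_{\max}$ and $d_f$ are taken as plain inputs because their definitions fall outside the surviving text. The function names are illustrative.

```python
# Margin of the delay condition (3.1):
#   2*d_min + d_s + d_a + 3*d_mux > 2*d_max + d_f
# All delays in integer picoseconds (hypothetical values below).

def margin_3_1(d_min, d_s, d_a, d_mux, d_max, d_f):
    """Left-hand side minus right-hand side of (3.1); must be positive."""
    return (2 * d_min + d_s + d_a + 3 * d_mux) - (2 * d_max + d_f)


def satisfies_3_1(**delays):
    """True when (3.1) holds for the given keyword delays."""
    return margin_3_1(**delays) > 0


def d_f_limit(d_min, d_s, d_a, d_mux, d_max):
    """(3.1) holds exactly when d_f is strictly below this value."""
    return 2 * d_min + d_s + d_a + 3 * d_mux - 2 * d_max


if __name__ == "__main__":
    delays = dict(d_min=100, d_s=200, d_a=200, d_mux=150, d_max=200)
    print(margin_3_1(d_f=400, **delays))       # 250
    print(satisfies_3_1(d_f=650, **delays))    # False (margin is 0)
    print(d_f_limit(**delays))                 # 650
```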
send-out MUX MUX send-out-
[sec.] $d_{\mathrm{arb}}$
send-in- send-out- send-in- send-out-
ack-out- send-in+ send-out- 3.3 send-in-
stage$_i$ C ack-out- ack-out- stage$_{i-1}$ C
send-in+ send-in+ stage$_i$ C send-out-
send-in- send-out- C 3 send-in-
send-out- $3d_{\min} + 2d_s + d_a + 3d_{\mathrm{mux}}$
MUX $d_{\mathrm{arb}} + d_{\mathrm{mux}}$
MUX

$D(t)$ $(T_f + T_r)$ $V(t)$
$P_{total}$ STP $i$

$D(t) \ge T_{f_i} + T_{r_i}$ (I)

$T_r = 0$ 4.2 $P_{total}$ $D(t) \ge T_{\max}$

$$\sum_{i} T_{f_i} \ge T_{\max} \times P_{total} \qquad (4.2)$$

I $P_{total}$

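One way to read (4.2) assumes $T_r = 0$ and that the $P_{total}$ packets circulate in a closed pipeline whose stages have forward latencies $T_{f_i}$. The packets are then, on average, $\sum_i T_{f_i} / P_{total}$ apart. Requiring this spacing to be at least $T_{\max}$ gives (4.2).

Under that reading, which is an assumption and not a restatement of the thesis's derivation, the largest packet count allowed by (4.2) is $\lfloor \sum_i T_{f_i} / T_{\max} \rfloor$. The Python sketch below computes it with integer times so the comparison stays exact. The names and numbers are hypothetical.

```python
# Largest P_total allowed by (4.2):  sum_i Tf_i >= Tmax * P_total   (Tr = 0).
# Times are integers (e.g. picoseconds) so the bound is exact.

def satisfies_4_2(tf, t_max, p_total):
    """True when (4.2) holds for the given forward latencies and packet count."""
    return sum(tf) >= t_max * p_total


def max_packets_4_2(tf, t_max):
    """Largest non-negative integer P_total for which (4.2) holds."""
    if t_max <= 0:
        raise ValueError("t_max must be positive")
    return sum(tf) // t_max


if __name__ == "__main__":
    tf = [300] * 9 + [700]   # hypothetical: nine fast stages and one slow one
    t_max = max(tf)          # with Tr = 0, Tmax is the largest Tf_i
    p = max_packets_4_2(tf, t_max)
    print(p, satisfies_4_2(tf, t_max, p), satisfies_4_2(tf, t_max, p + 1))
    # 4 True False
```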
$S_2, S_3, \cdots, S_{pl}$ $S_i$ $T_{\max} - (T_{f_i} + T_{r_i})$
$n$
$P_{over} \times T_{\max} \le \sum_{i=1}^{n}$

$$e \le pl - P_{total} \qquad (4.9)$$

II-1 $P_{total}$

STP $B_{total} = pl - P_{total}$
1 $B_{total}$
STP 1 $pl$ $B_{total}$

$$P_{total} < pl \qquad (4.12)$$

M B A C $P_{total}$

$P_{total}$ I $P_{total}$ II-1
$P_{total}$ $P^{I}_{U}$ $P^{II\text{-}1}_{U}$
$pl$ $(T_{f_i} + T_{r_i})$
$T_{f_i}$ $T_{r_i}$

$V(t)$ 4.1 4.4 4.11
4.6
$V(t)$
$(T_{f_i} + T_{r_i})$ $P_{total}$
$P_{total}$ $V(t)$

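The relationship between $V(t)$, $P_{total}$ and the stage latencies can be compared with the standard throughput bound for a self-timed ring. Consider a ring of $pl$ stages holding $P_{total}$ packets and $B_{total} = pl - P_{total}$ bubbles. Its throughput is limited three ways:
- by the packets: $P_{total} / \sum_i T_{f_i}$;
- by the bubbles: $B_{total} / \sum_i T_{r_i}$;
- by the slowest stage: $1 / T_{\max}$.

The Python sketch below evaluates that min-of-three bound. It is offered as a hedged reference point, not as the model derived in this chapter, and the function name is hypothetical. With $T_r = 0$ the bubble term drops out, and (4.2) is precisely the condition that the packet term does not exceed $1/T_{\max}$. With $P_{total} = pl$ no bubble remains, in line with (4.12).

```python
# Standard throughput bound ("canopy graph") for a self-timed ring.
from fractions import Fraction


def ring_throughput(p_total, tf, tr):
    """Packets per time unit for a ring holding p_total packets.

    tf[i], tr[i]: integer forward and reverse latency of stage i (pl = len(tf)).
    Returns 0 when the ring is empty or has no bubble left (cf. (4.12)).
    """
    pl = len(tf)
    if pl == 0 or len(tr) != pl:
        raise ValueError("tf and tr must be non-empty and equally long")
    if not 0 < p_total < pl:
        return Fraction(0)
    b_total = pl - p_total                                      # bubbles
    limits = [Fraction(p_total, sum(tf)),                       # packet-limited
              Fraction(1, max(f + r for f, r in zip(tf, tr)))]  # slowest stage
    if sum(tr) > 0:
        limits.append(Fraction(b_total, sum(tr)))               # bubble-limited
    return min(limits)


if __name__ == "__main__":
    tf = [1] * 8   # hypothetical unit latencies, pl = 8
    tr = [1] * 8
    for p in range(9):
        print(p, ring_throughput(p, tf, tr))   # peaks at 1/2 for p = 4
```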
    MM: Matching Memory
    ALU: Arithmetic Logic Unit
    PS: Program Storage
Packet flow control modules:
    M: Merge Unit

5.5 M B A C

[Figure: to the same destination (instruction or output)]

                       00       01          10        11
Ordinary instruction   absorb   -           output    copy
Branch instruction     absorb   not taken   taken     -

STP
DDP taken not taken
2-bit
DDP 2-bit
BHM: taken '1', not taken '0'
1-bit EBD

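The destination field can be read as a 2-bit code whose meaning depends on the instruction type. The branch history memory (BHM) stores one bit per branch outcome: taken '1', not taken '0'. The Python sketch below decodes the field according to the table above and keeps such one-bit records so that outcomes can be replayed in order. Two parts are assumptions made for illustration: keying the history by a branch identifier, and replaying it first-in first-out. The names are hypothetical.

```python
from collections import defaultdict, deque

# 2-bit destination codes from the table; None marks an unused code ("-").
ORDINARY = {"00": "absorb", "01": None, "10": "output", "11": "copy"}
BRANCH = {"00": "absorb", "01": "not taken", "10": "taken", "11": None}


def decode_destination(code, is_branch):
    """Return the action for a 2-bit destination code."""
    action = (BRANCH if is_branch else ORDINARY).get(code)
    if action is None:
        raise ValueError(f"code {code!r} is not valid for this instruction type")
    return action


class BranchHistoryMemory:
    """One bit per executed branch: 1 = taken, 0 = not taken (hypothetical model)."""

    def __init__(self):
        self._bits = defaultdict(deque)

    def record(self, branch_id, taken):
        self._bits[branch_id].append(1 if taken else 0)

    def replay(self, branch_id):
        """Return the oldest unreplayed outcome (True = taken); IndexError if none."""
        return self._bits[branch_id].popleft() == 1


if __name__ == "__main__":
    bhm = BranchHistoryMemory()
    for code in ("10", "01", "10"):                  # outcomes of one branch
        bhm.record("br0", decode_destination(code, is_branch=True) == "taken")
    print([bhm.replay("br0") for _ in range(3)])     # [True, False, True]
```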
ID (node ID) addressing

$N$ $2 \times N + 2$ [gate]

STP 5.16 8 VD
VD 2 1

$P_{total}$ VD $V(P_{total})$

Start Time Stamper Lap Time

Legend: Data-Driven Processing Module; Module for On-Chip Simulation
MM: Matching Memory
ALU: Arithmetic Logic Unit
PS: Program Storage
M: Merge Unit
B: Branch Unit
BHM: Branch History Memory
VD: Variable Delay
5.17
Packet Queue c. $P_{total}$ VD
Macro-flow Engine 3
Start Time Stamper & Lap Time Stamper
Start Time Stamper Lap Time Stamper
5.18 5.19

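The Start Time Stamper and Lap Time Stamper suggest measuring each packet's latency as the difference between the time it is stamped on entry and the time it is observed later. The Python sketch below assumes free-running 16-bit counters; the later text mentions 16 bits, but how the stampers use them is an assumption here. It computes the lap time modulo $2^{16}$, which stays correct across one counter wrap-around as long as the true latency is below $2^{16}$ ticks. The names are hypothetical.

```python
# Lap-time measurement with wrapping time stamps (assumed 16-bit counters).
WIDTH = 16
MASK = (1 << WIDTH) - 1


def start_stamp(counter):
    """Time stamp attached to a packet when it enters (counter value mod 2**16)."""
    return counter & MASK


def lap_time(start, counter_now):
    """Elapsed ticks since `start`; valid while the true latency < 2**16 ticks."""
    return (counter_now - start) & MASK


if __name__ == "__main__":
    s = start_stamp(65530)             # stamped just before the counter wraps
    print(lap_time(s, 65545 & MASK))   # 15, although the raw stamp wrapped to 9
```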
$V(P_{total})$ Macro-flow Engine DFG
BHM PS Packet Queue
Packet Queue $P_{total}$

           Logic area [mm²]   Throughput [packet/sec.]   Design term [man-month]
Proposed   1.25               100M                       0.25

6.10 DDP DDP PE M B
DDP 10 PE PE
3 PE : x PE : y
PE : z PE PE : x

FPGA: DDP core with time stamper

$(T_{f_i} + T_{r_i})$ DDP STP LSI
$(T_{f_i} + T_{r_i})$

              PE : x   PE : y   PE : z
$pl$          63       43       43
$P^{I}_{U}$   34       31       30

$T_r$ 4.1 4.4 4.11
$V(t)$ $(T_{f_i} + T_{r_i})$

$pl$ $V(t)$ PE : y PE : z
PE : x PE : y
Macro-Sim.
I

                      4-Dir. Filter      Expansion Filter
Added Load            0%       50%       0%       50%
Macro-Simulation      1.8      4.9       1.3      4.8
Naive-Simulation      6.7      24.1      3.8      20.7

# of LE [K]   44.5   44.7   0.4%
Mentor Graphics LeonardoSpectrum, Altera Quartus
DDP DDP
6.4

Time Stamper Lap Time Stamper
Packet Queue Macro-flow Engine
16-bit 37 DDP Start Time

                   DDP     On-Chip Simulator   Increase
# of LE [K]        10.9    17.1                56%
RAM [K bit]        26.2    33.3                26%

6.6
                   DDP System   On-Chip Simulation   RTL Simulation
Measured [sec.]    3.5m         272m                 > 1 day
Ratio              1            76                   > 1M

Altera FPGA APEX20KC

$(T_{f_i} + T_{r_i})$ $P_{total}$
$P_{total}$ $V(t)$

[1] International Technology Roadmap for Semiconductors, http://www.itrs.net/.
[2] O. A. Petlin, S. B. Furber, “Built-In Self-Testing of Micropipelines,” Proc. Third
International Symposium on Advanced Research in Asynchronous Circuits and
Systems (ASYNC ’97), Eindhoven, Netherlands, pp.22–29, Apr. 1997.
[3] M. Takamiya, M. Mizuno, and K. Nakamura, “An On-Chip 100GHz-Sampling
Rate 8-channel Sampling Oscilloscope with Embedded Sampling Clock Generator,”
ISSCC 2002, Session 11, No. 2, San Francisco, U.S.A., Feb. 2002.
[4] D. Pham, S. Asano, M. Bolliger, M. N. Day, H. P. Hofstee, C. Johns, J. Kahle, A.
Kameyama, J. Keaty, Y. Masubuchi, M. Riley, D. Shippy, D. Stasiak, M. Suzuoki,
M. Wang, J. Warnock, S. Weitzel, D. Wendel, T. Yamazaki, K. Yazawa, “The
Design and Implementation of a First-Generation CELL Processor,” ISSCC 2005,
Session 10, No. 2, San Francisco, U.S.A., Feb. 2005.
[5] M. Edahiro, S. Matsushita, M. Yamashina, N. Nishi, “A Single-Chip Multiprocessor
for Smart Terminals,” IEEE Micro, Vol. 20, Issue 4, pp.12–20, 2000.
[6] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, V. De, “Parameter
Variations and Impact on Circuits and Microarchitecture,” Proc. 40th Conference
on Design Automation, pp.338–342, Anaheim, U.S.A., Jun. 2003.
[7] , “
”, , Vol.103, No.509, pp.29–34, Dec. 2003.
[8] H. Terada, S. Miyata, and M. Iwata, “DDMP’s: self-timed super-pipelined data-
driven multimedia processors,” Proc. IEEE, Vol.87, No.2, pp.282–296, Feb. 1999.
[9] I.E. Sutherland, “Micropipelines,” Commun. ACM, Vol.32, No.6, pp.720–738, Jun.
1989.
[10] A. Lines, “Nexus: an asynchronous crossbar interconnect for synchronous system-on-chip designs,” Proc. 11th Annual Hot Interconnects Conf., pp.2–10, Stanford,
U.S.A., Aug. 2003.
[11] V. Ekanayake, C. Kelly IV, and R. Manohar, “An Ultra Low-Power Processor for
Sensor Networks,” Proc. 11th International Conference on Architectural Support
for Programming Languages and Operating Systems, pp.27–36, Boston, U.S.A.,
Oct. 2004.
[12] A. Takamura, M. Kuwako, M. Imai, T. Fujii, M. Ozawa, I. Fukasaku, Y. Ueno, and
T. Nanya, “TITAC-2: a 32-bit asynchronous microprocessor based on scalable-
delay-insensitive model,” Proc. 1997 Int. Conf. on Computer Design, pp.288–294,
Austin, U.S.A., Oct. 1997.
[13] T. Villiger, H. Kaslin, F.K. Gurkaynak, S. Oetiker, and W. Fichtner, “Self-timed
ring for globally-asynchronous locally-synchronous systems,” Proc. 9th Int. Symp.
on Asynchronous Circuits and Systems, pp.141–150, Vancouver, Canada, May
2003.
[14] D. Morikawa, M. Iwata, and H. Terada, “Super-pipelined implementation of IP
packet classification,” Journal of Intelligent Automation and Soft Computing,
Vol.10, No.2, pp.175–184, 2004.
[15] R.F. Sproull, I.E. Sutherland, C.E. Molnar, “The counterflow pipeline processor
architecture,” IEEE Des. Test, Vol.11, No.3, pp.48–59, Jul. 1994.
[16] , “VLSI ,” , Vol.83, No.11, pp.838–842, Nov.
2000.
[17] A. Xie and P. A. Beerel, “Accelerating Markovian analysis of asynchronous systems
using string-based state compression,” Proc. 4th Int. Symp. on Advanced Research
in Asynchronous Circuits and Systems, pp.247–260, San Diego, U.S.A., Mar. 1998.
[18] S. Chakraborty and R. Angrish, “Probabilistic timing analysis of asynchronous
systems with moments of delays,” Proc. 8th Int. Symp. on Advanced Research in
Asynchronous Circuits and Systems, pp.99–108, Manchester, U.K., Apr. 2002.
[19] J. C. Ebergen, S. Fairbanks, and I. E. Sutherland, “Predicting Performance of Micropipelines Using Charlie Diagrams,” Proc. 4th Int. Symp. on Advanced Research
in Asynchronous Circuits and Systems, pp.238–246, San Diego, U.S.A., Mar. 1998.
[20] W.M. Zuberek, “Event-driven simulation of timed petri net models,” Proc. 33rd
Annual Simulation Symp., pp.91–98, Washington, U.S.A., Apr. 2000.
[21] , “ REX
EM-X ,” ARC-152, pp.109–114, Mar. 2003.
[22] J. Sparsø, “Asynchronous Circuit Design: A Tutorial,” Kluwer Academic Publishers, London, 2001.
[23] C.J. Myers “ ” 2003.
[24] M. Iwata, M. Ogura, Y. Ohishi, H. Hayashi, and H. Terada, “100MPacket/s Fully
Self-Timed Priority Queue: FQ,” ISSCC 2004, San Francisco, U.S.A., Session 8,
No.1, Feb. 2004.
[25] S. Furber, “Computing without clocks: Micropipelining the ARM processor,” Asynchronous Digital Circuit Design, G. Birtwistle and A. Davis, eds., pp.211–262.
Springer Verlag, 1995.
[26] J. A. “ ”
1987.
[27] R. F. Cmelik and D. Keppel, “Shade: A Fast Instruction-Set Simulator for Execution Profiling,” ACM SIGMETRICS Performance Evaluation Review, Vol.22, No.1, pp.128–137, 1994.
[28] , , , “
FIS ,” , 10-3, 2001.
[29] “ ”
SACSIS 2003, pp.89–96, May 2003.
[30] “ ” Vol.39 No.3
pp.208–214 Mar. 1998.
[31] T.E. Williams and M.A. Horowitz, “A zero-overhead self-timed 160-ns 54-b CMOS
divider,” ISSCC 91, Session 5, No.5, Feb. 1991.
[32] S. Ogasawara, S. Sannomiya, Y. Omori, and M. Iwata, “An On-Chip Trace-Driven
Emulation for Self-Timed Data-Driven Processors,” Proc. International Conference
on Next Era Information Networking NEINE’04, Kochi, pp. 435-440, Sep. 2004.
[33] R. Zhang, Y. Shirane, M. Iwata, W. Su, and Y. Zheng, “High Speed Stateful Packet
Inspection in Embedded Data-Driven Firewall,” Proc. International Conference on
Next Era Information Networking NEINE’04, Kochi, Sep. 2004.
[34] , , ,” ,”
, Vol.45, No.2, pp.426-437, Feb. 2004.
[35] K. Miyagi, S. Sannomiya, K. Sakai, M. Iwata and H. Nisikawa, “Autonomous
Power-Supply Control for Ultra-Low-Power Self-Timed Pipeline,” Proc. Interna-
tional Conference on Parallel and Distributed Processing Techniques and Applica-
tions, pp.704–709, Las Vegas, U.S.A., Jul. 2008.
A.1
1. , , , , “
,” A.
2. K. Komatsu, S. Sannomiya, M. Iwata, H. Terada, S. Kameda, K. Tsubouchi, “In-
teracting Self-Timed Pipelines and Elementary Coupling Control Modules,” IEICE
TRANSACTIONS on Fundamentals.
A.2
1. S. Sannomiya, N. Kagawa, K. Sakai, M. Iwata, “A Data-Driven On-Chip Simulation
Module and Its FPGA Implementation,” in Proceedings of International Conference
on Parallel and Distributed Processing Techniques and Applications (PDPTA’08),
pp.710–716, 2008.
2. K. Komatsu, S. Sannomiya, M. Iwata, “Essential Building Circuits for Self-Timed
Web-Pipeline,” in Proceedings of International Conference on Next Era Information
Networking (NEINE’07), pp.148–149, 2007.
3. N. Kagawa, S. Sannomiya, M. Iwata, “Circuit Design of A Data-Driven On-Chip
Simulation Module,” in Proceedings of International Conference on Next Era In-
formation Networking (NEINE’07), pp.489–450, 2007.
4. T. Kubo, K. Komatsu, S. Sannomiya, M. Iwata, “A Study on Self-Timed Pipeline
Sorter,” in Proceedings of International Conference on Next Era Information Networking
(NEINE’07), pp.451–452, 2007.
5. S. Sannomiya, K. Komatsu, M. Iwata, “A Self-Timed Pipeline Circuit for Low-
Power Surrounding LSI Chips,” in Proceedings of International Conference on
Parallel and Distributed Processing Techniques and Applications (PDPTA’07),
pp.613–619, 2007.
6. K. Komatsu, S. Sannomiya, M. Iwata, “A Bi-directional Transfer Control for Multi-
dimensional Self-timed Pipeline,” in Proceedings of International Conference on
Next Era Information Networking (NEINE’06), pp.399–401, 2006.
7. T. Mitsui, K. Komatsu, S. Sannomiya, M. Iwata, “One-Way Data Transfer Circuit
between Self-Timed Pipelines,” in Proceedings of International Conference on Next
Era Information Networking (NEINE’06), pp.402–404, 2006.
8. S. Sannomiya, Y. Omori, K. Sakai, M. Iwata, “An On-Chip Macro-Simulation
Mechanism of Self-Timed Pipelined Systems,” in Proceedings of International Con-
ference on Next Era Information Networking (NEINE’06), pp.133–138, 2006.
9. K. Komatsu, S. Sannomiya, M. Iwata, “Systematic Design of Basic Self-Timed
Pipeline Circuit Modules,” in Proceedings of International Conference on Next Era
Information Networking (NEINE’05), pp.542–547, 2005.
10. S. Ogasawara, S. Sannomiya, Y. Oomori, M. Iwata, “Implementation of An On-
Chip Trace-Driven Emulation for Self-Timed Data-Driven Processors,” in Proceed-
ings of International Conference on Next Era Information Networking (NEINE’05),
pp.548–553, 2005.
11. S. Sannomiya, S. Ogasawara, M. Iwata, “A Study on Emulation-Based Rapid Eval-
uation of Self-Timed Super-Pipelined Systems,” in Proceedings of International
Conference on Next Era Information Networking (NEINE’05), pp.554–560, 2005.
12. S. Sannomiya, Y. Omori, M. Iwata, “A Fast Simulation Scheme for Self-Timed
Super-Pipelined Systems,” in Proceedings of International Conference on Next Era
Information Networking (NEINE’04), pp. 127–138, 2004.
13. S. Ogasawara, S. Sannomiya, Y. Omori, M. Iwata, “An On-Chip Trace-Driven
Emulation for Self-Timed Data-Driven Processors,” in Proceedings of International
Conference on Next Era Information Networking (NEINE’04), pp.435–440, 2004.
14. S. Sannomiya, Y. Omori, M. Iwata, “A Macroscopic Behavior Model for Self-Timed
Pipeline Systems,” in Proceedings of Seventeenth Workshop on Parallel and Dis-
tributed Simulation (PADS2003), pp.133–140, 2003.
A.3
1. , , , , , “
,” , ICD2007-129, pp.53–58, 2007.
2. , , , , “
,” ARC-169, pp.145–150,
Aug. 2006.
3. , , , “
,” ARC-165, pp.93–98, Dec. 2005.
4. , , , “
,” EVA-8, pp.61–66, Mar. 2004.
5. , , , “
,” ARC154, pp.43–48, Aug. 2003.
6. , , , “
,” Forum on Information Technology, C-017, Sep. 2003.
7. , , , “
FIS ,” , 10-3, 2001.
A.4
1. 2008-70446
