Loop pipelining with resource and timing constraints by Sánchez Carracedo, Fermín










LOOP PIPELINING WITH 















































F. E. Allen and J. Cocke. A Catalogue of Optimizing Transformations, pages 1-30.
Prentice-Hall, Englewood Cliffs, New Jersey, 1971.
T.L. Adam, K.M. Chandy, and J.R. Dickson. A comparison of list schedules for
parallel processing systems. Communications of the ACM, 17(12), December 1974.
H. Achatz. Extended 0/1 LP formulation for the scheduling problem in high-level
synthesis. In Proc. European Design Automation Conf. (EURODAC), pages 226-231,
1993.
J. R. Allen and K. Kennedy. Automatic loop interchange. In Proc. of the ACM
SIGPLAN84 Symposium on Compiler Construction, pages 233-246, June 1984.
W. Abu-Sufah, D.J. Kuck, and D.H. Lawrie. On the performance enhancement of pag-
ing systems through program analysis and transformations. IEEE Trans. Computers,
C-30(5):341-356, May 1981.
J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control depen-
dence to data dependence. In Proc. of the 10th Annual Symposium on Principles of
Programming Languages, pages 177-189, January 1983.
F.E. Allen. Program Optimization, volume 13 of Int. Tracts in Computer Science and
Technology and their Application, pages 239-307. Pergamon Press, Oxford, England,
1969.
A. Aiken and A. Nicolau. Optimal loop parallelization. In Proc. of the ACM SIG-
PLAN88 Conf. on Programming Languages Design and Implementation, pages 308-
317, 1988.
A. Aiken and A. Nicolau. Perfect Pipelining: A new Loop Parallelization Technique,
volume 300 of Lecture Notes in Computer Science, pages 221-235. Springer Verlag,
March 1988.
U. Banerjee. Dependence Analysis for Supercomputing. Kluwer Acad. Pub., 1989.
F. Bodin and F. Charot. Loop optimization for horizontal microcoded machines. In
Proc. of 90th Int. Conf. on Supercomputing, pages 164-176, June 1990.
R. M. Badia and J. Cortadella. Glass: a graph-theoretical approach for global bind-
ing. Microprocessing and Microprogramming, 38(1-5):775-782, September 1993.
M. Berry, D. Chen, P. Koss, and D. Kuck. The Perfect Club benchmarks: Effective
performance evaluation of supercomputers. Technical Report 827, Center for Super-



















P. Briggs, K. D. Cooper, K. Kennedy, and L. Torczon. Coloring heuristics for regis-
ter allocation. In Proc. of the ACM SIGPLAN89 Conf. on Programming Languages
Design and Implementation, 24(7), pages 275-284, July 1989.
D. G. Bradlee, S. J. Eggers, and R. R. Henry. Integrating register allocation and
instruction scheduling for RISCs. In Proc. of the 4th Int. Conf. Architectural Support
for Programming Languages and Operating Systems (ASPLOS-IV), pages 122-131,
April 1991.
U. Banerjee, R. Eigenmann, A. Nicolau, and D. A. Padua. Automatic program par-
allelization. Proc. of the IEEE, 81(2):211-243, February 1993.
D. F. Bacon, S. L. Graham, and O. J. Sharp. Compiler transformations for high-
performance computing. Technical Report UCB/CSD-93-781, Computer Science Di-
vision (EECS), University of California, Berkeley, November 1993.
G. R. Beck, D. W. L. Yen, and T. L. Anderson. The Cydra 5 mini-supercomputer.
The Journal of Supercomputing, 7(1/2):143-180, May 1993.
G. J. Chaitin, M. Auslander, A. Chandra, J. Cocke, M. Hopkins, and P. Markstein.
Register allocation via coloring. Computer Languages, (6):47-57, January 1981.
S. Carr. Memory-Hierarchy Management. PhD thesis, Rice University, 1993.
J. Cortadella, R. M. Badia, and F. Sánchez. A mathematical formulation of the
loop pipelining problem. Technical Report RR UPC-DAC 1995/36, Department of
Computer Architecture (UPC), October 1995.
D. Callahan, J. Cocke, and K. Kennedy. Estimating interlock and improving balance
for pipelined architectures. In Int. Conf. on Parallel Processing, pages 295-304, 1987.
F. Catthoor, W. Geurts, and H. De Man. Loop transformation methodology for fixed-
rate video, image and telecom processing applications. In Proc. of the Int. Conf. on
Application-Specific Array Processors 1994 (ASAP'94), 1994.
F. Chow and J. Hennessy. Register allocation by priority based coloring. In Proc. of
the ACM SIGPLAN84 Symposium on Compiler Construction, June 1984.
Charlesworth. An approach to scientific array processing: The architectural design of
the AP-120B/FPS-164 family. Computer, 14(9):18-27, 1981.
G. J. Chaitin. Register allocation and spilling via graph coloring. In Proc. of the ACM
SIGPLAN82 Symposium on Compiler Construction, pages 201-207, June 1982.
L-F. Chao and A. LaPaugh. Scheduling cyclic data-flow graphs by retiming with
resource constraints. In 6th Int. Workshop on High-Level Synthesis, pages 111-134,
November 1992.
T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. McGraw-
Hill Book Company, 1990.
L-F. Chao, A. LaPaugh, and E. H-M. Sha. Rotation scheduling: a loop pipelining
algorithm. In Proc. of the 30th Design Automation Conf. (DAC), pages 566-572, June
1993.



































[CS70] J. Cocke and J. Schwartz. Programming Languages and their Compilers (Preliminary
Notes). Courant Institute of Mathematical Sciences, New York University, second
revised edition, 1970.
[Cyt84] R. Cytron. Compiler-Time Scheduling and Optimization for Asynchronous Machines.
PhD thesis, University of Illinois at Urbana-Champaign, 1984.
[Dav81] J.R.B. Davies. Parallel Loops Constructs for Multiprocessors. PhD thesis, University
of Illinois at Urbana-Champaign, 1981.
[DBR67] G. B. Dantzig, W. O. Blattner, and M. R. Rao. All shortest routes from a fixed origin
in a graph. In Theory of Graphs, Rome, pages 85-90, July 1967.
[DGD94] F. Depuydt, G. Goossens, and H. De Man. Scheduling with register constraints for
DSP architectures. INTEGRATION, the VLSI Journal, (18):95-120, 1994.
[DH79] J. J. Dongarra and A. R. Hinds. Unrolling loops in FORTRAN. Software-Practice
and Experience, 9:219-226, March 1979.
[DHB89] J.C. Dehnert, P.Y.T. Hsu, and J.P. Bratt. Overlapped loop support in the Cydra 5.
In Proc. of the 3rd Int. Conf. Architectural Support for Programming Languages and
Operating Systems (ASPLOS-III), pages 26-38, April 1989.
[Dij59] E.W. Dijkstra. A note on two problems in connexion with graphs. Numerische Math-
ematik, 1:269-271, 1959.
[DLSM81] S. Davidson, D. Landskov, B.D. Shriver, and P.W. Mallet. Some experiments in local
microcode compaction for horizontal machines. IEEE Trans. Computers, 30(7):460-477,
July 1981.
[DT93] J.C. Dehnert and R. A. Towle. Compiling for the Cydra 5. The Journal of Supercom-
puting, 7(1/2):181-228, May 1993.
[EDA94] A. E. Eichenberger, E. S. Davidson, and S. G. Abraham. Minimum register require-
ments for a modulo schedule. In Proc. of the 27th Annual Int. Symp. on Microarchi-
tecture (MICRO27), pages 75-84, November 1994.
[EDA95] A. E. Eichenberger, E. S. Davidson, and S. G. Abraham. Optimum modulo schedules
for minimum register requirements. In Proc. of 95th Int. Conf. on Supercomputing,
pages 31-40, July 1995.
[Ers66] A. P. Ershov. An automatic programming system to high efficiency. Journal of the
ACM, 13(1):17-24, 1966.
[Fea88] P. Feautrier. Array expansion. In Proc. of the ACM Int. Conf. on Supercomputing,
pages 429-441, July 1988.
[Fis81] J.A. Fisher. Trace scheduling: A technique for global microcode compaction. IEEE
Trans. Computers, 30(7):478-490, July 1981.
[Fis83] J.A. Fisher. Very long instruction word architectures and the ELI-512. In Proc. of
the 10th Annual Int. Symp. on Computer Architectures, pages 140-150, June 1983.




















J.A. Fisher, D. Landskov, and B.D. Shriver. Microcode compaction: Looking backward
and looking forward. In Proc. of the 1981 Nat. Computer Conf., pages 95-102, 1981.
R. Govindarajan, E. R. Altman, and G. R. Gao. Minimizing register requirements un-
der resource-constrained rate-optimal software pipelining. In Proc. of the 27th Annual
Int. Symp. on Microarchitecture (MICRO27), pages 85-94, November 1994.
D. D. Gajski, N. D. Dutt, and B. M. Pangrle. Silicon compilation (tutorial). In IEEE
1986 Custom Integrated Circuits Conf., pages 102-110, May 1986.
C.H. Gebotys and M.I. Elmasry. A global optimization approach for architectural syn-
thesis. In Proc. Int. Conf. Computer-Aided Design (ICCAD), pages 258-261, Novem-
ber 1990.
C.H. Gebotys and M.I. Elmasry. Simultaneous scheduling and allocation for cost
constrained optimal architectural synthesis. In Proc. of the 28th Design Automation
Conf. (DAC), pages 2-7, June 1991.
C.H. Gebotys and M.I. Elmasry. Optimal VLSI Architectural Synthesis. Kluwer Aca-
demic Publishers, 1992.
M. B. Girkar, M. R. Haghighat, C. L. Lee, B. P. Leung, and D. A. Schouten. Parafrase-
2 user's manual. Technical Report TR-75743, Center for Supercomputing Research
and Development, University of Illinois at Urbana-Champaign, 1991.
E. F. Girczyc. Automatic Generation of Microsequenced Data Paths to Realize ADA
Circuit Descriptions. PhD thesis, Carleton University, July 1984.
M.R. Garey and D.S. Johnson. A Guide to the Theory of NP-Completeness. W. H.
Freeman and Company, 1979.
M.R. Garey, D.S. Johnson, G. L. Miller, and C. H. Papadimitriou. The complexity of
coloring circular arcs and chords. SIAM Journal on Algebraic and Discrete Methods,
1:216-227, 1980.
E.F. Girczyc and J.P. Knight. An ADA to standard cell hardware compiler based on
graph grammars and scheduling. In Proc. Int. Conf. on Computer Design (ICCD),
pages 726-731, October 1984.
G. R. Gao, Q. Ning, and V. H. Van Dongen. Extending software pipelining for schedul-
ing nested loops. In 6th Workshop on Languages and Compilers for Parallel Comput-
ing, 1993.
E.G. Coffman Jr. Computer and Job Scheduling Theory. John Wiley and Sons,
New York, 1976.
M. B. Girkar and C.D. Polychronopoulos. Compiling issues for supercomputers. In
Proc. of Supercomputing'88 (Orlando, Florida), pages 164-173, November 1988.
G. Goossens, J. Rabaey, J. Vandewalle, and H. De Man. An efficient microcode
compiler for custom DSP-processors. In Proc. Int. Conf. Computer-Aided Design
(ICCAD), pages 24-27, November 1987.
G. Goossens, J. Rabaey, J. Vandewalle, and H. De Man. An efficient microcode
compiler for application specific DSP processors. IEEE Trans. on Computer-Aided
Design, 9(9):925-937, September 1990.



















































F. Gasperoni and U. Schwiegelshohn. Efficient algorithms for cyclic scheduling. Tech-
nical Report RC-17068, IBM T.J. Watson Res. Center, Yorktown Heights, NY, 1991.
H.N. Gabow and R.E. Tarjan. Faster scaling algorithms for network problems. SIAM
Journal on Computing, 18(5):1013-1036, October 1989.
G. Goossens, J. Vandewalle, and H. De Man. Loop optimization in register-transfer
scheduling for DSP systems. In Proc. of the 26th Design Automation Conf. (DAC),
pages 826-831, 1989.
R. Hartley and A. Casavant. Tree-height minimization in pipelined architectures. In
Proc. Int. Symp. Circuits and Systems (ISCS), 1989.
W.W. Hwu, T.M. Conte, and P.P. Chang. Comparing software and hardware schemes
for reducing the cost of branches. In Proc. of the 16th Annual Int. Symp. on Computer
Architectures, pages 224-233, May 1989.
L. J. Hendren, G. R. Gao, E. Altman, and Ch. Mukerji. Register allocation using cyclic
interval graphs: A new approach to an old problem. Technical Report ACAPS Tech-
nical Memo 33, Advanced Computer Architecture and Program Structures Group,
McGill University, Montreal, Canada, April 1992.
C-T. Hwang, Y-C. Hsu, and Y-L. Lin. Scheduling for functional pipelining and loop
winding. In Proc. of the 28th Design Automation Conf. (DAC), pages 764-769, June
1991.
P. Hilfinger. A high-level language and silicon compiler for digital signal processing.
In Proc. of the Custom Integrated Circuits Conf., pages 213-216, 1985.
S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiler Support for Machine-
Independent Programming in Fortran D. American Elsevier Publishing Company,
New York, 1991.
C.Y. Hwang, Y.S. Lin, and Y.C. Hsu. Data path allocation based on bipartite weighted
matching. In Proc. of the 27th Design Automation Conf. (DAC), pages 499-504, 1990.
C-T. Hwang, J-H. Lee, and Y-C. Hsu. A formal approach to the scheduling problem
in high level synthesis. IEEE Trans. on Computer-Aided Design, 10(4):464-475, April
1991.
W.W. Hwu and Y.N. Patt. HPSm, a high performance restricted data flow architecture
having minimal functionality. In Proc. of the 13th Annual Int. Symp. on Computer
Architectures, pages 297-306, June 1986.
W.W. Hwu and Y.N. Patt. Checkpoint repair for out-of-order execution machines.
IEEE Trans. Computers, C-36(12):1496-1514, December 1987.
J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach.
Morgan Kaufmann Publishers Inc., 1990.
R. A. Huff. Lifetime-sensitive modulo scheduling. In Proc. of the 6th Conf. Program-
ming Language Design and Implementation, pages 258-267, 1993.





















IEEE, New York. IEEE Standard Manual VHDL Language Reference Manual, 1988.
R.B. Jones and V.H. Allan. Software pipelining: A comparison and improvement. In
Proc. 23rd Ann. Workshop on Microprogramming and Microarchitecture, pages 46-56,
November 1990.
L-G. Jeng and L-G. Chen. Synthesis of rate-optimal DSP algorithms by pipeline and
minimum unfolding. In Sixth Int. Workshop on HLS, pages 159-168, November 1992.
L-G. Jeng and L-G. Chen. Synthesis of rate-optimal DSP algorithms by pipeline and
minimum unfolding. In 6th Int. Workshop on High-Level Synthesis, pages 159-168,
November 1992.
R. Jain, A. Mujumdar, A. Sharma, and H. Wang. Empirical evaluation of some high-
level synthesis scheduling heuristics. In Proc. of the 28th Design Automation Conf.
(DAC), pages 686-689, June 1991.
D. B. Johnson. Efficient algorithms for shortest paths in sparse networks. Journal of
the ACM, 24(1):1-13, 1977.
M. Johnson. Superscalar Microprocessor Design. Prentice Hall, 1990.
R. Karp. A characterization of the minimum cycle mean in a digraph. Discrete
Mathematics, 23:309-311, 1978.
D. Ku and G. De Micheli. HardwareC - a language for hardware design. Technical
Report CSL-TR-90-419, Stanford University, California, 1990.
D.C. Ku and G. De Micheli. Relative scheduling under timing constraints. In Proc.
of the 27th Design Automation Conf. (DAC), pages 59-64, June 1990.
D.C. Ku and G. De Micheli. High Level Synthesis of ASICs Under Timing and
Synchronization Constraints. Kluwer Academic Publishers, 1992.
D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe. Dependence graphs
and compiler optimizations. In Proc. ACM Symposium of Principles of Programming
Languages, pages 207-218, January 1981.
P. M. Kogge. The Architecture of Pipelined Computers, New York. McGraw-Hill,
1981.
F. J. Kurdahi and A. C. Parker. REAL: A program for register allocation. In Proc.
of the 24th Design Automation Conf. (DAC), pages 210-215, July 1987.
D. J. Kuck. The Structure of Computers and Computations. New York, John Wiley,
1978.
C. P. Kruskal and A. Weiss. Allocating independent subtasks on parallel processors.
In Proc. of the Int. Conf. Parallel Processing (ICPP), pages 236-240, August 1984.
S.Y. Kung, H.J. Whitehouse, and T. Kailath. VLSI and Modern Signal Processing.
Prentice Hall, 1985.






































[Lam88] M. Lam. Software pipelining: An effective scheduling technique for VLIW machines.
In Proc. of the ACM SIGPLAN88 Conf. on Programming Languages Design and Im-
plementation, pages 318-328, June 1988.
[Law76] E.L. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart
and Winston, 1976.
[LB94] D. J. Lilja and P. L. Bird. The Interaction of Compilation Technology and Computer
Architecture. Kluwer Academic Publishers, 1994.
[Lei83] C.E. Leiserson. Systolic and semisystolic design (extended abstract). In Proc. IEEE
Int. Conf. on Computer Design/VLSI in Computers, pages 627-632, October 1983.
[LHL89] J-H. Lee, Y-C. Hsu, and Y-L. Lin. A new integer linear programming formulation for
the scheduling problem in data path synthesis. In Proc. Int. Conf. Computer-Aided
Design (ICCAD), pages 20-23, November 1989.
[Lil94] D. J. Lilja. Exploiting the parallelism available in loops. Computer, pages 13-26,
February 1994.
[LLo95] J. LLosa. Heuristics for register-constrained software pipelining. Technical Report RR
1995/23, UPC-CEPBA, October 1995.
[Lov77] D. B. Loveman. Program improvement by source-to-source transformation. Journal
of the ACM, 24(1):121-145, 1977.
[LP91] D.A. Lobo and B.M. Pangrle. Redundant operator creation: A scheduling optimiza-
tion technique. In Proc. of the 28th Design Automation Conf. (DAC), pages 775-778,
1991.
[LRS83] C.E. Leiserson, F. Rose, and J. Saxe. Optimizing synchronous circuitry by retiming.
In Proc. Third Caltech Conf. on VLSI, pages 87-116, March 1983.
[LS83] C.E. Leiserson and J. Saxe. Optimizing synchronous systems. Journal of VLSI and
Computer Systems, 1(1):41-67, Spring 1983.
[LS91] C.E. Leiserson and J.B. Saxe. Retiming synchronous circuitry. Algorithmica, 6:5-35,
1991.
[LVA95] J. LLosa, M. Valero, and E. Ayguadé. Hypernode reduction modulo scheduling. Tech-
nical Report RR 95/05, CEPBA, April 1995.
[LWGL92] T-F. Lee, A. C-H. Wu, D.D. Gajski, and Y-L. Lin. An effective methodology for
functional pipelining. In Proc. Int. Conf. Computer-Aided Design (ICCAD), pages
230-233, November 1992.
[LWLG94] T-F. Lee, A.C-H. Wu, Y-L. Lin, and D.D. Gajski. A transformation-based method
for loop folding. IEEE Trans. on Computer-Aided Design, 13(4):439-450, April 1994.
[McM86] F.H. McMahon. The Livermore Fortran kernels: A computer test of the numerical
performance range. Technical Report UCRL-53745, Lawrence Livermore National
Laboratory, December 1986.
[MD90] D.J. Mallon and P.B. Denyer. A new approach to pipeline optimization. In Proc.



















K. Mehlhorn. Data Structures and Algorithms. Springer Publishing Company, Vol.
1-3, 1984.
J. L. Meerbergen, P. E. R. Lippens, W. F. J. Verhaegh, and A. Van Der Werf. Rela-
tive location assignment for repetitive schedules. In Proc. European Conf. on Design
Automation (EDAC), pages 403-407, 1993.
K. Mehlhorn and S. Naeher. LEDA — a library of efficient data types and algorithms.
MFCS 89, LNCS 379, pages 88-106, 1989.
Kurt Mehlhorn and Stefan Naeher. LEDA: a platform for combinatorial and geometric
computing. Communications of the ACM, 38(1):96-102, 1995.
W. Mangione-Smith, S. G. Abraham, and E. S. Davidson. Register requirements of
pipelined processors. In Proc. of the Int. Conf. on Supercomputing ICS-92, pages
260-271, 1992.
Y. Muraoka. Parallelism Exposure and Exploitation in Programs. PhD thesis, Univer-
sity of Illinois at Urbana-Champaign, 1971.
Stefan Naeher. LEDA manual: version 3.0. Version 3.0, volume 93-109 of Max-
Planck-Institut fuer Informatik: technical reports. Max-Planck-Institut fuer Infor-
matik, Saarbruecken, 1993.
Q. Ning and G. R. Gao. A novel framework of register allocation for software pipelin-
ing. In Conf. Rec. of the 20th Ann. ACM SIGPLAN-SIGACT Symp. on Principles
of Programming Languages, pages 29-42, January 1993.
A. Nicolau. Uniform parallelism exploitation in ordinary programs. In Proc. of the
Int. Conf. Parallel Processing (ICPP), August 1985.
J. Nestor and G. Krishnamoorthy. SALSA: A new approach to scheduling with timing
constraints. In Proc. Int. Conf. Computer-Aided Design (ICCAD), pages 262-265,
November 1990.
J. Nestor and D.E. Thomas. Behavioral synthesis with interfaces. In Proc. Int. Conf.
Computer-Aided Design (ICCAD), pages 112-115, November 1986.
Nemhauser and Wolsey. Theory in Integer and Combinatorial Optimization. Wiley
Interscience, 1989.
D. A. Padua. Multiprocessors: Discussion of some Theoretical and Practical Problems.
PhD thesis, University of Illinois at Urbana-Champaign, 1979.
B.M. Pangrle and D.D. Gajski. SLICER: a state synthesizer for intelligent silicon
compilation. In Proc. Int. Conf. on Computer Design (ICCD), October 1987.
S. Pinter. Register allocation with instruction scheduling. ACM SIGPLAN Notices,
28(6):248-257, 1993.
P.G. Paulin and J.P. Knight. Force-directed scheduling for the behavioral synthesis
of ASICs. IEEE Trans. on Computer-Aided Design, 8(6):661-679, June 1989.
P.G. Paulin and J.P. Knight. Scheduling and binding algorithms for high-level syn-
thesis. In Proc. of the 26th Design Automation Conf. (DAC), pages 1-6, 1989.



















































I-Ch. Park and Ch-M. Kyung. Fast and near optimal scheduling in automatic data
path synthesis. In Proc. of the 28th Design Automation Conf. (DAC), pages 680-685,
June 1991.
P.G. Paulin, J.P. Knight, and E.F. Girczyc. HAL: a multi-paradigm approach to
automatic datapath synthesis. In Proc. of the 23rd Design Automation Conf. (DAC),
pages 263-270, 1986.
D.A. Padua, D.J. Kuck, and D.H. Lawrie. High-speed multiprocessors and compilation
techniques. IEEE Trans. Computers, C-29(9):763-776, September 1980.
R. Potasman, J. Lis, A. Nicolau, and D.D. Gajski. Percolation based synthesis. In
Proc. of the 27th Design Automation Conf. (DAC), pages 444-449, 1990.
C.D. Polychronopoulos. Loop coalescing: A compiler transformation for parallel ma-
chines. In Proc. of the Int. Conf. Parallel Processing (ICPP), pages 235-242, August
1987.
C. Polychronopoulos. Compiler optimizations for enhancing parallelism and their
impact on architecture design. IEEE Trans. Computers, 37(8):991-1004, August 1988.
N. Park and A.C. Parker. Sehwa: A software package for synthesis of pipelines from be-
havioral specifications. IEEE Trans. on Computer-Aided Design, 7(3):356-370, March
1988.
A.C. Parker, J. Pizarro, and M. Mlinar. MAHA: a program for data path synthesis.
In Proc. of the 23rd Design Automation Conf. (DAC), pages 461-466, June 1986.
M. Potkonjak and J. Rabaey. Optimizing resource utilization using transformations.
In Proc. Int. Conf. Computer-Aided Design (ICCAD), pages 88-91, 1991.
M. Potkonjak and J. Rabaey. Optimizing resource utilization using transformations.
IEEE Trans. on Computer-Aided Design, 13(3):277-292, March 1994.
D.A. Padua and M.J. Wolfe. Advanced compiler optimizations for supercomputers.
Communications of the ACM, 29(12):1184-1201, December 1986.
S. Ramakrishnan. Software pipelining in PA-RISC compilers. Hewlett-Packard Jour-
nal, pages 39-45, July 1992.
B.R. Rau. Cydra 5 directed dataflow architecture. In Proc. of COMPCON'88, 1988.
B.R. Rau. Data flow and dependence analysis for instruction level parallelism. In
Fourth Workshop on Languages and Compilers for Parallel Computing, pages 236-
250, August 1991.
B.R. Rau. Iterative modulo scheduling: an algorithm for software pipelining loops. In
Proc. of the 27th Annual Int. Symp. on Microarchitecture (MICRO27), pages 63-74,
November 1994.
B.R. Rau and J. A. Fisher. Instruction-level parallel processing: History, overview
and perspective. Journal of Supercomputing, 7:9-50, July 1993.
B.R. Rau and C.D. Glaeser. Some scheduling techniques and an easily schedulable
horizontal architecture for high performance scientific computing. In Proc. of the 14th




















K. E. Rimey. A Compiler for Application-Specific Signal Processors. PhD thesis,
University of California at Berkeley, 1989.
M. Rim. High-Level Synthesis of VLSI Designs for Scientific Programs. PhD thesis,
University of Wisconsin-Madison, 1993.
M. Rim and R. Jain. Valid transformations: A new class of loop transformations. In
Int. Conf. on Parallel Processing, Vol II, pages 20-23, August 1994.
B.R. Rau, M. Lee, P. P. Tirumalai, and M. S. Schlansker. Register allocation for
software pipelining loops. In Proc. of the ACM SIGPLAN92 Conf. on Programming
Language Design and Implementation, vol. 27, num. 7, pages 283-299, June 1992.
B.R. Rau, M.S. Schlansker, and P.P. Tirumalai. Code generation schema for mod-
ulo scheduled loops. In Proc. of the 25th Annual Int. Symp. on Microarchitecture
(MICRO25), pages 158-169, December 1992.
B.R. Rau, D.W.L. Yen, W. Yen, and R.A. Towle. The Cydra 5 departmental super-
computer: Design philosophies, decisions and tradeoffs. IEEE Computer, 22(1):12-35,
January 1989.
R. S. Sandige. Modern Digital Design. McGraw-Hill Publishing Company, 1990.
F. Sánchez and J. Cortadella. RCLP: A novel approach for resource-constrained loop
pipelining. Technical Report RR 93/06, CEPBA, May 1993.
F. Sánchez and J. Cortadella. Resource-constrained pipelining based on loop transfor-
mations. Microprocessing and Microprogramming, 38(1-5):429-436, September 1993.
A. Schrijver. Theory of Linear and Integer Programming. Wiley-Interscience series
Discrete Mathematics, John Wiley and Sons, 1986.
M.R. Schroeder. Number Theory in Science and Communication. Springer-Verlag,
1990.
B. Su, S. Ding, and J. Xia. URPR: An extension of URCR for software pipelining. In
19th Microprogramming Workshop (MICRO-19), pages 104-108, October 1986.
A. Sharma and R. Jain. InSyn: Integrated scheduling for DSP applications. In Proc.
of the 30th Design Automation Conf. (DAC), pages 349-354, June 1993.
M. Schlansker and M. McNamara. The Cydra 5 computer system architecture. In
Proc. Int. Conf. on Computer Design (ICCD), October 1988.
J.E. Smith and A.R. Pleszkun. Implementing precise interrupts in pipelined proces-
sors. IEEE Trans. Computers, C-37(5):562-573, May 1988.
D. Springer and D.E. Thomas. New methods for coloring and clique partitioning in
data path allocation. The VLSI journal, 12(3), December 1991.
L. Stok. Transfer free register allocation in cyclic dataflow graphs. In Proc. European
Conf. on Design Automation (EDAC), pages 181-185, February 1992.
G.S. Sohi and S. Vajapeyam. Instruction issue logic for high-performance, interrupt-
able pipelined processors. In Proc. of the 14th Annual Int. Symp. on Computer Ar-
chitectures, pages 27-36, June 1987.





































[SWT+90] B. Su, J. Wang, Z. Tang, W. Zhao, and Y. Wu. A software pipelining based VLIW
architecture and optimizing compiler. In Proc. of the 23rd Annual Int. Symp. on
Microarchitecture (MICRO23), pages 17-27, November 1990.
[TM91] D. Thomas and P. Moorby. The Verilog Hardware Description Language. Kluwer,
Boston, 1991.
[Tom67] R. M. Tomasulo. An efficient algorithm for exploiting multiple arithmetic units. IBM
Journal Research and Development, 11(1):25-33, January 1967.
[Tri87] H. F. Trickey. A high-level hardware compiler. IEEE Transactions on CAD, CAD-
6(2):259-269, March 1987.
[TS86] C-J. Tseng and D.P. Siewiorek. Automated synthesis of data paths in digital systems.
IEEE Trans. on Computer-Aided Design, 5(3):379-395, March 1986.
[TY86] P. Tang and P. Yew. Processor self-scheduling for multiple-nested parallel loops. In
Proc. of the Int. Conf. Parallel Processing (ICPP), pages 528-535, August 1986.
[UP93] H. F. Ugurdag and C. A. Papachristou. A VLIW architecture based on register files. In
Proc. of the 26th Annual Int. Symp. on Microarchitecture (MICRO26), pages 263-268,
December 1993.
[Van50] B.L. Van der Waerden. Modern Algebra. New York: Frederick Ungar, 1950.
[VGN92] V.H. Van Dongen, R.G. Guang, and Qi Ning. A polynomial time method for optimal
software pipelining. In 2nd Int. Conf. on Vector and Parallel Processing CONPAR-
VAPPV92, pages 613-624, September 1992.
[VVB+93] J. Vanhoof, K. Van Rompaey, I. Bolsens, G. Goossens, and H. De Man. High-Level
Synthesis for Real Time Digital Signal Processing. Kluwer Academic Publishers, 1993.
[WC95] R. A. Walker and S. Chaudhuri. Introduction to the scheduling problem. IEEE Design
and Test of Computers, pages 60-69, Summer 1995.
[WE93a] J. Wang and C. Eisenbeis. Decomposed software pipelining. Technical Report RR-
1838, INRIA-Rocquencourt (France), January 1993.
[WE93b] J. Wang and C. Eisenbeis. Decomposed software pipelining: A new approach to exploit
instruction level parallelism for loops programs. In IFIP, January 1993.
[Wed75] D. Wedel. FORTRAN for the Texas Instruments ASC System, volume 10 of SIGPLAN
Notices, chapter 3, pages 119-132. New York, March 1975.
[WKEE94] J. Wang, A. Krall, M. A. Ertl, and C. Eisenbeis. Software pipelining with register
allocation and spilling. In Proc. of the 27th Annual Int. Symp. on Microarchitecture
(MICRO27), pages 95-99, November 1994.
