Recursos anchos: una técnica de bajo coste para explotar paralelismo agresivo en códigos numéricos
(Wide Resources: A Low-Cost Technique for Exploiting Aggressive Parallelism in Numerical Codes)
UNIVERSITAT POLITÈCNICA DE CATALUNYA

Author: David López Álvarez
Advisors: Mateo Valero Cortés
          Josep Llosa i Espuny
References
From these incontrovertible premises he deduced that the Library is total and that its shelves
register all the possible combinations of the twenty-odd orthographic symbols (a number
which, though vast, is not infinite), that is, everything it is possible to express, in all languages.
Everything: the minutely detailed history of the future, the autobiographies of the archangels,
the faithful catalogue of the Library, thousands and thousands of false catalogues, the
demonstration of the fallacy of those catalogues, the demonstration of the fallacy of the true
catalogue, the Gnostic gospel of Basilides, the commentary on that gospel, the commentary on
the commentary on that gospel, the true story of your death, the translation of every book into
all languages, the interpolations of every book in all books.
Jorge Luis Borges. The Library of Babel. 1941
[AAD+93] T. Asprey, O.S. Averill, E. DeLano, R. Mason, B. Weiner and J. Yetter.
Performance features of the PA 7100 microprocessor. In IEEE Micro, 13(3) pp
22-35, June 1993.
[ABC+88] F. Allen, M. Burke, P. Charles, R. Cytron and J. Ferrante. An overview of the PTRAN
analysis system for multiprocessing. In Journal of Parallel and Distributed
Computing. Vol. 5, 1988.
[ABG+96a] E. Ayguadé, C. Barrado, A. González, J. Labarta, D. López, J. Llosa, S. Moreno,
D. Padua, F. Reig, Q. Riera and M. Valero. Ictíneo: A tool for research in ILP.
Poster in Supercomputing '96. Pittsburgh, USA. November 1996.
[ABG+96b] E. Ayguadé, C. Barrado, A. González, J. Labarta, J. Llosa, D. López, S. Moreno,
D. Padua, F. Reig, Q. Riera and M. Valero. Ictíneo: A tool for Instruction-Level
Parallelism Research. Research Report UPC-DAC-1996-61. December 1996.
[ABL+95a] E. Ayguadé, C. Barrado, J. Labarta, D. López, S. Moreno, D. Padua, and M.
Valero. A uniform internal representation for high-level and instruction-level
transformations. Technical Report UPC-CEPBA 95-01, Universitat Politècnica de
Catalunya, January 1995.
[ABL+95b] E. Ayguadé, C. Barrado, J. Labarta, J. Llosa, D. López, S. Moreno, D. Padua, Q.
Riera and M. Valero. ICTÍNEO: Una Herramienta para la Investigación en
Paralelismo a Nivel de Instrucciones. In VI Jornadas de Paralelismo, Barcelona,
July 1995.
[ACD74] T.L. Adam, K.M. Chandy and J.R. Dickson. A comparison of list schedules for
[AgCo87] T. Agerwala and J. Cocke. High performance reduced instruction set processor.
IBM Tech. Report 1987.
[AiNi88] A. Aiken and A. Nicolau. Perfect pipelining: a new loop parallelization
technique. In Lecture Notes in Computer Science, vol 300, pp 221-235.
Springer-Verlag. March 1988.
[AiNi91] A. Aiken and A. Nicolau. A realistic resource-constrained software pipelining
algorithm. In Advances in Languages and Compilers for Parallel Processing, pp
274-290. 1991.
[AKW83] J.R. Allen, K. Kennedy and J. Warren. Conversion of control dependence to data
dependence. In Proc. 10th annual Symposium on Principles of Programming
Languages, January 1983.
[AS96] T.M. Austin and G.S. Sohi. High-bandwidth address translation for multiple-issue
processors. In Proc. of the 23rd. Int. Symp. on Comp. Arch. ISCA-23, pp 158-167.
May 1996.
[ASU86] A.V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques and
Tools. Addison Wesley, March 1986.
[BAH+94] B. Burgess, M. Alexander, Y. Ho, S.P. Litch, S. Mallick, D. Ogden, S.H. Park and
J. Slaton. The PowerPC 603 microprocessor: A high performance, low power,
superscalar RISC microprocessor. In CompCon pp. 300-306, February 1994.
[Ban88] U. Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic
Publishers, Norwell, MA, 1988.
[BCKK88] M. Berry, D. Chen, P. Koss and D. Kuck. The Perfect Club benchmarks: Effective
performance evaluation of supercomputers. Technical Report 827. Center for
Supercomputing Research and Development. University of Illinois at
Urbana-Champaign. November 1988.
[BCKT89] P. Briggs, K.D. Cooper, K. Kennedy and L. Torczon. Coloring heuristics for
register allocation. In Proc. of the Conf. on Prog. Lang., Design and Impl. (PLDI)
pp 257-284, June 1989.
[BEF+94] B. Blume, R. Eigenmann, K. Faigin, J. Grout, J. Hoeflinger, D. Padua, P. Petersen,
B. Pottenger, L. Rauchwerger, P. Tu and S. Weatherford. Polaris: the next
generation in parallelizing compilers. In Proceedings of the Seventh Workshop on
Languages and Compilers for Parallel Computers, July 1994.
[BENP93] U. Banerjee, R. Eigenmann, A. Nicolau and D. Padua. Automatic Program
Parallelization. In Proceedings of the IEEE, 81(2), February 1993.
[BGG+89] D. Bernstein, D.Q. Goldin, M.C. Golumbic, H. Krawczyk, Y. Mansour, I. Nahson
Proc. of the Conf. on Prog. Lang., Design and Impl. (PLDI) pp 258-263. June
1989.
E. Bloch. The engineering design of the Stretch computer. Proc Fall Joint
Computer Conference pp 48-59. 1959.
W.J. Blume. Success and limitations in automatic parallelization of the Perfect
Benchmarks programs. Master's thesis, Center for Supercomputing Research and
Development, Univ. of Illinois at Urbana-Champaign, July 1992.
W. Buchholz. Planning a Computer System: Project Stretch. McGraw-Hill, N.Y.
1962.
G.R. Beck, D.W.L. Yen and T.L. Anderson. The Cydra 5 minisupercomputer:
Architecture and implementation. In Journal of Supercomputing, 7 (1/2) pp
143-180. May 1993.
A. Capitanio, N. Dutt and A. Nicolau. Partitioned register files for VLIWs: A
preliminary analysis of tradeoffs. In MICRO-25, pp 292-300, 1992.
A.E. Charlesworth. An approach to scientific array processing: the architecture
design of the AP-120B /FPS-164 family. In Computer 14(12) pp 12-30. December
1981.
G.H. Chaitin. Register allocation and spilling via graph coloring. In Proc. ACM
SIGPLAN Symp. on Compiler Construction, pp 98-105. June 1982.
R.P. Colwell, W.E. Hall, C.S. Joshi, D.B. Papworth, P.K. Rodman and J.E. Tornes.
Architecture and implementation of a VLIW supercomputer. In Proc.
Supercomputing pp 910-919, November 1990.
D.C. Chang, D. Lyon, C. Chen, L. Peng, M. Massoumi, M. Hakimi, S. Iyengar, E.
Li and R. Remedios. Microarchitecture of HAL's memory management unit. In
Proc. of the CompCon95, pp. 272-279, 1995.
P.P. Chang, S.A. Mahlke, W.Y. Chen, N.J. Warter and M.W. Hwu. IMPACT: An
architectural framework for multiple-instruction-issue processors. In Proc. 18th.
Int. Symp. on Comp. Arch. (ISCA-18) pp 226-275. 1991.
R.P. Colwell, R.P. Nix, J.J. O'Donnell, D.B. Papworth and P.K. Rodman. A VLIW
architecture for a trace scheduling compiler. IEEE Trans. on Computers 37(8) pp
967-979. August 1988.
DEC. DECchip 21064-AA Microprocessor Hardware Reference Manual. Digital
Equipment Corporation, 1992.
J.C. Dehnert and R.A. Towle. Compiling for the Cydra 5. The Journal of
Supercomputing, 7(1/2): 181-228, May 1993.
J.C. Dehnert, P.Y.T. Hsu and J.P. Bratt. Overlapped loop support in the Cydra 5. In
Proc. 3rd. Int. Conf. on Arch. Supp. for Prog. Lang. and Op. Syst. (ASPLOS)
pp 26-38, April 1989.
T. A. Diep, C. Nelson and J.P. Shen. Performance evaluation of the PowerPC 620
microarchitecture. In Proc. 22nd. Int. Symp. on Comp. Arch. (ISCA-22) pp
163-174, June 1995.
J.J. Dongarra and A.R. Hinds. Unrolling loops in FORTRAN. Software: Practice
and Experience, 9:219-226, March 1979.
K. Ebcioglu and T. Nakatani. A new compilation technique for parallelizing loops
with unpredictable branches on a VLIW architecture. In Proc. 2nd. Workshop on
Languages and Compilers for Parallel Computing, pp 213-229. 1989.
J.R. Ellis. Bulldog: A compiler for VLIW Architectures. MIT Press 1986.
J.H. Edmondson, P. Rubinfeld, R. Preston and V. Rajagopalan. Superscalar
instruction execution in the 21164 Alpha Microprocessor. In IEEE Micro 15(2) pp
33-43. April 1995.
Keith I. Farkas. Memory-System Design Considerations for Dynamically
Scheduled Microprocessors. PhD thesis, Department of Electrical and Computer
Engineering, Univ. of Toronto, 1997.
K. Faigin, J. P. Hoeflinger, D. A. Padua, P. M. Petersen and S. A. Weatherford. The
Polaris internal representation. Tech. Rep. CSRD No. 1317, Univ. of Illinois at
Urbana-Champaign, Cntr for Supercomputing Res. and Dev., October 1993.
J.A. Fisher. Trace scheduling: a technique for global microcode compaction. In
IEEE Trans. on Comp. 39(7) pp 478-490, July 1981.
J.A. Fisher. Very Long Instruction Word Architectures and the ELI-512. In Proc. of
the 10th Int. Symp. on Comp. Archit. (ISCA-10), pp 140-150, June 1983.
R.B. Garner, A. Agrawal, F. Briggs, E.W. Brown, D. Hough, B. Joy, S. Kleiman,
S. Muchnick, M. Namjoo, D. Patterson, J. Pendleton and R. Tuck. The scalable
processor architecture (SPARC). CompCon 88, pp. 278-283, 1988.
R. Govindarajan, E.R. Altman and G.R. Gao. Minimal registers requirements
under resource-constrained software pipelining. In Proc. 27th. Int. Symp. on
Microarchitecture (MICRO-27), pp 85-94. November 1994.
G.R. Gao, Q. Ning and V.H. van Dongen. Extending software pipelining for
J. Heinrich. MIPS R10000 Microprocessor User's Manual. MIPS Technology Inc.
1994.
J.L. Hennessy and D.A. Patterson. Computer Architecture. A Quantitative
Approach. 2nd Ed. Morgan Kaufmann, 1996.
T.N. Hicks, R.E. Fry and P.E. Harvey. POWER2 floating-point unit: Architecture
and implementation. IBM J. Res. Develop. 38 (5), 525-536. September 1994.
R.G. Hintz and D.P. Tate. Control Data STAR-100 processor design. CompCon pp
1-4. September 1972.
J. Hennessy, N. Jouppi, F. Baskett, T. Gross and J. Gill. Hardware/Software
tradeoffs for increased performance. In Proc. of the Symp. on Archit. Support for
Prog. Lang. and Op. Systems (ASPLOS) pp 2-11. 1982.
M.W. Hwu, S.A. Mahlke, W.Y. Chen, P.P. Chang, N.J. Warter, R.A. Bringmann,
R.G. Ouellette, R.E. Hank, T. Kiyohara, G.E. Haab, J.G. Holm and M.D. Lavery.
The superblock: an effective technique for VLIW and superscalar compilation.
Journal of Supercomputing 7(1/2) pp 229-248, 1993.
P.Y.T. Hsu. Design of the TFP microprocessor. IEEE Micro, 14(2) pp 23-33. April
1994.
R.A. Huff. Lifetime-sensitive modulo scheduling. In 6th. Conf. on Programming
Languages, Design and Implementation (PLDI), pp 258-267. June 1993.
IBM. Special issue on the system 360 model 91. IBM Journal of Research and
Development, 11(1). January 1967.
IBM. Special issue on the RS/6000. IBM Journal of Research and Development.
34(1). January 1990.
Intel Corp. Pentium processor user's manual, 1993.
T. Juan, J.J. Navarro and O. Temam. Data caches for superscalar processors. In
Proc. of the 11th. Int. Conf. on Supercomputing (ICS-11), Vienna, July 1997.
R.B. Jones and V.H. Allan. Software pipelining: A comparison and improvement.
In Proc. of the 23rd Annual workshop on Microprogramming and
Microarchitecture (MICRO-23), pp 46-56, November 1990.
M. Johnson. Superscalar Microprocessor Design. Prentice-Hall, Englewood
Cliffs, N.J. 1991.
R. Jolly. A 9-ns 1.4 gigabyte 17-ported CMOS register file. In IEEE J. of
Solid-State Circuits, 25 pp 1407-1412, October 1991.
N.P. Jouppi. The nonuniform distribution of instruction-level and machine level
N.P. Jouppi and D.W. Wall. Available instruction-level parallelism for
superscalar and superpipelined machines. In Proc. 3rd. Int. Conf. on Arch.
Support for Prog. Lang. and Op. Systems (ASPLOS), pp 272-282, April 1989.
Toni Juan. Technology-conscious cache design. Ph.D. Thesis. Universitat
Politècnica de Catalunya (UPC). April 1998.
G. Kurpanek, K. Chan, J. Zheng, E. DeLano and W. Bryg. PA7200: a PA-RISC
processor with integrated high performance MP bus interface. In CompCon, pp
375-382, February 1994.
P.M. Kogge. The architecture of Pipelined Processors. McGraw-Hill, N.Y. 1981.
A. Kumar. The HP PA-8000 RISC CPU. In IEEE Micro 17(2) pp 27-33, April
1997.
M.S. Lam. Software pipelining: an effective scheduling technique for VLIW
machines. In Proceedings of the SIGPLAN'88 Conf. on Programming Languages,
Design and Implementation (PLDI-88) pp 318-328. June 1988.
J. Llosa, E. Ayguadé and M. Valero. Quantitative evaluation of register pressure
on software pipelined loops. In Int. Journal of Parallel Programming, vol. 26 n. 2
pp. 121-142. 1998.
M.S. Lam and R.P. Wilson. Limits of control flow parallelism. In Proc. 19th. Int.
Symp. on Computer Architecture (ISCA-92), pp 46-57. May 1992.
Corinna G. Lee. Code Optimizers and Register Organizations for Vector
Architectures. Ph. D. Thesis. U. of California at Berkeley. May, 1992.
P.G. Lowney, S.M. Freudenberger, T.J. Karzes, W.D. Lichtenstein, R.P. Nix, J.S.
O'Donnell and J.C. Ruttenberg. The Multiflow trace scheduling compiler. The
Journal of Supercomputing 7(1/2) pp 51-142. 1993.
[LGAV96] J. Llosa, A. González, E. Ayguadé and M. Valero. Swing modulo scheduling: A
lifetime sensitive approach. In Proc. of the Int. Conf. on Parallel Architectures and
Compiler Techniques (PACT'96) pp 80-87. October 1996.
[Llo96] J. Llosa. Reducing the impact of register pressure on software pipelined loops.
Ph.D. Thesis. UPC, Universitat Politècnica de Catalunya. February 1996.
[LLVA98a] D. López, J. Llosa, M. Valero and E. Ayguadé. Resource widening vs. replication:
Limits and performance-cost trade-off. In Proc. of the 12th International
Conference on Supercomputing ICS-12, pp. 441-448, July 1998.
[LLVA98b] D. López, J. Llosa, M. Valero and E. Ayguadé. Límites de las arquitecturas wide.
D. López, J. Llosa, M. Valero and E. Ayguadé. Widening resources: A
cost-effective technique for aggressive ILP architectures. To appear in Proc. of the
31st. Int. Symp. on Microarchitecture (MICRO-31). December 1998.
R.B. Lee, M. Mahon, and D. Morris. Pathlength reduction features in the PA-RISC
architecture. In CompCon 92, pp. 129-135, 1992.
D. López. Desarrollo de una Herramienta para la evaluación de Arquitecturas
con Buses Dobles. Tech. Rep. UPC-DAC-95-23. August 1995.
D. López and E. Riera. Implementación del algoritmo de cálculos invariantes en
Ictíneo. Tech. Rep. UPC-CEPBA-1995-16, August 1995.
J. Llosa, M. Valero and E. Ayguadé. Heuristics for register-constrained software
pipelined loops. In Proc. 29th. Int. Symp. on Microarchitecture (MICRO-29) pp.
250-261, December 1996.
J. Llosa, M. Valero, E. Ayguadé and A. González. Hypernode Reduction Modulo
Scheduling. In Proc. 28th. Int. Symp. on Microarchitecture (MICRO-28) pp
350-360. November 1995.
J. Llosa, M. Valero, E. Ayguadé and A. González. Modulo Scheduling with
reduced register pressure. In IEEE Transactions on Computers, vol. 47 no. 6 pp.
625-638, June 1998.
D. López, M. Valero, J. Llosa and E. Ayguadé. Increasing memory bandwidth with
wide buses: Compiler, hardware and performance trade-off. In Proc. of the 11th
International Conference on Supercomputing ICS-11 pp 12-19. July 1997.
K. Mehlhorn and S. Näher. LEDA: A library of efficient data types and algorithms.
Technical Report TR-A-D4/89. Universität des Saarlandes, Saarbrücken, Germany.
1989.
MIPS Technologies Inc. R8000 Microprocessor chip set, product overview, 1994.
S.A. Mahlke, D.C. Lin, W.Y. Chen, R.E. Hank and R.A. Bringmann. Effective
compiler support for predicated execution using the hyperblock. In Proc. 25th Int.
Symp. on Microarchitecture (MICRO-25) pp 45-54. 1992.
Gordon E. Moore. Cramming more components onto Integrated Circuits. In
Electronics Magazine Vol. 38 n. 8 pp 114-117. April 1965.
C.R. Moore. The Power 601 microprocessor. In CompCon, pp 109-116, February
1993.
Gordon E. Moore. Lithography and the Future of Moore's Law. In Optical/Laser
Microlithography VIII: Proceedings of the SPIE, pp 2-17 February 1995.
Microprocessor Report, 12(6). AltiVec vectorizes PowerPC. May 1998.
S. Mirapuri, M. Woodacre and N. Vasseghi. The MIPS R4000 processor. In IEEE
Micro 12(2) pp 10-22, April 1992.
S. Näher. The LEDA User Manual Version 3.1. Max-Planck-Institut für Informatik,
66123 Saarbrücken, Germany. January 1995.
A. Nicolau. Uniform parallelism exploitation in ordinary programs. In Proc. Int.
Conf. on Parallel Processing. August 1985.
A. Nicolau and J.A. Fisher. Measuring the parallelism available for very long
instruction word architectures. In IEEE Transactions on Computers 33(11) pp
968-979, Nov. 1984.
Q. Ning and G.R. Gao. A novel framework of register allocation for software
pipelining. In Proc. 20th Symp. on Principles of Programming Languages, pp
29-42. January 1993.
K. Olukotun, B.A. Nayfeh, L. Hammond, K. Wilson and K. Chang. The Case for a
Single-Chip Multiprocessor. In Proc. of the ASPLOS-VII, pp 2-11, October 1996.
D.A. Patterson and D.R. Ditzel. The case for the reduced instruction set computer.
Computer Architecture News 8:6 pp 25-33. October 1980.
D.A. Patterson and C.H. Sequin. RISC I: A reduced instruction set VLSI computer.
In Proc. of the 8th. Int. Symp. on Comp. Arch. (ISCA-8) pp. 443-450. May 1981.
Montse Peirón. Optimització del Rendiment del Sistema de Memòria en
Multiprocessadors Vectorials. PhD thesis, Universitat Politècnica de Catalunya
(UPC), February 1996.
C. Polychronopoulos, M. Girkar, M. Haghighat, C. Lee and B. Leung.
Parafrase-2: An environment for parallelizing, partitioning, synchronizing and
scheduling programs on multiprocessors. In Proceedings of the Int. Conf. on
Parallel Processing, V. 2, 1989.
S.S. Pinter. Register allocation with instruction scheduling. In Proc. of the Conf.
on Prog. Lang., Design and Impl. (PLDI) pp 248-257. June 1993.
G. Radin. The 801 minicomputer. In Proc. of the Symp. on Archit. Support for Prog.
Lang. and Op. Systems (ASPLOS) pp 39-47. 1982.
B.R. Rau and C.D. Glaesner. Some scheduling techniques and an easily
schedulable horizontal architecture for high performance scientific computing. In
Proc. 14th Ann. Microprogramming Workshop, pp 183-197. October 1981.
B.R. Rau. Iterative modulo scheduling: An algorithm for software pipelining
B. R. Rau, M. Lee, P. Tirumalai and P. Schlansker. Register allocation for modulo
scheduling loops. In Proc. 25th. Ann. Int. Symp. on Microarchitecture, pp
158-169, December 1992.
R.M. Russell. The CRAY-1 processor system. In Comm. of the ACM 21(1) pp
63-72. January 1978.
B.R. Rau, D.W.L. Yen, W. Yen and R.A. Towle. The Cydra 5 departmental
supercomputer: design philosophies, decisions and trade-offs. IEEE Computer
22(1) pp 12-35. January 1989.
A. Schrijver. Theory of Linear and Integer Programming. Wiley-Interscience
series. Discrete Mathematics, John Wiley and Sons, 1986.
Robert S. Schaller. MOORE'S LAW: past, present and future. In IEEE Spectrum,
pp 52-59. June 1997.
S.P. Song, M. Denman and J. Chang. The PowerPC 604 RISC Microprocessor.
In IEEE Micro 14(5) pp 8-17. October 1994.
J.E. Smith, G.E. Dermer, B.D. Vanderwarn, S.D. Klinger, C.M. Roszewski, D.L.
Fowler, K.R. Scidmore and P.J. Laudon. The ZS-1 central processor. In Proc. 2nd.
Int conf on Architectural Support for Programming Languages and Operating
Systems (ASPLOS) pp. 199-204, October 1987.
B. Su, S. Ding and J. Xia. URPR: an extension of URCR for software pipelining.
In 19th Microprogramming Workshop pp 104-108, October 1986.
M.D. Smith, M.A. Horowitz and M. Lam. Efficient superscalar performance
through boosting. In Proc. 5th. Int. Conf. on Arch. Supp. for Prog. Lang. and Op.
Syst. (ASPLOS) pp 248-259. October 1992.
The National Technology Roadmap for Semiconductors, Semiconductor Industry
Assoc., San Jose, Calif. 1994.
M.D. Smith, M. Johnson and M.A. Horowitz. Limits on multiple instruction issue.
In Proc. 3rd. Int. Conf. on Arch. Support for Prog. Lang. and Op. Systems
(ASPLOS), pp 290-302, April 1989.
M.D. Smith, M.S. Lam and M.A. Horowitz. Boosting beyond static scheduling in
a superscalar processor. In Proc. 17th. Int. Symp. on Comp. Arch. (ISCA-17), pp
344-354, June 1990.
M.D. Smith. Support for Speculative Execution in High-Performance Processors.
Ph.D. Thesis. Stanford University. November 1992.
Superscalar Processors. In Proc. of the ASPLOS-IV, pp. 53-62, April 1991.
G.S. Sohi. Instruction issue logic for high-performance, interruptible, multiple
functional units, pipelined computers. In IEEE Trans. on Computers. 39(3) pp
349-359. March 1990.
G.S. Sohi and S. Vajapeyam. Instruction issue logic for high-performance
processors. In Proc. 14th. Int. Symp. on Comp. Arch. (ISCA-14) pp 27-34. June
1987.
B. Su and J. Wang. GURPR*: A new global software pipelining algorithm. In
Proc. 24th Int. Symp. on Microarchitecture (MICRO-24), pp 212-216. November
1991.
J.E. Thornton. Parallel operation in the Control Data 6600. In Proc. AFIPS Fall
Joint Computer Conf. pp 33-40, 1964.
R.M. Tomasulo. An efficient algorithm for exploiting multiple arithmetic units. In
IBM J. of Res. and Dev. 11(1) pp 25-33. January 1967.
D.W. Wall. Limits of instruction-level parallelism. In Proc. 5th. Int. Conf. on Arch.
Support for Prog. Lang. and Op. Systems (ASPLOS), pp 176-188, April 1991.
W.J. Watson. The TI ASC: A highly modular and flexible superprocessor
architecture. In Procs. AFIPS Fall Joint Comp. Conf. pp 221-228. 1972.
N. Weste and K. Eshraghian. Principles of CMOS VLSI Design: A systems
Perspective. Addison-Wesley Pub. 1988.
S.W. White and S. Dhawan. POWER2: Next Generation of the RISC System/6000
family. IBM J. Res. Develop. 38 (5), 493-502. September 1994.
S.J.E. Wilton and N.P. Jouppi. CACTI: An enhanced cache access and cycle time
model. IEEE Journal of Solid-State Circuits, Vol. 31(5):677-688, May 1996.
M. Wolfe. Optimizing Supercompilers for Supercomputers. PhD thesis, University
of Illinois, 1982.
K.M. Wilson, K. Olukotun, and M. Rosenblum. Increasing cache port efficiency for
dynamic superscalar microprocessors. ISCA-23, May 1996.
K.C. Yeager. The MIPS R10000 superscalar microprocessor. IEEE Micro 16(2)
pp 28-40, March 1996.




[Crossword grid]
by david
For years a small group of friends and I have been solving the La Vanguardia crossword
almost daily. For years I have also been designing the crosswords that appear in the FIB
magazine, l'Oasi. Including one here was too tempting.
This crossword is dedicated, with affection, to the friends of the cruci gang (long may it
last), to the long-suffering readers of l'Oasi who have attempted them over the last 10 years,
to everyone who enjoys crosswords in general and, if it has made it in here, to the advisors
who let me put something so unusual in a thesis.

Across:
1. How an operation must be in order to take advantage of a wide architecture. One thousand.
2. Tool developed by the DAC with which we obtained the graphs. One of the costs we have
calculated.
3. Carriage return. They form the body of a graph and are joined by edges. Explosive that this
humble PhD candidate would have used on more than one occasion.
4. Short form of laboratory. It beats. We have studied that of several techniques.
5. Ring-shaped roll. Algorithm that schedules nodes as early as it can. Major conference
where we managed to sneak in a paper (a miracle!).
6. (Reversed) intention, when it is good (such as ours). Dependence between two.
7. (Reversed) cycles between the start of an operation's execution and the availability of its
result. Law that keeps PhD candidates in rather precarious conditions.
8. Operating system. Not affirmed. Typical node name.
9. Very large scale integration. Not with. Word of command.
10. Article. It grasped, or a continent once the accent is dropped. Located, placed.
11. One. Heuristic used to perform the scheduling. Code added when not enough physical
registers are available.
12. Wide, in our papers. (Reversed) unit of area measurement, chosen to be independent of
the technology.
Down:
1. Its time makes the difference. The first one. Type of architecture for which this work was
mainly carried out.
2. (Reversed) joins two nodes. "Isola". "Isola" (but in the other direction).
3. "Moto" without wheels. They come in tuna, cash and register varieties. Hard disk.
4. Leg of a chip. Indicates that someone is in heaven (except in cases such as Tander, for
example). "Aire", scrambled.
5. They might add a footnote. (Reversed) modulo scheduling.
6. (Reversed) those responsible for the Alpha. "Peso la mente" (read it all as one word). This.
7. International tool. What some ideas seem to be. "Asóla".
8. "Asóla" again. They make one see (although in this work the word has another meaning).
Common to "sed" and "soda".
9. Russian affirmation. A certain rather famous institute (in the USA). 500. (Reversed) unit of
information transmission.
10. Publication. (Reversed) minimum number of cycles between the launch of two iterations.
11. Before ..., let people out. (Reversed) a doctor's congratulations.







Exegi monumentum aere perennius
regalique situ pyramidum altius...
Non omnis moriar...

I have finished a monument more lasting than bronze
and higher than the royal pyramids...
I shall not wholly die...

Quintus Horatius Flaccus, known as Horace.
Epilogue to the third book of the Odes, c. 23 BC.
