126 research outputs found

Run-time parallelization and scheduling of loops

Author: Baxter Doug; Mirchandaney Ravi; Saltz Joel H.

We discuss how to extend the class of problems that parallelizing compilers can compile effectively by using the doconsider construct, which would allow these compilers to parallelize many problems in which substantial loop-level parallelism is available but cannot be detected by standard compile-time analysis. We describe and experimentally analyze mechanisms for parallelizing the work required by such loops. Each of these methods produces a new loop structure by transforming the loop to be parallelized. We also present rules by which these loop transformations can be automated so that they can be incorporated into compilers. The main application area of this research is scientific and engineering computation. The workload used in our experiments includes a mixture of real problems and synthetically generated inputs. Extensive tests on the Encore Multimax/320 lead us to conclude that, for the workloads we investigated, self-execution almost always outperforms pre-scheduling. Furthermore, for self-execution, globally topologically sorting the indices yields little performance improvement over the less expensive local sorting.
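To make the terms above concrete, here is a minimal Python sketch. It assumes a hypothetical sparse-triangular-style loop `x[i] = b[i] - sum(c * x[j] for (j, c) in deps[i])` whose dependence lists `deps` are known only at run time and always point to earlier iterations (`j < i`). An inspector assigns each iteration to a wavefront, that is, a level in a topological sort of the dependence graph. `pre_scheduled` runs the wavefronts in order with a barrier between them. `self_executing` gives each thread a cyclic share of the iterations, sorts that share locally by wavefront, and waits on per-iteration flags instead of barriers. All function names are hypothetical. Python threads only illustrate the synchronization pattern and say nothing about the speedups measured on the Multimax.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical loop: x[i] = b[i] - sum(c * x[j] for (j, c) in deps[i]),
# where deps is only known at run time and every j in deps[i] is < i.


def body(i, x, b, deps):
    """One loop iteration: reads earlier results x[j] and writes x[i]."""
    x[i] = b[i] - sum(c * x[j] for j, c in deps[i])


def sequential(b, deps):
    """Reference: the original serial loop."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        body(i, x, b, deps)
    return x


def inspector(deps):
    """Topologically sort iterations into wavefronts (levels)."""
    wf = [0] * len(deps)
    for i, row in enumerate(deps):
        # No dependences -> wavefront 0; otherwise one past the latest predecessor.
        wf[i] = 1 + max((wf[j] for j, _ in row), default=-1)
    return wf


def pre_scheduled(b, deps, workers=4):
    """Global schedule: wavefronts run in order, iterations within one concurrently."""
    levels = defaultdict(list)
    for i, w in enumerate(inspector(deps)):
        levels[w].append(i)
    x = [0.0] * len(b)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for w in sorted(levels):
            # Draining the iterator waits for the whole wavefront (a barrier).
            list(pool.map(lambda i: body(i, x, b, deps), levels[w]))
    return x


def self_executing(b, deps, workers=4):
    """Each thread runs its own iterations, waiting on per-iteration flags."""
    wf = inspector(deps)
    n = len(b)
    x = [0.0] * n
    done = [threading.Event() for _ in range(n)]

    def run(tid):
        # Cyclic assignment of iterations, then a *local* sort by wavefront.
        mine = sorted(range(tid, n, workers), key=lambda i: (wf[i], i))
        for i in mine:
            for j, _ in deps[i]:
                done[j].wait()  # block until iteration j has written x[j]
            body(i, x, b, deps)
            done[i].set()

    threads = [threading.Thread(target=run, args=(t,)) for t in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x
```

For instance, with `deps = [[], [(0, 0.5)], [], [(1, 0.25), (2, 1.0)], [(0, 2.0)]]` the inspector returns wavefronts `[0, 1, 0, 2, 1]`, and all three executors produce the same `x`. Each thread processes its iterations in increasing (wavefront, index) order, and every dependence points to a strictly earlier wavefront. The unfinished iteration that comes first in that order can therefore always proceed, so the self-executing version cannot deadlock.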

NASA Technical Reports Server

Studies on automatic parallelization for heterogeneous and homogeneous multicore processors

Author: Hayashi Akihiro
Publication date: 01/01/2012

Degree system: new; Report no.: Kō 3537; Degree: Doctor of Engineering; Date conferred: 2012/2/25; Waseda University degree no.: Shin 587

Waseda University Repository

Acta Cybernetica : Volume 21. Number 1.

Publication date: 01/01/2013

University of Szeged

Preliminary study for a numerical aerodynamic simulation facility

Author: Bonstrom D. B.; Johnson R. W.; Lincoln N. R.; Mchugh R. A.; Vacca A. A.

NASA Technical Reports Server

Feasibility study of an Integrated Program for Aerospace-vehicle Design (IPAD) system. Volume 6: Implementation schedule, development costs, operational costs, benefit assessment, impact on company organization, spin-off assessment, phase 1, tasks 3 to 8

Author: Dublin M.; Garrocq C. A.; Hurley M. J.

A baseline implementation plan was developed, including alternative implementation approaches for critical software elements and variants of the plan. The plan's basic philosophy emphasized: (1) progressive release of capability for three major computing systems, (2) an end product that is a working tool, (3) participation by industry, government agencies, and universities, and (4) development of the critical elements of the IPAD framework software. The results of these tasks indicate a first IPAD release 45 months after go-ahead, a five-year total implementation schedule, and a total development cost of 2027 man-months and 1074 computer hours. Operational costs were found to increase in several areas, mainly because of the additional equipment needed and additional computer overhead. The benefits of an IPAD system relate mainly to potential savings in engineering man-hours, reduced design-cycle calendar time, and indirect improvement of product quality and performance.

NASA Technical Reports Server

Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015) Krakow, Poland

Author: Carretero Pérez Jesús; García Blas Francisco Javier; Jeannot Emmanuel; Wyrzykowski Roman
Publication date: 01/10/2015

Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015), Krakow, Poland, September 10-11, 2015.

Universidad Carlos III de Madrid e-Archivo

An integrated soft- and hard-programmable multithreaded architecture

Author: Zhong Shi
Publication venue: The University of Edinburgh
Publication date: 01/01/2007

Edinburgh Research Archive

High level compilation for gate reconfigurable architectures

Author: Anant Agarwal; Jonathan William Babb
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2001

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001. Includes bibliographical references (p. 205-215).

A continuing exponential increase in the number of programmable elements is turning management of gate-reconfigurable architectures as "glue logic" into an intractable problem; it is past time to raise this abstraction level. The physical hardware in gate-reconfigurable architectures is all low level (individual wires, bit-level functions, and single-bit registers), so one should look to the fetch-decode-execute machinery of traditional computers for higher-level abstractions. Ordinary computers have machine-level architectural mechanisms that interpret instructions generated by a high-level compiler. Moving efficiently to the next abstraction level requires leveraging these mechanisms without introducing the overhead of machine-level interpretation. In this dissertation, I solve this fundamental problem by specializing architectural mechanisms with respect to input programs. This solution is the key to efficient compilation of high-level programs to gate-reconfigurable architectures. My approach to specialization includes several novel techniques. I develop, with others, extensive bitwidth analyses that apply to registers, pointers, and arrays. I use pointer analysis and memory disambiguation to target devices with blocks of embedded memory. My approach to memory parallelization generates a spatial hierarchy that enables easier-to-synthesize logic state machines with smaller circuits and no long wires. My space-time scheduling approach integrates the techniques of high-level synthesis with the static routing concepts developed for single-chip multiprocessors. Using DeepC, a prototype compiler demonstrating my thesis, I compile a new benchmark suite to Xilinx Virtex FPGAs. The resulting performance is comparable to a custom MIPS processor, with smaller area (40 percent on average), higher evaluation speed (2.4x), and lower energy (18x) and energy-delay (45x). Specializing advanced mechanisms yields additional speedup, scaling with hardware area, at the expense of power. For comparison, I also target IBM's standard-cell SA-27E process and the RAW microprocessor. The results include a sensitivity analysis of the different specialized mechanisms and a grand comparison among the alternative targets.

by Jonathan William Babb. Ph.D.
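As a rough illustration of what a bitwidth analysis computes, the Python sketch below performs a simple forward interval (value-range) analysis over a few operators. It then derives the narrowest register that can hold each result, and narrower registers mean smaller datapaths on the FPGA. `Range`, `mask`, `bits`, and `unsigned` are hypothetical names. The sketch ignores loops, pointers, and arrays, and it is not the analysis implemented in DeepC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Closed integer interval [lo, hi] of values a signal may take (hypothetical helper)."""
    lo: int
    hi: int

    def __add__(self, other):
        return Range(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Range(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Range(min(corners), max(corners))

    def mask(self, m):
        """Range of (value & m) for a non-negative constant mask m."""
        assert m >= 0
        return Range(0, min(self.hi, m) if self.lo >= 0 else m)

    def bits(self):
        """Smallest register width that holds every value in the range."""
        if self.lo >= 0:  # an unsigned encoding suffices
            return max(1, self.hi.bit_length())
        # Two's complement: enough magnitude bits for both extremes, plus a sign bit.
        return max((-self.lo - 1).bit_length(), self.hi.bit_length()) + 1


def unsigned(width):
    """Range of an unsigned input of the given width."""
    return Range(0, (1 << width) - 1)
```

For two 8-bit unsigned inputs `a = b = unsigned(8)`, the sum needs 9 bits, the product 16, and the difference 9 bits in two's complement. Masking `a` with `0x0F` narrows it to 4 bits.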

CiteSeerX

DSpace@MIT

Smart Resource Sharing for Concurrency and Security

Author: Gao Ying
Publication venue: eScholarship, University of California
Publication date: 01/01/2017

Different layers of the computer system, from low-level hardware accelerators and networks-on-chip (NoC) in multi-core systems to upper-level operating systems and software applications, rely on sharing hardware computing resources. Unfortunately, such sharing, when not carefully managed, can introduce a host of protection problems and sources of information leakage. We describe a set of methods that make it possible to scale performance systematically through hardware sharing without weakening security properties, by taking into account the design and characteristics of individual layers and components. The key is to deal efficiently with the security vulnerabilities that sharing introduces in time and space by creating new security-conscious sharing interfaces. Our systematic approach first refines coordination techniques into more detailed patterns, and then bridges the gap between less efficient universal measures and provably more performant and secure patterns. Specifically, we demonstrate the usefulness of a sharing pattern for hardware and software systems where separation is a concern (for example, interference and timing-channel mitigation). The most important insight is that, to fully utilize computing resources (improving performance and availability), the entities that share these resources must coordinate in a pre-calculated way. More dynamic approaches to improving performance and concurrency are likely to introduce new interference into the system. We show that certain static scheduling measures in lower-level hardware, such as networks-on-chip, can provably eliminate timing channels, but the dynamic nature of software systems makes covert channels harder to confine. Moreover, software systems face other types of security problems beyond side channels.
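One standard static scheme of the kind described above is time-division multiplexing (TDMA), in which each security domain owns fixed time slots on a shared resource whether or not it uses them. The cycle-level Python sketch below assumes a single shared link that serves at most one request per cycle. It contrasts TDMA with a work-conserving round-robin arbiter: under round robin, domain A's latencies change with domain B's load, which is a timing channel, while under TDMA they do not. The function `simulate` and its parameters are hypothetical and are not taken from the dissertation.

```python
from collections import deque


def simulate(arrivals, policy, cycles=200):
    """Cycle-level model of one shared link serving at most one request per cycle.

    arrivals: dict mapping domain name -> sorted list of request arrival cycles.
    policy:   "tdma" (fixed slot ownership) or "round_robin" (work-conserving).
    Returns a dict mapping domain -> list of request latencies in cycles.
    """
    domains = sorted(arrivals)
    future = {d: deque(arrivals[d]) for d in domains}
    waiting = {d: deque() for d in domains}
    latency = {d: [] for d in domains}
    next_rr = 0  # round-robin pointer

    for t in range(cycles):
        # Move requests that have arrived by cycle t into their domain's queue.
        for d in domains:
            while future[d] and future[d][0] <= t:
                waiting[d].append(future[d].popleft())

        chosen = None
        if policy == "tdma":
            owner = domains[t % len(domains)]
            if waiting[owner]:
                chosen = owner  # an idle slot is never lent to another domain
        elif policy == "round_robin":
            for k in range(len(domains)):
                d = domains[(next_rr + k) % len(domains)]
                if waiting[d]:
                    chosen = d
                    next_rr = (next_rr + k + 1) % len(domains)
                    break
        else:
            raise ValueError(f"unknown policy: {policy}")

        if chosen is not None:
            arrived = waiting[chosen].popleft()
            latency[chosen].append(t - arrived + 1)  # includes the service cycle
    return latency
```

With A issuing four requests at cycle 0, round robin gives A the latencies `[1, 2, 3, 4]` when B is idle but `[1, 3, 5, 7]` when B is busy, so A can infer B's activity. TDMA gives `[1, 3, 5, 7]` in both cases, at the cost of wasting B's idle slots. This is the performance-for-isolation trade that pre-calculated coordination makes.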
Improving concurrency and performance in software without weakening security requires a slightly different approach. To study the obstacles that security concerns pose to scaling software applications, we examine the Android operating system and the structure of its appification ecosystem. Its distributed pattern of sharing opens a prime avenue for attack. We propose a centralized approach, with a single reliable service, as a way to enable computation reuse among applications. This centralization favors well-protected application-to-system communication over vulnerable application-to-application communication. As a result, it not only boosts computational concurrency but also eliminates the possibility of an app being attacked through the attack-prone Inter-Component Calls (ICCs) that distributed computation sharing would otherwise involve. The approach further improves security by adding a novel application-centric grouping for isolation. A prototype on Android shows how our approach supports and protects inter-app resource sharing while improving concurrency at scale.
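The centralized-reuse idea can be pictured with the Python sketch below. It assumes a single trusted service that owns the shared computations. Apps call the service rather than each other, and cached results are shared only among apps in the same isolation group. `ReuseService`, `register`, `request`, and the app-to-group mapping are hypothetical. The sketch does not model the dissertation's Android prototype, and it includes no Android IPC or permission checks.

```python
from typing import Callable, Dict, Hashable, Tuple


class ReuseService:
    """Hypothetical central service: the single place where shared results are computed."""

    def __init__(self, group_of: Dict[str, str]):
        self._group_of = dict(group_of)  # app id -> isolation group
        self._functions: Dict[str, Callable] = {}
        self._cache: Dict[Tuple[str, str, Hashable], object] = {}
        self.computations = 0  # how many times real work was done

    def register(self, name: str, fn: Callable) -> None:
        """Only the service registers computations; apps cannot inject code."""
        self._functions[name] = fn

    def request(self, app: str, name: str, arg: Hashable):
        group = self._group_of[app]  # KeyError for apps the service does not know
        key = (group, name, arg)     # cache entries are scoped to the caller's group
        if key not in self._cache:
            self._cache[key] = self._functions[name](arg)
            self.computations += 1
        return self._cache[key]
```

For example, suppose two hypothetical apps, `maps` and `rides`, share a group and both request `upper("main st")`. The work is done once, while an app in another group, such as `game`, triggers its own computation. No app ever calls another app directly.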

eScholarship - University of California