616 research outputs found

    Compiler and Architecture Design for Coarse-Grained Programmable Accelerators

    Get PDF
    abstract: The holy grail of computer hardware across all market segments has been to sustain performance improvement at the same pace as silicon technology scales. As the technology scales and the size of transistors shrinks, the power consumption and energy usage per transistor decrease. On the other hand, the transistor density increases significantly by technology scaling. Due to technology factors, the reduction in power consumption per transistor is not sufficient to offset the increase in power consumption per unit area. Therefore, to improve performance, increasing energy-efficiency must be addressed at all design levels from circuit level to application and algorithm levels. At architectural level, one promising approach is to populate the system with hardware accelerators each optimized for a specific task. One drawback of hardware accelerators is that they are not programmable. Therefore, their utilization can be low as they perform one specific function. Using software programmable accelerators is an alternative approach to achieve high energy-efficiency and programmability. Due to intrinsic characteristics of software accelerators, they can exploit both instruction level parallelism and data level parallelism. Coarse-Grained Reconfigurable Architecture (CGRA) is a software programmable accelerator consists of a number of word-level functional units. Motivated by promising characteristics of software programmable accelerators, the potentials of CGRAs in future computing platforms is studied and an end-to-end CGRA research framework is developed. This framework consists of three different aspects: CGRA architectural design, integration in a computing system, and CGRA compiler. First, the design and implementation of a CGRA and its instruction set is presented. This design is then modeled in a cycle accurate system simulator. The simulation platform enables us to investigate several problems associated with a CGRA when it is deployed as an accelerator in a computing system. Next, the problem of mapping a compute intensive region of a program to CGRAs is formulated. From this formulation, several efficient algorithms are developed which effectively utilize CGRA scarce resources very well to minimize the running time of input applications. Finally, these mapping algorithms are integrated in a compiler framework to construct a compiler for CGRADissertation/ThesisDoctoral Dissertation Computer Science 201

    Dagstuhl Reports : Volume 1, Issue 2, February 2011

    Get PDF
    Online Privacy: Towards Informational Self-Determination on the Internet (Dagstuhl Perspectives Workshop 11061) : Simone Fischer-Hübner, Chris Hoofnagle, Kai Rannenberg, Michael Waidner, Ioannis Krontiris and Michael Marhöfer Self-Repairing Programs (Dagstuhl Seminar 11062) : Mauro Pezzé, Martin C. Rinard, Westley Weimer and Andreas Zeller Theory and Applications of Graph Searching Problems (Dagstuhl Seminar 11071) : Fedor V. Fomin, Pierre Fraigniaud, Stephan Kreutzer and Dimitrios M. Thilikos Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081) : Maxime Crochemore, Lila Kari, Mehryar Mohri and Dirk Nowotka Packing and Scheduling Algorithms for Information and Communication Services (Dagstuhl Seminar 11091) Klaus Jansen, Claire Mathieu, Hadas Shachnai and Neal E. Youn

    Automatic Energy Saving Schemes for Parallel Applications

    Get PDF
    Although high-performance computing traditionally focuses on the efficient execution of large-scale applications, both energy and power have become critical concerns when approaching exascale. Drastic increases in the power consumption of supercomputers affect significantly their operating costs and failure rates. In modern microprocessor architectures, equipped with dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (throttling), the power consumption may be controlled in software. Additionally, network interconnect, such as Infiniband, may be exploited to maximize energy savings while the application performance loss and frequency switching overheads must be carefully balanced. This work first studies two important collective communication operations, all-to-all and allgather and proposes energy saving strategies on the per-call basis. Next, it targets point-to-point communications to group them into phases and apply frequency scaling to them to save energy by exploiting the architectural and communication stalls. Finally, it proposes an automatic runtime system which combines both collective and point-to-point communications into phases, and applies throttling to them apart from DVFS to maximize energy savings. The experimental results are presented for NAS parallel benchmark problems as well as for the realistic parallel electronic structure calculations performed by the widely used quantum chemistry package GAMESS. Close to the maximum energy savings were obtained with a substantially low performance loss on the given platform

    Exact Approaches for Higher-Dimensional Orthogonal Packing and Related Problems

    Get PDF
    NP-hard problems of higher-dimensional orthogonal packing are considered. We look closer at their logical structure and show that they can be decomposed into problems of a smaller dimension with a special contiguous structure. This decomposition influences the modeling of the packing process, which results in three new solution approaches. Keeping this decomposition in mind, we model the smaller-dimensional problems in a single position-indexed formulation with non-overlapping inequalities serving as binding constraints. Thus, we come up with a new integer linear programming model, which we subject to polyhedral analysis. Furthermore, we establish general non-overlapping and density inequalities and prove under appropriate assumptions their facet-defining property for the convex hull of the integer solutions. Based on the proposed model and the strong inequalities, we develop a new branch-and-cut algorithm. Being a relaxation of the higher-dimensional problem, each of the smaller-dimensional problems is also relevant for different areas, e.g. for scheduling. To tackle any of these smaller-dimensional problems, we use a Gilmore-Gomory model, which is a Dantzig-Wolfe decomposition of the position-indexed formulation. In order to obtain a contiguous structure for the optimal solution, its basis matrix must have a consecutive 1's property. For construction of such matrices, we develop new branch-and-price algorithms which are distinguished by various strategies for the enumeration of partial solutions. We also prove some characteristics of partial solutions, which tighten the slave problem of column generation. For a nonlinear modeling of the higher-dimensional packing problems, we investigate state-of-the-art constraint programming approaches, modify them, and propose new dichotomy and intersection branching strategies. To tighten the constraint propagation, we introduce new pruning rules. For that, we apply 1D relaxation with intervals and forbidden pairs, an advanced bar relaxation, 2D slice relaxation, and 1D slice-bar relaxation with forbidden pairs. The new rules are based on the relaxation by the smaller-dimensional problems which, in turn, are replaced by a linear programming relaxation of the Gilmore-Gomory model. We conclude with a discussion of implementation issues and numerical studies of all proposed approaches.Es werden NP-schwere höherdimensionale orthogonale Packungsprobleme betrachtet. Wir untersuchen ihre logische Struktur genauer und zeigen, dass sie sich in Probleme kleinerer Dimension mit einer speziellen Nachbarschaftsstruktur zerlegen lassen. Dies beeinflusst die Modellierung des Packungsprozesses, die ihreseits zu drei neuen Lösungsansätzen führt. Unter Beachtung dieser Zerlegung modellieren wir die Probleme kleinerer Dimension in einer einzigen positionsindizierten Formulierung mit Nichtüberlappungsungleichungen, die als Bindungsbedingungen dienen. Damit entwickeln wir ein neues Modell der ganzzahligen linearen Optimierung und unterziehen dies einer Polyederanalyse. Weiterhin geben wir allgemeine Nichtüberlappungs- und Dichtheitsungleichungen an und beweisen unter geeigneten Annahmen ihre facettendefinierende Eigenschaft für die konvexe Hülle der ganzzahligen Lösungen. Basierend auf dem vorgeschlagenen Modell und den starken Ungleichungen entwickeln wir einen neuen Branch-and-Cut-Algorithmus. Jedes Problem kleinerer Dimension ist eine Relaxation des höherdimensionalen Problems. Darüber hinaus besitzt es Anwendungen in verschiedenen Bereichen, wie zum Beispiel im Scheduling. Für die Behandlung der Probleme kleinerer Dimension setzen wir das Gilmore-Gomory-Modell ein, das eine Dantzig-Wolfe-Dekomposition der positionsindizierten Formulierung ist. Um eine Nachbarschaftsstruktur zu erhalten, muss die Basismatrix der optimalen Lösung die consecutive-1’s-Eigenschaft erfüllen. Für die Konstruktion solcher Matrizen entwickeln wir neue Branch-and-Price-Algorithmen, die sich durch Strategien zur Enumeration von partiellen Lösungen unterscheiden. Wir beweisen auch einige Charakteristiken von partiellen Lösungen, die das Hilfsproblem der Spaltengenerierung verschärfen. Für die nichtlineare Modellierung der höherdimensionalen Packungsprobleme untersuchen wir moderne Ansätze des Constraint Programming, modifizieren diese und schlagen neue Dichotomie- und Überschneidungsstrategien für die Verzweigung vor. Für die Verstärkung der Constraint Propagation stellen wir neue Ablehnungskriterien vor. Wir nutzen dabei 1D Relaxationen mit Intervallen und verbotenen Paaren, erweiterte Streifen-Relaxation, 2D Scheiben-Relaxation und 1D Scheiben-Streifen-Relaxation mit verbotenen Paaren. Alle vorgestellten Kriterien basieren auf Relaxationen durch Probleme kleinerer Dimension, die wir weiter durch die LP-Relaxation des Gilmore-Gomory-Modells abschwächen. Wir schließen mit Umsetzungsfragen und numerischen Experimenten aller vorgeschlagenen Ansätze

    Exploring resource/performance trade-offs for streaming applications on embedded multiprocessors

    Get PDF
    Embedded system design is challenged by the gap between the ever-increasing customer demands and the limited resource budgets. The tough competition demands ever-shortening time-to-market and product lifecycles. To solve or, at least to alleviate, the aforementioned issues, designers and manufacturers need model-based quantitative analysis techniques for early design-space exploration to study trade-offs of different implementation candidates. Moreover, modern embedded applications, especially the streaming applications addressed in this thesis, face more and more dynamic input contents, and the platforms that they are running on are more flexible and allow runtime configuration. Quantitative analysis techniques for embedded system design have to be able to handle such dynamic adaptable systems. This thesis has the following contributions: - A resource-aware extension to the Synchronous Dataflow (SDF) model of computation. - Trade-off analysis techniques, both in the time-domain and in the iterationdomain (i.e., on an SDF iteration basis), with support for resource sharing. - Bottleneck-driven design-space exploration techniques for resource-aware SDF. - A game-theoretic approach to controller synthesis, guaranteeing performance under dynamic input. As a first contribution, we propose a new model, as an extension of static synchronous dataflow graphs (SDF) that allows the explicit modeling of resources with consistency checking. The model is called resource-aware SDF (RASDF). The extension enables us to investigate resource sharing and to explore different scheduling options (ways to allocate the resources to the different tasks) using state-space exploration techniques. Consistent SDF and RASDF graphs have the property that an execution occurs in so-called iterations. An iteration typically corresponds to the processing of a meaningful piece of data, and it returns the graph to its initial state. On multiprocessor platforms, iterations may be executed in a pipelined fashion, which makes performance analysis challenging. As the second contribution, this thesis develops trade-off analysis techniques for RASDF, both in the time-domain and in the iteration-domain (i.e., on an SDF iteration basis), to dimension resources on platforms. The time-domain analysis allows interleaving of different iterations, but the size of the explored state space grows quickly. The iteration-based technique trades the potential of interleaving of iterations for a compact size of the iteration state space. An efficient bottleneck-driven designspace exploration technique for streaming applications, the third main contribution in this thesis, is derived from analysis of the critical cycle of the state space, to reveal bottleneck resources that are limiting the throughput. All techniques are based on state-based exploration. They enable system designers to tailor their platform to the required applications, based on their own specific performance requirements. Pruning techniques for efficient exploration of the state space have been developed. Pareto dominance in terms of performance and resource usage is used for exact pruning, and approximation techniques are used for heuristic pruning. Finally, the thesis investigates dynamic scheduling techniques to respond to dynamic changes in input streams. The fourth contribution in this thesis is a game-theoretic approach to tackle controller synthesis to select the appropriate schedules in response to dynamic inputs from the environment. The approach transforms the explored iteration state space of a scenario- and resource-aware SDF (SARA SDF) graph to a bipartite game graph, and maps the controller synthesis problem to the problem of finding a winning positional strategy in a classical mean payoff game. A winning strategy of the game can be used to synthesize the controller of schedules for the system that is guaranteed to satisfy the throughput requirement given by the designer

    VOCAL 2018. 8th VOCAL Optimization Conference: Advanced Algorithms

    Get PDF

    High-level synthesis of VLSI circuits

    Get PDF

    LIPIcs, Volume 244, ESA 2022, Complete Volume

    Get PDF
    LIPIcs, Volume 244, ESA 2022, Complete Volum

    Techniques for Crafting Customizable MPSoCS

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Quantum Computing in Logistics and Supply Chain Management an Overview

    Full text link
    The work explores the integration of quantum computing into logistics and supply chain management, emphasising its potential for use in complex optimisation problems. The discussion introduces quantum computing principles, focusing on quantum annealing and gate-based quantum computing, with the Quantum Approximate Optimisation Algorithm and Quantum Annealing as key algorithmic approaches. The paper provides an overview of quantum approaches to routing, logistic network design, fleet maintenance, cargo loading, prediction, and scheduling problems. Notably, most solutions in the literature are hybrid, combining quantum and classical computing. The conclusion highlights the early stage of quantum computing, emphasising its potential impact on logistics and supply chain optimisation. In the final overview, the literature is categorised, identifying quantum annealing dominance and a need for more research in prediction and machine learning is highlighted. The consensus is that quantum computing has great potential but faces current hardware limitations, necessitating further advancements for practical implementation
    corecore