5,161 research outputs found

    Theory and design of portable parallel programs for heterogeneous computing systems and networks

    Get PDF
    A recurring problem with high-performance computing is that advanced architectures generally achieve only a small fraction of their peak performance on many portions of real applications sets. The Amdahl\u27s law corollary of this is that such architectures often spend most of their time on tasks (codes/algorithms and the data sets upon which they operate) for which they are unsuited. Heterogeneous Computing (HC) is needed in the mid 90\u27s and beyond due to ever increasing super-speed requirements and the number of projects with these requirements. HC is defined as a special form of parallel and distributed computing that performs computations using a single autonomous computer operating in both SIMD and MIMD modes, or using a number of connected autonomous computers. Physical implementation of a heterogeneous network or system is currently possible due to the existing technological advances in networking and supercomputing. Unfortunately, software solutions for heterogeneous computing are still in their infancy. Theoretical models, software tools, and intelligent resource-management schemes need to be developed to support heterogeneous computing efficiently. In this thesis, we present a heterogeneous model of computation which encapsulates all the essential parameters for designing efficient software and hardware for HC. We also study a portable parallel programming tool, called Cluster-M, which implements this model. Furthermore, we study and analyze the hardware and software requirements of HC and show that, Cluster-M satisfies the requirements of HC environments

    Optimum and heuristic synthesis of multiple word-length Architectures

    Get PDF
    Published versio

    Generic Schema Matching with Cupid

    Get PDF
    Schema matching is a critical step in many applications, such as XML message mapping, data warehouse loading, and schema integration. In this paper, we investigate algorithms for generic schema matching, outside of any particular data model or application. We first present a taxonomy for past solutions, showing that a rich range of techniques is available. We then propose a new algorithm, Cupid, that discovers mappings between schema elements based on their names, data types, constraints, and schema structure, using a broader set of techniques than past approaches. Some of our innovations are the integrated use of linguistic and structural matching, context-dependent matching of shared types, and a bias toward leaf structure where much of the schema content resides. After describing our algorithm, we present experimental results that compare Cupid to two other schema matching systems

    Python FPGA Programming with Data-Centric Multi-Level Design

    Full text link
    Although high-level synthesis (HLS) tools have significantly improved programmer productivity over hardware description languages, developing for FPGAs remains tedious and error prone. Programmers must learn and implement a large set of vendor-specific syntax, patterns, and tricks to optimize (or even successfully compile) their applications, while dealing with ever-changing toolflows from the FPGA vendors. We propose a new way to develop, optimize, and compile FPGA programs. The Data-Centric parallel programming (DaCe) framework allows applications to be defined by their dataflow and control flow through the Stateful DataFlow multiGraph (SDFG) representation, capturing the abstract program characteristics, and exposing a plethora of optimization opportunities. In this work, we show how extending SDFGs with multi-level Library Nodes incorporates both domain-specific and platform-specific optimizations into the design flow, enabling knowledge transfer across application domains and FPGA vendors. We present the HLS-based FPGA code generation backend of DaCe, and show how SDFGs are code generated for either FPGA vendor, emitting efficient HLS code that is structured and annotated to implement the desired architecture

    Block-level test scheduling under power dissipation constraints

    Get PDF
    As dcvicc technologies such as VLSI and Multichip Module (MCM) become mature, and larger and denser memory ICs arc implemented for high-performancc digital systems, power dissipation becomes a critical factor and can no longer be ignored cither in normal operation of the system or under test conditions. One of the major considerations in test scheduling is the fact that heat dissipated during test application is significantly higher than during normal operation (sometimes 100 - 200% higher). Therefore, this is one of the recent major considerations in test scheduling. Test scheduling is strongly related to test concurrency. Test concurrency is a design property which strongly impacts testability and power dissipation. To satisfy high fault coverage goals with reduced test application time under certain power dissipation constraints, the testing of all components on the system should be performed m parallel to the greatest extent possible. Some theoretical analysis of this problem has been carried out, but only at IC level. The problem was basically described as a compatible test clustering, where the compatibility among tests was given by test resource and power dissipation conflicts at the same time. From an implementation point of view this problem was identified as an Non-Polynomial (NP) complete problem In this thesis, an efficient scheme for overlaying the block-tcsts, called the extended tree growing technique, is proposed together with classical scheduling algorithms to search for power-constrained blocktest scheduling (PTS) profiles m a polynomial time Classical algorithms like listbased scheduling and distribution-graph based scheduling arc employed to tackle at high level the PTS problem. This approach exploits test parallelism under power constraints. This is achieved by overlaying the block-tcst intervals of compatible subcircuits to test as many of them as possible concurrently so that the maximum accumulated power dissipation is balanced and does not exceed the given limit. The test scheduling discipline assumed here is the partitioned testing with run to completion. A constant additive model is employed for power dissipation analysis and estimation throughout the algorithm

    Составление расписания для графов синхронных потоков данных

    Get PDF
    Розглянуто задачу складання розкладу для алгоритму, який заданий графом синхронних потоків даних (ГСПД). Запропоновано метод складання періодичного розкладу ГСПД з періодом L тактів, оснований на перетворенні його у просторовий ГСПД, вершини якого мають координати місця та моменту виконання відповідних операторів алгоритму. На координати просторового ГСПД накладено обмеження: оператори, які виконуються в одному процесорному елементі, не повинні мати однакові такти свого виконання, які взято за модулем L. Завдяки цьому ГСПД відображається у спеціалізований обчислювач, який виконує алгоритм у конвеєрному режимі з оптимізованою завантаженністю ресурсів. Показано алгоритм пошуку субоптимального розкладу на основі просторового ГСПД.The scheduling problem for the synchronous dataflow graph (SDF) is considered. A method of the SDF scheduling is proposed which is based on transforming SDF into spatial SDF. The circular schedule has the period of L cycles. Each spatial SDF node has the coordinates of space and time of an event, where and when the respective algorithm steps were performed. A set of restrictions, which the SDF nodes have, helps to derive the circular schedule with the optimum load balancing. So, the nodes, which are mapped into a single resource, must not have the same clock cycles modulo L, and the number of such nodes has to approach L. The resulting schedule is implemented in the pipelined datapath. The algorithm for computing the suboptimal schedule is proposed based on the spatial SDF.Рассмотрена задача составления расписания для алгоритма, заданного графом синхронных потоков данных (ГСПД). Предложен метод составления расписания ГСПД с периодом L тактов, основанный на преобразовании его в пространственный ГСПД, вершины которого имеют координаты места и момента выполнения соответствующих операторов алгоритма. На координаты пространственного ГСПД наложены ограничения: операторы, исполняемые в дном процессорном элементе, не должны иметь одинаковые такты своего исполнения, взятые по модулю L. Благодаря этому ГСПД отображается в специализированный вычислитель, исполняющий алгоритм в конвейерном режиме с оптимизированной загруженностью ресурсов. Показан алгоритм поиска субоптимального расписания на основе пространственного ГСПД
    corecore