15,800 research outputs found

    TTC: A Tensor Transposition Compiler for Multiple Architectures

    Full text link
    We consider the problem of transposing tensors of arbitrary dimension and describe TTC, an open source domain-specific parallel compiler. TTC generates optimized parallel C++/CUDA C code that achieves a significant fraction of the system's peak memory bandwidth. TTC exhibits high performance across multiple architectures, including modern AVX-based systems (e.g.,~Intel Haswell, AMD Steamroller), Intel's Knights Corner as well as different CUDA-based GPUs such as NVIDIA's Kepler and Maxwell architectures. We report speedups of TTC over a meaningful baseline implementation generated by external C++ compilers; the results suggest that a domain-specific compiler can outperform its general purpose counterpart significantly: For instance, comparing with Intel's latest C++ compiler on the Haswell and Knights Corner architecture, TTC yields speedups of up to 8Ă—8\times and 32Ă—32\times, respectively. We also showcase TTC's support for multiple leading dimensions, making it a suitable candidate for the generation of performance-critical packing functions that are at the core of the ubiquitous BLAS 3 routines

    A Survey on Handover Management in Mobility Architectures

    Full text link
    This work presents a comprehensive and structured taxonomy of available techniques for managing the handover process in mobility architectures. Representative works from the existing literature have been divided into appropriate categories, based on their ability to support horizontal handovers, vertical handovers and multihoming. We describe approaches designed to work on the current Internet (i.e. IPv4-based networks), as well as those that have been devised for the "future" Internet (e.g. IPv6-based networks and extensions). Quantitative measures and qualitative indicators are also presented and used to evaluate and compare the examined approaches. This critical review provides some valuable guidelines and suggestions for designing and developing mobility architectures, including some practical expedients (e.g. those required in the current Internet environment), aimed to cope with the presence of NAT/firewalls and to provide support to legacy systems and several communication protocols working at the application layer

    Model-driven engineering approach to design and implementation of robot control system

    Full text link
    In this paper we apply a model-driven engineering approach to designing domain-specific solutions for robot control system development. We present a case study of the complete process, including identification of the domain meta-model, graphical notation definition and source code generation for subsumption architecture -- a well-known example of robot control architecture. Our goal is to show that both the definition of the robot-control architecture and its supporting tools fits well into the typical workflow of model-driven engineering development.Comment: Presented at DSLRob 2011 (arXiv:cs/1212.3308

    The future of computing beyond Moore's Law.

    Get PDF
    Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements, and how it affects options available to continue scaling of successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

    Domain-oriented architecture design for production control software

    Get PDF
    this paper, we present domain-oriented architectural design heuristics for production control software. Our approach is based upon the following premisses. First, software design, like all other forms of design, consists of the reduction of uncertainty about a final product by making design decisions. These decisions should as much as possible be based upon information that is certain, either because they represent laws of nature or because they represent previously made design decisions. An import class of information concerns the domain of the software. The domain of control software is the part of the world monitored and controlled by the software; it is the larger system into which the software is embedded. The software engineer should exploit system-level domain knowledge in order to make software design decisions. Second, in the case of production control software, using system-level knowledge is not only justified, it is also imposed on the software engineer by the necessity to cooperate with hardware engineers. These represent their designs by means of Process and Instrumentation Diagrams (PIDs) and Input-Output (IO) lists. They do not want to spend time, nor do they see the need, to duplicate the information represented by these diagrams by means of diagrams from software engineering methods. Such a duplication would be an occasion to introduce errors of omission (information lost during the translation process) or commission (misinterpretation, misguided but invisible design decisions made during the translation) anyway. We think it is up to the software engineer to adapt his or her notations to those of the system engineers he or she must work with. Third, work in patterns and software architectures started from the programminglanguage level and is now moving..

    Design of multimedia processor based on metric computation

    Get PDF
    Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of those media applications have taxing demands on today's processor performances with low cost, low power and reduced design delay. To satisfy those challenges, a fast and efficient strategy consists in upgrading a low cost general purpose processor core. This approach is based on the personalization of a general RISC processor core according the target multimedia application requirements. Thus, if the extra cost is justified, the general purpose processor GPP core can be enforced with instruction level coprocessors, coarse grain dedicated hardware, ad hoc memories or new GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach is based on three main steps: the first one is the analysis of the targeted application using efficient metrics. The second step is the selection of the appropriate architecture template according to the first step results and recommendations. The third step is the architecture generation. This approach is experimented using various image and video algorithms showing its feasibility

    Instruction-Level Abstraction (ILA): A Uniform Specification for System-on-Chip (SoC) Verification

    Full text link
    Modern Systems-on-Chip (SoC) designs are increasingly heterogeneous and contain specialized semi-programmable accelerators in addition to programmable processors. In contrast to the pre-accelerator era, when the ISA played an important role in verification by enabling a clean separation of concerns between software and hardware, verification of these "accelerator-rich" SoCs presents new challenges. From the perspective of hardware designers, there is a lack of a common framework for the formal functional specification of accelerator behavior. From the perspective of software developers, there exists no unified framework for reasoning about software/hardware interactions of programs that interact with accelerators. This paper addresses these challenges by providing a formal specification and high-level abstraction for accelerator functional behavior. It formalizes the concept of an Instruction Level Abstraction (ILA), developed informally in our previous work, and shows its application in modeling and verification of accelerators. This formal ILA extends the familiar notion of instructions to accelerators and provides a uniform, modular, and hierarchical abstraction for modeling software-visible behavior of both accelerators and programmable processors. We demonstrate the applicability of the ILA through several case studies of accelerators (for image processing, machine learning, and cryptography), and a general-purpose processor (RISC-V). We show how the ILA model facilitates equivalence checking between two ILAs, and between an ILA and its hardware finite-state machine (FSM) implementation. Further, this equivalence checking supports accelerator upgrades using the notion of ILA compatibility, similar to processor upgrades using ISA compatibility.Comment: 24 pages, 3 figures, 3 table
    • …
    corecore