Search CORE

15,800 research outputs found

TTC: A Tensor Transposition Compiler for Multiple Architectures

Author: Abadi M.
Knijnenburg P. M.
Springer P.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016
Field of study

We consider the problem of transposing tensors of arbitrary dimension and describe TTC, an open-source, domain-specific parallel compiler. TTC generates optimized parallel C++/CUDA C code that achieves a significant fraction of the system's peak memory bandwidth. TTC exhibits high performance across multiple architectures, including modern AVX-based systems (e.g., Intel Haswell, AMD Steamroller), Intel's Knights Corner, and different CUDA-based GPUs such as NVIDIA's Kepler and Maxwell architectures. We report speedups of TTC over a meaningful baseline implementation generated by external C++ compilers; the results suggest that a domain-specific compiler can significantly outperform its general-purpose counterpart. For instance, compared with Intel's latest C++ compiler on the Haswell and Knights Corner architectures, TTC yields speedups of up to 8× and 32×, respectively. We also showcase TTC's support for multiple leading dimensions, making it a suitable candidate for the generation of performance-critical packing functions that are at the core of the ubiquitous BLAS 3 routines.
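To pin down the operation TTC optimizes, here is a minimal, naive reference sketch of an out-of-place tensor transposition B(i[perm[0]], ..., i[perm[d-1]]) = alpha * A(i[0], ..., i[d-1]). It is not TTC's generated code and not the baseline used in the paper. It assumes column-major storage (dimension 0 is stride-1) and models "multiple leading dimensions" as per-dimension padded extents (`outerA`, `outerB`), an assumption about the interface; the function name `transposeReference` and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Naive reference transposition (hypothetical helper, not TTC output):
//   B(i[perm[0]], ..., i[perm[d-1]]) = alpha * A(i[0], ..., i[d-1])
// Assumptions: column-major layout (dimension 0 is stride-1); outerA/outerB
// are padded per-dimension extents ("leading dimensions") >= the logical
// sizes; B's dimension k has logical size size[perm[k]].
void transposeReference(const std::vector<double>& A, std::vector<double>& B,
                        const std::vector<std::size_t>& size,
                        const std::vector<std::size_t>& perm,
                        const std::vector<std::size_t>& outerA,
                        const std::vector<std::size_t>& outerB,
                        double alpha = 1.0) {
    const std::size_t d = size.size();

    // Strides derived from the padded extents, so padding is skipped.
    std::vector<std::size_t> strideA(d), strideB(d);
    std::size_t s = 1;
    for (std::size_t k = 0; k < d; ++k) { strideA[k] = s; s *= outerA[k]; }
    s = 1;
    for (std::size_t k = 0; k < d; ++k) { strideB[k] = s; s *= outerB[k]; }

    // B's dimension k holds A's dimension perm[k], so A's index j
    // moves through B with stride strideB[k] where perm[k] == j.
    std::vector<std::size_t> strideBofA(d);
    for (std::size_t k = 0; k < d; ++k) strideBofA[perm[k]] = strideB[k];

    std::size_t total = 1;
    for (std::size_t k = 0; k < d; ++k) total *= size[k];

    std::vector<std::size_t> idx(d, 0);  // multi-index over A's logical extents
    for (std::size_t n = 0; n < total; ++n) {
        std::size_t offA = 0, offB = 0;
        for (std::size_t k = 0; k < d; ++k) {
            offA += idx[k] * strideA[k];
            offB += idx[k] * strideBofA[k];
        }
        B[offB] = alpha * A[offA];
        // Odometer increment, dimension 0 fastest.
        for (std::size_t k = 0; k < d; ++k) {
            if (++idx[k] < size[k]) break;
            idx[k] = 0;
        }
    }
}
```

For a 2×3 matrix with `perm = {1, 0}` this reduces to an ordinary matrix transpose. A compiler like TTC would replace the per-element index arithmetic with blocked, vectorized loops specialized to a given permutation and size; the sketch only fixes the semantics.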

arXiv.org e-Print Archive

Crossref

Publikationsserver der RWTH Aachen University

A Survey on Handover Management in Mobility Architectures

Author: Ferretti Stefano
Ghini Vittorio
Panzieri Fabio
Publication venue: 'Elsevier BV'
Publication date: 04/09/2015
Field of study

This work presents a comprehensive and structured taxonomy of available techniques for managing the handover process in mobility architectures. Representative works from the existing literature have been divided into appropriate categories, based on their ability to support horizontal handovers, vertical handovers and multihoming. We describe approaches designed to work on the current Internet (i.e. IPv4-based networks), as well as those that have been devised for the "future" Internet (e.g. IPv6-based networks and extensions). Quantitative measures and qualitative indicators are also presented and used to evaluate and compare the examined approaches. This critical review provides some valuable guidelines and suggestions for designing and developing mobility architectures, including some practical expedients (e.g. those required in the current Internet environment) aimed at coping with the presence of NAT/firewalls and at providing support for legacy systems and several communication protocols working at the application layer.

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Model-driven engineering approach to design and implementation of robot control system

Author: Trojanek Piotr
Publication venue
Publication date: 01/01/2013
Field of study

In this paper, we apply a model-driven engineering approach to designing domain-specific solutions for robot control system development. We present a case study of the complete process, including identification of the domain meta-model, graphical notation definition and source code generation for the subsumption architecture, a well-known example of a robot control architecture. Our goal is to show that both the definition of the robot control architecture and its supporting tools fit well into the typical workflow of model-driven engineering development. Comment: Presented at DSLRob 2011 (arXiv:cs/1212.3308)
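The subsumption architecture mentioned above layers behaviors so that a higher layer can override the ones below it. As a rough, hand-written illustration of that run-time behavior (not the code generated in the paper), the sketch below arbitrates between layers by fixed priority: the highest-priority layer that fires suppresses every layer beneath it. This priority arbiter is a common simplification of Brooks-style suppression and inhibition wiring; the types `Sensors`, `Command`, `Layer` and the function `arbitrate` are hypothetical.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sensor snapshot and actuator command.
struct Sensors { double frontDistance; bool bumped; };
struct Command { std::string name; double speed; };

// A behavior layer proposes a command, or std::nullopt when it is not triggered.
using Layer = std::function<std::optional<Command>(const Sensors&)>;

// Layers are ordered from highest to lowest priority. The first layer that
// fires suppresses all layers below it; if none fires, `idle` is returned.
Command arbitrate(const std::vector<Layer>& layers, const Sensors& s, Command idle) {
    for (const auto& layer : layers) {
        if (auto cmd = layer(s)) return *cmd;
    }
    return idle;
}
```

A model-driven toolchain would generate the layer list and its wiring from the meta-model instead of having it written by hand, which is the workflow the abstract describes.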

arXiv.org e-Print Archive

CiteSeerX

Explore Bristol Research

The future of computing beyond Moore's Law.

Author: Shalf John
Publication venue: eScholarship, University of California
Publication date: 01/03/2020
Field of study

Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements, and how it affects options available to continue scaling of successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
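As a rough worked illustration of the doubling described above, assume an idealized model with a constant doubling period T of about 2 years (real scaling only approximated this). Sustaining that doubling over the 50 years mentioned in the abstract compounds to a factor of roughly 3.4 × 10^7:

```latex
P(t) \approx P_0 \cdot 2^{t/T}, \qquad T \approx 2\ \text{years}
\frac{P(50\ \text{years})}{P_0} \approx 2^{50/2} = 2^{25} = 33{,}554{,}432 \approx 3.4 \times 10^{7}
```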

Ezid

eScholarship - University of California

Domain-oriented architecture design for production control software

Author: Engmann Rolf
Weg Rob van de
Wieringa Roel
Publication venue: Springer Verlag
Publication date: 01/01/1998
Field of study

In this paper, we present domain-oriented architectural design heuristics for production control software. Our approach is based upon the following premises. First, software design, like all other forms of design, consists of the reduction of uncertainty about a final product by making design decisions. These decisions should, as much as possible, be based upon information that is certain, either because it represents laws of nature or because it represents previously made design decisions. An important class of information concerns the domain of the software. The domain of control software is the part of the world monitored and controlled by the software; it is the larger system into which the software is embedded. The software engineer should exploit system-level domain knowledge in order to make software design decisions. Second, in the case of production control software, using system-level knowledge is not only justified, it is also imposed on the software engineer by the necessity to cooperate with hardware engineers. These engineers represent their designs by means of Process and Instrumentation Diagrams (PIDs) and Input-Output (IO) lists. They do not want to spend time, nor do they see the need, to duplicate the information represented by these diagrams by means of diagrams from software engineering methods. Such duplication would in any case be an occasion to introduce errors of omission (information lost during the translation process) or commission (misinterpretation, or misguided but invisible design decisions made during the translation). We think it is up to the software engineer to adapt his or her notations to those of the system engineers he or she must work with. Third, work in patterns and software architectures started from the programming-language level and is now moving...

CiteSeerX

University of Twente Research Information

Design of multimedia processor based on metric computation

Author: Balasa
Berekovic
Jean Luc Philippe
Jean Philippe Diguet
Mohamed Abid
Nader Ben Amor
Suzuki
Wuytack
Yannick Le Moullec
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005
Field of study

Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of these media applications place heavy demands on processor performance, while cost, power consumption and design time must remain low. A fast and efficient strategy for meeting these challenges is to upgrade a low-cost general-purpose processor core. This approach is based on customizing a general RISC processor core according to the requirements of the target multimedia application. Thus, if the extra cost is justified, the general-purpose processor (GPP) core can be extended with instruction-level coprocessors, coarse-grain dedicated hardware, ad hoc memories or additional GPP cores. In this way, the final design solution is tailored to the application requirements. The proposed approach consists of three main steps: the first is the analysis of the targeted application using efficient metrics; the second is the selection of the appropriate architecture template according to the results and recommendations of the first step; the third is the architecture generation. The approach is evaluated on various image and video algorithms, showing its feasibility.

arXiv.org e-Print Archive

Crossref

HAL-Université de Bretagne Occidentale

VBN

Instruction-Level Abstraction (ILA): A Uniform Specification for System-on-Chip (SoC) Verification

Author: Gupta Aarti
Huang Bo-Yuan
Malik Sharad
Subramanyan Pramod
Vizel Yakir
Zhang Hongce
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/06/2018
Field of study

Modern Systems-on-Chip (SoC) designs are increasingly heterogeneous and contain specialized semi-programmable accelerators in addition to programmable processors. In contrast to the pre-accelerator era, when the ISA played an important role in verification by enabling a clean separation of concerns between software and hardware, verification of these "accelerator-rich" SoCs presents new challenges. From the perspective of hardware designers, there is a lack of a common framework for the formal functional specification of accelerator behavior. From the perspective of software developers, there exists no unified framework for reasoning about software/hardware interactions of programs that interact with accelerators. This paper addresses these challenges by providing a formal specification and high-level abstraction for accelerator functional behavior. It formalizes the concept of an Instruction-Level Abstraction (ILA), developed informally in our previous work, and shows its application in modeling and verification of accelerators. This formal ILA extends the familiar notion of instructions to accelerators and provides a uniform, modular, and hierarchical abstraction for modeling software-visible behavior of both accelerators and programmable processors. We demonstrate the applicability of the ILA through several case studies of accelerators (for image processing, machine learning, and cryptography), and a general-purpose processor (RISC-V). We show how the ILA model facilitates equivalence checking between two ILAs, and between an ILA and its hardware finite-state machine (FSM) implementation. Further, this equivalence checking supports accelerator upgrades using the notion of ILA compatibility, similar to processor upgrades using ISA compatibility. Comment: 24 pages, 3 figures, 3 tables
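To make the idea of an instruction-level abstraction concrete, the toy C++ model below treats an accelerator's software-visible behavior as a set of instructions, each with a decode condition and a state-update function, and then checks two such models for equivalence. This is a hypothetical illustration, not the paper's ILA framework: the names `State`, `Input`, `Instruction`, `step` and `equivalent` are invented, and the check enumerates a tiny state and input space exhaustively rather than using the formal techniques the abstract refers to.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy accelerator: one 8-bit accumulator as its architectural state.
struct State { std::uint8_t acc; };
// Software-visible command: an opcode and an operand (e.g. written via MMIO).
struct Input { std::uint8_t opcode; std::uint8_t operand; };

// One ILA-style "instruction": when it decodes, and how it updates the state.
struct Instruction {
    std::string name;
    std::function<bool(const State&, const Input&)> decode;
    std::function<State(const State&, const Input&)> update;
};
using Ila = std::vector<Instruction>;

// Apply the first instruction whose decode condition holds;
// the state is unchanged if no instruction decodes.
State step(const Ila& ila, const State& s, const Input& in) {
    for (const auto& insn : ila) {
        if (insn.decode(s, in)) return insn.update(s, in);
    }
    return s;
}

// Exhaustive check over every accumulator value, opcodes 0..3 and every
// operand. Only feasible for toy models; real checks use formal tools.
bool equivalent(const Ila& a, const Ila& b) {
    for (int acc = 0; acc < 256; ++acc)
        for (int op = 0; op < 4; ++op)
            for (int v = 0; v < 256; ++v) {
                State s{static_cast<std::uint8_t>(acc)};
                Input in{static_cast<std::uint8_t>(op), static_cast<std::uint8_t>(v)};
                if (step(a, s, in).acc != step(b, s, in).acc) return false;
            }
    return true;
}
```

Two models that differ internally (for example, clearing the accumulator as `acc - acc` instead of writing zero) compare as equivalent, while one that writes the operand on a clear does not. This mirrors, at toy scale, the ILA-compatibility idea of checking an upgraded accelerator against its specification.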

arXiv.org e-Print Archive

Princeton University Open Access Repository


A component-based product line architecture for workflow management systems

Author: Barroca Leonor
de Oliveira Junior Edson Alves
de Souza Gimenes Itana Maria
Lazilha Fabrício Ricardo
Publication venue
Publication date: 01/12/2004
Field of study

This paper presents a component-based product line for workflow management systems. The process followed to design the product line was based on the Catalysis method. Extensions were made to represent variability across the process. The domain of workflow management systems has been shown to be appropriate for the product line approach, as a standard architecture and models have been established by a regulatory board, the Workflow Management Coalition. In addition, there is a demand for similar workflow management systems with some different features. The product line architecture was evaluated with Rapide simulation tools. The evaluation was based on selected scenarios, thus avoiding implementation issues. The strategy used to populate the architecture and to experiment with the product line is shown. In particular, the design of the workflow execution manager component is described.

Open Research Online (The Open University)