15,800 research outputs found
TTC: A Tensor Transposition Compiler for Multiple Architectures
We consider the problem of transposing tensors of arbitrary dimension and
describe TTC, an open source domain-specific parallel compiler. TTC generates
optimized parallel C++/CUDA C code that achieves a significant fraction of the
system's peak memory bandwidth. TTC exhibits high performance across multiple
architectures, including modern AVX-based systems (e.g.,~Intel Haswell, AMD
Steamroller), Intel's Knights Corner as well as different CUDA-based GPUs such
as NVIDIA's Kepler and Maxwell architectures. We report speedups of TTC over a
meaningful baseline implementation generated by external C++ compilers; the
results suggest that a domain-specific compiler can outperform its general
purpose counterpart significantly: For instance, comparing with Intel's latest
C++ compiler on the Haswell and Knights Corner architecture, TTC yields
speedups of up to and , respectively. We also showcase
TTC's support for multiple leading dimensions, making it a suitable candidate
for the generation of performance-critical packing functions that are at the
core of the ubiquitous BLAS 3 routines
A Survey on Handover Management in Mobility Architectures
This work presents a comprehensive and structured taxonomy of available
techniques for managing the handover process in mobility architectures.
Representative works from the existing literature have been divided into
appropriate categories, based on their ability to support horizontal handovers,
vertical handovers and multihoming. We describe approaches designed to work on
the current Internet (i.e. IPv4-based networks), as well as those that have
been devised for the "future" Internet (e.g. IPv6-based networks and
extensions). Quantitative measures and qualitative indicators are also
presented and used to evaluate and compare the examined approaches. This
critical review provides some valuable guidelines and suggestions for designing
and developing mobility architectures, including some practical expedients
(e.g. those required in the current Internet environment), aimed to cope with
the presence of NAT/firewalls and to provide support to legacy systems and
several communication protocols working at the application layer
Model-driven engineering approach to design and implementation of robot control system
In this paper we apply a model-driven engineering approach to designing
domain-specific solutions for robot control system development. We present a
case study of the complete process, including identification of the domain
meta-model, graphical notation definition and source code generation for
subsumption architecture -- a well-known example of robot control architecture.
Our goal is to show that both the definition of the robot-control architecture
and its supporting tools fits well into the typical workflow of model-driven
engineering development.Comment: Presented at DSLRob 2011 (arXiv:cs/1212.3308
The future of computing beyond Moore's Law.
Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements, and how it affects options available to continue scaling of successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'
Domain-oriented architecture design for production control software
this paper, we present domain-oriented architectural design heuristics for production control software. Our approach is based upon the following premisses. First, software design, like all other forms of design, consists of the reduction of uncertainty about a final product by making design decisions. These decisions should as much as possible be based upon information that is certain, either because they represent laws of nature or because they represent previously made design decisions. An import class of information concerns the domain of the software. The domain of control software is the part of the world monitored and controlled by the software; it is the larger system into which the software is embedded. The software engineer should exploit system-level domain knowledge in order to make software design decisions. Second, in the case of production control software, using system-level knowledge is not only justified, it is also imposed on the software engineer by the necessity to cooperate with hardware engineers. These represent their designs by means of Process and Instrumentation Diagrams (PIDs) and Input-Output (IO) lists. They do not want to spend time, nor do they see the need, to duplicate the information represented by these diagrams by means of diagrams from software engineering methods. Such a duplication would be an occasion to introduce errors of omission (information lost during the translation process) or commission (misinterpretation, misguided but invisible design decisions made during the translation) anyway. We think it is up to the software engineer to adapt his or her notations to those of the system engineers he or she must work with. Third, work in patterns and software architectures started from the programminglanguage level and is now moving..
Design of multimedia processor based on metric computation
Media-processing applications, such as signal processing, 2D and 3D graphics
rendering, and image compression, are the dominant workloads in many embedded
systems today. The real-time constraints of those media applications have
taxing demands on today's processor performances with low cost, low power and
reduced design delay. To satisfy those challenges, a fast and efficient
strategy consists in upgrading a low cost general purpose processor core. This
approach is based on the personalization of a general RISC processor core
according the target multimedia application requirements. Thus, if the extra
cost is justified, the general purpose processor GPP core can be enforced with
instruction level coprocessors, coarse grain dedicated hardware, ad hoc
memories or new GPP cores. In this way the final design solution is tailored to
the application requirements. The proposed approach is based on three main
steps: the first one is the analysis of the targeted application using
efficient metrics. The second step is the selection of the appropriate
architecture template according to the first step results and recommendations.
The third step is the architecture generation. This approach is experimented
using various image and video algorithms showing its feasibility
Instruction-Level Abstraction (ILA): A Uniform Specification for System-on-Chip (SoC) Verification
Modern Systems-on-Chip (SoC) designs are increasingly heterogeneous and
contain specialized semi-programmable accelerators in addition to programmable
processors. In contrast to the pre-accelerator era, when the ISA played an
important role in verification by enabling a clean separation of concerns
between software and hardware, verification of these "accelerator-rich" SoCs
presents new challenges. From the perspective of hardware designers, there is a
lack of a common framework for the formal functional specification of
accelerator behavior. From the perspective of software developers, there exists
no unified framework for reasoning about software/hardware interactions of
programs that interact with accelerators. This paper addresses these challenges
by providing a formal specification and high-level abstraction for accelerator
functional behavior. It formalizes the concept of an Instruction Level
Abstraction (ILA), developed informally in our previous work, and shows its
application in modeling and verification of accelerators. This formal ILA
extends the familiar notion of instructions to accelerators and provides a
uniform, modular, and hierarchical abstraction for modeling software-visible
behavior of both accelerators and programmable processors. We demonstrate the
applicability of the ILA through several case studies of accelerators (for
image processing, machine learning, and cryptography), and a general-purpose
processor (RISC-V). We show how the ILA model facilitates equivalence checking
between two ILAs, and between an ILA and its hardware finite-state machine
(FSM) implementation. Further, this equivalence checking supports accelerator
upgrades using the notion of ILA compatibility, similar to processor upgrades
using ISA compatibility.Comment: 24 pages, 3 figures, 3 table
Recommended from our members
A component-based product line architecture for workflow management systems
This paper presents a component-based product line for workflow management systems. The process followed to design the product line was based on the Catalysis method. Extensions were made to represent variability across the process. The domain of workflow management systems has been shown to be appropriate to the application of the product line approach as there are a standard architecture and models established by a regulatory board, the Workflow Management Coalition. In addition, there is a demand for similar workflow management systems but with some different features. The product line architecture was evaluated with Rapide simulation tools. The evaluation was based on selected scenarios, thus, avoiding implementation issues. The strategy that has been used to populate the architecture and experiment with the product line is shown. In particular, the design of the workflow execution manager component is described
- …