930,868 research outputs found
An initial performance review of software components for a heterogeneous computing platform
The design of embedded systems is a complex activity that involves a lot of
decisions. With high performance demands of present day usage scenarios and
software, they often involve energy hungry state-of-the-art computing units.
While focusing on power consumption of computing units, the physical properties
of software are often ignored. Recently, there has been a growing interest to
quantify and model the physical footprint of software (e.g. consumed power,
generated heat, execution time, etc.), and a component based approach
facilitates methods for describing such properties. Based on these, software
architects can make energy-efficient software design solutions. This paper
presents power consumption and execution time profiling of a component software
that can be allocated on heterogeneous computing units (CPU, GPU, FPGA) of a
tracked robot
GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems
While many of the architectural details of future exascale-class high
performance computer systems are still a matter of intense research, there
appears to be a general consensus that they will be strongly heterogeneous,
featuring "standard" as well as "accelerated" resources. Today, such resources
are available as multicore processors, graphics processing units (GPUs), and
other accelerators such as the Intel Xeon Phi. Any software infrastructure that
claims usefulness for such environments must be able to meet their inherent
challenges: massive multi-level parallelism, topology, asynchronicity, and
abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a
collection of building blocks that targets algorithms dealing with sparse
matrix representations on current and future large-scale systems. It implements
the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel
numerical kernels, intelligent resource management, and truly heterogeneous
parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We
describe the details of its design with respect to the challenges posed by
modern heterogeneous supercomputers and recent algorithmic developments.
Implementation details which are indispensable for achieving high efficiency
are pointed out and their necessity is justified by performance measurements or
predictions based on performance models. The library code and several
applications are available as open source. We also provide instructions on how
to make use of GHOST in existing software packages, together with a case study
which demonstrates the applicability and performance of GHOST as a component
within a larger software stack.Comment: 32 pages, 11 figure
Boosting the Performance of PC-based Software Routers with FPGA-enhanced Network Interface Cards
The research community is devoting increasing attention to software routers based on off-the-shelf hardware and open-source operating systems running on the personalcomputer (PC) architecture. Today's high-end PCs are equipped with peripheral component interconnect (PCI) shared buses enabling them to easily fit into the multi-gigabit-per-second routing segment, for a price much lower than that of commercial routers. However, commercially-available PC network interface cards (NICs) lack programmability, and require not only packets to cross the PCI bus twice, but also to be processed in software by the operating system, strongly reducing the achievable forwarding rate. It is therefore interesting to explore the performance of customizable NICs based on field-programmable gate array (FPGA) logic devices we developed and assess how well they can overcome the limitations of today's commercially-available NIC
Detecting the Onset of Dementia using Context-Oriented Architecture
In the last few years, Aspect Oriented Software De- velopment (AOSD) and Context Oriented Software Development (COSD) have become interesting alternatives for the design and construction of self-adaptive software systems. An analysis of these technologies shows them all to employ the principle of the separation of concerns, Model Driven Architecture (MDA) and Component-based Software Development (CBSD) for building high quality of software systems. In general, the ultimate goal of these technologies is to be able to reduce development costs and effort, while improving the adaptability, and dependability of software systems. COSD, has emerged as a generic devel- opment paradigm towards constructing self-adaptive software by integrating MDA with context-oriented component model. The self-adaptive applications are developed using a Context- Oriented Component-based Applications Model-Driven Architec- ture (COCA-MDA), which generates an Architecture Description language (ADL) presenting the architecture as a components- based software system. COCA-MDA enables the developers to modularise the application based on their context-dependent behaviours, and separate the context-dependent functionality from the context-free functionality of the application. In this article, we wish to study the impact of the decomposition mechanism performed in MDA approaches over the software self-adaptability. We argue that a better and significant advance in software modularity based on context information can increase software adaptability and increase their performance and modi- fiability
An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling
We present a sparse linear system solver that is based on a multifrontal
variant of Gaussian elimination, and exploits low-rank approximation of the
resulting dense frontal matrices. We use hierarchically semiseparable (HSS)
matrices, which have low-rank off-diagonal blocks, to approximate the frontal
matrices. For HSS matrix construction, a randomized sampling algorithm is used
together with interpolative decompositions. The combination of the randomized
compression with a fast ULV HSS factorization leads to a solver with lower
computational complexity than the standard multifrontal method for many
applications, resulting in speedups up to 7 fold for problems in our test
suite. The implementation targets many-core systems by using task parallelism
with dynamic runtime scheduling. Numerical experiments show performance
improvements over state-of-the-art sparse direct solvers. The implementation
achieves high performance and good scalability on a range of modern shared
memory parallel systems, including the Intel Xeon Phi (MIC). The code is part
of a software package called STRUMPACK -- STRUctured Matrices PACKage, which
also has a distributed memory component for dense rank-structured matrices
- âŠ