
    Custom-Enabled System Architectures for High End Computing

    The US Federal Government has convened a major committee to determine future directions for government-sponsored high end computing system acquisitions and enabling research. The High End Computing Revitalization Task Force (HECRTF) was inaugurated in 2003, involving all Federal agencies for which high end computing is critical to meeting mission goals. As part of the HECRTF agenda, a multi-day, community-wide workshop was conducted involving experts from academia, industry, and the national laboratories and centers to provide the broadest perspective on important issues within the HECRTF purview. Among the most critical issues in establishing future directions are the relative merits of commodity-based systems, such as clusters and MPPs, versus custom system architecture strategies. This paper presents a perspective on the importance and value of the custom architecture approach in meeting future US requirements in supercomputing. Its contents reflect the ideas of the participants of the working group chartered to explore custom-enabled system architectures for high end computing. As in any such consensus presentation, while this paper captures the key ideas and tradeoffs, it does not exactly match the viewpoint of any single contributor, and there remains much room for constructive disagreement and refinement of the essential conclusions.

    PiCo: A Domain-Specific Language for Data Analytics Pipelines

    In the world of Big Data analytics, there is a series of tools aiming at simplifying the programming of applications to be executed on clusters. Although each tool claims to provide better programming, data, and execution models (for which only informal, and often confusing, semantics is generally provided), all share a common underlying model, namely the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects of Big Data analytics tools from a high-level perspective. This analysis can be considered a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low-level with high-level aspects, as often happens in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we think that a clearer and simpler semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language, sitting on top of a stack of layers that forms a prototypical framework for Big Data analytics. The contribution of this thesis is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As a result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit at each level. Second, we propose a programming environment based on this layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG-composition of processing elements. The model is intended to give the user a unique interface for both stream and batch processing, hiding data management completely and focusing only on operations, which are represented by Pipeline stages. Our DSL is built on top of the FastFlow library, exploiting both shared-memory and distributed parallelism, and implemented in C++11/14 with the aim of porting C++ into the Big Data world.
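
    The abstract's central abstraction, a Pipeline as a DAG-composition of processing stages, can be illustrated with a small self-contained toy in C++11. The sketch below is not the actual PiCo API (the names Pipeline, Stage, add, and run are hypothetical); it only shows, under those assumptions, how a builder-style composition of stages can hide data management behind a single run interface for a whole pipeline.

        #include <cctype>
        #include <functional>
        #include <iostream>
        #include <string>
        #include <vector>

        using Collection = std::vector<std::string>;
        // A stage is any transformation from one collection to another.
        using Stage = std::function<Collection(const Collection&)>;

        class Pipeline {
            std::vector<Stage> stages_;   // a linear DAG: each stage feeds the next
        public:
            // Compose a new stage onto the pipeline (builder style).
            Pipeline& add(Stage s) { stages_.push_back(std::move(s)); return *this; }
            // Run the whole composition on an input collection.
            Collection run(Collection data) const {
                for (const auto& s : stages_) data = s(data);
                return data;
            }
        };

        int main() {
            Pipeline p;
            p.add([](const Collection& in) {              // "map" stage: upper-case words
                 Collection out;
                 for (auto w : in) {
                     for (auto& c : w) c = std::toupper(static_cast<unsigned char>(c));
                     out.push_back(w);
                 }
                 return out;
             })
             .add([](const Collection& in) {              // "filter" stage: keep long words
                 Collection out;
                 for (const auto& w : in)
                     if (w.size() > 3) out.push_back(w);
                 return out;
             });

            for (const auto& w : p.run({"big", "data", "pipelines"}))
                std::cout << w << "\n";                   // prints DATA and PIPELINES
        }

    The same composition could in principle be fed from a file or from an unbounded stream, which is the unified batch/stream view the thesis argues for.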

    Autonomic behavioural framework for structural parallelism over heterogeneous multi-core systems.

    With the continuous advancement of hardware technologies, significant research has been devoted to the design and development of high-level parallel programming models that allow programmers to exploit the latest developments in heterogeneous multi-core/many-core architectures. Structural programming paradigms propose a viable solution for efficiently programming modern heterogeneous multi-core architectures equipped with one or more programmable Graphics Processing Units (GPUs). Applying structured programming paradigms, it is possible to subdivide a system into building blocks (modules, skids, or components) that can be created independently and then used in different systems to derive multiple functionalities. Exploiting such systematic divisions, it is possible to address extra-functional features such as application performance, portability, and resource utilisation at the component level in heterogeneous multi-core architectures. While the computing function of a building block can vary for different applications, the behaviour (semantics) of the block remains intact. Therefore, by understanding the behaviour of building blocks and their structural compositions in parallel patterns, the process of constructing and coordinating a structured application can be automated. In this thesis we propose the Structural Composition and Interaction Protocol (SKIP) as a systematic methodology for exploiting the structural programming paradigm (the building-block approach in this case) to construct a structured application and to extract information from, and inject information into, it. Using the SKIP methodology, we have designed and developed the Performance Enhancement Infrastructure (PEI), a SKIP-compliant autonomic behavioural framework that automatically coordinates structured parallel applications based on the extracted extra-functional properties related to the parallel computation patterns. We have used 15 different PEI-based applications (from large-scale applications with heavy input workloads that take hours to execute, down to small-scale applications that take seconds) to evaluate PEI in terms of overhead and performance improvement. The experiments were carried out on 3 different heterogeneous (CPU/GPU) multi-core architectures, including one cluster with 4 symmetric nodes and one GPU per node, and 2 single machines with one GPU each. Our results demonstrate that, with less than 3% overhead, PEI can achieve up to one order of magnitude speed-up when used to enhance application performance.
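
    To make the separation between a block's computing function and its behaviour concrete, here is a minimal C++ sketch. It is not PEI or SKIP code (FarmBlock, set_workers, and run are invented names); it is a toy rendering of the idea that a parallel-pattern block keeps its semantics fixed while exposing an extra-functional knob, the worker count, that an autonomic manager could tune at run time.

        #include <functional>
        #include <iostream>
        #include <thread>
        #include <vector>

        // The computing function varies per application; the block's behaviour
        // (apply it to every task in parallel) stays intact.
        template <typename In, typename Out>
        class FarmBlock {
            std::function<Out(In)> worker_;   // computing function: plugged in by the user
            unsigned nworkers_;               // extra-functional knob: tunable from outside
        public:
            FarmBlock(std::function<Out(In)> f, unsigned n)
                : worker_(std::move(f)), nworkers_(n) {}

            void set_workers(unsigned n) { nworkers_ = n; }   // autonomic adaptation hook

            std::vector<Out> run(const std::vector<In>& tasks) const {
                std::vector<Out> results(tasks.size());
                std::vector<std::thread> pool;
                for (unsigned w = 0; w < nworkers_; ++w)
                    pool.emplace_back([&, w] {                // static round-robin split
                        for (std::size_t i = w; i < tasks.size(); i += nworkers_)
                            results[i] = worker_(tasks[i]);
                    });
                for (auto& t : pool) t.join();
                return results;
            }
        };

        int main() {
            FarmBlock<int, int> farm([](int x) { return x * x; }, 4);
            for (int r : farm.run({1, 2, 3, 4, 5})) std::cout << r << ' ';  // 1 4 9 16 25
        }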

    LayStream: composing standard gossip protocols for live video streaming

    Gossip-based live streaming is a popular topic, as attested by the vast literature on the subject. Despite the particular merits of each proposal, all must implement and deal with common challenges such as membership management, topology construction, and video packet dissemination. Well-principled gossip-based protocols have been proposed in the literature for each of these aspects. Our goal is to assess the feasibility of building a live streaming system, LayStream, as a composition of these existing protocols, to deploy the resulting system on real testbeds, and to report on lessons learned in the process. Unlike previous evaluations conducted by simulation and considering each protocol independently, we use real deployments. We evaluate the protocols both independently and as a layered composition, and unearth specific problems and challenges associated with deployment and composition. We discuss and present solutions for these, such as a novel topology construction mechanism able to cope with the specificities of a large-scale and delay-sensitive environment, as well as with requirements from the upper layer. Our implementation and data are openly available to support experimental reproducibility.
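
    The three layers named in the abstract can be pictured as narrow interfaces stacked on one another. The C++ sketch below is purely illustrative (the class names and the fanout of 2 are assumptions, not LayStream code): the membership layer exposes peer sampling, the topology layer selects neighbours from that view, and the dissemination layer pushes packets only to the neighbours it is given, so each protocol can be developed and swapped independently.

        #include <algorithm>
        #include <iostream>
        #include <string>
        #include <vector>

        using PeerId = std::string;

        // Layer 1: membership management maintains a partial view of the overlay.
        class Membership {
            std::vector<PeerId> view_{"A", "B", "C", "D"};    // toy static view
        public:
            std::vector<PeerId> sample(std::size_t n) const {
                auto k = std::min(n, view_.size());
                return std::vector<PeerId>(view_.begin(), view_.begin() + k);
            }
        };

        // Layer 2: topology construction selects neighbours from the view
        // (a real system would favour low-delay peers here).
        class Topology {
            const Membership& membership_;
        public:
            explicit Topology(const Membership& m) : membership_(m) {}
            std::vector<PeerId> neighbours() const { return membership_.sample(2); }
        };

        // Layer 3: dissemination pushes video packets to the chosen neighbours.
        class Dissemination {
            const Topology& topology_;
        public:
            explicit Dissemination(const Topology& t) : topology_(t) {}
            void push(const std::string& packet) const {
                for (const auto& peer : topology_.neighbours())
                    std::cout << "send " << packet << " -> " << peer << "\n";
            }
        };

        int main() {
            Membership m;
            Topology t(m);
            Dissemination d(t);
            d.push("chunk#42");   // each layer only talks to the one below it
        }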

    The AXIOM software layers

    The AXIOM project aims at developing a heterogeneous computing board (SMP-FPGA); this paper explains the software layers developed within the project, in which OmpSs provides an easy way to execute heterogeneous codes on multiple cores. People and objects will soon share the same digital network for information exchange, in what has been called the age of cyber-physical systems. The general expectation is that people and systems will interact in real time. This puts pressure on systems design to support increasing demands for computational power while keeping a low power envelope. Additionally, modular scaling and easy programmability are also important to ensure that these systems become widespread. This whole set of expectations imposes scientific and technological challenges that need to be properly addressed. The AXIOM project (Agile, eXtensible, fast I/O Module) researches new hardware/software architectures for cyber-physical systems to meet these expectations. The technical approach aims at solving the fundamental problems that stand in the way of easy programmability of heterogeneous multi-core, multi-board systems. AXIOM proposes the use of the task-based OmpSs programming model, leveraging low-level communication interfaces provided by the hardware. Modular scalability will be made possible by a fast interconnect embedded into each module. To this end, an innovative ARM- and FPGA-based board will be designed, with enhanced capabilities for interfacing with the physical world. Its effectiveness will be demonstrated in key scenarios such as smart video-surveillance and smart living/home (domotics).
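
    Since the abstract leans on the task-based OmpSs programming model, a tiny sketch may help. The fragment below follows the general OmpSs style of annotating a function as a task with in/out data dependences; the exact clause spellings are illustrative assumptions rather than code from the AXIOM project, and a plain C++ compiler simply ignores the pragmas and runs the code sequentially.

        #include <cstdio>

        // Under an OmpSs runtime, this function is spawned as a task whose
        // dependences on a and b drive the scheduling; in an AXIOM-style
        // system such a kernel is a candidate for offloading to the FPGA.
        #pragma omp task in(a[0;n]) out(b[0;n])
        void scale(const float* a, float* b, int n, float k) {
            for (int i = 0; i < n; ++i)
                b[i] = k * a[i];
        }

        int main() {
            float a[4] = {1, 2, 3, 4}, b[4];
            scale(a, b, 4, 2.0f);             // spawned as a task by the runtime
            #pragma omp taskwait              // wait for outstanding tasks
            std::printf("%g\n", b[3]);        // prints 8
            return 0;
        }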

    The Morphological Computation Principles as a New Paradigm for Robotic Design

    A theory, by definition, is a generalization of observations of some phenomenon, and a principle is a law or rule that should be followed as a guideline. Their formalization is a creative process that proceeds through specific, well-attested steps. The following sections reproduce this logical flow by presenting the principle of Morphological Computation as a timeline: first, observations of the phenomenon in nature are reported in relation to some recent theories; next, these are linked with current applications in artificial systems; finally, further applications, challenges, and objectives project the principle into future scenarios.