
17 research outputs found

Security challenges and opportunities in adaptive and reconfigurable hardware

Author: Costan Victor Marius
Devadas Srinivas
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/07/2011

We present a novel approach to building hardware support for providing strong security guarantees for computations running in the cloud (shared hardware in massive data centers), while maintaining the high performance and low cost that make cloud computing attractive in the first place. We propose augmenting regular cloud servers with a Trusted Computation Base (TCB) that can securely perform high-performance computations. Our TCB achieves cost savings by spreading functionality across two paired chips. We show that making a Field-Programmable Gate Array (FPGA) a part of the TCB benefits security and performance, and we explore a new method for defending the computation inside the TCB against side-channel attacks. Northrop Grumman Corporation; Quanta Computer (Firm)

CiteSeerX

DSpace@MIT

Crossref

Developer Support Tools for tevent Library

Author: Koňař David
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2013

This thesis provides a description of and tutorial for the tevent library. It introduces the library's design and capabilities, with code examples showing how to work with it appropriately. It also covers a debugger extension, developed alongside the thesis, that makes working with the library more efficient. A comparison with the competing library libevent is included.

Digital library of Brno University of Technology

National Repository of Grey Literature

Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages

Author: Cohen Albert
Drach Nathalie
Drebes Andi
Heydemann Karine
Pop Antoniu
Publication venue: Association for Computing Machinery (ACM)
Publication date: 25/08/2014

We present a joint scheduling and memory allocation algorithm for efficient execution of task-parallel programs on non-uniform memory architecture (NUMA) systems. Task and data placement decisions are based on a static description of the memory hierarchy and on runtime information about intertask communication. Existing locality-aware scheduling strategies for fine-grained tasks have strong limitations: they are specific to some class of machines or applications, they do not handle task dependences, they require manual program annotations, or they rely on fragile profiling schemes. By contrast, our solution makes no assumption on the structure of programs or on the layout of data in memory. Experimental results, based on the OpenStream language, show that locality of accesses to main memory of scientific applications can be increased significantly on a 64-core machine, resulting in a speedup of up to 1.63× compared to a state-of-the-art work-stealing scheduler.

Crossref

INRIA a CCSD electronic archive server

The University of Manchester - Institutional Repository

Challenges in the Design of Cyber-Physical Systems (Desafíos en el diseño de sistemas Ciber-Físicos)

Author: Chandy John C.
Publication venue: Universidad San Buenaventura - USB (Colombia)
Publication date: 01/12/2010

Cyber-Physical Systems (CPS) integrate computation with physical processes. Embedded computers, network monitoring, and the control of physical processes usually involve feedback loops in which physical processes affect computations, and vice versa. This article examines the challenges in designing these systems and raises the question of whether today's computing and networking technologies provide an adequate foundation for them. The conclusion is that improving the design processes for these systems will take more than raising the level of abstraction or verifying, formally or otherwise, designs built on today's abstractions. The social and economic potential of CPS is far greater than has been recognized so far; major investments are being made worldwide to develop the technology, but the challenges are considerable. To realize the full potential of CPS, computing and networking abstractions will have to be rebuilt, and they will have to fully embrace the principles of both physical dynamics and computation.

Directory of Open Access Journals

Universidad de San Buenaventura, sede Bogotá: Editorial Bonaventuriana

Memory-manager/Scheduler Co-design: Optimizing Event-driven Programs to Improve Cache Behavior

Author: Bhatia Sapan
Consel Charles
Lawall Julia
Publication venue: HAL CCSD
Publication date: 10/06/2006

Event-driven programming has emerged as a standard to implement high-performance servers due to its flexibility and low OS overhead. Still, memory access remains a bottleneck. Generic optimization techniques yield only small improvements in the memory access behavior of event-driven servers, as such techniques do not exploit their specific structure and behavior. This paper presents an optimization framework dedicated to event-driven servers, based on a strategy to eliminate data-cache misses. We propose a novel memory manager combined with a tailored scheduling strategy to restrict the working data set of the program to a memory region mapped directly into the data cache. Our approach exploits the flexible scheduling and deterministic execution of event-driven servers. We have applied our framework to industry-standard web servers including TUX and thttpd, as well as to the Squid proxy server and the Cactus QoS framework. Testing TUX and thttpd using a standard HTTP benchmark tool shows that our optimizations applied to the TUX web server reduce L2 data cache misses under heavy load by up to 75% and increase the throughput of the server by up to 38%.

INRIA a CCSD electronic archive server

Mely: Efficient Workstealing for Multicore Event-Driven Systems

Author: Gaud Fabien
Genevès Sylvain
Lachaize Renaud
Lepers Baptiste
Mottet Fabien
Muller Gilles
Quéma Vivien
Publication venue: HAL CCSD
Publication date: 21/01/2010

Many high-performance communicating systems are designed using the event-driven paradigm. As multicore platforms are now pervasive, it becomes crucial for such systems to take advantage of the available hardware parallelism. Event-coloring is a promising approach in this regard. First, it allows programmers to simply and progressively inject support for the safe, parallel execution of multiple event handlers through the use of annotations. Second, it relies on a workstealing algorithm to dynamically balance the execution of event handlers on the available cores. This paper studies the impact of the workstealing algorithm on the overall system performance. We first show that the only existing workstealing algorithm designed for event-coloring runtimes is not always efficient: for instance, it causes a 33% performance degradation on a Web server. We then introduce several enhancements to improve the workstealing behavior. An evaluation using both microbenchmarks and real applications, a Web server and the Secure File Server (SFS), shows that our system consistently outperforms a state-of-the-art runtime (Libasync-smp), with or without workstealing. In particular, in the Web server case, our new workstealing algorithm improves performance by up to 25% compared to Libasync-smp without workstealing and by up to 73% compared to the Libasync-smp workstealing algorithm.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1

ZygOS: Achieving Low Tail Latency for Microsecond-scale Networked Tasks

Author: Barroso L. A.
Belay A.
Bronson N.
Dragojevic A.
Dunkels A.
Eisenbud D. E.
Jeong E.
Kivity A.
Lim H.
Nanavati M.
Nishtala R.
Rizzo L.
Schroeder B.
Soares L.
Stonebraker M.
Yang X.
Yasukata K.
Zeldovich N.
Zhang H.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 06/10/2017

Infoscience - École polytechnique fédérale de Lausanne

Crossref

High Performance Web Servers: A Study In Concurrent Programming Models

Author: Radhakrishnan Srihari
Publication venue: University of Waterloo
Publication date: 14/05/2019

With the advent of commodity large-scale multi-core computers, the performance of software running on these computers has become a challenge to researchers and enterprise developers. While academic research and industrial products have moved in the direction of writing scalable and highly available services using distributed computing, single machine performance remains an active domain, one which is far from saturated. This thesis selects an archetypal software example and workload in this domain, and describes software characteristics affecting performance. The example is highly-parallel web-servers processing a static workload. Particularly, this work examines concurrent programming models in the context of high-performance web-servers across different architectures — threaded (Apache, Go and μKnot), event-driven (Nginx, μServer) and staged (WatPipe) — compared with two static workloads in two different domains. The two workloads are a Zipf distribution of file sizes representing a user session pulling an assortment of many small and a few large files, and a 50KB file representing chunked streaming of a large audio or video file. Significant effort is made to fairly compare eight web-servers by carefully tuning each via their adjustment parameters. Tuning plays a significant role in workload-specific performance. The two domains are no disk I/O (in-memory file set) and medium disk I/O. The domains are created by lowering the amount of RAM available to the web-server from 4GB to 2GB, forcing files to be evicted from the file-system cache. Both domains are also restricted to 4 CPUs. The primary goal of this thesis is to examine fundamental performance differences between threaded and event-driven concurrency models, with particular emphasis on user-level threading models. Additionally, a secondary goal of the work is to examine high-performance software under restricted hardware environments. 
Over-provisioned hardware environments can mask architectural and implementation shortcomings in software – the hypothesis in this work is that restricting resources stresses the application, bringing out important performance characteristics and properties. Experimental results for the given workload show that memory pressure is one of the most significant factors for the degradation of web-server performance, because it forces both the onset and amount of disk I/O. With an ever increasing need to support more content at faster rates, a web-server relies heavily on in-memory caching of files and related content. In fact, personal and small business web-servers are even run on minimal hardware, like the Raspberry Pi, with only 1GB of RAM and a small SD card for the file system. Therefore, understanding behaviour and performance in restricted contexts should be a normal aspect of testing a web server (and other software systems)

University of Waterloo's Institutional Repository

Building fast and secure Web services with OKWS

Author: Krohn Maxwell (Maxwell N.)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2005

Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. 69-74). OKWS is a Web server specialized for secure and fast delivery of dynamic content. It provides Web developers with a small set of tools powerful enough to build complex Web-based systems. Despite its emphasis on security, OKWS shows performance improvements compared to popular systems: when servicing fully dynamic, non-disk-bound database workloads, OKWS's throughput and responsiveness exceed that of Apache 2, Flash and Haboob. Experience with OKWS in a commercial deployment suggests it can reduce hardware and system management costs, while providing security guarantees absent in current systems. In the end, lessons gleaned from the OKWS project provide insight into how operating systems might better facilitate secure application design. By Maxwell Krohn. S.M.

DSpace@MIT

Towards Implicit Parallel Programming for Systems

Author: Ertel Sebastian
Publication venue
Publication date: 30/12/2019

Multi-core processors require a program to be decomposable into independent parts that can execute in parallel in order to scale performance with the number of cores. But parallel programming is hard, especially when the program requires state, which many system programs use for optimization, for example a cache to reduce disk I/O. Most prevalent parallel programming models do not support a notion of state and require the programmer to synchronize state access manually, i.e., outside the realm of an associated optimizing compiler. This prevents the compiler from introducing parallelism automatically and requires the programmer to optimize the program manually. In this dissertation, we propose a programming language/compiler co-design to provide a new programming model for implicit parallel programming with state and a compiler that can optimize the program for parallel execution. We define the notion of a stateful function along with the composition of such functions and their control structures. An example implementation of a highly scalable server shows that stateful functions integrate smoothly into existing programming language concepts, such as object-oriented programming and programming with structs. Our programming model is also highly practical and allows existing code bases to be adapted gradually. As a case study, we implemented a new data processing core for the Hadoop Map/Reduce system to overcome existing performance bottlenecks. Our lambda-calculus-based compiler automatically extracts parallelism without changing the program's semantics. We added further domain-specific, semantics-preserving transformations that reduce I/O calls for microservice programs. The runtime format of a program is a dataflow graph that can be executed in parallel, performs concurrent I/O, and allows for non-blocking live updates.

Technische Universität Dresden: Qucosa