
    A Unified Operating System for Clouds and Manycore: fos

    Single-chip processors with thousands of cores will be available within the next ten years, and clouds of multicore processors already afford the operating system designer thousands of cores today. Constructing operating systems for manycore and cloud systems poses similar challenges. This work identifies these shared challenges and introduces our solution: a factored operating system (fos) designed to meet the scalability, fault tolerance, variability of demand, and programming challenges of OSes for single-chip thousand-core manycore systems as well as current-day cloud computers. Current monolithic operating systems are not well suited for manycores and clouds: they have taken an evolutionary approach to scaling, such as adding fine-grained locks and redesigning subsystems, but these approaches do not increase scalability quickly enough. fos addresses the OS scalability challenge through a message-passing design and is composed of a collection of Internet-inspired servers. Each operating system service is factored into a set of communicating servers which in aggregate implement a system service. These servers are designed much in the way that distributed Internet services are designed, but provide traditional kernel services instead of Internet services. fos also embraces the elasticity of cloud and manycore platforms by adapting resource utilization to match demand. fos facilitates writing applications across the cloud by providing a single system image across both future 1000+ core manycores and current-day Infrastructure-as-a-Service cloud computers. In contrast, current cloud environments do not provide a single system image; they introduce complexity for the user by requiring different programming models for intra- vs. inter-machine communication, and by requiring the use of non-OS-standard management tools.
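    To make the factored-server idea concrete, here is a minimal Go sketch. All names, the routing scheme, and the use of Go channels in place of fos's messaging layer are illustrative assumptions, not the fos implementation: one service is provided in aggregate by two servers, each owning a shard of state and reachable only by message passing.

        // Sketch: a system service factored into communicating servers,
        // in the spirit of fos. Channels stand in for the messaging layer;
        // every name here is hypothetical.
        package main

        import "fmt"

        // Request is a message carrying a key and a channel for the reply.
        type Request struct {
            Key   string
            Reply chan string
        }

        // server is one member of the fleet implementing, say, a name service;
        // each server owns one shard of the service state.
        func server(id int, inbox <-chan Request, shard map[string]string) {
            for req := range inbox {
                req.Reply <- fmt.Sprintf("server %d: %s is at %s", id, req.Key, shard[req.Key])
            }
        }

        // route hashes a key to one of n servers.
        func route(key string, n int) int {
            h := 0
            for _, b := range []byte(key) {
                h += int(b)
            }
            return h % n
        }

        func main() {
            // Two servers in aggregate implement one service.
            inboxes := []chan Request{make(chan Request), make(chan Request)}
            go server(0, inboxes[0], map[string]string{"/dev/tty": "core 3"})
            go server(1, inboxes[1], map[string]string{"/tmp/log": "core 7"})

            // A client sends a message to the owning server and awaits the reply.
            key := "/tmp/log"
            reply := make(chan string)
            inboxes[route(key, 2)] <- Request{Key: key, Reply: reply}
            fmt.Println(<-reply)
        }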

    Computing in the RAIN: a reliable array of independent nodes

    The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data-storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focused on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper, we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures, 2) fault management techniques based on group membership, and 3) data storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: a highly available video server, a highly available Web server, and a distributed checkpointing system. We also describe a commercial product, Rainwall, built with the RAIN technology.
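    As a concrete illustration of the error-control-coding idea, the Go sketch below implements the simplest such code: one XOR parity block per stripe, from which any single lost node's block can be rebuilt. RAIN's actual codes and its topology and membership machinery are richer; the block contents and node count here are invented for illustration.

        // Sketch: single-parity erasure coding, the simplest computationally
        // efficient error-control code. Not RAIN's actual coding scheme.
        package main

        import "fmt"

        // xorBlocks returns the bytewise XOR of equal-length blocks.
        func xorBlocks(blocks ...[]byte) []byte {
            parity := make([]byte, len(blocks[0]))
            for _, b := range blocks {
                for i, v := range b {
                    parity[i] ^= v
                }
            }
            return parity
        }

        func main() {
            // Data striped across three storage nodes; parity lives on a fourth.
            d0, d1, d2 := []byte("AAAAAAAA"), []byte("BBBBBBBB"), []byte("CCCCCCCC")
            parity := xorBlocks(d0, d1, d2)

            // Node 1 fails: its block is recomputed from the survivors plus parity.
            rebuilt := xorBlocks(d0, d2, parity)
            fmt.Printf("recovered block: %q\n", rebuilt) // "BBBBBBBB"
        }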

    Energy reduction in multiprocessor systems using transactional memory


    A Survey of Research into Mixed Criticality Systems

    This survey covers research into mixed criticality systems published since Vestal's seminal paper in 2007, up until the end of 2016. The survey is organised along the lines of the major research areas within this topic. These include single processor analysis (including fixed priority and EDF scheduling, shared resources, and static and synchronous scheduling), multiprocessor analysis, realistic models, and systems issues. The survey also explores the relationship between research into mixed criticality systems and other topics such as hard and soft time constraints, fault-tolerant scheduling, hierarchical scheduling, cyber-physical systems, probabilistic real-time systems, and industrial safety standards.
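    A small sketch helps fix the task model most of this research builds on. The Go program below encodes Vestal-style per-criticality WCET estimates and the common mode-switch rule (abandon LO-criticality work when a HI task overruns its LO budget). Task names and numbers are invented, and real analyses bound worst-case response times rather than simulating a single window.

        // Sketch: a Vestal-style mixed criticality task set with a LO->HI
        // mode switch. Parameters are illustrative only.
        package main

        import "fmt"

        type Crit int

        const (
            LO Crit = iota
            HI
        )

        type Task struct {
            Name     string
            Crit     Crit
            CLo, CHi int // WCET estimates at LO (optimistic) and HI (pessimistic) assurance
        }

        func main() {
            tasks := []Task{
                {"video", LO, 2, 2},   // LO tasks carry only the optimistic estimate
                {"control", HI, 3, 8}, // HI tasks also carry a certified C(HI)
            }
            observed := map[string]int{"video": 2, "control": 5} // runtimes in one window

            mode := LO
            for _, t := range tasks {
                if t.Crit == HI && observed[t.Name] > t.CLo {
                    mode = HI // a HI task exceeded its LO budget: switch modes
                }
            }
            for _, t := range tasks {
                if mode == HI && t.Crit == LO {
                    fmt.Printf("%s: abandoned after mode switch\n", t.Name)
                } else {
                    fmt.Printf("%s: continues (budget %d)\n", t.Name, t.CHi)
                }
            }
        }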

    Simulation-Based Performability Analysis of Multiprocessor Systems

    The primary focus in the analysis of multiprocessor systems has traditionally been on their performance. However, their large number of components, complex network topologies, and sophisticated system software can make them very unreliable. The dependability of a computing system ought to be considered at an early stage of its development, in order to influence the system architecture and to achieve the best performance with high dependability. In this paper, a simulation-based method for the combined performance and dependability analysis of fault-tolerant multiprocessor systems is presented, which provides meaningful results as early as the design phase.
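    The flavour of such an analysis can be sketched in a few lines: sample random failure configurations, map each to a performance level, and average the two concerns into a single performability figure. The Go sketch below uses an invented failure probability and a deliberately crude performance model; it is not the method of the paper.

        // Sketch: Monte Carlo performability estimation for a multiprocessor.
        // Failure rates and the performance model are illustrative assumptions.
        package main

        import (
            "fmt"
            "math/rand"
        )

        func main() {
            const (
                processors = 16
                pFail      = 0.05 // per-processor failure probability over the mission
                trials     = 100000
            )
            rng := rand.New(rand.NewSource(1))

            total := 0.0
            for t := 0; t < trials; t++ {
                alive := 0
                for p := 0; p < processors; p++ {
                    if rng.Float64() >= pFail {
                        alive++
                    }
                }
                // Crude performance model: throughput scales with surviving
                // processors, but the system is down below a quorum of 8.
                if alive >= 8 {
                    total += float64(alive) / processors
                }
            }
            fmt.Printf("expected relative throughput: %.4f\n", total/trials)
        }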

    Quest-V: A Virtualized Multikernel for High-Confidence Systems

    This paper outlines the design of Quest-V, which is implemented as a collection of separate kernels operating together as a distributed system on a chip. Quest-V uses virtualization techniques to isolate kernels and prevent local faults from affecting remote kernels. This leads to a high-confidence multikernel approach, in which failures of system subcomponents do not render the entire system inoperable. A virtual machine monitor for each kernel keeps track of shadow page table mappings that control immutable memory access capabilities. This ensures a level of security and fault tolerance in situations where a service in one kernel fails, or is corrupted by a malicious attack. Communication between kernels is supported using shared memory regions for message passing. Similarly, device driver data structures are shareable between kernels, avoiding the need for complex I/O virtualization or communication with a dedicated kernel responsible for I/O. In Quest-V, device interrupts are delivered directly to a kernel, rather than via a monitor that determines the destination. Apart from bootstrapping each kernel, handling faults, and managing shadow page tables, the monitors are not needed. This differs from conventional virtual machine systems, in which a central monitor, or hypervisor, is responsible for scheduling and managing host resources amongst a set of guest kernels. In this paper we show how Quest-V can implement novel fault isolation and recovery techniques that are not possible with conventional systems. We also show that the cost of using virtualization to isolate system services does not add undue overhead to overall system performance.
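    The shared-memory message passing between kernels can be illustrated with the classic single-producer/single-consumer ring buffer such channels are commonly built on. In the Go sketch below, goroutines stand in for kernels and an in-process array stands in for the shared region; this shows the shape of the mechanism, not Quest-V's actual implementation.

        // Sketch: a lock-free SPSC ring buffer of the kind a shared-memory
        // inter-kernel channel could use. All details are illustrative.
        package main

        import (
            "fmt"
            "sync/atomic"
        )

        const slots = 8

        type Ring struct {
            buf        [slots]uint64
            head, tail atomic.Uint64 // head: next write, tail: next read
        }

        func (r *Ring) Send(msg uint64) bool {
            h, t := r.head.Load(), r.tail.Load()
            if h-t == slots { // full
                return false
            }
            r.buf[h%slots] = msg
            r.head.Store(h + 1) // publish only after the slot is written
            return true
        }

        func (r *Ring) Recv() (uint64, bool) {
            h, t := r.head.Load(), r.tail.Load()
            if t == h { // empty
                return 0, false
            }
            msg := r.buf[t%slots]
            r.tail.Store(t + 1)
            return msg, true
        }

        func main() {
            var r Ring
            r.Send(42) // "kernel A" enqueues a message into the shared region
            if msg, ok := r.Recv(); ok {
                fmt.Println("kernel B received:", msg)
            }
        }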

    CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores

    Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but they fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, where data is distributed and replicated automatically by the principle of consistent hashing. This paper introduces consistent quorums as a solution for achieving atomic consistency. We present the design and implementation of CATS, a distributed key-value store that uses consistent quorums to guarantee linearizability and partition tolerance under such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing: key properties for modern cloud storage middleware. Our system shows that this consistency can be achieved with practical performance and modest throughput overhead (5%) for read-intensive workloads.
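    The quorum-intersection principle that consistent quorums build on is easy to sketch: if reads and writes each contact a majority of replicas, every read quorum overlaps every write quorum and therefore observes the latest version. The Go sketch below shows that textbook scheme only; CATS's consistent quorums additionally cope with the dynamic membership the abstract describes.

        // Sketch: majority-quorum reads and writes over versioned replicas.
        // Illustrates quorum intersection, not CATS's full protocol.
        package main

        import "fmt"

        type Versioned struct {
            Version int
            Value   string
        }

        type Replica struct{ data Versioned }

        // writeQuorum installs the value on a majority of replicas (here the first ones).
        func writeQuorum(replicas []*Replica, v Versioned) {
            for i := 0; i < len(replicas)/2+1; i++ {
                replicas[i].data = v
            }
        }

        // readQuorum reads a different majority (here the last ones) and keeps
        // the highest version; the two majorities necessarily overlap.
        func readQuorum(replicas []*Replica) Versioned {
            best := Versioned{Version: -1}
            for i := len(replicas) / 2; i < len(replicas); i++ {
                if replicas[i].data.Version > best.Version {
                    best = replicas[i].data
                }
            }
            return best
        }

        func main() {
            replicas := make([]*Replica, 5)
            for i := range replicas {
                replicas[i] = &Replica{}
            }
            writeQuorum(replicas, Versioned{Version: 1, Value: "x=7"})
            fmt.Println(readQuorum(replicas)) // sees {1 x=7} via the overlapping replica
        }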

    Dynamic server selection in a multithreaded network computing environment

    Research has been conducted at the Iowa State University Center for Nondestructive Evaluation (CNDE) to create a framework in which existing numerical modeling programs can be converted to execute in a network computing environment. This work includes the development of an extensible architecture that accommodates the timely integration of new processing capabilities and requirements. The research was motivated by needs throughout the CNDE to reduce the run times associated with current and future modeling programs.
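    A minimal version of dynamic server selection can be sketched as a dispatcher that sends each model run to the server currently reporting the lowest load. Everything in the Go sketch below (server names, the load metric, the fixed adjustment) is invented for illustration; a real dispatcher would measure load or response times at run time.

        // Sketch: dispatch work to the least-loaded server. Illustrative only.
        package main

        import "fmt"

        type Server struct {
            Host string
            Load float64 // e.g. pending work normalized by measured node speed
        }

        // pick returns the index of the least-loaded server.
        func pick(servers []Server) int {
            best := 0
            for i, s := range servers {
                if s.Load < servers[best].Load {
                    best = i
                }
            }
            return best
        }

        func main() {
            servers := []Server{{"nodeA", 0.8}, {"nodeB", 0.2}, {"nodeC", 0.5}}
            i := pick(servers)
            servers[i].Load += 0.3 // account for the work just assigned
            fmt.Println("dispatch model run to", servers[i].Host)
        }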