Search CORE

13,232 research outputs found

Integration of tools for the Design and Assessment of High-Performance, Highly Reliable Computing Systems (DAHPHRS), phase 1

Author: Baker R.
Frank G.
Gray G.
Scheper C.
Yalamanchili S.
Publication venue
Publication date
Field of study

Systems for Space Defense Initiative (SDI) space applications typically require both high performance and very high reliability. These requirements present the systems engineer evaluating such systems with the extremely difficult problem of conducting performance and reliability trade-offs over large design spaces. A controlled development process supported by appropriate automated tools must be used to assure that the system will meet design objectives. This report describes an investigation of methods, tools, and techniques necessary to support performance and reliability modeling for SDI systems development. Models of the JPL Hypercubes, the Encore Multimax, and the C.S. Draper Lab Fault-Tolerant Parallel Processor (FTPP) parallel-computing architectures using candidate SDI weapons-to-target assignment algorithms as workloads were built and analyzed as a means of identifying the necessary system models, how the models interact, and what experiments and analyses should be performed. As a result of this effort, weaknesses in the existing methods and tools were revealed and capabilities that will be required for both individual tools and an integrated toolset were identified

NASA Technical Reports Server

Advanced flight control system study

Author: Klafin J. F.
Mcgough J.
Moses K.
Publication venue
Publication date
Field of study

The architecture, requirements, and system elements of an ultrareliable, advanced flight control system are described. The basic criteria are functional reliability of 10 to the minus 10 power/hour of flight and only 6 month scheduled maintenance. A distributed system architecture is described, including a multiplexed communication system, reliable bus controller, the use of skewed sensor arrays, and actuator interfaces. Test bed and flight evaluation program are proposed

NASA Technical Reports Server

Real-time and fault tolerance in distributed control software

Author: Broenink J.F.
Orlic B.
Publication venue: IOS Press
Publication date: 01/01/2003
Field of study

Closed loop control systems typically contain multitude of spatially distributed sensors and actuators operated simultaneously. So those systems are parallel and distributed in their essence. But mapping this parallelism onto the given distributed hardware architecture, brings in some additional requirements: safe multithreading, optimal process allocation, real-time scheduling of bus and network resources. Nowadays, fault tolerance methods and fast even online reconfiguration are becoming increasingly important. All those often conflicting requirements, make design and implementation of real-time distributed control systems an extremely difficult task, that requires substantial knowledge in several areas of control and computer science. Although many design methods have been proposed so far, none of them had succeeded to cover all important aspects of the problem at hand. [1] Continuous increase of production in embedded market, makes a simple and natural design methodology for real-time systems needed more then ever

CiteSeerX

University of Twente Research Information

Predicting Scheduling Failures in the Cloud

Author: Khomh Foutse
Soualhia Mbarka
Tahar Sofiene
Publication venue
Publication date: 01/01/2015
Field of study

Cloud Computing has emerged as a key technology to deliver and manage computing, platform, and software services over the Internet. Task scheduling algorithms play an important role in the efficiency of cloud computing services as they aim to reduce the turnaround time of tasks and improve resource utilization. Several task scheduling algorithms have been proposed in the literature for cloud computing systems, the majority relying on the computational complexity of tasks and the distribution of resources. However, several tasks scheduled following these algorithms still fail because of unforeseen changes in the cloud environments. In this paper, using tasks execution and resource utilization data extracted from the execution traces of real world applications at Google, we explore the possibility of predicting the scheduling outcome of a task using statistical models. If we can successfully predict tasks failures, we may be able to reduce the execution time of jobs by rescheduling failed tasks earlier (i.e., before their actual failing time). Our results show that statistical models can predict task failures with a precision up to 97.4%, and a recall up to 96.2%. We simulate the potential benefits of such predictions using the tool kit GloudSim and found that they can improve the number of finished tasks by up to 40%. We also perform a case study using the Hadoop framework of Amazon Elastic MapReduce (EMR) and the jobs of a gene expression correlations analysis study from breast cancer research. We find that when extending the scheduler of Hadoop with our predictive models, the percentage of failed jobs can be reduced by up to 45%, with an overhead of less than 5 minutes

arXiv.org e-Print Archive

PolyPublie

Deep Space Network information system architecture study

Author: Atkinson D. J.
Beswick C. A.
Cooper L. P.
Crowe R. A.
Jenkins J. S.
Markley R. W.
Masline R. C.
Stoloff M. J.
Tausworthe R. C.
Thomas J. L.
Publication venue
Publication date: 15/05/1992
Field of study

The purpose of this article is to describe an architecture for the Deep Space Network (DSN) information system in the years 2000-2010 and to provide guidelines for its evolution during the 1990s. The study scope is defined to be from the front-end areas at the antennas to the end users (spacecraft teams, principal investigators, archival storage systems, and non-NASA partners). The architectural vision provides guidance for major DSN implementation efforts during the next decade. A strong motivation for the study is an expected dramatic improvement in information-systems technologies, such as the following: computer processing, automation technology (including knowledge-based systems), networking and data transport, software and hardware engineering, and human-interface technology. The proposed Ground Information System has the following major features: unified architecture from the front-end area to the end user; open-systems standards to achieve interoperability; DSN production of level 0 data; delivery of level 0 data from the Deep Space Communications Complex, if desired; dedicated telemetry processors for each receiver; security against unauthorized access and errors; and highly automated monitor and control

NASA Technical Reports Server

A fault-tolerant multiprocessor architecture for aircraft, volume 1

Author: Ausrotas R. A.
Hanley L. D.
Hopkins A. L.
Lala J. H.
Martin J. H.
Smith T. B.
Taylor W.
Publication venue
Publication date
Field of study

A fault-tolerant multiprocessor architecture is reported. This architecture, together with a comprehensive information system architecture, has important potential for future aircraft applications. A preliminary definition and assessment of a suitable multiprocessor architecture for such applications is developed

NASA Technical Reports Server

On Money as a Means of Coordination between Network Packets

Author: Efraimidis Pavlos S.
Koutsiamanis Remous-Aris
Publication venue
Publication date: 01/01/2012
Field of study

In this work, we apply a common economic tool, namely money, to coordinate network packets. In particular, we present a network economy, called PacketEconomy, where each flow is modeled as a population of rational network packets, and these packets can self-regulate their access to network resources by mutually trading their positions in router queues. Every packet of the economy has its price, and this price determines if and when the packet will agree to buy or sell a better position. We consider a corresponding Markov model of trade and show that there are Nash equilibria (NE) where queue positions and money are exchanged directly between the network packets. This simple approach, interestingly, delivers improvements even when fiat money is used. We present theoretical arguments and experimental results to support our claims

arXiv.org e-Print Archive

CiteSeerX

Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

Author: Harper Richard
Publication venue
Publication date
Field of study

In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described

NASA Technical Reports Server

CSP channels for CAN-bus connected embedded control systems

Author: Broenink Jan F.
Orlic Bojan
Publication venue: STW Technology Foundation
Publication date: 01/01/2002
Field of study

Closed loop control system typically contains multitude of sensors and actuators operated simultaneously. So they are parallel and distributed in its essence. But when mapping this parallelism to software, lot of obstacles concerning multithreading communication and synchronization issues arise. To overcome this problem, the CT kernel/library based on CSP algebra has been developed. This project (TES.5410) is about developing communication extension to the CT library to make it applicable in distributed systems. Since the library is tailored for control systems, properties and requirements of control systems are taken into special consideration. Applicability of existing middleware solutions is examined. A comparison of applicable fieldbus protocols is done in order to determine most suitable ones and CAN fieldbus is chosen to be first fieldbus used. Brief overview of CSP and existing CSP based libraries is given. Middleware architecture is proposed along with few novel ideas

University of Twente Research Information