Global-Scale Data Management with Strong Consistency Guarantees
Global-scale data management (GSDM) empowers systems with higher levels of fault tolerance, read availability, and efficient utilization of cloud resources, and it has led to the emergence of global-scale data management and event processing platforms. However, the Wide-Area Network (WAN) latency separating datacenters is orders of magnitude larger than typical network latencies, which forces a reevaluation of many traditional design trade-offs of data management systems. Data management problems must therefore be revisited to account for the new design space. In this dissertation, we propose theoretical foundations for understanding the limits that WAN latency imposes on GSDM, and we propose practical systems and protocols that minimize the overhead caused by WAN latency. The presented work spans global-scale transaction processing, communication, analytics, and machine learning. Across all these directions, the focus is on the trade-off between consistency and latency, and we ask: what is the best performance (often latency) we can achieve without compromising the consistency and integrity of data? For transaction processing, we propose a lower-bound formulation for the transaction latency imposed by WAN latency. We also propose a new paradigm for transaction processing, proactive coordination, which inspired our two proposed protocols, Message Futures and Helios; both achieve the lower-bound latency. We further propose a communication framework, called Chariots, to scale multi-datacenter communication; Chariots is carefully designed to scale communication while providing a consistent view of the communicated information. Finally, we explore challenges in global-scale analytics and machine learning: we propose Ogre, a scalable system for global-scale heterogeneous transactional and analytics workloads, and COP, a system designed to speed up machine learning on globally generated data.
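As an illustration of the kind of result involved, one plausible pairwise form of such a lower bound (stated here as an assumption about its shape, not a quotation from the dissertation) is:

    \[ L_i + L_j \;\geq\; \mathrm{RTT}(i, j) \]

where L_i is the commit latency observed at datacenter i and RTT(i, j) is the round-trip WAN latency between datacenters i and j. Intuitively, if two distant datacenters could both commit conflicting transactions faster than this, each could commit before learning of the other's writes, compromising consistency.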
Mariner Mars 1971 project. Volume 3: Mission operations system implementation and standard mission flight operations
The Mariner Mars 1971 mission, another step in the continuing program of planetary exploration in search of evidence of exobiological activity, information on the origin and evolution of the solar system, and basic science data related to the study of planetary physics, geology, planetology, and cosmology, is reported. The mission plan was designed for two spacecraft, each performing a separate but complementary mission. However, a single mission plan was actually used for Mariner 9 because of the failure of the launch vehicle for the first spacecraft. The implementation of the Mission Operations System is described, including organization, training, and data processing development and operations, and Mariner 9 spacecraft cruise and orbital operations through completion of the standard mission, from launch to solar occultation in April 1972, are discussed.
Time4: Time for SDN
With the rise of Software Defined Networks (SDN), there is growing interest in dynamic and centralized traffic engineering, where decisions about forwarding paths are taken dynamically from a network-wide perspective. Frequent path reconfiguration can significantly improve network performance, but it should be handled with care so as to minimize disruptions that may occur during network updates.

In this paper we introduce Time4, an approach that uses accurate time to coordinate network updates. Time4 is a powerful tool in softwarized environments that can be used in various network update scenarios. Specifically, we characterize a set of update scenarios called flow swaps, for which Time4 is the optimal update approach, yielding less packet loss than existing update approaches. We define the lossless flow allocation problem, and formally show that in environments with frequent path allocation, scenarios that require simultaneous changes at multiple network devices are inevitable.

We present the design, implementation, and evaluation of a Time4-enabled OpenFlow prototype. The prototype is publicly available as open source. Our work includes an extension to the OpenFlow protocol that has been adopted by the Open Networking Foundation (ONF) and is now included in OpenFlow 1.5. Our experimental results show the significant advantages of Time4 compared to other network update approaches, and demonstrate an SDN use case that is infeasible without Time4.

Comment: This report is an extended version of "Software Defined Networks: It's About Time", which was accepted to IEEE INFOCOM 2016. A preliminary version of this report was published on arXiv in May, 201
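To illustrate the coordination idea (not the actual OpenFlow 1.5 bundle-based extension, which is not reproduced here), the following Python sketch stages updates on two switches for a common execution time T. Switch and schedule_update are hypothetical names invented for this illustration, and real deployments would rely on clock synchronization between devices.

    import time

    class Switch:
        def __init__(self, name):
            self.name = name
            self.pending = []  # (execute_at, rule) pairs staged by the controller

        def schedule_update(self, rule, execute_at):
            # The controller pushes the update ahead of time; the switch
            # applies it only when its local clock reaches execute_at.
            self.pending.append((execute_at, rule))

        def tick(self, now):
            due = [rule for t, rule in self.pending if t <= now]
            self.pending = [(t, rule) for t, rule in self.pending if t > now]
            for rule in due:
                print(f"{self.name}: applying '{rule}' at t={now:.3f}")

    s1, s2 = Switch("s1"), Switch("s2")
    T = time.time() + 0.5  # common execution time; assumes synchronized clocks
    s1.schedule_update("flow A -> path 2", T)
    s2.schedule_update("flow B -> path 1", T)
    for _ in range(4):
        now = time.time()
        s1.tick(now)
        s2.tick(now)
        time.sleep(0.2)

The point of the flow-swap scenario is that applying the two rules one switch at a time necessarily creates a transient interval in which both flows share a path; a common, accurately synchronized execution time shrinks that interval to the clock error.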
Space programs summary no. 37-62, volume 2 for the period 1 January - 28 February 1970. The Deep Space Network
Deep Space Network operations review.
Models of higher-order, type-safe, distributed computation over autonomous persistent object stores
A remote procedure call (RPC) mechanism permits the calling of procedures in another address space. RPC is a simple but highly effective mechanism for interprocess communication and nowadays enjoys great popularity as a tool for building distributed applications. This popularity is partly a result of its overall simplicity, but also partly a consequence of more than 20 years of research in transparent distribution that has failed to deliver systems that meet the expectations of real-world application programmers.

During the same 20 years, persistent systems have proved their suitability for building complex database applications by seamlessly integrating features traditionally found in database management systems into the programming language itself. Some research effort has been invested in distributed persistent systems, but the outcomes commonly suffer from the same problems found with transparent distribution.

In this thesis I claim that a higher-order persistent RPC is useful for building distributed persistent applications. The proposed mechanism is: realistic, in the sense that it uses current technology and tolerates partial failures; understandable by application programmers; and general enough to support the development of many classes of distributed persistent applications.

In order to demonstrate the validity of these claims, I propose and have implemented three models for distributed higher-order computation over autonomous persistent stores. Each model has successively exposed new problems, which have then been overcome by the next model. Together, the three models provide a general yet simple higher-order persistent RPC that is able to operate in realistic environments with partial failures.

The real strength of this thesis is the demonstration of realism and simplicity. A higher-order persistent RPC was not only implemented but also used by programmers without experience of programming distributed applications. Furthermore, a distributed persistent application has been built using these models which would not have been feasible with a traditional (non-persistent) programming language.
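As a rough illustration of what "higher-order" means here, the Python sketch below ships a function to another context and applies it to data held in a store. All names are hypothetical, the serialization is deliberately naive (it assumes both sides run the same interpreter version), and none of the thesis's three models, PS-algol machinery, or partial-failure handling is reproduced.

    import marshal
    import types

    def serialize_function(fn):
        # Ship the raw code object; assumes an identical interpreter on the callee.
        return marshal.dumps(fn.__code__)

    def deserialize_function(blob, name="remote_fn"):
        code = marshal.loads(blob)
        return types.FunctionType(code, {"__builtins__": __builtins__}, name)

    # "Server" side: apply a caller-supplied function to data in the local store.
    def remote_apply(fn_blob, store):
        fn = deserialize_function(fn_blob)
        return [fn(x) for x in store]

    # "Client" side: in a real system the store lives in another address space.
    persistent_store = [1, 2, 3]
    blob = serialize_function(lambda x: x * x)
    print(remote_apply(blob, persistent_store))  # [1, 4, 9]

In a conventional RPC only first-order data crosses the address-space boundary; the higher-order variant lets computation itself move to where the persistent data resides.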
Fault-tolerant distributed computing
Issued as Funds expenditure reports [nos. 1-4], Quarterly progress reports [nos. 1-3], and Final report, Project no. G-36-63
Fault-Tolerant Multicore Processors for Mixed-Criticality Real-Time Systems (Fehlertolerante Mehrkernprozessoren für gemischt-kritische Echtzeitsysteme)
Current and future computing systems must be appropriately designed to cope with random hardware faults in order to provide a dependable service and correct functionality. Dependability has many facets to be addressed when designing a system, and that is especially challenging in mixed-critical real-time systems, where safety standards play an important role and where responding in time can be as important as responding correctly, or even as responding at all.
The thesis addresses the dependability of mixed-critical real-time systems, considering three important requirements: integrity, resilience, and real-time behavior. More specifically, it looks into the architectural and performance aspects of achieving dependability, concentrating its scope on error detection and handling in hardware -- more specifically in the Network-on-Chip (NoC), the backbone of the modern MPSoC -- and on the performance of error handling and recovery in software.
The thesis starts by looking at the impact of random hardware faults on the NoC and on the system, with special focus on soft errors. It then addresses the uncovered weaknesses in the NoC by proposing a resilient NoC for mixed-critical real-time systems that is able to provide a highly reliable service with transparent protection for the applications. A formal communication-time analysis is provided, with common ARQ protocols modeled for NoCs, including a novel ARQ-based protocol optimized for DMAs. After addressing the efficient use of ARQ-based protocols in NoCs, the thesis proposes the Advanced Integrity Q-service (AIQ), a low-overhead mechanism to achieve integrity and real-time guarantees for NoC transactions on an End-to-End (E2E) basis. Inspired by transactions in distributed systems, the mechanism differs from the previous approach in that it does not provide error recovery in hardware but delegates that task to software, making use of existing functionality in cross-layer fault-tolerance solutions.
Finally, the thesis addresses error handling in software, as seen in cross-layer approaches, and examines the performance of replicated software execution on many-core platforms. Replicated software execution protects the system against random hardware faults; it relies on hardware-supported error detection and on error handling in software. Replica-aware co-scheduling is proposed to achieve high performance with replicated execution, which is not possible with standard real-time schedulers.
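To make the E2E retransmission idea concrete, below is a minimal stop-and-wait ARQ sketch in Python. It is an illustration only: the lossy_link model, the loss rate, and all function names are invented for this sketch, and the thesis's NoC-specific and DMA-optimized protocols are considerably more elaborate.

    import random

    def lossy_link(packet, loss_rate=0.3):
        # Models a link where soft errors drop packets; returns None on loss.
        return packet if random.random() > loss_rate else None

    def send_reliably(packets, max_retries=20):
        delivered = []
        for seq, payload in enumerate(packets):
            for _ in range(max_retries):
                data = lossy_link((seq, payload))
                if data is None:
                    continue  # data lost in transit: retransmit
                ack = lossy_link(("ACK", seq))
                if ack is None:
                    continue  # ACK lost: retransmit; receiver drops duplicates by seq
                delivered.append(payload)
                break
            else:
                raise RuntimeError(f"packet {seq} undeliverable after {max_retries} tries")
        return delivered

    print(send_reliably([b"flit0", b"flit1", b"flit2"]))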
A comparative study of structured and unstructured remote data access in distributed computing systems
Recently, the use of distributed computing systems has been growing rapidly as a result of cheap and advanced microelectronic technology. In addition to the decrease in hardware costs, the tremendous development in machine-to-machine communication interfaces, especially in local area networking, also favours the use of distributed systems. Distributed systems often require remote access to data stored at different sites. Generally, two models of access to remote data storage exist: the unstructured and the structured model. In the former, data is simply stored as a row of bytes, whereas in the latter, data is stored along with the associated access codes. The objective of this thesis is to compare these two models and hence determine the trade-offs of each. First, an extended review of the field of distributed data access is provided, addressing key issues such as the basic design principles of distributed computing systems and the notions of abstract data types, data inheritance, data type systems, and data persistence. Second, a distributed system is implemented using the persistent programming language PS-algol and the high-level language C, in conjunction with the remote procedure call facilities available in the Unix 4.2 BSD operating system. This distributed system makes extensive use of Unix's software tools and is hence called DCSUNIX, for Distributed Computing System on UNIX. Third, two specific applications which employ the implemented system are given so that a comparison can be made between the two remote data access models mentioned above. Finally, the implemented system is compared against the criteria established earlier in the thesis. Keywords: abstract data types, class, database management, data persistence, information hiding, inheritance, object-oriented programming, programming languages, remote procedure calls, transparency, type checking.
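The contrast between the two models can be sketched as follows. The class names are invented for this illustration; the thesis's DCSUNIX system is built with PS-algol, C, and Unix 4.2 BSD RPC, not Python. In the unstructured model the client interprets raw bytes itself; in the structured model the data travels with its access code, so every access is mediated by typed operations.

    import struct

    class UnstructuredStore:
        """Data as a row of bytes: the client must know the layout."""
        def __init__(self):
            self.blob = b""
        def write(self, data: bytes):
            self.blob = data
        def read(self) -> bytes:
            return self.blob

    class StructuredStore:
        """Data stored with its access code: clients use typed operations."""
        def __init__(self):
            self._balance = 0
        def deposit(self, amount: int):
            # The access code enforces the invariant; raw bytes never leak out.
            if amount < 0:
                raise ValueError("negative deposit")
            self._balance += amount
        def balance(self) -> int:
            return self._balance

    # Unstructured: the caller encodes and decodes, and can corrupt the layout.
    raw = UnstructuredStore()
    raw.write(struct.pack("!i", 42))
    print(struct.unpack("!i", raw.read())[0])

    # Structured: the store's own operations mediate every access.
    acct = StructuredStore()
    acct.deposit(42)
    print(acct.balance())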
The Deep Space Network
A report is given of the Deep Space Network's progress in (1) flight project support, (2) tracking and data acquisition research and technology, (3) network engineering, (4) hardware and software implementation, and (5) operations.