Search CORE

5,026 research outputs found

Dynamic and Transparent Analysis of Commodity Production Systems

Author: Fattori Aristide
Martignoni Lorenzo
Monga Mattia
Paleari Roberto
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010
Field of study

We propose a framework that provides a programming interface to perform complex dynamic system-level analyses of deployed production systems. By leveraging hardware support for virtualization available nowadays on all commodity machines, our framework is completely transparent to the system under analysis and it guarantees isolation of the analysis tools running on its top. Thus, the internals of the kernel of the running system needs not to be modified and the whole platform runs unaware of the framework. Moreover, errors in the analysis tools do not affect the running system and the framework. This is accomplished by installing a minimalistic virtual machine monitor and migrating the system, as it runs, into a virtual machine. In order to demonstrate the potentials of our framework we developed an interactive kernel debugger, nicknamed HyperDbg. HyperDbg can be used to debug any critical kernel component, and even to single step the execution of exception and interrupt handlers.Comment: 10 pages, To appear in the 25th IEEE/ACM International Conference on Automated Software Engineering, Antwerp, Belgium, 20-24 September 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

AIR Universita degli studi di Milano

A Generic Checkpoint-Restart Mechanism for Virtual Machines

Author: Cooperman Gene
Garg Rohan
Sodha Komal
Publication venue
Publication date: 08/12/2012
Field of study

It is common today to deploy complex software inside a virtual machine (VM). Snapshots provide rapid deployment, migration between hosts, dependability (fault tolerance), and security (insulating a guest VM from the host). Yet, for each virtual machine, the code for snapshots is laboriously developed on a per-VM basis. This work demonstrates a generic checkpoint-restart mechanism for virtual machines. The mechanism is based on a plugin on top of an unmodified user-space checkpoint-restart package, DMTCP. Checkpoint-restart is demonstrated for three virtual machines: Lguest, user-space QEMU, and KVM/QEMU. The plugins for Lguest and KVM/QEMU require just 200 lines of code. The Lguest kernel driver API is augmented by 40 lines of code. DMTCP checkpoints user-space QEMU without any new code. KVM/QEMU, user-space QEMU, and DMTCP need no modification. The design benefits from other DMTCP features and plugins. Experiments demonstrate checkpoint and restart in 0.2 seconds using forked checkpointing, mmap-based fast-restart, and incremental Btrfs-based snapshots

arXiv.org e-Print Archive

CiteSeerX

Improving the reliability of commodity operating systems

Author: Brian N. Bershad
Chen P.
Custer H.
Forin A.
Gosling J.
Hand S. M.
Henry M. Levy
Intel Corporation
Levy H. M.
Michael M. Swift
Ng W. T.
Organick E. I.
UDI.
Young M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

Quality assessment technique for ubiquitous software and middleware

Author: Hong G.Y.
James H.
Ryu H.
Publication venue: 'Massey University'
Publication date: 01/01/2006
Field of study

The new paradigm of computing or information systems is ubiquitous computing systems. The technology-oriented issues of ubiquitous computing systems have made researchers pay much attention to the feasibility study of the technologies rather than building quality assurance indices or guidelines. In this context, measuring quality is the key to developing high-quality ubiquitous computing products. For this reason, various quality models have been defined, adopted and enhanced over the years, for example, the need for one recognised standard quality model (ISO/IEC 9126) is the result of a consensus for a software quality model on three levels: characteristics, sub-characteristics, and metrics. However, it is very much unlikely that this scheme will be directly applicable to ubiquitous computing environments which are considerably different to conventional software, trailing a big concern which is being given to reformulate existing methods, and especially to elaborate new assessment techniques for ubiquitous computing environments. This paper selects appropriate quality characteristics for the ubiquitous computing environment, which can be used as the quality target for both ubiquitous computing product evaluation processes ad development processes. Further, each of the quality characteristics has been expanded with evaluation questions and metrics, in some cases with measures. In addition, this quality model has been applied to the industrial setting of the ubiquitous computing environment. These have revealed that while the approach was sound, there are some parts to be more developed in the future

Massey Research Online

A proactive fault tolerance framework for high performance computing (HPC) systems in the cloud

Author: Egwutuoha Ifeanyi Paulinus
Publication venue: Faculty of Engineering and Information Technologies, School of Electrical and Information Engineering
Publication date: 01/01/2014
Field of study

High Performance Computing (HPC) systems have been widely used by scientists and researchers in both industry and university laboratories to solve advanced computation problems. Most advanced computation problems are either data-intensive or computation-intensive. They may take hours, days or even weeks to complete execution. For example, some of the traditional HPC systems computations run on 100,000 processors for weeks. Consequently traditional HPC systems often require huge capital investments. As a result, scientists and researchers sometimes have to wait in long queues to access shared, expensive HPC systems. Cloud computing, on the other hand, offers new computing paradigms, capacity, and flexible solutions for both business and HPC applications. Some of the computation-intensive applications that are usually executed in traditional HPC systems can now be executed in the cloud. Cloud computing price model eliminates huge capital investments. However, even for cloud-based HPC systems, fault tolerance is still an issue of growing concern. The large number of virtual machines and electronic components, as well as software complexity and overall system reliability, availability and serviceability (RAS), are factors with which HPC systems in the cloud must contend. The reactive fault tolerance approach of checkpoint/restart, which is commonly used in HPC systems, does not scale well in the cloud due to resource sharing and distributed systems networks. Hence, the need for reliable fault tolerant HPC systems is even greater in a cloud environment. In this thesis we present a proactive fault tolerance approach to HPC systems in the cloud to reduce the wall-clock execution time, as well as dollar cost, in the presence of hardware failure. We have developed a generic fault tolerance algorithm for HPC systems in the cloud. We have further developed a cost model for executing computation-intensive applications on HPC systems in the cloud. Our experimental results obtained from a real cloud execution environment show that the wall-clock execution time and cost of running computation-intensive applications in the cloud can be considerably reduced compared to checkpoint and redundancy techniques used in traditional HPC systems

Sydney eScholarship

Grid Infrastructure for Domain Decomposition Methods in Computational ElectroMagnetics

Author: Francavilla A.
Mossucca L.
Ruiu Pietro
Terzo Olivier
Vipiana Francesca
Publication venue: Soha Maad (Ed.)
Publication date: 01/01/2012
Field of study

The accurate and efficient solution of Maxwell's equation is the problem addressed by the scientific discipline called Computational ElectroMagnetics (CEM). Many macroscopic phenomena in a great number of fields are governed by this set of differential equations: electronic, geophysics, medical and biomedical technologies, virtual EM prototyping, besides the traditional antenna and propagation applications. Therefore, many efforts are focussed on the development of new and more efficient approach to solve Maxwell's equation. The interest in CEM applications is growing on. Several problems, hard to figure out few years ago, can now be easily addressed thanks to the reliability and flexibility of new technologies, together with the increased computational power. This technology evolution opens the possibility to address large and complex tasks. Many of these applications aim to simulate the electromagnetic behavior, for example in terms of input impedance and radiation pattern in antenna problems, or Radar Cross Section for scattering applications. Instead, problems, which solution requires high accuracy, need to implement full wave analysis techniques, e.g., virtual prototyping context, where the objective is to obtain reliable simulations in order to minimize measurement number, and as consequence their cost. Besides, other tasks require the analysis of complete structures (that include an high number of details) by directly simulating a CAD Model. This approach allows to relieve researcher of the burden of removing useless details, while maintaining the original complexity and taking into account all details. Unfortunately, this reduction implies: (a) high computational effort, due to the increased number of degrees of freedom, and (b) worsening of spectral properties of the linear system during complex analysis. The above considerations underline the needs to identify appropriate information technologies that ease solution achievement and fasten required elaborations. The authors analysis and expertise infer that Grid Computing techniques can be very useful to these purposes. Grids appear mainly in high performance computing environments. In this context, hundreds of off-the-shelf nodes are linked together and work in parallel to solve problems, that, previously, could be addressed sequentially or by using supercomputers. Grid Computing is a technique developed to elaborate enormous amounts of data and enables large-scale resource sharing to solve problem by exploiting distributed scenarios. The main advantage of Grid is due to parallel computing, indeed if a problem can be split in smaller tasks, that can be executed independently, its solution calculation fasten up considerably. To exploit this advantage, it is necessary to identify a technique able to split original electromagnetic task into a set of smaller subproblems. The Domain Decomposition (DD) technique, based on the block generation algorithm introduced in Matekovits et al. (2007) and Francavilla et al. (2011), perfectly addresses our requirements (see Section 3.4 for details). In this chapter, a Grid Computing infrastructure is presented. This architecture allows parallel block execution by distributing tasks to nodes that belong to the Grid. The set of nodes is composed by physical machines and virtualized ones. This feature enables great flexibility and increase available computational power. Furthermore, the presence of virtual nodes allows a full and efficient Grid usage, indeed the presented architecture can be used by different users that run different applications

IntechOpen

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino