A shared-disk parallel cluster file system
Dissertation presented to obtain the degree of Doctor in Informatics from the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.
Today, clusters are the de facto cost-effective platform for both high performance computing (HPC) and IT environments. HPC and IT are quite different environments, and the differences include, among others, their choices of file systems and storage: HPC favours parallel file systems geared towards maximum I/O bandwidth, but which are not fully POSIX-compliant and were devised to run on top of (fault-prone) partitioned storage; conversely, IT data centres favour both external disk arrays (to provide highly available storage) and POSIX-compliant file systems (either general purpose or shared-disk cluster file systems, CFSs).
These specialised file systems do perform very well in their target environments, provided that applications do not require certain lateral features: parallel file systems offer no file locking, and CFSs offer no high performance writes over cluster-wide shared files. In brief, we can say that none of the above approaches solves the problem of providing high levels of reliability and performance to both worlds.
Our pCFS proposal makes a contribution to change this situation: the rationale is to take advantage of the best of both, namely the reliability of cluster file systems and the high performance of parallel file systems. We don’t claim to provide the absolute best of each, but we aim at full POSIX compliance, a rich feature set, and levels of reliability and performance good enough for broad usage, e.g., traditional as well as HPC applications, support of clustered DBMS engines that may run over regular files, and video streaming. pCFS’ main ideas include:
· Cooperative caching, a technique that has been used in file systems for distributed disks but, as far as we know, was never used either in SAN-based cluster file systems or in parallel file systems. As a result, pCFS may use all infrastructures (LAN and SAN) to move data.
· Fine-grain locking, whereby processes running across distinct nodes may define non-overlapping byte-range regions in a file (instead of the whole file) and access them in parallel, reading and writing over those regions at the infrastructure’s full speed (provided that no major metadata changes are required).
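The byte-range idea behind fine-grain locking can be illustrated with standard POSIX advisory locks. The sketch below is not pCFS code and does not reproduce its kernel-level mechanism; it only shows, on a Unix system, how a writer can lock just the region it touches rather than the whole file. The helper name `write_region` is hypothetical and introduced purely for illustration.

```python
import fcntl
import os


def write_region(path: str, offset: int, data: bytes) -> None:
    """Write `data` at `offset`, holding an exclusive lock on that byte range only.

    Hypothetical helper for illustration; assumes a Unix system and non-empty
    `data` (a length of 0 would mean "lock to end of file" in POSIX).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Lock only [offset, offset + len(data)); other regions stay unlocked,
        # so writers on non-overlapping ranges do not block each other.
        fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
        try:
            os.pwrite(fd, data, offset)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)
    finally:
        os.close(fd)
```

Two processes calling `write_region` on disjoint offsets of the same file would each hold their own lock and proceed independently, whereas a whole-file lock would serialise them.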
A prototype was built on top of GFS (a Red Hat shared disk CFS): GFS’ kernel code was slightly modified, and two kernel modules and a user-level daemon were added. In the prototype, fine-grain locking is fully implemented and a cluster-wide coherent cache is maintained through data (page fragments) movement over the LAN.
Our benchmarks for non-overlapping writers over a single file shared among processes running on different nodes show that pCFS’ bandwidth is 2 times greater than NFS’ while being comparable to that of the Parallel Virtual File System (PVFS), both requiring about 10 times more CPU. pCFS’ bandwidth also surpasses GFS’ (600 times for small record sizes, e.g., 4 KB, decreasing down to 2 times for large record sizes, e.g., 4 MB), at about the same CPU usage.
Lusitania, Companhia de Seguros S.A; Programa IBM Shared University Research (SUR)
Adaptive Mid-term and Short-term Scheduling of Mixed-criticality Systems
A mixed-criticality real-time system is a real-time system having multiple tasks classified according to their criticality. Research on mixed-criticality systems started to provide an effective and cost-efficient a priori verification process for safety critical systems. The higher the criticality of a task within a system, the more the system should guarantee the required level of service for it. However, such a model poses new challenges with respect to scheduling and fault tolerance within real-time systems. Currently, mixed-criticality scheduling protocols severely degrade lower criticality tasks in case of resource shortage to provide the required level of service for the most critical ones. The current research challenge in this field is to devise robust scheduling protocols to minimise the impact on less critical tasks.
This dissertation introduces two approaches, one short-term and the other medium-term, to appropriately allocate computing resources to tasks within mixed-criticality systems both on uniprocessor and multiprocessor systems.
The short-term strategy consists of a protocol named Lazy Bailout Protocol (LBP) to schedule mixed-criticality task sets on single core architectures. Scheduling decisions are made about tasks that are active in the ready queue and that have to be dispatched to the CPU. LBP minimises the service degradation for lower criticality tasks by providing them with background execution during the system idle time. I then refined LBP with variants that aim to further increase the service level provided for lower criticality tasks. However, this is achieved at an increased cost of either system offline analysis or complexity at runtime.
The second approach, named Adaptive Tolerance-based Mixed-criticality Protocol (ATMP), decides at runtime which tasks have to be allocated to the active cores according to the available resources. ATMP makes it possible to optimise the overall system utility by tuning the system workload in case of shortage of computing capacity at runtime. Unlike the majority of current mixed-criticality approaches, ATMP also allows higher criticality tasks to be smoothly degraded so that lower criticality ones remain allocated.
Fault tolerant software technology for distributed computing system
Issued as Monthly reports [nos. 1-23], Interim technical report, Technical guide books [nos. 1-2], and Final report, Project no. G-36-64
The exploitation of parallelism on shared memory multiprocessors
PhD Thesis
With the arrival of many general purpose shared memory multiple processor
(multiprocessor) computers into the commercial arena during the mid-1980s, a
rift has opened between the raw processing power offered by the emerging
hardware and the relative inability of its operating software to effectively deliver
this power to potential users. This rift stems from the fact that, currently, no
computational model with the capability to elegantly express parallel activity is
mature enough to be universally accepted, and used as the basis for programming
languages to exploit the parallelism that multiprocessors offer. To add to this,
there is a lack of software tools to assist programmers in the processes of designing
and debugging parallel programs.
Although much research has been done in the field of programming languages,
no undisputed candidate for the most appropriate language for programming
shared memory multiprocessors has yet been found. This thesis examines why this
state of affairs has arisen and proposes programming language constructs,
together with a programming methodology and environment, to close the ever
widening hardware to software gap.
The novel programming constructs described in this thesis are intended for use
in imperative languages even though they make use of the synchronisation
inherent in the dataflow model by using the semantics of single assignment when
operating on shared data, so giving rise to the term shared values. As there are
several distinct parallel programming paradigms, matching flavours of shared
value are developed to permit the concise expression of these paradigms.
The Science and Engineering Research Council
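The single-assignment synchronisation that the abstract above attributes to shared values can be sketched as follows. This is an illustrative assumption, not the thesis's actual language construct: the class name `SharedValue` and its `set`/`get` methods are hypothetical. It shows only the dataflow-style behaviour in which a value may be written once and readers block until it has been written.

```python
import threading


class SharedValue:
    """Hypothetical single-assignment cell: written once, readers wait for it."""

    def __init__(self):
        self._ready = threading.Event()   # set once the value is assigned
        self._lock = threading.Lock()     # guards against a second assignment
        self._value = None

    def set(self, value):
        """Assign the value; a second assignment is an error."""
        with self._lock:
            if self._ready.is_set():
                raise RuntimeError("shared value already assigned")
            self._value = value
            self._ready.set()

    def get(self, timeout=None):
        """Block until the value has been assigned, then return it."""
        if not self._ready.wait(timeout):
            raise TimeoutError("shared value not assigned yet")
        return self._value
```

Because the value can never change after assignment, readers need no further locking once `get` returns, which is the property that lets single assignment double as synchronisation.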
Locating Objects in a Wide-area System
Steen, M.R. van [Promotor]; Tanenbaum, A.S. [Promotor]
Second CLIPS Conference Proceedings, volume 1
Topics covered at the 2nd CLIPS Conference held at the Johnson Space Center, September 23-25, 1991 are given. Topics include rule groupings, fault detection using expert systems, decision making using expert systems, knowledge representation, computer aided design and debugging expert systems
Organization based multiagent architecture for distributed environments
Distributed environments represent a complex field in which applied solutions should be flexible and include significant adaptation capabilities. These environments are related to problems where multiple users and devices may interact, and where simple and local solutions could possibly generate good results, but may not be effective with regard to use and interaction.
There are many techniques that can be employed to face this kind of problem, from CORBA to multi-agent systems, through web services and SOA, among others. All those methodologies have their advantages and disadvantages, which are analyzed in this document in order to finally explain the new architecture presented as a solution for distributed environment problems.
The new architecture for solving complex problems in distributed environments presented here is called OBaMADE: Organization Based Multiagent Architecture for Distributed Environments. It is a multiagent architecture based on the organizations of agents paradigm, where the agents in the architecture are structured into organizations to improve their organizational capabilities.
The reasoning power of the architecture is based on the Case-Based Reasoning methodology, which is implemented in an internal organization that uses agents to create services to solve the external requests made by the users.
The OBaMADE architecture has been successfully applied to two different case studies where its prediction capabilities have been checked. Those case studies have shown promising results and, being complex systems, have demonstrated the abstraction and generalization capabilities of the architecture.
Nevertheless, OBaMADE is intended to be able to solve many other kinds of problems in distributed environment scenarios. It should be applied to other varieties of situations and to other knowledge fields to fully develop its potential.
Software test and evaluation study phase I and II : survey and analysis
Issued as Final report, Project no. G-36-661 (continues G-36-636; includes A-2568