8 research outputs found

    Reliable massively parallel symbolic computing: fault tolerance for a distributed Haskell

    As the number of cores in manycore systems grows exponentially, the number of failures is also predicted to grow exponentially. Hence massively parallel computations must be able to tolerate faults. Moreover, new approaches to language design and system architecture are needed to address the resilience of massively parallel heterogeneous architectures. Symbolic computation has underpinned key advances in Mathematics and Computer Science, for example in number theory, cryptography, and coding theory. Computer algebra software systems facilitate symbolic mathematics. Developing these at scale has its own distinctive set of challenges, as symbolic algorithms tend to employ complex irregular data and control structures. SymGridParII is a middleware for parallel symbolic computing on massively parallel High Performance Computing platforms. A key element of SymGridParII is a domain specific language (DSL) called Haskell Distributed Parallel Haskell (HdpH). It is explicitly designed for scalable distributed-memory parallelism, and employs work stealing to dynamically load balance irregular task sizes. To investigate providing scalable fault tolerant symbolic computation, we design, implement and evaluate a reliable version of HdpH, HdpH-RS. Its reliable scheduler detects and handles faults, using task replication as a key recovery strategy. The scheduler supports load balancing with a fault tolerant work stealing protocol, and is invoked with two fault tolerance primitives for implicit and explicit work placement, plus 10 fault tolerant parallel skeletons that encapsulate common parallel programming patterns. The user is oblivious to many failures; they are instead handled by the scheduler. An operational semantics describes small-step reductions on states, and a simple abstract machine for scheduling transitions and task evaluation is presented. It defines the semantics of supervised futures and the transition rules for recovering tasks in the presence of failure. The transition rules are demonstrated with a fault-free execution and three executions that recover from faults. The fault tolerant work stealing protocol has been abstracted into a Promela model, and the SPIN model checker is used to exhaustively search the model's state space to validate a key resiliency property of the protocol: an initially empty supervised future on the supervisor node will eventually be full in the presence of all possible combinations of failures. The performance of HdpH-RS is measured using five benchmarks. Supervised scheduling achieves a speedup of 757 with explicit task placement and 340 with lazy work stealing when executing Summatory Liouville on up to 1400 cores of an HPC architecture. Moreover, supervision overheads remain consistently low when scaling up to 1400 cores, and low recovery overheads are observed in the presence of frequent failure when lazy on-demand work stealing is used. A Chaos Monkey mechanism has been developed for stress testing resiliency with random failure combinations; all unit tests pass in the presence of random failure, terminating with the expected results.
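    The recovery idea described above can be sketched in miniature. The following is a toy Python model (all names and the single-supervisor structure are illustrative assumptions, not HdpH-RS's actual Haskell API): the supervisor keeps a replayable copy of each task behind a supervised future, and when the node holding a task dies before the future is filled, the task is replicated on a surviving node, so an initially empty future is eventually full.

```python
# Toy sketch of supervised futures with task replication.
# Hypothetical model only; HdpH-RS is a Haskell DSL with a far richer scheduler.

class SupervisedFuture:
    def __init__(self, task):
        self.task = task      # replayable closure retained by the supervisor
        self.value = None     # None = empty future
        self.full = False

class Supervisor:
    def __init__(self, workers):
        self.workers = set(workers)

    def spawn(self, task):
        # Create an (initially empty) supervised future for the task.
        return SupervisedFuture(task)

    def run(self, future, on_worker, fail=False):
        """Attempt execution on a node; on failure, replicate on a live node."""
        if fail:
            self.workers.discard(on_worker)         # failure detected
            replacement = next(iter(self.workers))  # pick any surviving node
            return self.run(future, replacement)    # replay the task
        future.value = future.task()
        future.full = True
        return future.value

sup = Supervisor(["node1", "node2", "node3"])
fut = sup.spawn(lambda: sum(range(10)))
sup.run(fut, "node1", fail=True)  # node1 dies; task is replicated elsewhere
```

The resiliency property checked with SPIN corresponds here to the invariant that `fut.full` eventually holds despite the injected failure.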

    Mathematics in Software Reliability and Quality Assurance

    This monograph concerns the mathematical aspects of software reliability and quality assurance and consists of 11 technical papers in this emerging area. Included are the latest research results related to formal methods and design, automatic software testing, software verification and validation, coalgebra theory, automata theory, hybrid systems, and software reliability modeling and assessment.

    Resilience-Building Technologies: State of Knowledge -- ReSIST NoE Deliverable D12

    This document is the first product of work package WP2, "Resilience-building and -scaling technologies", in the programme of jointly executed research (JER) of the ReSIST Network of Excellence.

    One solution for TTEthernet synchronization analysis using genetic algorithm

    Safety-critical systems such as airplanes and cars demand highly reliable communication between components within the system, which is achieved by using deterministic networks. Properly establishing and maintaining the synchronization of device clocks across the network components is one of the crucial aspects of deterministic networks, to which TTEthernet networks belong. If the device clocks are not synchronized, deterministic communication in the network is not feasible. Since the most critical information is exchanged between the network components using the deterministic traffic class, such services will not be available until the clocks in the network are synchronized. The thesis estimates the worst-case startup time, i.e. the time needed for the device clocks to synchronize, for the observed TTEthernet network when one device in the network is under failure. The estimation is performed with OMNeT++ simulations combined with a genetic algorithm. The simulations show that the startup time of the network is extended significantly under the impact of the faulty component, and the unavailability of the network's most critical services is prolonged accordingly. For the network and parameters considered in the thesis, the estimated median worst-case startup time is 489579 μs.
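    The search strategy described above can be illustrated with a small sketch. The following Python toy (an illustrative assumption throughout: a closed-form stand-in replaces the actual OMNeT++ simulation, and all parameter names are hypothetical) uses a genetic algorithm to search for the fault-injection timing that maximises a network's startup time, which is how a worst case can be estimated when exhaustive simulation is infeasible.

```python
# Toy genetic-algorithm search for worst-case startup time.
# The fitness function is a stand-in for running one OMNeT++ simulation.
import random

random.seed(0)  # deterministic run for illustration

def startup_time(fault_offset_us):
    # Stand-in model: startup time peaks when the fault lands mid-way
    # through a (hypothetical) 1000 us synchronization window.
    return 1000 - abs(fault_offset_us - 500)

def genetic_search(generations=50, pop_size=20):
    # Each individual is a candidate fault-injection offset in microseconds.
    pop = [random.uniform(0, 1000) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=startup_time, reverse=True)
        parents = pop[: pop_size // 2]               # selection (keep best half)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = (a + b) / 2                      # crossover (average)
            child += random.gauss(0, 10)             # mutation (small jitter)
            children.append(min(max(child, 0.0), 1000.0))
        pop = parents + children                     # elitist replacement
    best = max(pop, key=startup_time)
    return best, startup_time(best)

offset, worst_case = genetic_search()
```

Because the best individuals survive each generation, the estimate of the worst-case startup time is monotonically non-decreasing; in the thesis, each fitness evaluation would instead be one full simulated network startup.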

    Cyber Security of Critical Infrastructures

    Critical infrastructures are vital assets for public safety, economic welfare, and the national security of countries. The vulnerabilities of critical infrastructures have increased with the widespread use of information technologies. As Critical National Infrastructures become more vulnerable to cyber-attacks, their protection becomes a significant issue for organizations as well as nations. The risks to continued operations, from failing to upgrade aging infrastructure or not meeting mandated regulatory regimes, are considered highly significant, given the demonstrable impact of such circumstances. Due to the rapid increase of sophisticated cyber threats targeting critical infrastructures with significant destructive effects, the cybersecurity of critical infrastructures has become an agenda item for academics, practitioners, and policy makers. A holistic view which covers technical, policy, human, and behavioural aspects is essential to handle the cyber security of critical infrastructures effectively. Moreover, the ability to attribute crimes to criminals is a vital element of avoiding impunity in cyberspace. In this book, both research and practical aspects of cyber security considerations in critical infrastructures are presented. Aligned with the interdisciplinary nature of cyber security, authors from academia, government, and industry have contributed 13 chapters. The issues that are discussed and analysed include cybersecurity training, maturity assessment frameworks, malware analysis techniques, ransomware attacks, security solutions for industrial control systems, and privacy preservation methods.

    Transactions of the First International Conference on Health Information Technology Advancement vol. 1, no. 1

    Full proceedings of The First International Conference on Health Information Technology Advancement held at Western Michigan University in Kalamazoo, Michigan on October 28, 2011. Conference Co-Chairs: Dr. Bernard Han, Director of the Center for HIT Advancement (CHITA) at Western Michigan University, and Dr. Sharie Falan, Associate Director of the Center for HIT Advancement (CHITA) at Western Michigan University. Transactions Editor: Dr. Huei Lee, Professor in the Department of Computer Information Systems at Eastern Michigan University.

    A multi-models based approach for the modelling and the analysis of usable and resilient partly autonomous interactive systems

    The current European Air Traffic Management (ATM) system needs to be improved to cope with the growth in air traffic forecast for the coming years. It has been broadly recognised that the future ATM capacity and safety objectives can only be achieved by an intense enhancement of integrated automation support. However, an increase in automation might come along with an increase in the performance variability of the whole ATM system, especially in case of automation degradation. ATM systems are considered complex as they encompass interactions involving humans and machines that are deeply influenced by environmental aspects (e.g. weather, organizational structure), making them belong to the class of Socio-Technical Systems (STS) (Emery & Trist, 1960). Due to this complexity, the interactions between the STS elements (human, system and organisational) can be partly linear and partly non-linear, making the evolution of the system's performance complex and hardly predictable. Within such an STS, interactive systems have to be usable, i.e. enable users to perform their tasks efficiently and effectively while ensuring a certain level of operator satisfaction. Besides, the STS has to be resilient to adverse events, including potential automation degradation issues as well as interaction problems between the interactive systems and their operators. These issues may affect several STS aspects such as resources, the time taken to perform tasks, and the ability to adjust to the environment. In order to analyse the impact of these perturbations and to assess the potential performance variability of an STS, dedicated techniques and methods are required. These techniques and methods have to support the systematic modelling and analysis of the usability and resilience of interactive systems featuring partly autonomous behaviours. They also have to support describing and structuring a large amount of information, and be able to address the variability of each of the STS elements as well as the variability related to their interrelations. Current techniques, methods and processes enable neither modelling an STS as a whole nor analysing both its usability and resilience properties (or they focus on a subset of the STS, thereby losing the systemic view). Also, they do not embed all the elements required to describe and analyse each part of the STS (such as the different types of knowledge a user needs to accomplish tasks or to interact with dedicated technologies). Lastly, they do not provide means for analysing task migrations when a new technology is introduced, or for analysing performance variability in case of degradation of the newly introduced automation. These statements are argued in the thesis through a detailed analysis of existing modelling techniques and their associated methods, highlighting their advantages and limitations. The thesis proposes a multi-models based approach for the modelling and analysis of partly autonomous interactive systems in order to assess their resilience and usability. The contribution is based on the identification of a set of requirements needed to model and analyse each of the STS elements. Some of these requirements were met by existing modelling techniques; others were met by extending and refining existing ones. The approach integrates 3 modelling techniques: FRAM (focused on organisational functions), HAMSTERS (centred on human goals and activities) and ICO (dedicated to modelling the behaviour of interactive systems). The principles of the multi-models approach are illustrated on an example that shows the proposed extensions to the selected modelling techniques and how the models integrate together. A more complex case study from the ATM world, dealing with an aircraft route change due to bad weather conditions, is then presented to demonstrate the scalability of the approach; it highlights the ability of the integrated models to cope with the performance variability of the various parts of the STS.