16 research outputs found

    Distributed aop middleware for large-scale scenarios

    Get PDF
    En aquesta tesi doctoral presentem una proposta de middleware distribuït pel desenvolupament d'aplicacions de gran escala. La nostra motivació principal és permetre que les responsabilitats distribuïdes d'aquestes aplicacions, com per exemple la replicació, puguin integrar-se de forma transparent i independent. El nostre enfoc es basa en la implementació d'aquestes responsabilitats mitjançant el paradigma d'aspectes distribuïts i es beneficia dels substrats de les xarxes peer-to-peer (P2P) i de la programació orientada a aspectes (AOP) per realitzar-ho de forma descentralitzada, desacoblada, eficient i transparent. La nostra arquitectura middleware es divideix en dues capes: un model de composició i una plataforma escalable de desplegament d'aspectes distribuïts. Per últim, es demostra la viabilitat i aplicabilitat del nostre model mitjançant la implementació i experimentació de prototipus en xarxes de gran escala reals.In this PhD dissertation we present a distributed middleware proposal for large-scale application development. Our main aim is to separate the distributed concerns of these applications, like replication, which can be integrated independently and transparently. Our approach is based on the implementation of these concerns using the paradigm of distributed aspects. In addition, our proposal benefits from the peer-to-peer (P2P) networks and aspect-oriented programming (AOP) substrates to provide these concerns in a decentralized, decoupled, efficient, and transparent way. Our middleware architecture is divided into two layers: a composition model and a scalable deployment platform for distributed aspects. Finally, we demonstrate the viability and applicability of our model via implementation and experimentation of prototypes in real large-scale networks

    Proactive Interference-aware Resource Management in Deep Learning Training Cluster

    Get PDF
    Deep Learning (DL) applications are growing at an unprecedented rate across many domains, ranging from weather prediction, map navigation to medical imaging. However, training these deep learning models in large-scale compute clusters face substantial challenges in terms of low cluster resource utilisation and high job waiting time. State-of-the-art DL cluster resource managers are needed to increase GPU utilisation and maximise throughput. While co-locating DL jobs within the same GPU has been shown to be an effective means towards achieving this, co-location subsequently incurs performance interference resulting in job slowdown. We argue that effective workload placement can minimise DL cluster interference at scheduling runtime by understanding the DL workload characteristics and their respective hardware resource consumption. However, existing DL cluster resource managers reserve isolated GPUs to perform online profiling to directly measure GPU utilisation and kernel patterns for each unique submitted job. Such a feedback-based reactive approach results in additional waiting times as well as reduced cluster resource efficiency and availability. In this thesis, we propose Horus: an interference-aware and prediction-based DL cluster resource manager. Through empirically studying a series of microbenchmarks and DL workload co-location combinations across heterogeneous GPU hardware, we demonstrate the negative effects of performance interference when colocating DL workload, and identify GPU utilisation as a general proxy metric to determine good placement decisions. From these findings, we design Horus, which in contrast to existing approaches, proactively predicts GPU utilisation of heterogeneous DL workload extrapolated from the DL model computation graph features when performing placement decisions, removing the need for online profiling and isolated reserved GPUs. By conducting empirical experimentation within a medium-scale DL cluster as well as a large-scale trace-driven simulation of a production system, we demonstrate Horus improves cluster GPU utilisation, reduces cluster makespan and waiting time, and can scale to operate within hundreds of machines

    RICIS Software Engineering 90 Symposium: Aerospace Applications and Research Directions Proceedings Appendices

    Get PDF
    Papers presented at RICIS Software Engineering Symposium are compiled. The following subject areas are covered: flight critical software; management of real-time Ada; software reuse; megaprogramming software; Ada net; POSIX and Ada integration in the Space Station Freedom Program; and assessment of formal methods for trustworthy computer systems

    Software Technologies - 8th International Joint Conference, ICSOFT 2013 : Revised Selected Papers

    Get PDF

    Combining SOA and BPM Technologies for Cross-System Process Automation

    Get PDF
    This paper summarizes the results of an industry case study that introduced a cross-system business process automation solution based on a combination of SOA and BPM standard technologies (i.e., BPMN, BPEL, WSDL). Besides discussing major weaknesses of the existing, custom-built, solution and comparing them against experiences with the developed prototype, the paper presents a course of action for transforming the current solution into the proposed solution. This includes a general approach, consisting of four distinct steps, as well as specific action items that are to be performed for every step. The discussion also covers language and tool support and challenges arising from the transformation

    Doctor of Philosophy

    Get PDF
    dissertationWe propose a collective approach for harnessing the idle resources (cpu, storage, and bandwidth) of nodes (e.g., home desktops) distributed across the Internet. Instead of a purely peer-to-peer (P2P) approach, we organize participating nodes to act collectively using collective managers (CMs). Participating nodes provide idle resources to CMs, which unify these resources to run meaningful distributed services for external clients. We do not assume altruistic users or employ a barter-based incentive model; instead, participating nodes provide resources to CMs for long durations and are compensated in proportion to their contribution. In this dissertation we discuss the challenges faced by collective systems, present a design that addresses these challenges, and study the effect of selfish nodes. We believe that the collective service model is a useful alternative to the dominant pure P2P and centralized work queue models. It provides more effective utilization of idle resources, has a more meaningful economic model, and is better suited for building legal and commercial distributed services. We demonstrate the value of our work by building two distributed services using the collective approach. These services are a collective content distribution service and a collective data backup service

    Software test and evaluation study phase I and II : survey and analysis

    Get PDF
    Issued as Final report, Project no. G-36-661 (continues G-36-636; includes A-2568
    corecore