121 research outputs found

    An FPTAS for parallel-machine scheduling under a grade of service provision to minimize makespan

    Author name used in this publication: T. C. E. Cheng

    Optical Grid Network Dimensioning, Provisioning, and Job Scheduling

    An optical grid network reliably provides high-speed communications. It consists of grid resources (e.g., computing and data servers) and high-capacity data paths that connect geographically dispersed resources and users. One important issue is dimensioning optical grid networks, i.e., determining the link bandwidth, the amount of server resources, and the location of servers. Another issue is provisioning job requests (maximizing services) on the capacitated network, also referred to as Grade of Service (GoS). Additionally, job scheduling on the servers has an important impact on the utilization of computing and network resources. Dimensioning an optical grid network is based on Anycast Routing and Wavelength Assignment (ACRWA) with the objective of minimizing the resources (min-ACRWA). The objective of GoS is maximizing the number of accepted job requests (max-ACRWA) under limited resources. Given that users of such optical grid networks generally do not care about the exact physical locations of the server resources, a degree of freedom arises in choosing the most appropriate server location for each of their requests. We exploit this anycast routing principle: the source of the traffic is given, but the destination can be chosen rather freely. To provide resilience, traffic may be relocated to alternate destinations in case of network or server failures. This thesis investigates the dimensioning of optical grid networks and task scheduling. In the first part, we present link capacity dimensioning through scalable exact Integer Linear Programming (ILP) optimization models (min-ACRWA) with survivability. These models make a step-by-step transition from classical RWA (fixed destination) to the anycast routing principle, including a shared path protection scheme. In the second part, we present scalable optimization models for maximizing the IT services (max-ACRWA) subject to survivability mechanisms under limited link transport capacities. We also propose link capacity formulations based on the distance from the servers and on the traffic data set. In the third part, we jointly investigate link dimensioning and the location of servers in an optical grid, where the anycast routing principle is applied for resiliency under different levels of protection schemes. We propose three different decomposition schemes for the joint optimization of link dimensioning and server location. In the last part of this research, we propose exact task scheduling ILP formulations for optical grids (data centers). These formulations can also be used in advance reservation systems to allocate grid resources. The purpose of this study is to design efficient tools for the planning and management of optical grid networks.
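
    The max-ACRWA objective described above, maximizing accepted job requests under limited link capacities with a free choice of destination server per request, can be illustrated with a toy admission ILP. The sketch below uses Python with PuLP; the network, the precomputed candidate paths, and the unit-capacity accounting are illustrative assumptions, not the thesis's actual formulations.

```python
# A minimal sketch (not the thesis formulation) of an anycast job-admission ILP
# in the spirit of max-ACRWA: maximise the number of accepted job requests under
# link-capacity limits, with a free choice of destination server per request.
# All data below (jobs, servers, links, paths) are illustrative assumptions.
import pulp

jobs = ["j1", "j2", "j3"]          # job requests (source fixed)
servers = ["s1", "s2"]             # candidate server sites (anycast destinations)
links = {"e1": 2, "e2": 1}         # link -> capacity (e.g., wavelengths)

# Precomputed candidate path per (job, server): the set of links it uses.
paths = {
    ("j1", "s1"): {"e1"}, ("j1", "s2"): {"e2"},
    ("j2", "s1"): {"e1"}, ("j2", "s2"): {"e2"},
    ("j3", "s1"): {"e1", "e2"}, ("j3", "s2"): {"e2"},
}

prob = pulp.LpProblem("max_ACRWA_sketch", pulp.LpMaximize)
x = {k: pulp.LpVariable(f"x_{k[0]}_{k[1]}", cat="Binary") for k in paths}

# Objective: number of accepted requests.
prob += pulp.lpSum(x.values())

# Each job is served by at most one server (anycast: the destination is free).
for j in jobs:
    prob += pulp.lpSum(x[j, s] for s in servers if (j, s) in paths) <= 1

# Link capacities: each accepted request consumes one unit on every link of its path.
for e, cap in links.items():
    prob += pulp.lpSum(x[k] for k in paths if e in paths[k]) <= cap

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for k, var in x.items():
    if var.value() is not None and var.value() > 0.5:
        print(f"job {k[0]} -> server {k[1]}")
```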

    Just-in-time Hardware generation for abstracted reconfigurable computing

    This thesis addresses the use of reconfigurable hardware in computing platforms, in order to harness the performance benefits of dedicated hardware whilst maintaining the flexibility associated with software. Although the reconfigurable computing concept is not new, the low-level nature of the supporting tools normally used, together with the consequent limited level of abstraction and resultant lack of backwards compatibility, has prevented the widespread adoption of this technology. In addition, bandwidth and architectural limitations have seriously constrained the potential improvements in performance. A review of existing approaches and tool flows is conducted to highlight the current problems being faced in this field. The objective of the work presented in this thesis is to introduce a radically new approach to reconfigurable computing tool flows. The runtime-based tool flow introduces complete abstraction between the application developer and the underlying hardware. This new technique eliminates the ease-of-use and backwards-compatibility issues that have plagued the reconfigurable computing concept, and could pave the way for viable mainstream reconfigurable computing platforms. An easy-to-use, cycle-accurate behavioural modelling system is also presented, which was used extensively during the early exploration of new concepts and architectures. Some performance improvements produced by the new reconfigurable computing tool flow, when applied to both a MIPS-based embedded platform and the Cray XD1, are also presented. These results are then analyzed, and the hardware and software factors affecting the performance increases that were obtained are discussed, together with potential techniques that could be used to further increase the performance of the system. Lastly, a heterogeneous computing concept is proposed, in which a computer system containing multiple types of computational resource (e.g. DSPs, CPUs, FPGAs) is envisaged, each having its own strengths and weaknesses. A revolutionary new method of fully exploiting the potential of such a system, whilst maintaining scalability, backwards compatibility, and ease of use, is also presented.

    Constraint Programming-based Job Dispatching for Modern HPC Applications

    A High-Performance Computing job dispatcher is a critical piece of software that assigns the finite computing resources to submitted jobs. This resource assignment over time is known as the on-line job dispatching problem in HPC systems. Because the problem is on-line, solutions must be computed in real time, and the time they require must not exceed a threshold, so as not to affect normal system functioning. In addition, a job dispatcher must deal with a lot of uncertainty: submission times, the number of requested resources, and job durations. Heuristic-based techniques have been broadly used in HPC systems, obtaining (sub-)optimal solutions in a short time. However, their scheduling and resource allocation components are separated, which produces decoupled decisions that may cause a performance loss. Optimization-based techniques are less used for this problem, although they can significantly improve the performance of HPC systems at the expense of higher computation time. Nowadays, HPC systems are being used for modern applications, such as big data analytics and predictive model building, which in general employ many short jobs. However, this information is unknown at dispatching time, and job dispatchers need to process large numbers of such jobs quickly while ensuring high Quality-of-Service (QoS) levels. Constraint Programming (CP) has been shown to be an effective approach to tackle job dispatching problems. However, state-of-the-art CP-based job dispatchers are unable to satisfy the challenges of on-line dispatching, such as generating dispatching decisions within a short time and integrating current and past information about the hosting system. For these reasons, we propose CP-based dispatchers that are more suitable for HPC systems running modern applications: they generate on-line dispatching decisions within an appropriate time and make effective use of job duration predictions to improve QoS levels, especially for workloads dominated by short jobs.
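
    As a rough illustration of the kind of CP model such a dispatcher solves, the sketch below schedules a handful of jobs with predicted durations and core demands on a shared pool of cores, using Google OR-Tools CP-SAT as an assumed solver choice; the data, the time limit, and the QoS objective are illustrative, not taken from the thesis.

```python
# A minimal sketch of a CP-based dispatching model: jobs with predicted durations
# and core demands share a cumulative pool of cores; the objective is a simple
# QoS proxy (total waiting). Solver choice and data are illustrative assumptions.
from ortools.sat.python import cp_model

jobs = {            # name: (predicted_duration, requested_cores)
    "a": (3, 2),
    "b": (1, 4),
    "c": (2, 2),
}
total_cores = 4
horizon = sum(d for d, _ in jobs.values())

model = cp_model.CpModel()
starts, intervals, demands = {}, [], []
for name, (dur, cores) in jobs.items():
    s = model.NewIntVar(0, horizon, f"start_{name}")
    e = model.NewIntVar(0, horizon, f"end_{name}")
    intervals.append(model.NewIntervalVar(s, dur, e, f"iv_{name}"))
    demands.append(cores)
    starts[name] = s

# Cores are a shared cumulative resource: concurrent demand may not exceed capacity.
model.AddCumulative(intervals, demands, total_cores)

# QoS proxy: minimise total waiting (sum of start times), which favours starting
# short jobs quickly.
model.Minimize(sum(starts.values()))

solver = cp_model.CpSolver()
solver.parameters.max_time_in_seconds = 1.0   # on-line setting: bounded solve time
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for name in jobs:
        print(name, "starts at", solver.Value(starts[name]))
```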

    Applications Development for the Computational Grid


    The 1988 Goddard Conference on Space Applications of Artificial Intelligence

    This publication comprises the papers presented at the 1988 Goddard Conference on Space Applications of Artificial Intelligence held at the NASA/Goddard Space Flight Center, Greenbelt, Maryland on May 24, 1988. The purpose of this annual conference is to provide a forum in which current research and development directed at space applications of artificial intelligence can be presented and discussed. The papers in these proceedings fall into the following areas: mission operations support, planning and scheduling; fault isolation/diagnosis; image processing and machine vision; data management; modeling and simulation; and development tools/methodologies.

    Automatic Scaling in Cloud Computing

    This dissertation thesis deals with automatic scaling in cloud computing, mainly focusing on the performance of interactive workloads, that is, web servers and services running in an elastic cloud environment. In the first part of the thesis, the possibility of forecasting the daily curve of workload is evaluated using long-range seasonal techniques of statistical time series analysis. The accuracy is high enough to enable either green computing or filling the unused capacity with batch jobs, hence the need for long-range forecasts. The second part focuses on simulations of automatic scaling, which is necessary for the interactive workload to actually free up space when it is not being utilized at peak capacity. Cloud users are mostly scared of letting a machine control their servers, which is why realistic simulations are needed. We have explored two methods: event-driven simulation and queue-theoretic models. During work on the first, we extended the widely used CloudSim simulation package to be able to dynamically scale the simulation setup at run time, and we corrected its engine using knowledge from queueing theory. Our own simulator then relies solely on theoretical models, making it much more precise and much faster than the more general CloudSim. The tools from the two parts together constitute the theoretical foundation which, once implemented in practice, can help leverage cloud technology to actually increase the efficiency of data center hardware. In particular, the main contributions of the dissertation thesis are as follows: 1. A new methodology for forecasting time series of web server load and its validation. 2. An extension of the often-used simulator CloudSim for interactive load and an increase in the accuracy of its output. 3. The design and implementation of a fast and accurate simulator of automatic scaling using queueing theory.
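
    To illustrate the queue-theoretic side of the autoscaling problem, the sketch below applies a standard M/M/c (Erlang C) model to pick the smallest number of servers that keeps the mean response time under a target, given a forecast arrival rate. The model choice and all numbers are illustrative assumptions, not taken from the thesis.

```python
# A minimal sketch of a queue-theoretic autoscaling rule: given a forecast
# arrival rate, choose the smallest number of M/M/c servers whose mean response
# time stays below a target. Model and parameters are illustrative assumptions.
from math import factorial

def erlang_c(c: int, offered_load: float) -> float:
    """Probability that an arriving request has to wait in an M/M/c queue."""
    if offered_load >= c:
        return 1.0                                  # unstable: everyone waits
    summation = sum(offered_load**k / factorial(k) for k in range(c))
    top = offered_load**c / factorial(c) * c / (c - offered_load)
    return top / (summation + top)

def mean_response_time(c: int, arrival_rate: float, service_rate: float) -> float:
    a = arrival_rate / service_rate                 # offered load in Erlangs
    wait = erlang_c(c, a) / (c * service_rate - arrival_rate)
    return wait + 1.0 / service_rate                # queueing delay + service time

def servers_needed(arrival_rate: float, service_rate: float, target: float) -> int:
    c = max(1, int(arrival_rate / service_rate) + 1)   # smallest stable size
    while mean_response_time(c, arrival_rate, service_rate) > target:
        c += 1
    return c

# Example: the forecast predicts 120 req/s at the daily peak, one server handles
# 10 req/s, and we want a mean response time below 150 ms.
print(servers_needed(arrival_rate=120.0, service_rate=10.0, target=0.150))
```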

    Adaptive Computing Systems for Aerospace

    Today's computer systems are growing more and more complex at a pace that requires the development of novel and more effective methodologies to automate their design. Space, in particular, represents a challenging environment: without protection from ionizing and particle radiation, CMOS-based electronics are subject to transient faults, performance degradation, accelerated wear, and, ultimately, system failure. Traditional approaches adopted to guarantee reliability and extended lifetime are based on redundancy that is established at design time. These solutions are expensive and sometimes inefficient, as they increase the complexity and size of a system, exposing it to higher risks of overheating and radiation-induced errors. Moreover, critical systems (e.g., time-constrained ones and those where access is limited) must be able to cope with pivotal situations without relying on human intervention. Hence the emerging interest in computer systems with adaptive capabilities as the most suitable solution for novel high-performance embedded devices for aerospace.
Self-adaptive computing carries unmatched potential and great promise for the creation of a new generation of smart, more reliable computers, and it addresses the challenge of designing and programming modern and future computer systems that must meet conflicting goals. Drawing from the fields of artificial intelligence and reconfigurable systems, we aim at developing self-adaptive computer systems for aerospace. Our goal is to improve their efficiency, fault tolerance, and computational capabilities. The first step in this research is the experimental analysis of the most popular multi-objective design-space exploration algorithms for high-level design. These algorithms were collected from the recent literature and include heuristic, evolutionary, and statistical methods. Their comparison provides insights that we use to define guidelines for the choice of the most appropriate optimization algorithms, given the features of the design space. For the creation of a self-managing optimization framework, enabling the adaptive trade-off of multiple objectives, we leverage the tools of probabilistic graphical models. We introduce a mechanism based on dynamic hidden Markov models that balances the availability and lifetime of multiprocessor systems. This is achieved by estimating the occurrence of permanent faults amid transient faults, and by dynamically migrating the computation to spare resources when a failure occurs. The dynamic nature of the model makes it adjustable to different mission profiles and fault rates. The results show that we are able to lead systems to extended lifetimes, while keeping their availability close to ideal. On account of the stringent timing constraints imposed by aerospace systems, we then investigate the optimization of fault tolerance under real-time requirements. We propose a methodology to improve the reliability of computation in the presence of transient errors when considering the mapping of real-time tasks on a homogeneous multiprocessor system with voltage and frequency scaling capabilities. In this framework, we take advantage of probability theory to define a novel trade-off between power consumption and fault tolerance. As we recognize that resilience is a pervasive property of interest (e.g., for the design and analysis of generic complex systems), we adapt a formal definition of it to a probabilistic framework, again derived from hidden Markov models. This allows us to realistically model the stochastic evolution and partial observability of complex real-world environments. Within this framework, we propose an efficient algorithm for the exact computation of the essential inference step required for generic property checking. To demonstrate the flexibility of this approach, we validate it in the context, among others, of a self-aware, reconfigurable computing system for aerospace. Finally, we move the scope of our research towards robotics and multi-agent systems, two topics of growing popularity in space exploration. We tackle the problem of connectivity assessment and maintenance in the distributed and self-adaptive context of swarm robotics. We review the limitations of existing solutions and propose a novel methodology to create connected complex geometries for multiple task coverage. Additional contributions in the areas of (i) CubeSat design, (ii) the modelling of space radiation for FPGA fault injection, and (iii) probabilistic timing analysis for real-time systems are summarized in the appendices.
In the author's opinion, this research provides a number of useful stepping stones for the creation of a new generation of computing systems that autonomously and reliably perform their tasks for longer periods of time, fostering simpler and cheaper space exploration.
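
    The permanent-versus-transient fault estimation mentioned above can be illustrated with a small hidden-Markov-model filter: a two-state model (healthy vs. permanently faulty) updated with per-window fault observations via the forward algorithm. The states, probabilities, and observations below are illustrative assumptions, not the thesis's actual model.

```python
# A minimal sketch of hidden-Markov-model filtering for fault classification:
# estimate whether a processing element has developed a permanent fault from a
# stream of per-window fault observations, so that computation can be migrated.
# Hidden states: 0 = healthy (only transient faults), 1 = permanently faulty.
# All probabilities and observations are illustrative assumptions.
TRANSITION = [[0.999, 0.001],   # healthy mostly stays healthy, rarely degrades
              [0.0,   1.0]]     # a permanent fault does not heal
EMISSION = [[0.95, 0.05],       # healthy: a fault is observed in 5% of windows
            [0.10, 0.90]]       # permanently faulty: a fault in 90% of windows
PRIOR = [0.99, 0.01]

def permanent_fault_probability(observations):
    """Forward-algorithm filter: P(permanent fault | observations so far)."""
    belief = PRIOR[:]
    for obs in observations:                       # obs: 0 = no fault, 1 = fault
        # Predict: propagate the belief through the transition model.
        predicted = [sum(belief[i] * TRANSITION[i][j] for i in range(2))
                     for j in range(2)]
        # Update: weight by the observation likelihood, then normalise.
        unnorm = [predicted[j] * EMISSION[j][obs] for j in range(2)]
        total = sum(unnorm)
        belief = [p / total for p in unnorm]
    return belief[1]

# A run of repeated faults pushes the belief towards "permanent", which would
# trigger migration of the computation to spare resources.
print(permanent_fault_probability([0, 0, 1, 1, 1, 1]))
```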