121 research outputs found

    An FPTAS for parallel-machine scheduling under a grade of service provision to minimize makespan

    Author name used in this publication: T. C. E. Cheng

    Optical Grid Network Dimensioning, Provisioning, and Job Scheduling

    An optical grid network reliably provides high-speed communications. It consists of grid resources (e.g., computing and data servers) and high-capacity data paths that connect geographically dispersed resources and users. One important issue is dimensioning optical grid networks, i.e., determining the link bandwidth, the amount of server resources, and the location of servers. Another issue is provisioning job requests (maximizing services) on the capacitated network, also referred to as Grade of Service (GoS). Additionally, job scheduling on the servers has an important impact on the utilization of computing and network resources. Dimensioning an optical grid network is based on Anycast Routing and Wavelength Assignment (ACRWA) with the objective of minimizing the resources (min-ACRWA). The objective of GoS is maximizing the number of accepted job requests (max-ACRWA) under limited resources. Given that users of such optical grid networks generally do not care about the exact physical locations of the server resources, a degree of freedom arises in choosing the most appropriate server location for each of their requests. We exploit this anycast routing principle: the source of the traffic is given, but the destination can be chosen rather freely. To provide resilience, traffic may be relocated to alternate destinations in case of network or server failures. This thesis investigates the dimensioning of optical grid networks and task scheduling. In the first part, we present link capacity dimensioning through scalable exact Integer Linear Programming (ILP) optimization models (min-ACRWA) with survivability. These models make a step-by-step transition from classical RWA (fixed destination) to the anycast routing principle, including a shared path protection scheme. In the second part, we present scalable optimization models for maximizing the IT services (max-ACRWA) subject to survivability mechanisms under limited link transport capacities. We also propose link capacity formulations based on the distance from the servers and on the traffic data set. In the third part, we jointly investigate link dimensioning and the location of servers in an optical grid, where the anycast routing principle is applied for resiliency under different levels of protection schemes. We propose three different decomposition schemes for the joint optimization of link dimensioning and server location. In the last part of this research, we propose exact task scheduling ILP formulations for optical grids (data centers). These formulations can also be used in advance reservation systems to allocate grid resources. The purpose of this study is to design efficient tools for the planning and management of optical grid networks.
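
    The max-ACRWA objective described above, maximizing accepted job requests under limited link capacities with a free choice of destination server per request, can be illustrated with a toy admission ILP. The sketch below uses Python with PuLP; the network, the precomputed candidate paths, and the unit-capacity accounting are illustrative assumptions, not the thesis's actual formulations.

```python
# A minimal sketch (not the thesis formulation) of an anycast job-admission ILP
# in the spirit of max-ACRWA: maximise the number of accepted job requests under
# link-capacity limits, with a free choice of destination server per request.
# All data below (jobs, servers, links, paths) are illustrative assumptions.
import pulp

jobs = ["j1", "j2", "j3"]          # job requests (source fixed)
servers = ["s1", "s2"]             # candidate server sites (anycast destinations)
links = {"e1": 2, "e2": 1}         # link -> capacity (e.g., wavelengths)

# Precomputed candidate path per (job, server): the set of links it uses.
paths = {
    ("j1", "s1"): {"e1"}, ("j1", "s2"): {"e2"},
    ("j2", "s1"): {"e1"}, ("j2", "s2"): {"e2"},
    ("j3", "s1"): {"e1", "e2"}, ("j3", "s2"): {"e2"},
}

prob = pulp.LpProblem("max_ACRWA_sketch", pulp.LpMaximize)
x = {k: pulp.LpVariable(f"x_{k[0]}_{k[1]}", cat="Binary") for k in paths}

# Objective: number of accepted requests.
prob += pulp.lpSum(x.values())

# Each job is served by at most one server (anycast: the destination is free).
for j in jobs:
    prob += pulp.lpSum(x[j, s] for s in servers if (j, s) in paths) <= 1

# Link capacities: each accepted request consumes one unit on every link of its path.
for e, cap in links.items():
    prob += pulp.lpSum(x[k] for k in paths if e in paths[k]) <= cap

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for k, var in x.items():
    if var.value() is not None and var.value() > 0.5:
        print(f"job {k[0]} -> server {k[1]}")
```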

    Just-in-time Hardware generation for abstracted reconfigurable computing

    This thesis addresses the use of reconfigurable hardware in computing platforms, in order to harness the performance benefits of dedicated hardware whilst maintaining the flexibility associated with software. Although the reconfigurable computing concept is not new, the low-level nature of the supporting tools normally used, together with the consequent limited level of abstraction and resultant lack of backwards compatibility, has prevented the widespread adoption of this technology. In addition, bandwidth and architectural limitations have seriously constrained the potential improvements in performance. A review of existing approaches and tool flows is conducted to highlight the current problems being faced in this field. The objective of the work presented in this thesis is to introduce a radically new approach to reconfigurable computing tool flows. The runtime-based tool flow introduces complete abstraction between the application developer and the underlying hardware. This new technique eliminates the ease-of-use and backwards-compatibility issues that have plagued the reconfigurable computing concept, and could pave the way for viable mainstream reconfigurable computing platforms. An easy-to-use, cycle-accurate behavioural modelling system is also presented, which was used extensively during the early exploration of new concepts and architectures. Some performance improvements produced by the new reconfigurable computing tool flow, when applied to both a MIPS-based embedded platform and the Cray XD1, are also presented. These results are then analyzed, and the hardware and software factors affecting the performance increases that were obtained are discussed, together with potential techniques that could be used to further increase the performance of the system. Lastly, a heterogeneous computing concept is proposed, in which a computer system containing multiple types of computational resource (e.g. DSPs, CPUs, FPGAs) is envisaged, each having its own strengths and weaknesses. A revolutionary new method of fully exploiting the potential of such a system, whilst maintaining scalability, backwards compatibility, and ease of use, is also presented.

    Constraint Programming-based Job Dispatching for Modern HPC Applications

    A High-Performance Computing job dispatcher is a critical piece of software that assigns the finite computing resources to submitted jobs. This resource assignment over time is known as the on-line job dispatching problem in HPC systems. Because the problem is on-line, solutions must be computed in real time, and the time they require must not exceed a threshold, so as not to affect normal system functioning. In addition, a job dispatcher must deal with a lot of uncertainty: submission times, the number of requested resources, and job durations. Heuristic-based techniques have been broadly used in HPC systems, obtaining (sub-)optimal solutions in a short time. However, their scheduling and resource allocation components are separated, which produces decoupled decisions that may cause a performance loss. Optimization-based techniques are less used for this problem, although they can significantly improve the performance of HPC systems at the expense of higher computation time. Nowadays, HPC systems are being used for modern applications, such as big data analytics and predictive model building, which in general employ many short jobs. However, this information is unknown at dispatching time, and job dispatchers need to process large numbers of such jobs quickly while ensuring high Quality-of-Service (QoS) levels. Constraint Programming (CP) has been shown to be an effective approach to tackle job dispatching problems. However, state-of-the-art CP-based job dispatchers are unable to satisfy the challenges of on-line dispatching, such as generating dispatching decisions within a short time and integrating current and past information about the hosting system. For these reasons, we propose CP-based dispatchers that are more suitable for HPC systems running modern applications: they generate on-line dispatching decisions within an appropriate time and make effective use of job duration predictions to improve QoS levels, especially for workloads dominated by short jobs.
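
    As a rough illustration of the kind of CP model such a dispatcher solves, the sketch below schedules a handful of jobs with predicted durations and core demands on a shared pool of cores, using Google OR-Tools CP-SAT as an assumed solver choice; the data, the time limit, and the QoS objective are illustrative, not taken from the thesis.

```python
# A minimal sketch of a CP-based dispatching model: jobs with predicted durations
# and core demands share a cumulative pool of cores; the objective is a simple
# QoS proxy (total waiting). Solver choice and data are illustrative assumptions.
from ortools.sat.python import cp_model

jobs = {            # name: (predicted_duration, requested_cores)
    "a": (3, 2),
    "b": (1, 4),
    "c": (2, 2),
}
total_cores = 4
horizon = sum(d for d, _ in jobs.values())

model = cp_model.CpModel()
starts, intervals, demands = {}, [], []
for name, (dur, cores) in jobs.items():
    s = model.NewIntVar(0, horizon, f"start_{name}")
    e = model.NewIntVar(0, horizon, f"end_{name}")
    intervals.append(model.NewIntervalVar(s, dur, e, f"iv_{name}"))
    demands.append(cores)
    starts[name] = s

# Cores are a shared cumulative resource: concurrent demand may not exceed capacity.
model.AddCumulative(intervals, demands, total_cores)

# QoS proxy: minimise total waiting (sum of start times), which favours starting
# short jobs quickly.
model.Minimize(sum(starts.values()))

solver = cp_model.CpSolver()
solver.parameters.max_time_in_seconds = 1.0   # on-line setting: bounded solve time
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for name in jobs:
        print(name, "starts at", solver.Value(starts[name]))
```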

    Applications Development for the Computational Grid


    The 1988 Goddard Conference on Space Applications of Artificial Intelligence

    This publication comprises the papers presented at the 1988 Goddard Conference on Space Applications of Artificial Intelligence held at the NASA/Goddard Space Flight Center, Greenbelt, Maryland on May 24, 1988. The purpose of this annual conference is to provide a forum in which current research and development directed at space applications of artificial intelligence can be presented and discussed. The papers in these proceedings fall into the following areas: mission operations support, planning and scheduling; fault isolation/diagnosis; image processing and machine vision; data management; modeling and simulation; and development tools/methodologies.

    Automatic Scaling in Cloud Computing

    This dissertation thesis deals with automatic scaling in cloud computing, mainly focusing on the performance of interactive workloads, that is, web servers and services running in an elastic cloud environment. In the first part of the thesis, the possibility of forecasting the daily curve of workload is evaluated using long-range seasonal techniques of statistical time series analysis. The accuracy is high enough to enable either green computing or filling the unused capacity with batch jobs, hence the need for long-range forecasts. The second part focuses on simulations of automatic scaling, which is necessary for the interactive workload to actually free up space when it is not being utilized at peak capacity. Cloud users are mostly scared of letting a machine control their servers, which is why realistic simulations are needed. We have explored two methods: event-driven simulation and queue-theoretic models. During work on the first, we extended the widely used CloudSim simulation package to be able to dynamically scale the simulation setup at run time, and we corrected its engine using knowledge from queueing theory. Our own simulator then relies solely on theoretical models, making it much more precise and much faster than the more general CloudSim. The tools from the two parts together constitute the theoretical foundation which, once implemented in practice, can help leverage cloud technology to actually increase the efficiency of data center hardware. In particular, the main contributions of the dissertation thesis are as follows: 1. A new methodology for forecasting time series of web server load and its validation. 2. An extension of the often-used simulator CloudSim for interactive load and an increase in the accuracy of its output. 3. The design and implementation of a fast and accurate simulator of automatic scaling using queueing theory.
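
    To illustrate the queue-theoretic side of the autoscaling problem, the sketch below applies a standard M/M/c (Erlang C) model to pick the smallest number of servers that keeps the mean response time under a target, given a forecast arrival rate. The model choice and all numbers are illustrative assumptions, not taken from the thesis.

```python
# A minimal sketch of a queue-theoretic autoscaling rule: given a forecast
# arrival rate, choose the smallest number of M/M/c servers whose mean response
# time stays below a target. Model and parameters are illustrative assumptions.
from math import factorial

def erlang_c(c: int, offered_load: float) -> float:
    """Probability that an arriving request has to wait in an M/M/c queue."""
    if offered_load >= c:
        return 1.0                                  # unstable: everyone waits
    summation = sum(offered_load**k / factorial(k) for k in range(c))
    top = offered_load**c / factorial(c) * c / (c - offered_load)
    return top / (summation + top)

def mean_response_time(c: int, arrival_rate: float, service_rate: float) -> float:
    a = arrival_rate / service_rate                 # offered load in Erlangs
    wait = erlang_c(c, a) / (c * service_rate - arrival_rate)
    return wait + 1.0 / service_rate                # queueing delay + service time

def servers_needed(arrival_rate: float, service_rate: float, target: float) -> int:
    c = max(1, int(arrival_rate / service_rate) + 1)   # smallest stable size
    while mean_response_time(c, arrival_rate, service_rate) > target:
        c += 1
    return c

# Example: the forecast predicts 120 req/s at the daily peak, one server handles
# 10 req/s, and we want a mean response time below 150 ms.
print(servers_needed(arrival_rate=120.0, service_rate=10.0, target=0.150))
```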

    Adaptive Computing Systems for Aerospace

    Today's computer systems are growing more and more complex at a pace that requires the development of novel and more effective methodologies to automate their design. Space, in particular, represents a challenging environment: without protection from ionizing and particle radiation, CMOS-based electronics are subject to transient faults, performance degradation, accelerated wear, and, ultimately, system failure. Traditional approaches adopted to guarantee reliability and extended lifetime are based on redundancy that is established at design time. These solutions are expensive and sometimes inefficient, as they increase the complexity and size of a system, exposing it to higher risks of overheating and radiation-induced errors. Moreover, critical systems (e.g., time-constrained ones and those where access is limited) must be able to cope with pivotal situations without relying on human intervention. Hence the emerging interest in computer systems with adaptive capabilities as the most suitable solution for novel high-performance embedded devices for aerospace.
Self-adaptive computing carries unmatched potential and great promise for the creation of a new generation of smart, more reliable computers, and it addresses the challenge of designing and programming modern and future computer systems that must meet conflicting goals. Drawing from the fields of artificial intelligence and reconfigurable systems, we aim at developing self-adaptive computer systems for aerospace. Our goal is to improve their efficiency, fault tolerance, and computational capabilities. The first step in this research is the experimental analysis of the most popular multi-objective design-space exploration algorithms for high-level design. These algorithms were collected from the recent literature and include heuristic, evolutionary, and statistical methods. Their comparison provides insights that we use to define guidelines for the choice of the most appropriate optimization algorithms, given the features of the design space. For the creation of a self-managing optimization framework, enabling the adaptive trade-off of multiple objectives, we leverage the tools of probabilistic graphical models. We introduce a mechanism based on dynamic hidden Markov models that balances the availability and lifetime of multiprocessor systems. This is achieved by estimating the occurrence of permanent faults amid transient faults, and by dynamically migrating the computation to spare resources when a failure occurs. The dynamic nature of the model makes it adjustable to different mission profiles and fault rates. The results show that we are able to lead systems to extended lifetimes, while keeping their availability close to ideal. On account of the stringent timing constraints imposed by aerospace systems, we then investigate the optimization of fault tolerance under real-time requirements. We propose a methodology to improve the reliability of computation in the presence of transient errors when considering the mapping of real-time tasks on a homogeneous multiprocessor system with voltage and frequency scaling capabilities. In this framework, we take advantage of probability theory to define a novel trade-off between power consumption and fault tolerance. As we recognize that resilience is a pervasive property of interest (e.g., for the design and analysis of generic complex systems), we adapt a formal definition of it to a probabilistic framework, again derived from hidden Markov models. This allows us to realistically model the stochastic evolution and partial observability of complex real-world environments. Within this framework, we propose an efficient algorithm for the exact computation of the essential inference step required for generic property checking. To demonstrate the flexibility of this approach, we validate it in the context, among others, of a self-aware, reconfigurable computing system for aerospace. Finally, we move the scope of our research towards robotics and multi-agent systems, two topics of growing popularity in space exploration. We tackle the problem of connectivity assessment and maintenance in the distributed and self-adaptive context of swarm robotics. We review the limitations of existing solutions and propose a novel methodology to create connected complex geometries for multiple task coverage. Additional contributions in the areas of (i) CubeSat design, (ii) the modelling of space radiation for FPGA fault injection, and (iii) probabilistic timing analysis for real-time systems are summarized in the appendices.
In the author's opinion, this research provides a number of useful stepping stones for the creation of a new generation of computing systems that autonomously and reliably perform their tasks for longer periods of time, fostering simpler and cheaper space exploration.
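
    The permanent-versus-transient fault estimation mentioned above can be illustrated with a small hidden-Markov-model filter: a two-state model (healthy vs. permanently faulty) updated with per-window fault observations via the forward algorithm. The states, probabilities, and observations below are illustrative assumptions, not the thesis's actual model.

```python
# A minimal sketch of hidden-Markov-model filtering for fault classification:
# estimate whether a processing element has developed a permanent fault from a
# stream of per-window fault observations, so that computation can be migrated.
# Hidden states: 0 = healthy (only transient faults), 1 = permanently faulty.
# All probabilities and observations are illustrative assumptions.
TRANSITION = [[0.999, 0.001],   # healthy mostly stays healthy, rarely degrades
              [0.0,   1.0]]     # a permanent fault does not heal
EMISSION = [[0.95, 0.05],       # healthy: a fault is observed in 5% of windows
            [0.10, 0.90]]       # permanently faulty: a fault in 90% of windows
PRIOR = [0.99, 0.01]

def permanent_fault_probability(observations):
    """Forward-algorithm filter: P(permanent fault | observations so far)."""
    belief = PRIOR[:]
    for obs in observations:                       # obs: 0 = no fault, 1 = fault
        # Predict: propagate the belief through the transition model.
        predicted = [sum(belief[i] * TRANSITION[i][j] for i in range(2))
                     for j in range(2)]
        # Update: weight by the observation likelihood, then normalise.
        unnorm = [predicted[j] * EMISSION[j][obs] for j in range(2)]
        total = sum(unnorm)
        belief = [p / total for p in unnorm]
    return belief[1]

# A run of repeated faults pushes the belief towards "permanent", which would
# trigger migration of the computation to spare resources.
print(permanent_fault_probability([0, 0, 1, 1, 1, 1]))
```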