52 research outputs found
Critical Path Scheduling Parallel Programs on an Unbounded Number of Processors
In this paper we present an efficient algorithm, called Dynamic Critical Path Scheduling (DCPS), for compile-time scheduling and clustering of parallel programs onto distributed-memory parallel processing systems. DCPS is superior to several other algorithms from the literature in terms of computational complexity, processor consumption, and solution quality. DCPS has a time complexity of O(e + v log v), as opposed to the O((e + v) log v) of the DSC algorithm, which is the best known algorithm. Experimental results demonstrate the superiority of DCPS over the DSC algorithm.
A near optimal algorithm for lifetime optimization in wireless sensor networks
A problem that has received a lot of interest in wireless sensor networks (WSN) is lifetime optimization. In a WSN each sensor node is battery powered, and in many cases it is not convenient to recharge or replace the batteries, especially in remote and hostile environments. In this paper, we introduce an efficient energy-aware algorithm to extend the lifetime of a WSN by i) organizing/clustering the sensor nodes into disjoint cover sets, where each cover set is capable of monitoring all the targets of the region of interest, and ii) scheduling these cover sets successively/periodically. This study differs from previous works for the following reasons: i) it achieves near-optimal solutions compared to the optimal ones obtained by the exact method and ii) unlike existing algorithms that construct cover sets gradually, one after the other, our algorithm builds the different sets in parallel. At each step of the clustering process, the algorithm attempts to add to each cover set a sensor capable of monitoring the most critical target (a critical target is defined to be the one covered by the smallest set of sensors). The choice of a sensor to be placed/clustered in each cover set is based on solving a linear assignment problem. The proposed algorithm provides a lower bound Kmin on the optimal number of disjoint cover sets Kopt. Intuitively, the upper bound Kmax on the optimal value is given by the size of the smallest set of sensors covering a target. We deduce Kopt by performing a binary search procedure. At each step of the binary search, we check whether there exists a partition of the sensors into K cover sets by solving an integer programming problem. Simulation results show the efficiency of our algorithm.
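The bounds and the binary search described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `coverage` dictionary, the function names, and the brute-force feasibility check (standing in for the integer program mentioned in the abstract) are all assumptions made for illustration, and the check is only practical for a handful of sensors. The sketch also assumes every target is covered by at least one sensor, so that K = 1 is always feasible.

```python
from itertools import product

def critical_target(coverage):
    """Return the target covered by the fewest sensors."""
    return min(coverage, key=lambda t: len(coverage[t]))

def k_max(coverage):
    """Upper bound: size of the smallest set of sensors covering a target."""
    return min(len(sensors) for sensors in coverage.values())

def has_k_disjoint_covers(coverage, k):
    """Brute-force stand-in for the integer program (hypothetical helper).

    Tries every assignment of sensors to k groups and accepts one in which
    every target is monitored by at least one sensor of every group.
    """
    sensors = sorted(set().union(*coverage.values()))
    all_groups = set(range(k))
    for assignment in product(range(k), repeat=len(sensors)):
        group = dict(zip(sensors, assignment))
        if all({group[s] for s in covering} == all_groups
               for covering in coverage.values()):
            return True
    return False

def k_opt(coverage, k_min=1):
    """Binary search for the largest feasible K in [k_min, k_max].

    Assumes k_min is feasible. Feasibility is monotone: merging two cover
    sets of a K-partition yields a valid (K-1)-partition.
    """
    lo, hi = k_min, k_max(coverage)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_k_disjoint_covers(coverage, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

# Hypothetical instance: each target maps to the sensors that can monitor it.
coverage = {"t1": {"a", "b"}, "t2": {"b", "c"}, "t3": {"a", "c"}}
print(critical_target(coverage), k_max(coverage), k_opt(coverage))
```

In this small instance the upper bound is 2, yet no partition into two cover sets exists (the three pairwise constraints cannot be met with two groups), so the search settles on 1. This shows why the upper bound alone does not give the optimum.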
Iso-Level CAFT: How to Tackle the Combination of Communication Overhead Reduction and Fault Tolerance Scheduling
To schedule precedence task graphs in a more realistic framework, we introduce an efficient fault-tolerant scheduling algorithm that is both contention-aware and capable of supporting arbitrary fail-silent (fail-stop) processor failures. The design of the proposed algorithm, which we call Iso-Level CAFT, is motivated by (i) the search for a better load balance and (ii) the generation of fewer communications. These goals are achieved by scheduling a chunk of ready tasks simultaneously, which gives a global view of the potential communications. Our goal is to minimize the total execution time, or latency, while tolerating an arbitrary number of processor failures. Our approach is based on an active replication scheme to mask failures, so that there is no need to detect and handle such failures. Major achievements include a low complexity and a drastic reduction in the number of additional communications induced by the replication mechanism. The experimental results demonstrate the usefulness of Iso-Level CAFT.
Fault Tolerant Scheduling of Precedence Task Graphs on Heterogeneous Platforms
Fault tolerance and latency are important requirements in several applications that are time-critical in nature: such applications require guarantees in terms of latency, even when processors are subject to failures. In this paper, we propose a fault-tolerant scheduling heuristic for mapping precedence task graphs onto heterogeneous systems. Our approach is based on an active replication scheme capable of supporting arbitrary fail-silent (fail-stop) processor failures; hence valid results will be provided even if processors fail. We focus on a bi-criteria approach, where we aim at minimizing the latency given a fixed number of failures supported in the system, or the other way round. Major achievements include a low complexity and a drastic reduction in the number of additional communications induced by the replication mechanism. Experimental results demonstrate that our heuristics, despite their lower complexity, outperform their direct competitor, the FTBAR scheduling algorithm [8].
Task Scheduling Algorithms in a Cloud Environment (Algorithmes d’ordonnancement des tâches dans un environnement Cloud)
Large-scale distributed systems such as Grids or Clouds [8] are fundamentally dynamic and unstable, and it is realistic to assume that some resources will fail while in use. The failure of one resource can affect the entire execution of applications that require several resources to be available at the same time. To manage large-scale dynamic platforms, one must turn to decentralized scheduling and load-balancing algorithms, so that the system can scale without the performance of the platform being limited by that of the node in charge of scheduling. In this paper, we present a state of the art of scheduling and load-balancing algorithms designed for Clouds. As a synthesis, we propose a classification of these algorithms based on criteria and dimensions that we defined for this purpose.
Reliable diagnostics using wireless sensor networks
Monitoring activities in industry may require the use of wireless sensor networks, for instance because of difficult access or a hostile environment. It is well known, however, that this type of network has various limitations, such as the amount of available energy. Once a sensor node exhausts its resources, it drops out of the network and stops forwarding information about possibly relevant features towards the sink. This results in broken links and data loss, which degrade the diagnostic accuracy at the sink level. It is therefore important to maintain the network's monitoring service as long as possible by preserving the energy held by the nodes. As packet transfer consumes the most energy compared with other activities in the network, various topologies are usually implemented in wireless sensor networks to increase the network lifetime. In this paper, we emphasize that it is more difficult to perform a good diagnostic when data are gathered by a wireless sensor network instead of a wired one, due to broken links and data loss on the one hand, and deployed network topologies on the other hand. Three strategies are considered to reduce packet transfers: (1) sensor nodes send their data directly to the sink, (2) nodes are divided into clusters, and the cluster heads send the average of their clusters directly to the sink, and (3) averaged data are sent from cluster head to cluster head in a hop-by-hop mode, leading to an avalanche of averages. Their impact on the diagnostic accuracy is then evaluated. We show that the use of random forests is relevant for diagnostics when data are aggregated through the network and when sensors stop transmitting their values once their batteries are empty. This relevance is discussed qualitatively and evaluated numerically by comparing the performance of random forests to state-of-the-art PHM approaches, namely basic bagging of decision trees, support vector machines, multinomial naive Bayes, AdaBoost, and gradient boosting. Finally, a way to couple the two best methods, random forests and gradient boosting, is proposed: the best hyperparameters of the former are found by using the latter.
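The three packet-reduction strategies listed in this abstract can be contrasted with a small Python sketch showing what the sink would receive under each. The abstract does not specify how averages are combined in the hop-by-hop mode, so `hop_by_hop` below is only one possible reading (each cluster head averages its own cluster mean with the value received from the previous head). All function names and the sample readings are hypothetical.

```python
from statistics import mean

def direct(clusters):
    """Strategy (1): every sensor sends its own reading to the sink."""
    return [reading for cluster in clusters for reading in cluster]

def cluster_average(clusters):
    """Strategy (2): each cluster head sends its cluster mean to the sink."""
    return [mean(cluster) for cluster in clusters]

def hop_by_hop(clusters):
    """Strategy (3), assumed reading: heads form a chain towards the sink.

    Each head averages its cluster mean with the value received from the
    previous head, so the sink receives a single "average of averages".
    """
    carried = None
    for cluster in clusters:  # ordered from farthest to nearest the sink
        local = mean(cluster)
        carried = local if carried is None else mean([carried, local])
    return carried

# Hypothetical readings, grouped by cluster.
clusters = [[1.0, 3.0], [4.0, 6.0], [10.0]]
print(direct(clusters))           # five values reach the sink
print(cluster_average(clusters))  # three values reach the sink
print(hop_by_hop(clusters))       # one value reaches the sink
```

Under these assumptions the sink sees progressively less information (five readings, then three means, then one value), which illustrates why aggregation can make diagnostics harder even as it saves energy.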
Resiliency in distributed sensor networks for PHM of the monitoring targets
In condition-based maintenance, real-time observations are crucial for online health assessment. When the monitoring system is a wireless sensor network, data loss becomes highly probable, and this affects the quality of the remaining useful life prediction. In this paper, we present a fully distributed algorithm that ensures fault tolerance and recovers lost data in wireless sensor networks. We first analyze the algorithm theoretically and give correctness proofs, then provide simulation results showing that the algorithm (i) ensures data recovery with a low failure rate and (ii) preserves the overall energy for dense networks.
- …