194 research outputs found

    Scheduling with Predictions and the Price of Misprediction

    In many traditional job scheduling settings, it is assumed that one knows the time it will take for a job to complete service. In such cases, strategies such as shortest job first can be used to improve performance in terms of measures such as the average time a job waits in the system. We consider the setting where the service time is not known, but is instead predicted, for example by a machine learning algorithm. Our main result is the derivation, under natural assumptions, of formulae for the performance of several strategies for queueing systems that use predictions for service times in order to schedule jobs. As part of our analysis, we suggest the framework of the "price of misprediction," which offers a measure of the cost of using predicted information.
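
    The paper's results are closed-form formulae; as a rough empirical companion, the following minimal Python sketch (our construction, not the paper's model: the lognormal prediction noise, job-size distribution, and all names are illustrative assumptions) simulates a single-server queue under non-preemptive shortest-job-first and reports the ratio of mean response times with predicted versus true service times, an empirical stand-in for the price of misprediction.

        import heapq
        import random

        def simulate_sjf(jobs, use_predictions):
            # Non-preemptive shortest-job-first on a single server.
            # jobs: list of (arrival_time, true_size, predicted_size).
            # The priority key is the predicted size when use_predictions
            # is True, otherwise the true size; service always consumes
            # the true size.
            jobs = sorted(jobs)
            queue, clock, total, i, n = [], 0.0, 0.0, 0, len(jobs)
            while i < n or queue:
                if not queue:
                    clock = max(clock, jobs[i][0])  # idle until next arrival
                while i < n and jobs[i][0] <= clock:
                    arrival, size, pred = jobs[i]
                    heapq.heappush(queue, (pred if use_predictions else size,
                                           arrival, size))
                    i += 1
                _, arrival, size = heapq.heappop(queue)
                clock += size                # run the chosen job to completion
                total += clock - arrival     # its response time
            return total / n

        random.seed(0)
        lam, sigma, n = 0.8, 0.4, 100_000
        jobs, t = [], 0.0
        for _ in range(n):
            t += random.expovariate(lam)
            size = random.expovariate(1.0)                 # true service time, mean 1
            pred = size * random.lognormvariate(0, sigma)  # noisy prediction
            jobs.append((t, size, pred))

        r_true = simulate_sjf(jobs, use_predictions=False)
        r_pred = simulate_sjf(jobs, use_predictions=True)
        print(f"empirical price of misprediction: {r_pred / r_true:.3f}")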

    Uniform Bounds for Scheduling with Job Size Estimates

    We consider the problem of scheduling to minimize mean response time in M/G/1 queues where only estimated job sizes (processing times) are known to the scheduler, where a job of true size s has estimated size in the interval [βs, αs] for some α ≥ β > 0. We evaluate each scheduling policy by its approximation ratio, which we define to be the ratio between its mean response time and that of Shortest Remaining Processing Time (SRPT), the optimal policy when true sizes are known. Our question: is there a scheduling policy that (a) has approximation ratio near 1 when α and β are near 1, (b) has approximation ratio bounded by some function of α and β even when they are far from 1, and (c) can be implemented without knowledge of α and β? We first show that naively running SRPT using estimated sizes in place of true sizes is not such a policy: its approximation ratio can be arbitrarily large for any fixed β < 1. We then provide a simple variant of SRPT for estimated sizes that satisfies criteria (a), (b), and (c). In particular, we prove its approximation ratio approaches 1 uniformly as α and β approach 1. This is the first result showing this type of convergence for M/G/1 scheduling. We also study the Preemptive Shortest Job First (PSJF) policy, a cousin of SRPT. We show that, unlike SRPT, naively running PSJF using estimated sizes in place of true sizes satisfies criteria (b) and (c), as well as a weaker version of (a).
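
    The paper's fixed policy is not reproduced here, but the failure mode it starts from is easy to see empirically. The sketch below (our simulation; the Pareto size distribution and the uniform multiplicative error in [βs, αs] are assumptions for illustration) runs preemptive SRPT with true sizes as the benchmark and naive SRPT on estimates as the policy under test, and prints the resulting approximation ratio.

        import random

        def srpt_mean_response(jobs, use_estimates=False):
            # Preemptive SRPT on one server. jobs: (arrival, size, estimate),
            # sorted by arrival. The priority key is remaining *estimated*
            # work when use_estimates is True (the naive policy), else
            # remaining true work (the optimal benchmark). Service always
            # consumes true work, so an underestimated job's key can go
            # negative and the job hogs the server -- the failure mode
            # described above.
            active = {}   # job index -> [remaining_work, key, arrival]
            clock, total, i, done, n = 0.0, 0.0, 0, 0, len(jobs)
            while done < n:
                if not active:
                    clock = max(clock, jobs[i][0])      # idle until next arrival
                while i < n and jobs[i][0] <= clock:
                    a, s, e = jobs[i]
                    active[i] = [s, e if use_estimates else s, a]
                    i += 1
                j = min(active, key=lambda k: active[k][1])
                run = active[j][0]
                if i < n:
                    run = min(run, jobs[i][0] - clock)  # preempt at next arrival
                clock += run
                active[j][0] -= run
                active[j][1] -= run
                if active[j][0] <= 1e-12:               # job finished
                    total += clock - active[j][2]
                    del active[j]
                    done += 1
            return total / n

        random.seed(1)
        alpha, beta, lam, n = 1.5, 0.5, 0.4, 50_000
        jobs, t = [], 0.0
        for _ in range(n):
            t += random.expovariate(lam)
            s = random.paretovariate(2.2)        # heavy-tailed true sizes
            e = s * random.uniform(beta, alpha)  # estimate in [beta*s, alpha*s]
            jobs.append((t, s, e))

        ratio = srpt_mean_response(jobs, True) / srpt_mean_response(jobs, False)
        print(f"approximation ratio of naive SRPT on estimates: {ratio:.3f}")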

    Learning Augmented Online Facility Location

    Following the research agenda initiated by Munoz & Vassilvitskii [1] and Lykouris & Vassilvitskii [2] on learning-augmented online algorithms for classical online optimization problems, in this work we consider the Online Facility Location problem under this framework. In Online Facility Location (OFL), demands arrive one-by-one in a metric space and must be (irrevocably) assigned to an open facility upon arrival, without any knowledge about future demands. We present an online algorithm for OFL that exploits potentially imperfect predictions on the locations of the optimal facilities. We prove that the competitive ratio decreases smoothly from sublogarithmic in the number of demands to constant as the error, i.e., the total distance of the predicted locations to the optimal facility locations, decreases towards zero. We complement our analysis with a matching lower bound establishing that the dependence of the algorithm's competitive ratio on the error is optimal, up to constant factors. Finally, we evaluate our algorithm on real-world data and compare our learning-augmented approach with the current best online algorithm for the problem.
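
    As a concrete, deliberately simplified illustration of how location predictions can be plugged into an online rule, here is a toy prediction-aware variant of Meyerson's classical randomized OFL algorithm. It is not the algorithm analyzed in the paper; the facility cost f, the 2D Euclidean metric, and all names are our assumptions.

        import math
        import random

        def ofl_with_predictions(demands, predicted_sites, f):
            # Meyerson's rule opens a facility with probability min(d/f, 1),
            # where d is the distance to the nearest open facility. In this
            # toy prediction-aware variant, the new facility is placed at
            # the nearest *predicted* site instead of at the demand point,
            # so accurate predictions steer facilities to good locations.
            dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
            open_fac, cost = [], 0.0
            for x in demands:
                d = min((dist(x, c) for c in open_fac), default=float("inf"))
                if random.random() < min(d / f, 1.0):
                    open_fac.append(min(predicted_sites, key=lambda p: dist(x, p)))
                    cost += f
                cost += min(dist(x, c) for c in open_fac)  # irrevocable assignment
            return cost

        random.seed(2)
        sites = [(0.0, 0.0), (10.0, 10.0)]          # predicted optimal locations
        demands = [(random.gauss(cx, 1.0), random.gauss(cy, 1.0))
                   for cx, cy in sites for _ in range(50)]
        random.shuffle(demands)
        print(f"total cost: {ofl_with_predictions(demands, sites, f=5.0):.2f}")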

    SEH: Size Estimate Hedging for Single-Server Queues

    For a single-server system, Shortest Remaining Processing Time (SRPT) is an optimal size-based policy. In this paper, we discuss scheduling a single-server system when exact information about the jobs' processing times is not available. When the SRPT policy uses estimated processing times, the underestimation of large jobs can significantly degrade performance. We propose a simple heuristic, Size Estimate Hedging (SEH), that uses only jobs' estimated processing times for scheduling decisions. A job's priority is increased dynamically according to an SRPT rule until it is determined to be underestimated, at which point its priority is frozen. Numerical results suggest that SEH has desirable performance when estimation errors are not unreasonably large.
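
    A minimal sketch of the mechanism as the abstract describes it: the scheduling key follows an SRPT rule on the estimate, and once attained service reaches the estimate the key stops decreasing. The abstract does not specify the frozen value; the positive floor of half the original estimate used below is purely our assumption, chosen so that later small jobs can still preempt an underestimated large one.

        class SEHScheduler:
            # Sketch of the hedging idea from the abstract, not the paper's
            # exact priority function. Key = estimated remaining time; once
            # attained service reaches the estimate, the job is deemed
            # underestimated and its key is frozen. The floor of half the
            # original estimate is OUR assumption.
            def __init__(self):
                self.jobs = {}   # job_id -> [estimated_size, attained_service]

            def add(self, job_id, estimated_size):
                self.jobs[job_id] = [estimated_size, 0.0]

            def key(self, job_id):
                est, attained = self.jobs[job_id]
                if attained >= est:       # detected as underestimated
                    return 0.5 * est      # frozen: small jobs can still preempt
                return est - attained     # estimated remaining time (SRPT rule)

            def serve(self, quantum):
                # Serve the lowest-key job for one quantum; the caller,
                # who knows true sizes, detects completion.
                job_id = min(self.jobs, key=self.key)
                self.jobs[job_id][1] += quantum
                return job_id

        s = SEHScheduler()
        s.add("elephant", 2.0)      # true size is much larger than 2.0
        for _ in range(30):         # elephant exhausts its estimate
            s.serve(0.1)
        s.add("mouse", 0.5)
        print(s.serve(0.1))         # -> mouse (key 0.5 beats frozen key 1.0)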

    EOLE: Paving the Way for an Effective Implementation of Value Prediction

    Published at the International Symposium on Computer Architecture (ISCA) 2014. Link: http://people.irisa.fr/Arthur.Perais/data/ISCA%2714_EOLE.pdf

    Even in the multicore era, there is a continuous demand to increase the performance of single-threaded applications. However, the conventional path of increasing both issue width and instruction window size inevitably leads to the power wall. Value prediction (VP) was proposed in the mid 90's as an alternative path to further enhance the performance of wide-issue superscalar processors. Still, until recently a performance-effective implementation of Value Prediction was considered to add tremendous complexity and power consumption in almost every stage of the pipeline. Nonetheless, recent work in the field of VP has shown that, given an efficient confidence estimation mechanism, prediction validation can be removed from the out-of-order engine and delayed until commit time. As a result, recovering from mispredictions via selective replay can be avoided, and a much simpler mechanism - pipeline squashing - can be used, while the out-of-order engine remains mostly unmodified.

    Nonetheless, VP and validation at commit time entail strong constraints on the physical register file: write ports are needed to write predicted results, and read ports are needed to validate them at commit time, potentially rendering the overall number of ports unbearable. Fortunately, VP also implies that many single-cycle ALU instructions have their operands predicted in the front-end and can be executed in-place and in-order. Similarly, the execution of single-cycle instructions whose result has been predicted can be delayed until just before commit, since predictions are validated at commit time. Consequently, a significant number of instructions - 10% to 60% in our experiments - can bypass the out-of-order engine, allowing a reduction of the issue width, which is a major contributor to both out-of-order engine complexity and register file port requirements. This reduction paves the way for a truly practical implementation of Value Prediction. Furthermore, since Value Prediction in itself usually increases performance, the resulting {Early | Out-of-Order | Late} Execution architecture (EOLE) is often more efficient than a baseline VP-augmented 6-issue superscalar while having a significantly narrower 4-issue out-of-order engine.
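
    The mechanism EOLE builds on (a confidence-gated value predictor whose validation is deferred to commit, with a squash on misprediction) can be caricatured in a few lines. The sketch below is a generic last-value predictor with saturating confidence counters, not the predictor used in the paper; table size, threshold, and names are arbitrary choices of ours.

        class LastValuePredictor:
            # Generic last-value predictor with saturating confidence
            # counters. predict() returns a value only above a confidence
            # threshold; validate_at_commit() models deferred, commit-time
            # validation and reports whether a pipeline squash is needed.
            def __init__(self, entries=1024, threshold=3, max_conf=7):
                self.table = [(0, 0)] * entries   # (last_value, confidence)
                self.threshold, self.max_conf = threshold, max_conf

            def predict(self, pc):
                value, conf = self.table[pc % len(self.table)]
                return value if conf >= self.threshold else None

            def validate_at_commit(self, pc, actual):
                idx = pc % len(self.table)
                value, conf = self.table[idx]
                if value == actual:               # correct: grow confidence
                    self.table[idx] = (value, min(conf + 1, self.max_conf))
                    return False
                self.table[idx] = (actual, 0)     # wrong: retrain
                return conf >= self.threshold     # squash only if used

        vp = LastValuePredictor()
        pc, used, squashes = 0x400, 0, 0
        for value in [7] * 50 + [9] * 50:         # result changes once, mid-run
            used += vp.predict(pc) is not None
            squashes += vp.validate_at_commit(pc, value)
        print(f"predictions used: {used}, commit-time squashes: {squashes}")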

    Speculation in elastic systems

    Speculation is a well-known technique for increasing the parallelism of microprocessor pipelines and hence their performance. While implementing speculation in modern design practice is error-prone and mostly ad hoc, this paper proposes a correct-by-construction method for implementing speculation in elastic systems. The technique is based on applying provably correct transformations such as early evaluation, insertion of anti-tokens and bubbles, retiming, and sharing, and it allows different micro-architectural solutions to be explored for better design trade-offs. The benefits of speculation are illustrated with two examples in which these transformations are systematically applied. The method proposed in this paper is amenable to automation in a synthesis flow.
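
    A toy software model of one of these transformations, early evaluation with anti-tokens at a two-input multiplexer join, is sketched below (our simplification: handshakes, bubbles, and backpressure are not modeled). The mux fires as soon as the select token and the selected data token are present, and cancels the token on the unused input with an anti-token.

        from collections import deque

        class ElasticMux:
            # Two-input elastic multiplexer with early evaluation. The join
            # fires once the select token and the *selected* data token are
            # available; an anti-token is issued on the unused input and
            # annihilates that input's token when it eventually arrives.
            def __init__(self):
                self.inputs = [deque(), deque()]  # data tokens per channel
                self.anti = [0, 0]                # pending anti-tokens
                self.select = deque()

            def push(self, channel, value):
                if self.anti[channel]:            # annihilate token/anti-token pair
                    self.anti[channel] -= 1
                else:
                    self.inputs[channel].append(value)

            def push_select(self, sel):
                self.select.append(sel)

            def try_fire(self):
                if self.select and self.inputs[self.select[0]]:
                    sel = self.select.popleft()
                    out = self.inputs[sel].popleft()
                    other = 1 - sel
                    if self.inputs[other]:        # unused token already here
                        self.inputs[other].popleft()
                    else:                         # cancel it upon arrival
                        self.anti[other] += 1
                    return out
                return None

        mux = ElasticMux()
        mux.push_select(0)
        mux.push(0, "fast")                   # selected input arrives first
        print(mux.try_fire())                 # fires early -> fast
        mux.push(1, "slow")                   # late token is annihilated
        print(list(mux.inputs[1]), mux.anti)  # [] [0, 0]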

    Design of a distributed memory unit for clustered microarchitectures

    Power constraints led to the end of the exponential growth in single-processor performance that characterized the semiconductor industry for many years. Single-chip multiprocessors have allowed performance growth to continue so far. Yet Amdahl's law asserts that the overall performance of future single-chip multiprocessors will depend crucially on single-processor performance: in a multiprocessor, a small gain in single-processor performance can justify the use of significant resources. Partitioning the layout of critical components can improve the energy efficiency, and ultimately the performance, of a single processor. In a clustered microarchitecture, parts of these components form clusters. Instructions are processed locally in the clusters and benefit from the clusters' smaller size and complexity. Because the clusters together process a single instruction stream, communication between clusters is necessary and introduces an additional cost.

    This thesis proposes the design of a distributed memory unit and first-level cache in the context of a clustered microarchitecture. While the partitioning of other parts of the microarchitecture has been well studied, the distribution of the memory unit and the cache has received comparatively little attention. The first proposal consists of a set of cache bank predictors; eight different predictor designs are compared based on cost and accuracy. The second proposal is the distributed memory unit itself: the load and store queues are split into smaller queues for distributed disambiguation, and the mapping of memory instructions to cache banks is delayed until addresses have been calculated. We show how disambiguation can be implemented efficiently with unordered queues, and a bank predictor is used to map instructions that consume memory data near the data's origin. We show that this organization significantly reduces both energy usage and latency. The third proposal introduces Dispatch Throttling and Pre-Access Queues, mechanisms that avoid load/store queue overflows resulting from the late allocation of entries. The fourth proposal introduces Memory Issue Queues, which add to the memory unit the functionality to select instructions for execution and re-execution. The fifth proposal introduces Conservative Deadlock Aware Entry Allocation, a deadlock-safe issue policy for the Memory Issue Queues; deadlocks can result from certain queue allocations because entries are allocated out of order rather than in order as in traditional architectures. The sixth proposal is the Early Release of Load Queue Entries: architectures with weak memory ordering, such as Alpha, PowerPC, or ARMv7, can take advantage of this mechanism to release load queue entries before the commit stage. Together, these proposals allow significantly smaller and more energy-efficient load queues without the need for energy-hungry recovery mechanisms and without performance penalties. Finally, we present a detailed study that compares the proposed distributed memory unit to a centralized memory unit and confirms its advantages of reduced energy usage and improved performance.
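
    For a flavor of the first proposal, here is a deliberately naive last-bank cache-bank predictor in Python (our illustration, not one of the thesis's eight designs; entry count, bank count, and line size are assumptions). It predicts the bank a memory instruction will touch before its address is known, and the demo shows how a simple striding access pattern defeats it, which is one reason several predictor designs are worth comparing.

        class BankPredictor:
            # Naive last-bank predictor: remembers, per (hashed) PC, the
            # cache bank the instruction touched last time. Banks are
            # assumed to interleave at cache-line granularity.
            def __init__(self, entries=512, banks=4, line_bytes=64):
                self.table = [0] * entries
                self.banks, self.line_bytes = banks, line_bytes

            def predict(self, pc):
                return self.table[pc % len(self.table)]

            def update(self, pc, address):
                bank = (address // self.line_bytes) % self.banks
                self.table[pc % len(self.table)] = bank
                return bank

        bp = BankPredictor()
        hits = 0
        for i in range(1000):
            pc, addr = 0x100, 0x8000 + 64 * i   # unit-stride across lines
            hits += bp.predict(pc) == (addr // 64) % 4
            bp.update(pc, addr)
        print(f"last-bank accuracy on a striding load: {hits / 1000:.2f}")  # ~0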