
    Graph Searches and Their End Vertices

    Graph search, the process of visiting vertices in a graph in a specific order, has demonstrated magical powers in many important algorithms. But a systematic study was only initiated by Corneil et al. a decade ago, and only then did we start to realize how little we understand it. Even the apparently naïve question "which vertex can be the last visited by a graph search algorithm," known as the end vertex problem, turns out to be quite elusive. We give a full picture of all maximum cardinality searches on chordal graphs, which implies a polynomial-time algorithm for the end vertex problem of maximum cardinality search. It is complemented by a proof of NP-completeness of the same problem on weakly chordal graphs. We also show linear-time algorithms for deciding end vertices of breadth-first searches on interval graphs, and end vertices of lexicographic depth-first searches on chordal graphs. Finally, we present $2^n \cdot n^{O(1)}$-time algorithms for deciding the end vertices of breadth-first searches, depth-first searches, maximum cardinality searches, and maximum neighborhood searches on general graphs.
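
    To make the end vertex problem concrete, here is a minimal brute-force sketch (an illustration only, not the paper's polynomial-time algorithm) that enumerates all maximum cardinality search orderings of a small graph and checks whether a given vertex can come last:

```python
def mcs_can_end_at(adj, target):
    """Brute-force check whether `target` can be the last vertex visited by some
    maximum cardinality search (MCS) of the graph `adj` (dict mapping a vertex to
    the set of its neighbours).  Exponential in general; only meant for toy instances."""
    n = len(adj)

    def extend(visited):
        if len(visited) == n:
            return visited[-1] == target
        # MCS rule: among unvisited vertices, pick one with the most visited neighbours.
        unvisited = [v for v in adj if v not in visited]
        best = max(len(adj[v] & set(visited)) for v in unvisited)
        candidates = [v for v in unvisited if len(adj[v] & set(visited)) == best]
        return any(extend(visited + [v]) for v in candidates)

    return extend([])

# Toy example: on the path a-b-c, c can be an MCS end vertex but b cannot.
adj = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b'}}
print(mcs_can_end_at(adj, 'c'), mcs_can_end_at(adj, 'b'))  # True False
```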

    Makespan Scheduling of Unit Jobs with Precedence Constraints in $O(1.995^n)$ time

    In a classical scheduling problem, we are given a set of $n$ jobs of unit length along with precedence constraints and the goal is to find a schedule of these jobs on $m$ identical machines that minimizes the makespan. This problem is well-known to be NP-hard for an unbounded number of machines. Using standard 3-field notation, it is known as $P|\text{prec}, p_j=1|C_{\max}$. We present an algorithm for this problem that runs in $O(1.995^n)$ time. Before our work, even for $m=3$ machines the best known algorithms ran in $O^\ast(2^n)$ time. In contrast, our algorithm works when the number of machines $m$ is unbounded. A crucial ingredient of our approach is an algorithm with a runtime that is only single-exponential in the vertex cover of the comparability graph of the precedence constraint graph. This heavily relies on insights from a classical result by Dolev and Warmuth (Journal of Algorithms 1984) for precedence graphs without long chains. Comment: 26 pages, 7 figures.
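
    For context, the problem itself is easy to state and to attack greedily; the sketch below is a plain list-scheduling heuristic for $P|\text{prec}, p_j=1|C_{\max}$ (an illustrative assumption-based sketch, not the paper's exact $O(1.995^n)$ algorithm):

```python
def unit_list_schedule(succ, m):
    """Greedy list scheduling for P|prec, p_j = 1|C_max: at each unit time step,
    run up to m jobs whose predecessors have all finished.  Heuristic illustration
    of the problem only.  `succ` maps each job to the set of its successors;
    the precedence graph is assumed to be acyclic."""
    # Number of unfinished predecessors for every job.
    indeg = {j: 0 for j in succ}
    for j, ss in succ.items():
        for s in ss:
            indeg[s] += 1
    ready = sorted(j for j, d in indeg.items() if d == 0)
    makespan = 0
    while ready or any(d > 0 for d in indeg.values()):
        batch, ready = ready[:m], ready[m:]   # run at most m ready jobs this step
        makespan += 1
        for j in batch:
            for s in succ[j]:
                indeg[s] -= 1
                if indeg[s] == 0:
                    ready.append(s)
    return makespan

# Toy instance: jobs a, b, c, d with a -> c, b -> c, c -> d on m = 2 machines.
print(unit_list_schedule({'a': {'c'}, 'b': {'c'}, 'c': {'d'}, 'd': set()}, 2))  # 3
```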

    Partitioning Hypergraphs is Hard: Models, Inapproximability, and Applications

    We study the balanced $k$-way hypergraph partitioning problem, with a special focus on its practical applications to manycore scheduling. Given a hypergraph on $n$ nodes, our goal is to partition the node set into $k$ parts of size at most $(1+\epsilon)\cdot \frac{n}{k}$ each, while minimizing the cost of the partitioning, defined as the number of cut hyperedges, possibly also weighted by the number of partitions they intersect. We show that this problem cannot be approximated to within a $n^{1/\text{poly}\log\log n}$ factor of the optimal solution in polynomial time if the Exponential Time Hypothesis holds, even for hypergraphs of maximum degree 2. We also study the hardness of the partitioning problem from a parameterized complexity perspective, and in the more general case when we have multiple balance constraints. Furthermore, we consider two extensions of the partitioning problem that are motivated by practical considerations. Firstly, we introduce the concept of hyperDAGs to model precedence-constrained computations as hypergraphs, and we analyze the adaptation of the balanced partitioning problem to this case. Secondly, we study the hierarchical partitioning problem to model hierarchical NUMA (non-uniform memory access) effects in modern computer architectures, and we show that ignoring this hierarchical aspect of the communication cost can yield significantly weaker solutions. Comment: Published in the 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2023).
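
    The two cost functions mentioned above (cut hyperedges, and the variant weighted by the number of blocks a hyperedge intersects) can be written down directly; the sketch below assumes the common "connectivity minus one" convention for the weighted variant, which may differ in detail from the paper:

```python
def partition_costs(hyperedges, part):
    """Cost of a k-way hypergraph partition.  `hyperedges` is a list of node sets,
    `part` maps node -> block id.  Returns (cut, connectivity): the number of cut
    hyperedges, and the sum over hyperedges of (blocks touched - 1)."""
    cut = connectivity = 0
    for e in hyperedges:
        blocks = {part[v] for v in e}
        if len(blocks) > 1:
            cut += 1
            connectivity += len(blocks) - 1
    return cut, connectivity

def is_balanced(part, k, eps):
    """Balance constraint: every block holds at most (1 + eps) * n / k nodes."""
    n = len(part)
    sizes = {}
    for v, b in part.items():
        sizes[b] = sizes.get(b, 0) + 1
    return all(s <= (1 + eps) * n / k for s in sizes.values())

# Toy example: 4 nodes, 2 blocks, one hyperedge crossing the cut.
edges = [{1, 2}, {2, 3, 4}, {3, 4}]
part = {1: 0, 2: 0, 3: 1, 4: 1}
print(partition_costs(edges, part), is_balanced(part, k=2, eps=0.0))  # (1, 1) True
```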

    Analysis of job scheduling algorithms for heterogeneous multiprocessor computing systems

    The problem of scheduling independent jobs on heterogeneous multiprocessor models (i.e., those with non-identical or uniform processors) with independent memories has been studied. A number of nonpreemptive demand scheduling algorithms have been evaluated with respect to their mean flow time and completion time performance criteria. In particular, deterministic analysis has been used to predict the worst-case performance, whereas simulation techniques have been applied to estimate the expected performance of the algorithms. From the deterministic analysis, informative worst-case bounds have been proven, from which the extreme behaviour of the considered algorithms can be well predicted. Moreover, by relaxing some of the system parameters, or a combination of them, the model reduces to versions which have already been studied (i.e., the classical homogeneous and heterogeneous models, or the homogeneous model with independent memories). For such cases, the bounds proven in this thesis either agree with, or are better and more informative than, the ones known for these simpler models. Finally, the analysis of the worst-case and expected performance results reveals a high degree of correlation in the behaviour of the algorithms as predicted or estimated by these two performance measures, respectively.
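
    As a rough illustration of the model (uniform processors with independent memories, nonpreemptive demand scheduling), here is a minimal sketch; the job ordering, data layout and parameters are assumptions for illustration, not the algorithms analysed in the thesis:

```python
import heapq

def demand_schedule(jobs, procs):
    """Nonpreemptive demand scheduling on uniform processors with independent
    memories: whenever a processor becomes free, it grabs the next waiting job
    that fits into its memory.  `jobs` is a list of (length, memory_need),
    `procs` a list of (speed, memory_capacity).  Returns the mean flow time."""
    waiting = sorted(range(len(jobs)), key=lambda j: -jobs[j][0])  # longest job first
    free = [(0.0, p) for p in range(len(procs))]  # (time processor becomes free, id)
    heapq.heapify(free)
    finish = {}
    while waiting:
        t, p = heapq.heappop(free)  # assumes every job fits on some processor
        speed, mem = procs[p]
        for idx, j in enumerate(waiting):
            length, need = jobs[j]
            if need <= mem:                       # demand rule: first fitting job
                finish[j] = t + length / speed
                heapq.heappush(free, (finish[j], p))
                del waiting[idx]
                break
        # If nothing fits, the processor is not reinserted: the waiting list only
        # shrinks, so it could never hold a job later either.
    return sum(finish.values()) / len(jobs)

# Toy instance: three jobs on a fast 2-unit-memory and a slow 1-unit-memory processor.
print(demand_schedule(jobs=[(4, 2), (3, 1), (2, 2)], procs=[(2, 2), (1, 1)]))
```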

    Generalizing List Scheduling for Stochastic Soft Real-time Parallel Applications

    Advanced architecture processors provide features such as caches and branch prediction that result in improved, but variable, execution time of software. Hard real-time systems require tasks to complete within timing constraints. Consequently, hard real-time systems are typically designed conservatively through the use of tasks' worst-case execution times (WCET) in order to compute deterministic schedules that guarantee tasks' execution within given time constraints. This use of pessimistic execution time assumptions provides real-time guarantees at the cost of decreased performance and resource utilization. In soft real-time systems, however, meeting deadlines is not an absolute requirement (i.e., missing a few deadlines does not severely degrade system performance or cause catastrophic failure). In such systems, a guaranteed minimum probability of completing by the deadline is sufficient. Therefore, there is considerable latitude in such systems for improving resource utilization and performance as compared with hard real-time systems, through the use of more realistic execution time assumptions. Given probability distribution functions (PDFs) representing tasks' execution time requirements, and tasks' communication and precedence requirements, represented as a directed acyclic graph (DAG), this dissertation proposes and investigates algorithms for constructing non-preemptive stochastic schedules. New PDF manipulation operators developed in this dissertation are used to compute tasks' start and completion time PDFs during schedule construction. PDFs of the schedules' completion times are also computed and used to systematically trade the probability of meeting end-to-end deadlines for schedule length and jitter in task completion times. Because of the NP-hard nature of the non-preemptive DAG scheduling problem, the new stochastic scheduling algorithms extend traditional heuristic list scheduling and genetic list scheduling algorithms for DAGs by using PDFs instead of fixed time values for task execution requirements. The stochastic scheduling algorithms also account for delays caused by communication contention, typically ignored in prior DAG scheduling research. Extensive experimental results are used to demonstrate the efficacy of the new algorithms in constructing stochastic schedules. Results also show that through the use of the techniques developed in this dissertation, the probability of meeting deadlines can be usefully traded for performance and jitter in soft real-time systems.
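
    Computing start and completion time PDFs along a DAG typically reduces to two discrete-time operators: convolution for sequential composition (start time plus execution time) and a product-of-CDFs rule for the maximum over predecessors. The sketch below illustrates that idea; the function names and the uniform toy distributions are assumptions, not the dissertation's operators, which additionally model communication contention:

```python
import numpy as np

def convolve(p, q):
    """PDF of the sum of two independent discrete random variables
    (e.g. a task's start time plus its execution time)."""
    return np.convolve(p, q)

def maximum(p, q):
    """PDF of the maximum of two independent discrete random variables
    (e.g. the completion times of two predecessors a task must wait for)."""
    n = max(len(p), len(q))
    p = np.pad(p, (0, n - len(p)))
    q = np.pad(q, (0, n - len(q)))
    cdf_max = np.cumsum(p) * np.cumsum(q)      # P[max <= t] = P[X <= t] * P[Y <= t]
    return np.diff(np.concatenate(([0.0], cdf_max)))

# Two predecessors with execution time uniform over {1, 2} time steps
# (array index = time step); the successor starts when both have finished.
exec_pdf = np.array([0.0, 0.5, 0.5])
ready_pdf = maximum(exec_pdf, exec_pdf)          # both predecessors done
completion_pdf = convolve(ready_pdf, exec_pdf)   # plus the successor's own time
print(completion_pdf)   # [0. 0. 0.125 0.5 0.375]
```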

    Analyzable dataflow executions with adaptive redundancy

    Increasing performance requirements in the embedded systems domain have encouraged a drift from single-core to multicore processors, and thus multicore processors are widely used in embedded systems today. Cars are an example of complex embedded systems in which the use of multicore processors is continuously increasing. A major reason for this is to consolidate different software components on one chip and thus reduce the number of electronic control units (ECUs). However, the de facto standard in the automotive industry, AUTOSAR (AUTomotive Open System ARchitecture), was originally designed for single-core processors. Although basic support for multicore processors was added, more complex architectures are currently not compatible with the software stack. Regarding the software components running on the ECUs of modern cars, requirements are diverse. On the one hand, there are safety-critical tasks, like the airbag control, anti-lock braking system, electronic stability control and emergency brake assist, and on the other hand, tasks which do not have any safety-related requirements at all, for example tasks controlling the infotainment system. Trends like autonomous driving lead to even more demanding tasks in the system, since such tasks are both safety-critical and data-intensive. As embedded applications, like those in the automotive domain, become more complex, new approaches are necessary. Data-intensive tasks are usually tackled with large-scale computing frameworks. In this thesis, some major concepts of such frameworks are transferred to the high-performance embedded systems domain. For this purpose, the thesis describes a runtime environment (RTE) that is suitable for different kinds of multi- and manycore hardware architectures. The RTE follows a dataflow execution model based on directed acyclic graphs (DAGs). Graphs are divided into sections which are scheduled separately. For each section, the RTE uses a DAG scheduling heuristic to compute multiple schedules covering different redundancy configurations. This allows the RTE to dynamically change the redundancy of parts of the graph at runtime despite the use of fixed schedules. Alternatively, the RTE also provides an online scheduler. To specify suitable graphs, the RTE also provides a programming model which shares similarities with common large-scale computing frameworks, for example Apache Spark. Using this programming model, three common distributed algorithms, namely Cannon's algorithm, the Cooley-Tukey algorithm and bitonic sort, were implemented. With these three programs, the performance of the RTE was evaluated for a variety of configurations on two different hardware architectures. The results show that the proposed RTE is able to reach the performance of established parallel computation frameworks and that, for suitable graphs with reasonable sectionings, the negative influence on the runtime is either small or non-existent.
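
    As a toy illustration of the section-wise redundancy idea (not the RTE, its programming model, or its schedules), the sketch below executes one DAG section with each task replicated a configurable number of times and the replica outputs compared before the result is forwarded:

```python
def run_section(tasks, deps, redundancy=2):
    """Execute one section of a dataflow DAG with configurable redundancy.
    `tasks` maps task id -> function taking the predecessors' results,
    `deps` maps task id -> list of predecessor ids.  Each task runs
    `redundancy` times and the replica outputs must agree before the value
    is forwarded.  Assumes the section is acyclic."""
    results = {}
    remaining = dict(deps)
    while remaining:
        # Pick any task whose predecessors have all finished.
        tid = next(t for t, ds in remaining.items() if all(d in results for d in ds))
        inputs = [results[d] for d in remaining.pop(tid)]
        outputs = [tasks[tid](*inputs) for _ in range(redundancy)]
        if any(o != outputs[0] for o in outputs[1:]):
            raise RuntimeError(f"replica mismatch in task {tid}")
        results[tid] = outputs[0]
    return results

# Toy section: c consumes the outputs of a and b, every task duplicated twice.
tasks = {'a': lambda: 2, 'b': lambda: 3, 'c': lambda x, y: x * y}
deps = {'a': [], 'b': [], 'c': ['a', 'b']}
print(run_section(tasks, deps, redundancy=2))  # {'a': 2, 'b': 3, 'c': 6}
```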