
    Dynamic Control of the Join-Queue Lengths in Saturated Fork-Join Stations

    The analysis of fork-join queueing systems has played an important role in the performance evaluation of distributed systems in which parallel computations associated with the same job are carried out, and a job is considered served only when all of its parallel tasks have been served and then joined. The fork-join nodes that we consider consist of K >= 2 parallel servers, each equipped with two FCFS queues: the service-queue and the join-queue. The former stores the tasks waiting to be served, while the latter stores the served tasks waiting to be joined. When the queueing station is saturated, i.e., the service-queues are never empty, we observe that the join-queue sizes tend to grow without bound even if the expected service times at the servers are the same. This is due to the variance of the service time distribution. To tackle this problem, we propose a simple service-rate control mechanism and show that, under the exponential assumption on the service times, we can analytically study a set of relevant performance indices. We show that by selectively reducing the speed of some servers, significant energy savings can be achieved.
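
    The join-queue growth described in this abstract can be illustrated with a small simulation. The sketch below is not the paper's model or its control mechanism; it assumes K saturated servers with identical exponential service rates and no rate control, so that by the memoryless property the next completion is equally likely to occur at any server. The function name `max_join_queue` and its parameters are hypothetical, chosen only for this illustration.

```python
import random


def max_join_queue(num_completions, num_servers=2, seed=0):
    """Longest join-queue seen over a run of task completions.

    Assumes saturated servers with equal exponential rates, so each
    completion happens at a uniformly chosen server (embedded chain).
    """
    rng = random.Random(seed)
    completed = [0] * num_servers  # tasks finished by each server
    longest = 0
    for _ in range(num_completions):
        k = rng.randrange(num_servers)
        completed[k] += 1
        # A job is joined once every server has finished its task,
        # so min(completed) jobs have left the station.
        joined = min(completed)
        # The fastest server holds the longest join-queue.
        longest = max(longest, max(completed) - joined)
    return longest


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, max_join_queue(n))
```

    With deterministic, identical service times all servers would finish in lockstep and every join-queue would stay empty; in this sketch the imbalance comes only from the randomness of the service times, and the longest queue keeps increasing as the run gets longer.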

    JMT – Performance Engineering Tools for System Modeling

    We present the Java Modelling Tools (JMT) suite, an integrated framework of Java tools for performance evaluation of computer systems using queueing models. The suite offers a rich user interface that simplifies the definition of performance models by means of wizard dialogs and a graphical design workspace. The performance evaluation features of JMT span a wide range of state-of-the-art methodologies, including discrete-event simulation, mean value analysis of product-form networks, analytical identification of bottleneck resources in multiclass environments, and workload characterization with fuzzy clustering. The discrete-event simulator supports several advanced modeling features, such as finite capacity regions, load-dependent service times, bursty processes, and fork-and-join nodes, and implements spectral estimation for the analysis of simulation results. The suite is open-source, released under the GNU General Public License (GPL), and is available for free download at http://jmt.sourceforge.net

    Three-way optimisation of response time, subtask dispersion and energy consumption in split-merge systems

    This paper investigates various ways in which the three-way trade-off between task response time, subtask dispersion and energy consumption can be improved in split-merge queueing systems. Four ideas are examined: dynamic subtask dispersion reduction, state-dependent service times, multiple redundant subtask service servers, and restarting subtask service. It transpires that all four techniques can be used to improve the trade-off, while combinations of the techniques are not necessarily beneficial.

    Extreme-Value Theory for Large Fork-Join Queues


    Automatic and Accurate Performance Prediction in Distributed Systems

    System performance is receiving attention from industry because it affects user experience, and much research has focused on performance evaluation approaches. Profiling is the most straightforward approach to performance evaluation of software systems, despite being limited to shallow analyses. Conversely, software performance models excel at representing complex interactions between components. Still, practitioners do not integrate performance models into the software development cycle, as the learning curve is too steep and the approaches do not adapt well to incremental development practices. In this thesis, we propose three approaches towards automatic learning of performance models. The first approach employs a Recurrent Neural Network (RNN) to extract a full Queueing Network (QN) model of the system; the second calibrates a Layered Queueing Network (LQN) using an RNN; the third presents μP, a framework that allows the user to develop microservice systems and obtain the corresponding LQN model from source code analysis. We considered the microservices architecture because it has been embraced by influential players (e.g., Amazon, Netflix). These approaches have two advantages: i) minimal user intervention, which flattens the learning curve; ii) continuous synchronization between software and performance model, such that each software development iteration is reflected in the model. We validated our approaches on several benchmarks taken from the literature. The models we generate can be queried to predict the system behavior under conditions significantly different from the learning setting, and the results show appreciable improvements in the quality of the predictions.

    Methodology for modeling high performance distributed and parallel systems

    Performance modeling of distributed and parallel systems is of considerable importance to the high performance computing community. To achieve high performance, proper task or process assignment and data or file allocation among processing sites is essential. This dissertation describes an approach to modeling distributed and parallel systems that combines optimal static solutions for data allocation with dynamic policies for task assignment. A performance-efficient system model is developed using analytical tools and techniques. The system model is built in three steps. First, the basic client-server model, which allows only data transfer, is evaluated. A prediction and evaluation method is developed to examine the system behavior and estimate performance measures. The method is based on known product-form queueing networks. The next step extends the model so that each site of the system behaves as both client and server. A data-allocation strategy is designed at this stage which optimally assigns the data to the processing sites. The strategy is based on the flow deviation technique in queueing models. The third stage considers process-migration policies. A novel on-line adaptive load-balancing algorithm is proposed which dynamically migrates processes and transfers data among different sites to minimize the job execution cost. The gradient-descent rule is used to optimize the cost function, which expresses the cost of process execution at different processing sites. The accuracy of the prediction method and the effectiveness of the analytical techniques are established by simulation. The modeling procedure described here is general and applicable to any message-passing distributed and parallel system. The proposed techniques and tools can be readily used in other related areas such as networking and operating systems. This work contributes towards the design of distributed and parallel systems where performance is critical.

    Manufacturing flow line systems: a review of models and analytical results

    The most important models and results of the manufacturing flow line literature are described. These include the major classes of models (asynchronous, synchronous, and continuous); the major features (blocking, processing times, failures and repairs); the major properties (conservation of flow, flow rate-idle time, reversibility, and others); and the relationships among different models. Exact and approximate methods for obtaining quantitative measures of performance are also reviewed. The exact methods are appropriate for small systems. The approximate methods, which are the only means available for large systems, are generally based on decomposition, and make use of the exact methods for small systems. Extensions are briefly discussed. Directions for future research are suggested. National Science Foundation (U.S.) (Grant DDM-8914277)

    Spontananfragen auf Datenströmen

    Many modern applications require processing large amounts of data in real time. As a result, distributed stream processing engines (SPEs) have gained significant attention as an important new class of big data processing systems. The central design principle of these SPEs is to handle queries that potentially run forever on data streams with a query-at-a-time model, i.e., each query is optimized and executed separately. However, in many real applications, not only long-running queries but also many short-running queries are processed on data streams. In these applications, multiple stream queries are created and deleted concurrently, in an ad-hoc manner. The established practice for handling ad-hoc stream queries is to fork the input stream and allocate additional resources for each query. However, this approach leads to redundant computation and data copying. This thesis lays the foundation for efficient ad-hoc stream query processing. To bridge the gap between stream data processing and ad-hoc query processing, we follow a top-down approach. First, we propose a benchmarking framework to analyze state-of-the-art SPEs. We provide a definition of latency and throughput for stateful operators. Moreover, we carefully separate the system under test and the driver, to correctly represent the open-world model of typical stream processing deployments. This separation enables us to measure the system performance under realistic conditions. Our solution is the first benchmarking framework to define and test the sustainable performance of SPEs. Through our analysis, we find that state-of-the-art SPEs are unable to execute stream queries in an ad-hoc manner. Second, we propose the first ad-hoc stream query processing engine for distributed data processing environments. We develop our solution based on three main requirements: (1) Integration: ad-hoc query processing should be a composable layer that can extend stream operators, such as join, aggregation, and window operators; (2) Consistency: ad-hoc query creation and deletion must be performed consistently and ensure exactly-once semantics and correctness; (3) Performance: in contrast to modern SPEs, ad-hoc SPEs should maximize not only data throughput but also query throughput, via incremental computation and resource sharing. Third, we propose an ad-hoc stream join processing framework that integrates dynamic query processing and query re-optimization techniques with ad-hoc stream query processing. Our solution comprises an optimization layer and a stream data processing layer. The optimization layer periodically re-optimizes the query execution plan, performing join reordering and vertical and horizontal scaling at runtime without stopping the execution. The data processing layer enables incremental and consistent query processing, supporting all the actions triggered by the optimizer. Together, the second and third contributions form a complete ad-hoc SPE. We use the first contribution not only for benchmarking modern SPEs but also for evaluating the ad-hoc SPE.

    Queueing models with dependence structures
