277 research outputs found
Serial-batch scheduling â the special case of laser-cutting machines
The dissertation deals with a problem in the field of short-term production planning, namely the scheduling of laser-cutting machines. The object of decision is the grouping of production orders (batching) and the sequencing of these order groups on one or more machines (scheduling). This problem is also known in the literature as "batch scheduling problem" and belongs to the class of combinatorial optimization problems due to the interdependencies between the batching and the scheduling decisions. The concepts and methods used are mainly from production planning, operations research and machine learning
Heuristic Solutions for Loading in Flexible Manufacturing Systems
Production planning in flexible manufacturing system deals with the efficient organization of the production resources in order to meet a given production schedule. It is a complex problem and typically leads to several hierarchical subproblems that need to be solved sequentially or simultaneously. Loading is one of the planning subproblems that has to addressed. It involves assigning the necessary operations and tools among the various machines in some optimal fashion to achieve the production of all selected part types. In this paper, we first formulate the loading problem as a 0-1 mixed integer program and then propose heuristic procedures based on Lagrangian relaxation and tabu search to solve the problem. Computational results are presented for all the algorithms and finally, conclusions drawn based on the results are discussed
PiCo: A Domain-Specific Language for Data Analytics Pipelines
In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution modelsâfor which only informal (and often confusing) semantics is generally providedâall share a common under- lying model, namely, the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects about Big Data analytics tools from a high level perspective. This analysis can be considered as a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low level with high level aspects, as we are often used to see in state-of-the-art Big Data analytics frameworks.
From the user-level perspective, we think that a clearer and simple semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language, that is on top of a stack of layers that build a prototypical framework for Big Data analytics.
The contribution of this thesis is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.
Second, we propose a programming environment based on such layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG-composition of processing elements. This model is intended to give the user an unique interface for both stream and batch processing, hiding completely data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared and distributed parallelism, and implemented in C++11/14 with the aim of porting C++ into the Big Data world
Bicriterion scheduling with equal processing times on a batch processing machine
Author name used in this publication: C. T. NgAuthor name used in this publication: T. C. E. Cheng2008-2009 > Academic research: refereed > Publication in refereed journalAccepted ManuscriptPublishe
A Comparison of Big Data Frameworks on a Layered Dataflow Model
In the world of Big Data analytics, there is a series of tools aiming at
simplifying programming applications to be executed on clusters. Although each
tool claims to provide better programming, data and execution models, for which
only informal (and often confusing) semantics is generally provided, all share
a common underlying model, namely, the Dataflow model. The Dataflow model we
propose shows how various tools share the same expressiveness at different
levels of abstraction. The contribution of this work is twofold: first, we show
that the proposed model is (at least) as general as existing batch and
streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to
understand high-level data-processing applications written in such frameworks.
Second, we provide a layered model that can represent tools and applications
following the Dataflow paradigm and we show how the analyzed tools fit in each
level.Comment: 19 pages, 6 figures, 2 tables, In Proc. of the 9th Intl Symposium on
High-Level Parallel Programming and Applications (HLPP), July 4-5 2016,
Muenster, German
Local SGD Converges Fast and Communicates Little
Mini-batch stochastic gradient descent (SGD) is state of the art in large
scale distributed training. The scheme can reach a linear speedup with respect
to the number of workers, but this is rarely seen in practice as the scheme
often suffers from large network delays and bandwidth limits. To overcome this
communication bottleneck recent works propose to reduce the communication
frequency. An algorithm of this type is local SGD that runs SGD independently
in parallel on different workers and averages the sequences only once in a
while.
This scheme shows promising results in practice, but eluded thorough
theoretical analysis. We prove concise convergence rates for local SGD on
convex problems and show that it converges at the same rate as mini-batch SGD
in terms of number of evaluated gradients, that is, the scheme achieves linear
speedup in the number of workers and mini-batch size. The number of
communication rounds can be reduced up to a factor of T^{1/2}---where T denotes
the number of total steps---compared to mini-batch SGD. This also holds for
asynchronous implementations. Local SGD can also be used for large scale
training of deep learning models.
The results shown here aim serving as a guideline to further explore the
theoretical and practical aspects of local SGD in these applications.Comment: to appear at ICLR 2019, 19 page
Quality of Service Aware Data Stream Processing for Highly Dynamic and Scalable Applications
Huge amounts of georeferenced data streams are arriving daily to data stream management systems that are deployed for serving highly scalable and dynamic applications. There are innumerable ways at which those loads can be exploited to gain deep insights in various domains. Decision makers require an interactive visualization of such data in the form of maps and dashboards for decision making and strategic planning. Data streams normally exhibit fluctuation and oscillation in arrival rates and skewness. Those are the two predominant factors that greatly impact the overall quality of service. This requires data stream management systems to be attuned to those factors in addition to the spatial shape of the data that may exaggerate the negative impact of those factors. Current systems do not natively support services with quality guarantees for dynamic scenarios, leaving the handling of those logistics to the user which is challenging and cumbersome. Three workloads are predominant for any data stream, batch processing, scalable storage and stream processing. In this thesis, we have designed a quality of service aware system, SpatialDSMS, that constitutes several subsystems that are covering those loads and any mixed load that results from intermixing them. Most importantly, we natively have incorporated quality of service optimizations for processing avalanches of geo-referenced data streams in highly dynamic application scenarios. This has been achieved transparently on top of the codebases of emerging de facto standard best-in-class representatives, thus relieving the overburdened shoulders of the users in the presentation layer from having to reason about those services. Instead, users express their queries with quality goals and our system optimizers compiles that down into query plans with an embedded quality guarantee and leaves logistic handling to the underlying layers. We have developed standard compliant prototypes for all the subsystems that constitutes SpatialDSMS
Adaptive Performance and Power Management in Distributed Computing Systems
The complexity of distributed computing systems has raised two unprecedented challenges for system management. First, various customers need to be assured by meeting their required service-level agreements such as response time and throughput. Second, system power consumption must be controlled in order to avoid system failures caused by power capacity overload or system overheating due to increasingly high server density. However, most existing work, unfortunately, either relies on open-loop estimations based on off-line profiled system models, or evolves in a more ad hoc fashion, which requires exhaustive iterations of tuning and testing, or oversimplifies the problem by ignoring the coupling between different system characteristics (\ie, response time and throughput, power consumption of different servers). As a result, the majority of previous work lacks rigorous guarantees on the performance and power consumption for computing systems, and may result in degraded overall system performance. In this thesis, we extensively study adaptive performance/power management and power-efficient performance management for distributed computing systems such as information dissemination systems, power grid management systems, and data centers, by proposing Multiple-Input-Multiple-Output (MIMO) control and hierarchical designs based on feedback control theory. For adaptive performance management, we design an integrated solution that controls both the average response time and CPU utilization in information dissemination systems to achieve bounded response time for high-priority information and maximized system throughput in an example information dissemination system. In addition, we design a hierarchical control solution to guarantee the deadlines of real-time tasks in power grid computing by grouping them based on their characteristics, respectively. For adaptive power management, we design MIMO optimal control solutions for power control at the cluster and server level and a hierarchical solution for large-scale data centers. Our MIMO control design can capture the coupling among different system characteristics, while our hierarchical design can coordinate controllers at different levels. For power-efficient performance management, we discuss a two-layer coordinated management solution for virtualized data centers. Experimental results in both physical testbeds and simulations demonstrate that all the solutions outperform state-of-the-art management schemes by significantly improving overall system performance
Recommended from our members
Generating, Optimizing, and Scheduling a Compiler Level Representation of Stream Parallelism
Stream parallelism is often cited as a powerful programming model for expressing parallel computation for multi-core and heterogeneous computers. It allows programmers to concisely describe the concurrency and communication requirements found in a program and it allows compilers and runtime systems to easily generate efficient code targeting parallel hardware. This type of stream parallelism is often restricted to use the Synchronous Dataflow (SDF) model and implemented using static compilation and scheduling techniques. While powerful, SDF and the associated static methods have real limitations when applied to general purpose programing on general purpose hardware. To increase generality, we can define stream parallelism as a graph of processes communicating with one another over unidirectional data channels. Although dynamic scheduling techniques have been developed for this more general model, the powerful compiler transformations that are available under the SDF model no longer apply. This is made worse by the fact that general purpose models are typically implemented as software frameworks on top of high-level general purpose languages. The Stream and Kernel Intermediate Representation (SKIR) is a compiler level representation of stream parallelism for general purpose languages. A SKIR compiler is able to recognize and take advantage of SDF style parallelism while allowing more general programs. This thesis presents the SKIR program representation and describes how it can be used as compiler target for several different high level languages. We show a dynamic scheduling mechanism for SKIR programs based on the concepts of coroutines and task stealing. We also propose code optimizations to reduce runtime overhead associated with dynamic scheduling. Such techniques are not possible in a high-level software framework and provide performance that meets or exceeds the performance of existing systems while providing greater generality and portability than static methods
- âŠ