65 research outputs found

    Derandomized Distributed Multi-resource Allocation with Little Communication Overhead

    We study a class of distributed optimization problems for allocating multiple shared resources among Internet-connected devices. We propose a derandomized version of an existing stochastic additive-increase and multiplicative-decrease (AIMD) algorithm. The proposed solution uses a one-bit feedback signal per resource between the system and the Internet-connected devices and requires no inter-device communication. Additionally, the devices do not compromise their privacy, and the solution does not depend on the number of participating devices. Each Internet-connected device has private cost functions that are strictly convex, twice continuously differentiable, and increasing. We show empirically that the long-term average allocations of the shared resources converge to the optimal allocations and that the system achieves the minimum social cost. Furthermore, we show that the proposed derandomized AIMD algorithm converges faster than the stochastic AIMD algorithm, and that both approaches produce approximately the same solutions.
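    The feedback loop can be made concrete with a minimal sketch of the underlying stochastic AIMD scheme for a single shared resource. The step size ALPHA, backoff factor BETA, backoff probability, and capacity below are illustrative assumptions, not the paper's values; the paper's contribution is a deterministic rule that replaces the coin flip while preserving the same long-term average allocations:

        import random

        CAPACITY = 100.0   # aggregate capacity of the shared resource
        ALPHA = 1.0        # additive-increase step
        BETA = 0.5         # multiplicative-decrease factor
        N_DEVICES = 10

        demands = [1.0] * N_DEVICES

        for t in range(10_000):
            # One-bit feedback: is aggregate demand above capacity?
            congested = sum(demands) > CAPACITY
            for i in range(N_DEVICES):
                if congested:
                    # Stochastic variant: back off with some probability.
                    # The derandomized variant replaces this coin flip
                    # with a deterministic update.
                    if random.random() < 0.5:
                        demands[i] *= BETA
                else:
                    demands[i] += ALPHA

    Each device only ever observes the shared one-bit signal and its own demand, which is what keeps both the privacy guarantee and the communication overhead low.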

    Cooperative Coevolution for Non-Separable Large-Scale Black-Box Optimization: Convergence Analyses and Distributed Accelerations

    Given the ubiquity of non-separable optimization problems in the real world, in this paper we analyze and extend the large-scale version of the well-known cooperative coevolution (CC) framework, a divide-and-conquer optimization approach, on non-separable functions. First, we reveal the empirical reasons why decomposition-based methods are or are not preferred in practice on some non-separable large-scale problems, reasons which have not been clearly identified in previous CC papers. Then, we formalize CC as a continuous game model via simplification, without losing its essential properties. Unlike previous evolutionary game theory for CC, our new model provides a much simpler but useful viewpoint for analyzing its convergence, since only the pure Nash equilibrium concept is needed and more general fitness landscapes can be considered explicitly. Based on the convergence analyses, we propose a hierarchical decomposition strategy for better generalization, since for any decomposition there is a risk of getting trapped in a suboptimal Nash equilibrium. Finally, we use distributed computing to accelerate it under a multi-level learning framework, which combines the fine-tuning ability of decomposition with the invariance properties of CMA-ES. Experiments on a set of high-dimensional functions validate both its search performance and its scalability (w.r.t. CPU cores) on a cluster computing platform with 400 CPU cores.
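    The blockwise game the paper formalizes can be sketched as follows; the random-search subsolver, test function, and group sizes are stand-ins (the paper pairs decomposition with CMA-ES). On non-separable functions this loop can stall at exactly the kind of suboptimal pure Nash equilibrium the convergence analysis describes, since no single block can improve unilaterally:

        import numpy as np

        def sphere(x):
            return float(np.sum(x ** 2))

        def cc_minimize(f, dim, n_groups, cycles=50, samples=30, sigma=0.3, seed=0):
            rng = np.random.default_rng(seed)
            context = rng.standard_normal(dim)            # shared context vector
            groups = np.array_split(np.arange(dim), n_groups)
            for _ in range(cycles):
                for g in groups:                          # one "player" per block
                    best, best_f = context[g].copy(), f(context)
                    for _ in range(samples):              # crude subsolver
                        cand = context.copy()
                        cand[g] = best + sigma * rng.standard_normal(len(g))
                        fc = f(cand)
                        if fc < best_f:
                            best_f, best = fc, cand[g].copy()
                    context[g] = best                     # publish best reply
            return context, f(context)

        print(cc_minimize(sphere, dim=20, n_groups=4)[1])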

    Fundamentals

    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, summarization, and clustering to the different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to their resource requirements and with respect to how their scalability can be enhanced on diverse computing architectures, ranging from embedded systems to large computing clusters.

    Resource Allocation in Networked and Distributed Environments

    A central challenge in networked and distributed systems is resource management: how can we partition the available resources in the system across competing users, such that individual users are satisfied and certain system-wide objectives of interest are optimized? In this thesis, we deal with many such fundamental and practical resource allocation problems that arise in networked and distributed environments. We invoke two sophisticated paradigms -- linear programming and probabilistic methods -- and develop provably good approximation algorithms for a diverse collection of applications. Our main contributions are as follows.

    Assignment problems: An assignment problem involves a collection of objects and locations, and a load value associated with each object-location pair. Our goal is to assign the objects to locations while minimizing various cost functions of the assignment. This setting models many applications in manufacturing, parallel processing, distributed storage, and wireless networks. We present a single algorithm for assignment which generalizes many classical assignment schemes known in the literature. Our scheme is derived through a fusion of linear algebra and randomization. In conjunction with other ideas, it leads to novel guarantees for multi-criteria parallel scheduling, broadcast scheduling, and social network modeling.

    Precedence constrained scheduling: We consider two precedence constrained scheduling problems, namely sweep scheduling and tree scheduling, which are inspired by emerging applications in high performance computing. Through a careful use of randomization, we devise the first approximation algorithms for these problems with near-optimal performance guarantees.

    Wireless communication: Wireless networks are prone to interference. This prohibits proximate network nodes from transmitting simultaneously, and introduces fundamental challenges in the design of wireless communication protocols. We develop fresh geometric insights for characterizing wireless interference. We combine our geometric analysis with linear programming and randomization to derive near-optimal algorithms for latency minimization and throughput capacity estimation in wireless networks.

    In summary, the innovative use of linear programming and probabilistic techniques for resource allocation, and the novel ways of connecting them with application-specific ideas, is the pivotal theme and the focal point of this thesis.
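    The thesis's two paradigms combine in a standard pattern worth sketching: solve a linear-programming relaxation, then use randomization to round the fractional solution. The toy instance below is an assumption for illustration (costs and capacities are made up, and SciPy's linprog stands in for whatever LP solver one prefers), not the thesis's algorithm:

        import numpy as np
        from scipy.optimize import linprog

        rng = np.random.default_rng(0)
        n_obj, n_loc = 6, 3
        cost = rng.uniform(1.0, 10.0, size=(n_obj, n_loc))

        # LP relaxation: x[i, j] in [0, 1]; each object fully assigned,
        # each location holding at most 2 objects (fractionally).
        c = cost.ravel()
        A_eq = np.zeros((n_obj, n_obj * n_loc))
        for i in range(n_obj):
            A_eq[i, i * n_loc:(i + 1) * n_loc] = 1.0
        A_ub = np.zeros((n_loc, n_obj * n_loc))
        for j in range(n_loc):
            A_ub[j, j::n_loc] = 1.0
        res = linprog(c, A_ub=A_ub, b_ub=np.full(n_loc, 2.0),
                      A_eq=A_eq, b_eq=np.ones(n_obj), bounds=(0, 1))
        x = res.x.reshape(n_obj, n_loc)

        # Randomized rounding: object i goes to location j with
        # probability x[i, j]; the expected cost equals the LP optimum.
        assign = []
        for row in x:
            p = np.clip(row, 0.0, None)
            assign.append(int(rng.choice(n_loc, p=p / p.sum())))
        print(assign, round(res.fun, 3))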

    OpenFPM: A scalable environment for particle and particle-mesh codes on parallel computers

    Scalable and efficient numerical simulations continue to gain importance, as computation is now firmly established as a tool of discovery, alongside theory and experiment. Meanwhile, the performance of computing hardware grows through increasing heterogeneity, enabling simulations of ever more complex models. However, efficiently implementing scalable codes on heterogeneous, distributed hardware systems becomes the bottleneck. This bottleneck can be alleviated by intermediate software layers that provide higher-level abstractions closer to the problem domain, allowing the computational scientist to focus on the simulation. Here, we present OpenFPM, an open and scalable framework that provides an abstraction layer for numerical simulations using particles and/or meshes. OpenFPM provides a transparent and scalable infrastructure for shared-memory and distributed-memory implementations of particles-only and hybrid particle-mesh simulations of both discrete and continuous models, as well as for non-simulation codes. This infrastructure is complemented by frequently used numerical routines and by interfaces to third-party libraries. This thesis presents the architecture and design of OpenFPM, details the underlying abstractions, and benchmarks the framework in applications ranging from Smoothed-Particle Hydrodynamics (SPH) to Molecular Dynamics (MD), Discrete Element Methods (DEM), Vortex Methods, stencil codes, high-dimensional Monte Carlo sampling (CMA-ES), and Reaction-Diffusion solvers, comparing it to the current state of the art and to existing software frameworks.
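    As a loose illustration of such an abstraction layer (plain Python, not the OpenFPM C++ API; the class, 1D domain, cutoff, and rank count below are invented for the example), a distributed particle container can hide domain decomposition and ghost-layer synchronization from the user, which is roughly the role OpenFPM's distributed data structures play over MPI:

        import numpy as np

        class DistributedParticles:
            """Toy stand-in: a 1D domain split into equal subdomains, with
            ghost layers mirroring particles within `cutoff` of a border."""

            def __init__(self, positions, n_ranks, length, cutoff):
                self.cutoff = cutoff
                self.bounds = np.linspace(0.0, length, n_ranks + 1)
                self.local = [positions[(positions >= lo) & (positions < hi)]
                              for lo, hi in zip(self.bounds[:-1], self.bounds[1:])]

            def ghost_get(self):
                """Per rank, the neighbor particles within `cutoff` of its
                borders (what a real framework communicates via MPI)."""
                all_pos = np.concatenate(self.local)
                return [all_pos[((all_pos >= lo - self.cutoff) & (all_pos < lo)) |
                                ((all_pos >= hi) & (all_pos < hi + self.cutoff))]
                        for lo, hi in zip(self.bounds[:-1], self.bounds[1:])]

        rng = np.random.default_rng(1)
        dp = DistributedParticles(rng.uniform(0.0, 1.0, 1000),
                                  n_ranks=4, length=1.0, cutoff=0.05)
        print([len(g) for g in dp.ghost_get()])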

    Real-Time FaaS: Towards a Latency Bounded Serverless Cloud

    Today, Function-as-a-Service (FaaS) is the most promising concept in serverless cloud computing. It makes it possible for developers to focus on application development without any system management effort: FaaS ensures resource allocation, fast response times, schedulability, scalability, resiliency, and upgradability. Applications in 5G, IoT, and Industry 4.0 raise the idea of opening cloud-edge computing infrastructures to time-critical applications as well, i.e., there is a strong desire to impose real-time requirements on computing systems like FaaS. However, multi-node systems make real-time scheduling significantly more complex, since guaranteeing real-time task execution and communication is challenging even on a single computing node with multi-core processors. In this paper, we present an analytical model and a heuristic partitioned scheduling algorithm suitable for real-time FaaS platforms on multi-node clusters. We show that our task scheduling heuristic can outperform existing algorithms by 55%. Furthermore, we propose three conceptual designs to enable the necessary real-time communication. We present the architecture of the envisioned real-time FaaS platform, emphasize its benefits and its requirements for the underlying network and nodes, and survey related work that could meet these demands.
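    The flavor of partitioned scheduling heuristic discussed here can be sketched as a first-fit-decreasing pass over task utilizations with a per-node EDF admission test. This is the classic baseline pattern, not the paper's algorithm (to which the 55% figure refers), and the task set below is invented:

        def partition_first_fit(tasks, n_nodes):
            """tasks: list of (wcet, period); returns a per-node task
            partition, or None if the heuristic cannot place a task."""
            nodes = [[] for _ in range(n_nodes)]
            util = [0.0] * n_nodes
            # First-fit decreasing: place the heaviest tasks first.
            for wcet, period in sorted(tasks, key=lambda t: t[0] / t[1],
                                       reverse=True):
                u = wcet / period
                for k in range(n_nodes):
                    if util[k] + u <= 1.0:   # EDF bound on one node
                        nodes[k].append((wcet, period))
                        util[k] += u
                        break
                else:
                    return None
            return nodes

        print(partition_first_fit([(2, 10), (3, 5), (1, 4), (5, 20)], n_nodes=2))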

    Angles and devices for quantum approximate optimization

    A potential application of emerging Noisy Intermediate-Scale Quantum (NISQ) devices is approximately solving combinatorial optimization problems. This thesis investigates a gate-based algorithm for this purpose, the Quantum Approximate Optimization Algorithm (QAOA), along two major themes. First, we examine how the QAOA solves the problems it is designed for. We take a statistical view of the algorithm applied to ensembles of problems, first considering a highly symmetric version of the algorithm that uses Grover drivers. In this highly symmetric context, we find a simple dependence of the QAOA state's expected value on how the values of the cost function are distributed. Furthering this theme, we demonstrate that, in general, QAOA performance depends on problem statistics with respect to a metric induced by the chosen driver Hamiltonian. We obtain a method for evaluating QAOA performance on worst-case problems, those of random costs, for differing driver choices. Second, we investigate a QAOA setting in which device control occurs only via single-qubit gates, rather than via individually programmable one- and two-qubit gates. In this reduced-control-overhead scheme (the digital-analog scheme), the complexity of the devices running QAOA circuits is decreased, at the cost of errors that are shown to be non-harmful in certain regimes. We then explore hypothetical device designs one could use for this purpose.
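    For readers new to the algorithm, a depth-1 QAOA expectation for MaxCut can be evaluated with a small statevector simulation. The 4-node ring, grid scan, and NumPy implementation below are illustrative assumptions only and do not reflect the thesis's Grover-driver or digital-analog variants:

        import numpy as np

        edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # 4-node ring
        n, dim = 4, 2 ** 4

        # Cut size of every computational basis state.
        cost = np.zeros(dim)
        for z in range(dim):
            bits = [(z >> q) & 1 for q in range(n)]
            cost[z] = sum(bits[i] != bits[j] for i, j in edges)

        X = np.array([[0.0, 1.0], [1.0, 0.0]])

        def qaoa_expectation(gamma, beta):
            state = np.full(dim, dim ** -0.5, dtype=complex)    # |+...+>
            state = np.exp(-1j * gamma * cost) * state          # cost phases
            mix = np.cos(beta) * np.eye(2) - 1j * np.sin(beta) * X
            psi = state.reshape([2] * n)
            for q in range(n):                                  # mixer on each qubit
                psi = np.moveaxis(np.tensordot(mix, psi, axes=([1], [q])), 0, q)
            state = psi.ravel()
            return float(np.real(state.conj() @ (cost * state)))

        grid = np.linspace(0.0, np.pi, 32)
        best = max((qaoa_expectation(g, b), g, b) for g in grid for b in grid)
        print(best)   # best expected cut found and the angles achieving it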