
    A powerful heuristic for telephone gossiping

    A refined heuristic for computing schedules for gossiping in the telephone model is presented. The heuristic is fast: for a network with n nodes and m edges, requiring R rounds for gossiping, the running time is O(R n log(n) m) for all tested classes of graphs. This moderate time consumption makes it possible to compute gossiping schedules for networks with more than 10,000 PUs and 100,000 connections. The heuristic is good: in practice the computed schedules never exceed the optimum by more than a few rounds. The heuristic is versatile: it can also be used for broadcasting and more general information dispersion patterns, and it can handle both the unit-cost and the linear-cost model. In fact, the heuristic is so good that for CCC, shuffle-exchange, butterfly, de Bruijn, star and pancake networks the constructed gossiping schedules are better than the best theoretically derived ones. For example, for gossiping on a shuffle-exchange network with 2^{13} PUs, the former upper bound was 49 rounds, while our heuristic finds a schedule requiring 31 rounds. For broadcasting, too, the heuristic improves on many formerly known results. A second heuristic works even better for CCC, butterfly, star and pancake networks. For example, with this heuristic we found that gossiping on a pancake network with 7! PUs can be performed in 15 rounds, 2 fewer than achieved by the best theoretical construction. This second heuristic is less versatile than the first, but with refined search techniques it can tackle even larger problems, the main limitation being storage capacity. Another advantage is that the constructed schedules can be represented concisely.
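    To make the telephone model concrete, the following minimal sketch checks whether a given schedule completes gossiping. It is not the heuristic from the abstract, which is not described here. It assumes that in each round every node takes part in at most one call and that both endpoints of a call exchange everything they know. The names `simulate_gossip` and `hypercube_schedule` are hypothetical, and the hypercube is used only because its d-round schedule (one round per dimension) is easy to verify; it is not one of the networks listed above.

```python
def simulate_gossip(n, schedule):
    """Return True if every node knows all n items after the schedule.

    schedule is a list of rounds; each round is a list of (u, v) calls.
    Telephone-model assumption: a node joins at most one call per round.
    """
    known = [{i} for i in range(n)]  # node i starts with its own item
    for calls in schedule:
        busy = set()
        for u, v in calls:
            if u in busy or v in busy:
                raise ValueError("a node takes part in two calls in one round")
            busy.update((u, v))
        for u, v in calls:
            # Both endpoints leave the call with the union of their knowledge.
            merged = known[u] | known[v]
            known[u] = merged
            known[v] = merged
    return all(len(k) == n for k in known)


def hypercube_schedule(d):
    """In round b, every node calls its neighbour across dimension b."""
    n = 1 << d
    return [
        [(u, u ^ (1 << b)) for u in range(n) if u < u ^ (1 << b)]
        for b in range(d)
    ]


if __name__ == "__main__":
    print(simulate_gossip(8, hypercube_schedule(3)))      # full schedule
    print(simulate_gossip(8, hypercube_schedule(3)[:2]))  # one round short
```

    A checker of this kind can validate any candidate schedule, whether it is produced by a heuristic or by a theoretical construction, before the round counts are compared.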

    Multi-stage switching networks for waveguide optical technology

    Multi-stage switching is very suitable for implementing interconnection systems operating at different physical scales (from rack-to-rack to on-chip) and with several technologies (either photonics or electronics). Several multi-stage architectures have been proposed to design these systems in a highly modular and efficient way. Since these proposals are general and applicable to a vast range of technologies, optimizations are possible once a specific technology is considered. In this work, we aim at optimizing multi-stage banyan and EGS architectures for implementation in optical waveguide technology. We propose a method to decrease the number of waveguide crossovers while avoiding an excessive increase in waveguide bends.

    Individual Tariffs for Mobile Communication Services

    This paper introduces a conceptual framework and a computational model for individual tariffs for mobile communication services. The purpose is to provide guidance for implementation by communication service suppliers and user groups alike. The paper first examines the sociological and economic incentives for personalized services and individual tariffs. Then it introduces a framework for individual tariffs which is centered on user and supplier behaviours. The user, instead of being fully rational, has "bounded rationality", and his behaviours are subject to economic constraints and influenced by social needs. The supplier can belong to different types of entities, such as firms and communities; each has its own goals, which lead to different behaviours. Individual tariffs are decided through interactions between the user and the supplier and can be analyzed in a structured way using game theory. A numerical case in mobile music training is developed to illustrate the concepts.
    Keywords: risks; mobile communication services; individual tariffs; computational games

    CiFHER: A Chiplet-Based FHE Accelerator with a Resizable Structure

    Fully homomorphic encryption (FHE) is in the spotlight as a definitive solution for privacy, but the high computational overhead of FHE poses a challenge to its practical adoption. Although prior studies have attempted to design ASIC accelerators to mitigate the overhead, their designs require excessive amounts of chip resources (e.g., area) to contain and process massive data for FHE operations. We propose CiFHER, a chiplet-based FHE accelerator with a resizable structure, to tackle the challenge with a cost-effective multi-chip module (MCM) design. First, we devise a flexible architecture of a chiplet core whose configuration can be adjusted to conform to the global organization of chiplets and design constraints. The distinctive feature of our core is a recomposable functional unit providing varying computational throughput for number-theoretic transform (NTT), the most dominant function in FHE. Then, we establish generalized data mapping methodologies to minimize the network overhead when organizing the chips into the MCM package in a tiled manner, which becomes a significant bottleneck due to the technology constraints of MCMs. Also, we analyze the effectiveness of various algorithms, including a novel limb duplication algorithm, on the MCM architecture. A detailed evaluation shows that a CiFHER package composed of 4 to 64 compact chiplets provides performance comparable to state-of-the-art monolithic ASIC FHE accelerators with significantly lower package-wide power consumption while reducing the area of a single core to as small as 4.28 mm^2. Comment: 15 pages, 9 figures

    Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems

    Calibration is defined as the ratio of the average predicted click rate to the true click rate. The optimization of calibration is essential to many online advertising recommendation systems because it directly affects the downstream bids in ads auctions and the amount of money charged to advertisers. Despite its importance, calibration optimization often suffers from a problem called "maximization bias". Maximization bias refers to the phenomenon that the maximum of predicted values overestimates the true maximum. The problem is introduced because the calibration is computed on the set selected by the prediction model itself. It persists even if unbiased predictions can be achieved on every datapoint and worsens when covariate shifts exist between the training and test sets. To mitigate this problem, we theorize the quantification of maximization bias and propose a variance-adjusting debiasing (VAD) meta-algorithm in this paper. The algorithm is efficient, robust, and practical as it is able to mitigate maximization bias problems under covariate shifts, neither incurring additional online serving costs nor compromising the ranking performance. We demonstrate the effectiveness of the proposed algorithm using a state-of-the-art recommendation neural network model on a large-scale real-world dataset
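    The selection effect described above can be reproduced with a small simulation. The sketch below is not the VAD algorithm, which the abstract does not specify. It uses the abstract's definition of calibration (average predicted rate divided by average true rate) and shows that predictions that are unbiased on every datapoint still look over-calibrated on the top-ranked subset. All distributions, parameter values and function names are illustrative assumptions; the additive Gaussian noise can even push a prediction below zero, which is acceptable for a toy model.

```python
import random


def calibration(pred, true):
    """Average predicted rate divided by average true rate."""
    return sum(pred) / sum(true)


def simulate(n=20000, top_frac=0.05, noise_sd=0.01, seed=0):
    rng = random.Random(seed)
    # Hypothetical true click rates and per-item unbiased predictions.
    true = [rng.uniform(0.01, 0.05) for _ in range(n)]
    pred = [p + rng.gauss(0.0, noise_sd) for p in true]

    overall = calibration(pred, true)

    # The model selects the items it scores highest, as a ranker would.
    k = int(n * top_frac)
    top = sorted(range(n), key=lambda i: pred[i], reverse=True)[:k]
    selected = calibration([pred[i] for i in top], [true[i] for i in top])
    return overall, selected


if __name__ == "__main__":
    overall, selected = simulate()
    print(f"calibration on all items:      {overall:.3f}")
    print(f"calibration on top-ranked set: {selected:.3f}")
```

    Under these assumptions the first ratio stays close to 1 while the second is clearly above 1, because items with positive noise are more likely to be selected.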

    Partial aggregation for collective communication in distributed memory machines

    High Performance Computing (HPC) systems interconnect a large number of Processing Elements (PEs) in high-bandwidth networks to simulate complex scientific problems. The increasing scale of HPC systems poses great challenges for algorithm designers. As the average distance between PEs increases, data movement across hierarchical memory subsystems introduces high latency. Minimizing latency is particularly challenging in collective communications, where many PEs may interact in complex communication patterns. Although collective communications can be optimized for network-level parallelism, occasional synchronization delays due to dependencies in the communication pattern degrade application performance. To reduce the performance impact of communication and synchronization costs, parallel algorithms are designed with sophisticated latency hiding techniques. The principle is to interleave computation with asynchronous communication, which increases the overall occupancy of compute cores. However, collective communication primitives abstract away parallelism, which limits the integration of latency hiding techniques. Approaches to work around these limitations either modify the algorithmic structure of application codes or replace collective primitives with verbose low-level communication calls. While these approaches give fine-grained control for latency hiding, implementing collective communication algorithms is challenging and requires expert knowledge about HPC network topologies. A collective communication pattern is commonly described as a Directed Acyclic Graph (DAG) where a set of PEs, represented as vertices, resolve data dependencies through communication along the edges. Our approach improves latency hiding in collective communication through partial aggregation. Based on mathematical rules of binary operations and homomorphism, we expose data parallelism in the respective DAG to overlap computation with communication.
The proposed concepts are implemented and evaluated with a subset of collective primitives in the Message Passing Interface (MPI), an established communication standard in scientific computing. An experimental analysis with communication-bound microbenchmarks shows considerable performance benefits for the evaluated collective primitives. A detailed case study with a large-scale distributed sort algorithm demonstrates how partial aggregation significantly improves performance in data-intensive scenarios. Besides better latency hiding capabilities with collective communication primitives, our approach enables further optimizations of their implementations within MPI libraries. The large number of asynchronous programming models that are actively studied in the HPC community can benefit from partial aggregation in collective communication patterns. Future work can utilize partial aggregation to improve the interaction of MPI collectives with accelerator architectures, and to design more efficient communication algorithms.
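The algebraic idea behind partial aggregation can be sketched in plain Python without any MPI calls. The example is an illustration under stated assumptions, not the authors' implementation: it assumes the reduction operator is associative with a known identity, so each chunk can be folded as soon as it "arrives" and the partial results combined afterwards. The function `aggregate` and the chunk size are hypothetical.

```python
from functools import reduce
import operator


def aggregate(chunks, op, identity):
    """Fold each chunk on arrival, then merge it into the running result.

    Correct only if op is associative and identity is its neutral element,
    so that reducing chunk by chunk equals reducing the whole sequence.
    """
    partial = identity
    for chunk in chunks:
        local = reduce(op, chunk, identity)  # work done while later chunks are in flight
        partial = op(partial, local)
    return partial


if __name__ == "__main__":
    data = list(range(1, 101))
    chunks = [data[i:i + 7] for i in range(0, len(data), 7)]
    print(aggregate(chunks, operator.add, 0))        # same as sum(data)
    print(aggregate(chunks, max, float("-inf")))     # same as max(data)
```

In a real collective, the per-chunk fold would run while the next message is still being transferred, which is where the overlap of computation and communication would come from.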