A powerful heuristic for telephone gossiping
A refined heuristic for computing schedules for gossiping in the telephone model is presented. The heuristic is fast: for a network with n nodes and m edges that requires R rounds for gossiping, the running time is O(R n log(n) m) for all tested classes of graphs. This moderate time consumption makes it possible to compute gossiping schedules for networks with more than 10,000 PUs and 100,000 connections. The heuristic is good: in practice the computed schedules never exceed the optimum by more than a few rounds. The heuristic is versatile: it can also be used for broadcasting and more general information-dispersion patterns, and it can handle both the unit-cost and the linear-cost model. Indeed, the heuristic is so good that for CCC, shuffle-exchange, butterfly, de Bruijn, star, and pancake networks the constructed gossiping schedules are better than the best theoretically derived ones. For example, for gossiping on a shuffle-exchange network with 2^{13} PUs, the former upper bound was 49 rounds, while our heuristic finds a schedule requiring 31 rounds. The heuristic also improves on many formerly known results for broadcasting. A second heuristic works even better for CCC, butterfly, star, and pancake networks. For example, with this heuristic we found that gossiping on a pancake network with 7! PUs can be performed in 15 rounds, two fewer than achieved by the best theoretical construction. This second heuristic is less versatile than the first, but with refined search techniques it can tackle even larger problems, the main limitation being storage capacity. Another advantage is that the constructed schedules can be represented concisely.
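The round-based structure of telephone gossiping can be illustrated with a much simpler greedy scheme than the refined heuristic described above (a sketch only, not the paper's algorithm): in every round, pick a matching that favors calls exchanging the most new information, then merge the endpoints' knowledge sets.

```python
def greedy_gossip_rounds(edges, n):
    """Greedy round-based gossiping in the telephone model (toy sketch).

    Each round selects a matching of the graph; the two endpoints of a
    call merge their knowledge sets. Edges are ranked by how much new
    information the call would exchange. Returns the number of rounds,
    or None if gossiping cannot complete (disconnected graph).
    """
    know = [{v} for v in range(n)]      # node v initially knows only itself
    rounds = 0
    while any(len(k) < n for k in know):
        # prefer calls that exchange the largest amount of new information
        scored = sorted(edges, key=lambda e: -len(know[e[0]] ^ know[e[1]]))
        used, matching = set(), []
        for u, v in scored:
            if u not in used and v not in used and know[u] != know[v]:
                matching.append((u, v))
                used.update((u, v))
        if not matching:
            return None                 # no useful call possible
        for u, v in matching:
            merged = know[u] | know[v]
            know[u] = know[v] = merged
        rounds += 1
    return rounds
```

On the complete graph K4 this greedy loop finds a 2-round schedule; the paper's heuristic layers much stronger edge-selection and search strategies on top of this basic round structure.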
Multi-stage switching networks for waveguide optical technology
Multi-stage switching is well suited to implementing interconnection systems that operate at different physical scales (from rack-to-rack to on-chip) and with several technologies (either photonic or electronic). Several multi-stage architectures have been proposed to design these systems in a highly modular and efficient way. Since these proposals are general and applicable to a vast range of technologies, optimizations become possible once a specific technology is considered. In this work, we optimize multi-stage banyan and EGS architectures for implementations in optical waveguide technology. We propose a method to decrease the number of waveguide crossovers while avoiding an excessive increase in waveguide bends.
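Under the common simplification that a stage's wiring is drawn as straight waveguides between two columns of ports, the crossovers contributed by that stage are exactly the inversions of its permutation. The sketch below uses this model with a perfect-shuffle stage as the example; both the model and the example are illustrative assumptions, not taken from the paper.

```python
def crossovers(perm):
    """Count waveguide crossovers of one interconnection stage.

    Assumes output j is wired by a straight waveguide to input perm[j];
    two straight waveguides cross iff the corresponding port pairs form
    an inversion of the permutation.
    """
    return sum(1
               for i in range(len(perm))
               for j in range(i + 1, len(perm))
               if perm[i] > perm[j])

# Perfect-shuffle stage of an 8-way banyan-style network (illustrative):
shuffle8 = [0, 4, 1, 5, 2, 6, 3, 7]
```

Counting inversions this way gives a quick baseline for comparing candidate layouts before considering bend penalties.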
Individual Tariffs for Mobile Communication Services
This paper introduces a conceptual framework and a computational model for individual tariffs for mobile communication services. The purpose is to provide guidance for implementation by communication service suppliers and user groups alike. The paper first examines the sociological and economic incentives for personalized services and individual tariffs. It then introduces a framework for individual tariffs centered on user and supplier behavior. The user, instead of being fully rational, has "bounded rationality": their behavior is subject to economic constraints and influenced by social needs. The supplier can belong to different types of entities, such as firms and communities, each with its own goals that lead to different behaviors. Individual tariffs are decided through interactions between the user and the supplier and can be analyzed in a structured way using game theory. A numerical case in mobile music training is developed to illustrate the concepts. Keywords: risks; mobile communication services; individual tariffs; computational games.
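A minimal sketch of such a user-supplier interaction, assuming a satisficing ("boundedly rational") user and a hypothetical fixed-concession supplier; every name and parameter here is illustrative, not from the paper:

```python
def negotiate_tariff(user_value, social_bonus, threshold, ask, floor,
                     step=0.5):
    """Toy individual-tariff negotiation (illustrative assumption).

    The user accepts the first offer whose perceived utility --
    intrinsic value plus a social-influence bonus minus price --
    meets a personal satisficing threshold; the supplier concedes
    in fixed steps down to a cost floor.
    """
    price = ask
    while price >= floor:
        utility = user_value + social_bonus - price
        if utility >= threshold:
            return price            # deal struck at this price
        price -= step
    return None                     # no individual tariff agreed
```

Richer versions would let both sides update their strategies across rounds, which is where the game-theoretic analysis mentioned in the abstract comes in.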
CiFHER: A Chiplet-Based FHE Accelerator with a Resizable Structure
Fully homomorphic encryption (FHE) is in the spotlight as a definitive solution for privacy, but the high computational overhead of FHE poses a challenge to its practical adoption. Although prior studies have attempted to design ASIC accelerators to mitigate the overhead, their designs require excessive amounts of chip resources (e.g., area) to contain and process the massive data involved in FHE operations. We propose CiFHER, a chiplet-based FHE accelerator with a resizable structure, to tackle the challenge with a cost-effective multi-chip module (MCM) design. First, we devise a flexible chiplet-core architecture whose configuration can be adjusted to conform to the global organization of chiplets and to design constraints. The distinctive feature of our core is a recomposable functional unit providing varying computational throughput for the number-theoretic transform (NTT), the most dominant function in FHE. Then, we establish generalized data-mapping methodologies to minimize the network overhead when organizing the chips into the MCM package in a tiled manner, which becomes a significant bottleneck due to the technology constraints of MCMs. We also analyze the effectiveness of various algorithms, including a novel limb duplication algorithm, on the MCM architecture. A detailed evaluation shows that a CiFHER package composed of 4 to 64 compact chiplets provides performance comparable to state-of-the-art monolithic ASIC FHE accelerators with significantly lower package-wide power consumption, while reducing the area of a single core to as little as 4.28 mm². (15 pages, 9 figures)
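For context, the NTT that dominates FHE workloads is a discrete Fourier transform over a finite field. A textbook radix-2 version over a toy modulus (p = 17, with 2 as a primitive 8th root of unity) is sketched below; this illustrates the operation only, not CiFHER's recomposable hardware unit.

```python
def ntt(a, omega, p):
    """Recursive radix-2 number-theoretic transform over Z_p.

    a: coefficient list whose length is a power of two;
    omega: a primitive len(a)-th root of unity modulo p.
    Returns the evaluations sum_j a[j] * omega^(j*k) mod p.
    """
    n = len(a)
    if n == 1:
        return a[:]
    # even/odd split, transformed with omega^2 (a primitive (n/2)-th root)
    even = ntt(a[0::2], omega * omega % p, p)
    odd = ntt(a[1::2], omega * omega % p, p)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % p                  # twiddle factor applied to odd half
        out[k] = (even[k] + t) % p
        out[k + n // 2] = (even[k] - t) % p
        w = w * omega % p
    return out
```

Real FHE parameters use much larger NTT sizes and word widths, which is why a functional unit with tunable NTT throughput is central to the chiplet design.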
Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems
Calibration is defined as the ratio of the average predicted click rate to the true click rate. Optimizing calibration is essential to many online advertising recommendation systems because it directly affects the downstream bids in ad auctions and the amount of money charged to advertisers. Despite its importance, calibration optimization often suffers from a problem called "maximization bias": the maximum of the predicted values overestimates the true maximum. The problem arises because calibration is computed on the set selected by the prediction model itself. It persists even if unbiased predictions can be achieved on every datapoint, and it worsens when covariate shifts exist between the training and test sets. To mitigate this problem, we quantify maximization bias theoretically and propose a variance-adjusting debiasing (VAD) meta-algorithm. The algorithm is efficient, robust, and practical: it mitigates maximization bias under covariate shifts while incurring no additional online serving cost and without compromising ranking performance. We demonstrate the effectiveness of the proposed algorithm using a state-of-the-art recommendation neural network model on a large-scale real-world dataset.
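The phenomenon itself is easy to reproduce in a few lines. The simulation below (an illustration of the bias only; it does not sketch the VAD meta-algorithm) draws unbiased noisy predictions and compares calibration on the full population against calibration on the model-selected top-k set.

```python
import random

def calibration_ratios(n_items=10000, k=100, noise=0.1, seed=0):
    """Simulate maximization bias in calibration.

    Predictions are unbiased per item (true rate plus zero-mean noise),
    yet calibration -- mean(pred) / mean(true) -- computed on the
    self-selected top-k set comes out above 1.
    Returns (full_set_ratio, top_k_ratio).
    """
    rng = random.Random(seed)
    true = [rng.uniform(0.01, 0.2) for _ in range(n_items)]
    pred = [t + rng.gauss(0, noise * t) for t in true]   # unbiased noise
    top = sorted(range(n_items), key=lambda i: pred[i], reverse=True)[:k]
    full = sum(pred) / sum(true)
    topk = sum(pred[i] for i in top) / sum(true[i] for i in top)
    return full, topk
```

Even though every individual prediction is unbiased, selecting by the predictions favors items whose noise happened to be positive, so the top-k calibration exceeds 1; that self-selection effect is exactly the maximization bias the paper targets.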
Partial aggregation for collective communication in distributed memory machines
High Performance Computing (HPC) systems interconnect a large number of Processing Elements (PEs) in high-bandwidth networks to simulate complex scientific problems. The increasing scale of HPC systems poses great challenges for algorithm designers. As the average distance between PEs increases, data movement across hierarchical memory subsystems introduces high latency. Minimizing latency is particularly challenging in collective communications, where many PEs may interact in complex communication patterns. Although collective communications can be optimized for network-level parallelism, occasional synchronization delays due to dependencies in the communication pattern degrade application performance.
To reduce the performance impact of communication and synchronization costs, parallel algorithms are designed with sophisticated latency-hiding techniques. The principle is to interleave computation with asynchronous communication, which increases the overall occupancy of compute cores. However, collective communication primitives abstract away parallelism, which limits the integration of latency-hiding techniques. Approaches to work around these limitations either modify the algorithmic structure of application codes or replace collective primitives with verbose low-level communication calls. While these approaches give fine-grained control for latency hiding, implementing collective communication algorithms is challenging and requires expert knowledge of HPC network topologies.
A collective communication pattern is commonly described as a Directed Acyclic Graph (DAG) where a set of PEs, represented as vertices, resolve data dependencies through communication along the edges. Our approach improves latency hiding in collective communication through partial aggregation. Based on mathematical rules of binary operations and homomorphism, we expose data parallelism in the respective DAG to overlap computation with communication. The proposed concepts are implemented and evaluated with a subset of collective primitives in the Message Passing Interface (MPI), an established communication standard in scientific computing. An experimental analysis with communication-bound microbenchmarks shows considerable performance benefits for the evaluated collective primitives. A detailed case study with a large-scale distributed sort algorithm demonstrates how partial aggregation significantly improves performance in data-intensive scenarios. Besides better latency-hiding capabilities with collective communication primitives, our approach enables further optimizations of their implementations within MPI libraries.
The many asynchronous programming models actively studied in the HPC community can benefit from partial aggregation in collective communication patterns. Future work can utilize partial aggregation to improve the interaction of MPI collectives with accelerator architectures, and to design more efficient communication algorithms.
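The role of associativity and homomorphism can be sketched with a toy segment-wise reduction (plain Python standing in for MPI; the function and its message model are illustrative assumptions, not the paper's implementation): because the operator is applied independently per segment, each segment can be folded into a running partial result as soon as its message arrives, which is the data parallelism a DAG-based partial-aggregation scheme exploits for latency hiding.

```python
import operator

def segmented_allreduce(rank_data, op):
    """Toy segment-wise reduction over contributions from several PEs.

    Each PE's vector is split into segments, and segments are combined
    independently with the associative operator `op`. In a real MPI
    setting this independence lets a segment be reduced the moment its
    message arrives, overlapping the remaining communication.
    """
    nseg = len(rank_data[0])
    result = list(rank_data[0])             # partial result, updated eagerly
    for contrib in rank_data[1:]:           # messages from the other PEs
        for s in range(nseg):               # each segment is independent
            result[s] = op(result[s], contrib[s])
    return result

# e.g. summing vectors from three PEs, segment by segment:
# segmented_allreduce([[1, 2], [3, 4], [5, 6]], operator.add) -> [9, 12]
```

Any associative, homomorphism-compatible operator (sum, max, element-wise merge) fits this pattern, which is what allows MPI libraries to pipeline partial aggregates inside a collective.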
Anonymity in Bitcoin and Bitmessage
This report describes two projects created by the author, based on ideas originating from the Bitcoin community. The first, bmd, is a re-implementation of the Bitmessage protocol in Go. Bitmessage is an anonymous and secure messaging system invented by Jonathan Warren, who was inspired by the design of Bitcoin's p2p network [WARR1]. The second is Shufflepuff, an implementation of a protocol called CoinShuffle [RUFF1], which allows several people to construct a Bitcoin transaction with an input and an output for each participant without any participant knowing who owns which output. CoinShuffle was invented by Tim Ruffing et al., and is an upgrade of a protocol called CoinJoin, invented by Gregory Maxwell. This paper discusses the background, properties, applications, and design of bmd and Shufflepuff, and reports a performance analysis of bmd.
- …