
    Throughput constrained parallelism reduction in cyclo-static dataflow applications

    This paper deals with semantics-preserving parallelism reduction methods for cyclo-static dataflow applications. Parallelism reduction is the process of fusing equivalent actors. Its principal objectives are to decrease the memory footprint of an application and to increase its execution performance. We focus on parallelism reduction methodologies constrained by application throughput. A generic parallelism reduction methodology is introduced, and experimental results are provided to assess the performance of the proposed method.
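
    As a rough, purely illustrative sketch of the idea (not the paper's algorithm), the toy Python below greedily fuses replicas of actors in a small dataflow graph as long as an estimated throughput stays above a required bound; the actors, timings and the simplistic throughput model are all hypothetical.

        # Toy sketch: greedily reduce the replica count of actors while an
        # estimated throughput stays above a required bound. The throughput
        # model (1 / load of the most loaded actor) and all numbers are
        # illustrative assumptions, not the paper's formulation.
        actors = {            # actor -> (execution time per firing, replicas)
            "src":    (1.0, 1),
            "filter": (4.0, 4),
            "mix":    (2.0, 2),
            "sink":   (1.0, 1),
        }

        def throughput(cfg):
            # Replicas share the work; the most loaded actor limits the rate.
            return 1.0 / max(t / n for (t, n) in cfg.values())

        required = 0.4        # firings per time unit (hypothetical constraint)

        changed = True
        while changed:
            changed = False
            for name, (t, n) in sorted(actors.items()):
                if n > 1:
                    trial = dict(actors)
                    trial[name] = (t, n - 1)           # fuse two replicas
                    if throughput(trial) >= required:  # throughput preserved
                        actors, changed = trial, True

        print(actors, throughput(actors))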

    A linear programming approach to general dataflow process network verification and dimensioning

    In this paper, we present linear programming-based sufficient conditions, some of them checkable in polynomial time, to establish the liveness and memory boundedness of general dataflow process networks. Furthermore, this approach can be used to obtain safe upper bounds on the size of the channel buffers of such a network. (In Proceedings ICE 2010, arXiv:1010.530)
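
    For flavour only, the toy below uses an LP solver to check a classical, much simpler dataflow property (consistency of a small synchronous dataflow graph via its balance equations); the paper's sufficient conditions for general dataflow process networks are more involved, and the graph and rates here are hypothetical.

        # Check SDF consistency with an LP: find r >= 1 with Gamma @ r = 0,
        # where Gamma is the topology matrix (production minus consumption
        # rates per channel). A feasible r is a (rational) repetition vector
        # and can be scaled to integers.
        import numpy as np
        from scipy.optimize import linprog

        # Channels (rows) x actors A, B (columns):
        # A -> B with rates 2/3, B -> A with rates 3/2.
        Gamma = np.array([[ 2, -3],
                          [-2,  3]])

        n = Gamma.shape[1]
        res = linprog(c=np.ones(n),             # any feasible point will do
                      A_eq=Gamma, b_eq=np.zeros(Gamma.shape[0]),
                      bounds=[(1, None)] * n)   # r >= 1, non-trivial vector

        print("consistent:", res.status == 0, "repetition vector:", res.x)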

    Generating Code and Memory Buffers to Reorganize Data on Many-core Architectures

    The dataflow programming model has proven to be a relevant approach to efficiently run massively parallel applications over many-core architectures. In this model, some particular built-in agents are in charge of data reorganizations between user agents. Such agents can Split, Join and Duplicate data onto their communication ports. They are widely used in signal processing, for example. These system agents, and their associated implementations, are of major importance when it comes to performance, because they can stand on the critical path (think of Amdahl's law). Furthermore, a particular data reorganization can be expressed by the developer in several ways that may lead to inefficient solutions (mostly unneeded data copies and transfers). In this paper, we propose several strategies to manage data reorganization at compile time, with a focus on indexed accesses to shared buffers to avoid data copies. These strategies are complementary: they ensure correctness for each system agent configuration, as well as performance when possible. They have been implemented within the Sigma-C industry-grade compilation toolchain and evaluated on the Kalray MPPA 256-core processor.
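
    As a minimal illustration of the "indexed access instead of copy" idea (the names and the round-robin split policy are made up, not the Sigma-C implementation), a Split agent can hand each consumer a strided view into the producer's shared buffer rather than copying data out:

        # A Split expressed as index views into a shared buffer: NumPy slicing
        # with a step returns a view, so no data is copied.
        import numpy as np

        def split_views(shared, n_consumers):
            # Round-robin split: consumer i reads elements i, i+n, i+2n, ...
            return [shared[i::n_consumers] for i in range(n_consumers)]

        producer_buffer = np.arange(12)     # written once by the producer
        views = split_views(producer_buffer, 3)

        for i, v in enumerate(views):
            print(f"consumer {i} sees {v}, shares memory: {v.base is producer_buffer}")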

    Towards real-time hidden speaker recognition by means of fully homomorphic encryption

    Securing Neural Network (NN) computations through the use of Fully Homomorphic Encryption (FHE) is the subject of growing interest in both the machine learning and cryptography communities. Among the different possible approaches to this topic, our work focuses on applying FHE to hide the model of a neural-network-based system in the case of a plain input. In this paper, using the TFHE homomorphic encryption scheme, we propose an efficient fully homomorphic method for an argmin computation on an arbitrary number of encrypted inputs, and an asymptotically faster, though levelled, equivalent scheme. Using these schemes and a unifying framework for LWE-based homomorphic encryption schemes (Chimera), we implement a time-wise very efficient homomorphic speaker recognition scheme using the neural embedding system VGGVox. This work can be generalized to all other similar Euclidean-embedding-based recognition systems. While maintaining the best-in-class classification rate of the VGGVox system, we implement a speaker recognition system that can classify a speech sample as coming from one of 100 hidden model speakers in less than one second.
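
    The plaintext control structure of a tournament-style argmin (a binary tree of pairwise comparisons, hence only about log2(n) comparison levels) is sketched below; over TFHE each comparison would be a homomorphic operation on encrypted distances, which this toy deliberately leaves out.

        # Plaintext analogue of a tournament argmin: propagate (index, value)
        # pairs up a binary comparison tree.
        def tournament_argmin(values):
            layer = list(enumerate(values))          # (index, value) leaves
            while len(layer) > 1:
                nxt = []
                for i in range(0, len(layer) - 1, 2):
                    (ia, a), (ib, b) = layer[i], layer[i + 1]
                    nxt.append((ia, a) if a <= b else (ib, b))
                if len(layer) % 2:                   # odd element gets a bye
                    nxt.append(layer[-1])
                layer = nxt
            return layer[0][0]                       # index of the minimum

        distances = [4.2, 1.7, 3.3, 0.9, 2.8]        # e.g. to enrolled speakers
        print(tournament_argmin(distances))          # -> 3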

    Faster homomorphic encryption is not enough: improved heuristic for multiplicative depth minimization of Boolean circuits

    In somewhat homomorphic encryption schemes (e.g. B/FV, BGV), the size of ciphertexts and the execution performance of homomorphic operations depend heavily on the multiplicative depth. The multiplicative depth is the maximal number of consecutive multiplications that a homomorphic encryption scheme is parameterized to support. In this work we propose an improved multiplicative depth minimization heuristic. In particular, a new circuit rewriting operator is introduced, the so-called cone rewrite operator. The results obtained with the new method are significant in terms of both accuracy and performance: smaller multiplicative depths are obtained for a benchmark of Boolean circuits when compared to previous work in the literature. On average, the multiplicative depth is substantially improved, and the new heuristic's execution time is significantly lower. The proposed rewrite operator and heuristic are not limited to Boolean circuits, but can also be used for arithmetic circuits.
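
    The quantity being minimized can be made concrete with a small sketch: the multiplicative depth of a Boolean circuit counts only AND gates along a path (XOR is depth-free over GF(2) in schemes such as B/FV or BGV). The circuit encoding and the example below are illustrative, not the paper's benchmark or rewriting heuristic.

        # Multiplicative depth of a toy Boolean circuit given as a DAG.
        circuit = {                  # gate -> (operation, input gates)
            "x0": ("IN", []), "x1": ("IN", []), "x2": ("IN", []),
            "g1": ("AND", ["x0", "x1"]),
            "g2": ("XOR", ["g1", "x2"]),
            "g3": ("AND", ["g2", "x2"]),
            "out": ("XOR", ["g3", "g1"]),
        }

        memo = {}
        def mult_depth(gate):
            if gate not in memo:
                op, ins = circuit[gate]
                d = max((mult_depth(i) for i in ins), default=0)
                memo[gate] = d + (1 if op == "AND" else 0)   # only ANDs count
            return memo[gate]

        print(mult_depth("out"))     # -> 2, from the g1 -> g3 chain of ANDs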

    A probabilistic design for practical homomorphic majority voting with intrinsic differential privacy

    As machine learning (ML) has become pervasive throughout various fields (industry, healthcare, social networks), privacy concerns regarding the data used for training have gained critical importance. In settings where several parties wish to collaboratively train a common model without jeopardizing their sensitive data, the need for a private training protocol is particularly stringent and requires protecting the data against both the model's end-users and the other actors of the training phase. In this context of secure collaborative learning, Differential Privacy (DP) and Fully Homomorphic Encryption (FHE) are two complementary countermeasures of growing interest to thwart privacy attacks in ML systems. Central to many collaborative training protocols, in the line of PATE, is majority voting aggregation. Thus, in this paper, we design SHIELD, a probabilistic approximate majority voting operator which is faster when homomorphically executed than existing approaches based on exact argmax computation over a histogram of votes. As an additional benefit, the inaccuracy of SHIELD is used as a feature to provably enable DP guarantees. Although SHIELD may have other applications, we focus here on one setting and seamlessly integrate it into the SPEED collaborative training framework (Grivet Sébert et al., 2021) to improve its computational efficiency. After thoroughly describing the FHE implementation of our algorithm and its DP analysis, we present experimental results. To the best of our knowledge, this is the first work in which relaxing the accuracy of an algorithm is constructively used as a degree of freedom to achieve better FHE performance.
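
    As a generic, plaintext-only illustration of trading accuracy for privacy and speed in vote aggregation (this is NOT the SHIELD operator; the subsampling policy and the numbers are made up), a majority vote can be randomized by keeping each teacher's vote only with some probability before taking the argmax of the histogram:

        # Randomized approximate majority voting over plaintext votes.
        import random
        from collections import Counter

        def approx_majority(votes, p=0.7, rng=random.Random(0)):
            kept = [v for v in votes if rng.random() < p]   # random subsample
            if not kept:
                return rng.choice(sorted(set(votes)))       # degenerate case
            return Counter(kept).most_common(1)[0][0]

        teacher_votes = ["cat"] * 12 + ["dog"] * 9 + ["bird"] * 4
        print(approx_majority(teacher_votes))               # usually "cat"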

    Stochastic graph partitioning: quadratic versus SOCP formulations

    We consider a variant of the graph partitioning problem involving knapsack constraints with Gaussian random coefficients. In this new variant, under this assumption on the probability distribution, the problem can traditionally be formulated as a binary SOCP whose continuous relaxation is convex. In this paper, we reformulate the problem as a binary quadratically constrained program whose continuous relaxation is not necessarily convex. We propose several linearization techniques for the latter: the classical linearization proposed by Fortet (Trabajos de Estadistica 11(2):111–118, 1960) and the linearization proposed by Sherali and Smith (Optim Lett 1(1):33–47, 2007). In addition to the basic implementation of the latter, we propose an improvement which includes, in the computation, constraints coming from the SOCP formulation. Numerical results show that the improved Sherali–Smith linearization largely outperforms the binary SOCP program and the classical linearization when used within a branch-and-bound approach.
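
    Fortet's classical linearization mentioned in the abstract replaces each product y = x_i * x_j of binary variables by the linear constraints y <= x_i, y <= x_j and y >= x_i + x_j - 1. The short brute-force check below (a generic illustration, not the paper's model) verifies that these constraints force a binary y to equal the product:

        # Verify Fortet's linearization on all binary assignments.
        from itertools import product

        def feasible_y(xi, xj):
            # Binary y values allowed by the three Fortet constraints.
            return [y for y in (0, 1)
                    if y <= xi and y <= xj and y >= xi + xj - 1]

        for xi, xj in product((0, 1), repeat=2):
            assert feasible_y(xi, xj) == [xi * xj]
            print(f"x_i={xi}, x_j={xj} -> y forced to {xi * xj}")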

    Practical Multi-Key Homomorphic Encryption for More Flexible and Efficient Secure Federated Aggregation (preliminary work)

    In this work, we introduce a lightweight, communication-efficient multi-key approach suitable for the Federated Averaging rule. By combining secret-key RLWE-based HE, additive secret sharing and PRFs, we reduce the communication cost per party by approximately half compared to the usual public-key instantiations, while keeping practical homomorphic aggregation performance. Additionally, for LWE-based instantiations, our approach reduces the communication cost per party from quadratic to linear in the lattice dimension.
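
    One ingredient mentioned in the abstract, PRF-derived masks that cancel in the aggregate, can be illustrated with a plaintext toy (this is not the paper's RLWE-based construction; the pre-shared keys, modulus and values are made up):

        # Pairwise masks derived from a PRF (HMAC-SHA256 under pre-shared
        # keys) cancel out in the sum, so individual updates stay hidden
        # without sending extra shares.
        import hmac, hashlib

        q = 2**61 - 1
        def prf(key: bytes, nonce: bytes) -> int:
            return int.from_bytes(hmac.new(key, nonce, hashlib.sha256).digest(), "big") % q

        pairwise_keys = {(0, 1): b"k01", (0, 2): b"k02", (1, 2): b"k12"}
        updates = [11, 22, 33]               # each party's private value
        nonce = b"round-1"

        def masked(i, x):
            m = x
            for (a, b), k in pairwise_keys.items():
                if i == a:   m = (m + prf(k, nonce)) % q   # add mask
                elif i == b: m = (m - prf(k, nonce)) % q   # subtract mask
            return m

        aggregate = sum(masked(i, x) for i, x in enumerate(updates)) % q
        print(aggregate == sum(updates) % q) # True: masks cancel pairwise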

    DFA on LS-Designs with a Practical Implementation on SCREAM (extended version)

    LS-Designs are a family of SPN-based block ciphers whose linear layer is based on the so-called interleaved construction. They are aimed at low-end devices with high performance and low resource constraints, objects which need to be resistant to physical attacks. In this paper we describe a complete Differential Fault Analysis (DFA) against LS-Designs, which also applies to other families of SPN-based block ciphers. First, we explain how fault attacks can be used against their implementations depending on the fault model. Then, we validate the DFA in a practical example on a hardware implementation of SCREAM running on an FPGA. The faults were injected using electromagnetic pulses during the execution of SCREAM, and the faulty ciphertexts were used to recover bits of the key. Finally, we discuss some countermeasures that could be used to thwart such attacks.
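
    The core filtering step of a DFA can be shown on a toy last round c = S(x) ^ k: if a fault injects a known input difference delta, each (correct, faulty) ciphertext pair keeps only the key guesses kg for which S^-1(c ^ kg) ^ S^-1(c' ^ kg) == delta. The 4-bit S-box, fault model and key below are illustrative only and unrelated to SCREAM's actual structure.

        # Toy DFA key filtering on a 4-bit last round.
        import random

        SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
        INV = [SBOX.index(v) for v in range(16)]

        def last_round(x, k):
            return SBOX[x] ^ k

        secret_key, rng = 0xA, random.Random(1)
        candidates = set(range(16))
        for _ in range(4):                       # a few faulted encryptions
            x, delta = rng.randrange(16), 1      # single-bit fault on input
            c = last_round(x, secret_key)
            c_faulty = last_round(x ^ delta, secret_key)
            candidates &= {kg for kg in range(16)
                           if INV[c ^ kg] ^ INV[c_faulty ^ kg] == delta}

        print(candidates)   # always contains 0xA; typically shrinks quickly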

    Towards Better Availability and Accountability for IoT Updates by means of a Blockchain

    Building the Internet of Things requires deploying a huge number of devices with full or limited connectivity to the Internet. Given that these devices are exposed to attackers and generally not secured by design, it is essential to be able to update them, to patch their vulnerabilities and to prevent hackers from enrolling them into botnets. Ideally, the update infrastructure should implement the CIA triad properties, i.e., confidentiality, integrity and availability. In this work, we investigate how the use of a blockchain infrastructure can meet these requirements, with a focus on availability.