10 research outputs found

    Bridging a Gap Between Research and Production: Contributions to Scheduling and Simulation

    Get PDF
    Large scale distributed computing infrastructures (e.g., data centers, grids, or clouds) are used by scientists from various domains to produce outstanding research results, such as the discovery of the Higgs Boson in High Energy Physics. These infrastructures are also studied by Computer Scientists to produce their own set of scientific results. Ideally, a virtuous circle should exist between Domain and Computer Scientists: the former raising challenges that could be addressed by the latter. Unfortunately, in many occasions, a gap exists that prevents such an ideal and fostering collaboration. This habilitation covers research works conducted in the fields of scheduling and simulation that contribute to the filling of this gap. It discusses the necessary conditions to achieve this goal and details concrete initiatives in this endeavor

    Secure Strassen-Winograd Matrix Multiplication with MapReduce

    No full text
    International audienceMatrix multiplication is a mathematical brick for solving many real life problems. We consider the Strassen-Winograd algorithm (SW), one of the most efficient matrix multiplication algorithms. Our first contribution is to redesign SW with the MapReduce programming model that allows to process big data sets in parallel on a cluster. Moreover, our main contribution is to address the inherent security and privacy concerns that occur when outsourcing data to a public cloud. We propose a secure approach of SW with MapReduce called S2M3, for Secure Strassen-Winograd Matrix Multiplication with MapReduce. We prove the security of our protocol in a standard security model and provide a proof-of-concept empirical evaluation suggesting its efficiency

    Influence Analysis towards Big Social Data

    Get PDF
    Large scale social data from online social networks, instant messaging applications, and wearable devices have seen an exponential growth in a number of users and activities recently. The rapid proliferation of social data provides rich information and infinite possibilities for us to understand and analyze the complex inherent mechanism which governs the evolution of the new technology age. Influence, as a natural product of information diffusion (or propagation), which represents the change in an individual’s thoughts, attitudes, and behaviors resulting from interaction with others, is one of the fundamental processes in social worlds. Therefore, influence analysis occupies a very prominent place in social related data analysis, theory, model, and algorithms. In this dissertation, we study the influence analysis under the scenario of big social data. Firstly, we investigate the uncertainty of influence relationship among the social network. A novel sampling scheme is proposed which enables the development of an efficient algorithm to measure uncertainty. Considering the practicality of neighborhood relationship in real social data, a framework is introduced to transform the uncertain networks into deterministic weight networks where the weight on edges can be measured as Jaccard-like index. Secondly, focusing on the dynamic of social data, a practical framework is proposed by only probing partial communities to explore the real changes of a social network data. Our probing framework minimizes the possible difference between the observed topology and the actual network through several representative communities. We also propose an algorithm that takes full advantage of our divide-and-conquer strategy which reduces the computational overhead. Thirdly, if let the number of users who are influenced be the depth of propagation and the area covered by influenced users be the breadth, most of the research results are only focused on the influence depth instead of the influence breadth. Timeliness, acceptance ratio, and breadth are three important factors that significantly affect the result of influence maximization in reality, but they are neglected by researchers in most of time. To fill the gap, a novel algorithm that incorporates time delay for timeliness, opportunistic selection for acceptance ratio, and broad diffusion for influence breadth has been investigated. In our model, the breadth of influence is measured by the number of covered communities, and the tradeoff between depth and breadth of influence could be balanced by a specific parameter. Furthermore, the problem of privacy preserved influence maximization in both physical location network and online social network was addressed. We merge both the sensed location information collected from cyber-physical world and relationship information gathered from online social network into a unified framework with a comprehensive model. Then we propose the resolution for influence maximization problem with an efficient algorithm. At the same time, a privacy-preserving mechanism are proposed to protect the cyber physical location and link information from the application aspect. Last but not least, to address the challenge of large-scale data, we take the lead in designing an efficient influence maximization framework based on two new models which incorporate the dynamism of networks with consideration of time constraint during the influence spreading process in practice. All proposed problems and models of influence analysis have been empirically studied and verified by different, large-scale, real-world social data in this dissertation

    Secure and Efficient Matrix Multiplication with MapReduce

    No full text
    International audienceMapReduce is one of the most popular distributed programming paradigms that allows processing big data sets in parallel on a cluster. MapReduce users often outsource data and computations to a public cloud, which yields inherent security concerns. In this paper, we consider the problem of matrix multiplication and one of the most efficient matrix multiplication algorithms: the Strassen-Winograd (SW) algorithm. Our first contribution is a distributed MapReduce algorithm based on SW. Then, we tackle the security concerns that occur when outsourcing matrix multiplication computation to a honest-but-curious cloud i.e., that executes tasks dutifully, but tries to learn as much information as possible. Our main contribution is a secure distributed MapReduce algorithm called S2M3 (Secure Strassen-Winograd Matrix Multiplication with MapReduce) that enjoys security guarantees such as: none of the cloud nodes can learn the input or the output data. We formally prove the security properties of S2M3 and we present an empirical evaluation devoted to show its efficiency

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 274, ESA 2023, Complete Volum

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 251, ITCS 2023, Complete Volum

    35th Symposium on Theoretical Aspects of Computer Science: STACS 2018, February 28-March 3, 2018, Caen, France

    Get PDF

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 261, ICALP 2023, Complete Volum

    Protocoles distribués et sécurisés pour le paradigme MapReduce : Comment avoir des applications dans les nuages respectueuses de la vie privée ?

    No full text
    In the age of social networks and connected objects, many and diverse data are produced at every moment. The analysis of these data has led to a new science called "Big Data". To best handle this constant flow of data, new calculation methods have emerged.This thesis focuses on cryptography applied to processing of large volumes of data, with the aim of protection of user data. In particular, we focus on securing algorithms using the distributed computing MapReduce paradigm to perform a number of primitives (or algorithms) essential for data processing, ranging from the calculation of graph metrics (e.g. PageRank) to SQL queries (i.e. set intersection, aggregation, natural join).In the first part of this thesis, we discuss the multiplication of matrices. We first describe a standard and secure matrix multiplication for the MapReduce architecture that is based on the Paillier’s additive encryption scheme to guarantee the confidentiality of the data. The proposed algorithms correspond to a specific security hypothesis: collusion or not of MapReduce cluster nodes, the general security model being honest-but-curious. The aim is to protect the confidentiality of both matrices, as well as the final result, and this for all participants (matrix owners, calculation nodes, user wishing to compute the result). On the other hand, we also use the matrix multiplication algorithm of Strassen-Winograd, whose asymptotic complexity is O(n^log2(7)) or about O(n^2.81) which is an improvement compared to the standard matrix multiplication. A new version of this algorithm adapted to the MapReduce paradigm is proposed. The safety assumption adopted here is limited to the non-collusion between the cloud and the end user. The version uses the Paillier’s encryption scheme.The second part of this thesis focuses on data protection when relational algebra operations are delegated to a public cloud server using the MapReduce paradigm. In particular, we present a secureintersection solution that allows a cloud user to obtain the intersection of n > 1 relations belonging to n data owners. In this solution, all data owners share a key and a selected data owner sharesa key with each of the remaining keys. Therefore, while this specific data owner stores n keys, the other owners only store two keys. The encryption of the real relation tuple consists in combining the use of asymmetric encryption with a pseudo-random function. Once the data is stored in the cloud, each reducer is assigned a specific relation. If there are n different elements, XOR operations are performed. The proposed solution is very effective. Next, we describe the variants of grouping and aggregation operations that preserve confidentiality in terms of performance and security. The proposed solutions combine the use of pseudo-random functions with the use of homomorphic encryption for COUNT, SUM and AVG operations and order preserving encryption for MIN and MAX operations. Finally, we offer secure versions of two protocols (cascade and hypercube) adapted to the MapReduce paradigm. The solutions consist in using pseudo-random functions to perform equality checks and thus allow joining operations when common components are detected. All the solutions described above are evaluated and their security proven.À l’heure des réseaux sociaux et des objets connectés, de nombreuses et diverses données sont produites à chaque instant. L’analyse de ces données a donné lieu à une nouvelle science nommée "Big Data". Pour traiter du mieux possible ce flux incessant de données, de nouvelles méthodes de calcul ont vu le jour. Les travaux de cette thèse portent sur la cryptographie appliquée au traitement de grands volumes de données, avec comme finalité la protection des données des utilisateurs. En particulier, nous nous intéressons à la sécurisation d’algorithmes utilisant le paradigme de calcul distribué MapReduce pour réaliser un certain nombre de primitives (ou algorithmes) indispensables aux opérations de traitement de données, allant du calcul de métriques de graphes (e.g. PageRank) aux requêtes SQL (i.e. intersection d’ensembles, agrégation, jointure naturelle). Nous traitons dans la première partie de cette thèse de la multiplication de matrices. Nous décrivons d’abord une multiplication matricielle standard et sécurisée pour l’architecture MapReduce qui est basée sur l’utilisation du chiffrement additif de Paillier pour garantir la confidentialité des données. Les algorithmes proposés correspondent à une hypothèse spécifique de sécurité : collusion ou non des nœuds du cluster MapReduce, le modèle général de sécurité étant honnête mais curieux. L’objectif est de protéger la confidentialité de l’une et l’autre matrice, ainsi que le résultat final, et ce pour tous les participants (propriétaires des matrices, nœuds de calcul, utilisateur souhaitant calculer le résultat). D’autre part, nous exploitons également l’algorithme de multiplication de matrices de Strassen-Winograd, dont la complexité asymptotique est O(n^log2(7)) soit environ O(n^2.81) ce qui est une amélioration par rapport à la multiplication matricielle standard. Une nouvelle version de cet algorithme adaptée au paradigme MapReduce est proposée. L’hypothèse de sécurité adoptée ici est limitée à la non-collusion entre le cloud et l’utilisateur final. La version sécurisée utilise comme pour la multiplication standard l’algorithme de chiffrement Paillier. La seconde partie de cette thèse porte sur la protection des données lorsque des opérations d’algèbre relationnelle sont déléguées à un serveur public de cloud qui implémente à nouveau le paradigme MapReduce. En particulier, nous présentons une solution d’intersection sécurisée qui permet à un utilisateur du cloud d’obtenir l’intersection de n > 1 relations appartenant à n propriétaires de données. Dans cette solution, tous les propriétaires de données partagent une clé et un propriétaire de données sélectionné partage une clé avec chacune des clés restantes. Par conséquent, alors que ce propriétaire de données spécifique stocke n clés, les autres propriétaires n’en stockent que deux. Le chiffrement du tuple de relation réelle consiste à combiner l’utilisation d’un chiffrement asymétrique avec une fonction pseudo-aléatoire. Une fois que les données sont stockées dans le cloud, chaque réducteur (Reducer) se voit attribuer une relation particulière. S’il existe n éléments différents, des opérations XOR sont effectuées. La solution proposée reste donc très efficace. Par la suite, nous décrivons les variantes des opérations de regroupement et d’agrégation préservant la confidentialité en termes de performance et de sécurité. Les solutions proposées associent l’utilisation de fonctions pseudo-aléatoires à celle du chiffrement homomorphe pour les opérations COUNT, SUM et AVG et à un chiffrement préservant l’ordre pour les opérations MIN et MAX. Enfin, nous proposons les versions sécurisées de deux protocoles de jointure (cascade et hypercube) adaptées au paradigme MapReduce. Les solutions consistent à utiliser des fonctions pseudo-aléatoires pour effectuer des contrôles d’égalité et ainsi permettre les opérations de jointure lorsque des composants communs sont détectés.(...
    corecore