13 research outputs found

    A Systematic Survey of General Sparse Matrix-Matrix Multiplication

    Full text link
    SpGEMM (General Sparse Matrix-Matrix Multiplication) has attracted much attention from researchers in fields of multigrid methods and graph analysis. Many optimization techniques have been developed for certain application fields and computing architecture over the decades. The objective of this paper is to provide a structured and comprehensive overview of the research on SpGEMM. Existing optimization techniques have been grouped into different categories based on their target problems and architectures. Covered topics include SpGEMM applications, size prediction of result matrix, matrix partitioning and load balancing, result accumulating, and target architecture-oriented optimization. The rationales of different algorithms in each category are analyzed, and a wide range of SpGEMM algorithms are summarized. This survey sufficiently reveals the latest progress and research status of SpGEMM optimization from 1977 to 2019. More specifically, an experimentally comparative study of existing implementations on CPU and GPU is presented. Based on our findings, we highlight future research directions and how future studies can leverage our findings to encourage better design and implementation.Comment: 19 pages, 11 figures, 2 tables, 4 algorithm

    Adaptive Hybrid Storage Format for Sparse Matrix–Vector Multiplication on Multi-Core SIMD CPUs

    Get PDF
    Optimizing sparse matrix–vector multiplication (SpMV) is challenging due to the non-uniform distribution of the non-zero elements of the sparse matrix. The best-performing SpMV format changes depending on the input matrix and the underlying architecture, and there is no “one-size-fit-for-all” format. A hybrid scheme combining multiple SpMV storage formats allows one to choose an appropriate format to use for the target matrix and hardware. However, existing hybrid approaches are inadequate for utilizing the SIMD cores of modern multi-core CPUs with SIMDs, and it remains unclear how to best mix different SpMV formats for a given matrix. This paper presents a new hybrid storage format for sparse matrices, specifically targeting multi-core CPUs with SIMDs. Our approach partitions the target sparse matrix into two segmentations based on the regularities of the memory access pattern, where each segmentation is stored in a format suitable for its memory access patterns. Unlike prior hybrid storage schemes that rely on the user to determine the data partition among storage formats, we employ machine learning to build a predictive model to automatically determine the partition threshold on a per matrix basis. Our predictive model is first trained off line, and the trained model can be applied to any new, unseen sparse matrix. We apply our approach to 956 matrices and evaluate its performance on three distinct multi-core CPU platforms: a 72-core Intel Knights Landing (KNL) CPU, a 128-core AMD EPYC CPU, and a 64-core Phytium ARMv8 CPU. Experimental results show that our hybrid scheme, combined with the predictive model, outperforms the best-performing alternative by 2.9%, 17.5% and 16% on average on KNL, AMD, and Phytium, respectively

    構文構造に基づくニューラル機械翻訳

    Get PDF
    学位の種別: 課程博士審査委員会委員 : (主査)東京大学准教授 鶴岡 慶雅, 東京大学教授 峯松 信明, 東京大学准教授 山崎 俊彦, 東京大学講師 齋藤 大輔, 東京大学准教授 吉永 直樹University of Tokyo(東京大学

    Algorithmic skeletons for exact combinatorial search at scale

    Get PDF
    Exact combinatorial search is essential to a wide range of application areas including constraint optimisation, graph matching, and computer algebra. Solutions to combinatorial problems are found by systematically exploring a search space, either to enumerate solutions, determine if a specific solution exists, or to find an optimal solution. Combinatorial searches are computationally hard both in theory and practice, and efficiently exploring the huge number of combinations is a real challenge, often addressed using approximate search algorithms. Alternatively, exact search can be parallelised to reduce execution time. However, parallel search is challenging due to both highly irregular search trees and sensitivity to search order, leading to anomalies that can cause unexpected speedups and slowdowns. As core counts continue to grow, parallel search becomes increasingly useful for improving the performance of existing searches, and allowing larger instances to be solved. A high-level approach to parallel search allows non-expert users to benefit from increasing core counts. Algorithmic Skeletons provide reusable implementations of common parallelism patterns that are parameterised with user code which determines the specific computation, e.g. a particular search. We define a set of skeletons for exact search, requiring the user to provide in the minimal case a single class that specifies how the search tree is generated and a parameter that specifies the type of search required. The five are: Sequential search; three general-purpose parallel search methods: Depth-Bounded, Stack-Stealing, and Budget; and a specific parallel search method, Ordered, that guarantees replicable performance. We implement and evaluate the skeletons in a new C++ parallel search framework, YewPar. YewPar provides both high-level skeletons and low-level search specific schedulers and utilities to deal with the irregularity of search and knowledge exchange between workers. YewPar is based on the HPX library for distributed task-parallelism potentially allowing search to execute on multi-cores, clusters, cloud, and high performance computing systems. Underpinning the skeleton design is a novel formal model, MT^3 , a parallel operational semantics that describes multi-threaded tree traversals, allowing reasoning about parallel search, e.g. describing common parallel search phenomena such as performance anomalies. YewPar is evaluated using seven different search applications (and over 25 specific instances): Maximum Clique, k-Clique, Subgraph Isomorphism, Travelling Salesperson, Binary Knapsack, Enumerating Numerical Semigroups, and the Unbalanced Tree Search Benchmark. The search instances are evaluated at multiple scales from 1 to 255 workers, on a 17 host, 272 core Beowulf cluster. The overheads of the skeletons are low, with a mean 6.1% slowdown compared to hand-coded sequential implementation. Crucially, for all search applications YewPar reduces search times by an order of magnitude, i.e hours/minutes to minutes/seconds, and we commonly see greater than 60% (average) parallel efficiency speedups for up to 255 workers. Comparing skeleton performance reveals that no one skeleton is best for all searches, highlighting a benefit of a skeleton approach that allows multiple parallelisations to be explored with minimal refactoring. The Ordered skeleton avoids slowdown anomalies where, due to search knowledge being order dependent, a parallel search takes longer than a sequential search. Analysis of Ordered shows that, while being 41% slower on average (73% worse-case) than Depth-Bounded, in nearly all cases it maintains the following replicable performance properties: 1) parallel executions are no slower than one worker sequential executions 2) runtimes do not increase as workers are added, and 3) variance between repeated runs is low. In particular, where Ordered maintains a relative standard deviation (RSD) of less than 15%, Depth-Bounded suffers from an RSD greater than 50%, showing the importance of carefully controlling search orders for repeatability

    The Prom Problem: Fair and Privacy-Enhanced Matchmaking with Identity Linked Wishes

    Get PDF
    In the Prom Problem (TPP), Alice wishes to attend a school dance with Bob and needs a risk-free, privacy preserving way to find out whether Bob shares that same wish. If not, no one should know that she inquired about it, not even Bob. TPP represents a special class of matchmaking challenges, augmenting the properties of privacy-enhanced matchmaking, further requiring fairness and support for identity linked wishes (ILW) – wishes involving specific identities that are only valid if all involved parties have those same wishes. The Horne-Nair (HN) protocol was proposed as a solution to TPP along with a sample pseudo-code embodiment leveraging an untrusted matchmaker. Neither identities nor pseudo-identities are included in any messages or stored in the matchmaker’s database. Privacy relevant data stay within user control. A security analysis and proof-of-concept implementation validated the approach, fairness was quantified, and a feasibility analysis demonstrated practicality in real-world networks and systems, thereby bounding risk prior to incurring the full costs of development. The SecretMatch™ Prom app leverages one embodiment of the patented HN protocol to achieve privacy-enhanced and fair matchmaking with ILW. The endeavor led to practical lessons learned and recommendations for privacy engineering in an era of rapidly evolving privacy legislation. Next steps include design of SecretMatch™ apps for contexts like voting negotiations in legislative bodies and executive recruiting. The roadmap toward a quantum resistant SecretMatch™ began with design of a Hybrid Post-Quantum Horne-Nair (HPQHN) protocol. Future directions include enhancements to HPQHN, a fully Post Quantum HN protocol, and more

    XXI Workshop de Investigadores en Ciencias de la Computación - WICC 2019: libro de actas

    Get PDF
    Trabajos presentados en el XXI Workshop de Investigadores en Ciencias de la Computación (WICC), celebrado en la provincia de San Juan los días 25 y 26 de abril 2019, organizado por la Red de Universidades con Carreras en Informática (RedUNCI) y la Facultad de Ciencias Exactas, Físicas y Naturales de la Universidad Nacional de San Juan.Red de Universidades con Carreras en Informátic

    Análisis y resolución de los problemas asociados al diseño de sistemas de IOT

    Get PDF
    Al momento de diseñar un sistema de IoT, sin importar si se parte desde un sistema existente que trabaja de forma offline, o si se desea crear un sistema desde sus inicios, se presentarán los siguientes desafíos: En primer lugar, los sistemas de IoT pueden estar conformados por una amplia variedad de dispositivos, cada uno utilizando diferentes protocolos de comunicación y medios físicos para el establecimiento de la misma. Además, los dispositivos podrían encontrarse en ubicaciones geográficas muy distantes, en las que estén regidos por diferentes sistemas legales, y en las cuales la estructura de costos asociada a la conectividad entre los mismos sea muy diferente. Por otra parte, la selección del hardware asociado a cada dispositivo puede variar dependiendo de los riesgos asociados a la actividad en la que se los involucre; de los costos asociados a la adquisición, instalación y mantenimiento en la región geográfica donde se los despliegue; de los protocolos de comunicación que se deseen utilizar; del nivel de calidad deseada en el desempeño de cada dispositivo; y de otros factores técnicos o comerciales. La selección de las tecnologías de Software a utilizar en cada dispositivo podría depender de factores similares a aquellos mencionados en la selección del hardware. Además de estudiar las necesidades particulares de cada dispositivo, debe analizarse la arquitectura general del sistema de IoT. Esta arquitectura debe contemplar las diferentes formas de conectar a los dispositivos entre sí; las jerarquías de dispositivos; los servidores Web involucrados; los proveedores de servicios que serán contratados; los medios de almacenamiento, procesamiento y publicación de la información; las personas involucradas y los demás componentes internos o externos que interactúan en el sistema. Todas las consideraciones mencionadas previamente deben realizarse dentro de un marco de trabajo que garantice la privacidad y seguridad de la información tratada. Es por ello que en algunas regiones geográficas se han establecido diferentes legislaciones asociadas al tema, las cuales deben ser consideradas desde el comienzo del diseño del sistema de IoT. No obstante, si las reglas establecidas en las legislaciones no fueran lo suficientemente claras o completas (o incluso, inexistentes), pueden tomarse como fundamentos los estándares internacionales sobre privacidad y seguridad de los datos, en hardware y software. En este artículo, se presenta una línea de investigación que aborda el Análisis y Resolución de los Problemas Asociados al Diseño de Sistemas de IoT.Red de Universidades con Carreras en Informátic

    XXI Workshop de Investigadores en Ciencias de la Computación - WICC 2019: libro de actas

    Get PDF
    Trabajos presentados en el XXI Workshop de Investigadores en Ciencias de la Computación (WICC), celebrado en la provincia de San Juan los días 25 y 26 de abril 2019, organizado por la Red de Universidades con Carreras en Informática (RedUNCI) y la Facultad de Ciencias Exactas, Físicas y Naturales de la Universidad Nacional de San Juan.Red de Universidades con Carreras en Informátic

    WICC 2017 : XIX Workshop de Investigadores en Ciencias de la Computación

    Get PDF
    Actas del XIX Workshop de Investigadores en Ciencias de la Computación (WICC 2017), realizado en el Instituto Tecnológico de Buenos Aires (ITBA), el 27 y 28 de abril de 2017.Red de Universidades con Carreras en Informática (RedUNCI
    corecore