
    LIPIcs, Volume 261, ICALP 2023, Complete Volume


    Data Tiling for Sparse Computation

    Many real-world datasets contain internal relationships. Efficient analysis of such relationship data is crucial for important problems including genome alignment, network vulnerability analysis, and web page ranking. Relationship data is frequently sparse, and computation on it is called sparse computation. We demonstrate that the important technique of data tiling is more powerful than previously known by broadening its application space. We focus on three important sparse computation areas: graph analysis, linear algebra, and bioinformatics. We demonstrate data tiling's power by addressing key issues and providing significant improvements, to both runtime and solution quality, in each area. For graph analysis, we focus on fast data tiling techniques that can produce well-structured tiles, and we establish theoretical hardness results. These tiles are suitable for graph problems as they reduce data movement and ultimately improve end-to-end runtime performance. For linear algebra, we introduce a new cache-aware tiling technique and apply it to the key kernel of sparse matrix-sparse matrix multiplication (SpGEMM). This technique tiles the second input matrix and then uses a small summary matrix to guide access to the tiles during computation. Our approach results in the fastest known implementation across three distinct CPU architectures. In bioinformatics, we develop a tiling-based de novo genome assembly pipeline. Starting from reads, we build either a graph or a hypergraph that captures the internal relationships between reads. This structure is then tiled to minimize connections while maintaining balance. Each resulting tile is then treated independently as the input to an existing shared-memory assembler. Our pipeline improves on existing state-of-the-art de novo genome assemblers, bringing both runtime and quality gains on real-world and simulated datasets.
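    The cache-aware tiling idea can be made concrete with a small sketch. The Python below is a hypothetical illustration only (the dict-of-dicts layout, the name tiled_spgemm, and the column-tiling scheme are assumptions, not the dissertation's implementation): it tiles the second matrix by column ranges and consults a small summary matrix to skip tiles that a row of B never touches.

```python
# Hypothetical sketch of cache-aware tiled SpGEMM (C = A @ B), loosely
# following the idea described above: tile the second matrix B by column
# ranges and consult a small summary matrix before touching a tile.
from collections import defaultdict

def tiled_spgemm(A, B, n_cols, tile_width):
    """A, B: sparse matrices as {row: {col: val}} dicts (toy CSR stand-in)."""
    n_tiles = (n_cols + tile_width - 1) // tile_width
    # Summary matrix: summary[k][t] is True iff row k of B has a nonzero
    # whose column falls in tile t. It is tiny compared to B itself.
    summary = defaultdict(lambda: [False] * n_tiles)
    for k, row in B.items():
        for j in row:
            summary[k][j // tile_width] = True

    C = defaultdict(dict)
    for t in range(n_tiles):                      # one tile of B at a time
        lo, hi = t * tile_width, (t + 1) * tile_width
        for i, a_row in A.items():
            acc = {}                              # accumulator stays cache-sized
            for k, a_ik in a_row.items():
                if not summary[k][t]:             # skip tiles row k misses
                    continue
                for j, b_kj in B[k].items():
                    if lo <= j < hi:
                        acc[j] = acc.get(j, 0) + a_ik * b_kj
            C[i].update(acc)
    return C

A = {0: {0: 1.0, 1: 2.0}}
B = {0: {0: 3.0}, 1: {2: 4.0}}
print(dict(tiled_spgemm(A, B, n_cols=4, tile_width=2)))  # {0: {0: 3.0, 2: 8.0}}
```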

    Optimal Domain-Partitioning Algorithm for Real-Life Transportation Networks and Finite Element Meshes

    For large-scale engineering problems, it is generally accepted that domain-partitioning algorithms are highly desirable for general-purpose finite element analysis (FEA). This paper presents a heuristic numerical algorithm that can efficiently partition any transportation network (or any finite element mesh) into a specified number of subdomains (usually depending on the number of parallel processors available on a computer), "minimising the total number of system BOUNDARY nodes" (the primary criterion) while "balancing the work loads" amongst the subdomains (the secondary criterion). The proposed seven-step heuristic algorithm (with enhancement features) is based on engineering common sense and observation. The work has two novel features: (i) no complicated graph theory is needed, and (ii) transportation networks (using line elements) and finite element (FE) meshes (using triangular, tetrahedral, and brick elements) are treated uniformly by transforming the original network (or FE mesh) into a pseudo-transportation network that uses only line elements. Several examples, including real-life transportation networks and finite element meshes (using triangular/brick/tetrahedral elements), are used (in a MATLAB environment) to explain and validate the proposed algorithm and to compare its performance with the popular METIS software.
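    To make the two criteria concrete, here is a small, hypothetical Python helper (the data layout and all names are illustrative, not from the paper) that scores a given partition of a line-element network by its primary criterion (total system BOUNDARY nodes) and secondary criterion (work-load balance):

```python
# Hypothetical helper illustrating the paper's two partitioning criteria.
# The element/partition representation is an assumption for illustration.

def partition_quality(elements, part):
    """elements: list of (node_a, node_b) line elements;
    part: dict mapping each element index to its subdomain id."""
    # A node is a BOUNDARY node if it is shared by elements
    # assigned to two or more different subdomains.
    node_domains = {}
    for e, (a, b) in enumerate(elements):
        for n in (a, b):
            node_domains.setdefault(n, set()).add(part[e])
    boundary_nodes = sum(1 for doms in node_domains.values() if len(doms) > 1)

    # Work load per subdomain, here simply the element count.
    loads = {}
    for d in part.values():
        loads[d] = loads.get(d, 0) + 1
    imbalance = max(loads.values()) / (sum(loads.values()) / len(loads))
    return boundary_nodes, imbalance

# Toy 4-element chain split into two subdomains: node 2 is the single
# boundary node, and both subdomains hold two elements (perfect balance).
elems = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(partition_quality(elems, {0: 0, 1: 0, 2: 1, 3: 1}))  # (1, 1.0)
```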

    LIPIcs, Volume 274, ESA 2023, Complete Volume


    Towards Performance Portable Graph Algorithms

    In today's data-driven world, computational resources have become heterogeneous, making it crucial to process large-scale graphs in an architecture-agnostic manner. Traditionally, hand-optimized high-performance computing (HPC) solutions have been studied and used to implement highly efficient and scalable graph algorithms. In recent years, several graph processing and management systems have also been proposed. Hand-optimized HPC approaches require high levels of expertise, while graph processing frameworks suffer from limited expressibility and performance; portability is a major concern for both. The main thesis of this work is that block-based graph algorithms offer a compromise between efficient parallelism and architecture-agnostic algorithm design for a wide class of graph problems. This dissertation seeks to prove this thesis by focusing on three pillars: data/computation partitioning, block-based algorithm design, and performance portability. We first show how to partition the computation and the data to design efficient block-based algorithms for the graph merging and triangle counting problems. Then, generalizing from these experiences, we propose PGAbB, an algorithmic framework for implementing block-based graph algorithms on shared-memory, heterogeneous machines. PGAbB aims to maximally leverage different architectures by implementing task-based execution on top of a block-based programming model. We discuss PGAbB's programming model, algorithmic optimizations for scheduling, and load-balancing strategies for graph problems on real-world and synthetic inputs.
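    As a rough illustration of the block-based style (a generic sketch, not PGAbB's API), the following Python splits the vertex set into blocks and counts triangles with one independent task per block pair:

```python
# Generic sketch of a block-based graph kernel: the adjacency structure is
# split into blocks and each block pair becomes an independent task.
# Shown for triangle counting; none of this reflects PGAbB's actual API.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def count_block(adj, block_u, block_v):
    """Count triangles whose first two (ordered) vertices fall in
    the given vertex blocks, using ordered neighbor sets."""
    total = 0
    for u in block_u:
        for v in adj[u]:
            if v in block_v:
                total += len(adj[u] & adj[v])  # third vertex anywhere
    return total

def block_triangle_count(adj, n_blocks):
    verts = sorted(adj)
    blocks = [set(verts[i::n_blocks]) for i in range(n_blocks)]
    with ThreadPoolExecutor() as pool:   # one independent task per block pair
        return sum(pool.map(lambda bp: count_block(adj, *bp),
                            product(blocks, repeat=2)))

# adj[u] holds only neighbors larger than u, so each triangle
# is counted exactly once (u < v < w).
adj = {0: {1, 2}, 1: {2, 3}, 2: {3}, 3: set()}
print(block_triangle_count(adj, 2))  # 2 triangles: (0,1,2) and (1,2,3)
```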

    TD-NUCA: runtime driven management of NUCA caches in task dataflow programming models

    In high-performance processors, the design of on-chip memory hierarchies is crucial for performance and energy efficiency. Current processors rely on large shared Non-Uniform Cache Architectures (NUCA) to improve performance and reduce data movement. Multiple solutions exploit information available at the microarchitecture level or in the operating system to optimize NUCA performance. However, existing methods have not taken advantage of the information captured by task dataflow programming models to guide the management of NUCA caches. In this paper we propose TD-NUCA, a hardware/software co-designed approach that leverages information present in the runtime system of task dataflow programming models to efficiently manage NUCA caches. TD-NUCA identifies the data access and reuse patterns of parallel applications in the runtime system and guides the operation of the NUCA caches in the hardware. As a result, TD-NUCA achieves a 1.18x average speedup over the baseline S-NUCA while requiring only 0.62x the data movement. This work has been supported by the Spanish Ministry of Science and Technology (contract PID2019-107255GB-C21) and the Generalitat de Catalunya (contract 2017-SGR-1414). M. Casas has been partially supported by Grant RYC-2017-23269 funded by MCIN/AEI/10.13039/501100011033 and ESF 'Investing in your future'. M. Moreto has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship No. RYC-2016-21104.
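    The co-design idea can be caricatured in a few lines. The sketch below is purely illustrative (the task graph, bank-mapping functions, and all names are assumptions, not TD-NUCA's interface): a runtime that knows the next consumer of a data block can hint a cache bank near the consuming core instead of the default address-interleaved bank.

```python
# Toy model of the co-design idea: a task-dataflow runtime knows which
# core will consume each data block next, so it can hint a home NUCA bank
# near that core rather than the default address-interleaved bank.
# Nothing here reflects TD-NUCA's real mechanism or interface.

def default_bank(block_id, n_banks):
    return block_id % n_banks            # static address interleaving (S-NUCA)

def runtime_hinted_bank(block_id, task_graph, core_of_task, n_banks):
    """Place the block in the bank local to its next consumer's core."""
    consumers = task_graph.get(block_id, [])
    if consumers:
        return core_of_task[consumers[0]] % n_banks
    return default_bank(block_id, n_banks)

# Block 7 is produced by one task and consumed next by a task on core 3:
task_graph = {7: ["t_consume"]}
core_of_task = {"t_consume": 3}
print(runtime_hinted_bank(7, task_graph, core_of_task, n_banks=8))  # 3
print(default_bank(7, n_banks=8))                                   # 7
```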

    Load Balancing Algorithms for Parallel Spatial Join on HPC Platforms

    Geospatial datasets are growing in volume, complexity, and heterogeneity. Efficient execution of geospatial computations and analytics on large-scale datasets requires parallel processing. To exploit fine-grained parallelism on large compute clusters, partitioning skewed datasets in a load-balanced way is challenging: the workload in spatial join is data-dependent and highly irregular, and wide variation in the size and density of geometries from one region of the map to another further exacerbates the load imbalance. This dissertation focuses on the spatial join operation used in Geographic Information Systems (GIS) and spatial databases, where the inputs are two layers of geospatial data and the output is a combination of the two layers according to a join predicate. The dissertation introduces a novel spatial data partitioning algorithm geared towards load-balancing parallel spatial join processing. Unlike existing partitioning techniques, the proposed algorithm divides the spatial join workload itself, instead of partitioning the individual datasets separately, to provide better load balancing. This workload partitioning algorithm has been evaluated on a high-performance computing system using real-world datasets. An intermediate output-sensitive duplication avoidance technique is proposed that decreases the external memory space required for storing spatial join candidates across the partitions. GPU acceleration is used to further reduce the spatial partitioning runtime. For dynamic load balancing in spatial join, a novel framework for fine-grained, NUMA-aware work stealing is presented. Performance improvements are demonstrated on shared- and distributed-memory architectures using threads and message passing, and experimental results show effective mitigation of data skew. The framework supports a variety of spatial join predicates and spatial overlay using partitioned and un-partitioned datasets.
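    A toy sketch of the central idea, partitioning the join workload rather than the input layers, might look as follows (the uniform-grid filter, bounding-box layout, and round-robin split are assumptions for illustration, not the dissertation's algorithm):

```python
# Hypothetical sketch: partition the *join workload* (candidate pairs of
# geometries) rather than each layer separately, so each worker receives a
# similar number of pairs even when the input layers are heavily skewed.
# Geometries are axis-aligned bounding boxes (xmin, ymin, xmax, ymax).
from collections import defaultdict

def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def candidate_pairs(layer_r, layer_s, cell):
    """Uniform-grid filter: a pair is a candidate if both boxes touch a
    common grid cell (toy stand-in for a spatial index). The `seen` set
    avoids reporting duplicates across cells."""
    grid = defaultdict(lambda: ([], []))
    for side, layer in ((0, layer_r), (1, layer_s)):
        for i, box in enumerate(layer):
            for cx in range(int(box[0] // cell), int(box[2] // cell) + 1):
                for cy in range(int(box[1] // cell), int(box[3] // cell) + 1):
                    grid[(cx, cy)][side].append(i)
    seen = set()
    for rs, ss in grid.values():
        for r in rs:
            for s in ss:
                if (r, s) not in seen and overlaps(layer_r[r], layer_s[s]):
                    seen.add((r, s))
    return sorted(seen)

def partition_workload(pairs, n_workers):
    """Round-robin split of candidate pairs: the load balance follows the
    workload, not the (possibly skewed) input layers."""
    return [pairs[w::n_workers] for w in range(n_workers)]

r = [(0, 0, 2, 2), (5, 5, 6, 6)]
s = [(1, 1, 3, 3), (5.5, 5.5, 7, 7)]
print(partition_workload(candidate_pairs(r, s, cell=2.0), n_workers=2))
# [[(0, 0)], [(1, 1)]]
```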

    Network embedding and its applications

    Apart from the attributes attached to entities, the relationships among entities are an important perspective that reveals the topological structure of a complex system. A network (or graph), with nodes representing entities and links indicating relationships, has been widely used in sociology, biology, chemistry, medicine, the Internet, etc. However, traditional machine learning and data mining algorithms, designed for entities with attributes (i.e., data points in a vector space), cannot effectively and/or efficiently utilize the topological information of a network formed by the relationships among entities. To fill this gap, Network Embedding (NE) embeds a network into a low-dimensional vector space while preserving certain topologies and/or properties, so that the resulting embeddings can facilitate various downstream machine learning and data mining tasks. Although there have been many successful NE methods, most are designed for embedding static plain networks. In fact, real-world networks often come with one or more additional properties, such as node attributes and dynamic changes. The central research question of this thesis is "where and how can we apply NE in more realistic scenarios?". To this end, we propose three novel NE methods, each addressing the challenges arising from one type of more realistic network, and we also discuss applications of NE, with a focus on the drug-target interaction prediction problem.

    First, we investigate how to embed attributed networks, which better describe real-world complex systems by attaching node attributes to the network. Previous Attributed Network Embedding (ANE) methods cannot effectively embed attributed networks, especially when the networks become sparse, and/or do not scale to large networks. To address these challenges, we propose a scalable ANE method that effectively and robustly embeds attributed networks of different sparsities.

    Second, we study how to embed dynamic networks, which arise because real-world complex systems often evolve over time. Most previous Dynamic Network Embedding (DNE) methods try to capture the topological changes at or around the most affected nodes and update node embeddings accordingly. Unfortunately, although this kind of approximation improves efficiency, it cannot effectively preserve the global topology of a dynamic network at each timestep, because it ignores the inactive sub-networks that receive accumulated topological changes propagated via high-order proximity. To tackle this challenge, we propose a DNE method with better global topology preservation.

    Third, compared to static networks, dynamic networks have a unique characteristic, the degree of changes, which describes the rate of streaming edges between consecutive snapshots of an input dynamic network. The degree of changes can differ greatly between dynamic networks, yet it remains unknown whether existing DNE methods are robustly effective across different degrees of changes, in particular for dynamic networks generated from the same dataset under different slicing settings. To answer this open question, we benchmark several state-of-the-art DNE methods and then propose a DNE method that remains robustly effective on dynamic networks with different degrees of changes.

    Fourth, regarding a specific application of NE to a real-world problem, we propose an NE-based Drug-Target Interaction (DTI) prediction method that additionally utilizes two implicit networks which are extracted from a given DTI network but were ignored by previous DTI prediction methods. A case study indicates that the proposed method can predict novel DTIs.
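    As a generic illustration of what network embedding does (a textbook-style sketch, not one of the thesis's methods), the following Python factorizes a simple first- plus second-order proximity matrix so that topologically close nodes receive nearby vectors:

```python
# Generic network-embedding sketch: factorize a proximity matrix so that
# nearby nodes get nearby vectors. The proximity A + A@A (1st- plus
# 2nd-order) is an assumption chosen only to keep the example short.
import numpy as np

def embed(adj, dim):
    """adj: dense symmetric 0/1 adjacency matrix; returns n x dim embeddings."""
    proximity = adj + adj @ adj                    # capture 2-hop structure
    u, s, _ = np.linalg.svd(proximity, hermitian=True)
    return u[:, :dim] * np.sqrt(s[:dim])           # truncated factorization

# Two triangles joined by a bridge (2-3): nodes in the same triangle
# should end up closer in the embedding space than nodes across it.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1
E = embed(A, dim=2)
print(np.linalg.norm(E[0] - E[1]) < np.linalg.norm(E[0] - E[4]))  # True
```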

    Network-on-Chip

    Addresses the Challenges Associated with System-on-Chip Integration. Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting on-chip communication efficiency, and explores Network-on-Chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Using examples of chips that have been implemented in industry and academia, the text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. The text comprises 12 chapters and covers:
        - The evolution of NoC from SoC, including its research and developmental challenges
        - NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces
        - The router design strategies followed in NoCs
        - The evaluation mechanism of NoC architectures
        - The application mapping strategies followed in NoCs
        - Low-power design techniques specifically followed in NoCs
        - The signal integrity and reliability issues of NoC
        - The details of NoC testing strategies reported so far
        - The problem of synthesizing application-specific NoCs
        - Reconfigurable NoC design issues
        - Directions of future research and development in the field of NoC
    Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, researchers, and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems.

    Cellular Automata

    Modelling and simulation are disciplines of major importance for science and engineering. There is no science without models, and simulation has nowadays become a very useful, sometimes indispensable, tool for the development of both science and engineering. The main attractive feature of cellular automata is that, in spite of their conceptual simplicity, which allows ease of implementation for computer simulation and, in principle, detailed and complete mathematical analysis, they are able to exhibit a wide variety of amazingly complex behaviour. This feature of cellular automata has attracted the attention of researchers from a wide variety of fields, not only the exact sciences and engineering but also the social sciences, and sometimes beyond. The collective complex behaviour of numerous systems, which emerges from the interaction of a multitude of simple individuals, is conveniently modelled and simulated with cellular automata for very different purposes. This book presents a number of innovative applications of cellular automata models in the fields of Quantum Computing, Materials Science, Cryptography and Coding, and Robotics and Image Processing.
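    The simplicity-to-complexity point is easy to demonstrate: an elementary one-dimensional, two-state cellular automaton fits in a few lines of Python, yet rules such as rule 110 are known to produce highly complex, even Turing-complete, behaviour. The rule number and grid size below are arbitrary demo choices.

```python
# Elementary (1-D, two-state, nearest-neighbour) cellular automaton.
# Each cell's next state is the bit of `rule` indexed by the 3-bit
# neighbourhood (left, centre, right), the standard Wolfram encoding.

def step(cells, rule):
    """Apply one synchronous update with wrap-around boundaries."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

cells = [0] * 40 + [1] + [0] * 40        # single live cell in the middle
for _ in range(20):                       # print 20 generations of rule 110
    print("".join(".#"[c] for c in cells))
    cells = step(cells, rule=110)
```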