696 research outputs found

    Data-Parallel Hashing Techniques for GPU Architectures

    Full text link
    Hash tables are one of the most fundamental data structures for effectively storing and accessing sparse data, with widespread usage in domains ranging from computer graphics to machine learning. This study surveys the state-of-the-art research on data-parallel hashing techniques for emerging massively parallel, many-core GPU architectures. Key factors affecting the performance of different hashing schemes are identified and used to suggest best practices and to pinpoint areas for further research.
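    A minimal sketch of the kind of scheme such surveys cover: open-addressing insertion with linear probing, written here as plain sequential Python. The class and parameter names are illustrative; data-parallel GPU variants replace the probe loop's check-and-write with an atomic compare-and-swap so that thousands of threads can insert concurrently.

        # Open-addressing hash table with linear probing (illustrative sketch).
        # GPU versions of this pattern use atomicCAS on the key slot so many
        # threads can probe and insert at once; the logic here is sequential.
        EMPTY = None

        class LinearProbingTable:
            def __init__(self, capacity=1024):
                self.capacity = capacity
                self.keys = [EMPTY] * capacity
                self.values = [None] * capacity

            def insert(self, key, value):
                slot = hash(key) % self.capacity
                for _ in range(self.capacity):          # probe at most `capacity` slots
                    if self.keys[slot] is EMPTY or self.keys[slot] == key:
                        self.keys[slot] = key           # on a GPU: atomicCAS(&keys[slot], EMPTY, key)
                        self.values[slot] = value
                        return True
                    slot = (slot + 1) % self.capacity   # linear probing step
                return False                            # table is full

            def lookup(self, key):
                slot = hash(key) % self.capacity
                for _ in range(self.capacity):
                    if self.keys[slot] is EMPTY:
                        return None
                    if self.keys[slot] == key:
                        return self.values[slot]
                    slot = (slot + 1) % self.capacity
                return None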

    FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

    Full text link
    We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine that does not require similarity computations and is tailored for high-performance computing platforms. By leveraging an LSH-style randomized indexing procedure and combining it with several principled techniques, such as reservoir sampling, recent advances in one-pass minwise hashing, and count-based estimations, we reduce the computational and parallelization costs of similarity search while retaining sound theoretical guarantees. We evaluate FLASH on several real, high-dimensional datasets from different domains, including text, malicious URLs, click-through prediction, and social networks. Our experiments shed new light on the difficulties associated with datasets having several million dimensions. Current state-of-the-art implementations either fail at the presented scale or are orders of magnitude slower than FLASH. FLASH is capable of computing an approximate k-NN graph, from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than 10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam dataset by brute force (n²D) would require at least 20 teraflops. We provide CPU and GPU implementations of FLASH for replicability of our results.
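    A rough sketch of the indexing idea (not the FLASH implementation itself): bucket items by short minhash signatures across several tables, cap each bucket with reservoir sampling, and rank query candidates by how often they collide with the query. The hash family, table count, and reservoir size below are illustrative assumptions.

        import random

        def minhash_signature(item_set, seeds):
            """One minhash value per seed; an item is a set of hashable tokens."""
            return tuple(min(hash((seed, tok)) for tok in item_set) for seed in seeds)

        def build_index(data, num_tables=8, hashes_per_table=2, reservoir_size=32, seed=0):
            """LSH-style index: each table buckets items by a short minhash signature,
            and every bucket is capped with reservoir sampling to bound memory."""
            rng = random.Random(seed)
            tables = [dict() for _ in range(num_tables)]
            seeds = [[(t, h) for h in range(hashes_per_table)] for t in range(num_tables)]
            for idx, item in enumerate(data):
                for t in range(num_tables):
                    key = minhash_signature(item, seeds[t])
                    bucket = tables[t].setdefault(key, {"items": [], "seen": 0})
                    bucket["seen"] += 1
                    if len(bucket["items"]) < reservoir_size:
                        bucket["items"].append(idx)
                    else:
                        j = rng.randrange(bucket["seen"])   # reservoir sampling over this bucket
                        if j < reservoir_size:
                            bucket["items"][j] = idx
            return tables, seeds

        def query(item, tables, seeds, top_k=5):
            """Count-based estimation: rank candidates by the number of shared buckets."""
            counts = {}
            for t, table in enumerate(tables):
                bucket = table.get(minhash_signature(item, seeds[t]), {"items": []})
                for idx in bucket["items"]:
                    counts[idx] = counts.get(idx, 0) + 1
            return sorted(counts, key=counts.get, reverse=True)[:top_k]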

    DACHash: A Dynamic, Cache-Aware and Concurrent Hash Table on GPUs

    Get PDF
    GPU acceleration of hash tables in high-volume transaction applications such as computational geometry and bio-informatics is emerging. Recently, several hash table designs have been proposed for GPUs, but our analysis shows that they still do not adequately factor in several important aspects of a GPU's execution environment, leaving large room for further optimization.

    Efficient similarity computations on parallel machines using data shaping

    Get PDF
    Similarity computation is a fundamental operation in all forms of data. Big Data is typically characterized by attributes such as volume, velocity, variety, and veracity. In general, Big Data variety appears in structured, semi-structured, or unstructured forms. The volume of Big Data in general, and of semi-structured data in particular, is increasing at a phenomenal rate. The Big Data phenomenon is posing a new set of challenges to similarity computation problems occurring in semi-structured data. Technology and processor architecture trends suggest very strongly that future processors will have tens of thousands of cores (hardware threads). Another crucial trend is that the ratio of on-chip and off-chip memory to core count is decreasing. State-of-the-art parallel computing platforms such as General Purpose Graphics Processors (GPUs) and MICs are promising for high-performance as well as high-throughput computing. However, processing the semi-structured component of Big Data efficiently on parallel computing systems (e.g. GPUs) is challenging, because most of the emerging platforms (e.g. GPUs) are organized as Single Instruction Multiple Thread/Data machines which are highly structured, where several cores (streaming processors) operate in lock-step, or they require a high degree of task-level parallelism. We argue that effective and efficient solutions to key similarity computation problems need to operate in a synergistic manner with the underlying computing hardware. Moreover, semi-structured input data needs to be shaped, or reorganized, with the goal of exploiting the enormous computing power of state-of-the-art highly threaded architectures such as GPUs. For example, shaping input data (via encoding) with minimal data dependence can facilitate flexible and concurrent computation on high-throughput accelerators/co-processors such as GPUs and MICs. We consider various instances of traditional and futuristic problems occurring at the intersection of semi-structured data and data analytics. Preprocessing is an operation common at the initial stages of data processing pipelines; typically it involves operations such as data extraction and data selection. In the context of semi-structured data, twig filtering is used to identify (and extract) data of interest. Duplicate detection and record linkage operations are useful in preprocessing tasks such as data cleaning and data fusion, and also in data mining, in order to find similar tree objects. Likewise, tree edit distance is a fundamental metric used in the context of tree problems, and similarity computation between trees is another key problem in the context of Big Data. This dissertation makes a case for platform-centric data shaping as a potent mechanism to tackle the data- and architecture-borne issues in processing semi-structured data on GPUs and GPU-like parallel architectures. We propose several data shaping techniques for tree matching problems occurring in semi-structured data and experiment with real-world datasets. The experimental results reveal that the proposed platform-centric data shaping approach is effective for computing similarities between tree objects using GPGPUs, with performance gains of up to three orders of magnitude, subject to problem and platform.
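    A minimal sketch of the data-shaping idea, assuming a simple nested-dictionary tree and a pre-order encoding into flat arrays (labels plus parent indices); the encoding and field names are placeholders rather than the dissertation's actual format.

        def encode_tree(root):
            """Flatten a nested tree {label, children} into parallel arrays via a
            pre-order traversal. Flat arrays with explicit parent links have minimal
            data dependence, so per-node work can be mapped onto GPU threads
            without pointer chasing."""
            labels, parents = [], []
            stack = [(root, -1)]                      # (node, parent index); -1 marks the root
            while stack:
                node, parent = stack.pop()
                index = len(labels)
                labels.append(node["label"])
                parents.append(parent)
                for child in reversed(node.get("children", [])):
                    stack.append((child, index))
            return labels, parents

        # Example: a small XML-like twig  a(b, c(d))
        tree = {"label": "a", "children": [
            {"label": "b"},
            {"label": "c", "children": [{"label": "d"}]},
        ]}
        labels, parents = encode_tree(tree)
        print(labels)   # ['a', 'b', 'c', 'd']
        print(parents)  # [-1, 0, 0, 2]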

    A Survey of Parallel Sequential Pattern Mining

    Full text link
    With the growing popularity of shared resources, large volumes of complex data of different types are collected automatically. Traditional data mining algorithms generally face problems and challenges including huge memory cost, low processing speed, and inadequate hard disk space. As a fundamental task of data mining, sequential pattern mining (SPM) is used in a wide variety of real-life applications. However, it is more complex and challenging than other pattern mining tasks, such as frequent itemset mining and association rule mining, and also suffers from the above challenges when handling large-scale data. To address these problems, mining sequential patterns in a parallel or distributed computing environment has emerged as an important issue with many applications. In this paper, we provide an in-depth survey of the current status of parallel sequential pattern mining (PSPM), including a detailed categorization of traditional serial SPM approaches and of state-of-the-art parallel SPM. We review the related work on parallel sequential pattern mining in detail, including partition-based algorithms for PSPM, Apriori-based PSPM, pattern-growth-based PSPM, and hybrid algorithms for PSPM, and give a thorough description (characteristics, advantages, disadvantages, and a summary) of these parallel approaches. Some advanced topics for PSPM, including parallel quantitative / weighted / utility sequential pattern mining, PSPM from uncertain data and stream data, and hardware acceleration for PSPM, are further reviewed in detail. In addition, we review some well-known open-source software for PSPM. Finally, we summarize some challenges and opportunities of PSPM in the big data era. Comment: Accepted by ACM Trans. on Knowl. Discov. Data, 33 pages.
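    As a toy illustration of the partition-based strategy the survey classifies (not any specific algorithm from it), the sketch below splits a sequence database across worker processes, counts locally frequent items in each partition, and merges the counts before applying the global support threshold; the pattern language is deliberately reduced to single items.

        from collections import Counter
        from multiprocessing import Pool

        def local_counts(partition):
            """Per-partition support: how many sequences contain each item."""
            counts = Counter()
            for sequence in partition:
                for item in set(it for itemset in sequence for it in itemset):
                    counts[item] += 1
            return counts

        def parallel_frequent_items(database, min_support, workers=4):
            """Partition-based step: mine local supports in parallel, merge them,
            then filter against the global minimum support."""
            size = max(1, len(database) // workers)
            partitions = [database[i:i + size] for i in range(0, len(database), size)]
            with Pool(len(partitions)) as pool:
                merged = sum(pool.map(local_counts, partitions), Counter())
            return {item for item, count in merged.items() if count >= min_support}

        if __name__ == "__main__":
            db = [[["a"], ["b", "c"]], [["a"], ["c"]], [["b"], ["d"]]]   # sequences of itemsets
            print(parallel_frequent_items(db, min_support=2))            # {'a', 'b', 'c'}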

    Advanced Techniques for Improving the Efficacy of Digital Forensics Investigations

    Get PDF
    Digital forensics is the science concerned with discovering, preserving, and analyzing evidence on digital devices. The intent is to be able to determine what events have taken place, when they occurred, who performed them, and how they were performed. In order for an investigation to be effective, it must exhibit several characteristics. The results produced must be reliable, or else the theory of events based on the results will be flawed. The investigation must be comprehensive, meaning that it must analyze all targets which may contain evidence of forensic interest. Since any investigation must be performed within the constraints of available time, storage, manpower, and computation, investigative techniques must be efficient. Finally, an investigation must provide a coherent view of the events under question using the evidence gathered. Unfortunately, the set of currently available tools and techniques used in digital forensic investigations does a poor job of supporting these characteristics. Many tools contain bugs which generate inaccurate results; there are many types of devices and data for which no analysis techniques exist; most existing tools are woefully inefficient, failing to take advantage of modern hardware; and the task of aggregating data into a coherent picture of events is largely left to the investigator to perform manually. To remedy this situation, we developed a set of techniques to facilitate more effective investigations. To improve reliability, we developed the Forensic Discovery Auditing Module, a mechanism for auditing and enforcing controls on accesses to evidence. To improve comprehensiveness, we developed ramparser, a tool for deep parsing of Linux RAM images, which provides previously inaccessible data on the live state of a machine. To improve efficiency, we developed a set of performance optimizations and applied them to the Scalpel file carver, creating order-of-magnitude improvements in processing speed and storage requirements. Lastly, to facilitate more coherent investigations, we developed the Forensic Automated Coherence Engine, which generates a high-level view of a system from the data generated by low-level forensics tools. Together, these techniques significantly improve the effectiveness of digital forensic investigations conducted using them.
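    A small sketch of the header/footer signature scan at the heart of file carvers such as Scalpel, assuming JPEG markers and an in-memory image; real carvers stream the image in chunks, handle chunk boundaries, and validate candidates before extraction.

        def carve(image_bytes, header=b"\xff\xd8\xff", footer=b"\xff\xd9", max_size=10_000_000):
            """Scan a raw disk image for header/footer pairs and return carved byte ranges."""
            carved = []
            start = image_bytes.find(header)
            while start != -1:
                end = image_bytes.find(footer, start + len(header), start + max_size)
                if end != -1:
                    carved.append((start, end + len(footer)))
                start = image_bytes.find(header, start + 1)
            return carved

        # Example: two JPEG-like fragments embedded in junk bytes
        image = (b"junk" + b"\xff\xd8\xff" + b"DATA1" + b"\xff\xd9"
                 + b"pad" + b"\xff\xd8\xff" + b"DATA2" + b"\xff\xd9")
        print(carve(image))   # [(4, 14), (17, 27)]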

    Hardware-Aware Algorithm Designs for Efficient Parallel and Distributed Processing

    Get PDF
    The introduction and widespread adoption of the Internet of Things, together with emerging new industrial applications, bring new requirements in data processing. Specifically, the need for timely processing of data that arrives at high rates creates a challenge for the traditional cloud computing paradigm, where data collected at various sources is sent to the cloud for processing. As an approach to this challenge, processing algorithms and infrastructure are distributed from the cloud to multiple tiers of computing, closer to the sources of data. This creates a wide range of devices for algorithms to be deployed on and software designs to adapt to. In this thesis, we investigate how hardware-aware algorithm designs on a variety of platforms lead to algorithm implementations that efficiently utilize the underlying resources. We design, implement, and evaluate new techniques for representative applications that involve the whole spectrum of devices, from resource-constrained sensors in the field to highly parallel servers. At each tier of processing capability, we identify key architectural features that are relevant for applications and propose designs that make use of these features to achieve high-rate, timely, and energy-efficient processing.
    In the first part of the thesis, we focus on high-end servers and utilize two main approaches to achieve high-throughput processing: vectorization and thread parallelism. We employ vectorization for pattern matching algorithms used in security applications, showing that re-thinking the design of algorithms to better utilize the resources available on the platforms they are deployed on, such as vector processing units, can bring significant speedups in processing throughput. We then show how thread-aware data distribution and proper inter-thread synchronization allow scalability, especially for the problem of high-rate network traffic monitoring. We design a parallelization scheme for sketch-based algorithms that summarize traffic information, which allows them to handle incoming data at high rates and to answer queries on that data efficiently, without overheads.
    In the second part of the thesis, we target the intermediate tier of computing devices and focus on typical examples of the hardware found there. We show how single-board computers with embedded accelerators can be used to handle the computationally heavy part of applications, and showcase this specifically for pattern matching in security-related processing. We further identify key hardware features that affect the performance of pattern matching algorithms on such devices, present a co-evaluation framework to compare algorithms, and design a new algorithm that efficiently utilizes the hardware features.
    In the last part of the thesis, we shift the focus to the low-power, resource-constrained tier of processing devices. We target wireless sensor networks and study distributed data processing algorithms where the processing happens on the same devices that generate the data. Specifically, we focus on a continuous monitoring algorithm (geometric monitoring) that aims to minimize communication between nodes. By deploying that algorithm in realistic environments, we demonstrate that the interplay between the network protocol and the application plays an important role in this layer of devices. Based on that observation, we co-design a continuous monitoring application with a modern network stack and augment it further with an in-network aggregation technique. In this way, we show that awareness of the underlying network stack is important to realize the full potential of the continuous monitoring algorithm.
    The techniques and solutions presented in this thesis contribute to better utilization of hardware characteristics across a wide spectrum of platforms. We employ these techniques on problems that are representative examples of current and upcoming applications and contribute an outlook of emerging possibilities that can build on the results of the thesis.
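    A rough sketch of the sketch-based parallelization pattern described above (thread-aware data distribution with merge-on-query): each worker keeps a private count-min sketch, so the hot path needs no shared counters, and queries combine the per-worker estimates. Widths, depths, and the hashing are illustrative.

        class CountMinSketch:
            """Fixed-size frequency summary: each update touches one counter per row."""
            def __init__(self, width=1024, depth=4):
                self.width, self.depth = width, depth
                self.table = [[0] * width for _ in range(depth)]

            def update(self, key, count=1):
                for row in range(self.depth):
                    self.table[row][hash((row, key)) % self.width] += count

            def estimate(self, key):
                return min(self.table[row][hash((row, key)) % self.width]
                           for row in range(self.depth))

        def process_stream(packets, workers=4):
            """Thread-aware distribution: each worker owns a private sketch, so no
            synchronization is needed on the update path."""
            sketches = [CountMinSketch() for _ in range(workers)]
            for i, key in enumerate(packets):
                sketches[i % workers].update(key)       # stand-in for per-thread partitioning
            return sketches

        def query(sketches, key):
            return sum(s.estimate(key) for s in sketches)   # estimates add across disjoint partitions

        flows = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1"]
        print(query(process_stream(flows), "10.0.0.1"))     # 3 (count-min may overestimate)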

    The Secure Machine: Efficient Secure Execution On Untrusted Platforms

    Full text link
    In this work we present the Secure Machine, SeM for short, a CPU architecture extension for secure computing. SeM uses a small amount of additional in-chip hardware that monitors key communication channels inside the CPU chip and only acts when required. SeM provides confidentiality and integrity for a secure program without trusting the platform software or any off-chip hardware. SeM supports existing binaries of single- and multi-threaded applications running on single- or multi-core, multi-CPU systems. The performance reduction it causes is only a few percent, most of which is due to the memory encryption layer that is commonly used in many secure architectures. We also developed SeM-Prepare, a software tool that automatically instruments existing applications (binaries) with additional instructions so they can be securely executed on our architecture, without requiring any programming effort or the availability of the program's source code. To enable secure data sharing in shared memory environments, we developed Secure Distributed Shared Memory (SDSM), an efficient (in time and memory) algorithm for allowing thousands of compute nodes to share data securely while running in an untrusted computing environment. SDSM shows a negligible reduction in performance and requires negligible hardware resources. We also developed Distributed Memory Integrity Trees, a method for extending single-node integrity trees to preserve the integrity of a distributed application running in an untrusted computing environment. We show that our method is applicable to existing single-node integrity trees such as the Merkle Tree, the Bonsai Merkle Tree, and Intel's SGX memory integrity engine. All these building blocks may be used together to form a practical secure system, and some can be used in conjunction with other secure systems. Comment: A PhD thesis, to appear in the Technion's library.
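    A compact sketch of the single-node integrity-tree idea (a plain Merkle tree over fixed-size memory blocks) that the distributed scheme above extends; the block layout and hash choice are placeholders, and the hardware engines mentioned in the abstract differ considerably in detail.

        import hashlib

        def merkle_root(blocks):
            """Merkle root over a list of memory blocks (bytes). Verifying one block
            against the stored root needs only the block plus O(log n) sibling hashes,
            which is what makes integrity trees cheap to check."""
            level = [hashlib.sha256(b).digest() for b in blocks]
            if not level:
                return hashlib.sha256(b"").digest()
            while len(level) > 1:
                if len(level) % 2:                    # duplicate the last node on odd levels
                    level.append(level[-1])
                level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                         for i in range(0, len(level), 2)]
            return level[0]

        memory = [b"block-%d" % i for i in range(8)]    # e.g. eight protected memory blocks
        root = merkle_root(memory)
        tampered = list(memory)
        tampered[3] = b"evil"
        assert merkle_root(tampered) != root            # any modification changes the root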

    Particle Modeling of Fuel Plate Melting during Coolant Flow Blockage in HFIR

    Get PDF
    Cooling channel inlet flow blockage has damaged fuel in plate-fueled reactors and contributes significantly to the probability of fuel damage according to Probabilistic Risk Assessment. A Smoothed Particle Hydrodynamics (SPH) model of fuel melting caused by inlet flow blockage in the High Flux Isotope Reactor is created. The model is coded for high-throughput graphics processing unit (GPU) calculations. This modeling approach allows movement toward quantifying the uncertainty in fuel coolant flow blockage consequence assessment. The SPH modeling approach is convenient for following the movement of fuel and coolant during melt progression and provides a tool for capturing the interactions of fuel melting into the coolant. The development of this new model is presented, and its implementation for GPU simulation is described. The model is compared against analytical solutions. A scaled fuel melt progression is simulated under different conditions, showing the sensitivities of melting fuel to conditions in the coolant channel.
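    A minimal sketch of the per-particle SPH density estimate that such models evaluate, using the standard 2D cubic-spline kernel; the particle layout, smoothing length, and units are placeholders, and a GPU implementation parallelizes the per-particle loop (typically with a neighbor grid instead of the O(n^2) double loop shown here).

        import math

        def cubic_spline_kernel(r, h):
            """Standard 2D cubic-spline SPH kernel W(r, h) (Monaghan form)."""
            q = r / h
            sigma = 10.0 / (7.0 * math.pi * h * h)
            if q < 1.0:
                return sigma * (1.0 - 1.5 * q**2 + 0.75 * q**3)
            if q < 2.0:
                return sigma * 0.25 * (2.0 - q)**3
            return 0.0

        def densities(positions, masses, h):
            """SPH density at each particle: rho_i = sum_j m_j * W(|x_i - x_j|, h)."""
            rho = []
            for xi, yi in positions:
                total = 0.0
                for (xj, yj), mj in zip(positions, masses):
                    r = math.hypot(xi - xj, yi - yj)
                    total += mj * cubic_spline_kernel(r, h)
                rho.append(total)
            return rho

        # Toy example: a 3x3 patch of unit-mass particles with spacing 0.1 and h = 0.12
        pts = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
        print(densities(pts, [1.0] * 9, h=0.12))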

    Parallel and Distributed Collaborative Filtering: A Survey

    Full text link
    Collaborative filtering is amongst the most preferred techniques when implementing recommender systems. Recently, great interest has turned towards parallel and distributed implementations of collaborative filtering algorithms. This work is a survey of parallel and distributed collaborative filtering implementations, aiming not only to provide a comprehensive presentation of the field's development, but also to offer future research directions by highlighting the issues that need to be further developed. Comment: 46 pages.