159 research outputs found

    OpenForensics:a digital forensics GPU pattern matching approach for the 21st century

    Get PDF
    Pattern matching is a crucial component employed in many digital forensic (DF) analysis techniques, such as file-carving. The capacity of storage available on modern consumer devices has increased substantially in the past century, making pattern matching approaches of current generation DF tools increasingly ineffective in performing timely analyses on data seized in a DF investigation. As pattern matching is a trivally parallelisable problem, general purpose programming on graphic processing units (GPGPU) is a natural fit for this problem. This paper presents a pattern matching framework - OpenForensics - that demonstrates substantial performance improvements from the use of modern parallelisable algorithms and graphic processing units (GPUs) to search for patterns within forensic images and local storage devices

    Advanced Techniques for Improving the Efficacy of Digital Forensics Investigations

    Get PDF
    Digital forensics is the science concerned with discovering, preserving, and analyzing evidence on digital devices. The intent is to be able to determine what events have taken place, when they occurred, who performed them, and how they were performed. In order for an investigation to be effective, it must exhibit several characteristics. The results produced must be reliable, or else the theory of events based on the results will be flawed. The investigation must be comprehensive, meaning that it must analyze all targets which may contain evidence of forensic interest. Since any investigation must be performed within the constraints of available time, storage, manpower, and computation, investigative techniques must be efficient. Finally, an investigation must provide a coherent view of the events under question using the evidence gathered. Unfortunately the set of currently available tools and techniques used in digital forensic investigations does a poor job of supporting these characteristics. Many tools used contain bugs which generate inaccurate results; there are many types of devices and data for which no analysis techniques exist; most existing tools are woefully inefficient, failing to take advantage of modern hardware; and the task of aggregating data into a coherent picture of events is largely left to the investigator to perform manually. To remedy this situation, we developed a set of techniques to facilitate more effective investigations. To improve reliability, we developed the Forensic Discovery Auditing Module, a mechanism for auditing and enforcing controls on accesses to evidence. To improve comprehensiveness, we developed ramparser, a tool for deep parsing of Linux RAM images, which provides previously inaccessible data on the live state of a machine. To improve efficiency, we developed a set of performance optimizations, and applied them to the Scalpel file carver, creating order of magnitude improvements to processing speed and storage requirements. Last, to facilitate more coherent investigations, we developed the Forensic Automated Coherence Engine, which generates a high-level view of a system from the data generated by low-level forensics tools. Together, these techniques significantly improve the effectiveness of digital forensic investigations conducted using them

    A Survey of Techniques for Improving Security of GPUs

    Full text link
    Graphics processing unit (GPU), although a powerful performance-booster, also has many security vulnerabilities. Due to these, the GPU can act as a safe-haven for stealthy malware and the weakest `link' in the security `chain'. In this paper, we present a survey of techniques for analyzing and improving GPU security. We classify the works on key attributes to highlight their similarities and differences. More than informing users and researchers about GPU security techniques, this survey aims to increase their awareness about GPU security vulnerabilities and potential countermeasures

    VIABILITY OF TIME-MEMORY TRADE-OFFS IN LARGE DATA SETS

    Get PDF
    The main hypothesis of this paper is whether compression performance – both hardware and software – is at, approaching, or will ever reach a point where real-time compression of cached data in large data sets will be viable to improve hit ratios and overall throughput. The problem identified is: storage access is unable to keep up with application and user demands, and cache (RAM) is too small to contain full data sets. A literature review of several existing techniques discusses how storage IO is reduced or optimized to maximize the available performance of the storage medium. However, none of the techniques discovered preclude, or are mutually exclusive with, the hypothesis proposed herein. The methodology includes gauging three popular compressors which meet the criteria for viability: zlib, lz4, and zstd. Common storage devices are also benchmarked to establish costs for both IO and compression operations to help build charts and discover break-even points under various circumstances. The results indicate that modern CISC processors and compressors are already approaching tradeoff viability, and that FPGA and ASIC could potentially reduce all overhead by pipelining compression – nearly eliminating the cost portion of the tradeoff, leaving mostly benefit

    The Role of Distributed Computing in Big Data Science: Case Studies in Forensics and Bioinformatics

    Get PDF
    2014 - 2015The era of Big Data is leading the generation of large amounts of data, which require storage and analysis capabilities that can be only ad- dressed by distributed computing systems. To facilitate large-scale distributed computing, many programming paradigms and frame- works have been proposed, such as MapReduce and Apache Hadoop, which transparently address some issues of distributed systems and hide most of their technical details. Hadoop is currently the most popular and mature framework sup- porting the MapReduce paradigm, and it is widely used to store and process Big Data using a cluster of computers. The solutions such as Hadoop are attractive, since they simplify the transformation of an application from non-parallel to the distributed one by means of general utilities and without many skills. However, without any algorithm engineering activity, some target applications are not alto- gether fast and e cient, and they can su er from several problems and drawbacks when are executed on a distributed system. In fact, a distributed implementation is a necessary but not su cient condition to obtain remarkable performance with respect to a non-parallel coun- terpart. Therefore, it is required to assess how distributed solutions are run on a Hadoop cluster, and/or how their performance can be improved to reduce resources consumption and completion times. In this dissertation, we will show how Hadoop-based implementations can be enhanced by using carefully algorithm engineering activity, tuning, pro ling and code improvements. It is also analyzed how to achieve these goals by working on some critical points, such as: data local computation, input split size, number and granularity of tasks, cluster con guration, input/output representation, etc. i In particular, to address these issues, we choose some case studies coming from two research areas where the amount of data is rapidly increasing, namely, Digital Image Forensics and Bioinformatics. We mainly describe full- edged implementations to show how to design, engineer, improve and evaluate Hadoop-based solutions for Source Camera Identi cation problem, i.e., recognizing the camera used for taking a given digital image, adopting the algorithm by Fridrich et al., and for two of the main problems in Bioinformatics, i.e., alignment- free sequence comparison and extraction of k-mer cumulative or local statistics. The results achieved by our improved implementations show that they are substantially faster than the non-parallel counterparts, and re- markably faster than the corresponding Hadoop-based naive imple- mentations. In some cases, for example, our solution for k-mer statis- tics is approximately 30× faster than our Hadoop-based naive im- plementation, and about 40× faster than an analogous tool build on Hadoop. In addition, our applications are also scalable, i.e., execution times are (approximately) halved by doubling the computing units. Indeed, algorithm engineering activities based on the implementation of smart improvements and supported by careful pro ling and tun- ing may lead to a much better experimental performance avoiding potential problems. We also highlight how the proposed solutions, tips, tricks and insights can be used in other research areas and problems. Although Hadoop simpli es some tasks of the distributed environ- ments, we must thoroughly know it to achieve remarkable perfor- mance. It is not enough to be an expert of the application domain to build Hadop-based implementations, indeed, in order to achieve good performance, an expert of distributed systems, algorithm engi- neering, tuning, pro ling, etc. is also required. Therefore, the best performance depend heavily on the cooperation degree between the domain expert and the distributed algorithm engineer. [edited by Author]XIV n.s

    Facilitating High Performance Code Parallelization

    Get PDF
    With the surge of social media on one hand and the ease of obtaining information due to cheap sensing devices and open source APIs on the other hand, the amount of data that can be processed is as well vastly increasing. In addition, the world of computing has recently been witnessing a growing shift towards massively parallel distributed systems due to the increasing importance of transforming data into knowledge in today’s data-driven world. At the core of data analysis for all sorts of applications lies pattern matching. Therefore, parallelizing pattern matching algorithms should be made efficient in order to cater to this ever-increasing abundance of data. We propose a method that automatically detects a user’s single threaded function call to search for a pattern using Java’s standard regular expression library, and replaces it with our own data parallel implementation using Java bytecode injection. Our approach facilitates parallel processing on different platforms consisting of shared memory systems (using multithreading and NVIDIA GPUs) and distributed systems (using MPI and Hadoop). The major contributions of our implementation consist of reducing the execution time while at the same time being transparent to the user. In addition to that, and in the same spirit of facilitating high performance code parallelization, we present a tool that automatically generates Spark Java code from minimal user-supplied inputs. Spark has emerged as the tool of choice for efficient big data analysis. However, users still have to learn the complicated Spark API in order to write even a simple application. Our tool is easy to use, interactive and offers Spark’s native Java API performance. To the best of our knowledge and until the time of this writing, such a tool has not been yet implemented

    MR-CUDASW - GPU accelerated Smith-Waterman algorithm for medium-length (meta)genomic data

    Get PDF
    The idea of using a graphics processing unit (GPU) for more than simply graphic output purposes has been around for quite some time in scientific communities. However, it is only recently that its benefits for a range of bioinformatics and life sciences compute-intensive tasks has been recognized. This thesis investigates the possibility of improving the performance of the overlap determination stage of an Overlap Layout Consensus (OLC)-based assembler by using a GPU-based implementation of the Smith-Waterman algorithm. In this thesis an existing GPU-accelerated sequence alignment algorithm is adapted and expanded to reduce its completion time. A number of improvements and changes are made to the original software. Workload distribution, query profile construction, and thread scheduling techniques implemented by the original program are replaced by custom methods specifically designed to handle medium-length reads. Accordingly, this algorithm is the first highly parallel solution that has been specifically optimized to process medium-length nucleotide reads (DNA/RNA) from modern sequencing machines (i.e. Ion Torrent). Results show that the software reaches up to 82 GCUPS (Giga Cell Updates Per Second) on a single-GPU graphic card running on a commodity desktop hardware. As a result it is the fastest GPU-based implemen- tation of the Smith-Waterman algorithm tailored for processing medium-length nucleotide reads. Despite being designed for performing the Smith-Waterman algorithm on medium-length nucleotide sequences, this program also presents great potential for improving heterogeneous computing with CUDA-enabled GPUs in general and is expected to make contributions to other research problems that require sensitive pairwise alignment to be applied to a large number of reads. Our results show that it is possible to improve the performance of bioinformatics algorithms by taking full advantage of the compute resources of the underlying commodity hardware and further, these results are especially encouraging since GPU performance grows faster than multi-core CPUs

    High performance digital signal processing: Theory, design, and applications in finance

    Get PDF
    The way scientific research and business is conducted has drastically changed over the last decade. Big data and data-intensive scientific discovery are two terms that have been coined recently. They describe the tremendous amounts of noisy data, created extremely rapidly by various sensing devices and methods that need to be explored for information inference. Researchers and practitioners who can obtain meaningful information out of big data in the shortest time gain a competitive advantage. Hence, there is more need than ever for a variety of high performance computational tools for scientific and business analytics. Interest in developing efficient data processing methods like compression and noise filtering tools enabling real-time analytics of big data is increasing. A common concern in digital signal processing applications has been the lack of fast handling of observed data. This problem has been an active research topic being addressed by the progress in analytical tools allowing fast processing of big data. One particular tool is the Karhunen-Loève transform (KLT) (also known as the principal component analysis) where covariance matrix of a stochastic process is decomposed into its eigenvectors and eigenvalues as the optimal orthonormal transform. Specifically, eigenanalysis is utilized to determine the KLT basis functions. KLT is a widely employed signal analysis method used in applications including noise filtering of measured data and compression. However, defining KLT basis for a given signal covariance matrix demands prohibitive computational resources in many real-world scenarios. In this dissertation, engineering implementation of KLT as well as the theory of eigenanalysis for auto-regressive order one, AR(1), discrete stochastic processes are investigated and novel improvements are proposed. The new findings are applied to well-known problems in quantitative finance (QF). First, an efficient method to derive the explicit KLT kernel for AR(1) processes that utilizes a simple root finding method for the transcendental equations is introduced. Performance improvement over a popular numerical eigenanalysis algorithm, called divide and conquer, is shown. Second, implementation of parallel Jacobi algorithm for eigenanalysis on graphics processing units is improved such that the access to the dynamic random access memory is entirely coalesced. The speed is improved by a factor of 68.5 by the proposed method compared to a CPU implementation for a square matrix of size 1,024. Third, several tools developed and implemented in the dissertation are applied to QF problems such as risk analysis and portfolio risk management. In addition, several topics in QF, such as price models, Epps effect, and jump processes are investigated and new insights are suggested from a multi-resolution (multi-rate) signal processing perspective. It is expected to see this dissertation to make contributions in better understanding and bridging the analytical methods in digital signal processing and applied mathematics, and their wider utilization in the finance sector. The emerging joint research and technology development efforts in QF and financial engineering will benefit the investors, bankers, and regulators to build and maintain more robust and fair financial markets in the future
    corecore