
    Weak RSA Key Discovery on GPGPU

    We address one of the weaknesses of RSA ciphering systems, i.e. the existence of private keys that are relatively easy for an attacker to compromise. The problem can be mitigated by Internet service providers, but it requires some computational effort. We propose a proof of concept of a GPGPU-accelerated system that can help detect and eliminate users' weak keys. We have proposed the algorithms and developed GPU-optimised program code that is now publicly available and substantially outperforms the tested CPU. The source code of the OpenSSL library was adapted for GPGPU, and the resulting code can run on both GPU and CPU processors. Additionally, we present a solution for mapping a triangular grid onto the rectangular GPU grid, a basic dilemma in many problems that involve pair-wise analysis of a set of elements. A comparison of two data caching methods on the GPGPU also leads to interesting general conclusions. We present the results of performance experiments with the selected algorithms for various RSA key lengths, GPU grid configurations, and sizes of the tested key set.
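
    The triangular-to-rectangular mapping mentioned above can be illustrated with a short Python sketch. It is purely illustrative and not the authors' CUDA/OpenSSL code: each unordered key pair is given one flat index, so a one-dimensional GPU grid could cover the upper triangle of the pair matrix, and the weak-key test is shown here as a pairwise GCD over toy moduli.

```python
from math import gcd, isqrt

def pair_from_flat(k: int) -> tuple[int, int]:
    """Map a flat index k to a pair (i, j) with 0 <= i < j, enumerating the
    strict upper triangle in order of increasing j."""
    j = (1 + isqrt(1 + 8 * k)) // 2        # largest j with j*(j-1)/2 <= k
    i = k - j * (j - 1) // 2
    return i, j

def find_shared_factors(moduli):
    """Pairwise GCD scan: two RSA moduli sharing a prime factor are both weak.
    On a GPU each flat index k would be handled by one thread; here it is a loop."""
    n = len(moduli)
    hits = []
    for k in range(n * (n - 1) // 2):      # one iteration per unordered pair
        i, j = pair_from_flat(k)
        g = gcd(moduli[i], moduli[j])
        if g > 1:                          # shared prime factor found
            hits.append((i, j, g))
    return hits

# tiny demo with toy "moduli": 15 and 35 share 5, 35 and 77 share 7
print(find_shared_factors([15, 35, 77]))   # -> [(0, 1, 5), (1, 2, 7)]
```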

    The Algorithms for FPGA Implementation of Sparse Matrices Multiplication

    In comparison to dense matrix multiplication, the real performance of sparse matrix multiplication on a CPU is roughly 5--100 times lower when expressed in GFLOPS. For sparse matrices, microprocessors spend most of the time comparing matrix indices rather than performing floating-point multiply and add operations. For 16-bit integer operations, such as index comparisons, the computational power of an FPGA significantly surpasses that of a CPU. Consequently, this paper presents a novel theoretical study of how the matrix sparsity factor influences the ratio of index-comparison to floating-point workload. As a result, a novel FPGA architecture for sparse matrix-matrix multiplication is presented in which index comparisons and floating-point operations are separated. We also verified our idea in practice, and the initial implementation results are very promising. To further decrease the hardware resources required by the floating-point multiplier, a reduced-width multiplication is proposed for the case when IEEE-754 standard compliance is not required.
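
    The workload-ratio argument can be made concrete with a small, hedged Python sketch (not the paper's FPGA design): a sparse dot product over two sorted index lists, counting merge steps on indices against floating-point multiply-adds; the vector length and density below are arbitrary.

```python
import random

def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as sorted (index, value) pairs.
    Counts index-merge steps (comparisons) versus floating-point multiply-adds."""
    i = j = steps = macs = 0
    acc = 0.0
    while i < len(a) and j < len(b):
        steps += 1                          # one index-comparison step per iteration
        if a[i][0] == b[j][0]:
            acc += a[i][1] * b[j][1]        # the only floating-point work
            macs += 1
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return acc, steps, macs

def random_sparse(n, density):
    idx = sorted(random.sample(range(n), int(n * density)))
    return [(k, random.random()) for k in idx]

# at 1% density the index work vastly outnumbers the multiply-adds
a, b = random_sparse(100_000, 0.01), random_sparse(100_000, 0.01)
_, steps, macs = sparse_dot(a, b)
print(f"index steps: {steps}, multiply-adds: {macs}")
```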

    Analysis of the Basic Implementation Aspects of Hardware-Accelerated Density Functional Theory Calculations

    This paper presents a Field Programmable Gate Array (FPGA) implementation of a calculation module for the exponential part of a Gaussian Type Orbital (GTO). The module is composed of several specially crafted floating-point modules which are fully pipelined and optimized for high performance. The hardware implementation revealed a significant speed-up for the calculation of the finite sum of exponential products, ranging from 2.5x to 20x in comparison to a general-purpose Central Processing Unit (CPU) version. Calculating the values of GTOs is one of the computationally critical parts of the Kohn-Sham algorithm. The approach proposed in the paper aims to increase the performance of a part of a quantum chemistry computational system by employing an FPGA-based accelerator. Several issues are addressed, such as identification of the code fragments which benefit most from hardware acceleration, porting a part of the Kohn-Sham algorithm to the FPGA, data precision adjustment, and data transfer overhead. The authors' intention was also to make the hardware implementation of the orbital function calculation universal and easily attachable to different quantum chemistry software packages.
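
    The kernel being accelerated, the finite sum of exponential products, can be written down in a few lines of Python. This is only a reference model of the arithmetic, not the pipelined FPGA design; the contraction exponents and coefficients below are illustrative values.

```python
import math

def gto_radial(r2, alphas, coeffs):
    """Exponential part of a contracted Gaussian Type Orbital:
    sum_i c_i * exp(-alpha_i * r^2), evaluated at the squared distance r2.
    This is the finite sum of exponential products pipelined in hardware."""
    return sum(c * math.exp(-a * r2) for a, c in zip(alphas, coeffs))

# illustrative contraction exponents and coefficients (not taken from the paper)
alphas = [3.425, 0.624, 0.169]
coeffs = [0.154, 0.535, 0.445]
r2 = 1.2 ** 2                     # squared electron-nucleus distance
print(gto_radial(r2, alphas, coeffs))
```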

    Evaluation and Implementation of n-Gram-Based Algorithm for Fast Text Comparison

    This paper presents a study of an n-gram-based document comparison method. The method is intended for building a large-scale plagiarism detection system. The work focuses not only on the quality of text similarity extraction but also on the execution performance of the implemented algorithms. We considered the detection performance, storage requirements, and execution time of the proposed approach. The obtained results show the trade-offs between detection quality and computational requirements. GPGPU and multi-CPU platforms were considered to implement the algorithms and achieve good execution speed. The method consists of two main algorithms: document feature extraction and fast text comparison. The winnowing algorithm is used to generate a compressed representation of the analyzed documents. The authors designed and implemented a dedicated test framework for the algorithm, which allowed for tuning, evaluation, and optimization of its parameters. Well-known metrics (e.g. precision, recall) were used to evaluate detection performance. The authors conducted tests to determine the performance of the winnowing algorithm for obfuscated and unobfuscated texts for different window and n-gram sizes. A simplified version of the text comparison algorithm was also proposed and evaluated to reduce the computational complexity of the text comparison process. The paper also presents GPGPU and multi-CPU implementations of the algorithms for different data structures. The implementation speed was tested for different algorithm parameters and data sizes, and the scalability of the algorithm on multi-CPU platforms was verified. The authors provide a repository of the software tools and programs used to perform the conducted experiments, as well as the resulting fast document comparison system, whose performance is given in the paper.
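
    The winnowing step can be sketched in a few lines of Python. This is the standard winnowing scheme (hash all k-grams, keep the rightmost minimum of every window of w hashes), shown here on character k-grams with arbitrary k, w and a crude normalization; the paper's exact parameters, n-gram definition, and data structures differ.

```python
import hashlib

def kgram_hashes(text, k):
    """Hash every k-gram (k consecutive characters) of the normalized text."""
    text = "".join(text.lower().split())          # crude normalization
    return [int.from_bytes(hashlib.md5(text[i:i + k].encode()).digest()[:8], "big")
            for i in range(len(text) - k + 1)]

def winnow(hashes, w):
    """From every window of w consecutive k-gram hashes keep the minimum
    (rightmost on ties), producing a compact document fingerprint."""
    fingerprint = set()
    for i in range(len(hashes) - w + 1):
        window = hashes[i:i + w]
        j = max(range(w), key=lambda t: (-window[t], t))   # rightmost minimum
        fingerprint.add((i + j, window[j]))
    return fingerprint

def similarity(fp_a, fp_b):
    """Jaccard overlap of the selected hash values (positions ignored)."""
    a, b = {h for _, h in fp_a}, {h for _, h in fp_b}
    return len(a & b) / max(1, len(a | b))

fp1 = winnow(kgram_hashes("the quick brown fox jumps over the lazy dog", 5), 4)
fp2 = winnow(kgram_hashes("a quick brown fox jumped over a lazy dog", 5), 4)
print(f"similarity = {similarity(fp1, fp2):.2f}")
```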

    USING STANDARD HARDWARE ACCELERATORS TO DECREASE COMPUTATION TIMES IN SCIENTIFIC APPLICATIONS

    Nowadays, general-purpose processors are commonly used in scientific computing. However, when high computational throughput is needed, it is worth considering whether dedicated hardware solutions would be more efficient, in terms of performance (or performance-to-price ratio), in terms of power efficiency, or both. This paper briefly describes such hardware accelerators and compares them to the architecture of contemporary general-purpose processors.

    Experiment on Methods for Clustering and Categorization of Polish Text

    The main goal of this work was to experimentally verify methods for the challenging task of categorization and clustering of Polish text. Supervised and unsupervised learning were employed for categorization and clustering, respectively. A thorough examination of the employed methods was performed on a custom-built corpus of Polish texts, assembled by the authors from Internet resources. The corpus data was acquired from a news portal and had therefore been sorted by type by journalists according to their specialization. The presented algorithms employ the Vector Space Model (VSM) and the TF-IDF (Term Frequency-Inverse Document Frequency) weighting scheme. A series of experiments was conducted that revealed certain properties of the algorithms and their accuracy. The accuracy of the algorithms was evaluated with respect to their ability to match the human arrangement of the documents by topic. For both categorization and clustering, the authors used the F-measure to assess the quality of the allocation.
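
    The overall pipeline can be sketched with scikit-learn standing in for the authors' own implementation: TF-IDF vectors feed both a supervised classifier (categorization) and k-means (clustering), with the F-measure used for evaluation. The toy documents, labels, and model choices below are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.cluster import KMeans
from sklearn.metrics import f1_score

# toy Polish snippets with hand-assigned topics (illustrative only)
docs = ["mecz zakonczyl sie remisem", "rzad przyjal nowy budzet",
        "pilkarze wygrali puchar", "parlament uchwalil ustawe"]
labels = ["sport", "polityka", "sport", "polityka"]

X = TfidfVectorizer().fit_transform(docs)          # VSM with TF-IDF weighting

clf = MultinomialNB().fit(X, labels)               # categorization (supervised)
print("F-measure:", f1_score(labels, clf.predict(X), average="macro"))

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)   # clustering
print("cluster assignments:", km.labels_)
```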

    COMPUTATION ACCELERATION ON SGI RASC: FPGA BASED RECONFIGURABLE COMPUTING HARDWARE

    In this paper a novel method of computation using FPGA technology is presented. In several cases this method provides a computation speedup with respect to General Purpose Processors (GPP). The main concept of the approach is to design the computing hardware architecture so that it fits the algorithm's dataflow and makes the best use of well-known computing techniques such as pipelining and parallelism. Configurable hardware is used as an implementation platform for the custom-designed hardware. The paper presents implementation results for algorithms used in areas such as cryptography, data analysis, and scientific computation. Other promising areas for this technology, bioinformatics for instance, are also mentioned. The presented algorithms were designed, tested, and implemented on the SGI RASC platform; the RASC module is a part of Cyfronet's SGI Altix 4700 SMP system. We also present the modern RASC architecture, which in principle consists of FPGA chips and very fast, 128-bit wide local memory, as well as the design tools available to designers.

    Using simulation to calibrate real data acquisition in veterinary medicine

    This paper explores the innovative use of simulation environments to enhance data acquisition and diagnostics in veterinary medicine, focusing specifically on gait analysis in dogs. The study harnesses the power of Blender and the Blenderproc library to generate synthetic datasets that reflect diverse anatomical, environmental, and behavioral conditions. The generated data, represented in graph form and standardized for optimal analysis, is used to train machine learning algorithms for identifying normal and abnormal gaits. Two distinct datasets with varying degrees of camera-angle granularity are created to further investigate the influence of camera perspective on model accuracy. Preliminary results suggest that this simulation-based approach holds promise for advancing veterinary diagnostics by enabling more precise data acquisition and more effective machine learning models. By integrating synthetic and real-world patient data, the study lays a robust foundation for improving overall effectiveness and efficiency in veterinary medicine.
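
    The idea of varying camera-angle granularity can be illustrated with a small, hedged numpy sketch. It only generates camera positions on a ring around a subject at two angular step sizes; the radius, height, and steps are made-up values, and the actual Blenderproc scene setup used in the study is not reproduced here.

```python
import numpy as np

def camera_ring(radius, height, step_deg):
    """Camera positions on a ring around a subject at the origin, one pose
    every step_deg degrees; a finer step yields a denser synthetic dataset."""
    angles = np.deg2rad(np.arange(0.0, 360.0, step_deg))
    xs, ys = radius * np.cos(angles), radius * np.sin(angles)
    return np.stack([xs, ys, np.full_like(xs, height)], axis=1)

# two datasets differing only in camera-angle granularity
coarse = camera_ring(radius=3.0, height=1.2, step_deg=45)   # 8 viewpoints
fine = camera_ring(radius=3.0, height=1.2, step_deg=15)     # 24 viewpoints
print(coarse.shape, fine.shape)                             # (8, 3) (24, 3)
```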

    The Enhancement of a Computer System for Sorting Capabilities Using FPGA Custom Architecture

    The primary goal of the presented experiment was to judge the usefulness of FPGA technology for the sorting operations performed by computer systems. We were interested in whether better system performance and lower energy consumption can be achieved when the CPU is supported by FPGA chips. A custom processing method was applied: we proposed dedicated sorting hardware to increase performance and save energy. Our concept addresses High Throughput Computing (HTC) systems. The custom hardware approach was chosen because this technique is available in today's supercomputing infrastructures. Another important aspect of the work is that the hardware was programmed using a High Level Language (HLL). The FPGA was chosen as the semiconductor platform for the hardware implementation. We evaluated the efficiency of an FPGA-based sorting processor programmed in the Mitrion-C HLL and compared the FPGA approach to the CPU approach in terms of efficiency and power consumption.
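
    Sorting networks with a fixed compare-exchange pattern, such as the bitonic network, are a common fit for FPGA pipelines. The Python sketch below is a software model of a bitonic sorting network for illustration only; it is not the authors' Mitrion-C design, and the power-of-two input size is an assumption of this particular network.

```python
def bitonic_sort(data, ascending=True):
    """Software model of a bitonic sorting network (power-of-two input size).
    Its fixed, data-independent compare-exchange pattern maps well onto
    pipelined FPGA hardware."""
    n = len(data)
    assert n & (n - 1) == 0, "bitonic network needs a power-of-two input size"
    a = list(data)
    k = 2
    while k <= n:                 # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:              # compare-exchange distance within a stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    up = ((i & k) == 0) == ascending
                    if (a[i] > a[partner]) == up:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

print(bitonic_sort([7, 3, 8, 1, 6, 2, 5, 4]))   # -> [1, 2, 3, 4, 5, 6, 7, 8]
```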

    Perspektywy przyśpieszenia obliczeń w instalacjach o wielkich mocach obliczeniowych za pomocą układów logiki rekonfigurowalnej (Prospects for Accelerating Computations in Installations with Huge Computational Power Using Reconfigurable Logic Devices)

    ABSTRACT: The authors present an already known but not yet widely used technique of computation acceleration by means of reconfigurable logic circuits, with special focus on the possibility of its mass usage in multi-processor and multi-threaded systems that offer huge computational power. The basic principles and techniques of the reconfigurable computing paradigm are presented, and the prospects for common adoption of this promising technique are discussed, taking into account the current state of commercially available FPGA reconfigurable logic. KEYWORDS: reconfigurable logic circuits, signal processing, hybrid systems, RC (Reconfigurable Computing)