104 research outputs found

    Heap-based Algorithms to Accelerate Fingerprint Matching on Parallel Platforms

    Fingerprints are currently the most widely used biometric trait for identifying individuals. State-of-the-art algorithms in this area are very accurate, but when the database contains millions of identities, the matching algorithm must be accelerated. Among these algorithms, Minutia Cylinder-Code (MCC) stands out for its accuracy; however, its computational efficiency is low. In this work, we propose using two different parallel platforms to accelerate the fingerprint matching process with MCC: (1) a multi-core server, and (2) a Xeon Phi coprocessor. Our proposal uses heaps as an auxiliary structure to compute the global similarity of MCC. Because heap-based algorithms are exhaustive (all the elements are accessed), we also explored the use of an indexing algorithm to avoid comparing the query against every fingerprint in the database. Experimental results show a speed-up of up to 97.15x, which is competitive with other state-of-the-art algorithms on GPU and FPGA. To the best of our knowledge, this is the first work on fingerprint identification using a Xeon Phi coprocessor. Instituto de Investigación en Informátic
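
As a rough illustration of how a heap can serve as the auxiliary structure mentioned in this abstract, the sketch below assumes that the global score is the average of the highest local similarities and that identification keeps the best `k` database entries. The abstract does not spell out either step, so the function names `global_score` and `best_candidates`, the parameter `n_p`, and the scoring rule are all hypothetical, not the authors' implementation.

```python
import heapq


def global_score(local_similarities, n_p):
    """Average the n_p largest local similarities (assumed scoring rule)."""
    best = heapq.nlargest(n_p, local_similarities)
    return sum(best) / len(best) if best else 0.0


def best_candidates(scored_identities, k):
    """Keep the k highest-scoring identities using a bounded min-heap.

    scored_identities yields (identity, score) pairs; every pair is
    visited once, which mirrors the exhaustive access noted in the abstract.
    """
    heap = []  # min-heap of (score, identity); heap[0] is the weakest kept
    for identity, score in scored_identities:
        if len(heap) < k:
            heapq.heappush(heap, (score, identity))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, identity))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    print(global_score([0.25, 0.75, 0.5, 1.0], n_p=2))
    print(best_candidates([("a", 0.3), ("b", 0.9), ("c", 0.5)], k=2))
```

A bounded heap keeps memory at `k` entries regardless of database size, which is one reason such a structure might suit per-thread partial results on a multi-core platform.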

    Co-design Hardware and Algorithm for Vector Search

    Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware offers a promising solution in the post-Moore's Law era. We introduce FANNS, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, FANNS automatically co-designs hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. FANNS attains up to 23.0× and 37.2× speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5× and 7.6× speedup in median and 95th percentile (P95) latency within an eight-accelerator configuration. The remarkable performance of FANNS lays a robust groundwork for future FPGA integration in data centers and AI supercomputers. Comment: 11 page
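
To make the "recall requirement" in this abstract concrete, here is a minimal sketch of how recall at k is commonly measured against an exact brute-force search. It assumes inner-product similarity and plain Python lists; the function names `top_k` and `recall_at_k` are hypothetical and are not part of FANNS.

```python
import heapq


def top_k(query, vectors, k):
    """Exact search: indices of the k vectors with the largest inner product."""
    scored = (
        (sum(q * v for q, v in zip(query, vec)), idx)
        for idx, vec in enumerate(vectors)
    )
    return [idx for _, idx in heapq.nlargest(k, scored)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k results that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    database = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    exact = top_k([1.0, 0.5], database, k=2)
    # A hypothetical approximate result that found only one true neighbour.
    print(exact, recall_at_k([2, 3], exact))
```

An approximate index trades some of this recall for speed, which is why a target recall is a natural input when tuning both the algorithm and the hardware.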

    Exploiting multiple levels of parallelism of Convergent Cross Mapping

    Identifying causal relationships between variables remains an essential problem across various scientific fields. Such identification is particularly important but challenging in complex systems, such as those involving human behaviour, sociotechnical contexts, and natural ecosystems. By exploiting state space reconstruction via lagged embeddings of time series, convergent cross mapping (CCM) serves as an important method for addressing this problem. While powerful, CCM is computationally costly; moreover, CCM results are highly sensitive to several parameter values. Current best practice is to search systematically over a range of parameters, but this imposes a high computational burden that raises barriers to practical use. In light of both such challenges and the growing size of commonly encountered datasets from complex systems, inferring causality with confidence using CCM in a reasonable time has become a major challenge. In this thesis, I investigate the performance of a variety of parallel techniques (including CUDA, Thrust, OpenMP, MPI, and Spark) for accelerating convergent cross mapping. The performance of each method was collected and compared across multiple experiments to further evaluate potential bottlenecks. Moreover, the work deployed and tested combinations of these techniques to more thoroughly exploit available computation resources. The results obtained from these experiments indicate that GPUs can only accelerate the CCM algorithm under certain circumstances and requirements. Otherwise, the overhead of data transfer and communication can become the limiting bottleneck. On the other hand, in cluster computing, the MPI/OpenMP framework outperforms the Spark framework by more than one order of magnitude in terms of processing speed and provides more consistent performance for distributed computing. This also reflects the large size of the output from the CCM algorithm.
    However, Spark shows better cluster infrastructure management, ease of software engineering, and more ready handling of other aspects, such as node failure and data replication. Furthermore, combinations of GPU and cluster frameworks are deployed and compared in GPU/CPU clusters. An apparent speedup can be achieved in the Spark framework, while extra time cost is incurred in the MPI/OpenMP framework. The underlying reason is that the code complexity imposed by GPU utilization cannot be readily offset in the MPI/OpenMP framework. Overall, the experimental results on parallelized solutions have demonstrated a capacity for over an order of magnitude performance improvement when compared with the widely used current library rEDM. Such economies in computation time can speed learning and robust identification of causal drivers in complex systems. I conclude that these parallel techniques can achieve significant improvements. However, the performance gain varies among different techniques or frameworks. Although GPUs can accelerate the application, there are still constraints that must be taken into consideration, especially with regard to the input data scale. Without proper usage, GPU use can even slow down the whole execution. Convergent cross mapping achieves its maximum speedup with the MPI/OpenMP framework, which is well suited to computation-intensive algorithms. By contrast, the Spark framework with integrated GPU accelerators still offers a lower execution cost than the pure Spark version, and mainly fits data-intensive problems
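
For readers unfamiliar with CCM, the sketch below shows the core computation described in this abstract: a lagged embedding of one series followed by a nearest-neighbour estimate of the other. It follows the commonly described simplex-style weighting and is a serial, pure-Python illustration only; the function names and the defaults `E=2`, `tau=1` are assumptions, and it is neither the thesis code nor the rEDM API.

```python
import math


def lagged_embedding(series, E, tau):
    """Return vectors (s[t], s[t - tau], ..., s[t - (E - 1) * tau])."""
    start = (E - 1) * tau
    return [
        tuple(series[t - j * tau] for j in range(E))
        for t in range(start, len(series))
    ]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cross_map_skill(x, y, E=2, tau=1):
    """Estimate y from the shadow manifold of x; return the correlation."""
    manifold = lagged_embedding(x, E, tau)
    targets = y[(E - 1) * tau:]
    estimates = []
    for i, point in enumerate(manifold):
        # Brute-force search for the E + 1 nearest neighbours, excluding self.
        dists = sorted(
            (math.dist(point, other), j)
            for j, other in enumerate(manifold)
            if j != i
        )
        nearest = dists[: E + 1]
        d1 = max(nearest[0][0], 1e-12)  # guard against division by zero
        weights = [math.exp(-d / d1) for d, _ in nearest]
        total = sum(weights)
        estimates.append(
            sum(w * targets[j] for w, (_, j) in zip(weights, nearest)) / total
        )
    return pearson(targets, estimates)


if __name__ == "__main__":
    x = [0.4]
    for _ in range(199):
        x.append(3.9 * x[-1] * (1.0 - x[-1]))  # chaotic logistic map
    y = [2.0 * v + 1.0 for v in x]  # y is fully determined by x
    print(cross_map_skill(x, y))
```

The neighbour search for each point is independent of the others, so this outer loop is the kind of work that could be spread across threads, GPU blocks, or cluster nodes; repeating it over many parameter values and library sizes multiplies the cost further.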
