109 research outputs found

    FPGA Acceleration of Pre-Alignment Filters for Short Read Mapping With HLS

    Get PDF
    Pre-alignment filters are useful for reducing the computational requirements of genomic sequence mappers. Most of them are based on estimating or computing the edit distance between sequences and their candidate locations in a reference genome using a subset of the dynamic programming table used to compute Levenshtein distance. Some of their FPGA implementations of use classic HDL toolchains, thus limiting their portability. Currently, most FPGA accelerators offered by heterogeneous cloud providers support C/C++ HLS. In this work, we implement and optimize several state-of-the-art pre-alignment filters using C/C++ based-HLS to expand their portability to a wide range of systems supporting the OpenCL runtime. Moreover, we perform a complete analysis of the performance and accuracy of the filters and analyze the implications of the results. The maximum throughput obtained by an exact filter is 95.1 MPairs/s including memory transfers using 100 bp sequences, which is the highest ever reported for a comparable system and more than two times faster than previous HDL-based results. The best energy efficiency obtained from the accelerator (not considering host CPU) is 2.1 MPairs/J, more than one order of magnitude higher than other accelerator-based comparable approaches from the state of the art.10.13039/501100008530-European Union Regional Development Fund (ERDF) within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of the total cost eligible under the Designing RISC-V based Accelerators for next generation computers project (DRAC) (Grant Number: [001-P-001723]) 10.13039/501100002809-Catalan Government (Grant Number: 2017-SGR-313 and 2017-SGR-1624) 10.13039/501100004837-Spanish Ministry of Science, Innovation and Universities (Grant Number: PID2020-113614RB-C21 and RTI2018-095209-B-C22)Peer ReviewedPostprint (published version

    Wavefront Longest Common Subsequence Algorithm On Multicore And Gpgpu Platform.

    Get PDF
    String comparison is a central operation in numerous applications. It has a critical task in many operations such as data mining, spelling error correction and molecular biology (Tan et al, 2007; Michailidis and Margaritis, 2000)

    Generate fuzzy string-matching to build self attention on Indonesian medical-chatbot

    Get PDF
    Chatbot is a form of interactive conversation that requires quick and precise answers. The process of identifying answers to users’ questions involves string matching and handling incorrect spelling. Therefore, a system that can independently predict and correct letters is highly necessary. The approach used to address this issue is to enhance the fuzzy string-matching method by incorporating several features for self-attention. The combination of fuzzy string-matching methods employed includes Jaro Winkler distance + Levenshtein Damerau distance and Damerau Levenshtein + Rabin Carp. The reason for using this combination is their ability not only to match strings but also to correct word typing errors. This research contributes by developing a self-attention mechanism through a modified fuzzy string-matching model with enhanced word feature structures. The goal is to utilize this self-attention mechanism in constructing the Indonesian medical bidirectional encoder representations from transformers (IM-BERT). This will serve as a foundation for additional features to provide accurate answers in the Indonesian medical question and answer system, achieving an exact match of 85.7% and an F1-score of 87.6%

    Exploiting Record Similarity for Practical Vertical Federated Learning

    Full text link
    As the privacy of machine learning has drawn increasing attention, federated learning is introduced to enable collaborative learning without revealing raw data. Notably, \textit{vertical federated learning} (VFL), where parties share the same set of samples but only hold partial features, has a wide range of real-world applications. However, existing studies in VFL rarely study the ``record linkage'' process. They either design algorithms assuming the data from different parties have been linked or use simple linkage methods like exact-linkage or top1-linkage. These approaches are unsuitable for many applications, such as the GPS location and noisy titles requiring fuzzy matching. In this paper, we design a novel similarity-based VFL framework, FedSim, which is suitable for more real-world applications and achieves higher performance on traditional VFL tasks. Moreover, we theoretically analyze the privacy risk caused by sharing similarities. Our experiments on three synthetic datasets and five real-world datasets with various similarity metrics show that FedSim consistently outperforms other state-of-the-art baselines

    Bytewise Approximate Matching: The Good, The Bad, and The Unknown

    Get PDF
    Hash functions are established and well-known in digital forensics, where they are commonly used for proving integrity and file identification (i.e., hash all files on a seized device and compare the fingerprints against a reference database). However, with respect to the latter operation, an active adversary can easily overcome this approach because traditional hashes are designed to be sensitive to altering an input; output will significantly change if a single bit is flipped. Therefore, researchers developed approximate matching, which is a rather new, less prominent area but was conceived as a more robust counterpart to traditional hashing. Since the conception of approximate matching, the community has constructed numerous algorithms, extensions, and additional applications for this technology, and are still working on novel concepts to improve the status quo. In this survey article, we conduct a high-level review of the existing literature from a non-technical perspective and summarize the existing body of knowledge in approximate matching, with special focus on bytewise algorithms. Our contribution allows researchers and practitioners to receive an overview of the state of the art of approximate matching so that they may understand the capabilities and challenges of the field. Simply, we present the terminology, use cases, classification, requirements, testing methods, algorithms, applications, and a list of primary and secondary literature
    corecore