128 research outputs found

    PASS-JOIN: A Partition-based Method for Similarity Joins

    As an essential operation in data cleaning, the similarity join has attracted considerable attention from the database community. In this paper, we study string similarity joins with edit-distance constraints, which find similar string pairs from two large sets of strings whose edit distance is within a given threshold. Existing algorithms are efficient either for short strings or for long strings, and there is no algorithm that can efficiently and adaptively support both short strings and long strings. To address this problem, we propose a partition-based method called Pass-Join. Pass-Join partitions a string into a set of segments and creates inverted indices for the segments. Then for each string, Pass-Join selects some of its substrings and uses the selected substrings to find candidate pairs using the inverted indices. We devise efficient techniques to select the substrings and prove that our method can minimize the number of selected substrings. We develop novel pruning techniques to efficiently verify the candidate pairs. Experimental results show that our algorithms are efficient for both short strings and long strings, and outperform state-of-the-art methods on real datasets. Comment: VLDB201

    MassJoin: A mapreduce-based method for scalable string similarity joins

    String similarity join is an essential operation in data integration. The era of big data calls for scalable algorithms to support large-scale string similarity joins. In this paper, we study scalable string similarity joins using MapReduce. We propose a MapReduce-based framework, called MASSJOIN, which supports both set-based similarity functions and character-based similarity functions. We extend the existing partition-based signature scheme to support set-based similarity functions. We utilize the signatures to generate key-value pairs. To reduce the transmission cost, we merge key-value pairs to significantly reduce the number of key-value pairs, from cubic to linear complexity, while not sacrificing the pruning power. To improve the performance, we incorporate “light-weight” filter units into the key-value pairs, which can be utilized to prune a large number of dissimilar pairs without significantly increasing the transmission cost. Experimental results on real-world datasets show that our method significantly outperformed state-of-the-art approaches.

    Pedestrian–bus route and pickup location planning for emergency evacuation

    Planning for a bus-based regional evacuation is essential for emergency preparedness, especially for hurricane- or flood-prone urban environments with large numbers of transit-dependent or transit-captive populations. This paper develops an optimization-based decision-support model for pedestrian–bus evacuation planning under bus fleet, pedestrian and bus route, and network constraints. Aiming to minimize the evacuation duration, an optimization model is proposed to determine the optimal pickup nodes for evacuees to assemble using existing pedestrian routes, and to allocate the available bus fleet via bus routes and the urban road network to transport the assembled evacuees between the pickup nodes and designated public shelters. Numerical examples with two scenarios based on the Sioux Falls street network from South Dakota (United States) demonstrate that this model can be used to optimize the evacuation duration, the location of pickup nodes and bus assignment simultaneously. First published online 13 October 202

    The Protective Antibodies Induced by a Novel Epitope of Human TNF-α Could Suppress the Development of Collagen-Induced Arthritis

    Tumor necrosis factor alpha (TNF-α) is a major inflammatory mediator that exhibits actions leading to tissue destruction and hampering recovery from damage. At present, two antibodies against human TNF-α (hTNF-α) are available, which are widely used for the clinical treatment of certain inflammatory diseases. This work was undertaken to identify a novel functional epitope of hTNF-α. We screened a peptide library against anti-hTNF-α antibodies and used ELISA and competitive ELISA to obtain the epitope of hTNF-α. The key residues of the epitope were identified by means of combinatorial alanine scanning and site-specific mutagenesis. The N terminus (80–91 aa) of hTNF-α proved to be a novel epitope (YG1). Two amino acids of YG1, proline and valine, were identified as the key residues, which were important for hTNF-α biological function. Furthermore, the function of the epitope was assessed in an animal model of collagen-induced arthritis (CIA). CIA could be suppressed in the animal model by prevaccination with peptides derived from YG1. Antibodies against YG1 could also inhibit the cytotoxicity of hTNF-α. These results demonstrate that YG1 is a novel epitope associated with the biological function of hTNF-α and that antibodies against YG1 can inhibit the development of CIA in an animal model, so it would be a potential target of new therapeutic antibodies.