
    TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching

    Set similarity join is an important problem with many applications in data discovery, cleaning and integration. To increase robustness, fuzzy set similarity join calculates the similarity of two sets based on maximum weighted bipartite matching instead of set overlap. This allows pairs of elements, represented as sets or strings, to also match approximately rather than exactly, e.g., based on Jaccard similarity or edit distance. However, this significantly increases the verification cost, which makes efficient and effective filtering techniques for reducing the number of candidate pairs even more important. The current state-of-the-art algorithm relies on similarity computations between pairs of elements to filter candidates. In this paper, we propose token-based instead of element-based filtering, showing that it is significantly more lightweight while offering similar or even better pruning effectiveness. Moreover, we address the top-k variant of the problem, alleviating the need for a user-specified similarity threshold. We also propose early termination to reduce the cost of verification. Our experimental results on six real-world datasets show that our approach always outperforms the state of the art, being an order of magnitude faster on average.
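To make the abstract's terms concrete, the Python sketch below computes a fuzzy set similarity by maximum weighted bipartite matching. It assumes each element is a set of tokens compared with Jaccard similarity, and that the matching weight FO is normalized as FO / (|R| + |S| - FO), a fuzzy Jaccard chosen here as one common option. All function names are hypothetical. The `token_upper_bound` filter is a simple, valid token-based bound picked for illustration; it is not the filtering, top-k or early-termination technique proposed in TokenJoin, and the brute-force matching stands in for a proper assignment algorithm such as the Hungarian method.

```python
from itertools import permutations
from typing import FrozenSet, List, Tuple

Element = FrozenSet[str]  # hypothetical representation: one element = a set of tokens
FuzzySet = List[Element]


def jaccard(a: Element, b: Element) -> float:
    """Jaccard similarity between two elements' token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuzzy_overlap(r: FuzzySet, s: FuzzySet) -> float:
    """Weight of the maximum weighted bipartite matching between r and s,
    with Jaccard as the edge weight. Brute force over all injective
    assignments, so only practical for very small sets."""
    if len(r) > len(s):
        r, s = s, r  # give each element of the smaller set a distinct partner
    best = 0.0
    for perm in permutations(range(len(s)), len(r)):
        best = max(best, sum(jaccard(r[i], s[j]) for i, j in enumerate(perm)))
    return best


def fuzzy_jaccard(r: FuzzySet, s: FuzzySet) -> float:
    """Normalize the fuzzy overlap like Jaccard: FO / (|r| + |s| - FO)."""
    fo = fuzzy_overlap(r, s)
    denom = len(r) + len(s) - fo
    return fo / denom if denom > 0 else 0.0


def token_upper_bound(r: FuzzySet, s: FuzzySet) -> float:
    """Cheap token-based upper bound on fuzzy_jaccard (illustrative only):
    an element sharing no token with the other set has Jaccard 0 to all of
    its elements, so it cannot add weight to the matching."""
    r_tokens = frozenset().union(*r)
    s_tokens = frozenset().union(*s)
    ub = min(sum(1 for e in r if e & s_tokens),
             sum(1 for e in s if e & r_tokens))
    denom = len(r) + len(s) - ub
    return ub / denom if denom > 0 else 0.0


def fuzzy_join(R: List[FuzzySet], S: List[FuzzySet],
               threshold: float) -> List[Tuple[int, int, float]]:
    """Filter-and-verify join: prune with the token bound, then verify."""
    out = []
    for i, r in enumerate(R):
        for j, s in enumerate(S):
            if token_upper_bound(r, s) < threshold:
                continue  # pruned without any element-to-element similarity
            score = fuzzy_jaccard(r, s)
            if score >= threshold:
                out.append((i, j, score))
    return out


if __name__ == "__main__":
    r = [frozenset({"main", "st"}), frozenset({"springfield"})]
    s = [frozenset({"main", "street"}), frozenset({"springfield"})]
    t = [frozenset({"oak", "ave"})]
    # (r, s) is kept with a fuzzy Jaccard of about 0.5;
    # (r, t) is pruned by the token bound before verification.
    print(fuzzy_join([r], [s, t], threshold=0.4))
```

With exact element matching, r and s share only one of three distinct elements (Jaccard 1/3, below the 0.4 threshold). The fuzzy version also credits the partial match between "main st" and "main street", which lifts the score to 0.5.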

    Top-k Dominant Web Services Under Multi-Criteria Matching

    As we move from a Web of data to a Web of services, enhancing the capabilities of current Web search engines with effective and efficient techniques for Web service retrieval and selection becomes an important issue. Traditionally, the relevance of a Web service advertisement to a service request is determined by computing an overall score that aggregates individual matching scores among the various parameters in their descriptions. Such approaches have two drawbacks. First, no single matching criterion is optimal for determining the similarity between parameters; instead, there are numerous approaches, ranging from Information Retrieval similarity metrics to semantic logic-based inference rules. Second, reducing the individual scores to an overall similarity leads to significant information loss. Since there is no consensus on how to weight these scores, existing method…
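The abstract is cut off before it describes the proposed method, so the Python sketch below only illustrates the general idea it motivates: comparing services by Pareto dominance over their per-criterion matching scores instead of collapsing them into one weighted sum. It computes the skyline (services that no other service dominates) and a simple top-k ranking by how many services each one dominates. The score data, function names and this particular ranking rule are assumptions for illustration, not necessarily the dominance scores defined in the paper.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical data: each advertised service gets one matching score per
# criterion (for instance, different IR similarity metrics or a logic-based
# semantic match); higher is better.
Scores = Dict[str, Sequence[float]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is at least as good on every criterion
    and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(scores: Scores) -> List[str]:
    """Services not dominated by any other service (a vector never dominates itself)."""
    return sorted(
        name for name, v in scores.items()
        if not any(dominates(w, v) for w in scores.values())
    )


def top_k_dominating(scores: Scores, k: int) -> List[Tuple[str, int]]:
    """Rank services by how many others they dominate (ties broken by name)."""
    counts = {
        name: sum(1 for w in scores.values() if dominates(v, w))
        for name, v in scores.items()
    }
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]


if __name__ == "__main__":
    scores = {
        "s1": (0.9, 0.8, 0.7),
        "s2": (0.6, 0.9, 0.5),
        "s3": (0.5, 0.7, 0.4),
        "s4": (0.4, 0.6, 0.3),
    }
    print(skyline(scores))                # ['s1', 's2']
    print(top_k_dominating(scores, k=2))  # [('s1', 2), ('s2', 2)]
```

Here s1 and s2 are incomparable: each wins on some criterion, so neither is discarded, and no weights had to be chosen to rank both ahead of s3 and s4.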

    Diabetes mellitus: neuropathic and ischaemic ulcers of the lower limbs: pathogenesis, contribution to management and course

    The aim of the present study was (a) to record the clinical course and outcome of patients with diabetic foot (DF) over a six-year follow-up period and (b) to identify risk factors for ipsilateral re-amputation. The first part of the study included 256 patients with diabetic foot (171 men, 85 women; mean age 65.31 ± 10.25 years). These were divided into three groups according to the aetiology of the lesion: Group I, 87 patients with neuropathic ulcers; Group II, 34 patients with purely ischaemic ulcers; and Group III, 120 patients with neuroischaemic ulcers. In all three groups, lesion location was recorded and lesion severity was graded with the Meggitt-Wagner system. Neuropathy was diagnosed with the Neuropathy Disability Score (NDS), and painful neuropathy was assessed with the Neuropathy Symptom Score (NSS). Patients were also examined for peripheral arterial disease, which was diagnosed by measuring the ankle-brachial index with a Doppler device, as well as by typical symptoms (intermittent claudication), palpation of peripheral pulses and, where possible, digital subtraction angiography (DSA). We also recorded history of previous ulceration or amputation, sex, age, type of diabetes mellitus (DM), metabolic control (HbA1c), DM duration, antidiabetic treatment, smoking habits, BMI, classical micro- and macrovascular complications (nephropathy, retinopathy, coronary disease, stroke), and other risk factors such as hypertension and dyslipidaemia. We sought to identify the causal pathway of ulceration, the duration of lesions, the frequency and duration of hospitalisation, whether the lesion healed, and the likelihood of subsequent amputation or vascular intervention. We also examined the management of diabetic foot, lesion recurrence or the appearance of new ulcers, and, finally, patient mortality.
The second part of the study focused on the occurrence of ipsilateral re-amputation during a six-year follow-up among patients who had previously undergone a minor or major amputation. It included 121 patients with prior amputation, divided into two groups: group A (95 patients without re-amputation) and group B (26 patients with re-amputation). We aimed to identify risk factors leading to ipsilateral re-amputation. The potential risk factors examined included HbA1c, smoking, nephropathy, age, sex, BMI, severity of neuropathy, severity of ischaemia, aetiology and severity of the lesion, anatomic location of the lesion, patient compliance, vascular intervention, and angiographic findings.

    Discovery and integration of data and services in the Semantic Web

    The Web constitutes a universal repository providing a huge amount of information on a variety of topics and in a variety of formats. At the same time, the number of users has increased significantly, their participation has become more active, and their needs have become more complex. Thus, new trends arise that emphasize the need for integration and collaboration. To address these challenges, considerable research effort has been devoted to the transition to the Semantic Web, which will enhance the current Web with formal and explicit metadata, promising to facilitate interoperability and to increase automation in searching, managing, and sharing information. In this direction, this thesis studies the problem of searching for relevant services and data on the Semantic Web, as well as integrating information from heterogeneous sources to meet specific needs and requirements. First, we study the problem of Web service discovery. We propose a similarity measure for comparing service descriptions that uses the semantic information conveyed by the ontologies used to annotate these descriptions. We also develop techniques, drawing on concepts related to skyline queries, for ranking available services under diverse user preferences and multiple matching criteria. Then, we study the search for services and data in distributed environments, considering peer-to-peer networks where the available resources are semantically annotated. We propose an approach for efficient and progressive search of services in a structured peer-to-peer overlay network, and a method to facilitate the sharing of structured data in an ontology-enhanced peer data management system. Finally, we propose techniques to facilitate the conceptual design of Extract-Transform-Load (ETL) processes, which are critical for reconciling information from several heterogeneous sources.
These techniques also rely on ontologies to identify correspondences, conflicts, and transformations between the source and target specifications.