878 research outputs found

    Ranking Bias in Deep Web Size Estimation Using Capture Recapture Method

    Get PDF
    Many deep web data sources are ranked data sources, i.e., they rank the matched documents and return at most the top k number of results even though there are more than k documents matching the query. While estimating the size of such ranked deep web data source, it is well known that there is a ranking bias—the traditional methods tend to underestimate the size when queries overflow (match more documents than the return limit). Numerous estimation methods have been proposed to overcome the ranking bias, such as by avoiding overflowing queries during the sampling process, or by adjusting the initial estimation using a fixed function. We observe that the overflow rate has a direct impact on the accuracy of the estimation. Under certain conditions, the actual size is close to the estimation obtained by unranked model multiplied by the overflow rate. Based on this result, this paper proposes a method that allows overflowing queries in the sampling process

    Web service search: who, when, what, and how

    Get PDF
    Web service search is an important problem in service oriented architecture that has attracted widespread attention from academia as well as industry. Web service searching can be performed by various stakeholders, in different situations, using different forms of queries. All those combinations result in radically different ways of implementation. Using a real world web service composition example, this paper describes when, what, and how to search web services from service assemblers’ point of view, where the semantics of web services are not explicitly described. This example outlines the approach to implement a web service broker that can recommend useful services to service assemblers

    Higher order generalization and its application in program verification

    Get PDF
    Generalization is a fundamental operation of inductive inference. While first order syntactic generalization (anti–unification) is well understood, its various extensions are often needed in applications. This paper discusses syntactic higher order generalization in a higher order language λ2 [1]. Based on the application ordering, we prove that least general generalization exists for any two terms and is unique up to renaming. An algorithm to compute the least general generalization is also presented. To illustrate its usefulness, we propose a program verification system based on higher order generalization that can reuse the proofs of similar programs

    Beyond network centrality: Individual-level behavioral traits for predicting information superspreaders in social media

    Get PDF
    Understanding the heterogeneous role of individuals in large-scale information spreading is essential to manage online behavior as well as its potential offline consequences. To this end, most existing studies from diverse research domains focus on the disproportionate role played by highly-connected “hub” individuals. However, we demonstrate here that information superspreaders in online social media are best understood and predicted by simultaneously considering two individual-level behavioral traits: influence and susceptibility. Specifically, we derive a nonlinear network-based algorithm to quantify individuals’ influence and susceptibility from multiple spreading event data. By applying the algorithm to large-scale data from Twitter and Weibo, we demonstrate that individuals’ estimated influence and susceptibility scores enable predictions of future superspreaders above and beyond network centrality, and reveal new insights on the network position of the superspreaders

    Optimal algorithms for selecting top-k combinations of attributes : theory and applications

    Get PDF
    Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover k objects, e.g., top-k restaurants, with highest overall scores aggregated from different attributes, e.g., price and location. However, new emerging applications like query recommendation require providing the best combinations of attributes, instead of objects. The straightforward extension based on the existing top-k algorithms is prohibitively expensive to answer top-k combinations because they need to enumerate all the possible combinations, which is exponential to the number of attributes. In this article, we formalize a novel type of top-k query, called top-k, m, which aims to find top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k, m algorithms with different data access methods, i.e., sorted accesses and random accesses and different query certainties, i.e., exact query processing and approximate query processing. Theoretically, we prove that our algorithms are instance optimal and analyze the bound of the depth of accesses. We further develop optimizations for efficient query evaluation to reduce the computational and the memory costs and the number of accesses. We provide a case study on the real applications of top-k, m queries for an online biomedical search engine. Finally, we perform comprehensive experiments to demonstrate the scalability and efficiency of top-k, m algorithms on multiple real-life datasets.Peer reviewe
    • …
    corecore