
    Scalable Probabilistic Similarity Ranking in Uncertain Databases (Technical Report)

    This paper introduces a scalable approach to probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of mutually exclusive vector instances. The objective is to rank the uncertain objects according to their distance to a reference object. We propose a framework that incrementally computes, for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this distribution with a dynamic programming algorithm of quadratic complexity. In this paper we show, both theoretically and experimentally, that our framework reduces this to linear time with the same memory requirements, by accessing the uncertain vector instances incrementally in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to perform probabilistic top-k ranking of the objects according to several state-of-the-art definitions. An experimental evaluation on synthetic and real data demonstrates the efficiency of our approach.
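
The rank probability distribution described above can be illustrated with the quadratic-time dynamic program that the paper takes as its baseline (not the paper's linear-time algorithm). Here `closer_probs` is an assumed input: for one candidate instance, the probability that each other object lies closer to the reference object.

```python
def rank_distribution(closer_probs):
    """Distribution over the candidate's ranking position.

    closer_probs[i] is the probability that object i lies closer to the
    reference than the candidate instance. Returns dp, where dp[k] is the
    probability that exactly k objects are closer, i.e. the candidate
    falls at (0-based) ranking position k.
    """
    dp = [1.0]  # before considering any object, zero objects are closer
    for p in closer_probs:
        new = [0.0] * (len(dp) + 1)
        for k, v in enumerate(dp):
            new[k] += v * (1.0 - p)   # object is not closer: position unchanged
            new[k + 1] += v * p       # object is closer: position shifts by one
        dp = new
    return dp
```

Two objects that are each closer with probability 0.5 yield the distribution [0.25, 0.5, 0.25]. Each update touches every existing entry, which is where the quadratic cost comes from; the paper's distance-ordered incremental access avoids it.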

    Integrating and Ranking Uncertain Scientific Data

    Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such multiple expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicit modeling of uncertainties as probabilities increases our ability to predict less-known or previously unknown functions (though it does not improve prediction of well-known functions). This suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings. This suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates.
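
One simple way to score a prediction supported by several uncertain join paths, as in this setting, is a noisy-or combination under an independence assumption. This is a generic sketch with illustrative names, not BioRank's actual formalism:

```python
def combine_paths(path_probs):
    # Noisy-or: the prediction fails only if every evidence path fails,
    # assuming the paths are independent.
    fail = 1.0
    for p in path_probs:
        fail *= 1.0 - p
    return 1.0 - fail

def rank_predictions(evidence):
    # evidence: predicted function -> list of per-path probabilities
    scored = {f: combine_paths(ps) for f, ps in evidence.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

A prediction reached by two independent 0.5-probability paths scores 0.75. Note that noisy-or is smooth in its inputs, which is consistent with the robustness to small probability perturbations reported in (ii).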

    Grid service discovery with rough sets

    Copyright [2008] IEEE. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of Brunel University's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it. The computational grid is evolving as a service-oriented computing infrastructure that facilitates resource sharing and large-scale problem solving over the Internet. Service discovery becomes an issue of vital importance in utilising grid facilities. This paper presents ROSSE, a rough-sets-based search engine for grid service discovery. Building on rough set theory, ROSSE is novel in its capability to deal with uncertainty of properties when matching services. In this way, ROSSE can discover the services that are most relevant to a service query from a functional point of view. Since functionally matched services may have distinct non-functional properties related to Quality of Service (QoS), ROSSE introduces a QoS model to further filter matched services by their QoS values to maximise user satisfaction in service discovery. ROSSE is evaluated in terms of its accuracy and efficiency in the discovery of computing services.

    Vulnerability assessments of pesticide leaching to groundwater

    Pesticides may have adverse environmental effects if they are transported to groundwater and surface waters. The vulnerability of water resources to contamination by pesticides must therefore be evaluated. Different stakeholders, with different objectives and requirements, are interested in such vulnerability assessments, and various assessment methods have been developed in the past. For example, the vulnerability of groundwater to pesticide leaching may be evaluated by indices and overlay-based methods, by statistical analyses of monitoring data, or by using process-based models of pesticide fate. No single tool or methodology is likely to be appropriate for all end-users and stakeholders, since suitability depends on the available data and the specific goals of the assessment. The overall purpose of this thesis was to develop tools, based on different process-based models of pesticide leaching, that may be used in groundwater vulnerability assessments. Four different tools have been developed for end-users with varying goals and interests: (i) a tool based on the attenuation factor implemented in a GIS, where vulnerability maps are generated for the islands of Hawaii (U.S.A.), (ii) a simulation tool based on the MACRO model developed to support decision-makers at local authorities in assessing potential risks of pesticide leaching to groundwater following normal usage in drinking water abstraction districts, (iii) linked models of the soil root zone and groundwater to investigate leaching of the pesticide mecoprop to shallow and deep groundwater in fractured till, and (iv) a meta-model of the pesticide fate model MACRO developed for 'worst-case' groundwater vulnerability assessments in southern Sweden. The strengths and weaknesses of the different approaches are discussed.
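
The attenuation-factor index used in tool (i) is commonly written as a first-order-decay survival fraction over the pesticide's travel time to the water table. The sketch below follows one common form of the index; the parameter names and the exact formulation are assumptions for illustration, not this thesis's implementation:

```python
import math

def attenuation_factor(depth_m, recharge_m_per_d, theta_fc,
                       bulk_density, f_oc, koc, half_life_d):
    """Fraction of applied pesticide expected to reach groundwater.

    depth_m: depth to the water table; recharge_m_per_d: net recharge;
    theta_fc: water content at field capacity; bulk_density (g/cm3),
    f_oc: organic-carbon fraction; koc: sorption coefficient (cm3/g);
    half_life_d: degradation half-life in days.
    """
    # Retardation factor: sorption to soil organic carbon slows transport
    rf = 1.0 + bulk_density * f_oc * koc / theta_fc
    # Travel time to the water table (days)
    travel_time_d = depth_m * rf * theta_fc / recharge_m_per_d
    # First-order decay over the travel time
    return math.exp(-math.log(2.0) * travel_time_d / half_life_d)
```

For example, with no sorption (koc = 0) and a 30-day travel time, a pesticide with a 30-day half-life arrives with an attenuation factor of 0.5; stronger sorption lengthens the travel time and drives the factor toward zero, i.e. lower vulnerability.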

    VAMDC as a Resource for Atomic and Molecular Data and the New Release of VALD

    The Virtual Atomic and Molecular Data Centre (VAMDC) (M.L. Dubernet et al. 2010, JQSRT 111, 2151) is an EU-FP7 e-infrastructure project devoted to building a common electronic infrastructure for the exchange and distribution of atomic and molecular data. It involves two dozen teams from six EU member states (Austria, France, Germany, Italy, Sweden, United Kingdom) as well as Russia, Serbia, and Venezuela. Within VAMDC, scientists from many different disciplines in atomic and molecular physics collaborate with users of their data and also with scientists and engineers from the information and communication technology community. In this presentation an overview of the current status of VAMDC and its capabilities will be provided. In the second part of the presentation I will focus on one of the databases that have become part of the VAMDC platform, the Vienna Atomic Line Data Base (VALD). VALD has developed into a well-known resource of atomic data for spectroscopy, particularly in astrophysics. A new release, VALD-3, will provide numerous improvements over its predecessor. This particularly relates to the data contents, where new sets of atomic data for both precision spectroscopy (i.e., with data for observed energy levels) and opacity calculations (i.e., with data involving predicted energy levels) have been included. Data for selected diatomic molecules have been added, and a new system for data distribution and data referencing provides for more convenience in using the upcoming third release of VALD. Comment: 8 pages, 1 table

    Ranking in Distributed Uncertain Database Environments

    Distributed data processing is central to many modern applications, which collect and process data from distributed nodes to obtain overall results. The volume of data transferred and the associated network delay make fully centralized processing impractical. A common way to mitigate this problem is ranking queries: ranking, or top-k, queries return only the highest-ranked tuples according to the user's interest. Another issue in many applications is data uncertainty. Many techniques have been introduced for modeling, managing, and processing uncertain databases, but although these techniques are efficient, they do not address uncertainty over distributed data. This paper deals with both data uncertainty and distribution through ranking queries. A novel framework is proposed for ranking distributed uncertain data, comprising a suite of novel algorithms for ranking data and monitoring updates. These algorithms reduce the number of communication rounds and the amount of data transmitted while achieving efficient and effective ranking. Experimental results show that the proposed framework substantially reduces communication cost compared to other techniques. DOI: http://dx.doi.org/10.11591/ijece.v4i4.592
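
A two-phase candidate-shipping scheme illustrates how such frameworks cut communication: each site sends only its local top-k, and exact totals are then fetched for the candidate set alone. This is a generic sketch with assumed names, not the paper's algorithms; note that without a refinement threshold (as in TPUT-style protocols) it can miss items that score moderately at every site:

```python
def distributed_topk(nodes, k):
    """Approximate global top-k over per-site score tables.

    nodes: list of dicts mapping item -> local score (one dict per site).
    Phase 1 ships only k items per site instead of whole tables.
    """
    # Phase 1: each site contributes its local top-k as candidates
    candidates = set()
    for node in nodes:
        local_topk = sorted(node, key=node.get, reverse=True)[:k]
        candidates.update(local_topk)
    # Phase 2: fetch exact scores for the candidates only, rank globally
    totals = {c: sum(n.get(c, 0) for n in nodes) for c in candidates}
    return sorted(totals, key=totals.get, reverse=True)[:k]
```

With two sites scoring {a: 3, b: 2, c: 1} and {a: 1, b: 3, c: 2}, only four candidate entries cross the network for k = 2, instead of six full-table entries, and the global top-2 is still (b, a).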