Scalable Probabilistic Similarity Ranking in Uncertain Databases (Technical Report)
This paper introduces a scalable approach for probabilistic top-k similarity
ranking on uncertain vector data. Each uncertain object is represented by a set
of vector instances that are assumed to be mutually exclusive. The objective is
to rank the uncertain data according to their distance to a reference object.
We propose a framework that incrementally computes, for each object instance
and ranking position, the probability of the object falling at that ranking
position. The resulting rank probability distribution can serve as input for
several state-of-the-art probabilistic ranking models. Existing approaches
compute this probability distribution by applying a dynamic programming
approach of quadratic complexity. In this paper we theoretically as well as
experimentally show that our framework reduces this to a linear-time complexity
while having the same memory requirements, facilitated by incremental accessing
of the uncertain vector instances in increasing order of their distance to the
reference object. Furthermore, we show how the output of our method can be used
to apply probabilistic top-k ranking for the objects, according to different
state-of-the-art definitions. We conduct an experimental evaluation on
synthetic and real data, which demonstrates the efficiency of our approach.
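The quadratic dynamic-programming baseline that the abstract contrasts against can be illustrated with a short sketch. The code below is our hedged reading, not the paper's implementation: given, for one object instance, the probabilities that each other uncertain object ranks before it, it computes the instance's rank probability distribution (a Poisson-binomial distribution) in O(n^2) time; the paper's contribution is reducing this to linear time by accessing instances in increasing distance order.

```python
def rank_distribution(p_before):
    # p_before[j]: probability that uncertain object j ranks before the
    # current instance (e.g., has an instance closer to the query object).
    # Returns r where r[k] = P[exactly k objects rank before the instance],
    # i.e., the instance occupies ranking position k + 1.
    r = [1.0]
    for p in p_before:
        nxt = [0.0] * (len(r) + 1)
        for k, prob in enumerate(r):
            nxt[k] += prob * (1.0 - p)   # object j ranks after the instance
            nxt[k + 1] += prob * p       # object j ranks before the instance
        r = nxt
    return r
```

With two competing objects that each precede the instance with probability 0.5, the instance lands at positions 1, 2, and 3 with probabilities 0.25, 0.5, and 0.25 respectively.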
Integrating and Ranking Uncertain Scientific Data
Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such multiple expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicit modeling of uncertainties as probabilities increases our ability to predict less-known or previously unknown functions (though it does not improve prediction of well-known functions). This suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings. This suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates.
Grid service discovery with rough sets
Copyright [2008] IEEE. This material is posted here with permission of the IEEE.

The computational grid is evolving as a service-oriented computing infrastructure that facilitates resource sharing and large-scale problem solving over the Internet. Service discovery thus becomes an issue of vital importance in utilising grid facilities. This paper presents ROSSE, a rough-set-based search engine for grid service discovery. Building on rough set theory, ROSSE is novel in its capability to deal with uncertainty of properties when matching services. In this way, ROSSE can discover the services that are most relevant to a service query from a functional point of view. Since functionally matched services may have distinct non-functional properties related to Quality of Service (QoS), ROSSE introduces a QoS model to further filter matched services by their QoS values, maximising user satisfaction in service discovery. ROSSE is evaluated in terms of its accuracy and efficiency in the discovery of computing services.
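The rough-set machinery behind this kind of matching can be sketched generically. The following is a hedged illustration of rough-set lower and upper approximations (not ROSSE's actual algorithm; the service names and properties are invented): services indistinguishable on the relevant properties form equivalence classes, the lower approximation contains services that definitely match a target concept, and the upper approximation those that possibly match, which is how uncertain or irrelevant properties can be tolerated.

```python
def approximations(services, relevant_props, target):
    # services: dict mapping service name -> set of advertised properties.
    # Indiscernibility classes w.r.t. the relevant properties: services
    # with the same relevant-property projection are indistinguishable.
    classes = {}
    for name, props in services.items():
        key = frozenset(props & relevant_props)
        classes.setdefault(key, set()).add(name)
    lower, upper = set(), set()
    for cls in classes.values():
        if cls <= target:
            lower |= cls          # class lies wholly inside the concept
        if cls & target:
            upper |= cls          # class overlaps the concept
    return lower, upper
```

Services in the upper but not the lower approximation form the boundary region: candidate matches whose relevant properties do not settle the question either way.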
Vulnerability assessments of pesticide leaching to groundwater
Pesticides may have adverse environmental effects if they are transported to groundwater and surface waters. The vulnerability of water resources to contamination by pesticides must therefore be evaluated. Different stakeholders, with different objectives and requirements, are interested in such vulnerability assessments. Various assessment methods have been developed in the past. For example, the vulnerability of groundwater to pesticide leaching may be evaluated by indices and overlay-based methods, by statistical analyses of monitoring data, or by using process-based models of pesticide fate. No single tool or methodology is likely to be appropriate for all end-users and stakeholders, since their suitability depends on the available data and the specific goals of the assessment. The overall purpose of this thesis was to develop tools, based on different process-based models of pesticide leaching, that may be used in groundwater vulnerability assessments. Four different tools have been developed for end-users with varying goals and interests: (i) a tool based on the attenuation factor implemented in a GIS, where vulnerability maps are generated for the islands of Hawaii (U.S.A.), (ii) a simulation tool based on the MACRO model developed to support decision-makers at local authorities to assess potential risks of leaching of pesticides to groundwater following normal usage in drinking water abstraction districts, (iii) linked models of the soil root zone and groundwater to investigate leaching of the pesticide mecoprop to shallow and deep groundwater in fractured till, and (iv) a meta-model of the pesticide fate model MACRO developed for 'worst-case' groundwater vulnerability assessments in southern Sweden. The strengths and weaknesses of the different approaches are discussed.
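The attenuation-factor index mentioned in (i) is commonly computed from pesticide travel time and degradation half-life. The sketch below is a hedged rendering of an attenuation-factor-style index in the spirit of Rao et al.; the parameter names, units, and the simplified retardation factor are our illustrative assumptions, not the thesis's implementation.

```python
import math

def attenuation_factor(d, q, theta_fc, rho_b, kd, half_life):
    # Hedged sketch of an attenuation-factor style leaching index.
    # d: depth to groundwater [m], q: net recharge [m/day],
    # theta_fc: field-capacity water content [-],
    # rho_b: soil bulk density [kg/L], kd: sorption coefficient [L/kg],
    # half_life: degradation half-life [days].
    rf = 1.0 + rho_b * kd / theta_fc        # retardation by sorption
    travel_time = d * theta_fc * rf / q     # days to reach groundwater
    k = math.log(2) / half_life             # first-order decay rate
    return math.exp(-k * travel_time)       # mass fraction surviving transport
```

The index lies in (0, 1]: strongly sorbed, fast-degrading compounds over deep water tables score near zero (low vulnerability), while mobile, persistent compounds score near one.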
VAMDC as a Resource for Atomic and Molecular Data and the New Release of VALD
The Virtual Atomic and Molecular Data Centre (VAMDC) (M.L. Dubernet et al.
2010, JQSRT 111, 2151) is an EU-FP7 e-infrastructure project devoted to
building a common electronic infrastructure for the exchange and distribution
of atomic and molecular data. It involves two dozen teams from six EU member
states (Austria, France, Germany, Italy, Sweden, United Kingdom) as well as
Russia, Serbia, and Venezuela. Within VAMDC scientists from many different
disciplines in atomic and molecular physics collaborate with users of their
data and also with scientists and engineers from the information and
communication technology community. In this presentation an overview of the
current status of VAMDC and its capabilities will be provided. In the second
part of the presentation I will focus on one of the databases which have become
part of the VAMDC platform, the Vienna Atomic Line Data Base (VALD). VALD has
developed into a well-known resource of atomic data for spectroscopy
particularly in astrophysics. A new release, VALD-3, will provide numerous
improvements over its predecessor. This particularly relates to the data
contents where new sets of atomic data for both precision spectroscopy (i.e.,
with data for observed energy levels) as well as opacity calculations (i.e.,
with data involving predicted energy levels) have been included. Data for
selected diatomic molecules have been added and a new system for data
distribution and data referencing provides for more convenience in using the
upcoming third release of VALD.

Comment: 8 pages, 1 table
Ranking in Distributed Uncertain Database Environments
Distributed data processing is a major concern in today's applications. Many applications collect and process data from distributed nodes to obtain overall results. The large volumes of data transferred and the resulting network delay make centralized processing costly, which is an important practical problem. A common way to mitigate it is ranking queries: ranking, or top-k, queries concentrate only on the highest-ranked tuples according to the user's interest. Another issue in many of today's applications is data uncertainty. Many techniques have been introduced for modeling, managing, and processing uncertain databases. Although these techniques are efficient, they do not address distributed data uncertainty. This paper deals with both data uncertainty and distribution through ranking queries. A novel framework is proposed for ranking distributed uncertain data. The framework comprises a suite of novel algorithms for ranking data and monitoring updates. These algorithms reduce the number of communication rounds and the amount of data transmitted while achieving efficient and effective ranking. Experimental results show that the proposed framework greatly reduces communication cost compared to other techniques.

DOI: http://dx.doi.org/10.11591/ijece.v4i4.592
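The communication saving that coordinator-based top-k schemes exploit can be sketched in a few lines. This is a hedged, generic illustration rather than the paper's algorithms: when tuples are partitioned across nodes, the global top-k is guaranteed to lie within the union of the local top-k lists, so each node ships only k tuples instead of its whole relation; real systems add threshold tests and further rounds for uncertain or updating data.

```python
import heapq

def distributed_topk(nodes, k):
    # nodes: list of per-node relations, each a dict tuple_id -> score,
    # with tuple ids partitioned across nodes. Each node contributes only
    # its local top-k (k tuples of traffic per node, not the full relation);
    # the coordinator merges the candidates and re-ranks them.
    candidates = {}
    for node in nodes:
        local = heapq.nlargest(k, node.items(), key=lambda kv: kv[1])
        for tid, score in local:
            candidates[tid] = score
    return heapq.nlargest(k, candidates.items(), key=lambda kv: kv[1])
```

With two nodes holding four tuples in total, only four tuples cross the network for k = 2, and the coordinator still recovers the exact global answer.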