Structurally Tractable Uncertain Data
Many data management applications must deal with data which is uncertain,
incomplete, or noisy. However, on existing uncertain data representations, we
cannot tractably perform the important query evaluation tasks of determining
query possibility, certainty, or probability: these problems are hard on
arbitrary uncertain input instances. We thus ask whether we could restrict the
structure of uncertain data so as to guarantee the tractability of exact query
evaluation. We present our tractability results for tree and tree-like
uncertain data, and a vision for probabilistic rule reasoning. We also study
uncertainty about order, proposing a suitable representation, and study
uncertain data conditioned by additional observations.
Comment: 11 pages, 1 figure, 1 table. To appear in SIGMOD/PODS PhD Symposium 201
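The tractability claim can be made concrete with a toy sketch. This is not the construction from the paper. It assumes that the uncertain data is a tree whose edges are present independently with given probabilities, and it uses a hypothetical Boolean query asking whether the root reaches a "marked" node. On a tree, disjoint subtrees are independent, so a bottom-up pass computes the exact query probability in linear time. The brute-force reference, by contrast, enumerates all 2^|E| possible worlds. Possibility and certainty follow directly: the query is possible iff the probability is positive, and certain iff it equals 1.

```python
from itertools import product

# Hypothetical probabilistic tree: node -> [(child, probability the edge exists)].
CHILDREN = {
    "r": [("a", 0.9), ("b", 0.5)],
    "a": [("c", 0.4), ("d", 0.7)],
    "b": [("e", 0.8)],
    "c": [], "d": [], "e": [],
}
MARKED = {"c", "e"}  # nodes the hypothetical Boolean query is looking for


def prob_reach(node: str) -> float:
    """Exact probability that `node` reaches a marked node, computed bottom-up.

    Child subtrees use disjoint edges, so their events are independent and the
    failure probabilities multiply; this is what keeps the tree case linear.
    """
    if node in MARKED:
        return 1.0
    fail = 1.0
    for child, p in CHILDREN[node]:
        fail *= 1.0 - p * prob_reach(child)
    return 1.0 - fail


def prob_reach_brute(root: str) -> float:
    """Reference answer: sum the weights of all 2^|E| possible worlds."""
    edges = [(u, v, p) for u, kids in CHILDREN.items() for v, p in kids]
    total = 0.0
    for world in product([False, True], repeat=len(edges)):
        weight = 1.0
        adj = {u: [] for u in CHILDREN}
        for (u, v, p), kept in zip(edges, world):
            weight *= p if kept else 1.0 - p
            if kept:
                adj[u].append(v)
        stack = [root]  # DFS over the edges present in this world
        while stack:
            node = stack.pop()
            if node in MARKED:
                total += weight
                break
            stack.extend(adj[node])
    return total


print(prob_reach("r"), prob_reach_brute("r"))  # both approximately 0.616
```

Here the root reaches c with probability 0.9 * 0.4 = 0.36 and e with probability 0.5 * 0.8 = 0.4, using disjoint edges, so the answer is 1 - 0.64 * 0.6 = 0.616.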
Range Queries on Uncertain Data
Given a set $P$ of $n$ uncertain points on the real line, each represented by
its one-dimensional probability density function, we consider the problem of
building data structures on $P$ to answer range queries of the following three
types for any query interval $I$: (1) top-1 query: find the point in $P$ that
lies in $I$ with the highest probability; (2) top-$k$ query: given any integer
$k \le n$ as part of the query, return the $k$ points in $P$ that lie in $I$
with the highest probabilities; and (3) threshold query: given any threshold
$\tau$ as part of the query, return all points of $P$ that lie in $I$ with
probabilities at least $\tau$. We present data structures for these range
queries with linear or nearly linear space and efficient query time.
Comment: 26 pages. A preliminary version of this paper appeared in ISAAC 2014.
In this full version, we also present solutions to the most general case of
the problem (i.e., the histogram bounded case), which were left as open
problems in the preliminary version.
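A brute-force implementation of the three query types makes a handy correctness oracle. It answers each query in O(n) time, so it is not the paper's linear-space data structure. For simplicity it assumes that every point's density is uniform on an interval [lo, hi] with lo < hi, whereas the paper handles more general (histogram) densities. The class and function names are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class UniformPoint:
    """Hypothetical uncertain point whose density is uniform on [lo, hi], lo < hi."""
    name: str
    lo: float
    hi: float

    def prob_in(self, a: float, b: float) -> float:
        """Probability that the point lies in the query interval [a, b]."""
        overlap = max(0.0, min(b, self.hi) - max(a, self.lo))
        return overlap / (self.hi - self.lo)


def top1(points, a, b):
    return max(points, key=lambda q: q.prob_in(a, b))


def topk(points, a, b, k):
    # nlargest returns the k best points in descending order of probability
    return heapq.nlargest(k, points, key=lambda q: q.prob_in(a, b))


def threshold(points, a, b, tau):
    return [q for q in points if q.prob_in(a, b) >= tau]


P = [UniformPoint("p1", 0.0, 4.0), UniformPoint("p2", 1.0, 2.0), UniformPoint("p3", 3.0, 7.0)]
print(top1(P, 1.0, 3.0).name)                         # p2
print([q.name for q in topk(P, 1.0, 3.0, 2)])         # ['p2', 'p1']
print([q.name for q in threshold(P, 1.0, 3.0, 0.5)])  # ['p1', 'p2']
```

For I = [1, 3], p2 lies in I with probability 1, p1 with probability 0.5, and p3 with probability 0.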
Integrating and Ranking Uncertain Scientific Data
Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such multi-source expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicitly modeling uncertainties as probabilities increases our ability to predict less-known or previously unknown functions (though it does not improve the prediction of well-known functions), which suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings, which suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently, which suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates.
Model updating using uncertain experimental modal data
The propagation of parameter uncertainty in structural dynamics has become a feasible method for determining the probabilistic description of the vibration response of industrial-scale finite element models. Although methods for uncertainty propagation have been developed extensively, the quantification of parameter uncertainty has been neglected in the past. Yet a correct assumption about parameter variability is essential for estimating the uncertain vibration response. This paper shows how to identify the means and covariance matrix of model parameters from uncertain experimental modal test data. The common gradient-based approach from deterministic computational model updating is extended by an equation that accounts for the stochastic part. In detail, an inverse approach for identifying statistical parametric properties is presented and applied to a numerical model of a replica of the GARTEUR SM-AG19 benchmark structure. The uncertain eigenfrequencies and mode shapes were determined in an extensive experimental modal test campaign in which the aircraft structure was tested repeatedly and disassembled and reassembled between successive modal analyses, 130 times in total.
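The identification step can be illustrated with a linearized, one-shot sketch. It assumes a hypothetical linear sensitivity model f ≈ f0 + S (p - p0) linking the parameters p to the measured modal data f (eigenfrequencies, say). Under that assumption, the pseudo-inverse of S maps the sample mean and covariance of the repeated measurements back to parameter space. The paper's actual method is an iterative gradient-based update with an additional stochastic equation, which this sketch does not reproduce. All numbers are made up; 130 samples simply mirror the number of reassemblies.

```python
import numpy as np


def identify_param_stats(f_samples, f0, p0, S):
    """Estimate parameter mean and covariance from repeated measurements.

    f_samples: (m, n_f) array, one row per repeated modal test
    f0:        (n_f,) model response at the nominal parameters p0
    S:         (n_f, n_p) sensitivity matrix df/dp at p0 (assumed full column rank)
    """
    S_pinv = np.linalg.pinv(S)
    f_mean = f_samples.mean(axis=0)
    f_cov = np.cov(f_samples, rowvar=False)
    p_mean = p0 + S_pinv @ (f_mean - f0)   # least-squares mean update
    p_cov = S_pinv @ f_cov @ S_pinv.T      # linear covariance back-propagation
    return p_mean, p_cov


rng = np.random.default_rng(0)
S = np.array([[2.0, 0.5], [0.3, 1.5], [1.0, 1.0]])  # hypothetical sensitivities
p0 = np.array([1.0, 1.0])
f0 = np.array([10.0, 8.0, 9.0])
p_true = rng.multivariate_normal([1.1, 0.9], [[0.010, 0.002], [0.002, 0.020]], size=130)
f_meas = f0 + (p_true - p0) @ S.T  # synthetic "measurements" from the linear model
p_mean, p_cov = identify_param_stats(f_meas, f0, p0, S)
print(p_mean)
print(p_cov)
```

Because the synthetic data here are generated exactly by the linear model and S has full column rank, the identified mean and covariance coincide with the sample statistics of the true parameters. With a real finite element model, the sensitivities would be recomputed as the parameters are updated.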
Supporting User-Defined Functions on Uncertain Data
Uncertain data management has become crucial in many sensing and scientific applications. As user-defined functions (UDFs) become widely used in these applications, an important task is to capture result uncertainty for queries that evaluate UDFs on uncertain data. In this work, we provide a general framework for supporting UDFs on uncertain data. Specifically, we propose a learning approach based on Gaussian processes (GPs) to compute approximate output distributions of a UDF when evaluated on uncertain input, with guaranteed error bounds. We also devise an online algorithm to compute such output distributions, which employs a suite of optimizations to improve accuracy and performance. Our evaluation using both real-world and synthetic functions shows that our proposed GP approach can outperform the state-of-the-art sampling approach by up to two orders of magnitude for a variety of UDFs.
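The core idea of using a GP as a cheap stand-in for an expensive UDF can be sketched as follows. This is a minimal illustration, not the paper's framework. It assumes a hypothetical one-dimensional UDF, a fixed RBF kernel with length scale 0.3, and Gaussian uncertain input X ~ N(1, 0.2^2). It uses only the GP posterior mean, with no error bounds and no online refinement. The UDF is called at just 25 training inputs, and the 20,000 input samples are pushed through the surrogate instead.

```python
import numpy as np


def udf(x):
    """Hypothetical stand-in for an expensive user-defined function."""
    return np.sin(3.0 * x) + 0.5 * x


def rbf(a, b, length=0.3):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)


def fit_gp(x_train, y_train, length=0.3, jitter=1e-6):
    """Return the posterior-mean function of a zero-mean GP fitted to the data."""
    K = rbf(x_train, x_train, length) + jitter * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return lambda x: rbf(x, x_train, length) @ alpha


rng = np.random.default_rng(1)
x_train = np.linspace(-1.0, 3.0, 25)         # the only 25 calls to the UDF
gp_mean = fit_gp(x_train, udf(x_train))

samples = rng.normal(1.0, 0.2, size=20_000)  # uncertain input X ~ N(1, 0.2^2)
gp_out = gp_mean(samples)                    # approximate output distribution
mc_out = udf(samples)                        # direct sampling, for comparison only
print(gp_out.mean(), mc_out.mean(), gp_out.std(), mc_out.std())
```

In a real deployment, the direct-sampling line is exactly the cost the surrogate is meant to avoid. It appears here only to show that the two output distributions agree closely.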
On Markov Chains with Uncertain Data
In this paper, a general method is described to determine uncertainty intervals for performance measures of Markov chains, given an uncertainty region for the parameters of the Markov chains. We investigate the effects of uncertainties in the transition probabilities on the limiting distributions, on the state probabilities after n steps, on mean sojourn times in transient states, and on absorption probabilities for absorbing states. We show that the uncertainty effects can be calculated by solving linear programming problems in the case of interval uncertainty for the transition probabilities, and by second-order cone optimization in the case of ellipsoidal uncertainty. Many examples are given, especially Markovian queueing examples, to illustrate the theory.
Keywords: Markov chain; Interval uncertainty; Ellipsoidal uncertainty; Linear Programming; Second Order Cone Optimization
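For interval uncertainty, the smallest instance of such a linear program is bounding a one-step state probability. Assume a known initial distribution pi0 and interval bounds lo <= P <= hi on each row of the transition matrix, with each row summing to 1 and the bounds feasible. The objective sum_i pi0[i] * P[i][j] then decouples by row, and each row's LP has a closed-form solution. This sketch covers only that one-step case; n-step, limiting, sojourn-time, and absorption measures need the more general formulations the abstract refers to. All names and numbers are hypothetical.

```python
def entry_bounds(lo_row, hi_row, j):
    """Min and max of P[i][j] over rows with lo_row <= row <= hi_row summing to 1.

    Assumes the row is feasible: sum(lo_row) <= 1 <= sum(hi_row).
    """
    rest_hi = sum(h for k, h in enumerate(hi_row) if k != j)
    rest_lo = sum(v for k, v in enumerate(lo_row) if k != j)
    return max(lo_row[j], 1.0 - rest_hi), min(hi_row[j], 1.0 - rest_lo)


def one_step_bounds(pi0, lo, hi, j):
    """Bounds on Pr(X_1 = j) given a known initial distribution pi0.

    The LP objective sum_i pi0[i] * P[i][j] is linear with nonnegative weights
    and each row is constrained separately, so optimizing row by row solves it.
    """
    lower = upper = 0.0
    for i, w in enumerate(pi0):
        row_min, row_max = entry_bounds(lo[i], hi[i], j)
        lower += w * row_min
        upper += w * row_max
    return lower, upper


# Hypothetical 3-state chain with interval transition probabilities.
LO = [[0.1, 0.1, 0.1], [0.2, 0.3, 0.2], [0.0, 0.4, 0.3]]
HI = [[0.9, 0.3, 0.2], [0.4, 0.5, 0.4], [0.2, 0.6, 0.5]]
PI0 = [0.5, 0.3, 0.2]
print(one_step_bounds(PI0, LO, HI, 0))  # approximately (0.31, 0.56)
```

Row 0 shows why the row-sum constraint matters. Its interval for P[0][0] is [0.1, 0.9], but because the other two entries can add up to at most 0.5, P[0][0] can be no smaller than 0.5.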
Improvements on the k-center problem for uncertain data
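In real applications, we sometimes need to model problems based on uncertain
data. This leads us to define an uncertain model for some classical geometric
optimization problems and to propose algorithms to solve them. In this paper,
we study the $k$-center problem for uncertain input. In our setting, each
uncertain point $P_i$ is located, independently of the other points, at one of
several possible locations $\{P_{i,1},\dots,P_{i,z_i}\}$ in a metric space with
metric $d$, with specified probabilities, and the goal is to compute $k$
centers $\{c_1,\dots,c_k\}$ that minimize the expected cost
$$\mathrm{Ecost}(c_1,\dots,c_k)=\sum_{R\in\Omega}\mathrm{prob}(R)\,\max_{i=1,\dots,n}\,\min_{j=1,\dots,k} d(\hat{P}_i,c_j),$$
where $\Omega$ is the probability space of all realizations
$R=\{\hat{P}_1,\dots,\hat{P}_n\}$ of the given uncertain points and
$\mathrm{prob}(R)=\prod_{i=1}^{n}\mathrm{prob}(\hat{P}_i)$. In the restricted
assigned version of this problem, an assignment
$A:\{P_1,\dots,P_n\}\to\{c_1,\dots,c_k\}$ is given for any choice of centers,
and the goal is to minimize
$$\mathrm{Ecost}_A(c_1,\dots,c_k)=\sum_{R\in\Omega}\mathrm{prob}(R)\,\max_{i=1,\dots,n} d(\hat{P}_i,A(P_i)).$$
In the unrestricted version, the assignment is not specified and the goal is
to compute $k$ centers $\{c_1,\dots,c_k\}$ and an assignment $A$ that minimize
the above expected cost.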
We give several improved constant-factor approximation algorithms for the
assigned versions of this problem in Euclidean space and in a general metric
space. Our results significantly improve the results of \cite{guh} and
generalize the results of \cite{wang} to any dimension. Our approach is to
replace each uncertain point with a certain (deterministic) center point and to
study the properties of these certain points. The proposed algorithms are
efficient and simple to implement.
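Both expected costs can be evaluated directly by enumerating realizations, which helps when checking approximation algorithms on small inputs. The sketch is exponential in the number of uncertain points, assumes Euclidean distance in the plane, and uses hypothetical data. It is not one of the paper's algorithms.

```python
from itertools import product
from math import dist, prod

# Hypothetical uncertain points: each is a list of (location, probability).
UNCERTAIN = [
    [((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5)],
    [((5.0, 5.0), 0.9), ((6.0, 5.0), 0.1)],
]
CENTERS = [(0.0, 0.0), (5.0, 5.0)]


def ecost(points, centers):
    """Ecost: sum over realizations R of prob(R) * max_i min_j d(P_i_hat, c_j)."""
    total = 0.0
    for realization in product(*points):  # one location chosen per point
        pr = prod(p for _, p in realization)
        worst = max(min(dist(loc, c) for c in centers) for loc, _ in realization)
        total += pr * worst
    return total


def ecost_assigned(points, centers, assign):
    """Ecost_A: uncertain point i is always served by centers[assign[i]]."""
    total = 0.0
    for realization in product(*points):
        pr = prod(p for _, p in realization)
        worst = max(dist(loc, centers[assign[i]])
                    for i, (loc, _) in enumerate(realization))
        total += pr * worst
    return total


print(ecost(UNCERTAIN, CENTERS))                   # approximately 1.05
print(ecost_assigned(UNCERTAIN, CENTERS, [0, 1]))  # same here: nearest = assigned
```

With centers (0, 0) and (5, 5), the four realizations have maximum distances 0, 1, 2, and 2 with probabilities 0.45, 0.05, 0.45, and 0.05, giving an expected cost of 1.05.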
Belief Hierarchical Clustering
In the data mining field, many clustering methods have been proposed, yet
standard versions do not take uncertain databases into account. This paper
presents a new approach to clustering uncertain data using a hierarchical
clustering defined within the belief function framework. The main objective of
belief hierarchical clustering is to allow an object to belong to one or
several clusters. Each membership is associated with a degree of belief, and
clusters are combined based on pignistic properties. Experiments with real
uncertain data show that the proposed method can be considered a promising
tool.
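The pignistic transformation from Smets' transferable belief model is the standard way to turn a belief mass function into a probability distribution. Each focal set's mass is shared equally among its elements, after renormalizing away any mass on the empty set. The sketch shows only this transformation on a hypothetical mass function over three clusters. How the paper uses pignistic probabilities when combining clusters is not reproduced here.

```python
from fractions import Fraction


def pignistic(mass):
    """BetP(w) = sum over focal sets A containing w of m(A) / (|A| * (1 - m(empty)))."""
    empty = mass.get(frozenset(), 0)
    betp = {}
    for focal, m in mass.items():
        if not focal:
            continue  # mass on the empty set is removed by the normalization
        share = m / (len(focal) * (1 - empty))
        for w in focal:
            betp[w] = betp.get(w, 0) + share
    return betp


# Hypothetical belief that one object belongs to clusters C1, C2, C3.
m = {
    frozenset({"C1"}): Fraction(1, 2),
    frozenset({"C1", "C2"}): Fraction(3, 10),
    frozenset({"C1", "C2", "C3"}): Fraction(1, 5),
}
print(pignistic(m))  # BetP(C1) = 43/60, BetP(C2) = 13/60, BetP(C3) = 1/15
```

For example, C1 receives 1/2 + 3/20 + 1/15 = 43/60, and the three values sum to 1.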
Utility-driven Data Analytics on Uncertain Data
Modern Internet of Things (IoT) applications generate massive amounts of
data, much of it in the form of objects/items of readings, events, and log
entries. Specifically, most of the objects in these IoT data contain rich
embedded information (e.g., frequency and uncertainty) and different levels of
importance (e.g., unit utility of items, interestingness, cost, risk, or
weight). Many existing approaches in data mining and analytics have
limitations: they consider only binary attributes within a transaction, and
they treat all objects/items as having equal weight or importance. To address
these drawbacks, a novel utility-driven data analytics algorithm named HUPNU is
presented to extract High-Utility patterns by considering both Positive and
Negative unit utilities from Uncertain data. The discovered high-utility
patterns can be used for risk prediction, manufacturing management, and
decision-making, among other applications. HUPNU relies on the developed
vertical Probability-Utility list with the Positive-and-Negative utilities
structure, together with several effective pruning strategies. Experiments
showed that HUPNU mines the qualified patterns efficiently and effectively.
Comment: Under review in IEEE Internet of Things Journal since 2018, 11 pages
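The measures behind such mining can be sketched by brute force. The abstract does not define them, so the following is an assumption based on a common formulation in uncertain high-utility itemset mining. Each transaction carries an existential probability. An itemset's utility sums quantity times (possibly negative) unit utility over the transactions containing it, and its probability sums the probabilities of those transactions. All data and names are hypothetical. The enumeration also ignores the vertical Probability-Utility list and pruning strategies that make HUPNU efficient.

```python
from itertools import combinations

# Hypothetical unit utilities; negative values model losses (e.g. risk or cost).
UNIT_UTILITY = {"a": 5, "b": -2, "c": 3, "d": -1}
# Hypothetical uncertain transactions: ({item: quantity}, existential probability).
DB = [
    ({"a": 2, "b": 1, "c": 1}, 0.9),
    ({"a": 1, "c": 2, "d": 3}, 0.7),
    ({"b": 4, "c": 1}, 0.6),
    ({"a": 3, "b": 2, "d": 1}, 0.8),
]


def utility_and_prob(itemset):
    """Total utility and summed probability over transactions containing itemset."""
    util, prob = 0, 0.0
    for items, p in DB:
        if all(i in items for i in itemset):
            util += sum(items[i] * UNIT_UTILITY[i] for i in itemset)
            prob += p
    return util, prob


def mine(min_util, min_prob):
    """Brute force: test every non-empty itemset against both thresholds."""
    alphabet = sorted(UNIT_UTILITY)
    found = {}
    for size in range(1, len(alphabet) + 1):
        for itemset in combinations(alphabet, size):
            util, prob = utility_and_prob(itemset)
            if util >= min_util and prob >= min_prob:
                found[itemset] = (util, prob)
    return found


print(mine(20, 1.5))  # {a} and {a, c} qualify
```

With min_util = 20 and min_prob = 1.5, only {a} (utility 30, probability 2.4) and {a, c} (utility 24, probability 1.6) qualify. {a, b} is dropped because b's negative unit utility brings its utility down to 19.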
