11 research outputs found

Integrating uncertain XML data from different sources.

Author: Eshmawi Ala A.
NC DOCKS at The University of North Carolina at Greensboro
Publication date: 01/01/2009

Data integration has become increasingly important with today's rapid growth of information available on the web and in electronic form. In the past several years, extensive work has been done to make use of the available data from different sources, particularly in the scientific and medical fields. In our work, we are interested in integrating data from different uncertain sources in which data are stored in semistructured databases, notably XML-based ones. This interest in XML-based databases stems from the flexibility they provide for storing and exchanging data. Furthermore, we are concerned with the reliability of query answers from various sources and with identifying the source the data came from (the provenance). In essence, our work lies at the intersection of three areas: data integration, uncertain databases, and lineage or provenance in databases. This thesis extends previous work on information integration to accommodate the integration of uncertain data from multiple sources.

The University of North Carolina at Greensboro

SEMISTRUCTURED PROBABILISTIC OBJECT QUERY LANGUAGE (A Query Language for Semistructured Probabilistic Data)

Author: Gutti Praveen
Publication venue: UKnowledge
Publication date: 01/01/2007

This work presents SPOQL, a structured query language for the Semistructured Probabilistic Object (SPO) model [4]. SP-Algebra [4], the original query language for the semistructured probabilistic database management system (SPDBMS) [20], has limitations such as a complex functional notation and unfamiliarity to application programmers. SPOQL alleviates these problems by providing a user-friendly and familiar SQL-like declarative syntax for writing queries against SPDBMS. We show that parsing SPOQL queries is a more involved task than parsing SQL queries. We describe the evaluation algorithm for SPOQL queries that we have implemented.

University of Kentucky

Probabilistic resource space model for managing resources in cyber-physical society

Author: Xing Yunpeng
Zhuge Hai
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 24/02/2012

Classification is the most basic method for organizing resources in the physical space, cyber space, socio space and mental space. Creating a unified model that can effectively manage resources in different spaces is a challenge. The Resource Space Model (RSM) manages versatile resources with a multi-dimensional classification space. It supports generalization and specialization on multi-dimensional classifications. This paper introduces the basic concepts of RSM and proposes the Probabilistic Resource Space Model (P-RSM) to deal with uncertainty in managing various resources in different spaces of the cyber-physical society. P-RSM’s normal forms, operations and integrity constraints are developed to support effective management of the resource space. Characteristics of the P-RSM are analyzed through experiments. This model also enables various services to be described, discovered and composed from multiple dimensions and abstraction levels with normal form and integrity guarantees. Some extensions and applications of the P-RSM are introduced.

Crossref

Aston Publications Explorer

Databases for interval probabilities

Author: Barbara
Biazzo
Cano
Cavallo
de Campos
de Cooman
Dekhtyar
Dey
Givan
Goldsmith
Hung
Jaffray
Kyburg
Lakshmanan
Nierman
Ramakrishnan
Walley
Weichselberger
Zhao
Publication venue: Wiley
Publication date: 01/01/2004

Crossref

Decision trees for uncertain data

Author: Ho WS
Kao B
Lee SD
Tsang S
Yip KY
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by a single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we find that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU-demanding than on certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency. © 2006 IEEE.

HKU Scholars Hub

A Framework for Management of Semistructured Probabilistic Data

Author: A. Dekhtyar
Alex Dekhtyar
D. Barbará
D. Dey
E. Kornatzky
E. Zimányi
F. Tian
J. Halpern
Judy Goldsmith
L.M. de Campos
M. Pittarelli
R. Ng
V.S. Lakshmanan
W. Zhao
Wenzhong Zhao
Publication venue: Springer Science and Business Media LLC

Crossref

Managing Uncertainty and Ontologies in Databases

Author: Hung Edward
Publication date: 18/04/2005

Nowadays a vast amount of data is generated in Extensible Markup Language (XML). Applications in some domains, however, need to store and manipulate uncertain information, e.g. when sensor inputs are noisy or the data to be stored is itself uncertain. Another major change in applications and web data is the increasing use of ontologies to describe the semantics of data, i.e., the semantic relationships between the terms in the databases. As such information is usually absent from traditional databases, there is tremendous opportunity to ask new kinds of queries that could not be handled in the past. This poses new challenges for how to manipulate and maintain such new kinds of database systems. In this dissertation, we will see how we can (i) incorporate and manipulate uncertainty in databases, and (ii) efficiently compute aggregates and maintain views on ontology databases. First, I explain applications that require manipulating uncertain information in XML databases and maintaining web ontology databases written in Resource Description Framework (RDF). I then introduce the probabilistic semistructured PXML data model with two formal semantics. I describe a set of algebraic operations and their efficient implementation. Aggregations of PXML instances are studied under two proposed semantics: possible-worlds semantics and expectation semantics. Efficient algorithms with pruning are given and evaluated to show their feasibility. I introduce PIXML, an interval probability version of PXML, and develop a formal semantics for it. A query language and its operational semantics are given and proved to be sound and complete. Based on XML, RDF is a language used to describe web ontologies. RDQL, an RDF query language, is extended to support view definition and aggregations. Two sets of algorithms are given to maintain non-aggregate and aggregate views. Experimental results show that they are efficient compared with standard relational view maintenance algorithms.

Digital Repository at the University of Maryland

Probabilistic interval XML

Author: Braz R.
Cavallo R.
Dekhtyar A.
Edward Hung
Friedman N.
Goldman S.
Hung E.
Kamberova G.
Kersting K.
Kiessling W.
Koller D.
Lakshmanan L. V.
Lise Getoor
Nierman A.
Poole D.
V. S. Subrahmanian
Zhao W.
Publication venue: Association for Computing Machinery (ACM)

Crossref