
    A Probabilistic Data Model and Its Semantics

    As database systems are increasingly used in advanced applications, it is becoming common for the data in these applications to contain some element of uncertainty, arising from many factors such as measurement errors and cognitive errors. Many researchers have therefore focused on defining comprehensive data models for uncertain database systems. However, existing uncertainty data models do not adequately support some applications, and very few works address an uncertainty tuple calculus. In this paper we advocate a probabilistic data model for representing uncertain information. In particular, we establish a probabilistic tuple calculus language and its semantics to match the corresponding probabilistic relational algebra.
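
    A minimal sketch (in Python; the representation and function names are illustrative assumptions, not the paper's formalism) of one common reading of a probabilistic relation: each tuple carries a membership probability, and selection keeps qualifying tuples together with their probabilities.

        from dataclasses import dataclass

        @dataclass
        class PTuple:
            values: dict   # attribute name -> value
            prob: float    # probability that the tuple belongs to the relation

        def p_select(relation, predicate):
            """Probabilistic selection: keep tuples satisfying the predicate,
            carrying their membership probabilities along unchanged."""
            return [t for t in relation if predicate(t.values)]

        # Example: sensor readings whose presence is uncertain due to measurement error.
        readings = [
            PTuple({"sensor": "s1", "temp": 21.5}, prob=0.9),
            PTuple({"sensor": "s2", "temp": 35.0}, prob=0.6),
        ]
        for t in p_select(readings, lambda v: v["temp"] > 30):
            print(t.values, t.prob)   # {'sensor': 's2', 'temp': 35.0} 0.6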

    A Framework for Spatial Database Explanations

    In the last few years, there has been a tremendous increase in the use of big data. Most of this data is hard to understand because of its size and dimensionality. The importance of this problem is underscored by the Big Data Research and Development Initiative announced by the United States administration in 2012 to address problems faced by the government. Various states and cities in the US gather spatial data about incidents such as police calls for service. Querying large amounts of such data can raise many questions. For example, arithmetic relationships between queries over heterogeneous data show many differences; how can we explain what factors account for them? If we define an observation as an arithmetic relationship between queries, this kind of problem can be addressed by aggravation or intervention: aggravation evaluates the observation on different sets of tuples, while intervention looks at the value of the observation after removing sets of tuples. We call the predicates that represent these tuples explanations. Observations by themselves have limited importance. For example, if we observe a large number of taxi trips in a specific area, we might ask: why are there so many trips here? Explanations attempt to answer these kinds of questions. While aggravation and intervention were designed for non-spatial data, we propose a new approach for explaining spatially heterogeneous data. Our approach expands on aggravation and intervention while using spatial partitioning/clustering to improve explanations for spatial data. The proposed approach was evaluated against a real-world taxi dataset as well as a synthetic disease outbreak dataset, and was found to outperform aggravation in precision and recall while outperforming intervention in precision.
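
    As a rough illustration of the two primitives contrasted above (a Python sketch under assumed names; the toy observation is hypothetical, not the thesis's code): intervention recomputes the observation after removing the tuples matching a candidate predicate, while aggravation evaluates it on those tuples alone.

        def observation(rows):
            # Toy observation: ratio of trips in zone A to trips in zone B.
            a = sum(1 for r in rows if r["zone"] == "A")
            b = sum(1 for r in rows if r["zone"] == "B")
            return a / b if b else float("inf")

        def intervention(rows, predicate):
            # Value of the observation after removing the predicate's tuples.
            return observation([r for r in rows if not predicate(r)])

        def aggravation(rows, predicate):
            # Value of the observation on the predicate's tuples alone.
            return observation([r for r in rows if predicate(r)])

        trips = [
            {"zone": "A", "hour": 18}, {"zone": "A", "hour": 18},
            {"zone": "A", "hour": 9},  {"zone": "B", "hour": 9},
        ]
        rush_hour = lambda r: r["hour"] == 18
        print(observation(trips))              # 3.0
        print(intervention(trips, rush_hour))  # 1.0: the ratio drops without rush-hour trips
        print(aggravation(trips, rush_hour))   # inf: rush-hour trips are all in zone A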

    Big Data and Causality

    Causality analysis remains one of the fundamental research questions and the ultimate objective of a tremendous number of scientific studies. With the rapid progress of science and technology, the age of big data has significantly influenced causality analysis across various disciplines, especially over the last decade, because the complexity and difficulty of identifying causality among big data have dramatically increased. Data mining, the process of uncovering hidden information from big data, is now an important tool for causality analysis and has been extensively exploited by scholars around the world. The primary aim of this paper is to provide a concise review of causality analysis in big data. To this end, the paper reviews recent significant applications of data mining techniques in causality analysis, covering a substantial quantity of research to date, presented in chronological order with an overview table of data mining applications in the causality analysis domain as a reference directory.

    Discovering Causality in Large Databases

    DOI: 10.1080/08839510290030264

    A causal rule between two variables, X → Y, captures the relationship that the presence of X causes the appearance of Y. Because of its usefulness (compared to association rules), techniques for mining causal rules are beginning to be developed. However, the effectiveness of existing methods (such as the LCD and CU-path algorithms) is limited to mining causal rules among simple variables, and these methods are inadequate for discovering and representing causal rules among multi-value variables. In this paper, we propose that the causality between variables X and Y be represented in the form X → Y with a conditional probability matrix M_{Y|X}. We also propose a new approach to discover causality in large databases based on partitioning. The approach partitions the items into item variables by decomposing "bad" item variables and composing "not-good" item variables. In particular, we establish a method to optimize causal rules that merges the "useless" information in the conditional probability matrices of extracted causal rules.
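
    As an illustration of the proposed representation (a Python sketch; this simple counting estimator is an assumption, not the paper's partition-based algorithm), the conditional probability matrix M_{Y|X} for two multi-value variables can be estimated from co-occurrence counts, with entry (i, j) giving P(Y = y_j | X = x_i).

        from collections import Counter

        def conditional_matrix(pairs, x_values, y_values):
            """Estimate M[i][j] = P(Y = y_j | X = x_i) from observed (x, y) pairs."""
            counts = Counter(pairs)
            matrix = []
            for x in x_values:
                total = sum(counts[(x, y)] for y in y_values)
                matrix.append([counts[(x, y)] / total if total else 0.0
                               for y in y_values])
            return matrix

        data = [("low", "off"), ("low", "off"), ("low", "on"),
                ("high", "on"), ("high", "on")]
        M = conditional_matrix(data, ["low", "high"], ["off", "on"])
        print(M)  # approx. [[0.67, 0.33], [0.0, 1.0]]; each row sums to 1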