06431 Abstracts Collection -- Scalable Data Management in Evolving Networks
From 22.10.06 to 27.10.06, the Dagstuhl Seminar 06431 "Scalable Data Management in Evolving Networks" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided, if available.
Mining Frequent Itemsets over Uncertain Databases
In recent years, due to the wide application of uncertain data, mining
frequent itemsets over uncertain databases has attracted much attention. In
uncertain databases, the support of an itemset is a random variable instead of
a fixed occurrence count. Thus, unlike the corresponding problem in
deterministic databases, where a frequent itemset has a unique definition, the
frequent itemset in uncertain environments has so far been given two different
definitions. The first, referred to as the expected support-based frequent
itemset, uses the expectation of an itemset's support to decide whether the
itemset is frequent. The second, referred to as the probabilistic frequent
itemset, uses the probability that the itemset's support reaches a minimum
support threshold. Consequently, existing work on mining frequent itemsets over
uncertain databases is divided into two groups, and no study has
comprehensively compared the two definitions. In addition, since no uniform
experimental platform exists, current solutions for the same definition even
generate inconsistent results.
In this paper, we first clarify the relationship between the two
definitions. Through extensive experiments, we verify that the two
definitions are tightly connected and can be unified when the data
set is large enough. Second, we provide baseline implementations of eight
existing representative algorithms and compare their performance fairly under
uniform measures. Finally, based on these fair tests over many different
benchmark data sets, we clarify several existing inconsistent conclusions and
discuss some new findings. Comment: VLDB201
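To make the two definitions concrete, here is a minimal Python sketch. It assumes a common data model in which each item of a transaction is present independently with a given probability; the abstract does not fix a data model, so this is an assumption. The function names and the thresholds `min_esup`, `min_sup`, and `pft` are our own illustrative choices. Expected support is the sum of per-transaction probabilities, while the frequentness probability P(sup >= min_sup) comes from the exact Poisson-binomial distribution of the support, computed by dynamic programming. A Normal approximation is included to illustrate one plausible reading of why the two definitions converge on large data; it is not necessarily the analysis used in the paper.

```python
from math import erf, sqrt


def itemset_probs(db, itemset):
    """Per-transaction probability that every item of `itemset` is present.

    Assumes (hypothetically) independent item occurrences, so the probability
    is the product of the item probabilities within that transaction.
    """
    probs = []
    for tx in db:
        p = 1.0
        for item in itemset:
            p *= tx.get(item, 0.0)
        probs.append(p)
    return probs


def expected_support(probs):
    # Linearity of expectation: E[sup] is the sum of per-transaction probabilities.
    return sum(probs)


def support_pmf(probs):
    """Exact Poisson-binomial distribution of the support, by dynamic programming."""
    pmf = [1.0]  # before any transaction, the support is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # transaction does not contain the itemset
            nxt[k + 1] += mass * p      # transaction contains the itemset
        pmf = nxt
    return pmf


def frequent_probability(probs, min_sup):
    # P(sup >= min_sup), read off the upper tail of the exact distribution.
    return sum(support_pmf(probs)[min_sup:])


def normal_frequent_probability(probs, min_sup):
    # Normal approximation of the same tail, with a continuity correction;
    # mean and variance come from the same per-transaction probabilities.
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:  # every p is 0 or 1, so the support is deterministic
        return 1.0 if mu >= min_sup else 0.0
    z = (min_sup - 0.5 - mu) / sqrt(var)
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


def is_expected_frequent(probs, min_esup):
    return expected_support(probs) >= min_esup


def is_probabilistic_frequent(probs, min_sup, pft):
    return frequent_probability(probs, min_sup) > pft


if __name__ == "__main__":
    db = [{"a": 0.9, "b": 0.8}, {"a": 0.6}, {"a": 0.7, "b": 0.5}, {"b": 0.9}]
    probs = itemset_probs(db, ("a", "b"))   # about [0.72, 0.0, 0.35, 0.0]
    print(expected_support(probs))          # about 1.07
    print(frequent_probability(probs, 2))   # about 0.252
```

The dynamic program costs O(n^2) in the number of transactions, whereas the Normal approximation needs only the mean and variance in O(n), which is one reason such approximations become attractive on large databases.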
Capturing Data Uncertainty in High-Volume Stream Processing
We present the design and development of a data stream system that captures
data uncertainty from data collection to query processing to final result
generation. Our system focuses on data that is naturally modeled as continuous
random variables. For such data, our system employs an approach grounded in
probability and statistical theory to capture data uncertainty and integrates
this approach into high-volume stream processing. The first component of our
system captures uncertainty of raw data streams from sensing devices. Since
such raw streams can be highly noisy and may not carry sufficient information
for query processing, our system employs probabilistic models of the data
generation process and stream-speed inference to transform raw data into a
desired format with an uncertainty metric. The second component captures
uncertainty as data propagates through query operators. To efficiently quantify
result uncertainty of a query operator, we explore a variety of techniques
based on probability and statistical theory to compute the result distribution
at stream speed. We are currently working with a group of scientists to
evaluate our system using traces collected from, and eventually in real systems
for, hazardous weather monitoring and object tracking and monitoring.
Comment: CIDR 200
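The abstract does not say which techniques the system uses, so the following is only a generic Python sketch of propagating uncertainty through query operators, assuming tuple values are independent Gaussian random variables. Closed-form rules handle SUM and AVG, a Gaussian tail probability handles a selection predicate, and Monte Carlo sampling with moment matching stands in for an operator such as MAX that has no closed form. All names (`Gaussian`, `gaussian_avg`, `prob_greater`, `monte_carlo_max`) are hypothetical.

```python
import random
from dataclasses import dataclass
from math import erf, sqrt


@dataclass(frozen=True)
class Gaussian:
    """A tuple attribute modeled as a Gaussian random variable (mean, variance)."""
    mean: float
    var: float


def gaussian_sum(values):
    # SUM of independent Gaussians is Gaussian: means add and variances add.
    return Gaussian(sum(v.mean for v in values), sum(v.var for v in values))


def gaussian_avg(values):
    # AVG = SUM / n, so the mean scales by 1/n and the variance by 1/n^2.
    n = len(values)
    s = gaussian_sum(values)
    return Gaussian(s.mean / n, s.var / n ** 2)


def prob_greater(x, threshold):
    # Probability that the selection predicate "x > threshold" holds.
    if x.var == 0.0:
        return 1.0 if x.mean > threshold else 0.0
    z = (threshold - x.mean) / sqrt(x.var)
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


def monte_carlo_max(values, samples=10_000, seed=0):
    # MAX has no closed Gaussian form: sample every input, take the max of each
    # draw, then moment-match a Gaussian to the sampled results (an approximation).
    rng = random.Random(seed)
    draws = [max(rng.gauss(v.mean, sqrt(v.var)) for v in values) for _ in range(samples)]
    m = sum(draws) / samples
    var = sum((d - m) ** 2 for d in draws) / (samples - 1)
    return Gaussian(m, var)


if __name__ == "__main__":
    # A window of three noisy sensor readings (illustrative numbers).
    window = [Gaussian(20.0, 4.0), Gaussian(22.0, 1.0), Gaussian(21.0, 4.0)]
    avg = gaussian_avg(window)            # Gaussian(mean=21.0, var=1.0)
    print(avg, prob_greater(avg, 23.0))   # P(avg > 23) is about 0.023
    print(monte_carlo_max(window))
```

Closed-form rules are cheap enough to keep up with a stream, while sampling trades exactness for generality at a cost that grows with the number of samples.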
Behavioural pattern identification and prediction in intelligent environments
In this paper, we address the application of soft computing techniques to predicting an occupant's behaviour in an inhabited intelligent environment. This research studies the daily activities of elderly people with dementia who live in their own homes. Occupancy sensors are used to extract the occupant's movement patterns. The occupancy data is then converted into temporal sequences of activities, which are used to predict the occupant's behaviour. To build the prediction model, several dynamic recurrent neural networks are investigated; recurrent neural networks have shown a strong ability to find temporal relationships in input patterns. The experimental results show that the nonlinear autoregressive network with exogenous inputs (NARX) model correctly extracts the occupant's long-term prediction patterns and outperforms the Elman network. The results presented here are validated using data generated from a simulator and from real environments.
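A NARX model predicts the next output from its own past outputs and past exogenous inputs, y(t) = f(y(t-1), ..., y(t-n_y), u(t-1), ..., u(t-n_u)), where f is a nonlinear function approximator such as a feed-forward network. The Python sketch below covers only the plumbing around f: building lagged training pairs, and closed-loop multi-step prediction in which each prediction is fed back as an output lag, the mode that matters for long-term forecasting. Treating y as an encoded activity sequence and u as some exogenous signal is our assumption, and the `f` used in the example is a placeholder, not the network trained in the paper.

```python
from typing import Callable, List, Sequence, Tuple


def narx_regressors(y: Sequence[float], u: Sequence[float],
                    ny: int, nu: int) -> List[Tuple[List[float], float]]:
    """Build (regressor, target) training pairs for a NARX model.

    The target is y[t]; the regressor holds the ny previous outputs
    y[t-1..t-ny] followed by the nu previous exogenous inputs u[t-1..t-nu].
    """
    start = max(ny, nu)
    pairs = []
    for t in range(start, len(y)):
        past_y = [y[t - k] for k in range(1, ny + 1)]
        past_u = [u[t - k] for k in range(1, nu + 1)]
        pairs.append((past_y + past_u, y[t]))
    return pairs


def closed_loop_forecast(f: Callable[[List[float]], float], y_hist: Sequence[float],
                         u: Sequence[float], ny: int, nu: int, steps: int) -> List[float]:
    """Multi-step (closed-loop) NARX prediction.

    After the observed history, each prediction is appended and reused as an
    output lag for the next step. `u` must cover the horizon, i.e. have at
    least len(y_hist) + steps - 1 entries.
    """
    if len(y_hist) < max(ny, nu):
        raise ValueError("history is shorter than the lag orders")
    y = list(y_hist)
    for _ in range(steps):
        t = len(y)
        past_y = [y[t - k] for k in range(1, ny + 1)]
        past_u = [u[t - k] for k in range(1, nu + 1)]
        y.append(f(past_y + past_u))
    return y[len(y_hist):]


if __name__ == "__main__":
    print(narx_regressors([0, 1, 2, 3, 4], [10, 11, 12, 13, 14], ny=2, nu=1))
    # Hypothetical stand-in for a trained regressor: linear extrapolation that ignores u.
    f = lambda r: 2 * r[0] - r[1]
    print(closed_loop_forecast(f, [0, 1, 2], [0] * 5, ny=2, nu=1, steps=3))  # [3, 4, 5]
```

In open-loop training every regressor uses true past outputs, whereas the closed-loop forecast must live with its own errors compounding, which is why long-horizon accuracy is the harder test.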