Spatial database implementation of fuzzy region connection calculus for analysing the relationship of diseases
Analyzing huge amounts of spatial data plays an important role in many
emerging analysis and decision-making domains such as healthcare, urban
planning, agriculture and so on. For extracting meaningful knowledge from
geographical data, the relationships between spatial data objects need to be
analyzed. An important class of such relationships consists of topological relations
like the connectedness or overlap between regions. While real-world
geographical regions such as lakes or forests do not have exact boundaries and
are fuzzy, most of the existing analysis methods neglect this inherent feature
of topological relations. In this paper, we propose a method for handling the
topological relations in spatial databases based on fuzzy region connection
calculus (RCC). The proposed method is implemented in the PostGIS spatial database
and evaluated in analyzing the relationship of diseases as an important
application domain. We also used our fuzzy RCC implementation for fuzzification
of the skyline operator in spatial databases. The results of the evaluation
show that our method provides a more realistic view of spatial relationships
and gives more flexibility to the data analyst to extract meaningful and
accurate results in comparison with the existing methods. Comment: ICEE201
Towards a semantic and statistical selection of association rules
The increasing growth of databases raises an urgent need for more accurate
methods to better understand the stored data. In this scope, association rules
were extensively used for the analysis and the comprehension of huge amounts of
data. However, the number of generated rules is too large to be efficiently
analyzed and explored in any further process. Association rule selection is a
classical topic that addresses this issue, yet new, innovative approaches are
required in order to help decision makers. Hence, many interestingness
measures have been defined to statistically evaluate and filter the
association rules. However, these measures present two major problems. On the
one hand, they do not allow eliminating irrelevant rules; on the other hand,
their abundance leads to heterogeneous evaluation results, which in turn
leads to confusion in decision making. In this paper, we propose a two-winged
approach to select statistically interesting and semantically incomparable
rules. Our statistical selection helps in discovering interesting association
rules without favoring or excluding any measure. The semantic comparability
helps to decide whether the considered association rules are semantically
related, i.e., comparable. The outcomes of our experiments on real datasets show
promising results in terms of reduction in the number of rules.
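To make the notion of interestingness measures concrete, here is a minimal sketch of three standard ones (support, confidence, lift) computed for a single rule. It is not the selection method proposed in the paper; the function name `rule_measures` and the toy transactions are hypothetical.

```python
from typing import Dict, Iterable, Set


def rule_measures(transactions: Iterable[Set[str]],
                  antecedent: Set[str],
                  consequent: Set[str]) -> Dict[str, float]:
    """Compute support, confidence and lift for the rule antecedent -> consequent."""
    transactions = list(transactions)
    n = len(transactions)
    # Count transactions containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the confidence with the baseline frequency of the consequent.
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical market-basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
measures = rule_measures(baskets, {"bread"}, {"milk"})
print(measures)
```

Different measures can rank the same rules differently, which is the heterogeneity the abstract refers to.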
The skyline operator algorithms: a review
In the present decade, there has been a revival of interest in finding the best way to query a database so that it provides not just a single answer but a set of answers that users are most likely to prefer. In many cases, the overflow of data generated by social media, together with the many other data stored and shared over the Internet, makes the data accessible to users nearly infinite. This has led to the need for an intuitive formulation that can provide the best choices in any situation. Recently, skyline computation has gained a lot of attention in the database research community for its advanced query semantics. The skyline query was introduced as a syntax extension to SQL to support multi-criteria data selection in advanced queries. This paper surveys the techniques employed in the initial algorithms for skyline query processing. Some trade-offs among the different approaches are also identified throughout the study.
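As an illustration of what a skyline query returns, the following is a minimal in-memory sketch in the spirit of the block-nested-loops idea: keep a window of points that no other point seen so far dominates. It assumes smaller values are better in every dimension; the hotel data and the names `dominates` and `skyline` are hypothetical, and the sketch ignores the disk-based concerns that the surveyed algorithms address.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere
    (assumption: smaller is better in every dimension)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that are not dominated by any other point."""
    window: List[Point] = []
    for p in points:
        # Skip p if something already in the window dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise p evicts every window point it dominates, then joins.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


# Hypothetical hotels as (price, distance to beach); lower is better for both.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (55, 9)]
result = skyline(hotels)
print(result)  # [(50, 8), (60, 2), (80, 1)]
```

The hotel (70, 5) is dropped because (60, 2) is both cheaper and closer, and (55, 9) is dropped because of (50, 8). The remaining three are mutually incomparable trade-offs.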
Skyline Operators for Document Spanners
When extracting a relation of spans (intervals) from a text document, a
common practice is to filter out tuples of the relation that are deemed
dominated by others. The domination rule is defined as a partial order that
varies along different systems and tasks. For example, we may state that a
tuple is dominated by tuples which extend it by assigning additional
attributes, or assigning larger intervals. The result of filtering the relation
would then be the skyline according to this partial order. As this filtering
may remove most of the extracted tuples, we study whether we can improve the
performance of the extraction by compiling the domination rule into the
extractor.
To this aim, we introduce the skyline operator for declarative information
extraction tasks expressed as document spanners. We show that this operator can
be expressed via regular operations when the domination partial order can
itself be expressed as a regular spanner, which covers several natural
domination rules. Yet, we show that the skyline operator incurs a computational
cost (under combined complexity). First, there are cases where the operator
requires an exponential blowup on the number of states needed to represent the
spanner as a sequential variable-set automaton. Second, the evaluation may
become computationally hard. Our analysis more precisely identifies classes of
domination rules for which the combined complexity is tractable or intractable. Comment: 42 pages. Submitte
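The filtering step described in this abstract can be sketched as a post-processing pass over already-extracted tuples. This is only the naive baseline, not the compiled-into-the-extractor operator the paper studies. The domination rule below combines the two examples given (more assigned attributes, larger intervals) and is an assumption made for illustration, as are all names in the code.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]        # half-open character interval [start, end)
SpanTuple = Dict[str, Span]   # variable name -> span; missing key = unassigned


def contains(outer: Span, inner: Span) -> bool:
    """True if the interval `outer` covers the interval `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def dominated_by(t: SpanTuple, u: SpanTuple) -> bool:
    """Assumed rule: u dominates t if u differs from t, assigns every variable
    that t assigns, and gives each of them a span containing t's span."""
    if t == u:
        return False
    return all(var in u and contains(u[var], span) for var, span in t.items())


def span_skyline(tuples: List[SpanTuple]) -> List[SpanTuple]:
    """Keep only the tuples that no other tuple dominates."""
    return [t for t in tuples if not any(dominated_by(t, u) for u in tuples)]


# Hypothetical extraction result over some document.
extracted = [
    {"x": (0, 5)},
    {"x": (0, 9)},
    {"x": (0, 5), "y": (10, 14)},
    {"x": (20, 25)},
]
kept = span_skyline(extracted)
print(kept)
```

Here `{"x": (0, 5)}` is removed because both `{"x": (0, 9)}` and `{"x": (0, 5), "y": (10, 14)}` dominate it, while the other three tuples are pairwise incomparable.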
Providing Diversity in K-Nearest Neighbor Query Results
Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN)
queries return the K closest answers according to a given distance metric in the
database with respect to Q. In this scenario, it is possible that a majority of
the answers may be very similar to one another, especially when the data has
clusters. For a variety of applications, such homogeneous result sets may not
add value to the user. In this paper, we consider the problem of providing
diversity in the results of KNN queries, that is, to produce the closest result
set such that each answer is sufficiently different from the rest. We first
propose a user-tunable definition of diversity, and then present an algorithm,
called MOTLEY, for producing a diverse result set as per this definition.
Through a detailed experimental evaluation on real and synthetic data, we show
that MOTLEY can produce diverse result sets by reading only a small fraction of
the tuples in the database. Further, it imposes no additional overhead on the
evaluation of traditional KNN queries, thereby providing a seamless interface
between diversity and distance. Comment: 20 pages, 11 figure
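The problem statement can be illustrated with a simple greedy sketch: scan candidates in order of distance to Q and accept one only if it is far enough from every answer already accepted. This is not the MOTLEY algorithm, and the abstract does not state the paper's diversity definition; the Euclidean threshold `min_sep` and the function name `diverse_knn` are assumptions. The code uses `math.dist`, available from Python 3.8.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def diverse_knn(query: Sequence[float], points: List[Point],
                k: int, min_sep: float) -> List[Point]:
    """Greedy sketch: up to k nearest points that are pairwise at least
    `min_sep` apart (assumed diversity criterion, Euclidean distance)."""
    chosen: List[Point] = []
    # Visit candidates from nearest to farthest with respect to the query.
    for p in sorted(points, key=lambda pt: math.dist(query, pt)):
        if all(math.dist(p, c) >= min_sep for c in chosen):
            chosen.append(p)
            if len(chosen) == k:
                break
    return chosen


# Hypothetical 2-D data with a tight cluster near (1, 0).
data = [(1.0, 0.0), (1.1, 0.0), (0.0, 1.2), (3.0, 3.0), (1.0, 0.1)]
answers = diverse_knn((0.0, 0.0), data, k=3, min_sep=1.0)
print(answers)
```

A plain 3-NN query on this data would return three points from the cluster near (1, 0). The greedy variant keeps one of them and moves on to more distant but distinct answers. Unlike the approach described in the abstract, this sketch sorts the whole dataset rather than reading only a small fraction of it.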