380 research outputs found

    Spatial database implementation of fuzzy region connection calculus for analysing the relationship of diseases

    Analyzing huge amounts of spatial data plays an important role in many emerging analysis and decision-making domains such as healthcare, urban planning, and agriculture. To extract meaningful knowledge from geographical data, the relationships between spatial data objects need to be analyzed. An important class of such relationships is topological relations, such as the connectedness or overlap between regions. Although real-world geographical regions such as lakes or forests do not have exact boundaries and are fuzzy, most existing analysis methods neglect this inherent feature of topological relations. In this paper, we propose a method for handling topological relations in spatial databases based on fuzzy region connection calculus (RCC). The proposed method is implemented in the PostGIS spatial database and evaluated in analyzing the relationship of diseases as an important application domain. We also used our fuzzy RCC implementation for fuzzification of the skyline operator in spatial databases. The results of the evaluation show that our method provides a more realistic view of spatial relationships and gives the data analyst more flexibility to extract meaningful and accurate results in comparison with existing methods. Comment: ICEE201
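
    To make the idea of a graded (rather than yes/no) overlap concrete, here is a minimal sketch. It is not the paper's PostGIS implementation and not the fuzzy RCC definitions themselves: it assumes each region is a hypothetical dictionary mapping grid cells to membership degrees in [0, 1], and it measures overlap as the height of the min-intersection of the two fuzzy sets.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]
FuzzyRegion = Dict[Cell, float]  # assumed representation: cell -> membership in [0, 1]


def overlap_degree(region_a: FuzzyRegion, region_b: FuzzyRegion) -> float:
    """Degree to which two fuzzy regions overlap.

    Uses the largest value of min(mu_a, mu_b) over the cells both regions
    cover. Returns 0.0 when they share no cell.
    """
    shared = region_a.keys() & region_b.keys()
    return max((min(region_a[c], region_b[c]) for c in shared), default=0.0)


# Illustrative data only: a lake whose membership fades out towards a forest.
lake = {(0, 0): 1.0, (0, 1): 0.6, (0, 2): 0.2}
forest = {(0, 1): 0.3, (0, 2): 0.9, (0, 3): 1.0}

degree = overlap_degree(lake, forest)
```

    A crisp topological test would have to answer "overlaps" or "does not overlap" here, whereas the graded value lets the analyst choose a threshold.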

    Towards a semantic and statistical selection of association rules

    The increasing growth of databases raises an urgent need for more accurate methods to better understand the stored data. In this scope, association rules have been used extensively for the analysis and comprehension of huge amounts of data. However, the number of generated rules is too large to be efficiently analyzed and explored in any further process. Association rule selection is a classical way to address this issue, yet new, innovative approaches are required in order to help decision makers. Hence, many interestingness measures have been defined to statistically evaluate and filter association rules. However, these measures present two major problems. On the one hand, they do not allow irrelevant rules to be eliminated; on the other hand, their abundance leads to heterogeneous evaluation results, which causes confusion in decision making. In this paper, we propose a two-winged approach to select statistically interesting and semantically incomparable rules. Our statistical selection helps discover interesting association rules without favoring or excluding any measure. The semantic comparability helps to decide whether the considered association rules are semantically related, i.e., comparable. The outcomes of our experiments on real datasets show promising results in terms of reduction in the number of rules.
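
    The abstract does not name the interestingness measures it works with. As an illustration of what such measures look like, the sketch below computes three widely used ones (support, confidence, and lift) for a single rule. The function name, the transaction format, and the toy data are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Set


def rule_measures(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Dict[str, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)          # contain antecedent
    n_c = sum(1 for t in transactions if c <= t)          # contain consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)   # contain both

    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Toy market-basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
measures = rule_measures(baskets, ["bread"], ["milk"])
```

    Different measures can rank the same rules differently, which is the heterogeneity problem the abstract describes.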

    The skyline operator algorithms: a review

    In the present decade, there has been a revival of interest in finding the best way to query a database so that it returns not just a single answer but a set of the answers most preferred by users. In many cases, the overflow of data generated by social media, together with other data stored and shared over the Internet, makes the data accessible to users nearly infinite. This has led to the need for an intuitive formulation that can provide the best choices in every conceivable situation. Recently, skyline computation has gained a lot of attention in the database research community for advanced query semantics. The skyline query was introduced as a syntax extension to SQL to support multi-criteria data selection in advanced queries. This paper surveys the techniques employed in the initial algorithms for skyline query processing. Some trade-offs among these approaches are also identified throughout the study.
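
    For readers new to the skyline query, a minimal sketch of the idea follows. It is a naive nested-loop computation, not any particular algorithm from the survey, and it assumes that smaller values are better in every dimension. The hotel data are invented for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is beaten by a point we already kept
        # p survives: drop any kept point that p now beats.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


# (price, distance to beach) for five hypothetical hotels.
hotels = [(50, 8), (60, 3), (80, 1), (90, 2), (55, 9)]
best = skyline(hotels)  # [(50, 8), (60, 3), (80, 1)]
```

    No single hotel is both cheapest and closest, so the skyline returns every hotel that represents a defensible trade-off between the two criteria.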

    Skyline Operators for Document Spanners

    When extracting a relation of spans (intervals) from a text document, a common practice is to filter out tuples of the relation that are deemed dominated by others. The domination rule is defined as a partial order that varies across systems and tasks. For example, we may state that a tuple is dominated by tuples that extend it by assigning additional attributes or by assigning larger intervals. The result of filtering the relation would then be the skyline according to this partial order. As this filtering may remove most of the extracted tuples, we study whether we can improve the performance of the extraction by compiling the domination rule into the extractor. To this aim, we introduce the skyline operator for declarative information extraction tasks expressed as document spanners. We show that this operator can be expressed via regular operations when the domination partial order can itself be expressed as a regular spanner, which covers several natural domination rules. Yet, we show that the skyline operator incurs a computational cost (under combined complexity). First, there are cases where the operator requires an exponential blowup in the number of states needed to represent the spanner as a sequential variable-set automaton. Second, the evaluation may become computationally hard. More precisely, our analysis identifies classes of domination rules for which the combined complexity is tractable or intractable. Comment: 42 pages. Submitte
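
    One of the domination rules mentioned above, "a span is dominated by a larger span that contains it", can be sketched as a plain post-filter. This is the filter-after-extraction baseline, not the compiled-into-the-extractor construction the paper studies. Spans are assumed to be hypothetical `(start, end)` pairs over single-variable tuples.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, int]  # assumed representation: (start, end) offsets


def contains(outer: Span, inner: Span) -> bool:
    """True if `outer` strictly contains `inner` (same span does not count)."""
    return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]


def span_skyline(spans: Iterable[Span]) -> List[Span]:
    """Keep only the spans not strictly contained in another extracted span."""
    unique = set(spans)
    return sorted(s for s in unique if not any(contains(o, s) for o in unique))


# Hypothetical extractor output with nested and duplicate matches.
extracted = [(0, 5), (1, 3), (4, 9), (6, 9), (0, 5)]
maximal = span_skyline(extracted)  # [(0, 5), (4, 9)]
```

    Here the filter discards half of the distinct extracted spans, which illustrates why pushing the rule into the extractor could save work.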

    Providing Diversity in K-Nearest Neighbor Query Results

    Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN) queries return the K closest answers in the database with respect to Q, according to a given distance metric. In this scenario, it is possible that a majority of the answers are very similar to one another, especially when the data has clusters. For a variety of applications, such homogeneous result sets may not add value for the user. In this paper, we consider the problem of providing diversity in the results of KNN queries, that is, producing the closest result set such that each answer is sufficiently different from the rest. We first propose a user-tunable definition of diversity, and then present an algorithm, called MOTLEY, for producing a diverse result set as per this definition. Through a detailed experimental evaluation on real and synthetic data, we show that MOTLEY can produce diverse result sets by reading only a small fraction of the tuples in the database. Further, it imposes no additional overhead on the evaluation of traditional KNN queries, thereby providing a seamless interface between diversity and distance. Comment: 20 pages, 11 figure
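
    The abstract does not spell out MOTLEY or its diversity definition, so the following is only a greedy sketch of the general problem. It assumes, for illustration, that "sufficiently different" means lying at least `min_sep` (a hypothetical user-tunable threshold) away from every answer already chosen, under Euclidean distance. Unlike MOTLEY, it scans and sorts all points.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def diverse_knn(
    points: Sequence[Point], query: Point, k: int, min_sep: float
) -> List[Point]:
    """Greedily pick up to k points, nearest to `query` first, skipping any
    point closer than `min_sep` to an already selected answer."""
    ordered = sorted(points, key=lambda p: math.dist(p, query))
    result: List[Point] = []
    for p in ordered:
        if len(result) == k:
            break
        if all(math.dist(p, r) >= min_sep for r in result):
            result.append(p)
    return result


# A tight cluster near the query plus two more distant points (toy data).
data = [(1.0, 0.0), (1.1, 0.0), (0.0, 2.0), (5.0, 5.0), (1.0, 0.1)]
plain = diverse_knn(data, (0.0, 0.0), k=3, min_sep=0.0)
diverse = diverse_knn(data, (0.0, 0.0), k=3, min_sep=1.0)
```

    With `min_sep=0.0` the function behaves like ordinary KNN and returns the three clustered points. With `min_sep=1.0` it returns one representative of the cluster plus the two more distant points.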