Computing fuzzy rough approximations in large scale information systems
Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e. objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory groups objects together based on the indiscernibility of their attribute values. Fuzzy rough set theory extends rough set theory to data with continuous attributes and detects degrees of inconsistency in the data. Key to this is turning the indiscernibility relation into a gradual relation, acknowledging that objects can be similar to a certain extent. In very large datasets with millions of objects, computing the gradual indiscernibility relation (in other words, the soft granules) is very demanding, both in terms of runtime and in terms of memory. It is, however, required for the computation of the lower and upper approximations of concepts in the fuzzy rough set analysis pipeline. Current non-distributed implementations in R are limited by memory capacity. For example, we found that a state-of-the-art non-distributed implementation in R could not handle 30,000 rows and 10 attributes on a node with 62 GB of memory. This is clearly insufficient to scale fuzzy rough set analysis to massive datasets. In this paper we present a parallel and distributed solution based on the Message Passing Interface (MPI) to compute fuzzy rough approximations in very large information systems. Our results show that our parallel approach scales with problem size to information systems with millions of objects. To the best of our knowledge, no other parallel and distributed solutions have been proposed in the literature for this problem.
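As an illustration of the approximation step this abstract refers to, here is a minimal, non-distributed Python sketch of fuzzy rough lower and upper approximations. The mean-based similarity relation and the Łukasiewicz implicator/t-norm are common choices assumed here, not taken from the paper, and the function name and arguments are hypothetical.

```python
import numpy as np

def fuzzy_rough_approximations(X, concept, attr_ranges):
    """Sketch of fuzzy rough lower/upper approximations.

    X           : (n, m) array of conditional attribute values
    concept     : (n,) array with each object's membership in the concept
    attr_ranges : (m,) array with the (nonzero) value range of each attribute,
                  used to normalise per-attribute similarity
    """
    n = X.shape[0]
    lower = np.empty(n)
    upper = np.empty(n)
    for i in range(n):
        # gradual indiscernibility of object i with all objects:
        # mean of per-attribute similarities 1 - |x_a - y_a| / range_a
        sim = 1.0 - np.abs(X - X[i]) / attr_ranges
        R = sim.mean(axis=1)
        # Lukasiewicz implicator for the lower, Lukasiewicz t-norm for the upper
        lower[i] = np.min(np.minimum(1.0, 1.0 - R + concept))
        upper[i] = np.max(np.maximum(0.0, R + concept - 1.0))
    return lower, upper
```

Each object requires a pass over all n objects, so the full relation is quadratic in time and memory; this is the bottleneck a distributed MPI implementation can address by assigning blocks of the relation to different processes.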
Analysing imperfect temporal information in GIS using the Triangular Model
Rough sets and fuzzy sets are two frequently used approaches for modelling and reasoning about imperfect time intervals. In this paper, we focus on imperfect time intervals that can be modelled by rough sets and use an innovative graphic model, the triangular model (TM), to represent this kind of imperfect time interval. This work shows that TM is potentially advantageous in visualizing and querying imperfect time intervals, and that its analytical power can be better exploited when it is implemented in a computer application with graphical user interfaces and interactive functions. Moreover, a probabilistic framework is proposed to handle the uncertainty issues in temporal queries. We use a case study to illustrate how the unique insights gained by TM can assist a geographical information system in exploratory spatio-temporal analysis.
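For readers unfamiliar with the Triangular Model, the sketch below shows one common convention for mapping a crisp time interval to its TM point (interval midpoint on the horizontal axis, half the duration as height). This is only an assumed illustration of the model the paper builds on; the helper name is hypothetical, and the paper's rough-set treatment would additionally attach lower and upper approximations to uncertain interval bounds.

```python
def tm_point(start, end):
    """Map a crisp time interval [start, end] to a Triangular Model point:
    horizontal position = interval midpoint, height = half the duration."""
    return (start + end) / 2.0, (end - start) / 2.0
```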
Optimal construction of k-nearest neighbor graphs for identifying noisy clusters
We study clustering algorithms based on neighborhood graphs on a random sample of data points. The question we ask is how such a graph should be constructed in order to obtain optimal clustering results. Which type of neighborhood graph should one choose, mutual k-nearest neighbor or symmetric k-nearest neighbor? What is the optimal parameter k? In our setting, clusters are defined as connected components of the t-level set of the underlying probability distribution. Clusters are said to be identified in the neighborhood graph if connected components in the graph correspond to the true underlying clusters. Using techniques from random geometric graph theory, we prove bounds on the probability that clusters are identified successfully, both in a noise-free and in a noisy setting. Those bounds lead to several conclusions. First, k has to be chosen surprisingly high (rather of the order n than of the order log n) to maximize the probability of cluster identification. Second, the major difference between the mutual and the symmetric k-nearest neighbor graph occurs when one attempts to detect the most significant cluster only.
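To make the two graph constructions concrete, here is a short Python sketch (using scikit-learn and SciPy, which the paper does not prescribe) that builds either the mutual or the symmetric k-nearest neighbor graph and reads off connected components as cluster candidates; the function name and defaults are illustrative.

```python
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import kneighbors_graph

def knn_graph_clusters(X, k, mutual=True):
    """Build a mutual or symmetric k-nearest-neighbor graph and return the
    connected-component labels, which play the role of clusters."""
    # directed kNN adjacency: A[i, j] = 1 if j is among the k nearest of i
    A = kneighbors_graph(X, n_neighbors=k, mode='connectivity')
    if mutual:
        # mutual kNN graph: keep an edge only if both directions are present
        G = A.minimum(A.T)
    else:
        # symmetric kNN graph: keep an edge if either direction is present
        G = A.maximum(A.T)
    n_components, labels = connected_components(G, directed=False)
    return n_components, labels
```

The mutual graph is sparser and tends to cut weak links between density regions, which is why the paper's comparison of the two constructions matters when isolating the most significant cluster.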
Autonomous clustering using rough set theory
This paper proposes a clustering technique that minimises the need for subjective human intervention and is based on elements of rough set theory. The proposed algorithm is unified in its approach to clustering and makes use of both local and global data properties to obtain clustering solutions. It handles single-type and mixed-attribute data sets with ease, and results from three data sets of single and mixed attribute types are used to illustrate the technique and establish its efficiency.
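The abstract does not spell out the algorithm, but the rough-set granules it builds on can be illustrated with a minimal Python sketch that groups objects which are indiscernible on a chosen attribute set; the helper below is hypothetical and only shows this elementary step, not the proposed local/global clustering procedure.

```python
from collections import defaultdict

def indiscernibility_classes(objects, attributes):
    """Group objects that are indiscernible on the chosen attributes,
    i.e. the elementary granules of classical rough set theory."""
    classes = defaultdict(list)
    for idx, obj in enumerate(objects):
        key = tuple(obj[a] for a in attributes)
        classes[key].append(idx)
    return list(classes.values())
```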
Gibbs measures on Brownian currents
Motivated by applications to quantum field theory, we consider Gibbs measures for which the reference measure is Wiener measure and the interaction is given by a double stochastic integral and a pinning external potential. In order to properly characterize these measures through DLR equations, we are led to lift Wiener measure and other objects to a space of configurations where the basic observables are not only the position of the particle at all times but also the work done by test vector fields. We prove existence and basic properties of such Gibbs measures in the small coupling regime by means of a cluster expansion.
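Schematically, and only as an assumed illustration of the kind of finite-volume measure described (the paper's exact definition may differ), a Gibbs measure with Wiener reference measure $\mathcal{W}$, pair interaction $W$ given by a double stochastic integral, pinning potential $V$, and coupling constant $\lambda$ takes the form

```latex
\[
  \mathrm{d}\mu_T(X) \;\propto\;
  \exp\!\Big(
    -\lambda \int_{-T}^{T}\!\!\int_{-T}^{T} W(X_t - X_s,\, t - s)\,
      \mathrm{d}X_s\,\mathrm{d}X_t
    \;-\; \int_{-T}^{T} V(X_t)\,\mathrm{d}t
  \Big)\,\mathrm{d}\mathcal{W}(X).
\]
```

The lifting to a configuration space mentioned in the abstract is what gives rigorous meaning to the double stochastic integral in the exponent when passing to the infinite-volume DLR formulation.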