9,017 research outputs found
Computing fuzzy rough approximations in large scale information systems
Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e. objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory groups objects together based on the indiscernibility of their attribute values. Fuzzy rough set theory extends rough set theory to data with continuous attributes, and detects degrees of inconsistency in the data. Key to this is turning the indiscernibility relation into a gradual relation, acknowledging that objects can be similar to a certain extent. In very large datasets with millions of objects, computing the gradual indiscernibility relation (or in other words, the soft granules) is very demanding, both in terms of runtime and in terms of memory. It is however required for the computation of the lower and upper approximations of concepts in the fuzzy rough set analysis pipeline. Current non-distributed implementations in R are limited by memory capacity. For example, we found that a state of the art non-distributed implementation in R could not handle 30,000 rows and 10 attributes on a node with 62GB of memory. This is clearly insufficient to scale fuzzy rough set analysis to massive datasets. In this paper we present a parallel and distributed solution based on Message Passing Interface (MPI) to compute fuzzy rough approximations in very large information systems. Our results show that our parallel approach scales with problem size to information systems with millions of objects. To the best of our knowledge, no other parallel and distributed solutions have been proposed so far in the literature for this problem
Enabling scalable stochastic gradient-based inference for Gaussian processes by employing the Unbiased LInear System SolvEr (ULISSE)
In applications of Gaussian processes where quantification of uncertainty is
of primary interest, it is necessary to accurately characterize the posterior
distribution over covariance parameters. This paper proposes an adaptation of
the Stochastic Gradient Langevin Dynamics algorithm to draw samples from the
posterior distribution over covariance parameters with negligible bias and
without the need to compute the marginal likelihood. In Gaussian process
regression, this has the enormous advantage that stochastic gradients can be
computed by solving linear systems only. A novel unbiased linear systems solver
based on parallelizable covariance matrix-vector products is developed to
accelerate the unbiased estimation of gradients. The results demonstrate the
possibility to enable scalable and exact (in a Monte Carlo sense)
quantification of uncertainty in Gaussian processes without imposing any
special structure on the covariance or reducing the number of input vectors.Comment: 10 pages - paper accepted at ICML 201
Dealing with uncertain entities in ontology alignment using rough sets
This is the author's accepted manuscript. The final published article is available from the link below. Copyright @ 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.Ontology alignment facilitates exchange of knowledge among heterogeneous data sources. Many approaches to ontology alignment use multiple similarity measures to map entities between ontologies. However, it remains a key challenge in dealing with uncertain entities for which the employed ontology alignment measures produce conflicting results on similarity of the mapped entities. This paper presents OARS, a rough-set based approach to ontology alignment which achieves a high degree of accuracy in situations where uncertainty arises because of the conflicting results generated by different similarity measures. OARS employs a combinational approach and considers both lexical and structural similarity measures. OARS is extensively evaluated with the benchmark ontologies of the ontology alignment evaluation initiative (OAEI) 2010, and performs best in the aspect of recall in comparison with a number of alignment systems while generating a comparable performance in precision
Covering rough sets based on neighborhoods: An approach without using neighborhoods
Rough set theory, a mathematical tool to deal with inexact or uncertain
knowledge in information systems, has originally described the indiscernibility
of elements by equivalence relations. Covering rough sets are a natural
extension of classical rough sets by relaxing the partitions arising from
equivalence relations to coverings. Recently, some topological concepts such as
neighborhood have been applied to covering rough sets. In this paper, we
further investigate the covering rough sets based on neighborhoods by
approximation operations. We show that the upper approximation based on
neighborhoods can be defined equivalently without using neighborhoods. To
analyze the coverings themselves, we introduce unary and composition operations
on coverings. A notion of homomorphismis provided to relate two covering
approximation spaces. We also examine the properties of approximations
preserved by the operations and homomorphisms, respectively.Comment: 13 pages; to appear in International Journal of Approximate Reasonin
- …