
    A constrained clustering approach to duplicate detection among relational data

    This paper proposes an approach to detecting duplicates among relational data. Traditional methods for record linkage or duplicate detection work on a set of records that have no explicit relations with each other, and such records can be formatted into a single database table for processing. However, there are situations in which records from different sources cannot be flattened into one table, and records within one source have certain (semantic) relations between them. Duplicate detection for such relational data records/instances can be handled by formatting them into several tables and applying traditional methods to each table. However, because this ignores the relations among the original data records, it produces poor or inconsistent results. This paper analyzes the characteristics of relational data and proposes a constrained clustering approach to duplicate detection. The approach incorporates constraint rules derived from these characteristics and therefore yields better and more consistent results, as our experiments show. © Springer-Verlag Berlin Heidelberg 2007
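
    The abstract does not spell out the constraint rules, so the following is only a minimal Python sketch under stated assumptions: records are short name strings compared by token Jaccard similarity, and the only relational constraint is a hypothetical cannot-link rule (for example, two author records attached to the same paper are taken to be different people). The names `token_jaccard` and `constrained_clusters` are illustrative, not from the paper. The sketch shows how a clustering step can refuse a merge that violates a relational constraint, something flat per-table matching cannot express.

```python
from itertools import combinations


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of lower-cased token sets (stand-in for a real record matcher)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def constrained_clusters(records, cannot_link, threshold=0.8):
    """Greedy agglomerative clustering that never merges two clusters
    containing a cannot-link pair.

    records: list of strings.
    cannot_link: set of frozenset({i, j}) index pairs assumed NOT to be duplicates
    (hypothetical constraint, standing in for the paper's unspecified rules).
    Returns the clusters as a sorted list of sorted index lists.
    """
    # Each index points to the set object of its current cluster.
    cluster_of = {i: {i} for i in range(len(records))}
    # Candidate pairs, most similar first.
    pairs = sorted(
        ((token_jaccard(records[i], records[j]), i, j)
         for i, j in combinations(range(len(records)), 2)),
        reverse=True,
    )
    for score, i, j in pairs:
        if score < threshold:
            break  # all remaining pairs are even less similar
        ci, cj = cluster_of[i], cluster_of[j]
        if ci is cj:
            continue  # already in the same cluster
        # Relational constraint: reject the merge if any cross-cluster pair is cannot-linked.
        if any(frozenset((a, b)) in cannot_link for a in ci for b in cj):
            continue
        merged = ci | cj
        for k in merged:
            cluster_of[k] = merged
    # Collect distinct cluster objects (sets are unhashable, so key by identity).
    unique = {id(c): c for c in cluster_of.values()}
    return sorted(sorted(c) for c in unique.values())


records = ["wei wang", "wei wang", "wang wei", "li na"]
# Hypothetical constraint: records 0 and 1 are co-authors on one paper, so they are different people.
print(constrained_clusters(records, {frozenset({0, 1})}))  # [[0], [1, 2], [3]]
print(constrained_clusters(records, set()))                # [[0, 1, 2], [3]]
```

    With the cannot-link pair `{0, 1}`, record 0 stays on its own even though all three name variants are equally similar; without it, all three collapse into one cluster. Ties in the greedy merge order are broken by index, a simplification a real system would handle more carefully.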

    A cooperative active perception approach for swarm robotics

    More than half a century after modern robotics first emerged, we still face a landscape in which most of the work done by robots is predetermined rather than autonomous. A strong understanding of the environment is one of the key factors for autonomy, enabling robots to make correct decisions based on their surroundings. Classic methods for obtaining robotic controllers rely on manual specification, but they become less suitable as complexity grows. Artificial intelligence methods such as evolutionary algorithms were introduced to synthesize robotic controllers by optimizing an artificial neural network against a fitness function that measures the robots’ performance on a predetermined task. In this work, a novel approach to environment perception in swarm robotics is studied, with a behavior model based on the cooperative identification of objects moving around the environment, followed by an action based on the result of the identification. Controllers are obtained via evolutionary methods. Results show a controller with high identification and correct-decision rates. The work is followed by a study on scaling up the approach to multiple environments. Experiments are conducted in terrestrial, marine and aerial environments, and in ideal, noisy and hybrid scenarios. In the hybrid scenario, different evolutionary samples take place in different environments. Results show how each controller adapts to the other environments and indicate that hybrid evolution is best suited to generating a more robust, environment-independent controller for the task.
    Keywords: Evolutionary robotics, Multi-robot systems, Cooperation, Perception, Object identification, Artificial intelligence, Machine learning, Neural networks, Multiple environments
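
    The abstract describes controllers obtained by evolving a neural network against a fitness function. The Python sketch below shows only that core loop, under heavy assumptions: a single hypothetical 2-3-1 tanh network decides whether an object is a target from two made-up sensor readings, fitness is the fraction of correct decisions on a fixed synthetic data set, and the evolutionary algorithm is plain truncation selection with elitism and Gaussian mutation. It omits the swarm, the cooperation between robots and the simulated terrestrial/marine/aerial physics, and it is not the thesis's actual setup; all names are illustrative.

```python
import math
import random

N_HIDDEN = 3
# Per hidden unit: two input weights + one bias; then N_HIDDEN output weights + one output bias.
N_WEIGHTS = 3 * N_HIDDEN + N_HIDDEN + 1


def make_dataset(rng, n=200):
    """Hypothetical task: two sensor readings per object; label 1 ('target') if x0 + x1 > 0."""
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        data.append((x, 1 if x[0] + x[1] > 0 else 0))
    return data


def controller(weights, x):
    """2-3-1 feedforward network with tanh hidden units; returns a decision in {0, 1}."""
    hidden = [
        math.tanh(weights[3 * h] * x[0] + weights[3 * h + 1] * x[1] + weights[3 * h + 2])
        for h in range(N_HIDDEN)
    ]
    out_w = weights[3 * N_HIDDEN:4 * N_HIDDEN]
    out = sum(w * hv for w, hv in zip(out_w, hidden)) + weights[4 * N_HIDDEN]
    return 1 if out > 0 else 0


def fitness(weights, data):
    """Fraction of objects the controller identifies correctly."""
    return sum(controller(weights, x) == y for x, y in data) / len(data)


def evolve(generations=30, pop_size=20, n_elite=4, sigma=0.3, seed=0):
    """Truncation selection with elitism and Gaussian mutation over network weights."""
    rng = random.Random(seed)
    data = make_dataset(rng)
    pop = [[rng.gauss(0, 1) for _ in range(N_WEIGHTS)] for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        ranked = sorted(pop, key=lambda w: fitness(w, data), reverse=True)
        elite = ranked[:n_elite]
        history.append(fitness(elite[0], data))
        # Keep the elite unchanged; fill the rest with mutated copies of random elite members.
        children = [
            [g + rng.gauss(0, sigma) for g in rng.choice(elite)]
            for _ in range(pop_size - n_elite)
        ]
        pop = elite + children
    return elite[0], history


best, history = evolve()
print(f"best fitness per generation: {history[0]:.2f} -> {history[-1]:.2f}")
```

    Because the elite is copied unchanged into each new generation and fitness is deterministic, the best fitness per generation can never decrease; `sigma` and `n_elite` trade exploration against exploitation.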

    Statistical relational learning with nonparametric Bayesian models

    Statistical relational learning analyzes the probabilistic constraints among entities, their attributes, and their relationships. It is an area of growing interest in modern data mining, and many leading research efforts have reported promising results. However, there is no easily applicable recipe for turning a relational domain (e.g. a database) into a probabilistic model, for two main reasons. First, structural learning in relational models is even more complex than structural learning in (non-relational) Bayesian networks, because an attribute may depend on exponentially many other attributes. Second, reliable prior knowledge about the domains of interest can be difficult and expensive to obtain. To remove these constraints, this thesis applies nonparametric Bayesian analysis to relational learning and proposes two compelling models: Dirichlet enhanced relational learning and infinite hidden relational learning. Dirichlet enhanced relational learning (DERL) extends nonparametric hierarchical Bayesian modeling to relational data. In existing relational models, the model parameters are global: the conditional probability distributions are the same for every entity, and the relationships are independent of each other. To address these limitations, we introduce a hierarchical Bayesian (HB) framework to relational learning, so that model parameters can be personalized, i.e. owned by entities or relationships, and coupled via common prior distributions. Nonparametric HB modeling adds further flexibility, so that the learned knowledge can be represented faithfully. For inference, we develop an efficient variational method motivated by the Pólya urn representation of the Dirichlet process. DERL is demonstrated in a medical domain, where we build a nonparametric HB model for entities including hospitals, patients, procedures and diagnoses.
The experiments show that the additional flexibility of nonparametric HB modeling yields a more accurate model of the dependencies between different types of relationships and significantly improves prediction of unknown relationships. In the infinite hidden relational model (IHRM), we apply nonparametric mixture modeling to relational data, extending the expressiveness of a relational model by introducing, for each entity, an infinite-dimensional hidden variable as part of a Dirichlet process (DP) mixture model. This has three main advantages. First, it reduces the need for extensive structural learning, which is particularly difficult in relational models because of the huge number of potential probabilistic parents. Second, information can propagate globally through the ground network defined by the relational structure. Third, the model itself can optimize the number of mixture components for each entity class based on the data. IHRM can be applied to entity clustering and to relationship/attribute prediction, two important tasks in relational data mining. For inference in IHRM, we develop four algorithms: collapsed Gibbs sampling with the Chinese restaurant process, blocked Gibbs sampling with the truncated stick-breaking construction (SBC), mean-field inference with the truncated SBC, and an empirical approximation. IHRM is evaluated in three domains: a recommendation system based on the MovieLens data set, prediction of the functions of yeast genes/proteins on the KDD Cup 2001 data set, and medical data analysis. The experimental results show that IHRM gives significantly improved estimates of attributes/relationships and highly interpretable entity clusters in complex relational data.
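
    Both DERL's variational method and IHRM's samplers build on standard representations of the Dirichlet process. As an illustration only, the Python sketch below draws from a DP prior in two of those representations: the Chinese restaurant process (the partition view of the Pólya urn scheme), where a new cluster opens with probability proportional to the concentration parameter `alpha`, and the truncated stick-breaking construction, which yields a finite weight vector of length `truncation`. The function names are hypothetical, and this is prior sampling only, not the thesis's collapsed Gibbs, blocked Gibbs or mean-field inference.

```python
import random


def crp_assignments(n, alpha, rng):
    """Sample cluster ('table') assignments for n items from a Chinese restaurant process.

    Item i (0-based) joins existing table k with probability counts[k] / (i + alpha)
    and opens a new table with probability alpha / (i + alpha).
    """
    counts = []       # number of items seated at each table
    assignments = []
    for _ in range(n):
        weights = counts + [alpha]  # existing tables first, then "new table"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new table
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


def truncated_stick_breaking(alpha, truncation, rng):
    """Mixture weights pi_k = v_k * prod_{j<k} (1 - v_j), truncated at `truncation` components.

    v_k ~ Beta(1, alpha) for all but the last component, whose fraction is fixed
    to 1 so that the weights sum to one.
    """
    weights = []
    remaining = 1.0  # length of stick not yet broken off
    for k in range(truncation):
        v = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


rng = random.Random(42)
print(crp_assignments(10, alpha=1.0, rng=rng))
print([round(p, 3) for p in truncated_stick_breaking(alpha=2.0, truncation=5, rng=rng)])
```

    A larger `alpha` tends to open more tables in the CRP and to spread stick-breaking weight over more components; fixing the last stick fraction to 1 is what makes the truncated weights sum to one.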