148 research outputs found

    BClean: A Bayesian Data Cleaning System

    There is a considerable body of work on data cleaning that employs various principles to rectify erroneous data and transform a dirty dataset into a cleaner one. One of the prevalent approaches is probabilistic methods, including Bayesian methods. However, existing probabilistic methods often assume a simplistic distribution (e.g., a Gaussian distribution), which frequently underfits the data in practice, or they require experts to provide a complex prior distribution (e.g., via a programming language). This requirement is both labor-intensive and costly, rendering these methods less suitable for real-world applications. In this paper, we propose BClean, a Bayesian Cleaning system that features automatic Bayesian network construction and user interaction. We recast the data cleaning problem as a Bayesian inference problem that fully exploits the relationships between attributes in the observed dataset and any prior information provided by users. To this end, we present an automatic Bayesian network construction method that extends a structure learning-based functional dependency discovery method with similarity functions to capture the relationships between attributes. Furthermore, our system allows users to modify the generated Bayesian network in order to specify prior information or correct inaccuracies introduced by the automatic generation process. We also design an effective scoring model (called the compensative scoring model) necessary for the Bayesian inference. To enhance the efficiency of data cleaning, we propose several approximation strategies for the Bayesian inference, including graph partitioning, domain pruning, and pre-detection. Through evaluation on both real-world and synthetic datasets, we demonstrate that BClean achieves an F-measure of up to 0.9 in data cleaning, outperforming existing Bayesian methods by 2% and other data cleaning methods by 15%. Comment: Our source code is available at https://github.com/yyssl88/BClea
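To make the idea of "data cleaning as Bayesian inference" concrete, here is a toy sketch that picks the most probable value for one cell from the other attributes in the same row. It is not BClean's method: it assumes a naive-Bayes structure (every other attribute conditionally independent given the target attribute) instead of a learned Bayesian network, and it uses plain Laplace-smoothed counts instead of the compensative scoring model. The function `repair_cell`, its `alpha` smoothing parameter, and the example data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def repair_cell(rows, target, row_idx, alpha=1.0):
    """Return the most probable value for rows[row_idx][target].

    Hypothetical naive-Bayes sketch: score(v) = log P(v) + sum_a log P(x_a | v),
    where a ranges over the other attributes and x_a is the observed value.
    """
    others = [a for a in rows[0] if a != target]
    prior = Counter(r[target] for r in rows)
    domain_size = {a: len({r[a] for r in rows}) for a in others}
    # cooc[(attr, target_value)][attr_value] counts co-occurrences.
    cooc = defaultdict(Counter)
    for r in rows:
        for a in others:
            cooc[(a, r[target])][r[a]] += 1
    observed = rows[row_idx]
    best, best_score = None, -math.inf
    for v, cnt in prior.items():
        score = math.log(cnt / len(rows))  # log prior of candidate v
        for a in others:
            # Laplace-smoothed likelihood of the observed value given v.
            num = cooc[(a, v)][observed[a]] + alpha
            den = cnt + alpha * domain_size[a]
            score += math.log(num / den)
        if score > best_score:
            best, best_score = v, score
    return best
```

For example, with three rows reading `zip=10001, city=New York, state=NY`, one row reading `zip=10001, city=Nwe York, state=NY`, and two rows reading `zip=94105, city=San Francisco, state=CA`, calling `repair_cell(rows, "city", 3)` on the misspelled row returns `"New York"`, because the zip code and state make that candidate far more likely than the rare typo.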

    Numerical methods and accurate computations with structured matrices

    This doctoral thesis is a compendium of 11 scientific articles. Its main subject is numerical linear algebra, with emphasis on two classes of structured matrices: totally positive matrices and M-matrices. For some subclasses of these matrices, it is possible to develop algorithms that solve several of the most common linear algebra problems numerically to high relative accuracy, independently of the condition number of the matrix. The key to achieving accurate computations lies in using a different parametrization that represents the special structure of the matrix, and in developing adapted algorithms that work with this parametrization.
    Nonsingular totally positive matrices admit a unique factorization as a product of nonnegative bidiagonal matrices, called the bidiagonal factorization. If this representation is known to high relative accuracy, it can be used to solve certain linear systems and to compute the inverse, the eigenvalues, and the singular values to high relative accuracy. Our contribution in this field has been to obtain, to high relative accuracy, the bidiagonal factorization of collocation matrices of generalized Laguerre polynomials, collocation matrices of Bessel polynomials, classes of matrices that generalize the Pascal matrix, and matrices of q-integers. We have also studied the extension of several optimal properties of collocation matrices of normalized B-bases (which are, in particular, totally positive matrices). In particular, we have proved optimality properties of collocation matrices of tensor products of normalized B-bases.
    If the row sums and the off-diagonal entries of a nonsingular diagonally dominant M-matrix are known to high relative accuracy, then its inverse, determinant, and singular values can also be computed to high relative accuracy.
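The reason row sums and off-diagonal entries suffice can be seen in a short sketch of subtraction-free Gaussian elimination, the standard idea behind such results (this is not code from the thesis). For a row diagonally dominant M-matrix, off-diagonal entries are nonpositive and row sums nonnegative, so each pivot, each updated off-diagonal entry, and each updated row sum can be formed by combining quantities of the same sign, which avoids cancellation. The function name `det_dd_mmatrix` and its argument layout are hypothetical.

```python
import numpy as np


def det_dd_mmatrix(offdiag, row_sums):
    """Determinant of a nonsingular row diagonally dominant M-matrix.

    offdiag:  (n, n) array of off-diagonal entries a_ij <= 0 (diagonal ignored).
    row_sums: length-n array of row sums s_i = sum_j a_ij >= 0.
    The diagonal is never stored; it is recovered from the row sums.
    """
    B = np.array(offdiag, dtype=float)  # working copy of off-diagonal entries
    s = np.array(row_sums, dtype=float)
    n = s.size
    det = 1.0
    for k in range(n):
        rest = np.arange(k + 1, n)
        # Pivot a_kk = s_k - sum_{j>k} a_kj: adds nonnegative terms only.
        pivot = s[k] - B[k, rest].sum()
        det *= pivot
        for i in rest:
            m = B[i, k] / pivot  # elimination multiplier, <= 0
            # Schur-complement row sum: s_i - m * s_k adds a nonnegative term.
            s[i] -= m * s[k]
            for j in rest:
                if j != i:
                    # a_ij - m * a_kj: both contributions are <= 0, no cancellation.
                    B[i, j] -= m * B[k, j]
    return det
```

For instance, the matrix with rows `[1, -1]` and `[-1, 1 + 1e-20]` is passed as `det_dd_mmatrix([[0, -1], [-1, 0]], [0.0, 1e-20])` and yields exactly `1e-20`, whereas forming `1 * (1 + 1e-20) - 1` in double precision would return 0.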
We have sought new methods to achieve accurate computations with new classes of M-matrices or related matrices. We have proposed a parametrization for Nekrasov Z-matrices with positive diagonal entries that can be used to compute their inverse and determinant to high relative accuracy. We have also studied the class of so-called B-matrices, which is closely related to M-matrices. We have obtained a method to compute the determinants of this class to high relative accuracy, and another to compute the determinants of B-Nekrasov matrices, also to high relative accuracy. Using two scaling matrices that we have introduced, we have developed new bounds for the infinity norm of the inverse of a Nekrasov matrix and for the error of the linear complementarity problem when its associated matrix is a Nekrasov matrix. We have also obtained new bounds for the infinity norm of the inverses of Bpi-matrices, a class that extends B-matrices, and have used them to obtain new error bounds for the linear complementarity problem whose associated matrix is a Bpi-matrix. Some classes of matrices have been generalized to higher dimensions in order to develop a theory of tensors that extends the known theory for matrices. For example, the definition of the class of B-matrices has been extended to the class of B-tensors, yielding a simple criterion for identifying a new class of positive definite tensors. We have proposed an extension of the class of Bpi-matrices to Bpi-tensors, thereby defining a new class of positive definite tensors that can be identified by a simple criterion based only on computations involving the entries of the tensor.
Finally, we have characterized the cases in which tridiagonal Toeplitz matrices are P-matrices, and we have studied when they can be represented in terms of a bidiagonal factorization that serves as a parametrization for achieving computations to high relative accuracy.
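As a point of reference for the P-matrix question, a definition-level check is easy to write; it does not reproduce the thesis's characterization. In a tridiagonal matrix, any principal submatrix is block diagonal, with one block per maximal run of consecutive indices, and in the Toeplitz case a block of size k equals the leading k×k submatrix. Hence every principal minor is a product of the leading minors D_1, ..., D_n, which satisfy D_k = a·D_{k-1} − b·c·D_{k-2} (with D_0 = 1), and the matrix is a P-matrix exactly when all D_k > 0. The function name and parameter names below are hypothetical.

```python
def is_tridiagonal_toeplitz_p_matrix(n, a, b, c):
    """Check whether the n x n tridiagonal Toeplitz matrix is a P-matrix.

    a: diagonal entry, b: subdiagonal entry, c: superdiagonal entry (n >= 1).
    Uses the recurrence D_k = a*D_{k-1} - b*c*D_{k-2}, D_0 = 1, D_1 = a,
    and tests D_k > 0 for k = 1..n. Integer inputs give exact arithmetic.
    """
    d_prev, d = 1, a  # D_0, D_1
    if d <= 0:
        return False
    for _ in range(2, n + 1):
        d_prev, d = d, a * d - b * c * d_prev
        if d <= 0:
            return False
    return True
```

For example, `is_tridiagonal_toeplitz_p_matrix(3, 2, -1, -1)` is `True` (leading minors 2, 3, 4), while `is_tridiagonal_toeplitz_p_matrix(3, 1, 1, 1)` is `False` because D_2 = 0.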

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference


    Deep learning for internet of underwater things and ocean data analytics

    The Internet of Underwater Things (IoUT) is an emerging technological ecosystem developed for connecting objects in maritime and underwater environments. IoUT technologies are empowered by a very large number of deployed sensors and actuators. In this thesis, data from multiple IoUT sensors are augmented with machine intelligence for forecasting purposes.

    PERICLES Deliverable 4.3:Content Semantics and Use Context Analysis Techniques

    The current deliverable summarises the work conducted within task T4.3 of WP4, focusing on the extraction and subsequent analysis of semantic information from digital content, which is imperative for its preservability. More specifically, the deliverable defines content semantic information from a visual and textual perspective, explains how this information can be exploited in long-term digital preservation, and proposes novel approaches for extracting this information in a scalable manner. Additionally, the deliverable discusses novel techniques for retrieving and analysing the context of use of digital objects. Although this topic has not been extensively studied in the existing literature, we believe use context is vital in augmenting the semantic information and maintaining the usability and preservability of digital objects, as well as their ability to be interpreted accurately as initially intended.

    Tools and Algorithms for the Construction and Analysis of Systems

    This open access book constitutes the proceedings of the 28th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2022, which was held on April 2-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 46 full papers and 4 short papers presented in this volume were carefully reviewed and selected from 159 submissions. The proceedings also contain 16 tool papers of the affiliated competition SV-Comp and one paper containing the competition report. TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, flexibility, and efficiency of tools and algorithms for building computer-controlled systems.

    Recognizing complex faces and gaits via novel probabilistic models

    In the field of computer vision, developing automated systems to recognize people in unconstrained scenarios is only a partially solved problem. In unconstrained scenarios, a number of common variations and complexities, such as occlusion, illumination, and cluttered backgrounds, impose vast uncertainty on the recognition process. Among the various biometrics that have emerged recently, this dissertation focuses on two: face recognition and gait recognition. First, we address the problem of recognizing faces with major occlusions amid other variations such as pose, scale, expression, and illumination, using a novel PRObabilistic Component based Interpretation Model (PROCIM) inspired by key psychophysical principles that are closely related to reasoning under uncertainty. The model employs Bayesian Networks to establish, learn, interpret, and exploit intrinsic similarity mappings from the face domain. Then, by incorporating efficient inference strategies, it makes robust decisions that allow faces to be recognized under uncertainty. PROCIM reports improved recognition rates over recent approaches. Second, we address the emerging gait recognition problem and show that PROCIM can be easily adapted to the gait domain as well. We scientifically define and formulate sub-gaits and propose a novel modular training scheme to efficiently learn subtle sub-gait characteristics from the gait domain. Our results show that the proposed model is robust to several uncertainties and yields significant recognition performance. Apart from PROCIM, we finally show how a simple component-based gait reasoning can be coherently modeled using the recently prominent Markov Logic Networks (MLNs) by intuitively fusing imaging, logic, and graphs. We have discovered that face and gait domains exhibit interesting similarity mappings between object entities and their components.
We have proposed intuitive probabilistic methods to model these mappings to perform recognition under various elements of uncertainty. Extensive experimental validations justify the robustness of the proposed methods over state-of-the-art techniques.