
    Integrating uncertain XML data from different sources.

    Data integration has become increasingly important with today's rapid growth of information available on the web and in electronic form. In the past several years, extensive work has been done to make use of the data available from different sources, particularly in the scientific and medical fields. In our work, we are interested in integrating data from different uncertain sources in which data are stored in semistructured databases, notably XML-based ones. Our interest in XML-based databases stems from the flexibility they provide for storing and exchanging data. Furthermore, we are concerned with the reliability of query answers drawn from various sources and with identifying the source the data came from (its provenance). In essence, our work lies at the intersection of three areas: data integration, uncertain databases, and lineage or provenance in databases. This thesis extends previous work on information integration to accommodate the integration of uncertain data from multiple sources.
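    To make source reliability and provenance concrete, here is a minimal Python sketch, not the thesis's method. It assumes each source is an XML document of hypothetical `<item>` elements, that sources are independent, and that each source has a known reliability. Each merged item records the sources that asserted it (its provenance) and a confidence equal to the probability that at least one of those sources is reliable.

```python
import xml.etree.ElementTree as ET


def integrate(sources: dict[str, str], reliability: dict[str, float]) -> ET.Element:
    """Merge hypothetical <item> elements from several XML sources.

    Assumes independent sources; confidence = 1 - prod(1 - r_s) over the
    sources s that asserted the item.
    """
    # Provenance: item value -> set of source names that asserted it.
    provenance: dict[str, set[str]] = {}
    for name, xml_text in sources.items():
        for item in ET.fromstring(xml_text).iter("item"):
            provenance.setdefault((item.text or "").strip(), set()).add(name)

    merged = ET.Element("items")
    for value, names in provenance.items():
        doubt = 1.0  # probability that every asserting source is unreliable
        for n in names:
            doubt *= 1 - reliability[n]
        el = ET.SubElement(
            merged, "item",
            source=" ".join(sorted(names)),
            confidence=f"{1 - doubt:.3f}",
        )
        el.text = value
    return merged
```

    Keeping provenance as an explicit attribute lets a query answer be traced back to, and re-scored against, the sources that support it.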

    Inconsistency and Incompleteness in Relational Databases and Logic Programs

    The aim of this thesis is to study the role played by negation in databases and to develop data models that can handle inconsistent and incomplete information. We develop models for relational databases that also allow incompleteness through disjunctive information under both the closed world assumption (CWA) and the open world assumption (OWA). In the area of logic programming, extended logic programs allow explicit representation of negative information; as a result, some extended logic programs have inconsistent semantics. We present a translation of extended logic programs into normal logic programs that is more tolerant of inconsistency. Extended logic programs have also been widely used to compute the repairs of an inconsistent database. We present some preliminary ideas on how source information can be incorporated into the repair program in order to produce a subset of the set of all repairs, based on a preference for certain sources over others.
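    The idea of source-preferred repairs can be illustrated outside logic programming. The Python sketch below replaces the thesis's repair programs with plain enumeration. It assumes a single relation with a hypothetical `(key, value, source)` layout and one key constraint. Under a key constraint, each repair keeps exactly one tuple from every group of tuples sharing a key. The preference filter is an illustrative assumption, not the thesis's definition: it keeps only repairs that choose, for every key, a tuple from the most trusted source present.

```python
from collections import defaultdict
from itertools import product

# Hypothetical layout: (key, value, source). The key constraint allows at
# most one tuple per key; the source column records where a tuple came from.
Fact = tuple[str, str, str]


def _group_by_key(facts: list[Fact]) -> dict[str, list[Fact]]:
    groups: dict[str, list[Fact]] = defaultdict(list)
    for fact in facts:
        groups[fact[0]].append(fact)
    return groups


def repairs(facts: list[Fact]) -> list[set[Fact]]:
    """All repairs under one key constraint: pick exactly one tuple from
    each group of tuples sharing a key (Cartesian product of the groups)."""
    groups = _group_by_key(facts)
    return [set(choice) for choice in product(*groups.values())]


def preferred_repairs(facts: list[Fact], rank: dict[str, int]) -> list[set[Fact]]:
    """Repairs that, for every key, keep a tuple from the most trusted
    source available for that key (lower rank = more trusted)."""
    groups = _group_by_key(facts)
    best = {k: min(rank[f[2]] for f in g) for k, g in groups.items()}
    return [r for r in repairs(facts)
            if all(rank[f[2]] == best[f[0]] for f in r)]
```

    For example, with `facts = [("alice", "NY", "s1"), ("alice", "LA", "s2"), ("bob", "SF", "s2")]` there are two repairs. Ranking `s1` above `s2` leaves only the one that keeps `("alice", "NY", "s1")`.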

    Supporting Uncertainty in Standard Database Management Systems

    The management of uncertain data in numerous real-life applications has attracted the attention of the database and artificial intelligence research communities. This has led to the development of new database management systems (DBMSs) in which uncertainty is treated as a first-class citizen. In this thesis we follow a different approach and develop a system, which we refer to as a DBMS with Uncertainty (UDBMS), that represents and manipulates uncertain data at the application level on top of a standard relational DBMS. Compared with the first approach, which treats uncertainty as a first-class citizen, the proposed approach may be considered “lightweight” because it is built on existing database technologies. As the underlying uncertainty formalism, we use the Information Source Tracking (IST) method, which is essentially probabilistic. We extend the standard SQL language with uncertainty, yielding a language we refer to as USQL, to express queries and transactions in this context. Query processing and optimization techniques are extended accordingly to take the presence of uncertainty into account. To evaluate the performance of the UDBMS, we conducted extensive experiments using USQL queries and IST relations obtained by extending the standard TPC-H benchmark queries and generated data. We compare and discuss the two approaches to uncertainty management. Our results indicate that the performance of the proposed UDBMS is reasonably good when the relations involved can be loaded completely into main memory.
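    As a rough illustration of probabilistic source tracking, the Python sketch below is a simplified model in the spirit of IST, not its actual encoding and not USQL. It assumes each tuple carries a hypothetical annotation: a set of alternative source sets. If any one alternative consists entirely of reliable sources, the tuple is true. Sources are assumed independent, a join conjoins the annotations of its inputs, and the probability is computed by enumerating every reliability assignment. That enumeration is only practical for a handful of sources.

```python
from itertools import product

# Hypothetical annotation: frozenset of alternatives, each a frozenset of
# source names (a DNF formula over "source s is reliable" variables).


def annotate(*alternatives):
    return frozenset(frozenset(alt) for alt in alternatives)


def join_annotation(a, b):
    # A joined tuple needs support from both inputs: conjoin the two DNFs.
    return frozenset(x | y for x in a for y in b)


def probability(annotation, reliability):
    """P(annotation holds), assuming independent sources where source s is
    reliable with probability reliability[s]. Brute-force enumeration."""
    sources = sorted(set().union(*annotation))
    total = 0.0
    for bits in product([False, True], repeat=len(sources)):
        world = {s for s, ok in zip(sources, bits) if ok}
        if any(alt <= world for alt in annotation):
            weight = 1.0
            for s, ok in zip(sources, bits):
                weight *= reliability[s] if ok else 1 - reliability[s]
            total += weight
    return total
```

    With reliabilities 0.9 and 0.8 for `s1` and `s2`, a tuple joining facts from both sources gets probability 0.72, while a tuple reported independently by either source gets 0.98.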

    Data quality management and data cleaning

    Today's enterprises are often challenged by managing the large amount of data used in their business operations. Ensuring and maintaining an adequate level of data quality are important aspects of data quality management for many reasons: on the one hand, adequate data quality is a competitive advantage; on the other hand, low data quality leads to many undesirable consequences. Frameworks, methodologies, and tools have been developed to help ensure an adequate level of data quality, and data quality is also addressed in legislation and in various standards. Despite this, some studies show that the state of data quality in enterprises is poor. The purpose of the thesis is to research and present the field of data quality and to show the issues that follow from low data quality. The thesis presents the consequences as well as the causes of low data quality, and explains why data quality is important. In addition, it presents the standards, legislation, and best practices that deal with data quality. Data quality issues also arise in the Internet of Things, which has recently been the subject of much research, so the thesis also presents the main issues from that point of view. The main emphasis of the thesis is on data quality and data cleaning: it presents types of errors and various data cleaning frameworks, and combines their main activities into a consolidated view. Furthermore, the thesis gives an overview of the existing software solutions on the market that support data cleaning tasks. All of the above makes up the theoretical part of the thesis.
The second, practical part of the thesis proposes how data quality can be improved, using a prototype software solution that addresses one specific part of data quality management: maintaining data accuracy by detecting errors in data and, where possible, eliminating them (data cleaning). In addition, the thesis proposes how the solution could be deployed in a specific organisation's information system, following the principles and rules suggested in the literature. The conclusion presents essential approaches to help improve data quality in enterprises.
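The error-sensing and cleaning step of the practical part can be sketched as rule-based validation. The Python below is not the thesis's prototype. The fields `email` and `age`, the rules, and the single automatic correction (trimming and lower-casing e-mail addresses) are all hypothetical. Rules only flag suspected accuracy errors; anything that cannot be corrected safely is left for review.

```python
import re
from dataclasses import dataclass


@dataclass
class Issue:
    row: int
    field: str
    message: str


def check_email(v):
    # Hypothetical rule: a plain "local@domain.tld" shape, no whitespace.
    ok = isinstance(v, str) and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v)
    return None if ok else "invalid email"


def check_age(v):
    # Hypothetical rule: an integer age within a plausible range.
    return None if isinstance(v, int) and 0 <= v <= 120 else "age out of range"


RULES = {"email": check_email, "age": check_age}


def detect(records):
    """Sense errors: run every rule on every record and collect issues."""
    issues = []
    for i, rec in enumerate(records):
        for field, rule in RULES.items():
            msg = rule(rec.get(field))
            if msg:
                issues.append(Issue(i, field, msg))
    return issues


def clean(records):
    """Apply only safe, mechanical corrections; leave the rest for review."""
    cleaned = []
    for rec in records:
        rec = dict(rec)
        if isinstance(rec.get("email"), str):
            rec["email"] = rec["email"].strip().lower()
        cleaned.append(rec)
    return cleaned
```

Running `detect` before and after `clean` shows which issues the automatic correction resolves and which still need a person to decide.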