28,125 research outputs found
Incremental Entity Resolution from Linked Documents
In many government applications we often find that information about
entities, such as persons, are available in disparate data sources such as
passports, driving licences, bank accounts, and income tax records. Similar
scenarios are commonplace in large enterprises having multiple customer,
supplier, or partner databases. Each data source maintains different aspects of
an entity, and resolving entities based on these attributes is a well-studied
problem. However, in many cases documents in one source reference those in
others; e.g., a person may provide his driving-licence number while applying
for a passport, or vice-versa. These links define relationships between
documents of the same entity (as opposed to inter-entity relationships, which
are also often used for resolution). In this paper we describe an algorithm to
cluster documents that are highly likely to belong to the same entity by
exploiting inter-document references in addition to attribute similarity. Our
technique uses a combination of iterative graph-traversal, locality-sensitive
hashing, iterative match-merge, and graph-clustering to discover unique
entities based on a document corpus. A unique feature of our technique is that
new sets of documents can be added incrementally while having to re-resolve
only a small subset of a previously resolved entity-document collection. We
present performance and quality results on two data-sets: a real-world database
of companies and a large synthetically generated `population' database. We also
demonstrate benefit of using inter-document references for clustering in the
form of enhanced recall of documents for resolution.Comment: 15 pages, 8 figures, patented wor
Matching Dependencies with Arbitrary Attribute Values: Semantics, Query Answering and Integrity Constraints
Matching dependencies (MDs) were introduced to specify the identification or
matching of certain attribute values in pairs of database tuples when some
similarity conditions are satisfied. Their enforcement can be seen as a natural
generalization of entity resolution. In what we call the "pure case" of MDs,
any value from the underlying data domain can be used for the value in common
that does the matching. We investigate the semantics and properties of data
cleaning through the enforcement of matching dependencies for the pure case. We
characterize the intended clean instances and also the "clean answers" to
queries as those that are invariant under the cleaning process. The complexity
of computing clean instances and clean answers to queries is investigated.
Tractable and intractable cases depending on the MDs and queries are
identified. Finally, we establish connections with database "repairs" under
integrity constraints.Comment: 13 pages, double column, 2 figure
Image database system for glaucoma diagnosis support
Tato práce popisuje přehled standardních a pokročilých metod používaných k diagnose glaukomu v ranném stádiu. Na základě teoretických poznatků je implementován internetově orientovaný informační systém pro oční lékaře, který má tři hlavní cíle. Prvním cílem je možnost sdílení osobních dat konkrétního pacienta bez nutnosti posílat tato data internetem. Druhým cílem je vytvořit účet pacienta založený na kompletním očním vyšetření. Posledním cílem je aplikovat algoritmus pro registraci intenzitního a barevného fundus obrazu a na jeho základě vytvořit internetově orientovanou tři-dimenzionální vizualizaci optického disku. Tato práce je součásti DAAD spolupráce mezi Ústavem Biomedicínského Inženýrství, Vysokého Učení Technického v Brně, Oční klinikou v Erlangenu a Ústavem Informačních Technologií, Friedrich-Alexander University, Erlangen-Nurnberg.This master thesis describes a conception of standard and advanced eye examination methods used for glaucoma diagnosis in its early stage. According to the theoretical knowledge, a web based information system for ophthalmologists with three main aims is implemented. The first aim is the possibility to share medical data of a concrete patient without sending his personal data through the Internet. The second aim is to create a patient account based on a complete eye examination procedure. The last aim is to improve the HRT diagnostic method with an image registration algorithm for the fundus and intensity images and create an optic nerve head web based 3D visualization. This master thesis is a part of project based on DAAD co-operation between Department of Biomedical Engineering, Brno University of Technology, Eye Clinic in Erlangen and Department of Computer Science, Friedrich-Alexander University, Erlangen-Nurnberg.
Recommended from our members
Learning from AI : new trends in database technology
Recently some researchers in the areas of database data modelling and knowledge representations in artificial intelligence have recognized that they share many common goals. In this survey paper we show the relationship between database and artificial intelligence research. We show that there has been a tendency for data models to incorporate more modelling techniques developed for knowledge representations in artificial intelligence as the desire to incorporate more application oriented semantics, user friendliness, and flexibility has increased. Increasing the semantics of the representation is the key to capturing the "reality" of the database environment, increasing user friendliness, and facilitating the support of multiple, possibly conflicting, user views of the information contained in a database
- …