9 research outputs found

    Wrapping of Web Sources with restricted Query Interfaces by Query Tunneling

    Information sources on the World Wide Web usually offer two different schemas to their users: an Interface Schema, which the user can query, and a Result Schema, which the user can browse. Often the Interface Schema is more restricted than the Result Schema; moreover, many sources offer keyword-search interfaces only. The query capabilities of such sources are therefore very limited, and a useful integration into a mediator-based information system that relies on query capabilities is almost impossible. We propose the Query Tunneling architecture for wrapping these restricted web sources. Wrapping sources by Query Tunneling hides restrictive query interfaces and makes such sources fully queryable based on their Result Schema. The process of Query Tunneling is divided into two main steps: Query Relaxation, which makes a higher-order query suitable for a restricted interface, and Result Restriction, which filters the results using the original query.
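    The two steps named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Query` type, the function names, the toy catalog, and the choice to relax a conjunctive attribute query into a plain keyword list.

```python
from dataclasses import dataclass

# Hypothetical record type: one flat dict per entry of the Result Schema.
Record = dict[str, str]


@dataclass
class Query:
    """Conjunction of attribute = value conditions on the Result Schema."""
    conditions: dict[str, str]


def relax(query: Query) -> list[str]:
    """Query Relaxation: keep only what a keyword interface can accept."""
    return sorted(set(query.conditions.values()))


def restrict(records: list[Record], query: Query) -> list[Record]:
    """Result Restriction: re-apply the original query to the raw results."""
    return [
        r for r in records
        if all(r.get(attr) == value for attr, value in query.conditions.items())
    ]


def tunnel(query: Query, keyword_search) -> list[Record]:
    """Relax, send to the restricted interface, then filter the answers."""
    return restrict(keyword_search(relax(query)), query)


# Toy stand-in for a web source that only offers keyword search.
CATALOG: list[Record] = [
    {"title": "Databases", "author": "Smith", "year": "1999"},
    {"title": "Smith", "author": "Jones", "year": "1999"},
    {"title": "Networks", "author": "Smith", "year": "2001"},
]


def toy_keyword_search(keywords: list[str]) -> list[Record]:
    """Return records in which every keyword occurs in some field."""
    return [
        r for r in CATALOG
        if all(any(kw.lower() in v.lower() for v in r.values()) for kw in keywords)
    ]


query = Query({"author": "Smith", "year": "1999"})
print(tunnel(query, toy_keyword_search))
```

    In this sketch the keyword search returns a superset of the wanted records (the second catalog entry matches on its title only), and the restriction step removes the false hits.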

    Identifizierung von Realwelt-Objekten in multiplen Datenbanken (Identification of Real-World Objects in Multiple Databases)

    Object identification is essential where data about real-world objects are distributed over multiple databases without a global, consistent identifier. We present a generic object identification framework consisting of three successive steps: Conversion, Comparison, and Classification. In addition, the framework covers: (1) concepts for identification, (2) its software architecture, (3) data quality characteristics, (4) a preselection technique that ensures efficiency for large databases (incorporating suitable index structures), and (5) a prescription for evaluation, including sampling and quality criteria. Based on the framework, methods can be specified, implemented, and evaluated with respect to the requirements of an application. We evaluated several methods on real apartment, address, and library data. One main result is that scalability is determined solely by the applied preselection technique and its implementation. Another is that Decision Tree Induction achieved better correctness and was more robust than Record Linkage.
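    The three steps and the index-based preselection could be sketched roughly as follows. This is an illustrative assumption, not the framework's implementation: the function names, the toy address records, the blocking key, and the fixed-threshold classifier (a simple stand-in for Record Linkage or Decision Tree Induction) are all hypothetical.

```python
def convert(record):
    """Conversion: bring values into a comparable form (trimmed, lower case)."""
    return {k: v.strip().lower() for k, v in record.items()}


def preselect(left, right, key):
    """Preselection: pair only records that share a blocking key.

    An in-memory dict plays the role of the index structure here.
    """
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    for l in left:
        for r in index.get(l[key], []):
            yield l, r


def compare(a, b, attributes):
    """Comparison: one agreement flag per attribute."""
    return tuple(a[x] == b[x] for x in attributes)


def classify(vector, threshold):
    """Classification: declare a match if enough attributes agree."""
    return sum(vector) >= threshold


def identify(db_a, db_b, key="zip", attributes=("street", "name"), threshold=2):
    """Run conversion, preselection, comparison, and classification in order."""
    left = [convert(r) for r in db_a]
    right = [convert(r) for r in db_b]
    return [
        (l["id"], r["id"])
        for l, r in preselect(left, right, key)
        if classify(compare(l, r, attributes), threshold)
    ]


# Hypothetical address records held in two databases without a shared identifier.
DB_A = [
    {"id": "a1", "zip": "10115", "street": "Hauptstr. 5 ", "name": "A. Meier"},
    {"id": "a2", "zip": "10117", "street": "Ringweg 2", "name": "B. Schulz"},
]
DB_B = [
    {"id": "b1", "zip": "10115", "street": "hauptstr. 5", "name": "a. meier"},
    {"id": "b2", "zip": "10115", "street": "Parkallee 9", "name": "C. Lang"},
    {"id": "b3", "zip": "10999", "street": "Seeweg 1", "name": "D. Wolf"},
]

print(identify(DB_A, DB_B))
```

    With the blocking key, only two of the six possible record pairs are compared, which is the kind of saving that makes preselection decisive for scalability.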

    WrapIt: Automated Integration of Web Databases with Extensional Overlaps

    The World Wide Web no longer consists of static web pages. Instead, more and more web pages are created dynamically from user requests and database content. Conventional search engines do not consider these dynamic pages because user input cannot be simulated, and thus often provide insufficient results.

    Improving the Quality of Association Rule Mining by Means of Rough Sets

    Summary. We evaluate the rough set and the association rule method with respect to their performance and the quality of the produced rules. It is shown that despite their different approaches, both methods are based on the same principle and, consequently, must generate identical rules. However, they differ strongly with respect to performance. Subsequently an optimized association rule procedure is presented which unifies the advantages of both methods.

    Object Identification Quality

    Research and industry have tackled the object identification problem of data integration in many different ways. This paper presents a framework that allows the evaluation of competing approaches. To this end, complexity measures and data characteristics are introduced that reflect the hardness of a given object identification problem. All characteristics can be estimated with simple SQL queries and simple calculations. Following the principle of benchmark definitions, we specify a test framework. It consists of a test database and its characteristics, quality criteria, and a test specification. Adequate measures for the correctness criterion of the benchmark are given. A running example, the Berlin Online Apartment-Advertisements database (BOA), illustrates the approach. The BOA database is freely available at www.wiwiss.fu-berlin.de/lenz/boa/. I. MOTIVATION Even though quality cannot be defined, you know what it is