
    Der HyperView-Ansatz zur Integration semistrukturierter Daten (The HyperView approach to the integration of semistructured data)

    Title page, contents
    1 Introduction
      1.1 Integration of semistructured information sources Virtual Web Sites
      1.2 The HyperView approach
        1.2.1 Data Model and View Mechanism
        1.2.2 Architecture
        1.2.3 Application of the HyperView Technology
      1.3 Related Work (Overview)
      1.4 Overview
    2 HyperView by Example: Wrapping Publisher Web Sites
      2.1 Digital Libraries of Electronic Journals
        2.1.1 The DARWIN project
        2.1.2 Use cases
      2.2 Modeling publisher Web Sites
        2.2.1 Generic approach
        2.2.2 Graph Schemata
        2.2.3 The HyperView Database Schema
        2.2.4 ACR Schemata of Example Web Sources
        2.2.5 Representing HTML Pages as HTML graphs
      2.3 Building Views on publisher Web Sites
        2.3.1 Queries and Rules
        2.3.2 Defining a View over the HTML Graphs
        2.3.3 Defining a View over the ACR Graphs
        2.3.4 Querying the HyperView system
      2.4 The Architecture of DARWIN
      2.5 Summary
    3 Formal Framework
      3.1 Clustered Graph Data Model (CGDM)
        3.1.1 Motivation
        3.1.2 Basic definitions
        3.1.3 Schemata and instances
      3.2 Rules
        3.2.1 Rule application
      3.3 Queries and Oracles
        3.3.1 Applying a rule to a virtual data graph
        3.3.2 Hyperviews
        3.3.3 Using a rule to answer a subquery
        3.3.4 Chaining rules to answer a query
      3.4 Reuse of existing subgraphs
      3.5 Bibliography on Graph-Transformation
      3.6 Summary
    4 The HyperView System
      4.1 Encoding of Graphs
        4.1.1 Plain Graphs
        4.1.2 Clustered Graphs
        4.1.3 Type checking
      4.2 Encoding of Queries
      4.3 Encoding of Rules
      4.4 Rule Activation
      4.5 Query execution
      4.6 Complexity and Performance
      4.7 Metadata management
        4.7.1 Schema clusters
        4.7.2 The `meta` cluster
        4.7.3 WWW meta data
      4.8 The HyperView System prototype
      4.9 Summary
    5 The HVQL Query Language
      5.1 Introduction
      5.2 Basic Notations
      5.3 Graph Patterns
      5.4 Graph Literals
      5.5 Queries
        5.5.1 Syntax
        5.5.2 Semantics
        5.5.3 Implementation
      5.6 Rules
        5.6.1 Syntax
        5.6.2 Semantics
        5.6.3 Implementation
        5.6.4 Example
      5.7 Meta Edges
      5.8 HTML Edges
      5.9 Embedding of HVQL in the HyperView System
      5.10 Summary
    6 Support for Web Interfaces
      6.1 Introduction
      6.2 Architecture of the HyperView Web server
      6.3 Conceptual model of the virtual HyperView Web site
      6.4 HTML Code Generation
        6.4.1 Phase 1: Preparation
        6.4.2 Phase 2: Generation of a HTML skeleton
        6.4.3 Phase 3: HTML dump and generation of variable HTML code
        6.4.4 HVQL notation for HTML rules
      6.5 The HyperView Browser
        6.5.1 Customization
      6.6 Summary
    7 Case Study: Town Information
      7.1 Introduction
      7.2 Scenario
        7.2.1 Use Case
      7.3 Developing a cultural event calendar
        7.3.1 Conceptual schema
        7.3.2 Wrapping town information sites
      7.4 The cultural calendar Web site
      7.5 Summary
    8 The HyperView Methodology
      8.1 User roles
      8.2 Content Specification
      8.3 The Design Space of HyperView
      8.4 Schema development
        8.4.1 HTML layer
        8.4.2 ACR layer
        8.4.3 Database layer
        8.4.4 UI layer
      8.5 View development
        8.5.1 Implementing HTML views
        8.5.2 ACR Views
        8.5.3 DB Views
      8.6 Maintenance
        8.6.1 Robustness
        8.6.2 Error detection
        8.6.3 Adaption
      8.7 Summary
    9 Discussion and Outlook
      9.1 Related Work
        9.1.1 Data models and schemata for semistructured data
        9.1.2 Data Extraction from Semistructured Documents
        9.1.3 Querying the Web
        9.1.4 Integration of Heterogeneous Data Sources
        9.1.5 Related applications of Graph-Transformation techniques
        9.1.6 Comparison with HyperView
      9.2 Future Applications: XML & RDF
        9.2.1 XML
        9.2.2 XML Parsing
        9.2.3 XML DTDs and schemata
        9.2.4 XPointer and XQL
        9.2.5 Extensible Stylesheet Language
        9.2.6 Channel Definition Format
        9.2.7 Resource Description Framework (RDF)
        9.2.8 RDF Schemata
        9.2.9 Summary
      9.3 Open Issues
        9.3.1 Theoretical Issues
        9.3.2 Integration Issues
        9.3.3 Implementation and Performance Issues
        9.3.4 Interface Issues
      9.4 Contributions and Outlook
      9.5 Acknowledgments
    Bibliography
    Table of Mathematical Symbols
    Zusammenfassung der Ergebnisse (Summary of Results)
    Lebenslauf (Curriculum Vitae)
    Verwendete Hilfsmittel (Resources Used)

Using the World Wide Web to answer a specific question often requires information to be collected from multiple heterogeneous Web sites.
Virtual Web sites are a promising approach to automating this task for particular, focused application domains. A virtual Web site serves pages containing concentrated information that has been extracted, homogenized, and combined from several underlying Web sites. The HyperView approach to the integration of semistructured data presented in this thesis provides a methodology, a formal framework, and a software environment for building such virtual Web sites.

The HyperView approach treats the three steps of data extraction, integration, and presentation uniformly as consecutive views that map between different levels of abstraction. These levels are reflected by the architectural layers of the system. The contents of Web sites as well as the consecutive views are represented as graphs. Views are defined by sets of graph transformation rules. A demand-driven rule activation mechanism has been formally described and implemented. This mechanism incrementally materializes views in response to queries issued against them.

The HyperView System has been implemented in Prolog. Graph transformation rules are compiled into efficient Prolog predicates. Java servlets are used to support virtual Web sites.

The main contributions of this thesis are:
1. the key idea of applying the same view mechanism uniformly to the problems of extraction, integration, and presentation,
2. the HyperView methodology for modeling and integrating Web sites,
3. the formal framework defining the data model, the rule concept, and the demand-driven view materialization mechanism of HyperView,
4. the HyperView System prototype, which provides a platform for building virtual integrated Web sites, and
5. the validation of the HyperView methodology and system in case studies on Digital Libraries and Town Information.

German abstract (translated):

Answering specific questions via the World Wide Web often requires gathering and combining information from several Web sites. Virtual Web sites promise to automate this task, at least for limited application domains. A virtual Web site offers information that has been extracted, homogenized, and integrated from underlying Web sites. The HyperView approach to the integration of semistructured data consists of a methodology, a mathematical formalism, and a software environment for building virtual Web sites.

In the HyperView approach, the three steps of extracting, integrating, and presenting data are treated as consecutive views that map the abstraction levels of the HyperView architecture onto one another. The content of each layer is represented by graphs. Views are defined by sets of graph transformation rules. A demand-driven mechanism for activating these rules has been formally described and implemented. This mechanism materializes views incrementally, in response to queries.

The HyperView System is implemented in Prolog. Graph transformation rules are compiled into efficient Prolog predicates. Java servlets are used to generate HTML pages.

The main results of this work are:
1. the demonstration that the problems of data extraction, integration, and presentation can be solved with a uniform mapping mechanism,
2. the HyperView methodology for modeling and integrating Web sites,
3. the formal definition of the data model, the rule concept, and the demand-driven mechanism for materializing views,
4. the implementation of the HyperView System as a platform for building virtual Web sites, and
5. the validation of the HyperView methodology and the HyperView System in case studies on digital libraries and town information.
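
The demand-driven materialization described in the abstract can be illustrated with a small sketch. It is not the HyperView System itself (which the abstract says is written in Prolog) and it does not use HVQL; it is a simplified Python illustration under stated assumptions. The names `Graph`, `Rule`, `LazyView`, and `relabel`, the edge labels, and the sample page data are all hypothetical. Rules here only relabel edges, whereas real graph transformation rules match and rewrite whole subgraphs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple, Union

# An edge is (source node, label, target node). Hypothetical encoding,
# not the encoding used by the HyperView System.
Edge = Tuple[str, str, str]


@dataclass
class Graph:
    """A fully materialized graph, e.g. the graph of a fetched HTML page."""
    edges: Set[Edge] = field(default_factory=set)

    def query(self, label: str) -> List[Edge]:
        # Return all edges carrying the given label, in a stable order.
        return sorted(e for e in self.edges if e[1] == label)


@dataclass
class Rule:
    """A simplified rule: derives edges labeled `produces` from lower-layer edges."""
    name: str
    produces: str
    requires: List[str]
    apply: Callable[[List[Edge]], Set[Edge]]


class LazyView:
    """A view over a lower layer whose edges are derived only when queried."""

    def __init__(self, source: Union[Graph, "LazyView"], rules: List[Rule]):
        self.source = source
        self.rules = rules
        self.materialized = Graph()   # the part of the view computed so far
        self.fired: List[str] = []    # names of rules already activated

    def query(self, label: str) -> List[Edge]:
        for rule in self.rules:
            # Activate only rules that can answer this query and have not run yet.
            if rule.produces != label or rule.name in self.fired:
                continue
            # Sub-queries against the lower layer; if that layer is itself a
            # LazyView, this triggers its rules on demand as well.
            needed = [e for req in rule.requires for e in self.source.query(req)]
            self.materialized.edges |= rule.apply(needed)
            self.fired.append(rule.name)
        return self.materialized.query(label)


def relabel(name: str, old: str, new: str) -> Rule:
    """Hypothetical helper: a rule that copies `old` edges as `new` edges."""
    return Rule(name, new, [old], lambda edges: {(s, new, t) for s, _, t in edges})


# Hypothetical sample data: one page with a heading and a link.
html = Graph({("page1", "h1", "Journal of Examples"),
              ("page1", "a", "issue1.html")})

# Two consecutive views: extraction over the HTML graph, integration on top.
extraction = LazyView(html, [relabel("extract_title", "h1", "title"),
                             relabel("extract_link", "a", "link")])
integration = LazyView(extraction, [relabel("integrate_journal", "title", "journal")])

result = integration.query("journal")
print(result)
print(extraction.fired)
```

Querying the top view for `journal` edges activates only the rules needed to answer that query. The `extract_link` rule in the lower view stays inactive until something asks for `link` edges, which is the sense in which the view is materialized incrementally in this sketch.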