28 research outputs found

    Querying web metadata: Native score management and text support in databases

    Get PDF
    In this article, we discuss the issues involved in adding a native score management system to object-relational databases, to be used in querying Web metadata (that describes the semantic content of Web resources). The Web metadata model is based on topics (representing entities), relationships among topics (called metalinks), and importance scores (sideway values) of topics and metalinks. We extend database relations with scoring functions and importance scores. We add to SQL score-management clauses with well-defined semantics, and propose the sidewayvalue algebra (SVA), to evaluate the extended SQL queries. SQL extensions and the SVA algebra are illustrated through two Web resources, namely, the DBLP Bibliography and the SIGMOD Anthology. SQL extensions include clauses for propagating input tuple importance scores to output tuples during query processing, clauses that specify query stopping conditions, threshold predicates (a type of approximate similarity predicates for text comparisons), and user-defined-function-based predicates. The propagated importance scores are then used to rank and return a small number of output tuples. The query stopping conditions are propagated to SVA operators during query processing. We show that our SQL extensions are well-defined, meaning that, given a database and a query Q, under any query processing scheme, the output tuples of Q and their importance scores stay the same. To process the SQL extensions, we discuss two sideway value algebra operators, namely, sideway value algebra join and topic closure, give their implementation algorithms, and report their experimental evaluations

    Metadata-based and personalized web querying

    Get PDF
    Cataloged from PDF version of article.The advent of the Web has raised new searching and querying problems. Keyword matching based querying techniques that have been widely used by search engines, return thousands of Web documents for a single query, and most of these documents are generally unrelated to the users’ information needs. Towards the goal of improving the information search needs of Web users, a recent promising approach is to index the Web by using metadata and annotations. In this thesis, we model and query Web-based information resources using metadata for improved Web searching capabilities. Employing metadata for querying the Web increases the precision of the query outputs by returning semantically more meaningful results. Our Web data model, named “Web information space model”, consists of Web-based information resources (HTML/XML documents on the Web), expert advice repositories (domain-expert-specified metadata for information resources), and personalized information about users (captured as user profiles that indicate users’ preferences about experts as well as users’ knowledge about topics). Expert advice is specified using topics and relationships among topics (i.e., metalinks), along the lines of recently proposed topic maps standard. Topics and metalinks constitute metadata that describe the contents of the underlying Web information resources. Experts assign scores to topics, metalinks, and information resources to represent the “importance” of them. User profiles store users’ preferences and navigational history information about the information resources that the user visits. User preferences, knowledge level on topics, and history information are used for personalizing the Web search, and improving the precision of the results returned to the user. We store expert advices and user profiles in an object relational database iv v management system, and extend the SQL for efficient querying of Web-based information resources through the Web information space model. SQL extensions include the clauses for propagating input importance scores to output tuples, the clause that specifies query stopping condition, and new operators (i.e., text similarity based selection, text similarity based join, and topic closure). Importance score propagation and query stopping condition allow ranking of query outputs, and limiting the output size. Text similarity based operators and topic closure operator support sophisticated querying facilities. We develop a new algebra called Sideway Value generating Algebra (SVA) to process these SQL extensions. We also propose evaluation algorithms for the text similarity based SVA directional join operator, and report experimental results on the performance of the operator. We demonstrate experimentally the effectiveness of metadata-based personalized Web search through SQL extensions over the Web information space model against keyword matching based Web search techniques.Özel, Selma AyşePh.D

    Protein Structure Data Management System

    Get PDF
    With advancement in the development of the new laboratory instruments and experimental techniques, the protein data has an explosive increasing rate. Therefore how to efficiently store, retrieve and modify protein data is becoming a challenging issue that most biological scientists have to face and solve. Traditional data models such as relational database lack of support for complex data types, which is a big issue for protein data application. Hence many scientists switch to the object-oriented databases since object-oriented nature of life science data perfectly matches the architecture of object-oriented databases, but there are still a lot of problems that need to be solved in order to apply OODB methodologies to manage protein data. One major problem is that the general-purpose OODBs do not have any built-in data types for biological research and built-in biological domain-specific functional operations. In this dissertation, we present an application system with built-in data types and built-in biological domain-specific functional operations that extends the Object-Oriented Database (OODB) system by adding domain-specific additional layers Protein-QL, Protein Algebra Architecture and Protein-OODB above OODB to manage protein structure data. This system is composed of three parts: 1) Client API to provide easy usage for different users. 2) Middleware including Protein-QL, Protein Algebra Architecture and Protein-OODB is designed to implement protein domain specific query language and optimize the complex queries, also it capsulates the details of the implementation such that users can easily understand and master Protein-QL. 3) Data Storage is used to store our protein data. This system is for protein domain, but it can be easily extended into other biological domains to build a bio-OODBMS. In this system, protein, primary, secondary, and tertiary structures are defined as internal data types to simplify the queries in Protein-QL such that the domain scientists can easily master the query language and formulate data requests, and EyeDB is used as the underlying OODB to communicate with Protein-OODB. In addition, protein data is usually stored as PDB format and PDB format is old, ambiguous, and inadequate, therefore, PDB data curation will be discussed in detail in the dissertation

    Knowledge base systems : a formal model

    Get PDF

    COLAB : a hybrid knowledge representation and compilation laboratory

    Get PDF
    Knowledge bases for real-world domains such as mechanical engineering require expressive and efficient representation and processing tools. We pursue a declarative-compilative approach to knowledge engineering. While Horn logic (as implemented in PROLOG) is well-suited for representing relational clauses, other kinds of declarative knowledge call for hybrid extensions: functional dependencies and higher-order knowledge should be modeled directly. Forward (bottom-up) reasoning should be integrated with backward (top-down) reasoning. Constraint propagation should be used wherever possible instead of search-intensive resolution. Taxonomic knowledge should be classified into an intuitive subsumption hierarchy. Our LISP-based tools provide direct translators of these declarative representations into abstract machines such as an extended Warren Abstract Machine (WAM) and specialized inference engines that are interfaced to each other. More importantly, we provide source-to-source transformers between various knowledge types, both for user convenience and machine efficiency. These formalisms with their translators and transformers have been developed as part of COLAB, a compilation laboratory for studying what we call, respectively, "vertical\u27; and "horizontal\u27; compilation of knowledge, as well as for exploring the synergetic collaboration of the knowledge representation formalisms. A case study in the realm of mechanical engineering has been an important driving force behind the development of COLAB. It will be used as the source of examples throughout the paper when discussing the enhanced formalisms, the hybrid representation architecture, and the compilers

    Metadata-based modeling of information resources on the web

    Get PDF
    This paper deals with the problem of modeling Web information resources using expert knowledge and personalized user information for improved Web searching capabilities. We propose a "Web information space" model, which is composed of Web-based information resources (HTML/XML [Hypertext Markup Language/Extensible Markup Language] documents on the Web), expert advice repositories (domain-expert-specified meta-data for information resources), and personalized information about users (captured as user profiles that indicates users' preferences about experts as well as users' knowledge about topics). Expert advice, the heart of the Web information space model, is specified using topics and relationships among topics (called metalinks), along the lines of the recently proposed topic maps. Topics and metalinks constitute metadata that describe the contents of the underlying HTML/XML Web resources. The metadata specification process is semiautomated, and it exploits XML DTDs (Document Type Definition) to allow domain-expert guided mapping of DTD elements to topics and metalinks. The expert advice is stored in an object-relational database management systems (DBMS). To demonstrate the practicality and usability of the proposed Web information space model, we created a prototype expert advice repository of more than one million topics/metalinks for DBLP (Database and Logic Programming) Bibliography data set. We also present a query interface that provides sophisticated querying facilities for DBLP Bibliography resources using the expert advice repository

    Efficient Range and Join Query Processing in Massively Distributed Peer-to-Peer Networks

    Get PDF
    Peer-to-peer (P2P) has become a modern distributed computing architecture that supports massively large-scale data management and query processing. Complex query operators such as range operator and join operator are needed by various distributed applications, including content distribution, locality-aware services, computing resource sharing, and many others. This dissertation tackles a number of problems related to range and join query processing in P2P systems: fault-tolerant range query processing under structured P2P architecture, distributed range caching under unstructured P2P architecture, and integration of heterogeneous data under unstructured P2P architecture. To support fault-tolerant range query processing so as to provide strong performance guarantees in the presence of network churn, effective replication schemes are developed at either the overlay network level or the query processing level. To facilitate range query processing, a prefetch-based caching approach is proposed to eliminate the performance bottlenecks incurred by those data items that are not well cached in the network. Finally, a purely decentralized partition-based join query operator is devised to realize bandwidth-efficient join query processing under unstructured P2P architecture. Theoretical analysis and experimental simulations demonstrate the effectiveness of the proposed approaches

    Neuere Entwicklungen der deklarativen KI-Programmierung : proceedings

    Get PDF
    The field of declarative AI programming is briefly characterized. Its recent developments in Germany are reflected by a workshop as part of the scientific congress KI-93 at the Berlin Humboldt University. Three tutorials introduce to the state of the art in deductive databases, the programming language Gödel, and the evolution of knowledge bases. Eleven contributed papers treat knowledge revision/program transformation, types, constraints, and type-constraint combinations

    Intensional Query Processing in Deductive Database Systems.

    Get PDF
    This dissertation addresses the problem of deriving a set of non-ground first-order logic formulas (intensional answers), as an answer set to a given query, rather than a set of facts (extensional answers), in deductive database (DDB) systems based on non-recursive Horn clauses. A strategy in previous work in this area is to use resolution to derive intensional answers. It leaves however, several important problems. Some of them are: no specific resolution strategy is given; no specific methodologies to formalize the meaningful intensional answers are given; no solution is given to handle large facts in extensional databases (EDB); and no strategy is given to avoid deriving meaningless intensional answers. As a solution, a three-stage formalization process (pre-resolution, resolution, and post-resolution) for the derivation of meaningful intensional answers is proposed which can solve all of the problems mentioned above. A specific resolution strategy called SLD-RC resolution is proposed, which can derive a set of meaningful intensional answers. The notions of relevant literals and relevant clauses are introduced to avoid deriving meaningless intensional answers. The soundness and the completeness of SLD-RC resolution for intensional query processing are proved. An algorithm for the three-stage formalization process is presented and the correctness of the algorithm is proved. Furthermore, it is shown that there are two relationships between intensional answers and extensional answers. In a syntactic relationship, intensional answers are sufficient conditions to derive extensional answers. In a semantic relationship, intensional answers are sufficient and necessary conditions to derive extensional answers. Based on these relationships, the notions of the global and local completeness of an intensional database (IDB) are defined. It is proved that all incomplete IDBs can be transformed into globally complete IDBs, in which all extensional answers can be generated by evaluating intensional answers against an EDB. We claim that the intensional query processing provide a new methodology for query processing in DDBs and thus, extending the categories of queries, will greatly increase our insight into the nature of DDBs
    corecore