Querying web metadata: Native score management and text support in databases
In this article, we discuss the issues involved in adding a native score management system to object-relational databases, to be used in querying Web metadata (which describes the semantic content of Web resources). The Web metadata model is based on topics (representing entities), relationships among topics (called metalinks), and importance scores (sideway values) of topics and metalinks. We extend database relations with scoring functions and importance scores. We add score-management clauses with well-defined semantics to SQL, and propose the sideway value algebra (SVA) to evaluate the extended SQL queries. The SQL extensions and the SVA algebra are illustrated through two Web resources, namely, the DBLP Bibliography and the SIGMOD Anthology. The SQL extensions include clauses for propagating input tuple importance scores to output tuples during query processing, clauses that specify query stopping conditions, threshold predicates (a type of approximate similarity predicate for text comparisons), and user-defined-function-based predicates. The propagated importance scores are then used to rank and return a small number of output tuples. The query stopping conditions are propagated to SVA operators during query processing. We show that our SQL extensions are well-defined, meaning that, given a database and a query Q, the output tuples of Q and their importance scores stay the same under any query processing scheme. To process the SQL extensions, we discuss two sideway value algebra operators, namely, sideway value algebra join and topic closure, give their implementation algorithms, and report their experimental evaluations.
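The well-definedness claim is easiest to see against a naive reference evaluation: compute the full join, give every output tuple a score derived from its input scores, then apply the stopping condition. Any smarter processing scheme must return the same tuples and scores. The sketch below assumes tuples are attribute dictionaries, that scores are combined by multiplication, and that the stopping condition is "keep the k best"; the function name `scored_join_top_k` and the product rule are illustrative assumptions, not the paper's definitions. Ties at the k-th score are broken arbitrarily here.

```python
import heapq
from itertools import product
from typing import Callable, Dict, List, Tuple

# (tuple as an attribute dict, importance score); a hypothetical layout
Scored = Tuple[Dict[str, object], float]


def scored_join_top_k(
    left: List[Scored],
    right: List[Scored],
    key: str,
    k: int,
    combine: Callable[[float, float], float] = lambda a, b: a * b,
) -> List[Tuple[float, Dict[str, object]]]:
    """Naive reference evaluation of a score-propagating equi-join.

    Every matching pair yields an output tuple whose importance score is
    combine(left_score, right_score); the stopping condition then keeps
    only the k highest-scoring outputs, best first.
    """
    outputs = []
    for (lt, ls), (rt, rs) in product(left, right):
        if lt[key] == rt[key]:
            # Propagate the input importance scores to the output tuple.
            outputs.append((combine(ls, rs), {**lt, **rt}))
    # "Stop after k": rank by propagated score and keep the top k.
    return heapq.nlargest(k, outputs, key=lambda pair: pair[0])
```

For example, joining hypothetical author tuples `({"paper": "p1", "author": "a"}, 0.9)` with venue tuples `({"paper": "p1", "venue": "SIGMOD"}, 0.8)` on `"paper"` with `k=1` returns the single merged tuple with score of roughly 0.72.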
Metadata-based and personalized web querying
The advent of the Web has raised new searching and querying problems. Keyword-matching-based querying techniques, widely used by search engines, return thousands of Web documents for a single query, and most of these documents are generally unrelated to the users' information needs. Toward the goal of better serving the information needs of Web users, a recent promising approach is to index the Web using metadata and annotations.
In this thesis, we model and query Web-based information resources using metadata for improved Web searching capabilities. Employing metadata for querying the Web increases the precision of the query outputs by returning semantically more meaningful results. Our Web data model, named the "Web information space model", consists of Web-based information resources (HTML/XML documents on the Web), expert advice repositories (domain-expert-specified metadata for information resources), and personalized information about users (captured as user profiles that indicate users' preferences about experts as well as users' knowledge about topics). Expert advice is specified using topics and relationships among topics (i.e., metalinks), along the lines of the recently proposed topic maps standard. Topics and metalinks constitute metadata that describe the contents of the underlying Web information resources. Experts assign scores to topics, metalinks, and information resources to represent their "importance". User profiles store users' preferences and navigational history information about the information resources that the user visits. User preferences, knowledge level on topics, and history information are used for personalizing Web search and improving the precision of the results returned to the user.
We store expert advice and user profiles in an object-relational database management system, and extend SQL for efficient querying of Web-based information resources through the Web information space model. The SQL extensions include clauses for propagating input importance scores to output tuples, a clause that specifies the query stopping condition, and new operators (i.e., text-similarity-based selection, text-similarity-based join, and topic closure). Importance score propagation and the query stopping condition allow ranking of query outputs and limiting the output size. The text-similarity-based operators and the topic closure operator support sophisticated querying facilities. We develop a new algebra called the Sideway Value generating Algebra (SVA) to process these SQL extensions. We also propose evaluation algorithms for the text-similarity-based SVA directional join operator, and report experimental results on the performance of the operator. We demonstrate experimentally the effectiveness of metadata-based personalized Web search through SQL extensions over the Web information space model, compared with keyword-matching-based Web search techniques.
Özel, Selma Ayşe. Ph.D.
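A text-similarity-based join keeps a pair of tuples only when their text attributes are "similar enough", i.e. when a threshold predicate holds. The sketch below is a nested-loop version under a stated assumption: similarity is Jaccard overlap of lower-cased word sets, which stands in for whatever similarity measure the thesis uses, and the function names are hypothetical. It does not implement the thesis's directional join evaluation algorithms.

```python
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased whitespace tokens of a text attribute."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two token sets (assumed measure)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


def similarity_join(
    left: List[str], right: List[str], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return (left, right, similarity) for every pair passing the threshold."""
    pairs = []
    for l in left:
        for r in right:
            sim = jaccard(l, r)
            if sim >= threshold:  # the threshold predicate
                pairs.append((l, r, sim))
    return pairs
```

With a threshold of 0.3, "query processing in databases" and "database query processing" share 2 of 5 distinct words (similarity 0.4) and are joined; note that plain word matching treats "database" and "databases" as different terms.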
Protein Structure Data Management System
With the advancement of new laboratory instruments and experimental techniques, the volume of protein data is growing explosively. Therefore, how to efficiently store, retrieve, and modify protein data has become a challenging issue that most biological scientists must face and solve. Traditional data models such as the relational model lack support for complex data types, which is a major issue for protein data applications. Hence, many scientists have switched to object-oriented databases (OODBs), since the object-oriented nature of life science data closely matches the architecture of object-oriented databases; however, many problems must still be solved before OODB methodologies can be applied to manage protein data. One major problem is that general-purpose OODBs have no built-in data types for biological research and no built-in biological domain-specific functional operations. In this dissertation, we present an application system with built-in data types and built-in biological domain-specific functional operations that extends an OODB system by adding domain-specific layers (Protein-QL, the Protein Algebra Architecture, and Protein-OODB) above the OODB to manage protein structure data. The system is composed of three parts: 1) a Client API that provides easy usage for different users; 2) middleware, comprising Protein-QL, the Protein Algebra Architecture, and Protein-OODB, which implements a protein domain-specific query language, optimizes complex queries, and encapsulates the implementation details so that users can easily understand and master Protein-QL; and 3) data storage, which stores the protein data. The system is designed for the protein domain, but it can be easily extended to other biological domains to build a bio-OODBMS.
In this system, proteins and their primary, secondary, and tertiary structures are defined as internal data types to simplify queries in Protein-QL, so that domain scientists can easily master the query language and formulate data requests; EyeDB is used as the underlying OODB that communicates with Protein-OODB. In addition, protein data is usually stored in PDB format, which is old, ambiguous, and inadequate; therefore, PDB data curation is discussed in detail in the dissertation.
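The benefit of built-in structure types is that a domain question becomes a short predicate over typed fields instead of a join over generic tables. The sketch below illustrates that idea in plain Python dataclasses, not in Protein-QL or EyeDB; the class names, field names, the "helix"/"strand" encoding, and the 1-based inclusive residue ranges are all hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SecondaryElement:
    kind: str   # hypothetical encoding: "helix" or "strand"
    start: int  # 1-based residue index, inclusive
    end: int    # 1-based residue index, inclusive

    def length(self) -> int:
        return self.end - self.start + 1


@dataclass
class Protein:
    protein_id: str
    primary: str  # amino-acid sequence in one-letter codes
    secondary: List[SecondaryElement] = field(default_factory=list)
    # tertiary structure: residue index -> (x, y, z) coordinates
    tertiary: Dict[int, Tuple[float, float, float]] = field(default_factory=dict)

    def secondary_within_sequence(self) -> bool:
        """Consistency check: every element lies inside the primary sequence."""
        return all(1 <= e.start <= e.end <= len(self.primary) for e in self.secondary)


def proteins_with_long_helix(proteins: List[Protein], min_len: int) -> List[str]:
    """A domain query over the typed fields: proteins with a helix >= min_len residues."""
    return [
        p.protein_id
        for p in proteins
        if any(e.kind == "helix" and e.length() >= min_len for e in p.secondary)
    ]
```

A checker such as `secondary_within_sequence` hints at why curation matters: structure annotations that disagree with the sequence can be rejected at load time.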
COLAB : a hybrid knowledge representation and compilation laboratory
Knowledge bases for real-world domains such as mechanical engineering require expressive and efficient representation and processing tools. We pursue a declarative-compilative approach to knowledge engineering. While Horn logic (as implemented in PROLOG) is well-suited for representing relational clauses, other kinds of declarative knowledge call for hybrid extensions: functional dependencies and higher-order knowledge should be modeled directly. Forward (bottom-up) reasoning should be integrated with backward (top-down) reasoning. Constraint propagation should be used wherever possible instead of search-intensive resolution. Taxonomic knowledge should be classified into an intuitive subsumption hierarchy. Our LISP-based tools provide direct translators of these declarative representations into abstract machines such as an extended Warren Abstract Machine (WAM) and specialized inference engines that are interfaced to each other. More importantly, we provide source-to-source transformers between various knowledge types, both for user convenience and machine efficiency. These formalisms with their translators and transformers have been developed as part of COLAB, a compilation laboratory for studying what we call, respectively, "vertical" and "horizontal" compilation of knowledge, as well as for exploring the synergetic collaboration of the knowledge representation formalisms. A case study in the realm of mechanical engineering has been an important driving force behind the development of COLAB. It will be used as the source of examples throughout the paper when discussing the enhanced formalisms, the hybrid representation architecture, and the compilers.
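The difference between forward (bottom-up) and backward (top-down) reasoning over Horn clauses can be shown on propositional rules. The sketch below is a minimal interpreter for ground atoms, written only to contrast the two directions; it is not COLAB, does not compile to a WAM, and the function names and rule encoding `(head, [body atoms])` are assumptions.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Rule = Tuple[str, List[str]]  # (head atom, body atoms); hypothetical encoding


def forward_chain(facts: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Bottom-up: fire rules whose bodies hold until no new atom is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived


def backward_chain(
    goal: str, facts: Set[str], rules: List[Rule], seen: FrozenSet[str] = frozenset()
) -> bool:
    """Top-down: prove `goal` from facts or via a rule whose head matches it."""
    if goal in facts:
        return True
    if goal in seen:  # guard against cyclic rule sets
        return False
    return any(
        all(backward_chain(b, facts, rules, seen | {goal}) for b in body)
        for head, body in rules
        if head == goal
    )
```

Forward chaining computes every consequence once, which suits materializing derived facts; backward chaining only explores rules relevant to one goal, which suits answering a single query.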
Metadata-based modeling of information resources on the web
This paper deals with the problem of modeling Web information resources using expert knowledge and personalized user information for improved Web searching capabilities. We propose a "Web information space" model, which is composed of Web-based information resources (HTML/XML [Hypertext Markup Language/Extensible Markup Language] documents on the Web), expert advice repositories (domain-expert-specified metadata for information resources), and personalized information about users (captured as user profiles that indicate users' preferences about experts as well as users' knowledge about topics). Expert advice, the heart of the Web information space model, is specified using topics and relationships among topics (called metalinks), along the lines of the recently proposed topic maps. Topics and metalinks constitute metadata that describe the contents of the underlying HTML/XML Web resources. The metadata specification process is semiautomated, and it exploits XML DTDs (Document Type Definitions) to allow domain-expert-guided mapping of DTD elements to topics and metalinks. The expert advice is stored in an object-relational database management system (DBMS). To demonstrate the practicality and usability of the proposed Web information space model, we created a prototype expert advice repository of more than one million topics/metalinks for the DBLP (Database and Logic Programming) Bibliography data set. We also present a query interface that provides sophisticated querying facilities for DBLP Bibliography resources using the expert advice repository.
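To make the topic/metalink structure concrete, the sketch stores metalinks as `(source topic, metalink type, target topic)` triples, roughly the shape a relational metalink table might take, and answers one typical query: which topics are reachable from a starting topic by repeatedly following one metalink type (a transitive closure). The triple layout, the metalink type names, and `topic_closure` are hypothetical; the paper's actual schema may differ.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Metalink = Tuple[str, str, str]  # (source topic, metalink type, target topic)


def topic_closure(metalinks: List[Metalink], start: str, link_type: str) -> Set[str]:
    """Topics reachable from `start` by following metalinks of `link_type`.

    The start topic is included only if some cycle leads back to it.
    """
    adjacency: Dict[str, List[str]] = defaultdict(list)
    for src, kind, dst in metalinks:
        if kind == link_type:
            adjacency[src].append(dst)
    reached: Set[str] = set()
    queue = deque([start])
    while queue:  # breadth-first traversal
        topic = queue.popleft()
        for nxt in adjacency[topic]:
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached
```

Restricting the traversal to a single metalink type keeps the result semantically meaningful: following "prerequisite" links yields background topics, while mixing in unrelated link types would not.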
Efficient Range and Join Query Processing in Massively Distributed Peer-to-Peer Networks
Peer-to-peer (P2P) has become a modern distributed computing architecture that supports massively large-scale data management and query processing. Complex query operators such as range and join operators are needed by various distributed applications, including content distribution, locality-aware services, computing resource sharing, and many others.
This dissertation tackles a number of problems related to range and join query processing in P2P systems: fault-tolerant range query processing under structured P2P architecture, distributed range caching under unstructured P2P architecture, and integration of heterogeneous data under unstructured P2P architecture. To support fault-tolerant range query processing with strong performance guarantees in the presence of network churn, effective replication schemes are developed at either the overlay network level or the query processing level. To facilitate range query processing, a prefetch-based caching approach is proposed to eliminate the performance bottlenecks caused by data items that are not well cached in the network. Finally, a purely decentralized, partition-based join query operator is devised to realize bandwidth-efficient join query processing under unstructured P2P architecture. Theoretical analysis and experimental simulations demonstrate the effectiveness of the proposed approaches.
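The core idea of a partition-based join is that tuples with equal join keys are routed to the same peer, so each peer can join its share locally and no peer needs the whole relations. The single-process sketch below assumes hash partitioning on the join key (via `zlib.crc32`, so the peer assignment is deterministic) and simulates peers as partitions; it does not model the unstructured overlay, churn, or bandwidth, and `partitioned_join` is a hypothetical name, not the dissertation's operator.

```python
import zlib
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, object]  # (join key, payload)


def peer_for(key: Hashable, num_peers: int) -> int:
    """Deterministically map a join key to one of `num_peers` peers."""
    return zlib.crc32(str(key).encode()) % num_peers


def partitioned_join(r: List[Row], s: List[Row], num_peers: int) -> List[Tuple]:
    """Equi-join r and s by routing both to peers by key, then joining per peer."""
    r_parts: Dict[int, List[Row]] = defaultdict(list)
    s_parts: Dict[int, List[Row]] = defaultdict(list)
    for key, val in r:
        r_parts[peer_for(key, num_peers)].append((key, val))
    for key, val in s:
        s_parts[peer_for(key, num_peers)].append((key, val))

    results = []
    for peer in range(num_peers):
        # Local hash join: build on r's partition, probe with s's partition.
        index: Dict[Hashable, List[object]] = defaultdict(list)
        for key, val in r_parts[peer]:
            index[key].append(val)
        for key, val in s_parts[peer]:
            for r_val in index[key]:
                results.append((key, r_val, val))
    return results
```

Because matching keys always land on the same peer, the union of the local joins equals the full join regardless of how many peers are used.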
A generic predictive information system for resource planning and optimisation
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
The purpose of this research work is to demonstrate the feasibility of creating a quick-response decision platform for middle management in industry. It builds on the strengths of current Supervisory Control and Data Acquisition (SCADA) systems and Discrete Event Simulation and Modelling (DESM) but, more importantly, makes a leap forward in their theory and practice. The proposed research platform uses real-time data and creates an automatic platform for real-time and predictive system analysis, efficiently providing current and ahead-of-time information on the performance of the system. Data acquisition, as the back-end connection of the data integration system to the shop floor, faces both hardware and software challenges in coping with large-scale real-time data collection. The limited scope of SCADA systems makes them unsuitable candidates for this task, and the cost-effectiveness, complexity, and efficiency orientation of proprietary solutions leave room for improvement. A Flexible Data Input Layer Architecture (FDILA) is proposed as a generic data integration platform so that a multitude of data sources can be connected to the data processing unit. The efficiency of the proposed integration architecture lies in decentralising and distributing services between different layers. A novel Sensitivity Analysis (SA) method called EvenTracker is proposed as an effective tool to measure the importance and priority of inputs to the system. The EvenTracker method is introduced to deal with the complexity of systems in real time. The approach takes advantage of an event-based definition of the data involved in the process flow. The underlying logic of the EvenTracker SA method is to capture the cause-effect relationships between triggers (input variables) and events (output variables) within a period of time specified by an expert.
The approach does not require estimating a data distribution of any kind, nor does the performance model require execution beyond real time. The proposed EvenTracker sensitivity analysis method has the lowest computational complexity compared with other popular sensitivity analysis methods. For proof of concept, a three-tier data integration system was designed and developed using National Instruments' LabVIEW programming language, Rockwell Automation's Arena simulation and modelling software, and OPC data communication software. A laboratory-based conveyor system with 29 sensors was installed to simulate a typical shop-floor production line. In addition, the EvenTracker SA method was applied to data extracted from 28 sensors on one manufacturing line in a real factory. The experiment identified 14% of the input variables as unimportant for the evaluation of model outputs. The method achieved a time-efficiency gain of 52% in the analysis of the filtered system when the unimportant input variables were no longer sampled. Compared with the Entropy-based SA technique, the only other method that can be used for real-time purposes, the EvenTracker SA method is quicker, more accurate, and less computationally burdensome. Additionally, a theoretical estimation of the computational complexity of SA methods, based on both structural complexity and energy-time analysis, favoured the efficiency of the proposed EvenTracker SA method. Both laboratory and factory-based experiments demonstrated the flexibility and efficiency of the proposed solution.
The Engineering and Physical Sciences Research Council
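The abstract describes EvenTracker only at the level of "capture cause-effect relationships between triggers and events within an expert-specified period". The sketch below is a loosely inspired, hypothetical illustration of that idea, not the published method: it scores each input by the fraction of its trigger times that are followed by an output event within a window, and flags inputs scoring below a threshold as unimportant. The scoring rule, window semantics `(t, t + window]`, threshold, and function names are all assumptions. Note that it needs no data distribution, only timestamps.

```python
from bisect import bisect_right
from typing import Dict, List


def trigger_event_score(
    trigger_times: List[float], event_times: List[float], window: float
) -> float:
    """Fraction of triggers followed by an output event within (t, t + window]."""
    if not trigger_times:
        return 0.0
    events = sorted(event_times)
    hits = 0
    for t in trigger_times:
        i = bisect_right(events, t)  # index of the first event strictly after t
        if i < len(events) and events[i] <= t + window:
            hits += 1
    return hits / len(trigger_times)


def unimportant_inputs(
    triggers: Dict[str, List[float]],
    event_times: List[float],
    window: float,
    threshold: float,
) -> List[str]:
    """Names of inputs whose trigger-event score falls below `threshold`."""
    return sorted(
        name
        for name, times in triggers.items()
        if trigger_event_score(times, event_times, window) < threshold
    )
```

Inputs flagged this way could then be dropped from sampling, which is the kind of filtering behind the reported time-efficiency gain; the sketch makes no claim about reproducing those figures.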
Recent Developments in Declarative AI Programming (Neuere Entwicklungen der deklarativen KI-Programmierung): Proceedings
The field of declarative AI programming is briefly characterized. Its recent developments in Germany are reflected by a workshop held as part of the scientific congress KI-93 at the Humboldt University of Berlin. Three tutorials introduce the state of the art in deductive databases, the programming language Gödel, and the evolution of knowledge bases. Eleven contributed papers treat knowledge revision/program transformation, types, constraints, and type-constraint combinations.
Intensional Query Processing in Deductive Database Systems.
This dissertation addresses the problem of deriving a set of non-ground first-order logic formulas (intensional answers), rather than a set of facts (extensional answers), as the answer set to a given query in deductive database (DDB) systems based on non-recursive Horn clauses. A strategy in previous work in this area is to use resolution to derive intensional answers. However, this strategy leaves several important problems open, among them: no specific resolution strategy is given; no specific methodology for formalizing meaningful intensional answers is given; no solution is given for handling large numbers of facts in extensional databases (EDBs); and no strategy is given for avoiding meaningless intensional answers. As a solution, a three-stage formalization process (pre-resolution, resolution, and post-resolution) for the derivation of meaningful intensional answers is proposed, which addresses all of the problems mentioned above. A specific resolution strategy called SLD-RC resolution is proposed, which can derive a set of meaningful intensional answers. The notions of relevant literals and relevant clauses are introduced to avoid deriving meaningless intensional answers. The soundness and completeness of SLD-RC resolution for intensional query processing are proved. An algorithm for the three-stage formalization process is presented, and its correctness is proved. Furthermore, it is shown that there are two relationships between intensional answers and extensional answers. In a syntactic relationship, intensional answers are sufficient conditions to derive extensional answers. In a semantic relationship, intensional answers are necessary and sufficient conditions to derive extensional answers. Based on these relationships, the notions of global and local completeness of an intensional database (IDB) are defined.
It is proved that all incomplete IDBs can be transformed into globally complete IDBs, in which all extensional answers can be generated by evaluating intensional answers against an EDB. We claim that intensional query processing provides a new methodology for query processing in DDBs and thus, by extending the categories of queries, will greatly increase our insight into the nature of DDBs.
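The basic intuition behind an intensional answer is to rewrite the query with the IDB rules until only EDB predicates remain; each resulting conjunction is a condition, rather than a set of facts, under which the query holds. The sketch below does this plain rule unfolding for a simplified case: every literal is a predicate over one shared variable, so no unification is needed, and the rules must be non-recursive, as in the dissertation's setting. It is not SLD-RC resolution: it has no pre- or post-resolution stage and no relevance filtering of literals or clauses. The function name and the example predicates are hypothetical.

```python
from typing import Dict, List

Rules = Dict[str, List[List[str]]]  # IDB predicate -> alternative rule bodies


def unfold(goals: List[str], rules: Rules) -> List[List[str]]:
    """Rewrite a conjunction of goals until only EDB predicates remain.

    Predicates without rules are treated as EDB predicates. Each returned
    conjunction is one intensional answer. Assumes non-recursive rules.
    """
    for i, goal in enumerate(goals):
        if goal in rules:
            answers: List[List[str]] = []
            for body in rules[goal]:
                # Replace the IDB literal by one rule body and keep unfolding.
                answers.extend(unfold(goals[:i] + body + goals[i + 1:], rules))
            return answers
    return [goals]  # only EDB literals left
```

With `honor_student` defined as `student` and `high_gpa`, and `high_gpa` defined either by `gpa_above_3_5` or by `deans_list`, the query `honor_student(X)` unfolds into two intensional answers: X is a student with a GPA above 3.5, or X is a student on the dean's list.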