11,768 research outputs found

    KARL: A Knowledge-Assisted Retrieval Language

    Get PDF
    Data classification and storage are tasks typically performed by application specialists. In contrast, information users are primarily non-computer specialists who use information in their decision-making and other activities. Interaction efficiency between such users and the computer is often reduced by machine requirements and resulting user reluctance to use the system. This thesis examines the problems associated with information retrieval for non-computer specialist users, and proposes a method for communicating in restricted English that uses knowledge of the entities involved, relationships between entities, and basic English language syntax and semantics to translate the user requests into formal queries. The proposed method includes an intelligent dictionary, syntax and semantic verifiers, and a formal query generator. In addition, the proposed system has a learning capability that can improve portability and performance. With the increasing demand for efficient human-machine communication, the significance of this thesis becomes apparent. As human resources become more valuable, software systems that will assist in improving the human-machine interface will be needed and research addressing new solutions will be of utmost importance. This thesis presents an initial design and implementation as a foundation for further research and development into the emerging field of natural language database query systems

    Techniques for improving efficiency and scalability for the integration of information retrieval and databases

    Get PDF
    PhDThis thesis is on the topic of integration of Information Retrieval (IR) and Databases (DB), with particular focuses on improving efficiency and scalability of integrated IR and DB technology (IR+DB). The main purpose of this study is to develop efficient and scalable techniques for supporting integrated IR and DB technology, which is a popular approach today for handling complex queries over text and structured data. Our specific interest in this thesis is how to efficiently handle queries over large-scale text and structured data. The work is based on a technology that integrates probability theory and relational algebra, where retrievals for text and data are to be expressed in probabilistic logical programs such as probabilistic relational algebra or probabilistic Datalog. To support efficient processing of probabilistic logical programs, we proposed three optimization techniques that focus on aspects covered logical and physical layers, which include: scoring-driven query optimization using scoring expression, query processing with top-k incorporated pipeline, and indexing with relational inverted index. Specifically, scoring expressions are proposed for expressing the scoring or probabilistic semantics of implied scoring functions of PRA expressions, so that efficient query execution plan can be generated by rule-based scoring-driven optimizer. Secondly, to balance efficiency and effectiveness so that to improve query response time, we studied methods for incorporating topk algorithms into pipelined query execution engine for IR+DB systems. Thirdly, the proposed relational inverted index integrates IR-style inverted index and DB-style tuple-based index, which can be used to support efficient probability estimation and aggregation as well as conventional relational operations. Experiments were carried out to investigate the performances of proposed techniques. Experimental results showed that the efficiency and scalability of an IR+DB prototype have been improved, while the system can handle queries efficiently on considerable large data sets for a number of IR tasks

    Compressed materialised views of semi-structured data

    Get PDF
    Query performance issues over semi-structured data have led to the emergence of materialised XML views as a means of restricting the data structure processed by a query. However preserving the conventional representation of such views remains a significant limiting factor especially in the context of mobile devices where processing power, memory usage and bandwidth are significant factors. To explore the concept of a compressed materialised view, we extend our earlier work on structural XML compression to produce a combination of structural summarisation and data compression techniques. These techniques provide a basis for efficiently dealing with both structural queries and valuebased predicates. We evaluate the effectiveness of such a scheme, presenting results and performance measures that show advantages of using such structures

    The caCORE Software Development Kit: Streamlining construction of interoperable biomedical information services

    Get PDF
    BACKGROUND: Robust, programmatically accessible biomedical information services that syntactically and semantically interoperate with other resources are challenging to construct. Such systems require the adoption of common information models, data representations and terminology standards as well as documented application programming interfaces (APIs). The National Cancer Institute (NCI) developed the cancer common ontologic representation environment (caCORE) to provide the infrastructure necessary to achieve interoperability across the systems it develops or sponsors. The caCORE Software Development Kit (SDK) was designed to provide developers both within and outside the NCI with the tools needed to construct such interoperable software systems. RESULTS: The caCORE SDK requires a Unified Modeling Language (UML) tool to begin the development workflow with the construction of a domain information model in the form of a UML Class Diagram. Models are annotated with concepts and definitions from a description logic terminology source using the Semantic Connector component. The annotated model is registered in the Cancer Data Standards Repository (caDSR) using the UML Loader component. System software is automatically generated using the Codegen component, which produces middleware that runs on an application server. The caCORE SDK was initially tested and validated using a seven-class UML model, and has been used to generate the caCORE production system, which includes models with dozens of classes. The deployed system supports access through object-oriented APIs with consistent syntax for retrieval of any type of data object across all classes in the original UML model. The caCORE SDK is currently being used by several development teams, including by participants in the cancer biomedical informatics grid (caBIG) program, to create compatible data services. caBIG compatibility standards are based upon caCORE resources, and thus the caCORE SDK has emerged as a key enabling technology for caBIG. CONCLUSION: The caCORE SDK substantially lowers the barrier to implementing systems that are syntactically and semantically interoperable by providing workflow and automation tools that standardize and expedite modeling, development, and deployment. It has gained acceptance among developers in the caBIG program, and is expected to provide a common mechanism for creating data service nodes on the data grid that is under development

    Extending functional databases for use in text-intensive applications

    Get PDF
    This thesis continues research exploring the benefits of using functional databases based around the functional data model for advanced database applications-particularly those supporting investigative systems. This is a growing generic application domain covering areas such as criminal and military intelligence, which are characterised by significant data complexity, large data sets and the need for high performance, interactive use. An experimental functional database language was developed to provide the requisite semantic richness. However, heavy use in a practical context has shown that language extensions and implementation improvements are required-especially in the crucial areas of string matching and graph traversal. In addition, an implementation on multiprocessor, parallel architectures is essential to meet the performance needs arising from existing and projected database sizes in the chosen application area. [Continues.

    Design and Performance analysis of a relational replicated database systems

    Get PDF
    The hardware organization and software structure of a new database system are presented. This system, the relational replicated database system (RRDS), is based on a set of replicated processors operating on a partitioned database. Performance improvements and capacity growth can be obtained by adding more processors to the configuration. Based on designing goals a set of hardware and software design questions were developed. The system then evolved according to a five-phase process, based on simulation and analysis, which addressed and resolved the design questions. Strategies and algorithms were developed for data access, data placement, and directory management for the hardware organization. A predictive performance analysis was conducted to determine the extent to which original design goals were satisfied. The predictive performance results, along with an analytical comparison with three other relational multi-backend systems, provided information about the strengths and weaknesses of our design as well as a basis for future research

    Towards designing a knowledge-based tutoring system : SQL-tutor as an example

    Get PDF
    A Knowledge-Based Tutoring System, also sometimes called an Intelligent Tutoring System, is a computer based instructional system that uses artificial intelligence techniques to help people learn some subject. The goal of the system is to provide private tutoring to its students based on their different backgrounds, requests, and interests. The system knows what subject materials it should teach, when and how to teach them, and can diagnose the mistakes made by the students and help them correct the mistakes. The major objective of this dissertation is to investigate and develop a generic framework upon which we can build a Knowledge-Based Tutoring System effectively. As an example, we have focused on developing SQL-TUTOR, a tutoring system for teaching SQL concepts and programming skills. The generic architecture of the system is rooted at the popular view that a tutoring process between a tutor (either a human being or a machine) and a student is a knowledge communication process. This process can be divided into a series of communication cycles and each communication cycle consists of four phases, namely, planning, discussing, evaluating, and remedying phases. One major feature of the architecture proposed by us in this dissertation is its curriculum knowledge base which contains the knowledge about the course curriculum, We have developed a representation schema for describing the goal structure of the course, the prerequisite relationships among the course materials, and the multiple views to organize these materials. The inclusion of the curriculum knowledge in a KBTS allows the system to create different curricula for each individual student and to diagnose the student\u27s errors more effectively. The system also provides a group of operators for the student to hand-tailor his/her curricula when he/she starts learning the course. The student can use these operators to select a specific path to go through the course materials, to pick a specific topic from the curricula to study, or to remove a particular topic from the curricula. Since the student can construct his/her own learning plans by these operators, he/she is relatively free to determine how to study the course materials and, as a result, he/she can become more active in the tutoring process. The knowledge about a subject domain is stored in a set of topics and a sample database. The content of a topic consists of a set of related domain concepts. Each concept is described by both natural and formal forms. The relationships among the concepts are modeled a type of semantic network called the context network. The sample database contains a set of sample tables and an enhanced system catalog which contains the knowledge about the name, semantic meanings of the database objects. The built-in Problem Solver of the system allows the system to reason over the networks and the sample database and answer various kinds of questions raised by the student about the domain concepts and their relationships. The knowledge of writing SQL queries is embodied in a set of examples attached to the topics. Each of such an example is carefully designed for one category of SQL query problems. An example in SQL-TUTOR is a packed knowledge chunk which can serve several important teaching purposes, including generating problem descriptions with different levels of details, formulating various SQL solutions for the given problem, explaining these solutions to the student, and evaluating SQL queries written by the student
    • …
    corecore