2,991 research outputs found

    A database management capability for Ada

    Get PDF
    The data requirements of mission critical defense systems have been increasing dramatically. Command and control, intelligence, logistics, and even weapons systems are being required to integrate, process, and share ever increasing volumes of information. To meet this need, systems are now being specified that incorporate data base management subsystems for handling storage and retrieval of information. It is expected that a large number of the next generation of mission critical systems will contain embedded data base management systems. Since the use of Ada has been mandated for most of these systems, it is important to address the issues of providing data base management capabilities that can be closely coupled with Ada. A comprehensive distributed data base management project has been investigated. The key deliverables of this project are three closely related prototype systems implemented in Ada. These three systems are discussed

    File Allocation and Join Site Selection Problem in Distributed Database Systems.

    Get PDF
    There are two important problems associated with the design of distributed database systems. One is the file allocation problem, and the other is the query optimization problem. In this research a methodology that considers both these aspects is developed that determines the optimal location of files and join sites for given queries simultaneously. Using this methodology, three different mixed integer programming models that describe three cases of the file allocation and join site selection problem are developed. Dual-based procedures are developed for each of the three mixed integer programming models. Extensive computational testing is performed which shows that the dual-based algorithms developed are able to generate solutions which are very close to the optimal. Also, these near optimal solutions are found very quickly, even for large scale problems

    Density-Aware Linear Algebra in a Column-Oriented In-Memory Database System

    Get PDF
    Linear algebra operations appear in nearly every application in advanced analytics, machine learning, and of various science domains. Until today, many data analysts and scientists tend to use statistics software packages or hand-crafted solutions for their analysis. In the era of data deluge, however, the external statistics packages and custom analysis programs that often run on single-workstations are incapable to keep up with the vast increase in data volume and size. In particular, there is an increasing demand of scientists for large scale data manipulation, orchestration, and advanced data management capabilities. These are among the key features of a mature relational database management system (DBMS). With the rise of main memory database systems, it now has become feasible to also consider applications that built up on linear algebra. This thesis presents a deep integration of linear algebra functionality into an in-memory column-oriented database system. In particular, this work shows that it has become feasible to execute linear algebra queries on large data sets directly in a DBMS-integrated engine (LAPEG), without the need of transferring data and being restricted by hard disc latencies. From various application examples that are cited in this work, we deduce a number of requirements that are relevant for a database system that includes linear algebra functionality. Beside the deep integration of matrices and numerical algorithms, these include optimization of expressions, transparent matrix handling, scalability and data-parallelism, and data manipulation capabilities. These requirements are addressed by our linear algebra engine. In particular, the core contributions of this thesis are: firstly, we show that the columnar storage layer of an in-memory DBMS yields an easy adoption of efficient sparse matrix data types and algorithms. Furthermore, we show that the execution of linear algebra expressions significantly benefits from different techniques that are inspired from database technology. In a novel way, we implemented several of these optimization strategies in LAPEG’s optimizer (SpMachO), which uses an advanced density estimation method (SpProdest) to predict the matrix density of intermediate results. Moreover, we present an adaptive matrix data type AT Matrix to obviate the need of scientists for selecting appropriate matrix representations. The tiled substructure of AT Matrix is exploited by our matrix multiplication to saturate the different sockets of a multicore main-memory platform, reaching up to a speed-up of 6x compared to alternative approaches. Finally, a major part of this thesis is devoted to the topic of data manipulation; where we propose a matrix manipulation API and present different mutable matrix types to enable fast insertions and deletes. We finally conclude that our linear algebra engine is well-suited to process dynamic, large matrix workloads in an optimized way. In particular, the DBMS-integrated LAPEG is filling the linear algebra gap, and makes columnar in-memory DBMS attractive as efficient, scalable ad-hoc analysis platform for scientists

    An exploration of graph algorithms and graph databases

    Get PDF
    With data becoming larger in quantity, the need for complex, efficient algorithms to solve computationally complex problems has become greater. In this thesis we evaluate a selection of graph algorithms; we provide a novel algorithm for solving and approximating the Longest Simple Cycle problem, as well as providing novel implementations of other graph algorithms in graph database systems.The first area of exploration is finding the Longest Simple Cycle in a graph problem. We propose two methods of finding the longest simple cycle. The first method is an exact approach based on a flow-based Integer Linear Program. The second is a multi-start local search heuristic which uses a simple depth-first search as a basis for a cycle, and improves this with four perturbation operators.Secondly, we focus on implementing the Minimum Dominating Set problem into graph database systems. An unoptimised greedy heuristic solution to the Minimum Dominating Set problem is implemented into a client-server system using a declarative query language, an embedded database system using an imperative query language and a high level language as a direct comparison. The performance of the graph back-end on the database systems is evaluated. The language expressiveness of the query languages is also explored. We identify limitations of the query methods of the database system, and propose a function that increases the functionality of the queries

    An Integrated Engineering-Computation Framework for Collaborative Engineering: An Application in Project Management

    Get PDF
    Today\u27s engineering applications suffer from a severe integration problem. Engineering, the entire process, consists of a myriad of individual, often complex, tasks. Most computer tools support particular tasks in engineering, but the output of one tool is different from the others\u27. Thus, the users must re-enter the relevant information in the format required by another tool. Moreover, usually in the development process of a new product/process, several teams of engineers with different backgrounds/responsibilities are involved, for example mechanical engineers, cost estimators, manufacturing engineers, quality engineers, and project manager. Engineers need a tool(s) to share technical and managerial information and to be able to instantly access the latest changes made by one member, or more, in the teams to determine right away the impacts of these changes in all disciplines (cost, time, resources, etc.). In other words, engineers need to participate in a truly collaborative environment for the achievement of a common objective, which is the completion of the product/process design project in a timely, cost effective, and optimal manner. In this thesis, a new framework that integrates the capabilities of four commercial software, Microsoft Excelâ„¢ (spreadsheet), Microsoft Projectâ„¢ (project management), What\u27s Best! (an optimization add-in), and Visual Basicâ„¢ (programming language), with a state-of-the-art object-oriented database (knowledge medium), InnerCircle2000â„¢ is being presented and applied to handle the Cost-Time Trade-Off problem in project networks. The result was a vastly superior solution over the conventional solution from the viewpoint of data handling, completeness of solution space, and in the context of a collaborative engineering-computation environment

    Static and dynamic semantics of NoSQL languages

    Get PDF
    We present a calculus for processing semistructured data that spans differences of application area among several novel query languages, broadly categorized as "NoSQL". This calculus lets users define their own operators, capturing a wider range of data processing capabilities, whilst providing a typing precision so far typical only of primitive hard-coded operators. The type inference algorithm is based on semantic type checking, resulting in type information that is both precise, and flexible enough to handle structured and semistructured data. We illustrate the use of this calculus by encoding a large fragment of Jaql, including operations and iterators over JSON, embedded SQL expressions, and co-grouping, and show how the encoding directly yields a typing discipline for Jaql as it is, namely without the addition of any type definition or type annotation in the code

    Database system architecture supporting coexisting query languages and data models

    Get PDF
    SIGLELD:D48239/84 / BLDSC - British Library Document Supply CentreGBUnited Kingdo

    SciQL, A query language for science applications

    Get PDF
    Scientific applications are still poorly served by contemporary relational database systems. At best, the system provides a bridge towards an external library using user-defined functions, explicit import/export facilities or linked-in Java/C# interpreters. Time has come to rectify this with SciQL, a SQL-query language for science applications with arrays as first class citizens. It provides a seamless symbiosis of array-, set-, and sequence- interpretation using a clear separation of the mathematical object from its underlying storage representation. The language extends value-based grouping in SQL with structural grouping, i.e., fixed-sized and unbounded groups based on explicit relationships between its index attributes. It leads to a generalization of window-based query processing. The SciQL architecture benefits from a column store system with an adaptive storage scheme, including keeping multiple representations around for reduced impedance mismatch. This paper is focused on the language features, its architectural consequences and extensive examples of its intended use

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters
    • …
    corecore