10 research outputs found

    UNIX as a basis for CAD software : (preprint)

    A Nine Month Progress Report on an Investigation into Mechanisms for Improving Triple Store Performance

    This report considers the requirement for fast, efficient, and scalable triple stores as part of the effort to produce the Semantic Web. It summarises relevant information from the major background field of Database Management Systems (DBMS) and provides an overview of the techniques currently in use in the triple store community. The report concludes that for individuals and organisations to be willing to provide large amounts of information as openly accessible nodes on the Semantic Web, storage and querying of the data must become cheaper and faster than they are currently. Experience from the DBMS field can be used to maximise triple store performance, and suggestions are provided for lines of investigation in the areas of storage, indexing, and query optimisation. Finally, work packages are provided describing expected timetables for further study of these topics.
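
    The storage and indexing themes surveyed in the report can be illustrated with a minimal sketch, not taken from the report itself, of an in-memory triple store that keeps three permutation indexes (SPO, POS and OSP) so that common triple patterns are answered by index lookup rather than a full scan. All class and method names below are hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory triple store with three permutation indexes.

    Keeping SPO, POS and OSP orderings lets any pattern with at least one
    bound term be answered by index lookup instead of scanning all triples,
    which is the basic indexing idea used by many triple stores.
    """

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))  # subject -> predicate -> objects
        self.pos = defaultdict(lambda: defaultdict(set))  # predicate -> object -> subjects
        self.osp = defaultdict(lambda: defaultdict(set))  # object -> subject -> predicates

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def query(self, s=None, p=None, o=None):
        """Yield matching (s, p, o) triples; None acts as a wildcard."""
        if s is not None:
            for pred, objs in self.spo[s].items():
                if p is not None and pred != p:
                    continue
                for obj in objs:
                    if o is None or o == obj:
                        yield (s, pred, obj)
        elif p is not None:
            for obj, subjs in self.pos[p].items():
                if o is not None and obj != o:
                    continue
                for subj in subjs:
                    yield (subj, p, obj)
        elif o is not None:
            for subj, preds in self.osp[o].items():
                for pred in preds:
                    yield (subj, pred, o)
        else:  # fully unbound pattern: full scan
            for subj, preds in self.spo.items():
                for pred, objs in preds.items():
                    for obj in objs:
                        yield (subj, pred, obj)


store = TripleStore()
store.add("alice", "knows", "bob")
store.add("alice", "worksAt", "acme")
print(list(store.query(s="alice")))           # all triples about alice
print(list(store.query(p="knows", o="bob")))  # who knows bob
```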

    High Level Efficiency in Database Languages

    The subject of this Ph.D. thesis is the design and implementation of database languages. The thesis consists of five articles:

    [1] Joan F. Boyar and Kim S. Larsen. Efficient Rebalancing of Chromatic Search Trees. In O. Nurmi and E. Ukkonen, eds., LNCS 621: Algorithm Theory -- SWAT'92, pp. 151-164. Springer-Verlag, 1992.
    [2] Kim S. Larsen. On Aggregation and Computation on Domain Values. PB-414, Computer Science Department, Aarhus University, 1992.
    [3] Kim S. Larsen. Strategies for Expression Evaluation Using Sort-Merge Algorithms. PB-415, Computer Science Department, Aarhus University, 1992.
    [4] Kim S. Larsen and Michael I. Schwartzbach. Injectivity of Unary Queries With Computation on Domain Values. Computer Science Department, Aarhus University, 1992. Revised version of PB-311.
    [5] Kim S. Larsen, Michael I. Schwartzbach and Erik M. Schmidt. A New Formalism for Relational Algebra. IPL, 41(3):163-168, 1992.

    and this survey paper. In [5], a new query language design is proposed. The expressive power of the language is determined in [2] and all reasonable extensions are considered. In [3, 4], we focus on the optimization issue of avoiding unnecessary sorting of relations. The results in these papers are directly applicable to any algebra-based query language. In addition to the query language part, a database system also has to offer update facilities. The theory of standard tuple-based updates is quite well developed in the sequential case. In [1], we discuss a new concurrent implementation of balanced search trees for that purpose. This survey paper describes the results of the papers which form the thesis, and relates these results to each other and to the area in a broader sense than is customary in the introductions of individual papers. The paper is intended to be read in combination with the papers on which it is based.
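
    Papers [3, 4] concern avoiding unnecessary sorting when evaluating algebra expressions with sort-merge algorithms. The sketch below is not code from the thesis; the function name and the dict-based relation representation are invented here. It only illustrates the flavour of that optimisation: a merge join that skips the sort phase for an input already known to be ordered on the join attribute.

```python
def merge_join(left, right, key, left_sorted=False, right_sorted=False):
    """Join two lists of dicts on `key` with a sort-merge strategy.

    The small optimisation shown here -- skipping the sort phase for an input
    already known to be ordered on the join attribute -- is the kind of
    'avoid unnecessary sorting' reasoning the thesis studies in a far more
    general setting.
    """
    if not left_sorted:
        left = sorted(left, key=lambda t: t[key])
    if not right_sorted:
        right = sorted(right, key=lambda t: t[key])

    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lv, rv = left[i][key], right[j][key]
        if lv < rv:
            i += 1
        elif lv > rv:
            j += 1
        else:
            # Emit the cross product of the two equal-key groups.
            j_start = j
            while i < len(left) and left[i][key] == lv:
                j = j_start
                while j < len(right) and right[j][key] == lv:
                    out.append({**left[i], **right[j]})
                    j += 1
                i += 1
    return out


emps = [{"dept": 1, "name": "ada"}, {"dept": 2, "name": "bob"}]
depts = [{"dept": 1, "title": "research"}, {"dept": 2, "title": "ops"}]
# Both inputs are already ordered on "dept", so no sorting is performed.
print(merge_join(emps, depts, "dept", left_sorted=True, right_sorted=True))
```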

    A comparative study of the performance of concurrency control algorithms in a centralised database

    Abstract unavailable; please refer to the PDF.

    Database system architecture supporting coexisting query languages and data models

    SIGLE LD:D48239/84 / BLDSC - British Library Document Supply Centre, GB (United Kingdom)

    Parallel database operations in heterogeneous environments

    In contrast to the traditional notion of a supercomputer, which has many processors connected by a local high-speed computer bus, heterogeneous computing environments rely on "complete" computer nodes (CPU, storage, network interface, etc.) connected to a private or public network by a conventional network interface. Computer networking has evolved over the past three decades and, like many technologies, has grown exponentially in terms of performance, functionality and reliability. At the beginning of the twenty-first century, high-speed, highly reliable Internet connectivity has become as commonplace as electricity, and computing resources have become as standard, in terms of availability and universal use, as electrical power. To use heterogeneous Grids for applications requiring high processing power, researchers have proposed the notion of computational Grids, in which rules are defined for the services offered while the complexity of the Grid organization is hidden from users, so that users find the Grid as easy to use as electrical power. Generally, there is no widely accepted definition of Grids: some researchers define them as high-performance distributed environments, some take their geographically distributed, multi-domain nature into consideration, and others define Grids by the number of resources they unify.

    Parallel database systems have gained an important role in database research over the past two decades due to the necessity of handling large distributed datasets for scientific computing in fields such as bioinformatics, fluid dynamics and high energy physics (HEP). This was connected with the shift from the ultimately unsuccessful development of highly specialized database machines to the use of conventional parallel hardware architectures. Generally, concurrent execution is obtained either through database operator parallelism or through data parallelism: the former executes the operators of a partitioned query execution plan in parallel, while the latter executes the same operation in parallel on partitioned data spread across multiple processors. Parallel database operation algorithms have been well analyzed for sequential processors, and a number of publications have covered this topic, proposing and analyzing such algorithms for parallel database machines. To the best of the author's knowledge, however, no specific analysis has so far been devoted to parallel algorithms with a focus on the particular characteristics of a Grid infrastructure. The specific difference lies in the heterogeneous nature of Grid resources. In a "shared nothing" architecture, as found in classical supercomputers and cluster systems, resources such as processing nodes, disks and network interconnects typically have homogeneous characteristics with regard to performance, access time and bandwidth. In contrast, a Grid architecture comprises heterogeneous resources with differing performance characteristics. The challenge of this research is to discover how to cope with, or exploit, this situation to maximize performance, and to define algorithms that lead to an optimized workflow orchestration.

    To address this challenge, we developed a mathematical model, based on a generalized multiprocessor architecture, to investigate the performance behavior of parallel database operations in heterogeneous environments such as a Grid. We studied the parameters and their influence on performance, as well as the behavior of the algorithms in heterogeneous environments, and discovered that only a small adjustment to the algorithms is necessary to significantly improve performance in heterogeneous environments. A graphical representation of the node configuration and an optimized algorithm for finding the optimal node configuration for the execution of the parallel binary merge sort have been developed. Finally, we confirmed our findings for the new algorithm by implementing it on a service-oriented infrastructure (SODA); this implementation verified both the model and the newly developed, modified algorithms. We also give an outlook on useful extensions to the model, e.g. the use of performance indices, the reliability of nodes, and approaches for the dynamic optimization of workflows.
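
    The thesis's central observation, that a small adjustment to a classical data-parallel algorithm pays off on heterogeneous nodes, can be illustrated with a hedged sketch (hypothetical node speeds and function names; not the author's SODA implementation): sizing data partitions proportionally to each node's speed, instead of splitting them evenly, shrinks the completion time, which is dominated by the slowest node.

```python
def makespan(partition_sizes, node_speeds):
    """Completion time of a data-parallel operation: the slowest node dominates.

    partition_sizes[i] tuples are processed by a node that handles
    node_speeds[i] tuples per unit time.
    """
    return max(size / speed for size, speed in zip(partition_sizes, node_speeds))


def even_split(total, n):
    """Classical homogeneous assumption: every node gets the same share."""
    return [total / n] * n


def speed_proportional_split(total, node_speeds):
    """Small heterogeneous adjustment: each share is proportional to node speed."""
    total_speed = sum(node_speeds)
    return [total * s / total_speed for s in node_speeds]


# Hypothetical Grid of four nodes with very different throughputs.
speeds = [100.0, 80.0, 40.0, 10.0]   # tuples per second
tuples = 1_000_000

even = even_split(tuples, len(speeds))
prop = speed_proportional_split(tuples, speeds)

print(f"even split makespan:         {makespan(even, speeds):10.1f} s")
print(f"speed-proportional makespan: {makespan(prop, speeds):10.1f} s")
```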

    Architecture of a Database System

    Database Management Systems (DBMSs) are ubiquitous in modern computer systems and form an important component of them. They are the product of decades of research and development in both academia and industry. Historically, databases were among the earliest multi-user server systems to be developed, and their study therefore gave rise to many system development techniques for ensuring scalability and reliability that are now applied in many other fields. Although many database algorithms and concepts are widely covered in textbooks, there is little material on the system design issues involved in making a database actually work. This paper discusses database design principles from an architectural perspective, including process models, parallel architecture, storage system design, transaction processing systems, query processing and optimization structures, and representative shared components and applications. Where the industry offers several design alternatives, we take currently successful commercial and open-source systems as the reference standard.
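
    One of the architectural topics listed above, the structure of query processing, is commonly realised as a pull-based iterator ("Volcano"-style) pipeline in which each operator requests rows from its child. The following generic sketch illustrates that model only; it is not code from the paper, and the operator names are invented.

```python
class Scan:
    """Leaf operator: yields rows from an in-memory table."""
    def __init__(self, rows):
        self.rows = rows
    def __iter__(self):
        yield from self.rows


class Select:
    """Filter operator: pulls rows from its child and keeps those matching a predicate."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def __iter__(self):
        for row in self.child:
            if self.predicate(row):
                yield row


class Project:
    """Projection operator: keeps only the requested columns of each row."""
    def __init__(self, child, columns):
        self.child, self.columns = child, columns
    def __iter__(self):
        for row in self.child:
            yield {c: row[c] for c in self.columns}


# SELECT name FROM emp WHERE salary > 50000, composed as a pull-based pipeline.
emp = [{"name": "ada", "salary": 70000}, {"name": "bob", "salary": 40000}]
plan = Project(Select(Scan(emp), lambda r: r["salary"] > 50000), ["name"])
print(list(plan))   # [{'name': 'ada'}]
```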

    P-Pascal : a data-oriented persistent programming language

    Bibliography: pages 187-199. Persistence is measured by the length of time an object is retained and is usable in a system. Persistent languages extend general-purpose languages by providing the full range of persistence for data of any type. Moreover, data which remains on disk after program termination is manipulated in the same way as transient data. As these languages are based on general-purpose programming languages, they tend to be program-centred rather than data-centred. This thesis investigates the inclusion of data-oriented features in a persistent programming language. P-Pascal, a Persistent Pascal, has been designed and implemented to develop techniques for data clustering, metadata maintenance, security enforcement and bulk data management. It introduces type completeness to Pascal and in particular shows how a type-complete set constructor can be provided. This type is shown to be a practical and versatile mechanism for handling bulk data collections in a persistent environment. Relational algebra operators are provided, and the automatic optimisation of set expressions is performed by the compiler and the runtime system. The P-Pascal Abstract Machine incorporates two complementary data placement strategies, automatic updating of type information, and metadata query facilities. The protection of data types, primary (named) objects and their individual components is supported. The challenges and opportunities presented by the persistent store organisation are discussed, and techniques for efficiently exploiting these properties are proposed. We also describe the effects on a data-oriented system of treating persistent and transient data alike, so that they cannot be distinguished statically. We conclude that object clustering, metadata maintenance and security enforcement can and should be incorporated in persistent programming languages. The provision of a built-in, type-complete bulk data constructor and its non-procedural operators is demonstrated. We argue that this approach is preferable to engineering such objects on top of a language, because of greater ease of use and considerable opportunity for automatic optimisation. The existence of such a type does not preclude programmers from constructing their own bulk objects using other types; this is but one advantage of a persistent language over a database system.
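
    The idea of a built-in, type-complete bulk constructor with non-procedural relational operators can be loosely mimicked outside Pascal. The sketch below is only an analogy: P-Pascal provides the set type in the language itself, with optimisation performed by the compiler and run-time system, whereas this class and its method names are invented for illustration.

```python
class Bulk:
    """Tiny illustration of a bulk set type with relational-algebra operators.

    Tuples are stored as hashable ((attribute, value), ...) pairs so that the
    collection behaves like a set, and the operators are non-procedural in the
    sense that callers say *what* to compute, not *how*.
    """

    def __init__(self, tuples=()):
        self.tuples = frozenset(tuples)

    def select(self, predicate):
        return Bulk(t for t in self.tuples if predicate(dict(t)))

    def project(self, *fields):
        return Bulk(tuple((f, dict(t)[f]) for f in fields) for t in self.tuples)

    def join(self, other):
        """Natural join on attributes with equal names and equal values."""
        out = []
        for a in self.tuples:
            for b in other.tuples:
                da, db = dict(a), dict(b)
                common = set(da) & set(db)
                if all(da[k] == db[k] for k in common):
                    out.append(tuple(sorted({**da, **db}.items())))
        return Bulk(out)


emp = Bulk([(("name", "ada"), ("dept", "cs")), (("name", "bob"), ("dept", "ee"))])
dept = Bulk([(("dept", "cs"), ("head", "carol"))])
print(emp.join(dept).project("name", "head").tuples)
```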

    Retrospection on a database system
