18 research outputs found

    Evolving database systems : a persistent view

    Submitted to POS7. This work was supported in St Andrews by EPSRC Grant GR/J67611 "Delivering the Benefits of Persistence". Orthogonal persistence ensures that information will exist for as long as it is useful; to do so, it must be able to evolve with the growing needs of the application systems that use it. This may involve evolution of the data, metadata, programs and applications, as well as of the users' perception of what the information models. The need for evolution is well recognised in the traditional (data processing) database community, and the cost of failing to evolve can be gauged by the resources being invested in interfacing with legacy systems. Zdonik has identified new classes of application, such as scientific, financial and hypermedia applications, that require new approaches to evolution. These applications are characterised by the need to store large amounts of data whose structure must evolve as it is discovered by the applications that use it. This requires that the data be mapped dynamically to an evolving schema. Here, we discuss the problems of evolution in these new classes of application within an orthogonally persistent environment and outline some approaches to these problems. (Postprint)
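The abstract does not say how data is mapped dynamically to an evolving schema. One common technique, lazy (on-read) migration, can be sketched as follows. It assumes that each stored record carries a hypothetical `_v` version tag and that every schema change is expressed as an upgrade function from version n to n + 1. The names `read`, `UPGRADES` and the field names are illustrative only and are not taken from the paper.

```python
from typing import Callable

CURRENT_VERSION = 3

def _add_email(record: dict) -> dict:
    # Hypothetical change v1 -> v2: a new optional field appears.
    return {**record, "email": None}

def _split_name(record: dict) -> dict:
    # Hypothetical change v2 -> v3: "name" is refined into two fields.
    rest = {k: v for k, v in record.items() if k != "name"}
    parts = record["name"].split()
    return {**rest, "given": parts[0], "family": parts[-1]}

# Maps a stored version n to the function that produces version n + 1.
UPGRADES: dict[int, Callable[[dict], dict]] = {1: _add_email, 2: _split_name}

def read(stored: dict) -> dict:
    """Return a stored record converted to the current schema, on access."""
    version = stored.get("_v", 1)
    data = {k: v for k, v in stored.items() if k != "_v"}
    while version < CURRENT_VERSION:
        data = UPGRADES[version](data)  # apply one schema step at a time
        version += 1
    return {**data, "_v": version}

old = {"_v": 1, "name": "Ada Lovelace"}
print(read(old))
```

Old records stay untouched in the store and are upgraded only when read. This lets the schema keep changing without a bulk rewrite, at the cost of running the upgrade chain on every access to a stale record.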

    XML Standardized W3C Tool for effective communication between B2B or B2C Applications

    XML is becoming a standard for communicating information and data over the internet. The shift from SGML to XML has created new demands for managing structured documents. Many XML documents will be transient representations used for data exchange between different types of applications, but there will also be a need for effective means of managing persistent XML data as a database. In this paper we explore the different types of XML documents and XML databases. The purpose of the paper is not to suggest that this technology covers all the features necessary for effective communication of data between applications. Instead, the purpose is to initiate discussion of the requirements arising from document collections, to offer a context in which to evaluate current and future solutions, and to encourage the development of proper models and systems for XML database management. Our discussion addresses issues arising from data modeling, data definition, and data manipulation. In the future, XML will become the standard for communicating data in business-to-business applications, inventory database access and sharing, integration of commercial transactions, and workflow.

    The Evaluation of Content-Based Web Queries

    We introduce the notions of syntactically and semantically structured data to refine the notion of semi-structured data. As we will see, most data found on the Web is syntactically structured. Evaluating content-based Web queries, however, requires semantically structured data. The problem, then, is to transform syntactically structured data into semantically structured data. Both kinds of data can be represented by trees. Our main contribution is a powerful restructuring mechanism that can express the transformation of trees representing syntactically structured data into trees representing semantically structured data. We embed our restructuring mechanism into RAW (Relational Algebra for the Web) and demonstrate its expressiveness with several example queries.
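RAW's restructuring operators are not described in this abstract. The following sketch is only a rough illustration of what "syntactic to semantic" restructuring of a tree can mean. It uses Python's standard `xml.etree.ElementTree` and assumes a hypothetical flat page in which `h2` headings are followed by `p` paragraphs, regrouping them into nested `section` elements. The function `restructure` and the target tag names are invented for the example and are not part of RAW.

```python
import xml.etree.ElementTree as ET

def restructure(flat: ET.Element) -> ET.Element:
    """Group a flat run of <h2>/<p> siblings into nested <section> elements.

    Document order and markup (syntactic structure) become explicit
    topic/content nesting (semantic structure).
    """
    root = ET.Element("document")
    current = None
    for child in flat:
        if child.tag == "h2":
            # A heading opens a new semantic unit named after its text.
            current = ET.SubElement(root, "section", title=(child.text or "").strip())
        elif child.tag == "p" and current is not None:
            # A paragraph belongs to the most recent heading.
            para = ET.SubElement(current, "para")
            para.text = child.text
        # Other tags, and paragraphs before the first heading, are dropped.
    return root

page = ET.fromstring(
    "<body><h2>Prices</h2><p>Apples 2.50</p><p>Pears 3.10</p>"
    "<h2>Opening hours</h2><p>Mon-Fri 9-17</p></body>"
)
semantic = restructure(page)
print(ET.tostring(semantic, encoding="unicode"))
```

After the transformation, a content-based query such as "all paragraphs about prices" becomes a simple path lookup (`section[@title='Prices']/para`) instead of reasoning about sibling order.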

    Efficient storage of XML data

    We introduce NATIX, an efficient, native repository for storing, retrieving and managing tree-structured large objects, preferably XML documents. In contrast to traditional large object (LOB) managers, we do not split at arbitrary byte positions but take the semantics of the underlying tree structure of XML documents into account. Our parameterizable split algorithm dynamically maintains physical records, each smaller than a page, that contain sets of connected tree nodes. This not only improves efficiency by clustering subtrees but also facilitates their compact representation. Existing approaches to storing XML documents either use flat files or map every single tree node onto a separate physical record. The increased flexibility of our approach results in higher efficiency, and performance measurements validate this claim.
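The NATIX split algorithm itself is not given in the abstract. The sketch below only illustrates the underlying idea of packing connected tree nodes into size-bounded records. It uses a simple greedy bottom-up partitioning: whenever an open cluster exceeds the record limit, its largest child clusters are cut off as records of their own. Node sizes and the `limit` parameter, which stands in for the page-size bound, are hypothetical. This is not the paper's algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    size: int  # hypothetical storage cost of the node in bytes
    children: list["Node"] = field(default_factory=list)

def split(root: Node, limit: int) -> list[list[str]]:
    """Partition a tree into records of connected nodes.

    Each record's total size is <= limit, unless a single node alone exceeds it.
    """
    records: list[list[str]] = []

    def visit(node: Node) -> tuple[int, list[str]]:
        # Size and labels of the still-open cluster rooted at `node`.
        pending = [visit(child) for child in node.children]
        total = node.size + sum(size for size, _ in pending)
        # Try the largest child clusters first, so that few cuts are needed.
        pending.sort(key=lambda cluster: cluster[0], reverse=True)
        while total > limit and pending:
            size, labels = pending.pop(0)
            records.append(labels)  # the cut-off child cluster becomes a record
            total -= size
        labels = [node.label] + [lbl for _, group in pending for lbl in group]
        return total, labels

    _, root_labels = visit(root)
    records.append(root_labels)
    return records

doc = Node("article", 10, [
    Node("title", 20),
    Node("section1", 50, [Node("para_a", 40), Node("para_b", 40)]),
    Node("section2", 30),
])
records = split(doc, limit=100)
print(records)  # [['para_a'], ['section1', 'para_b'], ['article', 'section2', 'title']]
```

Every record is a connected subtree fragment, so related nodes end up on the same page. A real storage manager would also need to keep a reference at each cut point so that records can be reconnected. That bookkeeping is omitted here.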

    Automatic Migration of Files into Relational Databases

    In order to provide database-like features for files, particularly for searching Web data, one solution is to migrate file data into a relational database. Once the data is stored, SQL can be used for querying, provided that the data has been given some structure. To this end, an adapter must be implemented that converts data from files into the database. This paper proposes a specification-based automation of this procedure: given a descriptive specification of file contents, the file adapters are generated. An adequate specification language provides powerful concepts for describing the contents of files. In contrast to similar work, directory structures are taken into account, because they often carry useful semantics.
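The paper's specification language is not shown in the abstract. The sketch below illustrates the general idea of generating a file adapter from a declarative description. It assumes a hypothetical specification format with a table name, a regular expression per line, and a mapping from directory levels to columns. It uses the standard `re`, `pathlib` and `sqlite3` modules. `make_adapter` and `SPEC` are invented for illustration.

```python
import re
import sqlite3
from pathlib import PurePosixPath

# Hypothetical specification of a family of files such as
# "contacts/<department>/phones.txt", each line "name;phone".
SPEC = {
    "table": "phone_book",
    "line_pattern": r"(?P<name>[^;]+);(?P<phone>\d+)",
    "dir_columns": {"department": -2},  # path part -2 = the parent directory
}

def make_adapter(spec: dict):
    """Generate an adapter (and its column list) from a file specification."""
    pattern = re.compile(spec["line_pattern"])
    line_columns = sorted(pattern.groupindex, key=pattern.groupindex.get)
    columns = line_columns + list(spec["dir_columns"])
    insert_sql = (
        f"INSERT INTO {spec['table']} ({', '.join(columns)}) "
        f"VALUES ({', '.join('?' for _ in columns)})"
    )

    def adapter(conn: sqlite3.Connection, path: str, text: str) -> None:
        parts = PurePosixPath(path).parts
        # Directory levels contribute column values, e.g. the department name.
        dir_values = [parts[i] for i in spec["dir_columns"].values()]
        for line in text.splitlines():
            match = pattern.fullmatch(line.strip())
            if match:  # lines outside the specification are skipped
                row = [match.group(c) for c in line_columns] + dir_values
                conn.execute(insert_sql, row)

    return adapter, columns

conn = sqlite3.connect(":memory:")
load, columns = make_adapter(SPEC)
conn.execute(f"CREATE TABLE {SPEC['table']} ({', '.join(columns)})")
load(conn, "contacts/sales/phones.txt", "Alice;1234\nBob;5678\n# not a record")
print(conn.execute("SELECT name, department FROM phone_book WHERE phone = '5678'").fetchall())
```

Building the SQL text by string formatting is acceptable here only because table and column names come from the trusted specification. Row values always go through `?` placeholders.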

    Selectively Storing XML Data in Relations

    An efficient and scalable algorithm for clustering XML documents by structure

    Anatomy of a Native XML Base Management System

    Several alternatives exist for managing large XML document collections, ranging from file systems through relational or other database systems to specifically tailored XML repositories. In this paper we give a tour of Natix, a database management system designed from scratch for storing and processing XML data. Contrary to the common belief that managing XML data is just another application for traditional databases such as relational systems, we illustrate how almost every component of a database system is affected in terms of adequacy and performance. We show how to design and optimize areas such as storage, transaction management (including recovery and multi-user synchronisation), and query processing for XML.

    Mathematical and Software Support for Processing Very Large XML Data Sets

    Relevance of the topic: XML is a popular format for transmitting and storing data, and data-exchange standards in many industries are built on it. The topic is relevant because existing XML tools do not support analytical processing of very large data sets. Research goal: to develop tools for the analytical processing of very large collections of XML documents that integrate well with existing data storage systems. To reach this goal, the following tasks were set: develop a model for processing XML data and very large arrays of such data; develop a data transformation method designed for massively parallel execution; develop a software architecture that implements this method; implement software for processing very large XML data sets; and evaluate the effectiveness of the developed method. Research object: very large data sets in XML format. Research subject: methods for processing very large data sets in XML format. Research methods: the dissertation research used methods for processing very large data sets based on massively parallel computation. Scientific novelty: the most significant scientific results of the master's thesis are that a method for processing very large XML data sets that supports analytical queries was created for the first time, and that software using this method was developed. Practical value: the proposed algorithm substantially speeds up the analysis of very large XML data sets. Link to scientific programs: the dissertation was carried out at the Department of Automated Information Processing and Management Systems of the National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" within the topic "Methods and technologies of high-performance computing and processing of very large data sets" (state registration number 0117U000924). Approbation: the main results were presented and discussed at the III All-Ukrainian Scientific and Practical Conference of Young Scientists and Students "Information Systems and Technologies of Management" (ISTU-2019) in a talk titled "Method for processing very large XML data sets".
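The thesis abstract does not describe its method beyond "massively parallel" processing. A common way to structure such analytical processing is the map-reduce pattern: independent chunks of XML are aggregated separately, and the partial results are then merged. The sketch below assumes that the input has already been cut into well-formed chunks of hypothetical `<order region=...><amount>` records. It uses a thread pool for brevity. For CPU-bound parsing at real scale, a process pool or a cluster framework would be the natural choice. This is not the thesis's method.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_chunk(xml_chunk: str) -> Counter:
    """Map step: total the order amounts per region within one chunk."""
    totals: Counter = Counter()
    for order in ET.fromstring(xml_chunk).iter("order"):
        totals[order.get("region")] += float(order.findtext("amount", "0"))
    return totals

def analyse(chunks: list[str], workers: int = 4) -> Counter:
    """Run the map step on all chunks in parallel, then merge (reduce) the partials."""
    result: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(map_chunk, chunks):
            result.update(partial)  # Counter.update adds values key by key
    return result

chunks = [
    "<orders><order region='EU'><amount>10.5</amount></order>"
    "<order region='US'><amount>4</amount></order></orders>",
    "<orders><order region='EU'><amount>1.5</amount></order></orders>",
]
print(analyse(chunks))
```

Because each chunk is parsed independently, memory use is bounded by the chunk size rather than by the whole data set. The reduce step only ever sees small per-region partial totals.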