    Regular Rooted Graph Grammars

    This thesis investigates a pragmatic approach to typing, static analysis and static optimization of Web query languages, in particular the Web query language Xcerpt. The approach is pragmatic in the sense that no restrictions are imposed on the modellable types for reasons of decidability or efficiency; instead, precision is given up where necessary. On the dynamic side, pragmatics means using types not only to ensure the validity of the objects being operated on, but also to influence query selection based on types. A typing language for graph-structured data on the Web is introduced. The graphs considered are so-called rooted graphs, built from a spanning tree and cross-references; the typing language is based on regular tree grammars extended with typed references. Besides ordered data in the spirit of XML, unordered data (in the spirit of the Xcerpt data model or RDF) can be modelled using regular expressions under an unordered interpretation; this approach is new. An operational semantics for ordered and unordered types is given, based on specialized regular tree automata and counting constraints (which in turn are based on Presburger arithmetic formulae). Static type checking and type inference for Xcerpt query and construct terms are introduced, as well as optimization of Xcerpt queries based on schema information.
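    To convey the novel ingredient here, the unordered interpretation of regular expressions, consider a minimal sketch (hypothetical code, not the thesis's automaton construction): once order is ignored, a content model only constrains the multiplicities of child labels, so validation reduces to checking counting constraints, a simple fragment of Presburger arithmetic.

        from collections import Counter

        def satisfies(children, bounds):
            """children: child labels in any order; bounds: label -> (min, max),
            where max=None encodes an unbounded Kleene star."""
            counts = Counter(children)
            if any(label not in bounds for label in counts):
                return False  # undeclared labels are not permitted
            for label, (lo, hi) in bounds.items():
                n = counts[label]
                if n < lo or (hi is not None and n > hi):
                    return False
            return True

        # 'title author* editor?' read unordered: only multiplicities matter.
        model = {"title": (1, 1), "author": (0, None), "editor": (0, 1)}
        print(satisfies(["author", "title", "author"], model))   # True
        print(satisfies(["author", "editor", "editor"], model))  # False

    A full treatment would compile arbitrary regular expressions into linear Presburger formulae over the label counts; fixed per-label bounds as above only cover simple content models.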

    An Integrated Environment For Automated Benchmarking And Validation Of XML-Based Applications

    Testing is the dominant software verification technique used in industry; it is a critical and often the most expensive process during software development. As software complexity increases, the costs of testing rise rapidly. Faced with this problem, many researchers are working on automated testing, attempting to find methods that execute the testing process automatically and cut its cost. Today's software systems are becoming complicated: some are composed of several different components, and some projects even require different systems to work together and support each other. XML has been developed to facilitate data exchange and enhance interoperability among software systems, and along with the development of XML technologies, XML-based systems are widely used in many domains. In this thesis we present a methodology, called XPT (XML-based Partition Testing), for testing XML-based applications automatically by deriving XML instances from an XML Schema automatically and systematically. XPT is inspired by the category-partition method, a well-known approach to black-box test generation. We follow a similar idea of applying partitioning to an XML Schema in order to generate a suite of conforming instances; in addition, since the number of generated instances soon becomes unmanageable, we also introduce a set of heuristics for reducing the suite while optimizing XML Schema coverage. The aim of our research is not only to devise a technical method, but also to apply the XPT methodology in real applications. We have created a proof-of-concept tool, TAXI, which implements XPT. The tool has a graphical user interface that guides and helps testers, and it can be customized for specific applications to build the test environment and automate the whole testing process. The details of TAXI's design and case studies using TAXI in different domains are presented in this thesis. The case studies cover three test purposes: the first is functional correctness, where we apply the methodology to XSLT testing, using TAXI to build an automatic environment for testing XSLT transformations; the second is robustness testing, where we test a data transformation tool that maps and populates data from XML documents into an XML database; and the third is performance testing, where we use TAXI to benchmark XML-based applications.
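    As a rough illustration of the partitioning idea (a hypothetical sketch, not TAXI's actual implementation), each occurrence constraint in a schema can be partitioned into boundary categories, and one choice per element taken from the cross product yields a suite of structurally distinct conforming instances:

        import itertools

        def occurrence_choices(min_occurs, max_occurs):
            """Boundary values for an element's occurrence constraint."""
            return sorted({min_occurs, min(min_occurs + 1, max_occurs), max_occurs})

        def generate_instances(root, elements):
            """elements: (name, minOccurs, maxOccurs) for the root's children."""
            per_element = [occurrence_choices(lo, hi) for _, lo, hi in elements]
            for combo in itertools.product(*per_element):
                body = "".join(f"<{name}/>" * n
                               for (name, _, _), n in zip(elements, combo))
                yield f"<{root}>{body}</{root}>"

        # A toy schema: <order> with 1..3 <item> and 0..1 <note> yields 6 instances.
        for doc in generate_instances("order", [("item", 1, 3), ("note", 0, 1)]):
            print(doc)

    The combinatorial growth visible even in this toy example is what motivates the reduction heuristics mentioned above.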

    Distributed XML Query Processing

    While centralized query processing over collections of XML data stored at a single site is a well-understood problem, centralized query evaluation techniques are inherently limited in their scalability when presented with large collections (or a single, large document) and heavy query workloads. In the context of relational query processing, similar scalability challenges have been overcome by partitioning data collections, distributing them across the sites of a distributed system, and then evaluating queries in a distributed fashion, usually in a way that ensures locality between (sub-)queries and their relevant data. This thesis presents a suite of query evaluation techniques for XML data that follow a similar approach to address the scalability problems encountered by XML query evaluation. Due to the significant differences in data and query models between relational and XML query processing, it is not possible to directly apply distributed query evaluation techniques designed for relational data to the XML scenario; instead, new distributed query evaluation techniques need to be developed. Thus, in this thesis, an end-to-end solution to the scalability problems encountered by XML query processing is proposed. Based on a data partitioning model that supports both horizontal and vertical fragmentation steps (or any combination of the two), XML collections are fragmented and distributed across the sites of a distributed system. Then, a suite of distributed query evaluation strategies is proposed. These query evaluation techniques ensure locality between each fragment of the collection and the parts of the query corresponding to the data in this fragment. Special attention is paid to scalability and query performance, which is achieved by ensuring a high degree of parallelism during distributed query evaluation and by avoiding access to irrelevant portions of the data. For maximum flexibility, the suite of distributed query evaluation techniques proposed in this thesis provides several alternative approaches for evaluating a given query over a given distributed collection. Thus, to achieve the best performance, it is necessary to predict and compare the expected performance of each of these alternatives. In this work, this is accomplished through a query optimization technique based on a distribution-aware cost model. The same cost model is also used to fine-tune the way a collection is fragmented to the demands of the query workload evaluated over this collection. To evaluate the performance impact of the distributed query evaluation techniques proposed in this thesis, the techniques were implemented within a production-quality XML database system. Based on this implementation, a thorough experimental evaluation was performed. The results of this evaluation confirm that the distributed query evaluation techniques introduced here lead to significant improvements in query performance and scalability, both when compared to centralized techniques and when compared to existing distributed query evaluation techniques.
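    The two fragmentation steps can be pictured with a small sketch (illustrative only; the element and attribute names here are invented, not those of the system described): horizontal fragmentation assigns whole documents to sites by a predicate on their content, while vertical fragmentation cuts each tree at a chosen element, leaving behind a proxy that records where the detached fragment lives.

        import xml.etree.ElementTree as ET

        def horizontal(docs, key, num_sites):
            """Assign whole documents to sites by hashing a key value."""
            sites = [[] for _ in range(num_sites)]
            for doc in docs:
                sites[hash(key(doc)) % num_sites].append(doc)
            return sites

        def vertical(root, cut_tag, site):
            """Detach every <cut_tag> subtree, leaving a proxy element behind."""
            fragments = []
            for parent in list(root.iter()):
                for child in list(parent):
                    if child.tag == cut_tag:
                        parent.remove(child)
                        ET.SubElement(parent, "proxy",
                                      {"site": site, "frag": str(len(fragments))})
                        fragments.append(child)
            return root, fragments

        doc = ET.fromstring("<book><title>T</title><chapter><p>x</p></chapter></book>")
        rest, frags = vertical(doc, "chapter", "site-2")
        print(ET.tostring(rest).decode())
        # <book><title>T</title><proxy site="site-2" frag="0" /></book>

    A distributed evaluator can then run the sub-query for each fragment on the site that stores it and join the partial results along the proxies, which is one way to obtain the locality between sub-queries and their data described above.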

    Structural patterns for document engineering: from an empirical bottom-up analysis to an ontological theory

    This thesis aims at investigating a new approach to document analysis based on the idea of structural patterns in XML vocabularies. My work is founded on the belief that authors naturally converge to a reasonable use of markup languages and that extreme, yet valid, instances are rare and limited. Actual documents, therefore, may be used to derive classes of elements (patterns) that persist across documents, distil the conceptualization of the documents and their components, and give ground for automatic tools and services that rely on no background information (such as schemas) at all. The central part of my work consists in introducing from the ground up a formal theory of eight structural patterns (with three sub-patterns) that are able to express the logical organization of any XML document, and in verifying their identifiability in a number of different vocabularies. This model is characterized by and validated against three main dimensions: terseness (i.e. the ability to represent the structure of a document with a small number of objects and composition rules), coverage (i.e. the ability to capture any possible situation in any document) and expressiveness (i.e. the ability to make explicit the semantics of structures, relations and dependencies). An algorithm for the automatic recognition of structural patterns is then presented, together with an evaluation of the results of a test performed on a set of more than 1100 documents from eight very different vocabularies. This language-independent analysis confirms the ability of patterns to capture and summarize the guidelines used by the authors in their everyday practice. Finally, I present some systems that work directly on the pattern-based representation of documents. The ability of these tools to cover very different situations and contexts confirms the effectiveness of the model.
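    The flavour of the recognition step can be conveyed with a deliberately simplified sketch (the category names below are illustrative, not the thesis's taxonomy of eight patterns): merely observing, across a corpus, whether an element ever carries text and whether it ever carries child elements already separates broad structural classes.

        import xml.etree.ElementTree as ET
        from collections import defaultdict

        def classify(has_text, has_children):
            if has_text and has_children:
                return "mixed"        # text interleaved with elements
            if has_children:
                return "structured"   # element content only
            if has_text:
                return "textual"      # text content only
            return "empty"            # neither, e.g. a milestone marker

        def survey(xml_source):
            """Aggregate the observed usage of each element name in a document."""
            usage = defaultdict(lambda: [False, False])  # name -> [text?, children?]
            for el in ET.fromstring(xml_source).iter():
                text = (el.text or "").strip() or any((c.tail or "").strip() for c in el)
                usage[el.tag][0] |= bool(text)
                usage[el.tag][1] |= len(el) > 0
            return {name: classify(t, c) for name, (t, c) in usage.items()}

        print(survey("<sec><p>Hi <b>there</b></p><hr/></sec>"))
        # {'sec': 'structured', 'p': 'mixed', 'b': 'textual', 'hr': 'empty'}

    The actual algorithm refines such observations against the formal composition rules of the pattern theory rather than stopping at these four coarse classes.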

    A Pattern-based Foundation for Language-Driven Software Engineering

    This work brings together two fundamental ideas for modelling, programming and analysing software systems. The first idea is of a methodological nature: engineering software by systematically creating and relating languages. The second idea is of a technical nature: using patterns as a practical foundation for computing. The goal is to show that the systematic creation and layering of languages can be reduced to the elementary operations of pattern matching and instantiation, and that this pattern-based approach provides a formal and practical foundation for language-driven modelling, programming and analysis. The underpinning of the work is a novel formalism for recognising, deconstructing, creating, searching, transforming and generally manipulating data structures. The formalism is based on typed sequences, a generic structure for representing trees. It defines basic pattern expressions for matching and instantiating atomic values and variables. Horizontal, vertical, diagonal and hierarchical operators are different ways of combining patterns. Transformations combine matching and instantiating patterns, and they are patterns themselves. A quasiquotation mechanism allows arbitrary levels of meta-pattern functionality and forms the basis of pattern abstraction. Path-polymorphic operators are used to specify fine-grained search of structures. A range of core concepts such as layering, parsing and pattern-based computing can naturally be defined through pattern expressions. Three language-driven tools that utilise the pattern formalism showcase the applicability of the pattern approach. Concat is a self-sustaining (meta-)programming system in which all computations are expressed by matching and instantiation, including parsing, executing and optimising programs. By applying its language engineering tools to its own meta-language, Concat can extend itself from within. XMF (XML Modeling Framework) is a browser-based modelling and meta-modelling framework that provides flexible means to create and relate modelling languages and to query and validate models. The pattern functionality that makes this possible is partly exposed as a schema language and partly as a JavaScript library. CFR (Channel Filter Rule Language) implements a language-driven approach for layered analysis of communication in complex networked systems. The communication on each layer is made visible in the language of an "abstract protocol" that is defined by communication patterns.
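    The two elementary operations can be sketched in a few lines (a hypothetical rendering over plain nested tuples; the formalism's typed sequences, combination operators and quasiquotation go well beyond this): matching binds variables against a structure, instantiation is its inverse, and a transformation is simply a pair of patterns.

        def match(pattern, value, env=None):
            """Return a variable binding if pattern matches value, else None."""
            env = dict(env or {})
            if isinstance(pattern, str) and pattern.startswith("?"):
                if pattern in env:                  # repeated variables must agree
                    return env if env[pattern] == value else None
                env[pattern] = value
                return env
            if isinstance(pattern, tuple) and isinstance(value, tuple):
                if len(pattern) != len(value):
                    return None
                for p, v in zip(pattern, value):
                    env = match(p, v, env)
                    if env is None:
                        return None
                return env
            return env if pattern == value else None

        def instantiate(pattern, env):
            """The inverse operation: fill a pattern's variables from bindings."""
            if isinstance(pattern, str) and pattern.startswith("?"):
                return env[pattern]
            if isinstance(pattern, tuple):
                return tuple(instantiate(p, env) for p in pattern)
            return pattern

        # A transformation pairs two patterns: match one, instantiate the other.
        env = match(("add", "?x", "?y"), ("add", 1, 2))
        print(instantiate(("sum", "?y", "?x"), env))    # ('sum', 2, 1)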

    Domain specific modeling and analysis

    It is desirable to model software systems in such a way that analysis of the systems, and tool development for such analysis, is readily possible and feasible in the context of large scientific research projects. This thesis emphasizes the methodology that serves as a basis for such developments. I focus on methods for the design of data languages and their corresponding tools.

    Organising knowledge in the age of the semantic web: a study of the commensurability of ontologies

    This study is directed towards the problem of conceptual translation across different data management systems and formats, with a particular focus on those used in the emerging world of the Semantic Web. Increasingly, organisations have sought to connect information sources and services within and beyond their enterprise boundaries, building upon existing Internet facilities to offer improved research, planning, reporting and management capabilities. The Semantic Web is an ambitious response to this growing demand, offering a standards-based platform for sharing, linking and reasoning with information. The imagined result, a globalised knowledge network formed out of mutually referring data structures termed "ontologies", would make possible new kinds of queries, inferences and amalgamations of information. Such a network, though, is premised upon large numbers of manually drawn links between these ontologies. In practice, establishing these links is a complex translation task requiring considerable time and expertise; invariably, as ontologies and other structured information sources are published, many useful connections are neglected. To combat this, in recent years substantial research has been invested into "ontology matching": the exploration of algorithmic approaches for automatically translating or aligning ontologies. These approaches, which exploit the explicit semantic properties of individual concepts, have registered impressive precision and recall results against manually engineered translations. However, they are unable to make use of background cultural information about the overall systems in which those concepts are housed: how those systems are used, for what purpose they were designed, what methodological or theoretical principles underlined their construction, and so on. The present study investigates whether paying attention to these sociological dimensions of electronic knowledge systems could supplement algorithmic approaches in some circumstances. Specifically, it asks whether a holistic notion of commensurability can be useful when aligning or translating between such systems. The first half of the study introduces the problem, surveys the literature, and outlines the general approach. It then proposes both a theoretical foundation and a practical framework for assessing the commensurability of ontologies and other knowledge systems. Chapter 1 outlines the Semantic Web, ontologies and the problem of conceptual translation, and poses the key research questions. Conceptual translation can be treated as, by turns, a social, philosophical, linguistic or technological problem; Chapter 2 surveys a correspondingly wide range of literature and approaches. The methods employed by the study are described in Chapter 3. Chapter 4 critically examines theories of conceptual schemes and commensurability, while Chapter 5 describes the framework itself, comprising a series of specific dimensions, a broad methodological approach, and a means for generating both qualitative and quantitative assessments. The second half of the study then explores the notion of commensurability through several empirical frames. Chapters 6 to 8 apply the framework to a series of case studies. Chapter 6 presents a brief history of knowledge systems, and compares two of these systems: relational databases and Semantic Web ontologies. Chapter 7, in turn, compares several "upper-level" ontologies, reusable schematisations of abstract concepts like Time and Space. Chapter 8 reviews a recent, widely publicised controversy over the standardisation of document formats; this analysis in particular shows how the opaque, dry world of technical specifications can reveal the complex network of social dynamics, interests and beliefs which coordinate and motivate them. Collectively, these studies demonstrate that the framework is useful in making evident the assumptions which motivate the design of different knowledge systems and, further, in assessing the commensurability of those systems. Chapter 9 then presents a further empirical study; here, the framework is implemented as a software system and pilot tested among a small cohort of researchers. Finally, Chapter 10 summarises the argumentative trajectory of the study as a whole (that, broadly, an elaborated notion of commensurability can tease out important and salient features of translation inscrutable to purely algorithmic methods) and suggests some possibilities for further work.