78 research outputs found

    Four Lessons in Versatility or How Query Languages Adapt to the Web

    Exposing not only human-centered information but also machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled new kinds of applications and businesses in which the data is used in ways not foreseen by the data providers. Yet this exposure has fractured the Web into islands of data, each in a different Web format: some providers choose XML, others RDF, and still others JSON or OWL for their data, even in similar domains. This fracturing stifles innovation, as application builders have to cope not with one Web stack (e.g., XML technology) but with several, each of considerable complexity. With Xcerpt we have developed a rule- and pattern-based query language that aims to shield application builders from much of this complexity: in a single query language, XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply to querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards more convenient, yet highly efficient, data access in a “Web of Data”.
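The abstract mentions interval labelings without describing them. As a rough illustration only, the sketch below shows the classic interval labeling for tree-shaped data, where an ancestor test reduces to interval containment. It is not Xcerpt's evaluation algorithm (which the abstract says also covers many RDF graphs); the dictionary-based tree format and the function names `label_intervals` and `is_ancestor` are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy format: node name -> ordered list of child names.
Tree = Dict[str, List[str]]


def label_intervals(tree: Tree, root: str) -> Dict[str, Tuple[int, int]]:
    """Assign each node a (start, end) interval with an iterative depth-first walk.

    A node's interval strictly contains the intervals of all its descendants.
    """
    labels: Dict[str, Tuple[int, int]] = {}
    start: Dict[str, int] = {}
    counter = 0
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if children_done:
            # All descendants have been numbered: close the interval.
            labels[node] = (start[node], counter)
            counter += 1
            continue
        start[node] = counter
        counter += 1
        stack.append((node, True))
        # Push children in reverse so they are visited in document order.
        for child in reversed(tree.get(node, [])):
            stack.append((child, False))
    return labels


def is_ancestor(labels: Dict[str, Tuple[int, int]], anc: str, desc: str) -> bool:
    """True if anc is a proper ancestor of desc (strict interval containment)."""
    anc_start, anc_end = labels[anc]
    desc_start, desc_end = labels[desc]
    return anc_start < desc_start and desc_end < anc_end


if __name__ == "__main__":
    toy = {"doc": ["a", "b"], "a": ["c"]}
    intervals = label_intervals(toy, "doc")
    print(intervals)
    print(is_ancestor(intervals, "doc", "c"), is_ancestor(intervals, "b", "c"))
```

Labeling takes one pass over the tree, and each ancestor test afterwards is two integer comparisons, which is why such labelings are attractive for structural queries.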

    Data Integration on the (Semantic) Web with Rules and Rich Unification

    Over the last decade a multitude of new data formats for the World Wide Web have been developed, and a huge amount of heterogeneous semi-structured data is flourishing online. With the ever-increasing number of documents on the Web, rules have been identified as the means of choice for reasoning about this data and for transforming and integrating it. Query languages such as SPARQL and rule languages such as Xcerpt use compound queries that are matched or unified with semi-structured data. This notion of unification differs from the one known from logic programming engines in that (i) it provides constructs that allow queries to be incomplete in several ways, (ii) variables may have different types, (iii) it results in sets of substitutions for the variables in the query instead of a single substitution, and (iv) subsumption between queries is much harder to decide than in logic programming. This thesis abstracts from Xcerpt query term simulation, SPARQL graph pattern matching and XPath XML document matching, and shows that all of them can be considered as a form of rich unification. Given a set of mappings between substitution sets of different languages, this abstraction opens up the possibility of format-versatile querying, i.e. combining queries in different formats, or transforming one format into another within a single rule. To show the superiority of this approach, this thesis introduces an extension of Xcerpt called Xcrdf, and describes use-cases for the combined querying and integration of RDF and XML data. With XML being the predominant Web format, and RDF the predominant Semantic Web format, Xcrdf extends Xcerpt by a set of RDF query terms and construct terms, including query primitives for RDF containers, collections, and reifications.
    Moreover, Xcrdf includes an RDF path query language called RPL that is more expressive than previously proposed polynomial-time RDF path query languages, but can still be evaluated in polynomial time combined complexity. Besides introducing this framework for data integration based on rich unification, this thesis extends the theoretical knowledge about Xcerpt in several ways: We show that Xcerpt simulation unification is decidable, and give complexity bounds for subsumption in several fragments of Xcerpt query terms. The proof is based on a set of subsumption-monotone query term transformations, and is only feasible because of the injectivity requirement on subterms of Xcerpt queries. The proof gives rise to an algorithm for deciding Xcerpt query term simulation. Moreover, we give a semantics to locally and weakly stratified Xcerpt programs; this semantics is applicable not only to Xcerpt but to any rule language with rich unification, including multi-rule SPARQL programs. Finally, we show how Xcerpt grouping stratification can be reduced to Xcerpt negation stratification, thereby also introducing the notions of local grouping stratification and weak grouping stratification.

    Acta Cybernetica : Volume 23. Number 2.

    Distributed XML Query Processing

    While centralized query processing over collections of XML data stored at a single site is a well-understood problem, centralized query evaluation techniques are inherently limited in their scalability when presented with large collections (or a single, large document) and heavy query workloads. In the context of relational query processing, similar scalability challenges have been overcome by partitioning data collections, distributing them across the sites of a distributed system, and then evaluating queries in a distributed fashion, usually in a way that ensures locality between (sub-)queries and their relevant data. This thesis presents a suite of query evaluation techniques for XML data that follow a similar approach to address the scalability problems encountered by XML query evaluation. Due to the significant differences in data and query models between relational and XML query processing, it is not possible to directly apply distributed query evaluation techniques designed for relational data to the XML scenario. Instead, new distributed query evaluation techniques need to be developed. Thus, in this thesis, an end-to-end solution to the scalability problems encountered by XML query processing is proposed. Based on a data partitioning model that supports both horizontal and vertical fragmentation steps (or any combination of the two), XML collections are fragmented and distributed across the sites of a distributed system. Then, a suite of distributed query evaluation strategies is proposed. These query evaluation techniques ensure locality between each fragment of the collection and the parts of the query corresponding to the data in this fragment. Special attention is paid to scalability and query performance, which are achieved by ensuring a high degree of parallelism during distributed query evaluation and by avoiding access to irrelevant portions of the data.
    For maximum flexibility, the suite of distributed query evaluation techniques proposed in this thesis provides several alternative approaches for evaluating a given query over a given distributed collection. Thus, to achieve the best performance, it is necessary to predict and compare the expected performance of each of these alternatives. In this work, this is accomplished through a query optimization technique based on a distribution-aware cost model. The same cost model is also used to fine-tune the way a collection is fragmented to the demands of the query workload evaluated over this collection. To evaluate the performance impact of the distributed query evaluation techniques proposed in this thesis, the techniques were implemented within a production-quality XML database system. Based on this implementation, a thorough experimental evaluation was performed. The results of this evaluation confirm that the distributed query evaluation techniques introduced here lead to significant improvements in query performance and scalability, both when compared to centralized techniques and when compared to existing distributed query evaluation techniques.

    Child Prime Label Approaches to Evaluate XML Structured Queries

    The adoption of the eXtensible Markup Language (XML) as the standard format to store and exchange semi-structured data has been gaining momentum. The growing number of XML documents leads to the need for appropriate XML querying algorithms that can retrieve XML data efficiently. Due to the importance of twig pattern matching in XML retrieval systems, finding all matching occurrences of a tree pattern query in an XML document is often considered a specific task for XML databases as well as a core operation in XML query processing. This thesis presents the design and implementation of a new indexing technique, called the Child Prime Label (CPL), which exploits the properties of prime numbers to identify Parent-Child (P-C) edges in twig pattern queries (TPQs) during query evaluation. The CPL approach can be incorporated efficiently within existing labelling schemes. The major contributions of this thesis are a set of novel twig matching algorithms which apply the CPL approach and focus on reducing the overhead of storing useless elements and performing unnecessary computations during output enumeration. The research presented here is the first to provide an efficient and general solution for TPQs containing ordering constraints and positional predicates specified by the XML query languages. To evaluate the CPL approaches, the holistic model was implemented as an experimental prototype in which the proposed approaches are compared against state-of-the-art holistic twig algorithms. Extensive performance studies on various real-world and artificial datasets were conducted to demonstrate the significant improvement of the CPL approaches over previous indexing and querying methods. The experimental results demonstrate the validity and improvements of the new algorithms over other related methods on various common subclasses of TPQs.
    Moreover, the scalability tests reveal that the new algorithms are more suitable for processing large XML datasets.
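The abstract does not define how CPL labels are constructed, so the following is only a sketch of a generic prime-number labelling scheme, assuming that each node receives its own distinct prime and that a node's label is its parent's label multiplied by that prime. It shows why primes make parent-child and ancestor-descendant tests cheap. It should not be read as the thesis's CPL definition. The tree format and the names `prime_labels`, `is_parent` and `is_ancestor` are hypothetical.

```python
from itertools import count
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy format: node name -> ordered list of child names.
Tree = Dict[str, List[str]]


def primes() -> Iterator[int]:
    """Yield 2, 3, 5, ... by trial division (adequate for a toy example)."""
    found: List[int] = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def prime_labels(tree: Tree, root: str) -> Dict[str, Tuple[int, int]]:
    """Return {node: (self_prime, label)} with label = parent's label * self_prime."""
    gen = primes()
    labels: Dict[str, Tuple[int, int]] = {}
    stack = [(root, 1)]  # the root has no parent, so its parent label is 1
    while stack:
        node, parent_label = stack.pop()
        self_prime = next(gen)
        labels[node] = (self_prime, parent_label * self_prime)
        for child in reversed(tree.get(node, [])):
            stack.append((child, labels[node][1]))
    return labels


def is_parent(labels: Dict[str, Tuple[int, int]], parent: str, child: str) -> bool:
    """Parent-child test: dividing out the child's own prime yields the parent's label."""
    self_prime, label = labels[child]
    return label // self_prime == labels[parent][1]


def is_ancestor(labels: Dict[str, Tuple[int, int]], anc: str, desc: str) -> bool:
    """Ancestor test: the ancestor's label divides the descendant's label."""
    return anc != desc and labels[desc][1] % labels[anc][1] == 0


if __name__ == "__main__":
    toy = {"doc": ["a", "b"], "a": ["c"]}
    labels = prime_labels(toy, "doc")
    print(labels)
    print(is_parent(labels, "a", "c"), is_parent(labels, "doc", "c"))
```

Because every node contributes a distinct prime, a label is divisible by another label exactly when the other node lies on the path to the root. The cost is that labels grow multiplicatively with depth, which is one reason such schemes are usually combined with other labellings.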

    Refined electrophysiological recording and processing of neural signals from the retina and ascending visual pathways

    The purpose of this thesis was the development of refined methods for recording and processing neural signals of the retina and ascending visual pathways. The first chapter briefly describes the fundamentals of the human visual system and the basics of functional testing of the retina and the visual pathways. The second and third chapters are dedicated to the processing of visual electrophysiological data using the newly developed software ERG Explorer, and present a proposal for an open and standardized data format, ElVisML, for future-proof storage of visual electrophysiological data. The fourth chapter describes the development and application of two novel electrodes: first, a contact lens electrode for recording electrical potentials of the ciliary muscle during accommodation, and second, the marble electrode, which is made of a superabsorbent polymer and allows for preparation-free recording of visual evoked potentials. Results obtained in studies using both electrodes are presented. The fifth and last chapter of the thesis presents the results of four studies within the field of visual electrophysiology. The first study examines the ophthalmological assessment of cannabis-induced perception disorder using electrophysiological methods. The second study presents a refined method for the objective assessment of visual acuity using visual evoked potentials and, for this purpose, introduces a refined stimulus paradigm and a novel method for the analysis of the sweep VEP. The third study presents the results of a newly developed stimulus design for full-field electrophysiology, which allows the assessment of previously non-recordable electroretinograms. The last study describes the relation of the spatial frequency of a visual stimulus to the amplitudes of visual evoked potentials, in comparison to the BOLD response obtained using functional near-infrared spectroscopy and functional magnetic resonance imaging.

    pmSys: Implementation of a digital Player Monitoring System

    A football match can be decided by the smallest of factors, such as mood, but other factors, such as injuries, can determine whether a team places first or second. The teams with the fewest injured players have a better chance of reaching the top each season. Since monitoring in football began, everything has been recorded by hand using pen and paper. In the 21st century, technology has become one of the best and most accurate aids available to any field that relies on monitoring. The ability to process large amounts of data in a split second has proven that going digital is worth the investment when it comes to monitoring. On this basis, pmSys was created to enhance the processing of personal data in real time. In this master's thesis we set out to develop a system that lets both football players and trainers register and follow up on submitted data in real time. By giving a team these tools, we hope to provide the small factor that can push any football team to its limits without going over the edge and into an injury nightmare.

    Intelligent Decision-Making from Medical Records Stored in CDA-HL7 (Toma de decisiones inteligente a partir de registros médicos almacenados en CDA-HL7)

    Owing to the exponential growth of the information stored in organizations, the Information Society is being overtaken by the need for new methods capable of processing information and ensuring its productive use. This naturally extends to hospitals, given the widespread use of clinical records in electronic format. Having systematized information and managing it efficiently and securely is essential to ensuring better health practices. Added to this is the need to support standards that enable exchange between health institutions; HL7 in particular has become one of the most widely used because it supports exchange based on the XML metalanguage. This work proposes a methodology for discovering implicit knowledge in semi-structured clinical records using both their content and their structure. The main results are: (1) a methodology for clustering clinical records; (2) the interpretation of the clustering results to support diagnostic decision-making; (3) an implementation of the HL7 standard for handling medical documents based on CDA. Keywords: clustering, knowledge discovery, EHR, XML, CDA

    A Functional, Comprehensive and Extensible Multi-Platform Querying and Transformation Approach

    This thesis presents a new model querying and transformation approach called FunnyQT, which is realized as a set of APIs and embedded domain-specific languages (DSLs) in the JVM-based functional Lisp dialect Clojure. Founded on a powerful model management API, FunnyQT provides querying services such as comprehensions, quantified expressions, regular path expressions, logic-based relational model querying, and pattern matching. On the transformation side, it supports the definition of unidirectional model-to-model transformations, in-place transformations, bidirectional transformations, and a new kind of co-evolution transformation that allows a model to evolve together with its metamodel. Several properties make FunnyQT unique. Foremost, it is just a Clojure library; thus, FunnyQT queries and transformations are Clojure programs. However, most higher-level services are provided as task-oriented embedded DSLs which use Clojure's powerful macro system to support the user with tailor-made language constructs important for the task at hand. Since queries and transformations are just Clojure programs, they may use any Clojure or Java library for their own purposes, e.g., they may use a templating library for defining model-to-text transformations. Conversely, like every Clojure program, FunnyQT queries and transformations compile to normal JVM byte-code and can easily be called from other JVM languages. Furthermore, FunnyQT is platform-independent and designed with extensibility in mind. By default, it supports the Eclipse Modeling Framework and JGraLab, and support for other modeling frameworks can be added with minimal effort and without having to modify the respective framework's classes or FunnyQT itself. Lastly, because FunnyQT is embedded in a functional language, it has a functional emphasis itself.
    Every query and every transformation compiles to a function which can be passed around, given to higher-order functions, or parametrized with other functions.

    Automated and Effective Security Testing for XML-based Vulnerabilities

    Nowadays, the Extensible Markup Language (XML) is the most commonly used technology in web services for enabling service providers and consumers to exchange data. XML is also widely used to store data and configuration files that control the operation of software systems. Nevertheless, XML suffers from several well-known vulnerabilities such as XML Injections (XMLi). Any exploitation of these vulnerabilities might cause serious and undesirable consequences, e.g., denial of service and accessing or modifying highly confidential data. Fuzz testing techniques have been investigated in the literature to detect XMLi vulnerabilities. However, their success rate tends to be very low since they cannot generate the complex test inputs required for the detection of these vulnerabilities. Furthermore, these approaches are not effective for real-world complex XML-based enterprise systems, which are composed of several components including front-end web applications, XML gateway/firewall, and back-end web services. In this dissertation, we propose several automated security testing strategies for detecting XML-based vulnerabilities. In particular, we tackle the challenges of security testing in an industrial context. Our proposed strategies target various and complementary aspects of security testing for XML-based systems, e.g., test case generation for XML gateway/firewall. The development and evaluation of these strategies have been done in close collaboration with a leading financial service provider in Luxembourg/Switzerland, namely SIX Payment Services (formerly known as CETREL S.A.). SIX Payment Services processes several thousand financial transactions daily, providing a range of financial services, e.g., online payments and the issuing of credit and debit cards. The main research contributions of this dissertation are:
    - A large-scale and systematic experimental assessment for detecting vulnerabilities in numerous widely used XML parsers and the underlying systems using them. In particular, we targeted two common XML parser vulnerabilities: (i) XML Billion Laughs (BIL), and (ii) XML External Entities (XXE).
    - A novel automated testing approach, based on constraint solving and input mutation techniques, to detect XMLi vulnerabilities in XML gateway/firewall and back-end web services.
    - A black-box search-based testing approach to detect XMLi vulnerabilities in front-end web applications. Genetic algorithms are used to search for inputs that can manipulate the application to generate malicious XML messages.
    - An in-depth analysis of various search algorithms and fitness functions, to improve the search-based testing approach for front-end web applications.
    - Extensive evaluations of our proposed testing strategies on numerous real-world industrial web services, XML gateway/firewall, and web applications as well as several open-source systems.
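To make the Billion Laughs (BIL) vulnerability named above more concrete, here is a small sketch of the well-known nested-entity pattern and the arithmetic behind it. It only builds the document text and computes the size a fully expanding parser would have to materialize; it parses nothing and is not taken from the dissertation. The entity names (`lol0`, `lol1`, ...), the function names and the default parameters are assumptions chosen to mirror the commonly cited form of the example.

```python
def billion_laughs_doc(levels: int = 9, fanout: int = 10, seed: str = "lol") -> str:
    """Build the text of a nested-entity XML document (never parsed here)."""
    lines = [
        '<?xml version="1.0"?>',
        "<!DOCTYPE lolz [",
        f' <!ENTITY lol0 "{seed}">',
    ]
    for level in range(1, levels + 1):
        # Each entity refers `fanout` times to the entity one level below it.
        refs = f"&lol{level - 1};" * fanout
        lines.append(f' <!ENTITY lol{level} "{refs}">')
    lines += ["]>", f"<lolz>&lol{levels};</lolz>"]
    return "\n".join(lines)


def expanded_length(levels: int = 9, fanout: int = 10, seed: str = "lol") -> int:
    """Characters produced if the top-level entity were expanded in full."""
    return len(seed) * fanout ** levels


if __name__ == "__main__":
    doc = billion_laughs_doc()
    print(f"document size: {len(doc)} characters")
    print(f"fully expanded size: {expanded_length()} characters")
```

The point of the sketch is the asymmetry: the document stays under a few kilobytes while the expansion grows as `fanout ** levels`, which is why parsers that expand entities without limits are exposed to denial of service.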