24 research outputs found

    Web and Semantic Web Query Languages

    Get PDF
    A number of techniques have been developed to facilitate powerful data retrieval on the Web and Semantic Web. Three categories of Web query languages can be distinguished, according to the format of the data they can retrieve: XML, RDF and Topic Maps. This article introduces the spectrum of languages falling into these categories and summarises their salient aspects. The languages are introduced using common sample data and query types. Key aspects of the query languages considered are stressed in a conclusion

    Survey over Existing Query and Transformation Languages

    Get PDF
    A widely acknowledged obstacle for realizing the vision of the Semantic Web is the inability of many current Semantic Web approaches to cope with data available in such diverging representation formalisms as XML, RDF, or Topic Maps. A common query language is the first step to allow transparent access to data in any of these formats. To further the understanding of the requirements and approaches proposed for query languages in the conventional as well as the Semantic Web, this report surveys a large number of query languages for accessing XML, RDF, or Topic Maps. This is the first systematic survey to consider query languages from all these areas. From the detailed survey of these query languages, a common classification scheme is derived that is useful for understanding and differentiating languages within and among all three areas

    Implementation of Web Query Languages Reconsidered

    Get PDF
    Visions of the next generation Web such as the "Semantic Web" or the "Web 2.0" have triggered the emergence of a multitude of data formats. These formats have different characteristics as far as the shape of data is concerned (for example tree- vs. graph-shaped). They are accompanied by a puzzlingly large number of query languages each limited to one data format. Thus, a key feature of the Web, namely to make it possible to access anything published by anyone, is compromised. This thesis is devoted to versatile query languages capable of accessing data in a variety of Web formats. The issue is addressed from three angles: language design, common, yet uniform semantics, and common, yet uniform evaluation. % Thus it is divided in three parts: First, we consider the query language Xcerpt as an example of the advocated class of versatile Web query languages. Using this concrete exemplar allows us to clarify and discuss the vision of versatility in detail. Second, a number of query languages, XPath, XQuery, SPARQL, and Xcerpt, are translated into a common intermediary language, CIQLog. This language has a purely logical semantics, which makes it easily amenable to optimizations. As a side effect, this provides the, to the best of our knowledge, first logical semantics for XQuery and SPARQL. It is a very useful tool for understanding the commonalities and differences of the considered languages. Third, the intermediate logical language is translated into a query algebra, CIQCAG. The core feature of CIQCAG is that it scales from tree- to graph-shaped data and queries without efficiency losses when tree-data and -queries are considered: it is shown that, in these cases, optimal complexities are achieved. CIQCAG is also shown to evaluate each of the aforementioned query languages with a complexity at least as good as the best known evaluation methods so far. For example, navigational XPath is evaluated with space complexity O(q d) and time complexity O(q n) where q is the query size, n the data size, and d the depth of the (tree-shaped) data. CIQCAG is further shown to provide linear time and space evaluation of tree-shaped queries for a larger class of graph-shaped data than any method previously proposed. This larger class of graph-shaped data, called continuous-image graphs, short CIGs, is introduced for the first time in this thesis. A (directed) graph is a CIG if its nodes can be totally ordered in such a manner that, for this order, the children of any node form a continuous interval. CIQCAG achieves these properties by employing a novel data structure, called sequence map, that allows an efficient evaluation of tree-shaped queries, or of tree-shaped cores of graph-shaped queries on any graph-shaped data. While being ideally suited to trees and CIGs, the data structure gracefully degrades to unrestricted graphs. It yields a remarkably efficient evaluation on graph-shaped data that only a few edges prevent from being trees or CIGs

    Rule-based Methodologies for the Specification and Analysis of Complex Computing Systems

    Full text link
    Desde los orígenes del hardware y el software hasta la época actual, la complejidad de los sistemas de cálculo ha supuesto un problema al cual informáticos, ingenieros y programadores han tenido que enfrentarse. Como resultado de este esfuerzo han surgido y madurado importantes áreas de investigación. En esta disertación abordamos algunas de las líneas de investigación actuales relacionada con el análisis y la verificación de sistemas de computación complejos utilizando métodos formales y lenguajes de dominio específico. En esta tesis nos centramos en los sistemas distribuidos, con un especial interés por los sistemas Web y los sistemas biológicos. La primera parte de la tesis está dedicada a aspectos de seguridad y técnicas relacionadas, concretamente la certificación del software. En primer lugar estudiamos sistemas de control de acceso a recursos y proponemos un lenguaje para especificar políticas de control de acceso que están fuertemente asociadas a bases de conocimiento y que proporcionan una descripción sensible a la semántica de los recursos o elementos a los que se accede. También hemos desarrollado un marco novedoso de trabajo para la Code-Carrying Theory, una metodología para la certificación del software cuyo objetivo es asegurar el envío seguro de código en un entorno distribuido. Nuestro marco de trabajo está basado en un sistema de transformación de teorías de reescritura mediante operaciones de plegado/desplegado. La segunda parte de esta tesis se concentra en el análisis y la verificación de sistemas Web y sistemas biológicos. Proponemos un lenguaje para el filtrado de información que permite la recuperación de informaciones en grandes almacenes de datos. Dicho lenguaje utiliza información semántica obtenida a partir de ontologías remotas para re nar el proceso de filtrado. También estudiamos métodos de validación para comprobar la consistencia de contenidos web con respecto a propiedades sintácticas y semánticas. Otra de nuestras contribuciones es la propuesta de un lenguaje que permite definir y comprobar automáticamente restricciones semánticas y sintácticas en el contenido estático de un sistema Web. Finalmente, también consideramos los sistemas biológicos y nos centramos en un formalismo basado en lógica de reescritura para el modelado y el análisis de aspectos cuantitativos de los procesos biológicos. Para evaluar la efectividad de todas las metodologías propuestas, hemos prestado especial atención al desarrollo de prototipos que se han implementado utilizando lenguajes basados en reglas.Baggi ., M. (2010). Rule-based Methodologies for the Specification and Analysis of Complex Computing Systems [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8964Palanci

    Querying heterogeneous data in an in-situ unified agile system

    Get PDF
    Data integration provides a unified view of data by combining different data sources. In today’s multi-disciplinary and collaborative research environments, data is often produced and consumed by various means, multiple researchers operate on the data in different divisions to satisfy various research requirements, and using different query processors and analysis tools. This makes data integration a crucial component of any successful data intensive research activity. The fundamental difficulty is that data is heterogeneous not only in syntax, structure, and semantics, but also in the way it is accessed and queried. We introduce QUIS (QUery In-Situ), an agile query system equipped with a unified query language and a federated execution engine. It is capable of running queries on heterogeneous data sources in an in-situ manner. Its language provides advanced features such as virtual schemas, heterogeneous joins, and polymorphic result set representation. QUIS utilizes a federation of agents to transform a given input query written in its language to a (set of) computation models that are executable on the designated data sources. Federative query virtualization has the disadvantage that some aspects of a query may not be supported by the designated data sources. QUIS ensures that input queries are always fully satisfied. Therefore, if the target data sources do not fulfill all of the query requirements, QUIS detects the features that are lacking and complements them in a transparent manner. QUIS provides union and join capabilities over an unbound list of heterogeneous data sources; in addition, it offers solutions for heterogeneous query planning and optimization. In brief, QUIS is intended to mitigate data access heterogeneity through query virtualization, on-the-fly transformation, and federated execution. It offers in-Situ querying, agile querying, heterogeneous data source querying, unifeied execution, late-bound virtual schemas, and Remote execution

    Just-in-time Analytics Over Heterogeneous Data and Hardware

    Get PDF
    Industry and academia are continuously becoming more data-driven and data-intensive, relying on the analysis of a wide variety of datasets to gain insights. At the same time, data variety increases continuously across multiple axes. First, data comes in multiple formats, such as the binary tabular data of a DBMS, raw textual files, and domain-specific formats. Second, different datasets follow different data models, such as the relational and the hierarchical one. Data location also varies: Some datasets reside in a central "data lake", whereas others lie in remote data sources. In addition, users execute widely different analysis tasks over all these data types. Finally, the process of gathering and integrating diverse datasets introduces several inconsistencies and redundancies in the data, such as duplicate entries for the same real-world concept. In summary, heterogeneity significantly affects the way data analysis is performed. In this thesis, we aim for data virtualization: Abstracting data out of its original form and manipulating it regardless of the way it is stored or structured, without a performance penalty. To achieve data virtualization, we design and implement systems that i) mask heterogeneity through the use of heterogeneity-aware, high-level building blocks and ii) offer fast responses through on-demand adaptation techniques. Regarding the high-level building blocks, we use a query language and algebra to handle multiple collection types, such as relations and hierarchies, express transformations between these collection types, as well as express complex data cleaning tasks over them. In addition, we design a location-aware compiler and optimizer that masks away the complexity of accessing multiple remote data sources. Regarding on-demand adaptation, we present a design to produce a new system per query. The design uses customization mechanisms that trigger runtime code generation to mimic the system most appropriate to answer a query fast: Query operators are thus created based on the query workload and the underlying data models; the data access layer is created based on the underlying data formats. In addition, we exploit emerging hardware by customizing the system implementation based on the available heterogeneous processors â CPUs and GPGPUs. We thus pair each workload with its ideal processor type. The end result is a just-in-time database system that is specific to the query, data, workload, and hardware instance. This thesis redesigns the data management stack to natively cater for data heterogeneity and exploit hardware heterogeneity. Instead of centralizing all relevant datasets, converting them to a single representation, and loading them in a monolithic, static, suboptimal system, our design embraces heterogeneity. Overall, our design decouples the type of performed analysis from the original data layout; users can perform their analysis across data stores, data models, and data formats, but at the same time experience the performance offered by a custom system that has been built on demand to serve their specific use case

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

    Personalized City Tours - An Extension of the OGC OpenLocation Specification

    Get PDF
    A business trip to London last month , a day visit in Cologne next saturday and romantic weekend in Paris in autumn – this example exhibits one of the central characteristics of today’s tourism. People in the western hemisphere take much pleasure in frequent and repeated short term visits of cities. Every city visitor faces the general problems of where to go and what to see in the diverse microcosm of a metropolis. This thesis presents a framework for the generation of personalized city tours - as extension of the Open Location Specification of the Open Geospatial Consortium. It is founded on context-awareness and personalization while at the same time proposing a combined approach to allow for adaption to the user. This framework considers TimeGeography and its algorithmic implementations to be able to cope with spatio-temporal constraints of a city tour. Traveling salesmen problems - for which a heuristic approache is proposed – are subjacent to the tour generation. To meet the requirements of today’s distributed and heterogeneous computing environments, the tour framework comprises individual services that expose standard-compliant interfaces and allow for integration in service oriented architectures

    A Core Calculus for XQuery 3.0: Combining Navigational and Pattern Matching Approaches

    Get PDF
    International audienceXML processing languages can be classified according to whether they extract XML data by paths or patterns. The strengths of one category correspond to the weaknesses of the other. In this work, we propose to bridge the gap between these two classes by considering two languages, one in each class: XQuery (for path-based extraction) and CDuce (for pattern-based extraction). To this end, we extend CDuce so as it can be seen as a succinct core λ-calculus that captures XQuery 3.0. The extensions we consider essentially allow CDuce to implement XPath-like navigational expressions by pattern matching and precisely type them. The elaboration of XQuery 3.0 into the extended CDuce provides a for-mal semantics and a sound static type system for XQuery 3.0 programs
    corecore