887 research outputs found

    SWI-Prolog and the Web

    Get PDF
    Where Prolog is commonly seen as a component in a Web application that is either embedded or communicates using a proprietary protocol, we propose an architecture where Prolog communicates to other components in a Web application using the standard HTTP protocol. By avoiding embedding in external Web servers development and deployment become much easier. To support this architecture, in addition to the transfer protocol, we must also support parsing, representing and generating the key Web document types such as HTML, XML and RDF. This paper motivates the design decisions in the libraries and extensions to Prolog for handling Web documents and protocols. The design has been guided by the requirement to handle large documents efficiently. The described libraries support a wide range of Web applications ranging from HTML and XML documents to Semantic Web RDF processing. To appear in Theory and Practice of Logic Programming (TPLP)Comment: 31 pages, 24 figures and 2 tables. To appear in Theory and Practice of Logic Programming (TPLP

    When Things Matter: A Data-Centric View of the Internet of Things

    Full text link
    With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy, and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed

    A scalable analysis framework for large-scale RDF data

    Get PDF
    With the growth of the Semantic Web, the availability of RDF datasets from multiple domains as Linked Data has taken the corpora of this web to a terabyte-scale, and challenges modern knowledge storage and discovery techniques. Research and engineering on RDF data management systems is a very active area with many standalone systems being introduced. However, as the size of RDF data increases, such single-machine approaches meet performance bottlenecks, in terms of both data loading and querying, due to the limited parallelism inherent to symmetric multi-threaded systems and the limited available system I/O and system memory. Although several approaches for distributed RDF data processing have been proposed, along with clustered versions of more traditional approaches, their techniques are limited by the trade-off they exploit between loading complexity and query efficiency in the presence of big RDF data. This thesis then, introduces a scalable analysis framework for processing large-scale RDF data, which focuses on various techniques to reduce inter-machine communication, computation and load-imbalancing so as to achieve fast data loading and querying on distributed infrastructures. The first part of this thesis focuses on the study of RDF store implementation and parallel hashing on big data processing. (1) A system-level investigation of RDF store implementation has been conducted on the basis of a comparative analysis of runtime characteristics of a representative set of RDF stores. The detailed time cost and system consumption is measured for data loading and querying so as to provide insight into different triple store implementation as well as an understanding of performance differences between different platforms. (2) A high-level structured parallel hashing approach over distributed memory is proposed and theoretically analyzed. The detailed performance of hashing implementations using different lock-free strategies has been characterized through extensive experiments, thereby allowing system developers to make a more informed choice for the implementation of their high-performance analytical data processing systems. The second part of this thesis proposes three main techniques for fast processing of large RDF data within the proposed framework. (1) A very efficient parallel dictionary encoding algorithm, to avoid unnecessary disk-space consumption and reduce computational complexity of query execution. The presented implementation has achieved notable speedups compared to the state-of-art method and also has achieved excellent scalability. (2) Several novel parallel join algorithms, to efficiently handle skew over large data during query processing. The approaches have achieved good load balancing and have been demonstrated to be faster than the state-of-art techniques in both theoretical and experimental comparisons. (3) A two-tier dynamic indexing approach for processing SPARQL queries has been devised which keeps loading times low and decreases or in some instances removes intermachine data movement for subsequent queries that contain the same graph patterns. The results demonstrate that this design can load data at least an order of magnitude faster than a clustered store operating in RAM while remaining within an interactive range for query processing and even outperforms current systems for various queries

    Fast Iterative Graph Computation: A Path Centric Approach

    Full text link
    Abstract—Large scale graph processing represents an inter-esting challenge due to the lack of locality. This paper presents PathGraph for improving iterative graph computation on graphs with billions of edges. Our system design has three unique features: First, we model a large graph using a collection of tree-based partitions and use an path-centric computation rather than vertex-centric or edge-centric computation. Our parallel computation model significantly improves the memory and disk locality for performing iterative computation algorithms. Second, we design a compact storage that further maximize sequential access and minimize random access on storage media. Third, we implement the path-centric computation model by using a scatter/gather programming model, which parallels the iterative computation at partition tree level and performs sequential updates for vertices in each partition tree. The experimental results show that the path-centric approach outperforms vertex-centric and edge-centric systems on a number of graph algorithms for both in-memory and out-of-core graphs

    High throughput indexing for large-scale semantic web data

    Get PDF
    Distributed RDF data management systems become increasingly important with the growth of the Semantic Web. Currently, several such systems have been proposed, however, their indexing methods meet performance bottlenecks either on data loading or querying when processing large amounts of data. In this work, we propose a high throughout index to enable rapid analysis of large datasets. We adopt a hybrid structure to combine the loading speed of similar-size based methods with the execution speed of graph-based approaches, using dynamic data repartitioning over query workloads. We introduce the design and detailed implementation of our method. Experimental results show that the proposed index can indeed vastly improve loading speeds while remaining competitive in terms of performance. Therefore, the method could be considered as a good choice for RDF analysis in large-scale distributed scenarios

    ClioPatria: A SWI-Prolog Infrastructure for the Semantic Web

    Get PDF
    ClioPatria is a comprehensive semantic web development framework based on SWI-Prolog. SWI-Prolog provides an efficient C-based main-memory RDF store that is designed to cooperate naturally and efficiently with Prolog, realizing a flexible RDF-based environment for rule based programming. ClioPatria extends this core with a SPARQL and LOD server, an extensible web frontend to manage the server, browse the data, query the data using SPARQL and Prolog and a Git-based plugin manager. The ability to query RDF using Prolog provides query composition and smooth integration with application logic. ClioPatria is primarily positioned as a prototyping platform for exploring novel ways of reasoning with RDF data. It has been used in several research projects in order to perform tasks such as data integration and enrichment and semantic search

    Database architecture evolution: Mammals flourished long before dinosaurs became extinct

    Get PDF
    The holy grail for database architecture research is to find a solution that is Scalable & Speedy, to run on anything from small ARM processors up to globally distributed compute clusters, Stable & Secure, to service a broad user community, Small & Simple, to be comprehensible to a small team of programmers, Self-managing, to let it run out-of-the-box without hassle. In this paper, we provide a trip report on this quest, covering both past experiences, ongoing research on hardware-conscious algorithms, and novel ways towards self-management specifically focused on column store solutions

    When things matter: A survey on data-centric Internet of Things

    Get PDF
    With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, but several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy and continuous. This paper reviews the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed

    MonetDB: Two Decades of Research in Column-oriented Database Architectures

    Get PDF
    MonetDB is a state-of-the-art open-source column-store database management system targeting applications in need for analytics over large collections of data. MonetDB is actively used nowadays in health care, in telecommunications as well as in scientific databases and in data management research, accumulating on average more than 10,000 downloads on a monthly basis. This paper gives a brief overview of the MonetDB technology as it developed over the past two decades and the main research highlights which drive the current MonetDB design and form the basis for its future evolution
    corecore