1,226 research outputs found

    Scalable RDF Data Compression using X10

    The Semantic Web comprises enormous volumes of semi-structured data elements. For interoperability, these elements are represented by long strings. Such representations are not efficient for the purposes of Semantic Web applications that perform computations over large volumes of information. A typical method for alleviating the impact of this problem is through the use of compression methods that produce more compact representations of the data. The use of dictionary encoding for this purpose is particularly prevalent in Semantic Web database systems. However, centralized implementations present performance bottlenecks, giving rise to the need for scalable, efficient distributed encoding schemes. In this paper, we describe an encoding implementation based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate performance on a cluster of up to 384 cores and datasets of up to 11 billion triples (1.9 TB). Compared to the state-of-the-art MapReduce algorithm, we demonstrate a speedup of 2.6-7.4x and excellent scalability. These results illustrate the strong potential of the APGAS model for efficient implementation of dictionary encoding and contribute to the engineering of larger-scale Semantic Web applications.
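    To make the dictionary-encoding idea concrete, the following is a minimal, single-process Python sketch of hash-partitioned encoding in the spirit of a partitioned global address space: each RDF term is owned by exactly one partition, and only that partition assigns its integer ID. The partition count, the ID layout, and all names are illustrative assumptions, not the paper's X10 implementation.

```python
# Minimal sketch of hash-partitioned dictionary encoding for RDF terms.
# Each term is owned by exactly one partition ("place"), which assigns a
# locally unique counter; the global ID packs (partition, local counter).
# Partition count and ID layout are illustrative assumptions.

NUM_PARTITIONS = 4  # stands in for the number of parallel "places"

class Partition:
    def __init__(self, index):
        self.index = index
        self.term_to_id = {}   # dictionary local to this partition
        self.next_local = 0

    def encode(self, term):
        """Return the global ID for `term`, assigning a new one if needed."""
        local = self.term_to_id.get(term)
        if local is None:
            local = self.next_local
            self.term_to_id[term] = local
            self.next_local += 1
        # Pack the owning partition into the high bits of the global ID.
        return (self.index << 40) | local

partitions = [Partition(i) for i in range(NUM_PARTITIONS)]

def owner(term):
    """Route a term to the partition that owns it (hash partitioning,
    stable within a single run of the process)."""
    return partitions[hash(term) % NUM_PARTITIONS]

def encode_triple(s, p, o):
    """Encode one RDF triple of long strings into three integer IDs."""
    return tuple(owner(t).encode(t) for t in (s, p, o))

if __name__ == "__main__":
    t = ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows",
         "http://example.org/bob")
    print(encode_triple(*t))   # the same terms always map to the same IDs
    print(encode_triple(*t))
```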

    Towards Exascale Scientific Metadata Management

    Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still needs to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, a ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling, and neuroscience, and then discuss research challenges and possible solutions.
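    As a toy illustration of point (2) above, storing metadata inside the dataset itself so that it travels with the data and can be queried through a standard access interface, the sketch below attaches descriptive and provenance attributes to an HDF5 dataset using h5py. The choice of HDF5, the file name, and every attribute key are assumptions made for illustration; they are not the infrastructure proposed in the paper.

```python
# Toy illustration: keep metadata inside the dataset so that
# metadata-oblivious tools still carry it along, and standard data access
# interfaces can query it. HDF5/h5py and all names are assumptions.
import h5py
import numpy as np

with h5py.File("simulation_output.h5", "w") as f:
    dset = f.create_dataset("temperature", data=np.random.rand(64, 64))
    # Descriptive and provenance metadata stored next to the data itself.
    dset.attrs["units"] = "K"
    dset.attrs["produced_by"] = "hypothetical_plasma_run_042"
    dset.attrs["grid_resolution"] = (64, 64)

# Any consumer using the same standard interface can read the metadata back.
with h5py.File("simulation_output.h5", "r") as f:
    print(dict(f["temperature"].attrs))
```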

    Efficient Parallel Dictionary Encoding for RDF Data.

    The Semantic Web comprises enormous volumes of semi-structured data elements. For interoperability, these elements are represented by long strings. Such representations are not efficient for the purposes of Semantic Web applications that perform computations over large volumes of information. A typical method for alleviating the impact of this problem is through the use of compression methods that produce more compact representations of the data. The use of dictionary encoding for this purpose is particularly prevalent in Semantic Web database systems. However, centralized implementations present performance bottlenecks, giving rise to the need for scalable, efficient distributed encoding schemes. In this paper, we describe a straightforward but very efficient encoding algorithm and evaluate its performance on a cluster of up to 384 cores and datasets of up to 11 billion triples (1.9 TB). Compared to the state-of-the-art MapReduce algorithm, we demonstrate a speedup of 2.6-7.4x and excellent scalability.
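    As a complementary sketch of a straightforward parallel scheme, the Python below encodes triples in two passes: workers extract the distinct terms of their chunk in parallel, a coordinator assigns dense integer IDs, and the workers then rewrite their chunks against the shared dictionary. The chunking, pool size, and two-pass structure are illustrative assumptions, not necessarily the algorithm evaluated in the paper.

```python
# Sketch of a straightforward two-pass parallel encoding scheme:
# pass 1 collects the distinct terms of each chunk in parallel, a
# coordinator assigns dense integer IDs, and pass 2 rewrites each chunk
# using the shared dictionary. Chunking and pool size are assumptions.
from concurrent.futures import ProcessPoolExecutor

def distinct_terms(chunk):
    """Pass 1: the set of terms appearing in one chunk of triples."""
    return {term for triple in chunk for term in triple}

def encode_chunk(args):
    """Pass 2: rewrite one chunk of string triples as integer triples."""
    chunk, dictionary = args
    return [tuple(dictionary[t] for t in triple) for triple in chunk]

if __name__ == "__main__":
    triples = [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:bob", "foaf:knows", "ex:carol"),
        ("ex:alice", "foaf:name", '"Alice"'),
    ]
    chunks = [triples[:2], triples[2:]]          # simulate a partitioned input

    with ProcessPoolExecutor() as pool:
        term_sets = list(pool.map(distinct_terms, chunks))
        dictionary = {t: i for i, t in enumerate(sorted(set().union(*term_sets)))}
        encoded = list(pool.map(encode_chunk, [(c, dictionary) for c in chunks]))

    print(dictionary)
    print(encoded)
```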

    Productive Development of Scalable Network Functions with NFork

    Despite decades of research, developing correct and scalable concurrent programs is still challenging. Network functions (NFs) are not an exception. This paper presents NFork, a system that helps NF domain experts to productively develop concurrent NFs by abstracting away concurrency from developers. The key scheme behind NFork's design is to exploit NF characteristics to overcome the limitations of prior work on concurrency programming. Developers write NFs as sequential programs, and during runtime, NFork performs transparent parallelization by processing packets on different cores. Exploiting NF characteristics, NFork leverages transactional memory and efficient concurrent data structures to achieve scalability and guarantee the absence of concurrency bugs. Since NFork manages concurrency, it further provides (i) a profiler that reveals the root causes of scalability bottlenecks inherent to the NF's semantics and (ii) actionable recipes for developers to mitigate these root causes by relaxing the NF's semantics. We show that NFs developed with NFork achieve competitive scalability with those in Cisco VPP [16], and that NFork's profiler and recipes can effectively aid developers in optimizing NF scalability.
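    As a toy illustration of the general pattern of writing an NF sequentially and parallelizing it at runtime, the sketch below shards packets across worker threads by flow hash so that each worker runs the unmodified sequential code on disjoint state. This is not NFork's mechanism (which relies on transactional memory and purpose-built concurrent data structures); the packet representation and every name here are assumptions.

```python
# Toy illustration only: a per-flow packet counter written as sequential
# code, parallelized by sharding packets across workers by flow hash so
# that each worker touches disjoint state. All names are assumptions.
import queue
import threading
from collections import defaultdict

NUM_WORKERS = 4

def sequential_nf(state, packet):
    """The NF as the developer writes it: plain sequential code."""
    flow = (packet["src"], packet["dst"])
    state[flow] += 1

def worker(inbox, state):
    while True:
        packet = inbox.get()
        if packet is None:            # shutdown marker
            break
        sequential_nf(state, packet)

if __name__ == "__main__":
    inboxes = [queue.Queue() for _ in range(NUM_WORKERS)]
    states = [defaultdict(int) for _ in range(NUM_WORKERS)]
    threads = [threading.Thread(target=worker, args=(q, s))
               for q, s in zip(inboxes, states)]
    for t in threads:
        t.start()

    def dispatch(packet):
        """Route each packet to the single worker that owns its flow."""
        shard = hash((packet["src"], packet["dst"])) % NUM_WORKERS
        inboxes[shard].put(packet)

    for i in range(1000):
        dispatch({"src": f"10.0.0.{i % 8}", "dst": "10.0.1.1"})

    for q in inboxes:
        q.put(None)                   # tell every worker to stop
    for t in threads:
        t.join()
    print(sum(len(s) for s in states), "distinct flows seen")
```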

    A new approach to reversible computing with applications to speculative parallel simulation

    In this thesis, we propose an innovative approach to reversible computing that shifts the focus from the operations to the memory outcome of a generic program. This choice allows us to overcome some typical challenges of "plain" reversible computing. Our methodology is to instrument a generic application with the help of an instrumentation tool, namely Hijacker, which we have redesigned and developed for this purpose. Through compile-time instrumentation, we enhance the program's code to keep track of the memory trace it produces until the end of its execution. Regardless of the complexity behind the generation of each computational step of the program, we can build inverse machine instructions simply by inspecting the instruction that is attempting to write some value to memory. From this information, we craft an ad hoc instruction that carries the old value and the location where it must be restored. This instruction becomes part of a more comprehensive structure, namely the reverse window. Through this structure, we have sufficient information to cancel all the updates made by the generic program during its execution. We discuss the structure of the reverse window as the building block of the whole reversing framework we designed and realized. Although we develop our solution in the specific context of parallel discrete event simulation (PDES) adopting the Time Warp synchronization protocol, this framework paves the way for further general-purpose development and adoption. We also present two additional contributions stemming from our reversibility approach, both of which still embrace the traditional state-saving-based rollback strategy. The first contribution aims to harness the advantages of both approaches: we implement the rollback operation by combining state saving with our reversible support through a mathematical model, which enables the system to autonomously choose the best rollback strategy according to the changing runtime dynamics of the program. The second contribution explores an orthogonal direction, still related to reversible computing. In particular, we address the problem of reversing shared libraries. By their nature, shared objects are visible to the whole system, and so is every external modification of their code. As a consequence, it is not possible to instrument them without affecting other, unaware applications. We propose a different method to deal with the instrumentation of shared objects. All our proposals have been assessed using the latest generation of the open-source ROOT-Sim PDES platform, into which we integrated our solutions. ROOT-Sim is a C-based package implementing a general-purpose simulation environment based on the Time Warp synchronization protocol.
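    A minimal Python sketch of the reverse-window idea described above: before an instrumented write overwrites a memory location, the location and its old value are recorded, and undoing the window replays those records in reverse order. Memory is modeled as a plain dictionary and all names are illustrative; the thesis operates at the machine-instruction level through Hijacker.

```python
# Minimal sketch of the reverse-window idea: before each write, record the
# destination and its old value; undoing replays the records in reverse.
# Memory is modeled as a dict address -> value; all names are illustrative.

memory = {0x10: 7, 0x20: 3}

class ReverseWindow:
    def __init__(self):
        self._undo_log = []   # (address, old_value) pairs, in write order

    def instrumented_write(self, address, value):
        """What an instrumented store does: save the old value, then write."""
        self._undo_log.append((address, memory.get(address)))
        memory[address] = value

    def undo(self):
        """Cancel every update of this window, most recent first."""
        for address, old_value in reversed(self._undo_log):
            if old_value is None:
                memory.pop(address, None)   # the location did not exist before
            else:
                memory[address] = old_value
        self._undo_log.clear()

window = ReverseWindow()
window.instrumented_write(0x10, 99)     # a speculative event updates memory
window.instrumented_write(0x30, 1)
print(memory)                            # {16: 99, 32: 3, 48: 1}
window.undo()                            # rollback, e.g. on a Time Warp anti-message
print(memory)                            # {16: 7, 32: 3}
```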

    Improving Key-Value Database Scalability with Lazy State Determination

    Applications keep demanding higher throughput and lower response times from database systems. Databases leverage concurrency, by using both multiple computer systems (nodes) and the multiple cores available in each node, to execute multiple requests (transactions) concurrently. Executing multiple transactions concurrently requires coordination, which is ensured by the database concurrency control (CC) module. However, excessive control/limitation of concurrency by the CC module negatively impacts the overall performance (latency and throughput) of the database system. The performance limitations imposed by the database CC module can be addressed by exploring new hardware, or by leveraging software-based techniques such as futures and lazy evaluation of transactions. This is where Lazy State Determination (LSD) shines [43, 42]. LSD proposes a new transactional API that decreases the conflicts between concurrent transactions by enabling the use of futures in both SQL and key-value database systems. The use of futures allows LSD to better capture the application semantics and to make more informed decisions on what really constitutes a conflict. These two key insights come together to create a system that provides high throughput in high-contention scenarios. Our work builds on top of a previous LSD prototype. We identified and diagnosed its shortcomings, and devised and implemented a new prototype that addresses them. We validated our new LSD system and evaluated its behaviour and performance by comparing it with the original prototype. Our evaluation showed that the throughput of the new LSD prototype is 3.7× to 4.9× higher, in centralized and distributed settings respectively, while also reducing latency by up to 10×. With this work, we provide an LSD-based key-value database system that has better vertical and horizontal scalability and can take advantage of systems with higher core counts or larger numbers of nodes, in centralized and distributed settings, respectively.
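    To illustrate the futures idea at a very high level: instead of installing a concrete value, a transaction can register a deferred computation that is only resolved at commit time, so two transactions updating the same key need not conflict on the value they would otherwise have to read first. The sketch below is a toy model under that assumption, not LSD's actual transactional API.

```python
# Toy model of deferring writes as futures: a transaction registers a
# function of the current value instead of a concrete value, and the
# store resolves all pending futures at commit time. This illustrates
# the general idea only; it is not LSD's actual transactional API.

class FutureStore:
    def __init__(self):
        self._data = {}
        self._pending = []    # (key, update_fn) pairs awaiting commit

    def put_future(self, key, update_fn):
        """Register a deferred write; no value is read or fixed yet."""
        self._pending.append((key, update_fn))

    def commit(self):
        """Resolve every future against the current state, in order."""
        for key, update_fn in self._pending:
            self._data[key] = update_fn(self._data.get(key, 0))
        self._pending.clear()

    def get(self, key):
        return self._data.get(key, 0)

store = FutureStore()
# Two "transactions" increment the same counter; because each registers a
# future rather than a read-then-write value, neither needs to observe the
# other's intermediate state, so they do not conflict on that read.
store.put_future("page_views", lambda v: v + 1)
store.put_future("page_views", lambda v: v + 1)
store.commit()
print(store.get("page_views"))    # 2
```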