9 research outputs found

    Predicting large scale fine grain energy consumption

    Today, large volumes of energy-related data are continuously collected. Extracting actionable knowledge from such data is a multi-step process that opens up a variety of interesting and novel research issues across two domains: energy and computer science. The computer science aim is to provide energy scientists with cutting-edge and scalable engines to effectively support them in their daily research activities. This paper presents SPEC, a scalable and distributed predictor of fine grain energy consumption in buildings. SPEC applies a data stream analysis methodology over a sliding time window to train a prediction model tailored to each building. The building model is then used to predict the upcoming energy consumption at a time instant in the near future. SPEC currently integrates the artificial neural networks technique and the random forest regression algorithm. The SPEC methodology exploits the computational advantages of distributed computing frameworks, as the current implementation runs on Spark. As a case study, real data of thermal energy consumption collected in a major city have been used to preliminarily assess the accuracy of SPEC. The initial results are promising and represent a first step towards predicting fine grain energy consumption over a sliding time window.
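
The sliding-window retraining loop described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: a least-squares AR(1) fit stands in for the neural network and random forest models, the code is sequential rather than running on Spark, and the names `fit_ar1` and `sliding_window_forecasts` are hypothetical, not part of SPEC.

```python
from collections import deque


def fit_ar1(window):
    """Least-squares fit of y[t] = a * y[t-1] + b over one window.

    Stand-in model (assumption): SPEC itself uses neural networks
    and random forest regression.
    """
    xs = window[:-1]
    ys = window[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # No variation in the inputs: fall back to predicting the mean.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x


def sliding_window_forecasts(readings, window_size):
    """Retrain on the latest `window_size` readings, then predict the next one."""
    if window_size < 2:
        raise ValueError("window_size must be at least 2")
    window = deque(maxlen=window_size)  # old readings drop out automatically
    forecasts = []
    for value in readings:
        window.append(value)
        if len(window) == window_size:
            a, b = fit_ar1(list(window))
            forecasts.append(a * window[-1] + b)
    return forecasts
```

In a per-building setting, one such window (and one fitted model) would be kept for each building identifier.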

    PyRQA -- Conducting Recurrence Quantification Analysis on Very Long Time Series Efficiently

    PyRQA is a software package that efficiently conducts recurrence quantification analysis (RQA) on time series consisting of more than one million data points. RQA is a method from non-linear time series analysis that quantifies the recurrent behaviour of systems. Existing implementations of RQA either cannot analyse such very long time series at all or require large amounts of time to calculate the quantitative measures. PyRQA overcomes their limitations by conducting the RQA computations in a highly parallel manner. Building on the OpenCL framework, PyRQA leverages the computing capabilities of a variety of parallel hardware architectures, such as GPUs. The underlying computing approach partitions the RQA computations and makes it possible to employ multiple compute devices at the same time. The goal of this publication is to demonstrate the features and the runtime efficiency of PyRQA. For this purpose we employ a real-world example, comparing the dynamics of two climatological time series, and a synthetic example, reducing the runtime of the analysis of a series consisting of over one million data points from almost eight hours using state-of-the-art RQA software to roughly 69 seconds using PyRQA. Comment: 15 pages, 3 figures
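
To make the idea of partitioned RQA computations concrete, here is a minimal pure-Python sketch of one RQA measure, the recurrence rate, computed tile by tile over the recurrence matrix. It does not use the PyRQA API or OpenCL. The function names, the tile size, the absence of time-delay embedding, and the choice to count the main diagonal are all assumptions made for illustration.

```python
def recurrence_count_tile(series, rows, cols, radius):
    """Count recurrence points inside one tile of the recurrence matrix.

    A pair (i, j) is recurrent when the two values differ by at most `radius`.
    """
    return sum(
        1
        for i in rows
        for j in cols
        if abs(series[i] - series[j]) <= radius
    )


def recurrence_rate(series, radius, tile=2):
    """Fraction of recurrent pairs, accumulated over independent tiles.

    Each tile could in principle be handed to a different compute device;
    here the tiles are simply processed one after another.
    """
    n = len(series)
    starts = range(0, n, tile)
    total = 0
    for r in starts:
        for c in starts:
            rows = range(r, min(r + tile, n))
            cols = range(c, min(c + tile, n))
            total += recurrence_count_tile(series, rows, cols, radius)
    return total / (n * n)
```

Because the tiles do not depend on one another, only the per-tile counts need to be combined at the end, which is what makes this kind of computation easy to spread across devices.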

    Finite Automata Algorithms in Map-Reduce

    In this thesis the intersection of several large nondeterministic finite automata (NFAs) as well as the minimization of a large deterministic finite automaton (DFA) in map-reduce are studied. We have derived a lower bound on the replication rate for computing NFA intersections and provided three concrete algorithms for the problem. Our investigation of the replication rate for each of the three algorithms, through detailed experiments on large datasets of finite automata, shows where each algorithm could be applied. Denoting by n the number of states in a DFA A, we propose an algorithm to minimize A in n map-reduce rounds in the worst case. Our experiments, however, indicate that the number of rounds, in practice, is much smaller than n for all DFAs we examined. In other words, this algorithm converges in d iterations by computing the equivalence classes of each state, where d is the diameter of the input DFA.
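
The round-by-round computation of state equivalence classes can be illustrated with a sequential sketch of Moore-style partition refinement. This is not the thesis's map-reduce algorithm: treating one loop iteration as the analogue of one map-reduce round is an assumption, and the function name `minimize_rounds` is hypothetical. The returned round count includes the final round that only confirms nothing changed.

```python
def minimize_rounds(states, alphabet, delta, accepting):
    """Refine state classes until stable; return (class per state, rounds).

    `delta` maps (state, symbol) to the successor state and must be total.
    """
    # Round 0: separate accepting from non-accepting states.
    block = {s: int(s in accepting) for s in states}
    rounds = 0
    while True:
        # A state's signature is its own class plus its successors' classes.
        signature = {
            s: (block[s],) + tuple(block[delta[s, a]] for a in alphabet)
            for s in states
        }
        # Give each distinct signature a fresh class id.
        ids = {}
        new_block = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        rounds += 1
        # Classes are only ever split, so an unchanged count means stability.
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block, rounds
        block = new_block
```

For example, in a four-state DFA where states 1 and 2 both move to an accepting sink on every symbol, the two end up in the same class and can be merged.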

    Flexible query processing of SPARQL queries

    SPARQL is the predominant language for querying RDF data, which is the standard model for representing web data and more specifically Linked Open Data (a collection of heterogeneous connected data). Datasets in RDF form can be hard to query by a user if she does not have full knowledge of the structure of the dataset. Moreover, many datasets in Linked Data are often extracted from actual web page content, which might lead to incomplete or inaccurate data. We extend SPARQL 1.1 with two operators, APPROX and RELAX, previously introduced in the context of regular path queries. Using these operators we are able to support flexible querying over the property path queries of SPARQL 1.1. We call this new language SPARQLAR. Using SPARQLAR users are able to query RDF data without fully knowing the structure of a dataset. APPROX and RELAX encapsulate different aspects of query flexibility: finding different answers and finding more answers, respectively. This means that users can access complex and heterogeneous datasets without the need to know precisely how the data is structured. One of the open problems we address is how to combine the APPROX and RELAX operators with a pragmatic language such as SPARQL. We also devise an implementation of a system that evaluates SPARQLAR queries in order to study the performance of the new language. We begin by defining the semantics of SPARQLAR and the complexity of query evaluation. We then present a query processing technique for evaluating SPARQLAR queries based on a rewriting algorithm and prove its soundness and completeness. During the evaluation of a SPARQLAR query we generate multiple SPARQL 1.1 queries that are evaluated against the dataset. Each such query will generate answers with a cost that indicates their distance with respect to the exact form of the original SPARQLAR query.
Our prototype implementation incorporates three optimisation techniques that aim to enhance query execution performance. The first optimisation is a pre-computation technique that caches the answers of parts of the queries generated by the rewriting algorithm. These answers are then reused to avoid the re-execution of those sub-queries. The second optimisation utilises a summary of the dataset to discard queries that are known to return no answers. The third optimisation technique uses the query containment concept to discard queries whose answers would be returned by another query at the same or lower cost. We conclude by conducting a performance study of the system on three different RDF datasets: LUBM (Lehigh University Benchmark), YAGO and DBpedia.
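
The idea of evaluating several rewritten queries, each tagged with a cost, and reusing cached answers can be sketched roughly as below. This is a simplified illustration and not the thesis's system: caching here works on whole query strings rather than on parts of queries, the rewriting step itself is not shown, and `evaluate_by_cost` and the `run_query` callback are hypothetical names.

```python
import heapq


def evaluate_by_cost(rewritten, run_query):
    """Rank answers by the cost of the cheapest rewritten query returning them.

    `rewritten` is an iterable of (cost, query_string) pairs.
    `run_query` is a caller-supplied function that executes one query
    and returns an iterable of answers (assumption).
    """
    heap = list(rewritten)
    heapq.heapify(heap)  # cheapest rewriting is popped first
    cache = {}           # query string -> answers already computed
    best = {}            # answer -> lowest cost at which it was found
    while heap:
        cost, query = heapq.heappop(heap)
        if query not in cache:
            cache[query] = list(run_query(query))
        for answer in cache[query]:
            # Queries arrive in cost order, so the first cost seen is minimal.
            best.setdefault(answer, cost)
    return sorted(best.items(), key=lambda item: item[1])
```

An answer returned by the exact query keeps cost 0 even if a more approximate rewriting returns it again later.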