1,489 research outputs found

    Static Partitioning of Spreadsheets for Parallel Execution

    PiCo: A Domain-Specific Language for Data Analytics Pipelines

    In the world of Big Data analytics, there is a series of tools aimed at simplifying the programming of applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects of Big Data analytics tools from a high-level perspective. This analysis can be considered a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low-level with high-level aspects, as is often seen in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we think that a clearer and simpler semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language, which sits on top of a stack of layers that form a prototypical framework for Big Data analytics. The contribution of this thesis is twofold. First, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As a result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit in each level.
    Second, we propose a programming environment based on this layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG-composition of processing elements. This model is intended to give the user a unique interface for both stream and batch processing, completely hiding data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared and distributed parallelism, and implemented in C++11/14 with the aim of porting C++ into the Big Data world.

    Low power and high performance heterogeneous computing on FPGAs

    The abstract is in the attachment.

    THE COLUMBUS GROUND SEGMENT – A PRECURSOR FOR FUTURE MANNED MISSIONS

    In the beginning, space programs were self-standing national activities, often in competition with other nations. Today space flight is becoming more and more an international task. Complex space missions and deep space exploration can no longer be shouldered by one agency or nation alone but are joint activities of several nations. The best example of such a joint (ad)venture at the moment is the International Space Station (ISS). Such international activities define completely new requirements for the supporting ground segments. The worldwide distribution of a ground segment is no longer limited to a network of ground stations that aims to provide good coverage of the spacecraft. Coverage is in any case sometimes ensured, as for the ISS, by a relay satellite system instead. In addition to the enhanced downlink and uplink methods, a ground segment is meant to connect the different centres of competence of all participating agencies and nations. From the spacecraft operations point of view, such transnational ground segments are required to support distributed and shared operations in a predefined decision and commanding hierarchy. This has to be taken into account in the technical topology as well as in the operational set-up and teaming. Last but not least, the duration of missions is increasing, which requires a certain flexibility of the ground segment and long-term maintenance strategies for it, with a special emphasis on non-intrusive replacements. The Russian space station MIR was in orbit for about 15 years; the ISS is currently targeted to operate until 2020, which would mean more than 20 years in space.

    The VINEYARD Approach: Versatile, Integrated, Accelerator-Based, Heterogeneous Data Centres.

    Emerging web applications like cloud computing, Big Data and social networks have created the need for powerful centres hosting hundreds of thousands of servers. Currently, data centres are based on general-purpose processors that provide high flexibility but lack the energy efficiency of customized accelerators. VINEYARD aims to develop an integrated platform for energy-efficient data centres based on new servers with novel, coarse-grain and fine-grain, programmable hardware accelerators. It will also build a high-level programming framework that allows end-users to seamlessly utilize these accelerators in heterogeneous computing systems by employing typical data-centre programming frameworks (e.g. MapReduce, Storm, Spark, etc.). This programming framework will further allow the hardware accelerators to be swapped in and out of the heterogeneous infrastructure so as to offer high flexibility and energy efficiency. VINEYARD will foster the expansion of the soft-IP core industry, currently limited to embedded systems, to the data-centre market. VINEYARD plans to demonstrate the advantages of its approach in three real use-cases: (a) a bio-informatics application for high-accuracy brain modeling, (b) two critical financial applications, and (c) a big-data analysis application.

    PiCo: High-performance data analytics pipelines in modern C++

    In this paper, we present a new C++ API with a fluent interface called PiCo (Pipeline Composition). PiCo's programming model aims to make data analytics applications easier to program while preserving or enhancing their performance. This is attained through three key design choices: 1) unifying batch and stream data access models, 2) decoupling processing from data layout, and 3) exploiting a stream-oriented, scalable, efficient C++11 runtime system. PiCo proposes a programming model based on pipelines and operators that are polymorphic with respect to data types, in the sense that it is possible to reuse the same algorithms and pipelines on different data models (e.g., streams, lists, sets, etc.). Preliminary results show that PiCo, when compared to Spark and Flink, can attain better performance in terms of execution time and can hugely improve memory utilization, both for batch and stream processing. Author's copy (postprint) of C. Misale, M. Drocco, G. Tremblay, A.R. Martinelli, M. Aldinucci, PiCo: High-performance data analytics pipelines in modern C++, Future Generation Computer Systems (2018), https://doi.org/10.1016/j.future.2018.05.03
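
    To make the idea of a fluent, data-model-polymorphic pipeline more concrete, the following is a minimal C++14 sketch. It is not PiCo's actual API: the names `Pipeline`, `source`, `map`, `run` and `doubled_word_lengths` are hypothetical and chosen only for illustration, and the sketch runs sequentially, without the FastFlow runtime or any stream support. It shows only how one pipeline definition can be reused on different collection types.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT PiCo's real API: a pipeline is a chain of
// per-element stages, built step by step through a fluent interface.
template <typename In, typename Out, typename Fn>
class Pipeline {
public:
    explicit Pipeline(Fn fn) : fn_(std::move(fn)) {}

    // Append a map stage and return a new, longer pipeline.
    template <typename G>
    auto map(G g) const {
        using Next = std::decay_t<decltype(g(std::declval<Out>()))>;
        auto composed = [f = fn_, g](const In& x) { return g(f(x)); };
        return Pipeline<In, Next, decltype(composed)>(composed);
    }

    // Apply all stages to every element of any iterable collection.
    // The pipeline itself does not depend on the container type.
    template <typename Collection>
    std::vector<Out> run(const Collection& input) const {
        std::vector<Out> out;
        for (const auto& item : input) out.push_back(fn_(item));
        return out;
    }

private:
    Fn fn_;
};

// Start an empty pipeline over elements of type In (identity stage).
template <typename In>
auto source() {
    auto id = [](const In& x) { return x; };
    return Pipeline<In, In, decltype(id)>(id);
}

// One pipeline definition, usable on vectors, lists, sets, and so on.
template <typename Collection>
std::vector<std::size_t> doubled_word_lengths(const Collection& words) {
    auto pipe = source<std::string>()
                    .map([](const std::string& w) { return w.size(); })
                    .map([](std::size_t n) { return n * 2; });
    return pipe.run(words);
}

// Usage: the same pipeline gives the same result on two data models.
inline void demo() {
    const std::vector<std::string> as_vector{"pi", "co", "flow"};
    const std::list<std::string> as_list{"pi", "co", "flow"};
    assert(doubled_word_lengths(as_vector) == doubled_word_lengths(as_list));
}
```

    In this sketch each `map` call returns a new pipeline object instead of modifying the existing one, so a partial pipeline can be reused as a building block. That is one possible design and may differ from how PiCo composes pipelines.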