    PiCo: A Domain-Specific Language for Data Analytics Pipelines

    In the world of Big Data analytics, there is a series of tools aiming at simplifying the programming of applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects of Big Data analytics tools from a high-level perspective. This analysis can be considered a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low-level with high-level aspects, as often happens in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we think that clearer and simpler semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model, implemented as a Domain-Specific Language on top of a stack of layers that together form a prototypical framework for Big Data analytics. The contribution of this thesis is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As a result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit in each level.
Second, we propose a programming environment based on such a layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, essentially a DAG composition of processing elements. This model is intended to give the user a single interface for both stream and batch processing, completely hiding data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared-memory and distributed parallelism, and implemented in C++11/14 with the aim of bringing C++ into the Big Data world.

    A Comparison of Big Data Frameworks on a Layered Dataflow Model

    In the world of Big Data analytics, there is a series of tools aiming at simplifying the programming of applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit in each level.
Comment: 19 pages, 6 figures, 2 tables. In Proc. of the 9th Intl. Symposium on High-Level Parallel Programming and Applications (HLPP), July 4-5, 2016, Muenster, Germany.

    Languages for Big Data analysis

    Dexamethasone Treatment Effects on H3K27me3 Chromatin Organization Is Related to NK Cell Immune Dysregulation

    It is well established that psychological stress reduces natural killer (NK) cell immune function. This reduction is mediated by the stress-induced release of glucocorticoids (GC), which can suppress immune function. Suppression of particular immune functions is associated with GC-induced histone-epigenetic marks. Histone-epigenetic marks are responsible for the organization and compartmentalization of genomes into transcriptionally active euchromatin domains that are localized to the interior of the nucleus. Transcriptionally silent heterochromatic domains are enriched with methylated epigenetic marks and are localized to the nuclear periphery. The purpose of this investigation was to assess the influence of GC on H3K27me3 chromatin organization by measuring that repressive epigenetic mark. The relationship of H3K27me3 chromatin organization to NK cell effector function, i.e., interferon (IFN) gamma production, was also determined. IFN gamma was selected because it is the prototypic cytokine produced by NK cells and is known to modulate both innate and adaptive immunity. GC treatment of human peripheral blood mononuclear cells significantly reduced IFN gamma production. GC treatment also produced a distinct NK cell H3K27me3 chromatin organization phenotype: localization of the histone post-translational epigenetic mark H3K27me3 to the nuclear periphery, which was directly related to the reduced production of IFN gamma by NK cells. This nuclear phenotype was determined by direct visual inspection and by an automated, high-throughput technology, the Amnis ImageStream. This technology combines the per-cell information content provided by standard microscopy with the statistical significance afforded by the large sample sizes common to standard flow cytometry. Most importantly, this technology allowed direct assessment of the localization of H3K27me3 within individual nuclei.
These results demonstrate that GC reduce NK cell function at least in part through altered H3K27me3 nuclear organization, and suggest that H3K27me3 chromatin organization may be a predictive measure of GC-induced immune dysregulation in NK cells.

    Experiments on parallel connected loops in single phase natural circulation: preliminary results

    Natural circulation is the most important heat removal mechanism for passive protection systems in many industrial applications, such as nuclear power plants, solar energy systems, reboilers and the cooling of electronic systems. The aim of the present work is to investigate the flow and heat transfer characteristics of parallel loops, connected in the lower heated sections, in single-phase natural circulation. The test facility was composed of 2 vertical circuits connected in parallel; each was rectangular, with an aspect ratio (defined as the height-to-width ratio) of 1.63, and made of circular copper tube with a 4 mm inner diameter. An upper cold heat exchanger provided the heat sink, while the heat source at the bottom was fed by a power supply system. Several calibrated T-type thermocouples placed in the fluid along the vertical tubes allowed evaluation of the average fluid temperature difference between the hot and cold legs. Tests were carried out at 3 different heat sink temperatures (10, 20 and 30 °C); for each of these temperatures, the power supplied to the lower heater was increased from 20 to 90 W. The working fluid was distilled water. The experimental results have been analysed in terms of the thermal performance of the single or connected loops. The collected data have also been compared with Vijayan’s correlation.

    PiCo: a Novel Approach to Stream Data Analytics

    In this paper, we present a new C++ API with a fluent interface called PiCo (Pipeline Composition). PiCo’s programming model aims to make the programming of data analytics applications easier while preserving or enhancing their performance. This is attained through three key design choices: 1) unifying batch and stream data access models, 2) decoupling processing from data layout, and 3) exploiting a stream-oriented, scalable, efficient C++11 runtime system. PiCo proposes a programming model based on pipelines and operators that are polymorphic with respect to data types, in the sense that the same algorithms and pipelines can be reused on different data models (e.g., streams, lists, sets). Preliminary results show that PiCo can attain better performance in terms of execution time and greatly improve memory utilization compared to Spark and Flink, in both batch and stream processing.
Author's copy (postprint) of C. Misale, M. Drocco, G. Tremblay, and M. Aldinucci, "PiCo: a Novel Approach to Stream Data Analytics," in Proc. of Euro-Par Workshops: 1st Intl. Workshop on Autonomic Solutions for Parallel and Distributed Data Stream Processing (Auto-DaSP 2017), Santiago de Compostela, Spain, 2018. doi:10.1007/978-3-319-75178-8_1

    Assessment of a 2D CFD model for a single phase natural circulation loop

    Passive safety systems are increasingly widespread in many technological fields. Natural circulation is probably one of the main phenomena exploited in this kind of system: as is well known, by means of gravity and buoyancy forces, fluids can circulate without any external power source. In this paper, a preliminary analysis (including comparisons between experimental tests and numerical simulations) of a natural circulation loop, namely a natural circulation facility installed at the University of Genova, is presented. Starting from some experimental results, the data from CFD simulations of the loop (in both steady and unsteady conditions) are used for a first, preliminary validation, mainly to obtain a reliable computational tool able to simulate phenomena related to flow inversions. The physical flow-inversion phenomena are very well reproduced even by a simplified 2D numerical model of the loop, and the physical considerations about the temperature and velocity fluctuations during the transient simulations agree with the well-known observations formulated by Welander on the basis of a simple point-source analysis scheme.