
    On-Demand Big Data Integration: A Hybrid ETL Approach for Reproducible Scientific Research

    Scientific research requires access to, analysis of, and sharing of data that is distributed across various heterogeneous data sources at the scale of the Internet. An eager ETL process constructs an integrated data repository as its first step, integrating and loading data in its entirety from the data sources. The bootstrapping of this process is not efficient for scientific research that requires access to data from very large and typically numerous distributed data sources. A lazy ETL process loads only the metadata, but still eagerly. Lazy ETL is faster in bootstrapping. However, queries on the integrated data repository of eager ETL perform faster, because the entire data set is available beforehand. In this paper, we propose a novel ETL approach for scientific data integration, a hybrid of the eager and lazy ETL approaches, applied to both data and metadata. In this way, hybrid ETL supports incremental integration and loading of metadata and data from the data sources. We incorporate a human-in-the-loop approach to enhance the hybrid ETL, with selective data integration driven by user queries and sharing of integrated data between users. We implement our hybrid ETL approach in a prototype platform, Obidos, and evaluate it in the context of data sharing for medical research. Obidos outperforms both the eager ETL and lazy ETL approaches for scientific research data integration and sharing, through its selective loading of data and metadata, while storing the integrated data in a scalable integrated data repository. Comment: Pre-print submitted to the DMAH Special Issue of the Springer DAPD Journal
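
The query-driven, incremental loading described in this abstract can be illustrated with a minimal Python sketch. It is not the Obidos implementation: the class `HybridRepository`, its methods, and the source names `clinical` and `imaging` are all hypothetical, and each "data source" is assumed to be a plain function that returns a list of records.

```python
from typing import Callable, Dict, List


class HybridRepository:
    """Toy repository (hypothetical) that loads a source only when a query needs it."""

    def __init__(self, sources: Dict[str, Callable[[], List[dict]]]) -> None:
        self._sources = sources                  # source name -> fetch function (assumed)
        self._metadata: Dict[str, dict] = {}     # filled incrementally, per source
        self._data: Dict[str, List[dict]] = {}   # filled incrementally, per source
        self.fetch_count = 0                     # how many sources were actually fetched

    def _load(self, name: str) -> None:
        # Load data and metadata for one source, at most once.
        if name in self._data:
            return
        records = self._sources[name]()
        self.fetch_count += 1
        self._data[name] = records
        self._metadata[name] = {"rows": len(records)}

    def query(self, name: str, predicate: Callable[[dict], bool]) -> List[dict]:
        # The user query decides which source gets integrated.
        self._load(name)
        return [record for record in self._data[name] if predicate(record)]

    def loaded_sources(self) -> List[str]:
        return sorted(self._data)


repo = HybridRepository({
    "clinical": lambda: [{"id": 1, "age": 70}, {"id": 2, "age": 40}],
    "imaging": lambda: [{"id": 1, "modality": "MRI"}],
})
older = repo.query("clinical", lambda r: r["age"] >= 65)
repeat = repo.query("clinical", lambda r: r["age"] < 65)  # served from the loaded copy
```

After these two queries only the `clinical` source has been fetched, and only once; an eager process would have fetched both sources before answering anything.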

    Weaving Rules into Models@run.time for Embedded Smart Systems

    Smart systems are characterised by their ability to analyse measured data live and to react to changes according to expert rules. Therefore, such systems exploit appropriate data models together with actions triggered by domain-related conditions. The challenge at hand is that smart systems usually need to process thousands of updates to detect which rules need to be triggered, often even on restricted hardware like a Raspberry Pi. Although various approaches have been investigated to efficiently check conditions on data models, they either assume that the data model fits into main memory or rely on high-latency persistent storage systems that severely damage the reactivity of smart systems. To tackle this challenge, we propose a novel composition process, which weaves executable rules into a data model with lazy loading abilities. We quantitatively show, on a smart building case study, that our approach can handle, at low latency, big sets of rules on top of large-scale data models on restricted hardware. Comment: pre-print version, published in the proceedings of the MOMO-17 Workshop

    Caching and Distributing Statistical Analyses in R

    We present the cacher package for R, which provides tools for caching statistical analyses and for distributing these analyses to others in an efficient manner. The cacher package takes objects created by evaluating R expressions and stores them in key-value databases. These databases of cached objects can subsequently be assembled into packages for distribution over the web. The cacher package also provides tools to help readers examine the data and code in a statistical analysis and reproduce, modify, or improve upon the results. In addition, readers can easily conduct alternate analyses of the data. We describe the design and implementation of the cacher package and provide two examples of how the package can be used for reproducible research.
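
The general idea of storing evaluated expressions in a key-value database can be sketched in Python as follows. This is an illustration only, not the cacher package's API: the class `ExpressionCache` and its methods are hypothetical, an in-memory dictionary stands in for the key-value database, and keying each result by a SHA-1 digest of the expression text is an assumption made for the sketch.

```python
import hashlib
from typing import Any, Dict


class ExpressionCache:
    """Hypothetical cache: maps a digest of an expression's text to its value."""

    def __init__(self) -> None:
        self.store: Dict[str, Any] = {}  # stand-in for a key-value database
        self.evaluations = 0             # counts real evaluations (cache misses)

    @staticmethod
    def key_for(expression: str) -> str:
        # Assumption of this sketch: the key depends only on the expression text,
        # so a changed environment does not invalidate a cached value.
        return hashlib.sha1(expression.encode("utf-8")).hexdigest()

    def evaluate(self, expression: str, env: Dict[str, Any]) -> Any:
        key = self.key_for(expression)
        if key not in self.store:
            # eval is acceptable here only because the expressions are our own.
            self.store[key] = eval(expression, {}, env)
            self.evaluations += 1
        return self.store[key]


cache = ExpressionCache()
env = {"xs": [1, 2, 3]}
first = cache.evaluate("sum(xs) / len(xs)", env)
second = cache.evaluate("sum(xs) / len(xs)", env)  # returned from the store
```

The second call does not re-evaluate the expression; a reader who receives the populated store could inspect the results without rerunning the analysis.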

    The Impact of Vein Mechanical Compliance on Arteriovenous Fistula Outcomes

    © 2016 Elsevier Inc. Background: Arteriovenous fistulae (AVFs) are the preferred access for hemodialysis but suffer a high early failure rate. The aim of this study was to determine how venous distensibility, as measured in vitro, relates to early outcomes of AVFs formed with the sampled vein. Methods: Ethical approval was obtained for all aspects of this study. During AVF formation, a circumferential segment of the target vein was sampled. Mechanical stress testing of the venous segments was undertaken using a dynamic mechanical analyzer, with progressive stress loading at 2 N/min to a maximum of 10 N or until sample disruption. Stress-strain curves were obtained for vein samples and Young's modulus (YM) was calculated. Duplex assessment of the fistulae was undertaken at 30 days. Results: Thirty patients consented to participate, with 29 samples obtained for analysis. Statistical comparison of YM demonstrated no relationship with common cardiovascular risk factors or dialysis status. Subject age greater than 65 was the only patient factor which showed a significant difference in YM (P = 0.05). Furthermore, a negative correlation was confirmed between age and YM (Pearson's r = -0.465, P < 0.05). Nine of the 29 subjects suffered an early AVF failure. Mann-Whitney U testing for differences in distribution reported that YM was significantly higher in those fistulae which failed (P < 0.005). Conclusions: Reduced venous compliance appears to result in higher failure rates of AVFs. With the advancement of clinical tools such as speckle tracking ultrasound, identification of vessel compliance in vivo may produce valuable additional information for clinicians planning AVF surgery.

    Aerobrake assembly with minimum Space Station accommodation

    The minimum Space Station Freedom accommodations required for initial assembly, repair, and refurbishment of the Lunar aerobrake were investigated. Baseline Space Station Freedom support services were assumed, as well as reasonable Earth-to-orbit possibilities. A set of three aerobrake configurations representative of the major themes in aerobraking was developed. Structural assembly concepts, along with on-orbit assembly and refurbishment scenarios, were created. The scenarios were exercised to identify required Space Station Freedom accommodations. Finally, important areas for follow-on study were identified.

    An overview of the Ciao multiparadigm language and program development environment and its design philosophy

    We describe some of the novel aspects and motivations behind the design and implementation of the Ciao multiparadigm programming system. An important aspect of Ciao is that it provides the programmer with a large number of useful features from different programming paradigms and styles, and that the use of each of these features can be turned on and off at will for each program module. Thus, a given module may be using, e.g., higher-order functions and constraints, while another module may be using objects, predicates, and concurrency. Furthermore, the language is designed to be extensible in a simple and modular way. Another important aspect of Ciao is its programming environment, which provides a powerful preprocessor (with an associated assertion language) capable of statically finding non-trivial bugs, verifying that programs comply with specifications, and performing many types of program optimizations. Such optimizations produce code that is highly competitive with other dynamic languages or, when the highest levels of optimization are used, even that of static languages, all while retaining the interactive development environment of a dynamic language. The environment also includes a powerful auto-documenter. The paper provides an informal overview of the language and program development environment. It aims at illustrating the design philosophy rather than at being exhaustive, which would be impossible in the format of a paper, pointing instead to the existing literature on the system.