    ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments

    This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance and tuning data. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness of Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of the modeling procedures to allow analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system for knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. This also enables model-based anomaly detection and efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from the ALOJA data-sets and framework to improve the design and deployment of Big Data applications. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051. Peer reviewed. Postprint (published version).
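    The core of the ALOJA-ML idea is learning a model that maps a deployment configuration to an expected execution time, then using predictions to rank candidate runs. The sketch below illustrates that with scikit-learn; the CSV layout and feature names are hypothetical stand-ins, not the actual ALOJA schema.

        # Hypothetical sketch of the ALOJA-ML idea: learn execution time from
        # configuration features, then rank unseen configurations. Column names
        # are invented for illustration; the real repository schema differs.
        import pandas as pd
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import train_test_split

        runs = pd.read_csv("hadoop_runs.csv")   # hypothetical export of the repository
        features = pd.get_dummies(runs[["net", "disk", "vm_size", "mappers"]])
        target = runs["exec_time_s"]

        X_train, X_test, y_train, y_test = train_test_split(features, target, random_state=0)
        model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
        print("R^2 on held-out runs:", model.score(X_test, y_test))

        # Benchmark guidance: run the configurations predicted to be cheapest first.
        ranked = X_test.assign(predicted_s=model.predict(X_test)).sort_values("predicted_s")
        print(ranked.head())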

    Designing Traceability into Big Data Systems

    Providing an appropriate level of accessibility and traceability to data or process elements (so-called Items) in large volumes of data, often Cloud-resident, is an essential requirement in the Big Data era. Enterprise-wide data systems need to be designed from the outset to support usage of such Items across the spectrum of business use rather than from any specific application view. The design philosophy advocated in this paper is to drive the design process using a so-called description-driven approach, which enriches models with meta-data and descriptions and focuses the design process on Item re-use, thereby promoting traceability. Details are given of the description-driven design of big data systems at CERN, in health informatics and in business process management. Evidence is presented that the approach leads to design simplicity and consequent ease of management thanks to loose typing and the adoption of a unified approach to Item management and usage. Comment: 10 pages; 6 figures; in Proceedings of the 5th Annual International Conference on ICT: Big Data, Cloud and Security (ICT-BDCS 2015), Singapore, July 2015. arXiv admin note: text overlap with arXiv:1402.5764, arXiv:1402.575
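    The description-driven principle can be pictured as a single loosely-typed container whose behaviour comes from attached meta-data rather than from a fixed class hierarchy. The Python sketch below is an invented illustration of that principle, not code from the CERN systems described in the paper.

        # Invented illustration: one loosely-typed Item container serves both
        # data and process elements; the attached description is the meta-data,
        # and every use is recorded, which is what enables traceability.
        from datetime import datetime, timezone

        class Item:
            def __init__(self, item_id, description):
                self.item_id = item_id
                self.description = description   # meta-data describing this Item
                self.provenance = []             # trace of every use of the Item

            def record_use(self, actor, action):
                self.provenance.append((datetime.now(timezone.utc).isoformat(), actor, action))

        # The same container holds a dataset and a process step; only the
        # description differs, so management and usage stay uniform.
        dataset = Item("run-42-output", {"kind": "dataset", "format": "ROOT"})
        step = Item("reco-pass-3", {"kind": "process", "consumes": ["run-42-output"]})
        dataset.record_use("reco-pass-3", "read")
        print(dataset.provenance)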

    Adaptive Process Distribution at the Edge of IoT using the Integration of BPMS and Containerization

    Emerging cloud-centric Internet of Things (IoT) systems rely on distant data centers to manage their processes, which raises the issue of latency. To address this issue, researchers have introduced Edge computing methodologies that carry out computation closer to the edge of the IoT network. Among the numerous Edge computing approaches, the Mist computing paradigm emphasises moving computation further, onto the front-end IoT devices themselves. Although the Mist computing architecture is promising, it raises a new challenge: how should a Business Process Management System for IoT (BPMS4IoT) distribute business process workflows to heterogeneous IoT devices? In general, executing business process workflows relies on a common platform for executing customized tasks. For example, if the management server defines a Python script task in a workflow and allocates it to an IoT device, the workflow engine of that device must have a compatible execution method. Such a requirement is inflexible given the heterogeneity of IoT devices. Therefore, in this thesis, the author proposes a framework that decouples the workflow task execution method from the workflow engine using containerization technology. A proof-of-concept prototype has been developed and tested on several single-board-computer-based IoT devices. Further, a case study has been performed to compare the performance of the proposed framework against a cloud-centric system.
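    The decoupling mechanism can be sketched in a few lines: the workflow engine delegates each task to the container image the task declares, so a device only needs a container runtime rather than one interpreter per task type. The task structure and image below are assumptions for illustration, not the prototype's actual API.

        # Hedged sketch: delegate a workflow task to a container via the Docker
        # SDK for Python. The task dict layout is invented for this example.
        import docker

        def run_workflow_task(task):
            client = docker.from_env()
            # The task carries its own runtime, keeping the engine language-agnostic.
            output = client.containers.run(
                image=task["image"],          # e.g. a Python image for a script task
                command=task["command"],
                remove=True,                  # discard the container after it exits
            )
            return output.decode()

        script_task = {
            "image": "python:3.11-slim",
            "command": ["python", "-c", "print(21 * 2)"],
        }
        print(run_workflow_task(script_task))  # prints 42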

    Transparent Orchestration of Task-based Parallel Applications in Containers Platforms

    This paper presents a framework to easily build and execute parallel applications in container-based distributed computing platforms in a user-transparent way. The proposed framework combines the COMP Superscalar (COMPSs) programming model and runtime, which provides a straightforward way to develop task-based parallel applications from sequential code, with container management platforms (such as Docker, Mesos or Singularity) that ease the deployment of applications in computing environments. This framework provides scientists and developers with an easy way to implement parallel distributed applications and deploy them in a one-click fashion. We have built a prototype which integrates COMPSs with different container engines in different scenarios: i) a Docker cluster, ii) a Mesos cluster, and iii) Singularity in an HPC cluster. We have evaluated the overhead in the building, deployment and execution phases of two benchmark applications, compared to a Cloud testbed based on KVM and OpenStack and to the usage of bare-metal nodes. We observed an important gain in comparison to cloud environments during the building and deployment phases, which enables better adaptation of resources with respect to the computational load. In contrast, we detected an extra overhead during execution, mainly due to multi-host Docker networking. This work is partly supported by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through the TIN2015-65316 project, by the Generalitat de Catalunya under contracts 2014-SGR-1051 and 2014-SGR-1272, and by the European Union through the Horizon 2020 research and innovation programme under grant 690116 (EUBra-BIGSEA project). Results presented in this paper were obtained using the Chameleon testbed supported by the National Science Foundation. Peer reviewed. Postprint (author's final draft).
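    In the COMPSs model, a sequential-looking program becomes parallel by marking functions as tasks; the runtime builds the data-dependency graph and schedules tasks on the available (possibly containerized) resources. Below is a minimal PyCOMPSs-style example; it assumes a COMPSs installation and is launched with the runcompss command rather than plain python.

        # Minimal PyCOMPSs-style example: the decorated function becomes an
        # asynchronous task; compss_wait_on synchronizes on its future results.
        from pycompss.api.task import task
        from pycompss.api.api import compss_wait_on

        @task(returns=int)
        def increment(value):
            return value + 1

        if __name__ == "__main__":
            partial = [increment(i) for i in range(4)]  # tasks may run in parallel
            results = compss_wait_on(partial)           # block until all complete
            print(results)                              # [1, 2, 3, 4]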

    The Research Object Suite of Ontologies: Sharing and Exchanging Research Data and Methods on the Open Web

    Research in life sciences is increasingly being conducted in a digital and online environment. In particular, life scientists have been pioneers in embracing new computational tools to conduct their investigations. To support the sharing of digital objects produced during such research investigations, we have witnessed in the last few years the emergence of specialized repositories, e.g., DataVerse and FigShare. Such repositories provide users with the means to share and publish datasets that were used or generated in research investigations. While these repositories have proven their usefulness, interpreting and reusing evidence for most research results is a challenging task. Additional contextual descriptions are needed to understand how those results were generated and/or the circumstances under which they were concluded. Because of this, scientists are calling for models that go beyond the publication of datasets to systematically capture the life cycle of scientific investigations and provide a single entry point to access the information about the hypothesis investigated, the datasets used, the experiments carried out, the results of the experiments, the people involved in the research, etc. In this paper we present the Research Object (RO) suite of ontologies, which provide a structured container to encapsulate research data and methods along with essential metadata descriptions. Research Objects are portable units that enable the sharing, preservation, interpretation and reuse of research investigation results. The ontologies we present have been designed in the light of requirements that we gathered from life scientists. They have been built upon existing popular vocabularies to facilitate interoperability. Furthermore, we have developed tools to support the creation and sharing of Research Objects, thereby promoting and facilitating their adoption. Comment: 20 pages
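    At its core, a Research Object is an aggregation of resources plus metadata annotations expressed with the RO suite of ontologies. The rdflib sketch below illustrates that shape; the wf4ever namespace is the one the RO suite builds on, but the example URIs, file names and the exact choice of properties are assumptions for illustration.

        # Illustrative sketch: describe a Research Object aggregating a dataset,
        # using rdflib. URIs and file names are invented; property choices are
        # a simplified reading of the RO/ORE vocabularies.
        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import DCTERMS, RDF

        RO = Namespace("http://purl.org/wf4ever/ro#")
        ORE = Namespace("http://www.openarchives.org/ore/terms/")

        ro = URIRef("http://example.org/ro/my-investigation/")
        data = URIRef("http://example.org/ro/my-investigation/data/results.csv")

        g = Graph()
        g.bind("ro", RO)
        g.bind("ore", ORE)
        g.add((ro, RDF.type, RO.ResearchObject))
        g.add((data, RDF.type, RO.Resource))
        g.add((ro, ORE.aggregates, data))  # the RO encapsulates the dataset
        g.add((data, DCTERMS.description, Literal("Results used in the experiment")))

        print(g.serialize(format="turtle"))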