    ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments

    This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance and tuning data. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness of Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of the modeling procedures to allow analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system for knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. This also enables model-based anomaly detection and efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from the ALOJA data-sets and framework to improve the design and deployment of Big Data applications. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051. Peer reviewed. Postprint (published version).
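    The core of the ALOJA-ML idea is learning a model that maps a deployment configuration to an expected execution time, then using predictions to rank candidate runs. The sketch below illustrates that with scikit-learn; the CSV layout and feature names are hypothetical stand-ins, not the actual ALOJA schema.

        # Hypothetical sketch of the ALOJA-ML idea: learn execution time from
        # configuration features, then rank unseen configurations. Column names
        # are invented for illustration; the real repository schema differs.
        import pandas as pd
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import train_test_split

        runs = pd.read_csv("hadoop_runs.csv")   # hypothetical export of the repository
        features = pd.get_dummies(runs[["net", "disk", "vm_size", "mappers"]])
        target = runs["exec_time_s"]

        X_train, X_test, y_train, y_test = train_test_split(features, target, random_state=0)
        model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
        print("R^2 on held-out runs:", model.score(X_test, y_test))

        # Benchmark guidance: run the configurations predicted to be cheapest first.
        ranked = X_test.assign(predicted_s=model.predict(X_test)).sort_values("predicted_s")
        print(ranked.head())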

    Designing Traceability into Big Data Systems

    Providing an appropriate level of accessibility and traceability to data or process elements (so-called Items) in large volumes of data, often Cloud-resident, is an essential requirement in the Big Data era. Enterprise-wide data systems need to be designed from the outset to support usage of such Items across the spectrum of business use rather than from any specific application view. The design philosophy advocated in this paper is to drive the design process using a so-called description-driven approach, which enriches models with meta-data and descriptions and focuses the design process on Item re-use, thereby promoting traceability. Details are given of the description-driven design of big data systems at CERN, in health informatics and in business process management. Evidence is presented that the approach leads to design simplicity and consequent ease of management thanks to loose typing and the adoption of a unified approach to Item management and usage. Comment: 10 pages; 6 figures; in Proceedings of the 5th Annual International Conference on ICT: Big Data, Cloud and Security (ICT-BDCS 2015), Singapore, July 2015. arXiv admin note: text overlap with arXiv:1402.5764, arXiv:1402.575
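    The description-driven principle can be pictured as a single loosely-typed container whose behaviour comes from attached meta-data rather than from a fixed class hierarchy. The Python sketch below is an invented illustration of that principle, not code from the CERN systems described in the paper.

        # Invented illustration: one loosely-typed Item container serves both
        # data and process elements; the attached description is the meta-data,
        # and every use is recorded, which is what enables traceability.
        from datetime import datetime, timezone

        class Item:
            def __init__(self, item_id, description):
                self.item_id = item_id
                self.description = description   # meta-data describing this Item
                self.provenance = []             # trace of every use of the Item

            def record_use(self, actor, action):
                self.provenance.append((datetime.now(timezone.utc).isoformat(), actor, action))

        # The same container holds a dataset and a process step; only the
        # description differs, so management and usage stay uniform.
        dataset = Item("run-42-output", {"kind": "dataset", "format": "ROOT"})
        step = Item("reco-pass-3", {"kind": "process", "consumes": ["run-42-output"]})
        dataset.record_use("reco-pass-3", "read")
        print(dataset.provenance)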

    Adaptive Process Distribution at the Edge of IoT using the Integration of BPMS and Containerization

    Emerging cloud-centric Internet of Things (IoT) systems rely on distant data centers to manage their processes, which raises the issue of latency. To address this issue, researchers have introduced Edge computing methodologies that carry out computation closer to the edge of the IoT network. Among the numerous Edge computing approaches, the Mist computing paradigm emphasises moving computation further, onto the front-end IoT devices themselves. Although the Mist computing architecture is promising, it raises a new challenge: how should a Business Process Management System for IoT (BPMS4IoT) distribute business process workflows to heterogeneous IoT devices? In general, executing business process workflows relies on a common platform for executing customized tasks. For example, if the management server defines a Python script task in a workflow and allocates it to an IoT device, the workflow engine of that device must have a compatible execution method. Such a requirement is inflexible given the heterogeneity of IoT devices. Therefore, in this thesis, the author proposes a framework that decouples the workflow task execution method from the workflow engine using containerization technology. A proof-of-concept prototype has been developed and tested on several single-board-computer-based IoT devices. Further, a case study has been performed to compare the performance of the proposed framework against a cloud-centric system.
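    The decoupling mechanism can be sketched in a few lines: the workflow engine delegates each task to the container image the task declares, so a device only needs a container runtime rather than one interpreter per task type. The task structure and image below are assumptions for illustration, not the prototype's actual API.

        # Hedged sketch: delegate a workflow task to a container via the Docker
        # SDK for Python. The task dict layout is invented for this example.
        import docker

        def run_workflow_task(task):
            client = docker.from_env()
            # The task carries its own runtime, keeping the engine language-agnostic.
            output = client.containers.run(
                image=task["image"],          # e.g. a Python image for a script task
                command=task["command"],
                remove=True,                  # discard the container after it exits
            )
            return output.decode()

        script_task = {
            "image": "python:3.11-slim",
            "command": ["python", "-c", "print(21 * 2)"],
        }
        print(run_workflow_task(script_task))  # prints 42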

    Transparent Orchestration of Task-based Parallel Applications in Containers Platforms

    This paper presents a framework to easily build and execute parallel applications in container-based distributed computing platforms in a user-transparent way. The proposed framework combines the COMP Superscalar (COMPSs) programming model and runtime, which provides a straightforward way to develop task-based parallel applications from sequential code, with container management platforms (such as Docker, Mesos or Singularity) that ease the deployment of applications in computing environments. This framework provides scientists and developers with an easy way to implement parallel distributed applications and deploy them in a one-click fashion. We have built a prototype which integrates COMPSs with different container engines in different scenarios: i) a Docker cluster, ii) a Mesos cluster, and iii) Singularity in an HPC cluster. We have evaluated the overhead in the building, deployment and execution phases of two benchmark applications, compared to a Cloud testbed based on KVM and OpenStack and to the usage of bare-metal nodes. We observed an important gain in comparison to cloud environments during the building and deployment phases, which enables better adaptation of resources with respect to the computational load. In contrast, we detected an extra overhead during execution, mainly due to multi-host Docker networking. This work is partly supported by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through the TIN2015-65316 project, by the Generalitat de Catalunya under contracts 2014-SGR-1051 and 2014-SGR-1272, and by the European Union through the Horizon 2020 research and innovation programme under grant 690116 (EUBra-BIGSEA project). Results presented in this paper were obtained using the Chameleon testbed supported by the National Science Foundation. Peer reviewed. Postprint (author's final draft).
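    In the COMPSs model, a sequential-looking program becomes parallel by marking functions as tasks; the runtime builds the data-dependency graph and schedules tasks on the available (possibly containerized) resources. Below is a minimal PyCOMPSs-style example; it assumes a COMPSs installation and is launched with the runcompss command rather than plain python.

        # Minimal PyCOMPSs-style example: the decorated function becomes an
        # asynchronous task; compss_wait_on synchronizes on its future results.
        from pycompss.api.task import task
        from pycompss.api.api import compss_wait_on

        @task(returns=int)
        def increment(value):
            return value + 1

        if __name__ == "__main__":
            partial = [increment(i) for i in range(4)]  # tasks may run in parallel
            results = compss_wait_on(partial)           # block until all complete
            print(results)                              # [1, 2, 3, 4]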

    The Research Object Suite of Ontologies: Sharing and Exchanging Research Data and Methods on the Open Web

    Research in life sciences is increasingly being conducted in a digital and online environment. In particular, life scientists have been pioneers in embracing new computational tools to conduct their investigations. To support the sharing of digital objects produced during such research investigations, we have witnessed in the last few years the emergence of specialized repositories, e.g., DataVerse and FigShare. Such repositories provide users with the means to share and publish datasets that were used or generated in research investigations. While these repositories have proven their usefulness, interpreting and reusing evidence for most research results is a challenging task. Additional contextual descriptions are needed to understand how those results were generated and/or the circumstances under which they were concluded. Because of this, scientists are calling for models that go beyond the publication of datasets to systematically capture the life cycle of scientific investigations and provide a single entry point to access the information about the hypothesis investigated, the datasets used, the experiments carried out, the results of the experiments, the people involved in the research, etc. In this paper we present the Research Object (RO) suite of ontologies, which provide a structured container to encapsulate research data and methods along with essential metadata descriptions. Research Objects are portable units that enable the sharing, preservation, interpretation and reuse of research investigation results. The ontologies we present have been designed in the light of requirements that we gathered from life scientists. They have been built upon existing popular vocabularies to facilitate interoperability. Furthermore, we have developed tools to support the creation and sharing of Research Objects, thereby promoting and facilitating their adoption. Comment: 20 pages
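    At its core, a Research Object is an aggregation of resources plus metadata annotations expressed with the RO suite of ontologies. The rdflib sketch below illustrates that shape; the wf4ever namespace is the one the RO suite builds on, but the example URIs, file names and the exact choice of properties are assumptions for illustration.

        # Illustrative sketch: describe a Research Object aggregating a dataset,
        # using rdflib. URIs and file names are invented; property choices are
        # a simplified reading of the RO/ORE vocabularies.
        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import DCTERMS, RDF

        RO = Namespace("http://purl.org/wf4ever/ro#")
        ORE = Namespace("http://www.openarchives.org/ore/terms/")

        ro = URIRef("http://example.org/ro/my-investigation/")
        data = URIRef("http://example.org/ro/my-investigation/data/results.csv")

        g = Graph()
        g.bind("ro", RO)
        g.bind("ore", ORE)
        g.add((ro, RDF.type, RO.ResearchObject))
        g.add((data, RDF.type, RO.Resource))
        g.add((ro, ORE.aggregates, data))  # the RO encapsulates the dataset
        g.add((data, DCTERMS.description, Literal("Results used in the experiment")))

        print(g.serialize(format="turtle"))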