Search CORE

173 research outputs found

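Contribution à la convergence d'infrastructure entre le calcul haute performance et le traitement de données à large échelle (Contribution to infrastructure convergence between high-performance computing and large-scale data processing)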

Author: Mercier Michael
Publication venue: HAL CCSD
Publication date: 01/07/2019

The amount of data produced, whether in the scientific community or the commercial world, is constantly growing. The field of Big Data has emerged to handle large amounts of data on distributed computing infrastructures. High-Performance Computing (HPC) infrastructures are traditionally used to execute compute-intensive workloads. However, the HPC community also faces a growing need to process large amounts of data derived from high-definition sensors and large physics apparatus. The convergence of the two fields, HPC and Big Data, is currently taking place. In fact, the HPC community already uses Big Data tools, which are not always correctly integrated, especially at the level of the file system and the Resource and Job Management System (RJMS).

To understand how HPC clusters can be leveraged for Big Data usage, and what challenges this poses for HPC infrastructures, we have studied multiple aspects of the convergence. We first provide a survey of software provisioning methods, with a focus on data-intensive applications. We contribute a new RJMS collaboration technique called BeBiDa, which is based on 50 lines of code, whereas similar solutions use at least 1000 times more. We evaluate this mechanism under real conditions and in a simulated environment with our simulator Batsim. Furthermore, we provide extensions to Batsim to support I/O, and showcase the development of a generic file system model along with a Big Data application model. This allows us to complement the real-condition BeBiDa experiments with simulations, while enabling us to study file system dimensioning and trade-offs. All the experiments and analyses in this work have been done with reproducibility in mind. Based on this experience, we propose to integrate the development workflow and data analysis into the reproducibility mindset, and give feedback on our experiences with a list of best practices.

Behavior life style analysis for mobile sensory data in cloud computing through MapReduce

Author: Bao
Bryan Scotney
Chris Nugent
Cleland
Duncan
Eddy
Gerard Parr
Han
Han
Jae Bang
Kwapisz
Lewis
Manhyung Han
Muhammad Ahmed
Muhammad Amin
Paniagua
Sally McClean
Shujaat Hussain
Sirin
Sungyoung Lee
Wielemaker
Publication venue: 'MDPI AG'
Publication date: 01/11/2014

Cloud computing has revolutionized healthcare in today's world, as it can be seamlessly integrated with mobile applications and sensor devices. The sensory data is then transferred from these devices to public and private clouds. In this paper, a hybrid and distributed environment is built that is capable of collecting data from a mobile phone application and storing it in the cloud. We developed an activity recognition application and transferred its data to the cloud for further processing. The Big Data technology Hadoop MapReduce is employed to analyze the data and create a timeline of each user's activities. These activities are visualized to find useful health analytics and trends. In this paper, a Big Data solution is proposed to analyze the sensory data and give insights into user behavior and lifestyle trends.
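The timeline-building step can be pictured as a single MapReduce pass. The sketch below is a pure-Python simulation, not the authors' Hadoop job: it assumes each sensor record is a hypothetical `(user_id, timestamp, activity)` tuple, maps records to key/value pairs keyed by user, groups them as Hadoop's shuffle would, and reduces each group to a time-ordered timeline in which consecutive identical activities are merged.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout (an assumption, not the paper's schema):
# (user_id, unix_timestamp, activity_label)
Record = Tuple[str, int, str]
Event = Tuple[int, str]

def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, Event]]:
    """Emit (key, value) pairs keyed by user, as a Hadoop mapper would."""
    for user_id, ts, activity in records:
        yield user_id, (ts, activity)

def shuffle(pairs: Iterable[Tuple[str, Event]]) -> Dict[str, List[Event]]:
    """Group values by key (in real Hadoop the framework does this)."""
    groups: Dict[str, List[Event]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(events: List[Event]) -> List[Event]:
    """Sort one user's events by time and merge consecutive repeats."""
    timeline: List[Event] = []
    for ts, activity in sorted(events):
        if not timeline or timeline[-1][1] != activity:
            timeline.append((ts, activity))
    return timeline

def build_timelines(records: Iterable[Record]) -> Dict[str, List[Event]]:
    return {user: reduce_phase(evs) for user, evs in shuffle(map_phase(records)).items()}

if __name__ == "__main__":
    sample = [
        ("alice", 30, "walking"),
        ("bob", 10, "sitting"),
        ("alice", 10, "sitting"),
        ("alice", 20, "sitting"),
        ("bob", 25, "running"),
    ]
    for user, timeline in sorted(build_timelines(sample).items()):
        print(user, timeline)
```

Keying by user is what lets the reducers build each timeline independently, which is the property a distributed framework exploits.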

Crossref

Directory of Open Access Journals

PubMed Central

University of East Anglia digital repository

Parallelism in Prolog: concepts and systems

Author: Fabrício Filho João
Silva Anderson Faustino da
Publication venue: 'Revista Brasileira de Hematologia e Hemoterapia (RBHH)'
Publication date: 01/05/2016

Parallelism is a field of study that grows every day, driven by the falling cost and growing popularity of machines with parallel architectures. In this context, logic languages, especially PROLOG, offer a feasible and practical avenue for parallelism. This parallelism can be exploited in different ways, and the task poses several challenges. This survey aims to present the main concepts of parallelism in PROLOG, the challenges faced when parallelizing programs in this language, and the state of the art in developing systems that support parallelism in logic languages. Systems based on implicit parallelism, developed on different platforms, are presented. Finally, the presented systems and the models they implement are compared.
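One of the implicit forms of parallelism such surveys cover is OR-parallelism: the alternative clauses of a predicate are independent branches of the search tree, so they can be explored concurrently. The sketch below is a toy model under a loud assumption: a "predicate" is a hypothetical sequence of Python functions, one per clause, each returning the solutions that clause would yield on backtracking. It is not a Prolog engine and not any system from the survey.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Toy model (an assumption for illustration): each alternative clause maps a
# query argument to the list of solutions it would produce.
Clause = Callable[[int], List[int]]

def small(n: int) -> List[int]:
    return [x for x in range(n) if x < 3]

def even(n: int) -> List[int]:
    return [x for x in range(n) if x % 2 == 0]

def square(n: int) -> List[int]:
    return [x * x for x in range(n) if x * x < n]

PREDICATE: Sequence[Clause] = (small, even, square)

def solve_sequential(clauses: Sequence[Clause], n: int) -> List[int]:
    # Prolog's default strategy: exhaust clause 1, backtrack into clause 2, ...
    return [s for clause in clauses for s in clause(n)]

def solve_or_parallel(clauses: Sequence[Clause], n: int) -> List[int]:
    # OR-parallelism: alternative clauses are explored by concurrent workers.
    # Executor.map yields results in submission order, so the solution order
    # matches sequential execution.
    with ThreadPoolExecutor() as pool:
        per_clause = list(pool.map(lambda clause: clause(n), clauses))
    return [s for sols in per_clause for s in sols]

if __name__ == "__main__":
    print(solve_or_parallel(PREDICATE, 10))
```

Python threads share the GIL, so this shows the structure of OR-parallel search rather than a real speedup; preserving clause order is one of the design questions OR-parallel systems must answer when side effects or cuts are involved.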

A MapReduce Construct for Yap Prolog

Author: Joana Sílvia Santos Côrte-Real
Publication date: 26/07/2013

Repositório Aberto da Universidade do Porto

Paralelismo em Prolog: Conceitos e Sistemas (Parallelism in Prolog: Concepts and Systems)

Author: Fabrício Filho João
Faustino da Silva Anderson
Publication venue: 'Universidade Federal do Rio Grande do Sul'
Publication date: 29/05/2016

Parallelism is a field of study that grows every day, owing to the falling cost and growing popularity of machines with parallel architectures. In this context, logic languages, especially PROLOG, offer a viable and practical avenue for parallelism. This parallelism can be exploited in different ways, and the task poses numerous challenges. This tutorial aims to present the main concepts of parallelism in PROLOG, the challenges faced when parallelizing programs in this language, and the state of the art in developing systems that support parallelization in logic languages. Systems based on implicit parallelism, implemented on different platforms, are presented. Finally, the presented systems and the models implemented in them are compared.

Em Questão

Archives of the Faculty of Veterinary Medicine UFRGS

Improving Data-sharing and Policy Compliance in a Hybrid Cloud: The Case of a Healthcare Provider

Author: Kwame Azumah Kenneth
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2022

VBN

PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development

Author: Barnett R. Matthew
Jermaine Chris
Lorido-Botran Tania
Luo Shangyu
Monroy Carlos
Sikdar Sourav
Teymourian Kia
Yuan Binhang
Zou Jia
Publication date: 01/01/2017

This paper describes PlinyCompute, a system for developing high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database-style optimization to figure out how to stage distributed computations. However, in the small, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and an associated memory management system that has been designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are built on top of the Java Virtual Machine (JVM) and hence must at least partially cede performance-critical concerns such as memory management (including layout and de/allocation) and virtual method/function dispatch to the JVM. This hybrid approach (declarative in the large, trusting the programmer's ability to use the PC object model efficiently in the small) results in a system that is ideal for developing reusable, data-intensive tools and libraries. Through extensive benchmarking, we show that implementing complex object manipulation and non-trivial, library-style computations on top of PlinyCompute can yield a speedup of 2x to more than 50x compared to equivalent implementations on Spark.

Comment: 48 pages, including references and Appendi

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

Orthology guided transcriptome assembly of Italian ryegrass and meadow fescue for single-nucleotide polymorphism discovery

Author: Bruno Studer
David Kopecký
Elodie Rey
Ghesquiere M.
Humphreys M.
Isabel Roldán‐Ruiz
Jan Bartoš
Jaroslav Doležel
Michael Abrouk
Rognli O.A.
Steven Yates
Tom Ruttink
Tomasz Książczyk
Zbigniew Zwierzykowski
Zerbino D.R.
Štěpán Stočes
Publication venue: 'Crop Science Society of America'
Publication date: 01/01/2016

Single-nucleotide polymorphisms (SNPs) represent natural DNA sequence variation. They can be used for various applications, including the construction of high-density genetic maps, analysis of genetic variability, genome-wide association studies, and map-based cloning. Here we report on transcriptome sequencing in two forage grasses, meadow fescue (Festuca pratensis Huds.) and Italian ryegrass (Lolium multiflorum Lam.), and the identification of various classes of SNPs. Using the Orthology Guided Assembly (OGA) strategy, we assembled and annotated a total of 18,952 and 19,036 transcripts for Italian ryegrass and meadow fescue, respectively. In addition, we used transcriptome sequence data of perennial ryegrass (L. perenne L.) from a previous study to identify 16,613 transcripts shared across all three species. Large numbers of intraspecific SNPs were identified in all three species: 248,000 in meadow fescue, 715,000 in Italian ryegrass, and 529,000 in perennial ryegrass. Moreover, we identified almost 25,000 interspecific SNPs located in 5343 genes that can distinguish meadow fescue from Italian ryegrass, and 15,000 SNPs located in 3976 genes that discriminate meadow fescue from both Lolium species. All identified SNPs were positioned in silico on the seven linkage groups (LGs) of L. perenne using the GenomeZipper approach. With the identification and positioning of interspecific SNPs, our study provides a valuable resource for the grass research and breeding community and will enable detailed characterization of genomic composition and gene expression analysis in prospective Festuca × Lolium hybrids.

Repository for Publications and Research Data

Crossref

Ghent University Academic Bibliography

Directory of Open Access Journals

Embedding programming languages: Prolog in Haskell

Publication venue: University of Northern British Columbia
Publication date: 01/01/2016

This thesis focuses on combining the two most important and widespread declarative programming paradigms, functional and logic programming. The proposed approach aims at adding logic programming features native to Prolog to Haskell. We develop extensions that replicate the target language by using advanced features of the host language for an efficient implementation. The thesis aims to provide insights into merging two declarative languages, namely Haskell and Prolog, by embedding the latter into the former and analyzing the results of doing so, given that the two languages have conflicting characteristics. The finished product is something similar to a Haskellised Prolog with logic-programming-like capabilities. The original print copy of this thesis may be available here: http://wizard.unbc.ca/record=b214135

Arca British Columbia's network of post-secondary digital repositories