Search CORE

13 research outputs found

Parallel stochastic systems biology in the cloud

Author: Aldinucci Marco
Calcagno Cristina
Concetto Spampinato
Coppo Mario
Drocco Maurizio
Massimo Torquati
Misale Claudia
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2014
Field of study

Institutional Research Information System University of Turin

NeoHiC: A web application for the analysis of Hi-C data

Author: A Galizia
AJ Banegas-Luna
CT Have
D D’Agostino
D Szklarczyk
DM Bean
F Chiappori
F Serra
F Tordini
F Viti
I Merelli
JE Phillips-Cremins
JQ Ling
M Aldinucci
NC Durand
P Shannon
RN Smith
Y Shavit
Z Duan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020
Field of study

Crossref

Institutional Research Information System University of Turin

Scientific workflows on clouds with heterogeneous and preemptible instances

Author: Aldinucci Marco
Ivan Merelli
Liò
Tordini Fabio
Viviani Paolo
Publication venue: 'IOS Press'
Publication date: 01/01/2018
Field of study

Institutional Research Information System University of Turin

Discovering biological knowledge by integrating high-throughput data and scientific literature on the cloud

Author: Alberto Faro
Aldinucci Marco
Carmelo Pino
Concetto Spampinato
Daniela Giordano
Isaak Kavasidis
Publication venue: 'Wiley'
Publication date: 01/01/2014
Field of study

Institutional Research Information System University of Turin

PWHATSHAP: efficient haplotyping for future generation sequencing

Author: Aldinucci Marco
Bracciali Andrea
Marschall Tobias
Merelli Ivan
Patterson Murray
Pisanti Nadia
Torquati Massimo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

Background: Haplotype phasing is an important problem in the analysis of genomics information. Given a set of DNA fragments from an individual, it consists of determining which of the possible alleles (alternative forms of a gene) each fragment comes from. Haplotype information is relevant to gene regulation, epigenetics, genome-wide association studies, evolutionary and population studies, and the study of mutations. Haplotyping is currently addressed as an optimisation problem whose solutions minimise, for instance, error-correction costs, where costs measure the confidence in the accuracy of the information acquired from DNA sequencing. Solutions typically have exponential computational complexity. WhatsHap is a recent optimal approach that moves computational complexity from DNA fragment length to fragment overlap, i.e. coverage, and is hence of particular interest given the current trend of sequencing technologies towards producing longer fragments.

Results: Given the potential relevance of efficient haplotyping in several analysis pipelines, we have designed and engineered pWhatsHap, a parallel, high-performance version of WhatsHap. pWhatsHap is embedded in a toolkit developed in Python and supports genomics datasets in standard file formats. Building on WhatsHap, pWhatsHap has the same complexity: it explores a number of possible solutions that is exponential in the coverage of the dataset. The parallel implementation on multi-core architectures substantially reduces the execution time of haplotyping, while its results retain the same high accuracy as those of WhatsHap, which increases with coverage.

Conclusions: Due to its structure and its management of large datasets, the parallelisation of WhatsHap posed demanding technical challenges, which have been addressed by exploiting a high-level parallel programming framework. The result, pWhatsHap, is a freely available toolkit that improves the efficiency of the analysis of genomics information.
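To make the complexity claim concrete, here is a minimal Python sketch of a WhatsHap-style dynamic programme for the weighted minimum error correction (wMEC) problem. It is not WhatsHap's or pWhatsHap's code: the read representation (a dict from SNP column to an (allele, weight) pair) and the function names `wmec_cost`, `column_cost` and `all_subsets` are hypothetical, and the sketch returns only the optimal cost without reconstructing the haplotypes. It sweeps the SNP columns left to right and, at each column, enumerates every bipartition of the reads spanning that column, so the number of states per column grows exponentially with the coverage and does not depend on how long each read is.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Tuple

# Hypothetical read format: SNP column index -> (allele 0/1, weight).
# The weight stands in for the confidence of that base call; reads must be non-empty.
Read = Dict[int, Tuple[int, float]]


def all_subsets(items: List[int]) -> Iterator[FrozenSet[int]]:
    """Yield all 2**len(items) subsets of items as frozensets."""
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def column_cost(reads: List[Read], col: int, side1: FrozenSet[int], active: List[int]) -> float:
    """Cheapest way to make each side of the bipartition agree on one allele at col."""
    # w[side][allele] = total weight of calls on that side carrying that allele
    w = [[0.0, 0.0], [0.0, 0.0]]
    for r in active:
        call = reads[r].get(col)
        if call is None:  # the read spans col but has no call there (e.g. a gap)
            continue
        allele, weight = call
        w[1 if r in side1 else 0][allele] += weight
    # Each side keeps its heavier allele and pays for flipping the lighter one.
    return min(w[0]) + min(w[1])


def wmec_cost(reads: List[Read], n_cols: int) -> float:
    """Optimal weighted MEC cost via a column-wise DP over bipartitions of active reads."""
    spans = [(min(r), max(r)) for r in reads]
    prev_active: FrozenSet[int] = frozenset()
    prev: Dict[FrozenSet[int], float] = {frozenset(): 0.0}  # empty start state
    for col in range(n_cols):
        active = [i for i, (first, last) in enumerate(spans) if first <= col <= last]
        shared = frozenset(active) & prev_active  # reads continuing from col - 1
        # Best score so far for each way of splitting the continuing reads.
        best: Dict[FrozenSet[int], float] = {}
        for part, score in prev.items():
            key = part & shared
            best[key] = min(score, best.get(key, float("inf")))
        # Every bipartition of the active reads: 2**coverage states per column.
        prev = {
            side1: column_cost(reads, col, side1, active) + best[side1 & shared]
            for side1 in all_subsets(active)
        }
        prev_active = frozenset(active)
    return min(prev.values())


# Usage: reads 0 and 1 disagree everywhere; read 2 sides with read 0 at
# column 1 but with read 1 at column 2, so one call must be flipped.
reads = [
    {0: (0, 1.0), 1: (0, 1.0), 2: (0, 1.0)},
    {0: (1, 1.0), 1: (1, 1.0), 2: (1, 1.0)},
    {1: (0, 1.0), 2: (1, 1.0)},
]
print(wmec_cost(reads, 3))  # 1.0
```

The sketch enumerates each bipartition and its mirror image, doubling the work; a production implementation would break that symmetry. Because the subsets of a column are scored independently of one another, that inner loop is a natural candidate for multi-core parallelism, although this version runs sequentially.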

Crossref

Stirling Online Research Repository (RIOXX)

Springer - Publisher Connector

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

PubMed Central

Stirling Online Research Repository

MPG.PuRe

Hal-Diderot

Institutional Research Information System University of Turin

PWHATSHAP: efficient haplotyping for future generation sequencing

Author: A Menelaou
A Panconesi
Andrea Bracciali
AS Mikheyev
BN Howie
CS Chin
D He
D Leung
F Deng
G Glusman
G Lancia
GM Amdahl
Ivan Merelli
J Duitama
J Huang
J Marchini
M Aldinucci
M Carneiro
M Patterson
M Slatkin
MA DePristo
Marco Aldinucci
Massimo Torquati
Murray Patterson
Nadia Pisanti
P Fouilhoux
P Scheet
R Cilibrasi
R Roberts
RG Downey
S Levy
SR Browning
SR Mousavi
The 1000 Genomes Project Consortium
The Genome of the Netherlands Consortium
The International HapMap Consortium
Tobias Marschall
V Bansal
V Kuleshov
Y Li
Y Pirola
YT Zhao
ZZ Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Why High-Performance Modelling and Simulation for Big Data Applications Matters

Author: A Abdullatif
A Al-Fuqaha
A Bracciali
A Cristina-Bicharra
A Fanti
A Glotić
A Heidari Gorji
A Inostrosa-Psijas
A Oussous
A Serra
A Sikora
A Singh
A Zamuda
B Kitchenham
C Grelck
C Lorenzo
C Misale
C Sansom
C Zechner
CS Iliopoulos
D Griol
E Bartocci
E Capobianco
E Frank
E Niewiadomska-Szynkiewicz
EA Lee
EI Vlahogianni
F Bardozzo
F Berman
G Bernardini
G Garnett
G Vitello
H Casanova
H Kennedy
I Cotes-Ruiz
I Milne
I Park
J Dean
J Holub
J Zhang
K Rutherford
L Calviello
L Calzone
L Garg
L Huang
L Lazzerini-Ospri
L Marti
L Nasti
M Aldinucci
M Beccuti
M Cannataro
M Cole
M Herlihy
M Jahangirian
M Karpowicz
M Patterson
MA Martínez-del-Amor
MP Karpowicz
N Akhter
N Paoletti
N Sehgal
N Totis
P Danecek
P Liò
P Suravajhala
P Szynkiewicz
PD Healy
PL Luisi
PS Pacheco
R Calheiros
RJ Walters
S Aleem
S John Walker
S McClean
S Shanmugam
S Vitabile
T Akidau
T Carver
T Mastelic
T White
X Song
Y Kuruma
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

Modelling and Simulation (M&S) offer adequate abstractions to manage the complexity of analysing big data in scientific and engineering domains. Unfortunately, big data problems are often not easily amenable to efficient and effective use of High Performance Computing (HPC) facilities and technologies. Furthermore, M&S communities typically lack the detailed expertise required to exploit the full potential of HPC solutions, while HPC specialists may not be fully aware of specific modelling and simulation requirements and applications. The COST Action IC1406 High-Performance Modelling and Simulation for Big Data Applications has created a strategic framework to foster interaction between M&S experts from various application domains on the one hand and HPC experts on the other, in order to develop effective solutions for big data applications. One of the tangible outcomes of the COST Action is a collection of case studies from various computing domains. Each case study brought together both HPC and M&S experts, bearing witness to the effective cross-pollination facilitated by the COST Action. In this introductory article we argue that joining forces between the M&S and HPC communities is both timely in the big data era and crucial for success in many application domains. Moreover, we provide an overview of the state of the art in the various research areas concerned.

Crossref

Stirling Online Research Repository (RIOXX)

Stirling Online Research Repository

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Institutional Research Information System University of Turin

Why High-Performance Modelling and Simulation for Big Data Applications Matters

Author: Aldinucci M.
Bracciali A.
Grelck C.
Larsson E.
Niewiadomska-Szynkiewicz E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

International Migration, Integration and Social Cohesion online publications

PiCo: A Domain-Specific Language for Data Analytics Pipelines

Author: Misale Claudia
Publication venue
Publication date: 01/01/2017
Field of study

In the world of Big Data analytics, there is a range of tools aiming to simplify the programming of applications executed on clusters. Although each tool claims to provide better programming, data and execution models (for which only informal, and often confusing, semantics is generally provided), all share a common underlying model, namely the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects of Big Data analytics tools from a high-level perspective. This analysis can be considered a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By clearly separating all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low-level with high-level aspects, as often happens in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we think that clearer and simpler semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model, implemented as a Domain-Specific Language on top of a stack of layers that form a prototypical framework for Big Data analytics. The contribution of this thesis is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As a result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit in each level.
Second, we propose a programming environment based on this layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, essentially a DAG composition of processing elements. This model is intended to give the user a single interface for both stream and batch processing, completely hiding data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared-memory and distributed parallelism, and implemented in C++11/14 with the aim of bringing C++ into the Big Data world.
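The Pipeline abstraction can be illustrated with a minimal Python sketch. PiCo itself is a C++ DSL built on FastFlow, and none of the names below (`Pipeline`, `map_stage`, `filter_stage`, `flatmap_stage`, `endless_records`) are its API; they are hypothetical. The sketch shows only the core idea: a pipeline is an immutable composition of stages that never sees how the data is stored, so the same pipeline runs unchanged over a finite list (batch) or an unbounded generator (stream). It covers linear chains only, not the general DAG composition described in the abstract.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Tuple

# A stage consumes an iterator of items and lazily yields transformed items.
Stage = Callable[[Iterator], Iterator]


def map_stage(fn: Callable) -> Stage:
    """Apply fn to every item (one output per input)."""
    def run(items: Iterator) -> Iterator:
        for x in items:
            yield fn(x)
    return run


def filter_stage(pred: Callable) -> Stage:
    """Keep only the items for which pred is true."""
    def run(items: Iterator) -> Iterator:
        for x in items:
            if pred(x):
                yield x
    return run


def flatmap_stage(fn: Callable) -> Stage:
    """Apply fn to every item and emit each element of the returned iterable."""
    def run(items: Iterator) -> Iterator:
        for x in items:
            yield from fn(x)
    return run


class Pipeline:
    """Immutable linear composition of stages; to() returns a new, longer pipeline."""

    def __init__(self, stages: Tuple[Stage, ...] = ()):
        self.stages = tuple(stages)

    def to(self, stage: Stage) -> "Pipeline":
        return Pipeline(self.stages + (stage,))

    def run(self, source: Iterable) -> Iterator:
        # The same code path serves finite (batch) and unbounded (stream) sources.
        items = iter(source)
        for stage in self.stages:
            items = stage(items)
        return items


# Usage: one pipeline definition, applied to a batch and to a stream.
words = (
    Pipeline()
    .to(flatmap_stage(str.split))
    .to(map_stage(str.lower))
    .to(filter_stage(lambda w: len(w) > 3))
)

batch = list(words.run(["Big Data analytics", "Dataflow model"]))
# batch == ['data', 'analytics', 'dataflow', 'model']


def endless_records() -> Iterator[str]:
    """Hypothetical unbounded source standing in for a data stream."""
    while True:
        yield "stream of records"


stream_head = list(islice(words.run(endless_records()), 5))
# stream_head == ['stream', 'records', 'stream', 'records', 'stream']
```

Because every stage is a generator, the stream case stays lazy: `islice` pulls only as many records from the infinite source as it needs, and the pipeline definition itself is identical in both cases.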

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institutional Research Information System University of Turin

High-Performance Modelling and Simulation for Big Data Applications

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/02/2021
Field of study

This open access book was prepared as the Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to give a better understanding of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction between High Performance Computing and Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

Directory of Open Access Books (DOAB)