
    Virtual Environment for Next Generation Sequencing Analysis

    Next Generation Sequencing technology, on the one hand, allows a more accurate analysis and, on the other hand, increases the amount of data to process. A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or reads, can be used to measure levels of gene expression and to identify novel splice variants of genes. The proposed solution is a distributed architecture consisting of a Grid Environment and a Virtual Grid Environment, designed to reduce processing time by making the system scalable and flexible.

    High-Throughput Computing on High-Performance Platforms: A Case Study

    The computing systems used by LHC experiments have historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan, a DOE leadership facility, in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 52M core-hours a year. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.

    Algorithms for advance bandwidth reservation in media production networks

    Media production generally requires many geographically distributed actors (e.g., production houses, broadcasters, advertisers) to exchange huge amounts of raw video and audio data. Traditional distribution techniques, such as dedicated point-to-point optical links, are highly inefficient in terms of installation time and cost. To improve efficiency, shared media production networks that connect all involved actors over a large geographical area are currently being deployed. The traffic in such networks is often predictable, as the timing and bandwidth requirements of data transfers are generally known hours or even days in advance. As such, the use of advance bandwidth reservation (AR) can greatly increase resource utilization and cost efficiency. In this paper, we propose an Integer Linear Programming formulation of the bandwidth scheduling problem that takes into account the specific characteristics of media production networks. Two novel optimization algorithms based on this model are thoroughly evaluated and compared by means of in-depth simulation results.
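    As an illustration only (this is not the formulation from the paper), the sketch below shows what a minimal advance-reservation ILP can look like in Python with the PuLP library: hypothetical requests with a start slot, an end slot and a bandwidth demand compete for a single shared link, binary variables decide which requests are admitted, and the objective maximizes the reserved bandwidth-time volume. The link capacity, the time slots and the request list are all assumptions made for the example.

```python
# Illustrative sketch (not the paper's model): admit-or-reject advance
# reservations on a single shared link, maximizing reserved volume.
import pulp

CAPACITY = 10.0  # Gbit/s, assumed link capacity
# (id, start_slot, end_slot, bandwidth) -- hypothetical requests
requests = [
    ("r1", 0, 4, 6.0),
    ("r2", 2, 6, 5.0),
    ("r3", 5, 8, 4.0),
]
slots = range(0, 9)

prob = pulp.LpProblem("advance_reservation", pulp.LpMaximize)
x = {r[0]: pulp.LpVariable(f"x_{r[0]}", cat="Binary") for r in requests}

# Objective: maximize total reserved bandwidth-time volume.
prob += pulp.lpSum(x[rid] * bw * (end - start)
                   for rid, start, end, bw in requests)

# Capacity constraint: admitted requests active in a slot must fit the link.
for t in slots:
    prob += pulp.lpSum(x[rid] * bw
                       for rid, start, end, bw in requests
                       if start <= t < end) <= CAPACITY

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for rid, *_ in requests:
    print(rid, "admitted" if x[rid].value() > 0.5 else "rejected")
```

    A formulation for a real media production network would additionally place capacity constraints on every link of each request's routed path and allow flexible transfer timing, which is where the network-specific characteristics mentioned above come in.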

    A Persistent Storage Model for Extreme Computing

    Continuing technological progress has resulted in a dramatic growth in the aggregate computational performance of the largest supercomputing systems. Unfortunately, these advances have not been matched to the required extent by the accompanying I/O systems, which have improved little in terms of architecture or effective access latency. New classes of algorithms developed for massively parallel applications, which gracefully handle the challenges of asynchrony, heavily multi-threaded distributed codes, and message-driven computation, must be matched by similar advances in I/O methods and algorithms to produce a well-performing and balanced supercomputing system. This dissertation proposes PXFS, a storage model for persistent objects inspired by the ParalleX model of execution that addresses many of these challenges. The PXFS model is designed to be asynchronous in nature to comply with the ParalleX model, and proposes an active TupleSpace concept to hold all kinds of metadata/meta-objects for either storage objects or runtime objects. The active TupleSpace can also register ParalleX actions to be triggered by certain tuple operations. A first implementation of PXFS uses the well-known Orange parallel file system as its back-end via an asynchronous I/O layer, together with an implementation of the TupleSpace component in HPX, the implementation of ParalleX; these details are described along with preliminary performance data. An in-house micro-benchmark was developed to measure the disk I/O throughput of the PXFS asynchronous interface. The results show perfect scalability and a 3x to 20x speedup in I/O throughput compared to the OrangeFS synchronous user interface. Use cases of the TupleSpace component are discussed for real-world applications, including micro check-pointing. By utilizing the TupleSpace for I/O in HPX applications, global barriers can be replaced with fine-grained parallelism, overlapping more computation with communication and greatly boosting performance and efficiency. The dissertation also showcases the distributed directory service in the Orange file system, which processes directory entries in parallel and effectively improves directory metadata operations.
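    To make the active TupleSpace idea more concrete, here is a minimal, self-contained sketch in Python (PXFS and HPX are not involved; the class, method names and the checkpoint callback are invented for illustration): tuples are written asynchronously, readers block until a matching tuple appears, and registered actions fire when a matching tuple is written, loosely mirroring the triggered ParalleX actions described above.

```python
# Minimal, illustrative TupleSpace sketch (not the PXFS/HPX implementation):
# asynchronous out/rd operations plus callbacks triggered on insertion.
import asyncio

class TupleSpace:
    def __init__(self):
        self._tuples = []
        self._actions = []          # (pattern, coroutine) pairs
        self._changed = asyncio.Condition()

    @staticmethod
    def _matches(pattern, tup):
        # None in a pattern acts as a wildcard for that field.
        return len(pattern) == len(tup) and all(
            p is None or p == v for p, v in zip(pattern, tup))

    async def out(self, tup):
        """Insert a tuple and fire any registered actions that match."""
        async with self._changed:
            self._tuples.append(tup)
            self._changed.notify_all()
        for pattern, action in self._actions:
            if self._matches(pattern, tup):
                asyncio.create_task(action(tup))

    async def rd(self, pattern):
        """Non-destructively read a matching tuple, waiting if necessary."""
        async with self._changed:
            while True:
                for tup in self._tuples:
                    if self._matches(pattern, tup):
                        return tup
                await self._changed.wait()

    def on_out(self, pattern, action):
        """Register a coroutine to run whenever a matching tuple is written."""
        self._actions.append((pattern, action))

async def main():
    ts = TupleSpace()
    # Hypothetical metadata tuple: (object_name, attribute, value).
    async def checkpoint(tup):
        print("triggered micro-checkpoint for", tup[0])
    ts.on_out(("matrix.dat", "dirty", None), checkpoint)
    await ts.out(("matrix.dat", "dirty", True))
    print(await ts.rd(("matrix.dat", None, None)))
    await asyncio.sleep(0)  # let the triggered action run before shutdown

asyncio.run(main())
```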

    EGI user forum 2011: book of abstracts


    Using Premia and Nsp for Constructing a Risk Management Benchmark for Testing Parallel Architecture

    Financial institutions have massive computations to carry out overnight which are very demanding in terms of CPU consumption. The challenge is to price many different products on a cluster-like architecture. We have used the Premia software to value the financial derivatives. In this work, we explain how Premia can be embedded into Nsp, a scientific software package similar to Matlab, to provide a powerful tool for valuing a whole portfolio. Finally, we have integrated an MPI toolbox into Nsp to enable the use of Premia to solve a batch of pricing problems on a cluster. This unified framework can then be used to test different parallel architectures.
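    The abstract itself contains no code, but the parallel pricing pattern it describes can be sketched as follows, assuming mpi4py in place of the Nsp MPI toolbox and a plain Black-Scholes formula standing in for an actual Premia pricing routine; the portfolio contents and function names are invented for the example.

```python
# Illustrative scatter/gather sketch with mpi4py (the paper embeds Premia in
# Nsp; here a Black-Scholes call stands in for the real Premia pricer).
from math import exp, log, sqrt
from statistics import NormalDist
from mpi4py import MPI

def price_call(spot, strike, rate, vol, maturity):
    """Black-Scholes European call -- placeholder for a Premia routine."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    N = NormalDist().cdf
    return spot * N(d1) - strike * exp(-rate * maturity) * N(d2)

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Hypothetical portfolio: one pricing problem per line of a nightly batch.
portfolio = [(100.0, 90.0 + k, 0.03, 0.2, 1.0) for k in range(20)]

# Static round-robin split of the portfolio across ranks.
my_tasks = portfolio[rank::size]
my_prices = [price_call(*task) for task in my_tasks]

# Gather partial results on rank 0, which assembles the portfolio value.
all_prices = comm.gather(my_prices, root=0)
if rank == 0:
    total = sum(p for chunk in all_prices for p in chunk)
    print(f"portfolio value: {total:.2f}")
```

    Launched with, e.g., `mpiexec -n 4 python price_portfolio.py`, each rank prices its own slice of the portfolio and rank 0 aggregates the results, which is the same split-and-gather structure a nightly pricing batch on a cluster would use.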

    Optimal Load Balancing in a Beowulf Cluster

    PANTS (PANTS Application Node Transparency System) is a suite of programs designed to add transparent load balancing to a Beowulf cluster, so that processes are transferred among the nodes of the cluster to improve performance. PANTS provides the option of using one of several different load balancing policies, each taking a different approach. This paper studies the scalability and performance of these policies on large clusters and under various workloads. We measure the performance of our policies on our current cluster, and use that performance data to build simulations to test the performance of the policies in larger clusters and under differing workloads. Two policies, one deterministic and one non-deterministic, are presented which offer optimal steady-state performance. We also present best practices and discuss the major challenges of load balancing policy design.
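    For illustration only (this is not PANTS code, and these are not necessarily the policies evaluated in the paper), the snippet below contrasts a deterministic least-loaded rule with a non-deterministic "two random choices" rule in a toy task-assignment simulation; the node count and task count are arbitrary.

```python
# Illustrative sketch: two node-selection policies of the kinds compared
# above -- a deterministic "least-loaded" rule and a non-deterministic
# "power of two random choices" rule -- exercised in a toy simulation.
import random

def least_loaded(loads):
    """Deterministic: always send work to the node with minimum load."""
    return min(range(len(loads)), key=loads.__getitem__)

def two_random_choices(loads, rng=random):
    """Non-deterministic: pick two nodes at random, use the less loaded one."""
    a, b = rng.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b

# Toy simulation: assign 10_000 unit tasks to 32 nodes under each policy.
for policy in (least_loaded, two_random_choices):
    loads = [0] * 32
    for _ in range(10_000):
        loads[policy(loads)] += 1
    print(policy.__name__, "max node load:", max(loads))
```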