394 research outputs found
PaPaS: A Portable, Lightweight, and Generic Framework for Parallel Parameter Studies
The current landscape of scientific research is widely based on modeling and
simulation, typically with complexity in the simulation's flow of execution and
parameterization properties. Execution flows are not necessarily
straightforward since they may need multiple processing tasks and iterations.
Furthermore, parameter and performance studies are common approaches used to
characterize a simulation, often requiring traversal of a large parameter
space. High-performance computers offer practical resources at the expense of
users handling the setup, submission, and management of jobs. This work
presents the design of PaPaS, a portable, lightweight, and generic workflow
framework for conducting parallel parameter and performance studies. Workflows
are defined using parameter files based on keyword-value pairs syntax, thus
removing from the user the overhead of creating complex scripts to manage the
workflow. A parameter set consists of any combination of environment variables,
files, partial file contents, and command line arguments. PaPaS is being
developed in Python 3 with support for distributed parallelization using SSH,
batch systems, and C++ MPI. The PaPaS framework will run as user processes, and
can be used in single/multi-node and multi-tenant computing systems. An example
simulation using the BehaviorSpace tool from NetLogo and a matrix multiply
using OpenMP are presented as parameter and performance studies, respectively.
The results demonstrate that the PaPaS framework offers a simple method for
defining and managing parameter studies, while increasing resource utilization.Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced
Research Computing, July 22--26, 2018, Pittsburgh, PA, US
QoS-aware predictive workflow scheduling
This research places the basis of QoS-aware predictive workflow scheduling. This research novel contributions will open up prospects for future research in handling complex big workflow applications with high uncertainty and dynamism. The results from the proposed workflow scheduling algorithm shows significant improvement in terms of the performance and reliability of the workflow applications
MOLNs: A cloud platform for interactive, reproducible and scalable spatial stochastic computational experiments in systems biology using PyURDME
Computational experiments using spatial stochastic simulations have led to
important new biological insights, but they require specialized tools, a
complex software stack, as well as large and scalable compute and data analysis
resources due to the large computational cost associated with Monte Carlo
computational workflows. The complexity of setting up and managing a
large-scale distributed computation environment to support productive and
reproducible modeling can be prohibitive for practitioners in systems biology.
This results in a barrier to the adoption of spatial stochastic simulation
tools, effectively limiting the type of biological questions addressed by
quantitative modeling. In this paper, we present PyURDME, a new, user-friendly
spatial modeling and simulation package, and MOLNs, a cloud computing appliance
for distributed simulation of stochastic reaction-diffusion models. MOLNs is
based on IPython and provides an interactive programming platform for
development of sharable and reproducible distributed parallel computational
experiments
Technical Conception and Implementation of an IT-System supporting the flexible Distribution of Documents within a large scale Sales Organization
This thesis aims to discuss and present an IT system to distribute documents within a large scale sales organization. Therefore, different sales applications on the market are analyzed and described. Based on this analysis, requirements for the IT system are defined. The requirements are divided into client-side and server-side requirements. The server application is realized using an existing ECM system to support the content throughout the enterprise content life cycle. Therefore, an evaluation of existing ECM systems is performed with a criteria catalog based on the requirements. Afterwards, the conception and architecture of the IT system is described, followed by further insights into the technical implementation of the mobile application. Finally, the features of the overall IT system are discussed and an outlook on how to extend the IT system is presented
Big Data Optimization : Algorithmic Framework for Data Analysis Guided by Semantics
Fecha de Lectura de Tesis: 9 noviembre 2018.Over the past decade the rapid rise of creating data in all domains of knowledge such as traffic, medicine, social network, industry, etc., has highlighted the need for enhancing the process of analyzing large data volumes, in order to be able to manage them with more easiness and in addition, discover new relationships which are hidden in them
Optimization problems, which are commonly found in current industry, are not unrelated to this trend, therefore Multi-Objective Optimization Algorithms (MOA) should bear in mind this new scenario. This means that, MOAs have to deal with problems, which have either various data sources (typically streaming) of huge amount of data. Indeed these features, in particular, are found in Dynamic Multi-Objective Problems (DMOPs), which are related to Big Data optimization problems. Mostly with regards to velocity and variability. When dealing with DMOPs, whenever there exist changes in the environment that affect the solutions of the problem (i.e., the Pareto set, the Pareto front, or both), therefore in the fitness landscape, the optimization algorithm must react to adapt the search to the new features of the problem.
Big Data analytics are long and complex processes therefore, with the aim of simplify them, a series of steps are carried out through. A typical analysis is composed of data collection, data manipulation, data analysis and finally result visualization.
In the process of creating a Big Data workflow the analyst should bear in mind the semantics involving the problem domain knowledge and its data. Ontology is the standard way for describing the knowledge about a domain.
As a global target of this PhD Thesis, we are interested in investigating the use of the semantic in the process of Big Data analysis, not only focused on machine learning analysis, but also in optimization
DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge
The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for
processing large astronomical datasets at a scale required by the Square
Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex
data reduction pipelines consisting of both data sets and algorithmic
components and an implementation run-time to execute such pipelines on
distributed resources. By mapping the logical view of a pipeline to its
physical realisation, DALiuGE separates the concerns of multiple stakeholders,
allowing them to collectively optimise large-scale data processing solutions in
a coherent manner. The execution in DALiuGE is data-activated, where each
individual data item autonomously triggers the processing on itself. Such
decentralisation also makes the execution framework very scalable and flexible,
supporting pipeline sizes ranging from less than ten tasks running on a laptop
to tens of millions of concurrent tasks on the second fastest supercomputer in
the world. DALiuGE has been used in production for reducing interferometry data
sets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide
Spectral Radioheliograph; and is being developed as the execution framework
prototype for the Science Data Processor (SDP) consortium of the Square
Kilometre Array (SKA) telescope. This paper presents a technical overview of
DALiuGE and discusses case studies from the CHILES and MUSER projects that use
DALiuGE to execute production pipelines. In a companion paper, we provide
in-depth analysis of DALiuGE's scalability to very large numbers of tasks on
two supercomputing facilities.Comment: 31 pages, 12 figures, currently under review by Astronomy and
Computin
- …