A Real-Time Machine Learning and Visualization Framework for Scientific Workflows
High-performance computing resources are now widely used in science and engineering. Typical post-hoc approaches save simulation output to persistent storage, so the data must be read back from storage into memory for analysis. For large-scale scientific simulations, these I/O operations produce significant overhead. In-situ/in-transit approaches bypass I/O by accessing and processing in-memory simulation results directly, which suggests that simulations and analysis applications should be more closely coupled. This paper constructs a flexible and extensible framework to connect scientific simulations with multi-step machine learning processes and in-situ visualization tools, thus providing plug-in analysis and visualization functionality over complex workflows in real time. A distributed simulation-time clustering method is proposed to detect anomalies in real turbulence flows.
Building a scientific workflow framework to enable real-time machine learning and visualization
We have entered the era of big data. In high-performance computing, large-scale simulations can generate huge amounts of data with potentially critical information. However, these data are usually saved in intermediate files and are not instantly visible until advanced data analytics techniques are applied after reading all simulation data from persistent storage (e.g., local disks or a parallel file system). This approach leaves users waiting a long time for running simulations without knowing the status of the running job. In this paper, we build a new computational framework to couple scientific simulations with multi-step machine learning processes and in-situ data visualizations. We also design a new scalable simulation-time clustering algorithm to automatically detect fluid flow anomalies. This computational framework is built upon different software components and provides plug-in data analysis and visualization functions over complex scientific workflows. With this advanced framework, users can monitor and get real-time notifications of special patterns or anomalies from ongoing extreme-scale turbulent flow simulations.
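As a rough illustration of simulation-time clustering for anomaly detection, the sketch below processes points one at a time, as a running simulation might emit them. It is not the authors' algorithm, which the abstract does not describe: it swaps in a simple sequential (leader-style) clustering, and the class name `OnlineClusterer`, the `threshold` parameter, and the sample stream are all hypothetical.

```python
import math


class OnlineClusterer:
    """Sequential (leader-style) clustering over a stream of points.

    Assumption for illustration only: a point farther than `threshold`
    from every known centroid starts a new cluster and is flagged.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self.centroids = []  # one coordinate list per cluster
        self.counts = []     # number of points absorbed by each cluster

    def update(self, point):
        """Absorb one point; return True if it was flagged as an anomaly."""
        if not self.centroids:
            # The first point seeds the first cluster and is never flagged.
            self.centroids.append(list(point))
            self.counts.append(1)
            return False
        dists = [math.dist(point, c) for c in self.centroids]
        nearest = min(range(len(dists)), key=dists.__getitem__)
        if dists[nearest] > self.threshold:
            # Too far from everything seen so far: new cluster, flag it.
            self.centroids.append(list(point))
            self.counts.append(1)
            return True
        # Otherwise move the nearest centroid toward the point (running mean).
        self.counts[nearest] += 1
        n = self.counts[nearest]
        self.centroids[nearest] = [
            c + (p - c) / n for c, p in zip(self.centroids[nearest], point)
        ]
        return False


# Made-up stream: four points near the origin and one far away.
stream = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (0.05, 0.05)]
clusterer = OnlineClusterer(threshold=1.0)
flags = [clusterer.update(p) for p in stream]
```

Because each point is handled as it arrives and only the centroids are kept in memory, nothing has to be written to or read back from storage, which is the motivation the abstract gives for in-situ analysis.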
BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments
Advances in sequencing techniques have led to exponential growth in
biological data, demanding the development of large-scale bioinformatics
experiments. Because these experiments are computation- and data-intensive,
they require high-performance computing (HPC) techniques and can benefit from
specialized technologies such as Scientific Workflow Management Systems (SWfMS)
and databases. In this work, we present BioWorkbench, a framework for managing
and analyzing bioinformatics experiments. This framework automatically collects
provenance data, including both performance data from workflow execution and
data from the scientific domain of the workflow application. Provenance data
can be analyzed through a web application that abstracts a set of queries to
the provenance database, simplifying access to provenance information. We
evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree
assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a
RASopathy analysis workflow. We analyze each workflow from both computational
and scientific domain perspectives, by using queries to a provenance and
annotation database. Some of these queries are available as a pre-built feature
of the BioWorkbench web application. Through the provenance data, we show that
the framework is scalable and achieves high performance, reducing the
execution time of the case studies by up to 98%. We also show how the
application of machine learning techniques can enrich the analysis process.
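The kind of provenance query described in this abstract can be sketched with an in-memory SQLite database. This is a minimal illustration under stated assumptions, not BioWorkbench's actual schema or queries: the table `app_exec`, its columns, the application names, and all durations are invented for the example.

```python
import sqlite3

# Hypothetical provenance table: one row per application execution.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_exec (workflow TEXT, app_name TEXT, duration_s REAL)"
)
conn.executemany(
    "INSERT INTO app_exec VALUES (?, ?, ?)",
    [
        ("SwiftPhylo", "align", 12.5),       # made-up rows
        ("SwiftPhylo", "align", 11.5),
        ("SwiftPhylo", "build_tree", 40.0),
        ("SwiftGECKO", "compare", 7.0),
    ],
)

# Performance-oriented query: executions and total time per application
# within one workflow, slowest application first.
rows = conn.execute(
    """
    SELECT app_name, COUNT(*) AS runs, SUM(duration_s) AS total_s
    FROM app_exec
    WHERE workflow = ?
    GROUP BY app_name
    ORDER BY total_s DESC
    """,
    ("SwiftPhylo",),
).fetchall()
conn.close()
```

A web application like the one the abstract mentions could wrap queries of this shape behind a form, so that users do not need to write SQL themselves.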
What May Visualization Processes Optimize?
In this paper, we present an abstract model of visualization and inference
processes and describe an information-theoretic measure for optimizing such
processes. In order to obtain such an abstraction, we first examined six
classes of workflows in data analysis and visualization, and identified four
levels of typical visualization components, namely disseminative,
observational, analytical and model-developmental visualization. We noticed a
common phenomenon at different levels of visualization, that is, the
transformation of data spaces (referred to as alphabets) usually corresponds to
the reduction of maximal entropy along a workflow. Based on this observation,
we establish an information-theoretic measure of cost-benefit ratio that may be
used as a cost function for optimizing a data visualization process. To
demonstrate the validity of this measure, we examined a number of successful
visualization processes in the literature, and showed that the
information-theoretic measure can mathematically explain the advantages of such
processes over possible alternatives. Comment: 10 page
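The observation that maximal entropy tends to fall along a workflow can be made concrete with a small calculation. The sketch below only illustrates that observation, not the paper's cost-benefit measure, whose definition is not given in the abstract. It uses the standard fact that an alphabet of n letters has maximal Shannon entropy log2(n) bits; the stage names and alphabet sizes are hypothetical.

```python
import math


def max_entropy_bits(alphabet_size):
    """Maximal Shannon entropy of an alphabet, in bits.

    The maximum is reached when all letters are equally likely,
    giving log2(alphabet_size).
    """
    if alphabet_size < 1:
        raise ValueError("alphabet must contain at least one letter")
    return math.log2(alphabet_size)


# Hypothetical workflow: each stage maps data into a smaller alphabet.
stages = [
    ("raw 16-bit samples", 2 ** 16),
    ("binned into 256 levels", 256),
    ("classified into 4 categories", 4),
    ("binary decision", 2),
]

entropies = [max_entropy_bits(size) for _, size in stages]

# Reduction in maximal entropy at each transformation between stages.
reductions = [before - after for before, after in zip(entropies, entropies[1:])]

for (name, _), bits in zip(stages, entropies):
    print(f"{name}: {bits:.1f} bits")
```

In this made-up example the maximal entropy drops from 16 bits to 1 bit across the four stages, with the largest single reduction (8 bits) at the binning step.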
IMP Science Gateway: from the Portal to the Hub of Virtual Experimental Labs in Materials Science
The "science gateway" (SG) ideology means a user-friendly, intuitive interface
between scientists (or scientific communities) and different software
components and various distributed computing infrastructures (DCIs) (such as
grids, clouds, clusters), where researchers can focus on their scientific goals
and less on the peculiarities of the software or DCI. The "IMP Science Gateway Portal"
(http://scigate.imp.kiev.ua) for complex workflow management and integration of
distributed computing resources (like clusters, service grids, desktop grids,
clouds) is presented. It is built on the WS-PGRADE and gUSE
technologies, where WS-PGRADE is designed for scientific workflow operation and
gUSE for smooth integration of available resources for parallel and
distributed computing in various heterogeneous distributed computing
infrastructures (DCIs). Typical scientific workflows, with possible scenarios
for their preparation and usage, are presented. Several typical use cases for these
science applications (scientific workflows) are considered for molecular
dynamics (MD) simulations of complex behavior of various nanostructures
(nanoindentation of graphene layers, defect system relaxation in metal
nanocrystals, thermal stability of boron nitride nanotubes, etc.). The user
experience is analyzed in the context of its practical applications for MD
simulations in materials science, physics and nanotechnologies with available
heterogeneous DCIs. In conclusion, the "science gateway" approach, which
combines a workflow manager (like WS-PGRADE) with a DCI resource manager (like
gUSE), makes it possible to use the SG portal (like the "IMP Science Gateway
Portal") in a very promising way, namely as a hub of various virtual
experimental labs (different software components and various resource
requirements) in the context of its practical MD applications in materials
science, physics, chemistry, biology, and nanotechnologies.
Comment: 6 pages, 5 figures, 3 tables; 6th International Workshop on Science
Gateways, IWSG-2014 (Dublin, Ireland, 3-5 June, 2014). arXiv admin note:
substantial text overlap with arXiv:1404.545