IMP Science Gateway: from the Portal to the Hub of Virtual Experimental Labs in Materials Science
"Science gateway" (SG) ideology means a user-friendly intuitive interface
between scientists (or scientific communities) and different software
components combined with various distributed computing infrastructures (DCIs) (such as grids,
clouds, and clusters), so that researchers can focus on their scientific goals and
less on the peculiarities of the software or DCI. The "IMP Science Gateway Portal"
(http://scigate.imp.kiev.ua) for complex workflow management and integration of
distributed computing resources (like clusters, service grids, desktop grids,
clouds) is presented. It is built on the WS-PGRADE and gUSE
technologies: WS-PGRADE is designed for scientific workflow operation, and
gUSE for smooth integration of available resources for parallel and
distributed computing in various heterogeneous distributed computing
infrastructures (DCIs). Typical scientific workflows, with possible scenarios
for their preparation and usage, are presented. Several typical use cases for these
science applications (scientific workflows) are considered for molecular
dynamics (MD) simulations of complex behavior of various nanostructures
(nanoindentation of graphene layers, defect system relaxation in metal
nanocrystals, thermal stability of boron nitride nanotubes, etc.). The user
experience is analyzed in the context of its practical applications for MD
simulations in materials science, physics and nanotechnologies with available
heterogeneous DCIs. In conclusion, the "science gateway" approach, which combines a workflow
manager (like WS-PGRADE) with a DCI resource manager (like gUSE), makes it possible
to use an SG portal (like the "IMP Science Gateway Portal") in a very promising
way, namely as a hub of various virtual experimental labs (different software
components with various resource requirements) in the context of practical
MD applications in materials science, physics, chemistry, biology, and
nanotechnologies.
Comment: 6 pages, 5 figures, 3 tables; 6th International Workshop on Science
Gateways, IWSG-2014 (Dublin, Ireland, 3-5 June, 2014). arXiv admin note:
substantial text overlap with arXiv:1404.545
Hardware-accelerated interactive data visualization for neuroscience in Python.
Large datasets are becoming more and more common in science, particularly in neuroscience where experimental techniques are rapidly evolving. Obtaining interpretable results from raw data can sometimes be done automatically; however, there are numerous situations where there is a need, at all processing stages, to visualize the data in an interactive way. This enables the scientist to gain intuition, discover unexpected patterns, and find guidance about subsequent analysis steps. Existing visualization tools mostly focus on static publication-quality figures and do not support interactive visualization of large datasets. While working on Python software for visualization of neurophysiological data, we developed techniques to leverage the computational power of modern graphics cards for high-performance interactive data visualization. We were able to achieve very high performance despite the interpreted and dynamic nature of Python, by using state-of-the-art, fast libraries such as NumPy, PyOpenGL, and PyTables. We present applications of these methods to visualization of neurophysiological data. We believe our tools will be useful in a broad range of domains, in neuroscience and beyond, where there is an increasing need for scalable and fast interactive visualization
AstroGrid-D: Grid Technology for Astronomical Science
We present status and results of AstroGrid-D, a joint effort of
astrophysicists and computer scientists to employ grid technology for
scientific applications. AstroGrid-D provides access to a network of
distributed machines with a set of commands as well as software interfaces. It
allows simple use of computing and storage facilities, as well as scheduling and monitoring of
compute tasks and data management. It is based on the Globus Toolkit middleware
(GT4). Chapter 1 describes the context which led to the demand for advanced
software solutions in Astrophysics, and we state the goals of the project. We
then present characteristic astrophysical applications that have been
implemented on AstroGrid-D in chapter 2. We describe simulations of different
complexity, compute-intensive calculations running on multiple sites, and
advanced applications for specific scientific purposes, such as a connection to
robotic telescopes. These examples show how grid execution improves,
for example, the scientific workflow. Chapter 3 explains the software tools and
services that we adapted or newly developed. Section 3.1 is focused on the
administrative aspects of the infrastructure, to manage users and monitor
activity. Section 3.2 characterises the central components of our architecture:
The AstroGrid-D information service to collect and store metadata, a file
management system, the data management system, and a job manager for automatic
submission of compute tasks. We summarise the successfully established
infrastructure in chapter 4, concluding with our future plans to establish
AstroGrid-D as a platform of modern e-Astronomy.
Comment: 14 pages, 12 figures. Subjects: data analysis, image processing,
robotic telescopes, simulations, grid. Accepted for publication in New
Astronom
State-of-the-Art in Parallel Computing with R
R is a mature open-source programming language for statistical computing and graphics. Many areas of statistical research are experiencing rapid growth in the size of data sets. Methodological advances drive increased use of simulations. A common approach is to use parallel computing. This paper presents an overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing. It reviews sixteen different packages, comparing them on their state of development, the parallel technology used, as well as on usability, acceptance, and performance. Two packages (snow, Rmpi) stand out as particularly useful for general use on computer clusters. Packages for grid computing are still in development, with only one package currently available to the end user. For multi-core systems four different packages exist, but a number of issues pose challenges to early adopters. The paper concludes with ideas for further developments in high performance computing with R. Example code is available in the appendix
Simulation modelling and visualisation: toolkits for building artificial worlds
Simulation users at all levels make heavy use of compute resources to drive computational
simulations for greatly varying application areas of research using different simulation
paradigms. Simulations are implemented in many software forms, ranging from highly standardised
and general models that run in proprietary software packages to ad hoc hand-crafted
simulation codes for very specific applications. Visualisation of the workings or results of a
simulation is another highly valuable capability for simulation developers and practitioners.
There are many different software libraries and methods available for creating a visualisation
layer for simulations, and it is often a difficult and time-consuming process to assemble a
toolkit of these libraries and other resources that best suits a particular simulation model. We
present here a break-down of the main simulation paradigms, and discuss differing toolkits and
approaches that different researchers have taken to tackle coupled simulation and visualisation
in each paradigm
ROOT - A C++ Framework for Petabyte Data Storage, Statistical Analysis and Visualization
ROOT is an object-oriented C++ framework conceived in the high-energy physics
(HEP) community, designed for storing and analyzing petabytes of data in an
efficient way. Any instance of a C++ class can be stored into a ROOT file in a
machine-independent compressed binary format. In ROOT the TTree object
container is optimized for statistical data analysis over very large data sets
by using vertical data storage techniques. These containers can span a large
number of files on local disks, the web, or a number of different shared file
systems. In order to analyze this data, the user can choose from a wide set of
mathematical and statistical functions, including linear algebra classes,
numerical algorithms such as integration and minimization, and various methods
for performing regression analysis (fitting). In particular, ROOT offers
packages for complex data modeling and fitting, as well as multivariate
classification based on machine learning techniques. Central to these
analysis tools are the histogram classes, which provide binning of one- and
multi-dimensional data. Results can be saved in high-quality graphical formats
like Postscript and PDF or in bitmap formats like JPG or GIF. The result can
also be stored into ROOT macros that allow a full recreation and rework of the
graphics. Users typically create their analysis macros step by step, making use
of the interactive C++ interpreter CINT, while running over small data samples.
Once the development is finished, they can run these macros at full compiled
speed over large data sets, using on-the-fly compilation, or by creating a
stand-alone batch program. Finally, if processing farms are available, the user
can reduce the execution time of intrinsically parallel tasks - e.g. data
mining in HEP - by using PROOF, which will take care of optimally distributing
the work over the available resources in a transparent way
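
The binning that the abstract attributes to ROOT's histogram classes can be illustrated with a minimal sketch. This is plain Python, not ROOT's API: the function name `fill_histogram` and its parameters are hypothetical, and it assumes equal-width bins and simply drops out-of-range values.

```python
def fill_histogram(values, n_bins, low, high):
    """Count how many values fall into each of n_bins equal-width bins on [low, high)."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for v in values:
        if low <= v < high:
            # Map the value to its bin index; clamp to guard against rounding at the upper edge.
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    return counts


# Hypothetical sample data: 10.0 and -1.0 fall outside [0, 10) and are ignored.
sample = [0.5, 1.5, 1.7, 3.2, 9.9, 10.0, -1.0]
histogram = fill_histogram(sample, n_bins=5, low=0.0, high=10.0)
```

Here `histogram` holds one count per bin, in bin order.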
Distributed-based massive processing of activity logs for efficient user modeling in a Virtual Campus
This paper reports on a multi-fold approach for the building of user models based on the identification of navigation patterns in a virtual campus, allowing for adapting the campus’ usability to the actual learners’ needs, thus resulting in a great stimulation of the learning experience. However, user modeling in this context implies a constant processing and analysis of user interaction data during long-term learning activities, which produces huge amounts of valuable data stored typically in server log files. Due to the large or very large size of log files generated daily, the massive processing is a foremost step in extracting useful information. To this end, this work studies, first, the viability of processing large log data files of a real Virtual Campus using different distributed infrastructures. More precisely, we study the time performance of massive processing of daily log files implemented following the master-slave paradigm and evaluated using Cluster Computing and PlanetLab platforms. The study reveals the complexity and challenges of massive processing in the big data era, such as the need to carefully tune the log file processing in terms of the chunk size of log data to be processed at slave nodes, as well as the bottleneck in processing in truly geographically distributed infrastructures due to the overhead caused by the communication time between the master and slave nodes. Then, an application of the massive processing approach resulting in log data processed and stored in a well-structured format is presented. We show how to extract knowledge from the log data analysis by using the WEKA framework for data mining purposes, showing its usefulness to effectively build user models in terms of identifying interesting navigation patterns of on-line learners. The study is motivated and conducted in the context of the actual data logs of the Virtual Campus of the Open University of Catalonia.
Peer Reviewed. Postprint (author's final draft
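
The master-slave scheme described above, with a tunable chunk size, can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: local threads stand in for the slave nodes, the "user page" log line format is hypothetical, and the function names are invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Master side: cut the log into chunks of at most chunk_size lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def count_visits(chunk):
    """Worker side: count (user, page) visits in one chunk."""
    counts = Counter()
    for line in chunk:
        user, page = line.split()[:2]  # hypothetical "user page" line format
        counts[(user, page)] += 1
    return counts


def process_log(lines, chunk_size=2, workers=2):
    """Master: hand chunks to workers and merge their partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_visits, split_into_chunks(lines, chunk_size)):
            total += partial
    return total


# Hypothetical log excerpt.
log = ["alice /forum", "bob /mail", "alice /forum", "alice /mail", "bob /mail"]
visits = process_log(log)
```

A smaller `chunk_size` gives more, lighter tasks at the cost of more master-worker communication, which is the trade-off the abstract says must be tuned.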
State of the Art in Parallel Computing with R
R is a mature open-source programming language for statistical computing and graphics. Many areas of statistical research are experiencing rapid growth in the size of data sets. Methodological advances drive increased use of simulations. A common approach is to use parallel computing. This paper presents an overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing. It reviews sixteen different packages, comparing them on their state of development, the parallel technology used, as well as on usability, acceptance, and performance. Two packages (snow, Rmpi) stand out as particularly suited to general use on computer clusters. Packages for grid computing are still in development, with only one package currently available to the end user. For multi-core systems five different packages exist, but a number of issues pose challenges to early adopters. The paper concludes with ideas for further developments in high performance computing with R. Example code is available in the appendix.