Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking
Montage is a portable software toolkit for constructing custom, science-grade
mosaics by composing multiple astronomical images. The mosaics constructed by
Montage preserve the astrometry (position) and photometry (intensity) of the
sources in the input images. The mosaic to be constructed is specified by the
user in terms of a set of parameters, including dataset and wavelength to be
used, location and size on the sky, coordinate system and projection, and
spatial sampling rate. Many astronomical datasets are massive, and are stored
in distributed archives that are, in most cases, remote with respect to the
available computational resources. Montage can be run on both single- and
multi-processor computers, including clusters and grids. Standard grid tools
are used to run Montage in the case where the data or computers used to
construct a mosaic are located remotely on the Internet. This paper describes
the architecture, algorithms, and usage of Montage both as a software toolkit
and as a grid portal. Timing results are provided to show how Montage
performance scales with the number of processors on a cluster computer. In
addition, we compare the performance of two methods of running Montage in
parallel on a grid.
Comment: 16 pages, 11 figures
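One plausible form of the final co-addition step, which combines overlapping images once they share a single output grid, is an area-weighted mean. This is a minimal sketch under stated assumptions, not Montage's code: the inputs are assumed to be already reprojected onto the same pixel grid, each with a per-pixel coverage (area) map, and the function `coadd` and its arguments are hypothetical.

```python
from typing import List, Optional

Grid = List[List[float]]


def coadd(images: List[Grid], areas: List[Grid]) -> List[List[Optional[float]]]:
    """Area-weighted mean of images already reprojected onto one output grid.

    Hypothetical helper: areas[k][y][x] is the fraction of output pixel (y, x)
    covered by input image k. Inputs with zero coverage at a pixel are ignored
    there; pixels covered by no input are returned as None (blank).
    """
    rows, cols = len(images[0]), len(images[0][0])
    mosaic: List[List[Optional[float]]] = []
    for y in range(rows):
        row: List[Optional[float]] = []
        for x in range(cols):
            # Weight each contributing input pixel by the area it covers.
            num = sum(img[y][x] * a[y][x]
                      for img, a in zip(images, areas) if a[y][x] > 0)
            den = sum(a[y][x] for a in areas if a[y][x] > 0)
            row.append(num / den if den > 0 else None)
        mosaic.append(row)
    return mosaic
```

For two 1x2 inputs where the second image covers only half of the right-hand pixel, that pixel takes its value from the second image alone, while the left-hand pixel averages both inputs equally.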
Scientific Computing Meets Big Data Technology: An Astronomy Use Case
Scientific analyses commonly compose multiple single-process programs into a
dataflow. An end-to-end dataflow of single-process programs is known as a
many-task application. Typically, tools from the HPC software stack are used to
parallelize these analyses. In this work, we investigate an alternate approach
that uses Apache Spark -- a modern big data platform -- to parallelize
many-task applications. We present Kira, a flexible and distributed astronomy
image processing toolkit using Apache Spark. We then use the Kira toolkit to
implement a Source Extractor application for astronomy images, called Kira SE.
With Kira SE as the use case, we study the programming flexibility, dataflow
richness, scheduling capacity and performance of Apache Spark running on the
EC2 cloud. By exploiting data locality, Kira SE achieves a 2.5x speedup over an
equivalent C program when analyzing a 1TB dataset using 512 cores on the Amazon
EC2 cloud. Furthermore, we show that by leveraging software originally designed
for big data infrastructure, Kira SE achieves performance competitive with the C
implementation running on the NERSC Edison supercomputer. Our experience with
Kira indicates that emerging Big Data platforms such as Apache Spark are a
performant alternative for many-task scientific applications.
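The many-task pattern described above, which runs the same unmodified single-process program independently on every input and then collects the results, can be sketched with Python's standard `concurrent.futures` module. This is a hedged stand-in, not Kira or Spark code: `extract_sources` is a hypothetical toy extractor (a plain threshold on pixel values), and `run_many_task` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def extract_sources(image: List[float], threshold: float = 5.0) -> List[int]:
    """Hypothetical single-process task: indices of pixels above threshold."""
    return [i for i, v in enumerate(image) if v > threshold]


def run_many_task(images: Dict[str, List[float]],
                  task: Callable[[List[float]], List[int]],
                  workers: int = 4) -> Dict[str, List[int]]:
    """Apply the same task to every input independently, in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One independent task per input; no communication between tasks.
        futures = {name: pool.submit(task, img) for name, img in images.items()}
        # Gather results keyed by input name.
        return {name: f.result() for name, f in futures.items()}
```

A thread pool keeps the sketch short. For CPU-bound work, a `ProcessPoolExecutor` or a cluster framework such as Spark would distribute the same per-file tasks across processes or nodes, which is where data locality starts to matter.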
MultiLibOS: an OS architecture for cloud computing
Cloud computing is driving fundamental changes to computing infrastructure, yet these changes have not been matched by corresponding changes to operating systems. In this paper we discuss some key changes we see in the computing infrastructure and applications of IaaS systems. We argue that these changes enable and demand a very different model of operating system. We then describe the MultiLibOS architecture we are exploring and how it helps exploit the scale and elasticity of integrated systems while still allowing legacy software to run on traditional OSes.
Supporting Complex Scientific Database Schemas in a Grid Middleware
Copyright IEEE. DOI: 10.1109/AINA.2009.129
The volume of digital scientific data has increased considerably with advances in computing devices and scientific instruments. We are exploring the use of emerging Grid technologies for the management and manipulation of very large distributed scientific datasets. Taking as an example a terabyte-size scientific database with a complex schema, this paper focuses on the potential of a well-known Grid middleware, OGSA-DQP, for distributing such datasets. In particular, we investigate and extend the data type support in this system to handle the complex schema of a real scientific database, the Sloan Digital Sky Survey database.
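At its simplest, extending a middleware's data type support for a complex schema means mapping every column type in the source database onto a type the middleware can serialize, and reporting any column that cannot be mapped. The sketch assumes XML Schema as the target type system; the table `SQL_TO_XSD` and the function `map_schema` are hypothetical and do not reproduce OGSA-DQP's actual type handling.

```python
from typing import Dict, List, Tuple

# Hypothetical SQL-to-XML-Schema type table (not OGSA-DQP's own mapping).
SQL_TO_XSD: Dict[str, str] = {
    "bigint": "xs:long",
    "int": "xs:int",
    "smallint": "xs:short",
    "tinyint": "xs:unsignedByte",
    "bit": "xs:boolean",
    "real": "xs:float",
    "float": "xs:double",
    "char": "xs:string",
    "varchar": "xs:string",
    "datetime": "xs:dateTime",
    "varbinary": "xs:base64Binary",
}


def map_schema(columns: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map (column name, SQL type) pairs to XSD types; fail loudly on gaps."""
    mapped: Dict[str, str] = {}
    unsupported: List[str] = []
    for name, sql_type in columns:
        # Length/precision arguments such as varchar(32) are ignored here.
        base = sql_type.split("(")[0].strip().lower()
        if base in SQL_TO_XSD:
            mapped[name] = SQL_TO_XSD[base]
        else:
            unsupported.append(f"{name}:{sql_type}")
    if unsupported:
        raise ValueError("unsupported column types: " + ", ".join(unsupported))
    return mapped
```

Failing on the first schema that contains an unmapped type, rather than silently dropping columns, makes it explicit which parts of a complex schema still need new type support.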
Integrating Existing Software Toolkits into VO System
The Virtual Observatory (VO) is a collection of interoperating data archives and
software tools. Taking advantage of the latest information technologies, it
aims to provide a data-intensive online research environment for astronomers
all around the world.
Many high-quality astronomical software packages and libraries are powerful
and easy to use, and have been widely used by astronomers for many
years. Integrating those toolkits into the VO system is a necessary and
important task for the VO developers.
VO architecture depends heavily on Grid and Web services; consequently, the
general VO integration route is "Java Ready - Grid Ready - VO Ready". In this
paper, we discuss the importance of VO integration for existing toolkits and
possible solutions. We introduce two efforts in the field from the
China-VO project, "gImageMagick" and "Galactic abundance gradients statistical
research under grid environment". We also discuss what additional work should
be done to convert Grid services to VO services.
Comment: 9 pages, 3 figures, to be published in the SPIE 2004 conference
proceedings
Cherenkov Telescope Array Data Management
Very High Energy gamma-ray astronomy with the Cherenkov Telescope Array (CTA)
is evolving towards the model of a public observatory. Handling, processing and
archiving the large amount of data generated by the CTA instruments and
delivering scientific products are some of the challenges in designing the CTA
Data Management. The participation of scientists from within the CTA Consortium and
from the greater worldwide scientific community necessitates a sophisticated
scientific analysis system capable of providing unified and efficient user
access to data, software and computing resources. Data Management is designed
to respond to three main issues: (i) the treatment and flow of data from remote
telescopes; (ii) "big-data" archiving and processing; and (iii) open data
access. In this communication, the overall technical design of the CTA Data
Management, current major developments, and prototypes are presented.
Comment: 8 pages, 2 figures, in Proceedings of the 34th International Cosmic
Ray Conference (ICRC2015), The Hague, The Netherlands. All CTA contributions
at arXiv:1508.0589
- …