8,804 research outputs found

    Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking

    Full text link
    Montage is a portable software toolkit for constructing custom, science-grade mosaics by composing multiple astronomical images. The mosaics constructed by Montage preserve the astrometry (position) and photometry (intensity) of the sources in the input images. The mosaic to be constructed is specified by the user in terms of a set of parameters, including dataset and wavelength to be used, location and size on the sky, coordinate system and projection, and spatial sampling rate. Many astronomical datasets are massive, and are stored in distributed archives that are, in most cases, remote with respect to the available computational resources. Montage can be run on both single- and multi-processor computers, including clusters and grids. Standard grid tools are used to run Montage in the case where the data or computers used to construct a mosaic are located remotely on the Internet. This paper describes the architecture, algorithms, and usage of Montage as both a software toolkit and as a grid portal. Timing results are provided to show how Montage performance scales with number of processors on a cluster computer. In addition, we compare the performance of two methods of running Montage in parallel on a grid.Comment: 16 pages, 11 figure

    Scientific Computing Meets Big Data Technology: An Astronomy Use Case

    Full text link
    Scientific analyses commonly compose multiple single-process programs into a dataflow. An end-to-end dataflow of single-process programs is known as a many-task application. Typically, tools from the HPC software stack are used to parallelize these analyses. In this work, we investigate an alternate approach that uses Apache Spark -- a modern big data platform -- to parallelize many-task applications. We present Kira, a flexible and distributed astronomy image processing toolkit using Apache Spark. We then use the Kira toolkit to implement a Source Extractor application for astronomy images, called Kira SE. With Kira SE as the use case, we study the programming flexibility, dataflow richness, scheduling capacity and performance of Apache Spark running on the EC2 cloud. By exploiting data locality, Kira SE achieves a 2.5x speedup over an equivalent C program when analyzing a 1TB dataset using 512 cores on the Amazon EC2 cloud. Furthermore, we show that by leveraging software originally designed for big data infrastructure, Kira SE achieves competitive performance to the C implementation running on the NERSC Edison supercomputer. Our experience with Kira indicates that emerging Big Data platforms such as Apache Spark are a performant alternative for many-task scientific applications

    MultiLibOS: an OS architecture for cloud computing

    Full text link
    Cloud computing is resulting in fundamental changes to computing infrastructure, yet these changes have not resulted in corresponding changes to operating systems. In this paper we discuss some key changes we see in the computing infrastructure and applications of IaaS systems. We argue that these changes enable and demand a very different model of operating system. We then describe the MulitLibOS architecture we are exploring and how it helps exploit the scale and elasticity of integrated systems while still allowing for legacy software run on traditional OSes

    Supporting Complex Scientific Database Schemas in a Grid Middleware

    Get PDF
    “This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder." “Copyright IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.” DOI: 10.1109/AINA.2009.129The volume of digital scientific data has increased considerably with advancing technologies of computing devices and scientific instruments. We are exploring the use of emerging Grid technologies for the management and manipulation of very large distributed scientific datasets. Taking as an example a terabyte-size scientific database with complex database schema, this paper focuses on the potential of a well-known Grid middleware - OGSA-DQP - for distributing such datasets. In particular, we investigate and extend the data type support in this system to handle a complex schema of a real scientific database - the Sloan Digital Sky Survey database

    Integrating Existing Software Toolkits into VO System

    Full text link
    Virtual Observatory (VO) is a collection of interoperating data archives and software tools. Taking advantages of the latest information technologies, it aims to provide a data-intensively online research environment for astronomers all around the world. A large number of high-qualified astronomical software packages and libraries are powerful and easy of use, and have been widely used by astronomers for many years. Integrating those toolkits into the VO system is a necessary and important task for the VO developers. VO architecture greatly depends on Grid and Web services, consequently the general VO integration route is "Java Ready - Grid Ready - VO Ready". In the paper, we discuss the importance of VO integration for existing toolkits and discuss the possible solutions. We introduce two efforts in the field from China-VO project, "gImageMagick" and " Galactic abundance gradients statistical research under grid environment". We also discuss what additional work should be done to convert Grid service to VO service.Comment: 9 pages, 3 figures, will be published in SPIE 2004 conference proceeding

    Cherenkov Telescope Array Data Management

    Get PDF
    Very High Energy gamma-ray astronomy with the Cherenkov Telescope Array (CTA) is evolving towards the model of a public observatory. Handling, processing and archiving the large amount of data generated by the CTA instruments and delivering scientific products are some of the challenges in designing the CTA Data Management. The participation of scientists from within CTA Consortium and from the greater worldwide scientific community necessitates a sophisticated scientific analysis system capable of providing unified and efficient user access to data, software and computing resources. Data Management is designed to respond to three main issues: (i) the treatment and flow of data from remote telescopes; (ii) "big-data" archiving and processing; (iii) and open data access. In this communication the overall technical design of the CTA Data Management, current major developments and prototypes are presented.Comment: 8 pages, 2 figures, In Proceedings of the 34th International Cosmic Ray Conference (ICRC2015), The Hague, The Netherlands. All CTA contributions at arXiv:1508.0589
    • 

    corecore