25,300 research outputs found

    Scientific Computing Meets Big Data Technology: An Astronomy Use Case

    Full text link
    Scientific analyses commonly compose multiple single-process programs into a dataflow. An end-to-end dataflow of single-process programs is known as a many-task application. Typically, tools from the HPC software stack are used to parallelize these analyses. In this work, we investigate an alternate approach that uses Apache Spark -- a modern big data platform -- to parallelize many-task applications. We present Kira, a flexible and distributed astronomy image processing toolkit using Apache Spark. We then use the Kira toolkit to implement a Source Extractor application for astronomy images, called Kira SE. With Kira SE as the use case, we study the programming flexibility, dataflow richness, scheduling capacity and performance of Apache Spark running on the EC2 cloud. By exploiting data locality, Kira SE achieves a 2.5x speedup over an equivalent C program when analyzing a 1TB dataset using 512 cores on the Amazon EC2 cloud. Furthermore, we show that by leveraging software originally designed for big data infrastructure, Kira SE achieves competitive performance to the C implementation running on the NERSC Edison supercomputer. Our experience with Kira indicates that emerging Big Data platforms such as Apache Spark are a performant alternative for many-task scientific applications

    Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking

    Full text link
    Montage is a portable software toolkit for constructing custom, science-grade mosaics by composing multiple astronomical images. The mosaics constructed by Montage preserve the astrometry (position) and photometry (intensity) of the sources in the input images. The mosaic to be constructed is specified by the user in terms of a set of parameters, including dataset and wavelength to be used, location and size on the sky, coordinate system and projection, and spatial sampling rate. Many astronomical datasets are massive, and are stored in distributed archives that are, in most cases, remote with respect to the available computational resources. Montage can be run on both single- and multi-processor computers, including clusters and grids. Standard grid tools are used to run Montage in the case where the data or computers used to construct a mosaic are located remotely on the Internet. This paper describes the architecture, algorithms, and usage of Montage as both a software toolkit and as a grid portal. Timing results are provided to show how Montage performance scales with number of processors on a cluster computer. In addition, we compare the performance of two methods of running Montage in parallel on a grid.Comment: 16 pages, 11 figure

    Computing server power modeling in a data center: survey,taxonomy and performance evaluation

    Full text link
    Data centers are large scale, energy-hungry infrastructure serving the increasing computational demands as the world is becoming more connected in smart cities. The emergence of advanced technologies such as cloud-based services, internet of things (IoT) and big data analytics has augmented the growth of global data centers, leading to high energy consumption. This upsurge in energy consumption of the data centers not only incurs the issue of surging high cost (operational and maintenance) but also has an adverse effect on the environment. Dynamic power management in a data center environment requires the cognizance of the correlation between the system and hardware level performance counters and the power consumption. Power consumption modeling exhibits this correlation and is crucial in designing energy-efficient optimization strategies based on resource utilization. Several works in power modeling are proposed and used in the literature. However, these power models have been evaluated using different benchmarking applications, power measurement techniques and error calculation formula on different machines. In this work, we present a taxonomy and evaluation of 24 software-based power models using a unified environment, benchmarking applications, power measurement technique and error formula, with the aim of achieving an objective comparison. We use different servers architectures to assess the impact of heterogeneity on the models' comparison. The performance analysis of these models is elaborated in the paper

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Full text link
    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

    The VLA/ALMA Nascent Disk and Multiplicity (VANDAM) Survey of Orion Protostars. II. A Statistical Characterization of Class 0 and Class I Protostellar Disks

    Get PDF
    We have conducted a survey of 328 protostars in the Orion molecular clouds with the Atacama Large Millimeter/submillimeter Array at 0.87 mm at a resolution of ~0.”1 (40 au), including observations with the Very Large Array at 9 mm toward 148 protostars at a resolution of ~0 08 (32 au). This is the largest multiwavelength survey of protostars at this resolution by an order of magnitude. We use the dust continuum emission at 0.87 and 9 mm to measure the dust disk radii and masses toward the Class 0, Class I, and flat-spectrum protostars, characterizing the evolution of these disk properties in the protostellar phase. The mean dust disk radii for the Class 0, Class I, and flat-spectrum protostars are 44.9^(+5.8)_(βˆ’3.4), 37.0^(+4.9)_(βˆ’3.0), and 28.5^(+3.7)_(βˆ’2.3) au, respectively, and the mean protostellar dust disk masses are 25.9^(+7.7)_(βˆ’4.0), 14.9^(+3.8)_(βˆ’2.2), 1.6^(+3.5)_(βˆ’1.9) MβŠ•, respectively. The decrease in dust disk masses is expected from disk evolution and accretion, but the decrease in disk radii may point to the initial conditions of star formation not leading to the systematic growth of disk radii or that radial drift is keeping the dust disk sizes small. At least 146 protostellar disks (35% of 379 detected 0.87 mm continuum sources plus 42 nondetections) have disk radii greater than 50 au in our sample. These properties are not found to vary significantly between different regions within Orion. The protostellar dust disk mass distributions are systematically larger than those of Class II disks by a factor of >4, providing evidence that the cores of giant planets may need to at least begin their formation during the protostellar phase
    • …
    corecore