25,300 research outputs found
Scientific Computing Meets Big Data Technology: An Astronomy Use Case
Scientific analyses commonly compose multiple single-process programs into a
dataflow. An end-to-end dataflow of single-process programs is known as a
many-task application. Typically, tools from the HPC software stack are used to
parallelize these analyses. In this work, we investigate an alternate approach
that uses Apache Spark -- a modern big data platform -- to parallelize
many-task applications. We present Kira, a flexible and distributed astronomy
image processing toolkit using Apache Spark. We then use the Kira toolkit to
implement a Source Extractor application for astronomy images, called Kira SE.
With Kira SE as the use case, we study the programming flexibility, dataflow
richness, scheduling capacity and performance of Apache Spark running on the
EC2 cloud. By exploiting data locality, Kira SE achieves a 2.5x speedup over an
equivalent C program when analyzing a 1TB dataset using 512 cores on the Amazon
EC2 cloud. Furthermore, we show that by leveraging software originally designed
for big data infrastructure, Kira SE achieves competitive performance to the C
implementation running on the NERSC Edison supercomputer. Our experience with
Kira indicates that emerging Big Data platforms such as Apache Spark are a
performant alternative for many-task scientific applications
Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking
Montage is a portable software toolkit for constructing custom, science-grade
mosaics by composing multiple astronomical images. The mosaics constructed by
Montage preserve the astrometry (position) and photometry (intensity) of the
sources in the input images. The mosaic to be constructed is specified by the
user in terms of a set of parameters, including dataset and wavelength to be
used, location and size on the sky, coordinate system and projection, and
spatial sampling rate. Many astronomical datasets are massive, and are stored
in distributed archives that are, in most cases, remote with respect to the
available computational resources. Montage can be run on both single- and
multi-processor computers, including clusters and grids. Standard grid tools
are used to run Montage in the case where the data or computers used to
construct a mosaic are located remotely on the Internet. This paper describes
the architecture, algorithms, and usage of Montage as both a software toolkit
and as a grid portal. Timing results are provided to show how Montage
performance scales with number of processors on a cluster computer. In
addition, we compare the performance of two methods of running Montage in
parallel on a grid.Comment: 16 pages, 11 figure
Computing server power modeling in a data center: survey,taxonomy and performance evaluation
Data centers are large scale, energy-hungry infrastructure serving the
increasing computational demands as the world is becoming more connected in
smart cities. The emergence of advanced technologies such as cloud-based
services, internet of things (IoT) and big data analytics has augmented the
growth of global data centers, leading to high energy consumption. This upsurge
in energy consumption of the data centers not only incurs the issue of surging
high cost (operational and maintenance) but also has an adverse effect on the
environment. Dynamic power management in a data center environment requires the
cognizance of the correlation between the system and hardware level performance
counters and the power consumption. Power consumption modeling exhibits this
correlation and is crucial in designing energy-efficient optimization
strategies based on resource utilization. Several works in power modeling are
proposed and used in the literature. However, these power models have been
evaluated using different benchmarking applications, power measurement
techniques and error calculation formula on different machines. In this work,
we present a taxonomy and evaluation of 24 software-based power models using a
unified environment, benchmarking applications, power measurement technique and
error formula, with the aim of achieving an objective comparison. We use
different servers architectures to assess the impact of heterogeneity on the
models' comparison. The performance analysis of these models is elaborated in
the paper
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
The VLA/ALMA Nascent Disk and Multiplicity (VANDAM) Survey of Orion Protostars. II. A Statistical Characterization of Class 0 and Class I Protostellar Disks
We have conducted a survey of 328 protostars in the Orion molecular clouds with the Atacama Large Millimeter/submillimeter Array at 0.87 mm at a resolution of ~0.β1 (40 au), including observations with the Very Large Array at 9 mm toward 148 protostars at a resolution of ~0 08 (32 au). This is the largest multiwavelength survey of protostars at this resolution by an order of magnitude. We use the dust continuum emission at 0.87 and 9 mm to measure the dust disk radii and masses toward the Class 0, Class I, and flat-spectrum protostars, characterizing the evolution of these disk properties in the protostellar phase. The mean dust disk radii for the Class 0, Class I, and flat-spectrum protostars are 44.9^(+5.8)_(β3.4), 37.0^(+4.9)_(β3.0), and 28.5^(+3.7)_(β2.3) au, respectively, and the mean protostellar dust disk masses are 25.9^(+7.7)_(β4.0), 14.9^(+3.8)_(β2.2), 1.6^(+3.5)_(β1.9) Mβ, respectively. The decrease in dust disk masses is expected from disk evolution and accretion, but the decrease in disk radii may point to the initial conditions of star formation not leading to the systematic growth of disk radii or that radial drift is keeping the dust disk sizes small. At least 146 protostellar disks (35% of 379 detected 0.87 mm continuum sources plus 42 nondetections) have disk radii greater than 50 au in our sample. These properties are not found to vary significantly between different regions within Orion. The protostellar dust disk mass distributions are systematically larger than those of Class II disks by a factor of >4, providing evidence that the cores of giant planets may need to at least begin their formation during the protostellar phase
- β¦