Astronomy in the Cloud: Using MapReduce for Image Coaddition
In the coming decade, astronomical surveys of the sky will generate tens of
terabytes of images and detect hundreds of millions of sources every night. The
study of these sources will involve computational challenges such as anomaly
detection, classification, and moving-object tracking. Since such studies
benefit from the highest quality data, methods such as image coaddition
(stacking) will be a critical preprocessing step prior to scientific
investigation. With a requirement that these images be analyzed on a nightly
basis to identify moving sources or transient objects, these data streams
present many computational challenges. Given the quantity of data involved, the
computational load of these problems can only be addressed by distributing the
workload over a large number of nodes. However, the high data throughput
demanded by these applications may present scalability challenges for certain
storage architectures. One scalable data-processing method that has emerged in
recent years is MapReduce, and in this paper we focus on its popular
open-source implementation called Hadoop. In the Hadoop framework, the data is
partitioned among storage attached directly to worker nodes, and the processing
workload is scheduled in parallel on the nodes that contain the required input
data. A further motivation for using Hadoop is that it allows us to exploit
cloud computing resources, e.g., Amazon's EC2. We report on our experience
implementing a scalable image-processing pipeline for the SDSS imaging database
using Hadoop. This multi-terabyte imaging dataset provides a good testbed for
algorithm development since its scope and structure approximate future surveys.
First, we describe MapReduce and how we adapted image coaddition to the
MapReduce framework. Then we describe a number of optimizations to our basic
approach and report experimental results comparing their performance.
Comment: 31 pages, 11 figures, 2 tables
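As an illustration of the adaptation the abstract describes, image coaddition maps naturally onto MapReduce: the map step emits each pixel keyed by its sky position, and the reduce step stacks all overlapping values for that position. The sketch below is a single-process simulation of that map/shuffle/reduce flow; the coordinate keys and the unweighted averaging are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch of image coaddition as MapReduce (not the paper's code).
from collections import defaultdict

def map_image(image_id, pixels):
    """Map step: emit each pixel keyed by its sky coordinate.
    `pixels` maps a (ra_bin, dec_bin) coordinate to a flux value."""
    for coord, flux in pixels.items():
        yield coord, flux

def reduce_coadd(coord, fluxes):
    """Reduce step: stack all overlapping pixel values for one sky
    coordinate with a simple unweighted average."""
    fluxes = list(fluxes)
    return coord, sum(fluxes) / len(fluxes)

def run_mapreduce(images):
    """Local simulation of the map -> shuffle -> reduce flow that Hadoop
    distributes across worker nodes holding the input data."""
    groups = defaultdict(list)
    for image_id, pixels in images.items():          # map + shuffle
        for coord, flux in map_image(image_id, pixels):
            groups[coord].append(flux)
    return dict(reduce_coadd(c, fs) for c, fs in groups.items())  # reduce
```

In Hadoop the shuffle phase would move same-keyed pixels to the same reducer, so each sky tile is coadded on a single node without a shared filesystem bottleneck.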
Efficient Machine Learning Approach for Optimizing Scientific Computing Applications on Emerging HPC Architectures
Efficient parallel implementation of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. It requires exploiting the data-parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, many scientific and engineering applications are unstructured. Obtaining performance on accelerators for these applications is extremely challenging because many of them employ irregular algorithms that exhibit data-dependent control flow and irregular memory accesses. Furthermore, these applications are often iterative with dependencies between steps, making them hard to parallelize across steps; as a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged-particle beam dynamics is one such application, where the distribution of work and the memory access pattern at each time step are irregular. Applications with these properties tend to exhibit significant branch and memory divergence, load imbalance between processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications has focused on optimizing the irregular, data-dependent memory accesses and control flow during a single step of the application, independent of the other steps, under the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-flow divergence and irregular memory accesses in one step is similar to that in the next step, so this structure can be predicted in the current step by observing the computation structure of previous steps.
In this dissertation, we present novel machine-learning-based optimization techniques to address the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-flow and memory-access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use numerical simulation of charged-particle beam dynamics as a motivating example throughout the dissertation, though the techniques should be equally applicable to a wide range of irregular applications. The machine learning approach presented here uses predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation and anticipate future access patterns. Access-pattern forecasts can then be used to make optimization decisions during application execution, improving the performance of the application at a future time step based on observations from earlier time steps. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units to deliver good aggregate performance. We used these optimization techniques and this anticipation strategy to design a cache-aware, memory-efficient parallel algorithm that addresses the irregularities in the parallel implementation of charged-particle beam dynamics simulation on different HPC architectures. Experimental results on a diverse mix of HPC architectures show that our anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and improving resource utilization.
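A minimal sketch of the forecasting idea, assuming a toy setting in which each particle's memory-access target is a cell index and a simple persistence-plus-drift extrapolation stands in for the supervised model. All names here are illustrative assumptions, not the dissertation's actual API.

```python
# Hypothetical sketch: forecast each particle's next-step memory-access
# target from its recent history, then group particles by predicted target
# so threads touching the same cells run together (reducing branch/memory
# divergence and improving data reuse).

def predict_next_cell(history):
    """Persistence-plus-drift forecast: extrapolate the last observed
    motion. `history` lists the cell indices touched in past steps."""
    if len(history) < 2:
        return history[-1]
    drift = history[-1] - history[-2]
    return history[-1] + drift

def schedule_by_forecast(particle_histories):
    """Order particles by predicted cell so that spatially nearby memory
    accesses are batched in the next time step."""
    forecasts = {p: predict_next_cell(h)
                 for p, h in particle_histories.items()}
    return sorted(forecasts, key=forecasts.get)
```

The point of the sketch is the scheduling decision, not the predictor: any model trained on earlier steps could replace `predict_next_cell`, and the reordering it drives is what restores regularity to an otherwise irregular step.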
Gestión de conocimiento científico por los grupos de investigación. Una experiencia en la Universidad de Oriente // Management of scientific knowledge by research groups. An experience at the Universidad de Oriente
Research groups are a viable alternative for carrying out important scientific projects that provide solutions to the complex problems of today's society. In this regard, it is necessary to increase their organizational capacity to meet individual and institutional goals, stimulating creativity in scientific production and communication. The objective of this work was to reflect critically on the importance of research groups for the management of scientific knowledge and to present the experience of the Research Group on Mathematics and Computation (GIDMAC) at the Universidad de Oriente, Cuba. The conclusion is that these groups have a very promising future in knowledge management, provided they guarantee their strategic projection towards organizational aims and cardinal values, which contribute to institutional excellence and the sustainable development of society.
A numerical code for the solution of the Kompaneets equation in cosmological context
Context: The cosmic microwave background (CMB) spectrum probes physical
processes and astrophysical phenomena occurring at various epochs of the
Universe's evolution. Current and future CMB absolute-temperature experiments
aim at the discovery of very small distortions, such as those associated with
the cosmological reionization process or those that could be generated by
different kinds of earlier processes. The interpretation of future data calls
for continuous improvement in the theoretical modeling of the CMB spectrum.
Aims: In this work we describe the fundamental approach and, in particular,
the update to recent NAG versions of a numerical code, KYPRIX, specifically
written for the solution of the Kompaneets equation in a cosmological context,
first implemented in the years 1989-1991 and aimed at the very accurate
computation of CMB spectral distortions under quite general assumptions.
Methods: We
describe the structure and the main subdivisions of the code and discuss the
most relevant aspects of its technical implementation. Results: We present
some of the fundamental tests we carried out to verify the accuracy,
reliability, and performance of the code. Conclusions: All the tests
demonstrate the reliability and versatility of the new code version, its very
good accuracy, and its applicability to the scientific analysis of current CMB
spectrum data and of the much more precise measurements that will become
available in the future. The recipes and tests described in this work can also
be useful for implementing accurate numerical codes for other scientific
purposes using the same or similar numerical libraries, or for verifying the
validity of different codes aimed at the same or similar problems.
Comment: 14 pages, 6 figures. Accepted for publication in Astronomy and
Astrophysics on July 23, 2009. Abstract shorter than in the published version.
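For reference, the standard form of the Kompaneets equation that such a code solves, written in the dimensionless photon energy $x = h\nu / kT_e$ and the Comptonization parameter $y$ (the code's full cosmological formulation may include additional source and expansion terms):

```latex
\frac{\partial n}{\partial y}
  = \frac{1}{x^{2}} \frac{\partial}{\partial x}
    \left[ x^{4} \left( \frac{\partial n}{\partial x} + n + n^{2} \right) \right]
```

Here $n(x, y)$ is the photon occupation number; the $\partial n / \partial x$ term describes Doppler diffusion, $n$ the electron recoil, and $n^{2}$ induced scattering.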
The Computational Lens: from Quantum Physics to Neuroscience
Two transformative waves of computing have redefined the way we approach
science. The first wave came with the birth of the digital computer, which
enabled scientists to numerically simulate their models and analyze massive
datasets. This technological breakthrough led to the emergence of many
sub-disciplines bearing the prefix "computational" in their names. Currently,
we are in the midst of the second wave, marked by the remarkable advancements
in artificial intelligence. From predicting protein structures to classifying
galaxies, the scope of its applications is vast, and there can only be more
awaiting us on the horizon.
While these two waves influence scientific methodology at the instrumental
level, in this dissertation, I will present the computational lens in science,
aiming at the conceptual level. Specifically, the central thesis posits that
computation serves as a convenient and mechanistic language for understanding
and analyzing information processing systems, offering the advantages of
composability and modularity.
This dissertation begins with an illustration of the blueprint of the
computational lens, supported by a review of relevant previous work.
Subsequently, I will present my own works in quantum physics and neuroscience
as concrete examples. In the concluding chapter, I will contemplate the
potential of applying the computational lens across various scientific fields,
in a way that can provide significant domain insights, and discuss potential
future directions.
Comment: PhD thesis, Harvard University, Cambridge, Massachusetts, USA, 2023.
Some chapters report joint work.