    Astronomy in the Cloud: Using MapReduce for Image Coaddition

    In the coming decade, astronomical surveys of the sky will generate tens of terabytes of images and detect hundreds of millions of sources every night. The study of these sources will involve computational challenges such as anomaly detection and classification, and moving object tracking. Since such studies benefit from the highest quality data, methods such as image coaddition (stacking) will be a critical preprocessing step prior to scientific investigation. With a requirement that these images be analyzed on a nightly basis to identify moving sources or transient objects, these data streams present many computational challenges. Given the quantity of data involved, the computational load of these problems can only be addressed by distributing the workload over a large number of nodes. However, the high data throughput demanded by these applications may present scalability challenges for certain storage architectures. One scalable data-processing method that has emerged in recent years is MapReduce, and in this paper we focus on its popular open-source implementation called Hadoop. In the Hadoop framework, the data is partitioned among storage attached directly to worker nodes, and the processing workload is scheduled in parallel on the nodes that contain the required input data. A further motivation for using Hadoop is that it allows us to exploit cloud computing resources, e.g., Amazon's EC2. We report on our experience implementing a scalable image-processing pipeline for the SDSS imaging database using Hadoop. This multi-terabyte imaging dataset provides a good testbed for algorithm development since its scope and structure approximate future surveys. First, we describe MapReduce and how we adapted image coaddition to the MapReduce framework. Then we describe a number of optimizations to our basic approach and report experimental results comparing their performance.
    Comment: 31 pages, 11 figures, 2 tables
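    The key to the adaptation is that coaddition decomposes naturally into a map step that keys each image patch by the sky tile it overlaps and a reduce step that stacks everything sharing a key. Below is a minimal in-memory sketch of that decomposition, not the paper's Hadoop pipeline: the record layout, tile keys, and plain-mean stacking rule are all illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

# Toy map/reduce decomposition of image coaddition (pure Python, not Hadoop).
# Each record is (image_id, tile_key, pixels): tile_key identifies the sky
# tile a patch overlaps, pixels is a 2-D array. All of this is hypothetical.

def map_phase(records):
    """Map: emit a (tile_key, pixel_patch) pair for each input patch."""
    for image_id, tile_key, pixels in records:
        yield tile_key, pixels

def reduce_phase(mapped):
    """Reduce: stack all patches that landed on the same sky tile."""
    groups = defaultdict(list)
    for tile_key, pixels in mapped:
        groups[tile_key].append(pixels)
    for tile_key, patches in groups.items():
        # A real coadd would resample onto a common grid and weight by
        # per-pixel variance; a plain mean keeps the sketch short.
        yield tile_key, np.mean(patches, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    records = [
        ("img1", (10, 20), rng.random((4, 4))),
        ("img2", (10, 20), rng.random((4, 4))),
        ("img3", (11, 20), rng.random((4, 4))),
    ]
    for tile, coadd in reduce_phase(map_phase(records)):
        print(tile, coadd.mean())
```

    In Hadoop Streaming, the two functions would become separate executables exchanging key/value lines, with HDFS placing each tile's data on the worker that processes it.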

    Efficient Machine Learning Approach for Optimizing Scientific Computing Applications on Emerging HPC Architectures

    Developing efficient parallel implementations of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. It requires exploiting the data-parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of them employ irregular algorithms which exhibit data-dependent control flow and irregular memory accesses. Furthermore, these applications are often iterative with dependencies between steps, making it hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particle beam dynamics is one such application, where the distribution of work and the memory access pattern at each time step are irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications has focused on optimizing the irregular, data-dependent memory accesses and control flow during a single step of the application, independent of the other steps, under the assumption that these patterns are completely unpredictable. We observed that the structure of the computation leading to control-flow divergence and irregular memory accesses in one step is similar to that in the next step, so it is possible to predict this structure in the current step by observing the computation structure of previous steps. In this dissertation, we present novel machine-learning-based optimization techniques to address the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-flow and memory access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use numerical simulation of charged particle beam dynamics as a motivating example throughout the dissertation, though our techniques should be equally applicable to a wide range of irregular applications. The machine learning approach presented here uses predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation in order to anticipate future memory access patterns. Access pattern forecasts can then be used to make optimization decisions during application execution, improving the performance of the application at future time steps based on observations from earlier time steps. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units to deliver good aggregate performance.
    We used these optimization techniques and the anticipation strategy to design a cache-aware, memory-efficient parallel algorithm that addresses the irregularities in the parallel implementation of charged particle beam dynamics simulation on different HPC architectures. Experimental results using a diverse mix of HPC architectures show that our anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and improving resource utilization.
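    To make the anticipation idea concrete, here is a minimal sketch on a hypothetical 1-D particle simulation: forecast which grid cell each particle will touch at the next step and reorder particles so predicted neighbors are contiguous in memory. The linear extrapolation used as the predictor is a stand-in assumption; the dissertation trains supervised models for this prediction instead.

```python
import numpy as np

# Forecast-and-reorder sketch: predict each particle's next grid cell
# from its motion over the last two steps, then sort particles by the
# predicted cell so that work touching the same cell accesses
# contiguous memory (less divergence, better cache reuse).

rng = np.random.default_rng(0)
n_particles, n_cells, n_steps = 10_000, 64, 5
pos = rng.random(n_particles)                   # positions in [0, 1)
vel = 0.01 * rng.standard_normal(n_particles)   # per-particle velocity

prev_cells = None
for step in range(n_steps):
    cells = np.minimum((pos * n_cells).astype(int), n_cells - 1)
    if prev_cells is not None:
        # Linear extrapolation of the observed cell trajectory; a
        # trivial stand-in for the supervised predictor.
        predicted = np.clip(2 * cells - prev_cells, 0, n_cells - 1)
        order = np.argsort(predicted, kind="stable")
        pos, vel, cells = pos[order], vel[order], cells[order]
    prev_cells = cells
    pos = (pos + vel) % 1.0                     # advance one time step
```

    The reorder costs a sort per step, so the strategy pays off only when the locality gained at the next step outweighs that cost, which is the trade-off a better-trained predictor is meant to tip.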

    Gestión de conocimiento científico por los grupos de investigación. Una experiencia en la Universidad de Oriente // Management of scientific knowledge by research groups. An experience at the Universidad de Oriente

    Research groups are a viable alternative for carrying out important scientific projects that provide solutions to the complex problems of today's society. In this regard, it is necessary to increase their organizational capacity to meet individual and institutional goals, stimulating creativity in scientific production and communication. The objective of this work was to reflect critically on the importance of research groups for the management of scientific knowledge and to present the experience developed by the Research Group on Mathematics and Computation (GIDMAC) at the Universidad de Oriente, Cuba. The conclusion is that these groups have a very promising future in knowledge management, as long as they guarantee their strategic projection towards organizational aims and cardinal values, which contribute to institutional excellence and the sustainable development of society.

    A numerical code for the solution of the Kompaneets equation in cosmological context

    Context: The cosmic microwave background (CMB) spectrum probes physical processes and astrophysical phenomena occurring at various epochs of the Universe's evolution. Current and future CMB absolute temperature experiments aim at the discovery of very small distortions, such as those associated with the cosmological reionization process or those that could be generated by different kinds of earlier processes. The interpretation of future data calls for continuous improvement in the theoretical modeling of the CMB spectrum. Aims: In this work we describe the fundamental approach of, and in particular the update to recent NAG versions of, KYPRIX, a numerical code specifically written for the solution of the Kompaneets equation in a cosmological context. First implemented in the years 1989-1991, it is aimed at the very accurate computation of CMB spectral distortions under quite general assumptions. Methods: We describe the structure and the main subdivisions of the code and discuss the most relevant aspects of its technical implementation. Results: We present some of the fundamental tests we carried out to verify the accuracy, reliability, and performance of the code. Conclusions: All the tests performed demonstrate the reliability and versatility of the new code version, its very good accuracy, and its applicability to the scientific analysis of current CMB spectrum data and of the much more precise measurements that will be available in the future. The recipes and tests described in this work can also be useful for implementing accurate numerical codes for other scientific purposes using the same or similar numerical libraries, or for verifying the validity of different codes aimed at the same or similar problems.
    Comment: 14 pages, 6 figures. Accepted for publication in Astronomy and Astrophysics on July 23, 2009. Abstract shorter than in the published version.
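    For reference, the Kompaneets equation evolves the photon occupation number n(x) with the Comptonization parameter y as dn/dy = x^{-2} d/dx[x^4 (dn/dx + n + n^2)], where x = hν/kT_e. Below is a crude explicit finite-difference sketch of this equation in Python, an illustration only: KYPRIX is built on NAG library solvers with a far more careful treatment, so the grid, step size, and boundary handling here are all assumptions.

```python
import numpy as np

# Crude explicit finite-difference sketch of the Kompaneets equation
#   dn/dy = (1/x^2) d/dx [ x^4 (dn/dx + n + n^2) ],  x = h*nu/(k*T_e).

nx = 400
x = np.logspace(-3, 1.5, nx)         # dimensionless photon energy grid
# A pure blackbody, n = 1/(e^x - 1), is a stationary solution; start from
# a small chemical-potential (mu-type) distortion so something evolves.
n = 1.0 / np.expm1(x + 0.01)
n0 = n.copy()

dy = 1e-6                            # Compton-parameter step (stable here)
x_half = 0.5 * (x[1:] + x[:-1])      # midpoints where the flux lives
for _ in range(1000):                # integrate up to y = 1e-3
    n_half = 0.5 * (n[1:] + n[:-1])
    flux = x_half**4 * (np.diff(n) / np.diff(x) + n_half + n_half**2)
    dn = np.zeros_like(n)
    # Divergence of the flux on interior points; endpoints held fixed
    # as a crude stand-in for physical boundary conditions.
    dn[1:-1] = np.diff(flux) / (x[1:-1] ** 2 * np.diff(x_half))
    n += dy * dn

print("max fractional change in n:", np.max(np.abs(n / n0 - 1.0)))
```

    The fact that a pure blackbody makes the bracketed flux vanish identically provides a convenient sanity check for any solver of this equation.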

    The Computational Lens: from Quantum Physics to Neuroscience

    Two transformative waves of computing have redefined the way we approach science. The first wave came with the birth of the digital computer, which enabled scientists to numerically simulate their models and analyze massive datasets. This technological breakthrough led to the emergence of many sub-disciplines bearing the prefix "computational" in their names. Currently, we are in the midst of the second wave, marked by the remarkable advancements in artificial intelligence. From predicting protein structures to classifying galaxies, the scope of its applications is vast, and there can only be more awaiting us on the horizon. While these two waves influence scientific methodology at the instrumental level, in this dissertation I will present the computational lens in science, aimed at the conceptual level. Specifically, the central thesis posits that computation serves as a convenient and mechanistic language for understanding and analyzing information processing systems, offering the advantages of composability and modularity. This dissertation begins with an illustration of the blueprint of the computational lens, supported by a review of relevant previous work. Subsequently, I present my own work in quantum physics and neuroscience as concrete examples. In the concluding chapter, I contemplate the potential of applying the computational lens across various scientific fields in ways that can provide significant domain insights, and discuss potential future directions.
    Comment: PhD thesis, Harvard University, Cambridge, Massachusetts, USA, 2023. Some chapters report joint work.