    Generic Metadata Handling in Scientific Data Life Cycles

    Scientific data life cycles define how data is created, handled, accessed, and analyzed by users. Such data life cycles become increasingly sophisticated as the sciences they serve become more demanding and complex with the coming advent of exascale data and computing. The overarching data life cycle management background includes multiple abstraction categories: data sources, data and metadata management, computing and workflow management, security, data sinks, and methods to enable utilization. Challenges in this context are manifold. One is to hide the complexity from the user and to enable seamless use of resources for the sake of usability and efficiency. Another is to enable generic metadata management that is not restricted to one use case but can be adapted with limited effort to further ones. Metadata management is essential because it saves scientists time by removing the need to keep track of data manually, for example of its content and location. As the number of files grows into the millions, managing data without metadata becomes increasingly difficult. Thus, the solution is to employ metadata management to enable the organization of data based on information about it. Previously, use cases tended to support only highly specific metadata management or none at all. Now, a generic metadata management concept is available that can be used to efficiently integrate metadata capabilities with use cases. The concept was implemented within the MoSGrid data life cycle, which enables molecular simulations on distributed HPC-enabled data and computing infrastructures. The implementation enables easy-to-use and effective metadata management. Automated extraction, annotation, and indexing of metadata were designed, developed, and integrated, and search capabilities are provided via a seamless user interface. Further analysis runs can be started directly from search results. A complete evaluation of the concept, both in general and along the example implementation, is presented. In conclusion, the generic metadata management concept advances the state of the art in scientific data life cycle management.
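The extraction, indexing, and search steps named in this abstract can be illustrated with a minimal sketch. It is not the MoSGrid implementation: the `extract_metadata` function, the `MetadataIndex` class, the "key: value" file layout, and the example paths are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def extract_metadata(path, text):
    """Hypothetical extractor: collect 'key: value' lines from a file's text."""
    meta = {"location": path}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip().lower()] = value.strip()
    return meta


class MetadataIndex:
    """Hypothetical in-memory index mapping (key, value) pairs to file paths."""

    def __init__(self):
        self._index = defaultdict(set)  # (key, lowercased value) -> set of paths
        self._records = {}              # path -> full metadata record

    def add(self, meta):
        path = meta["location"]
        self._records[path] = meta
        for key, value in meta.items():
            self._index[(key, value.lower())].add(path)

    def search(self, **criteria):
        """Return the sorted paths whose metadata match every criterion."""
        sets = [
            self._index.get((key, str(value).lower()), set())
            for key, value in criteria.items()
        ]
        return sorted(set.intersection(*sets)) if sets else []


index = MetadataIndex()
index.add(extract_metadata("/data/run1.out", "molecule: glycine\nmethod: DFT"))
index.add(extract_metadata("/data/run2.out", "molecule: glycine\nmethod: CCSD(T)"))
print(index.search(molecule="glycine", method="dft"))
```

Searching by metadata rather than by file name is what allows a follow-up analysis to be started directly from a result list, as the abstract describes.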

    Environmental Molecular Sciences Laboratory 2007 Annual Report

    This annual report provides details on the research conducted at the Environmental Molecular Sciences Laboratory in Fiscal Year 2007 and the path forward for capability upgrades in Fiscal Year 2008.

    Quantum chemical characterization of Biomolecules in the gas phase and on surfaces of metal oxides

    During the four years of my PhD study, I performed systematic studies of the conformations of biomolecules ranging from a small amino acid (e.g. glycine) to a medium-sized nucleoside (e.g. 2’-deoxycytidine). To better account for possible effects of explicit environments (e.g. radiation, aqueous solution, and so on), we studied biomolecules in different phases, including neutral and charged species in the gas phase and solid state, and neutral species on solid surfaces. The work presented in this thesis is original in the following respects: (1) A tool was developed that automatically generates libraries of conformations for a systematic search of the conformational space of a molecule. When combined with tools developed by our colleagues, our toolbox facilitates a combinatorial computational chemical study of some small biomolecules; (2) A new method was developed that suppresses barriers between different local minima on a molecular potential energy surface (PES), and with this deformed PES, many other techniques (e.g. Monte Carlo and simulated annealing) can be adopted to search for the global minimum structure much more efficiently; (3) We performed a highly accurate study of two conformers of glycine up to the coupled-cluster level with single, double, and perturbative triple excitations (CCSD(T)) with basis sets up to aug-cc-pVQZ, and we found that treatment at the CCSD(T) level of theory is necessary to achieve numerical stability of the relative energies with respect to different basis sets at different geometries; (4) Through a thorough search of the conformational space of 2’-deoxycytidine, we found that its conformations in the gas phase are quite different from those in the solid state; this finding may help correct some previous approaches in which structural information extracted from solid-state experiments was used in computational studies of molecules in the gas phase; (5) Adsorption of hydrogen, methanol, and glycine on different types of solid surfaces (conductive and semiconductive) was studied, and the catalytic performance of these surfaces in breaking chemical bonds was discussed. The current thesis not only covers the main applications of computational chemistry tools in the conformational study of biomolecules, it also includes discussions of the accuracy and methodology involved in these studies. We did not intend to solve all of the problems that people have met in their conformational studies of biomolecules. We hope only that the work presented here was performed in a more systematic way, and that these studies can give people some insights that might be helpful in their further studies.

    ARITHMETIC LOGIC UNIT ARCHITECTURES WITH DYNAMICALLY DEFINED PRECISION

    Modern central processing units (CPUs) employ arithmetic logic units (ALUs) that support statically defined precisions, often adhering to industry standards. Although CPU manufacturers highly optimize their ALUs, industry standard precisions embody accuracy and performance compromises for general purpose deployment. Hence, optimizing ALU precision holds great potential for improving speed and energy efficiency. Previous research on multiple precision ALUs focused on predefined, static precisions. Little previous work addressed ALU architectures with customized, dynamically defined precision. This dissertation presents approaches for developing dynamic precision ALU architectures for both fixed-point and floating-point to enable better performance, energy efficiency, and numeric accuracy. These new architectures enable dynamically defined precision, including support for vectorization. The new architectures also prevent performance and energy loss due to applying unnecessarily high precision on computations, which often happens with statically defined standard precisions. The new ALU architectures support different precisions through the use of configurable sub-blocks, with this dissertation including demonstration implementations for floating point adder, multiply, and fused multiply-add (FMA) circuits with 4-bit sub-blocks. For these circuits, the dynamic precision ALU speed is nearly the same as traditional ALU approaches, although the dynamic precision ALU is nearly twice as large
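As a rough software illustration of building an adder from configurable 4-bit sub-blocks, the sketch below chains as many sub-blocks as the requested precision needs. It models unsigned fixed-point addition only and is not the dissertation's circuit design; the names `add_sub_block` and `dynamic_add` and the `num_blocks` parameter are assumptions made for this example.

```python
SUB_BLOCK_BITS = 4  # sub-block width mentioned in the abstract
SUB_BLOCK_MASK = (1 << SUB_BLOCK_BITS) - 1


def add_sub_block(a, b, carry_in):
    """Add two 4-bit values plus a carry; return (4-bit sum, carry out)."""
    total = a + b + carry_in
    return total & SUB_BLOCK_MASK, total >> SUB_BLOCK_BITS


def dynamic_add(x, y, num_blocks):
    """Add x and y using only num_blocks chained 4-bit sub-blocks.

    Hypothetical model: the precision is 4 * num_blocks bits, chosen per call.
    Returns (sum modulo 2**(4 * num_blocks), final carry out).
    """
    result, carry = 0, 0
    for i in range(num_blocks):
        shift = i * SUB_BLOCK_BITS
        a = (x >> shift) & SUB_BLOCK_MASK
        b = (y >> shift) & SUB_BLOCK_MASK
        block_sum, carry = add_sub_block(a, b, carry)
        result |= block_sum << shift
    return result, carry


# The same operands at 16-bit precision (4 sub-blocks) and 8-bit precision (2 sub-blocks).
print(dynamic_add(0x1234, 0x0FFF, 4))
print(dynamic_add(0x1234, 0x0FFF, 2))
```

In this model a lower-precision request simply leaves the upper sub-blocks unused, which is the intuition behind avoiding unnecessarily high precision; the abstract's own size and speed figures refer to the hardware implementations, not to this sketch.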

    Investigation of exciton properties in organic materials via many-body perturbation theory

    Modeling energy transport in an organic solar cell

    X10 for high-performance scientific computing

    High performance computing is a key technology that enables large-scale physical simulation in modern science. While great advances have been made in methods and algorithms for scientific computing, the most commonly used programming models encourage a fragmented view of computation that maps poorly to the underlying computer architecture. Scientific applications typically manifest physical locality, which means that interactions between entities or events that are nearby in space or time are stronger than more distant interactions. Linear-scaling methods exploit physical locality by approximating distant interactions, to reduce computational complexity so that cost is proportional to system size. In these methods, the computation required for each portion of the system is different depending on that portion’s contribution to the overall result. To support productive development, application programmers need programming models that cleanly map aspects of the physical system being simulated to the underlying computer architecture while also supporting the irregular workloads that arise from the fragmentation of a physical system. X10 is a new programming language for high-performance computing that uses the asynchronous partitioned global address space (APGAS) model, which combines explicit representation of locality with asynchronous task parallelism. This thesis argues that the X10 language is well suited to expressing the algorithmic properties of locality and irregular parallelism that are common to many methods for physical simulation. The work reported in this thesis was part of a co-design effort involving researchers at IBM and ANU in which two significant computational chemistry codes were developed in X10, with an aim to improve the expressiveness and performance of the language. The first is a Hartree–Fock electronic structure code, implemented using the novel Resolution of the Coulomb Operator approach. 
The second evaluates electrostatic interactions between point charges, using either the smooth particle mesh Ewald method or the fast multipole method, with the latter used to simulate ion interactions in a Fourier Transform Ion Cyclotron Resonance mass spectrometer. We compare the performance of both X10 applications to state-of-the-art software packages written in other languages. This thesis presents improvements to the X10 language and runtime libraries for managing and visualizing the data locality of parallel tasks, communication using active messages, and efficient implementation of distributed arrays. We evaluate these improvements in the context of computational chemistry application examples. This work demonstrates that X10 can achieve performance comparable to established programming languages when running on a single core. More importantly, X10 programs can achieve high parallel efficiency on a multithreaded architecture, given a divide-and-conquer pattern of parallel tasks and appropriate use of worker-local data. For distributed memory architectures, X10 supports the use of active messages to construct local, asynchronous communication patterns which outperform global, synchronous patterns. Although point-to-point active messages may be implemented efficiently, productive application development also requires collective communications; more work is required to integrate both forms of communication in the X10 language. The exploitation of locality is the key insight in both linear-scaling methods and the APGAS programming model; their combination represents an attractive opportunity for future co-design efforts.