Generic Metadata Handling in Scientific Data Life Cycles
Scientific data life cycles define how data is created, handled, accessed, and analyzed by users. Such data life cycles become increasingly sophisticated as the sciences they deal with become more demanding and complex with the coming advent of exascale data and computing. The overarching background of data life cycle management spans multiple abstraction categories: data sources, data and metadata management, computing and workflow management, security, data sinks, and methods for enabling utilization. The challenges in this context are manifold. One is to hide complexity from the user and to enable seamless use of resources, with respect to both usability and efficiency. Another is to enable generic metadata management that is not restricted to one use case but can be adapted to further ones with limited effort.
Metadata management is essential because it saves scientists the time otherwise spent keeping track of data manually, for example by its content and location. As the number of files grows into the millions, managing data without metadata becomes increasingly difficult. The solution is therefore to employ metadata management, which organizes data based on information about it. Previously, use cases tended to support only highly specific metadata management or none at all. Now, a generic metadata management concept is available that can be used to efficiently integrate metadata capabilities with use cases.
The concept was implemented within the MoSGrid data life cycle, which enables molecular simulations on distributed HPC-enabled data and computing infrastructures. The implementation provides easy-to-use and effective metadata management. Automated extraction, annotation, and indexing of metadata were designed, developed, and integrated, and search capabilities are provided via a seamless user interface. Further analysis runs can be started directly from search results. A complete evaluation of the concept, both in general and using the example implementation, is presented. In conclusion, the generic metadata management concept advances the state of the art in scientific data life cycle management.
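The cycle of automated extraction, annotation, indexing, and search described above can be illustrated with a minimal Python sketch of an inverted index over key/value metadata. It is not the MoSGrid implementation: the class `MetadataIndex`, the toy `extract_metadata` parser, and the example keys (`program`, `molecule`) and file paths are all hypothetical.

```python
from collections import defaultdict


class MetadataIndex:
    """Hypothetical inverted index from (key, value) metadata pairs to file paths."""

    def __init__(self):
        self._postings = defaultdict(set)  # (key, normalized value) -> paths
        self._records = {}                 # path -> full metadata record

    def annotate(self, path, metadata):
        """Attach extracted metadata to a file (assumed annotated once) and index it."""
        self._records[path] = dict(metadata)
        for key, value in metadata.items():
            self._postings[(key, str(value).lower())].add(path)

    def search(self, **criteria):
        """Return sorted paths whose metadata matches every key=value criterion."""
        matches = set(self._records)
        for key, value in criteria.items():
            matches &= self._postings.get((key, str(value).lower()), set())
        return sorted(matches)


def extract_metadata(header_text):
    """Toy extractor: read 'Key: value' lines from a simulation output header."""
    metadata = {}
    for line in header_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip().lower()] = value.strip()
    return metadata


index = MetadataIndex()
index.annotate("runs/a.log", extract_metadata("Program: Gromacs\nMolecule: glycine"))
index.annotate("runs/b.log", extract_metadata("Program: Gaussian\nMolecule: glycine"))
print(index.search(molecule="glycine", program="gaussian"))  # ['runs/b.log']
```

Intersecting posting sets makes each extra search criterion narrow the result, which is what allows a follow-up analysis run to be launched on exactly the matching files.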
Environmental Molecular Sciences Laboratory 2007 Annual Report
This annual report provides details on the research conducted at the Environmental Molecular Sciences Laboratory in Fiscal Year 2007 and the path forward for capability upgrades in Fiscal Year 2008.
Quantum chemical characterization of biomolecules in the gas phase and on surfaces of metal oxides
During the four years of my PhD study, I performed systematic studies of the conformations of biomolecules ranging from a small amino acid (e.g. glycine) to a medium-sized nucleoside (e.g. 2’-deoxycytidine). To better account for possible effects of explicit environments (e.g. radiation, aqueous solution, and so on), we studied biomolecules in different phases: neutral and charged species in the gas phase and the solid state, and neutral species on solid surfaces. The original contributions of this thesis are as follows:
(1) A tool that automatically generates libraries of conformations for a systematic search of the conformational space of a molecule was developed. Combined with tools developed by our colleagues, this toolbox facilitates combinatorial computational chemistry studies of small biomolecules;
(2) A new method that suppresses the barriers between different local minima on a molecular potential energy surface (PES) was developed; on this deformed PES, many other techniques (e.g. Monte Carlo and simulated annealing) can be used to search for the global minimum structure much more efficiently;
(3) We performed a highly accurate study of two conformers of glycine at levels of theory up to coupled-cluster with single, double, and perturbative triple excitations (CCSD(T)), with basis sets up to aug-cc-pVQZ, and found that treatment at the CCSD(T) level of theory is necessary to achieve numerical stability of the relative energies with respect to different basis sets at different geometries;
(4) Through a thorough search of the conformational space of 2’-deoxycytidine, we found that its conformations in the gas phase are quite different from those in the solid state; this finding may help correct previous approaches in which structural information extracted from solid-state experiments was used in computational studies of molecules in the gas phase;
(5) The adsorption of hydrogen, methanol, and glycine on different types of solid surfaces (conductive and semiconductive) was studied, and the catalytic performance of these surfaces in breaking chemical bonds was discussed.
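Item (1) above concerns generating libraries of conformations for a systematic search. One common way to build such a library, sketched here under the assumption that each rotatable bond is sampled on a fixed grid of dihedral angles, is a Cartesian product over the torsion values. This illustrates combinatorial enumeration only and is not the thesis tool; the function name, the bond labels `phi`/`psi`, and the staggered 60/180/300 degree grid are hypothetical choices.

```python
from itertools import product


def conformer_library(torsion_grids):
    """Enumerate every combination of dihedral angles (in degrees).

    torsion_grids maps a rotatable-bond label to the angles to try.
    Returns one dict per candidate conformer, in a deterministic order.
    """
    labels = sorted(torsion_grids)  # fixed ordering of the bonds
    grids = [torsion_grids[label] for label in labels]
    return [dict(zip(labels, angles)) for angles in product(*grids)]


# Hypothetical example: two rotatable bonds, each sampled at three
# staggered positions, give 3 x 3 = 9 starting geometries.
staggered = [60, 180, 300]
library = conformer_library({"phi": staggered, "psi": staggered})
print(len(library))  # 9
print(library[0])    # {'phi': 60, 'psi': 60}
```

Each candidate would then be optimized with an electronic structure method. Because the library size is the product of the grid sizes, it grows quickly with the number of rotatable bonds, which is why automated, systematic generation pays off even for small biomolecules.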
This thesis not only covers the main applications of computational chemistry tools in the conformational study of biomolecules, but also discusses the accuracy and methodology involved in these studies. We did not intend to solve all of the problems encountered in conformational studies of biomolecules. Rather, we hope that the work presented here was performed in a much more systematic way, and that these studies offer insights helpful to further research.
ARITHMETIC LOGIC UNIT ARCHITECTURES WITH DYNAMICALLY DEFINED PRECISION
Modern central processing units (CPUs) employ arithmetic logic units (ALUs) that support statically defined precisions, often adhering to industry standards. Although CPU manufacturers heavily optimize their ALUs, industry-standard precisions embody accuracy and performance compromises for general-purpose deployment. Hence, optimizing ALU precision holds great potential for improving speed and energy efficiency. Previous research on multiple-precision ALUs focused on predefined, static precisions; little previous work addressed ALU architectures with customized, dynamically defined precision. This dissertation presents approaches for developing dynamic precision ALU architectures for both fixed-point and floating-point arithmetic to enable better performance, energy efficiency, and numeric accuracy. These new architectures enable dynamically defined precision, including support for vectorization. They also prevent the performance and energy loss caused by applying unnecessarily high precision to computations, which often happens with statically defined standard precisions. The new ALU architectures support different precisions through configurable sub-blocks, and this dissertation includes demonstration implementations of floating-point adder, multiplier, and fused multiply-add (FMA) circuits with 4-bit sub-blocks. For these circuits, the speed of the dynamic precision ALU is nearly the same as that of traditional ALU approaches, although the dynamic precision ALU is nearly twice as large.
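The dissertation's floating-point circuits are not reproduced here. As a minimal sketch of the sub-block idea, assuming unsigned fixed-point integers, the Python model below builds an adder from chained 4-bit sub-blocks and selects the lane width at run time: carries propagate between sub-blocks inside a lane and are cut at lane boundaries, so one datapath can act as a single wide adder or as several narrower vector lanes. The function `partitioned_add` and its parameters are hypothetical.

```python
SUB_BLOCK_BITS = 4  # width of each configurable sub-block
SUB_BLOCK_MASK = (1 << SUB_BLOCK_BITS) - 1


def partitioned_add(a, b, total_bits, lane_bits):
    """Add two packed unsigned words using chained 4-bit sub-blocks.

    total_bits is the datapath width; lane_bits is the precision chosen at
    run time. Both must be multiples of SUB_BLOCK_BITS and lane_bits must
    divide total_bits. Each lane wraps around modulo 2**lane_bits.
    """
    if total_bits % SUB_BLOCK_BITS or lane_bits % SUB_BLOCK_BITS:
        raise ValueError("widths must be multiples of the sub-block size")
    if total_bits % lane_bits:
        raise ValueError("lane width must divide the datapath width")
    blocks_per_lane = lane_bits // SUB_BLOCK_BITS
    result, carry = 0, 0
    for i in range(total_bits // SUB_BLOCK_BITS):
        if i % blocks_per_lane == 0:
            carry = 0  # lane boundary: break the carry chain
        shift = i * SUB_BLOCK_BITS
        s = ((a >> shift) & SUB_BLOCK_MASK) + ((b >> shift) & SUB_BLOCK_MASK) + carry
        result |= (s & SUB_BLOCK_MASK) << shift
        carry = s >> SUB_BLOCK_BITS  # carry out of this sub-block
    return result


print(hex(partitioned_add(0x01FF, 0x0001, total_bits=16, lane_bits=16)))  # 0x200
print(hex(partitioned_add(0x01FF, 0x0001, total_bits=16, lane_bits=8)))   # 0x100
```

With one 16-bit lane the carry out of the low byte reaches the high byte; with two 8-bit lanes the same sub-blocks compute two independent byte sums. Only the carry-gating logic differs between configurations, which hints at why such designs can stay close to conventional speed while costing extra area.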
Investigation of exciton properties in organic materials via many-body perturbation theory
Modeling energy transport in an organic solar cell
X10 for high-performance scientific computing
High-performance computing is a key technology that enables large-scale physical simulation in modern science. While great advances have been made in methods and algorithms for scientific computing, the most commonly used programming models encourage a fragmented view of computation that maps poorly to the underlying computer architecture.
Scientific applications typically manifest physical locality, which means that interactions between entities or events that are nearby in space or time are stronger than more distant interactions. Linear-scaling methods exploit physical locality by approximating distant interactions, reducing computational complexity so that cost is proportional to system size. In these methods, the computation required for each portion of the system differs depending on that portion’s contribution to the overall result. To support productive development, application programmers need programming models that cleanly map aspects of the physical system being simulated to the underlying computer architecture while also supporting the irregular workloads that arise from the fragmentation of a physical system.
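How locality is exploited by approximating distant interactions can be illustrated with a deliberately simplified one-dimensional Python sketch, which is not one of the thesis codes: charges are binned into cells, interactions with charges in the same or neighboring cells are summed exactly, and each farther cell is replaced by its total charge placed at the cell center. This single-level scheme only shows the near-field/far-field split; the fast multipole method, for example, reaches linear scaling by applying the split hierarchically with higher-order expansions. The function name, the unit Coulomb constant, and the `near_cells` parameter are assumptions, and positions are assumed distinct.

```python
def potentials(positions, charges, cell_size, near_cells=1):
    """Electrostatic potential (1/r, unit constants) at each 1D charge.

    Interactions within `near_cells` cells are summed exactly; each farther
    cell is approximated by its total charge located at the cell center.
    """
    cells = {}
    for i, x in enumerate(positions):
        cells.setdefault(int(x // cell_size), []).append(i)

    # Monopole summary per cell: (total charge, cell center).
    summary = {c: (sum(charges[j] for j in idx), (c + 0.5) * cell_size)
               for c, idx in cells.items()}

    phi = [0.0] * len(positions)
    for i, x in enumerate(positions):
        ci = int(x // cell_size)
        for c, idx in cells.items():
            if abs(c - ci) <= near_cells:
                # Near field: exact pairwise sum, skipping self-interaction.
                phi[i] += sum(charges[j] / abs(x - positions[j])
                              for j in idx if j != i)
            else:
                # Far field: one approximate term for the whole cell.
                q_cell, center = summary[c]
                phi[i] += q_cell / abs(x - center)
    return phi
```

For well-separated clusters the far-field error is small, and the number of far-field terms per charge depends on the number of cells rather than the number of charges in them.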
X10 is a new programming language for high-performance computing that uses the asynchronous partitioned global address space (APGAS) model, which combines explicit representation of locality with asynchronous task parallelism. This thesis argues that the X10 language is well suited to expressing the algorithmic properties of locality and irregular parallelism that are common to many methods for physical simulation.
The work reported in this thesis was part of a co-design effort involving researchers at IBM and ANU in which two significant computational chemistry codes were developed in X10, with the aim of improving the expressiveness and performance of the language. The first is a Hartree–Fock electronic structure code, implemented using the novel Resolution of the Coulomb Operator approach. The second evaluates electrostatic interactions between point charges, using either the smooth particle mesh Ewald method or the fast multipole method, with the latter used to simulate ion interactions in a Fourier Transform Ion Cyclotron Resonance mass spectrometer. We compare the performance of both X10 applications to state-of-the-art software packages written in other languages.
This thesis presents improvements to the X10 language and runtime libraries for managing and visualizing the data locality of parallel tasks, communication using active messages, and efficient implementation of distributed arrays. We evaluate these improvements in the context of computational chemistry application examples.
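A distributed array spreads the elements of a global index space across places. The Python sketch below models only the simplest case, a one-dimensional block distribution, to show the index arithmetic such a runtime must perform: mapping a global index to its owning place and local offset, and back. It does not reproduce X10's distributed array API; the class `BlockDistribution` and its methods are hypothetical.

```python
class BlockDistribution:
    """Hypothetical 1D block distribution of n elements over `places` places.

    The first n % places places own one extra element, so block sizes
    differ by at most one.
    """

    def __init__(self, n, places):
        if places <= 0:
            raise ValueError("need at least one place")
        self.n, self.places = n, places
        self.base, self.extra = divmod(n, places)

    def block_start(self, place):
        """Global index of the first element owned by `place`."""
        return place * self.base + min(place, self.extra)

    def block_size(self, place):
        return self.base + (1 if place < self.extra else 0)

    def owner(self, i):
        """Return (place, local_index) for global index i."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        big = self.extra * (self.base + 1)  # elements held by the larger blocks
        if i < big:
            place = i // (self.base + 1)
        else:
            place = self.extra + (i - big) // self.base
        return place, i - self.block_start(place)

    def global_index(self, place, local):
        """Inverse of owner(): local offset at `place` back to a global index."""
        return self.block_start(place) + local


d = BlockDistribution(10, 4)
print([d.block_size(p) for p in range(4)])  # [3, 3, 2, 2]
print(d.owner(7))                           # (2, 1)
```

Because ownership is computed arithmetically rather than looked up, any place can determine where an element lives without communication, which is the property that lets a locality-aware runtime send work to the data.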
This work demonstrates that X10 can achieve performance comparable to established programming languages when running on a single core. More importantly, X10 programs can achieve high parallel efficiency on a multithreaded architecture, given a divide-and-conquer pattern of parallel tasks and appropriate use of worker-local data. For distributed-memory architectures, X10 supports the use of active messages to construct local, asynchronous communication patterns that outperform global, synchronous patterns. Although point-to-point active messages may be implemented efficiently, productive application development also requires collective communications; more work is required to integrate both forms of communication in the X10 language. The exploitation of locality is the key insight in both linear-scaling methods and the APGAS programming model; their combination represents an attractive opportunity for future co-design efforts.
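The combination of a divide-and-conquer pattern of parallel tasks with worker-local data can be illustrated, in Python rather than X10, by the sketch below: a range is split recursively into leaf tasks, each worker thread accumulates into its own private accumulator instead of a shared, locked one, and the per-worker results are combined once at the end. Standard CPython threads will not reproduce X10's speedups, because the global interpreter lock serializes this pure-Python arithmetic; the point is the structure. The function names and the `grain` parameter are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def split(lo, hi, grain):
    """Divide and conquer: recursively halve [lo, hi) until pieces are <= grain."""
    if hi - lo <= grain:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return split(lo, mid, grain) + split(mid, hi, grain)


def parallel_sum_of_squares(n, workers=4, grain=1000):
    local = threading.local()         # one accumulator per worker thread
    partials = []                     # registry of every worker's accumulator
    registry_lock = threading.Lock()  # guards only the one-time registration

    def leaf(bounds):
        if not hasattr(local, "acc"):
            local.acc = [0]           # worker-local: updated without locking
            with registry_lock:
                partials.append(local.acc)
        lo, hi = bounds
        local.acc[0] += sum(i * i for i in range(lo, hi))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all leaves and re-raises any exception they raised.
        list(pool.map(leaf, split(0, n, grain)))
    return sum(acc[0] for acc in partials)  # final reduction over workers


print(parallel_sum_of_squares(10_000) == sum(i * i for i in range(10_000)))  # True
```

Here the recursion only generates the task decomposition and the leaves are handed to a fixed pool, which avoids pool threads blocking on their own subtasks; X10 would instead spawn nested `async` tasks directly.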
Pacific Northwest National Laboratory institutional plan FY 1997--2002
Pacific Northwest National Laboratory's core mission is to deliver environmental science and technology in the service of the nation and humanity. Basic research creates fundamental knowledge of natural, engineered, and social systems, which is the basis for both effective environmental technology and sound public policy. Legacy environmental problems are solved by delivering technologies that remedy existing environmental hazards, today's environmental needs are addressed with technologies that prevent pollution and minimize waste, and the technical foundation is being laid for tomorrow's inherently clean energy and industrial processes. Pacific Northwest National Laboratory also applies its capabilities to meet selected national security, energy, and human health needs; strengthen the US economy; and support the education of future scientists and engineers. Brief summaries are given of the various tasks being carried out under these broad categories.