Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels
Achieving optimal program performance requires deep insight into the
interaction between hardware and software. For software developers without an
in-depth background in computer architecture, understanding and fully utilizing
modern architectures is close to impossible. Analytic loop performance modeling
is a useful way to understand the relevant bottlenecks of code execution based
on simple machine models. The Roofline Model and the Execution-Cache-Memory
(ECM) model are proven approaches to performance modeling of loop nests. In
comparison to the Roofline model, the ECM model can also describe the
single-core performance and saturation behavior on a multicore chip. We give an
introduction to the Roofline and ECM models, and to stencil performance
modeling using layer conditions (LC). We then present Kerncraft, a tool that
can automatically construct Roofline and ECM models for loop nests by
performing the required code, data transfer, and LC analysis. The layer
condition analysis makes it possible to predict optimal spatial blocking factors for loop
nests. Together with the models it enables an ab-initio estimate of the
potential benefits of loop blocking optimizations and of useful block sizes. In
cases where LC analysis is not easily possible, Kerncraft supports a cache
simulator as a fallback option. Using a 25-point long-range stencil we
demonstrate the usefulness and predictive power of the Kerncraft tool.
Comment: 22 pages, 5 figures
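To make the Roofline bound concrete: performance is limited by the minimum of peak compute and the product of arithmetic intensity and memory bandwidth. The minimal Python sketch below uses hypothetical machine parameters and per-iteration flop/byte counts; they are illustrative values, not figures from the paper.

```python
# Minimal Roofline estimate for a loop kernel: performance is bounded by
# min(peak compute, arithmetic intensity * memory bandwidth).
# All machine parameters and kernel counts below are assumed, illustrative values.

def roofline_gflops(flops_per_it: float, bytes_per_it: float,
                    peak_gflops: float, mem_bw_gbs: float) -> float:
    """Return the Roofline performance bound in GFLOP/s."""
    intensity = flops_per_it / bytes_per_it      # FLOP per byte of traffic
    return min(peak_gflops, intensity * mem_bw_gbs)

# Hypothetical long-range stencil: assume ~50 flops and 24 bytes of memory
# traffic per update; actual traffic depends on cache reuse (layer conditions).
print(roofline_gflops(flops_per_it=50.0, bytes_per_it=24.0,
                      peak_gflops=48.0, mem_bw_gbs=20.0))   # memory-bound: ~41.7
```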
Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential
Emerging computer architectures will feature drastically decreased bytes/flop
(the ratio of memory bandwidth to peak processing rate), as highlighted by recent
studies on Exascale architectural trends. Further, flops are getting cheaper
while the energy cost of data movement is increasingly dominant. The
understanding and characterization of data locality properties of computations
is critical in order to guide efforts to enhance data locality. Reuse distance
analysis of memory address traces is a valuable tool to perform data locality
characterization of programs. A single reuse distance analysis can be used to
estimate the number of cache misses in a fully associative LRU cache of any
size, thereby providing estimates on the minimum bandwidth requirements at
different levels of the memory hierarchy to avoid being bandwidth bound.
However, such an analysis only holds for the particular execution order that
produced the trace. It cannot estimate potential improvement in data locality
through dependence preserving transformations that change the execution
schedule of the operations in the computation. In this article, we develop a
novel dynamic analysis approach to characterize the inherent locality
properties of a computation and thereby assess the potential for data locality
enhancement via dependence preserving transformations. The execution trace of a
code is analyzed to extract a computational directed acyclic graph (CDAG) of
the data dependences. The CDAG is then partitioned into convex subsets, and the
convex partitioning is used to reorder the operations in the execution trace to
enhance data locality. The approach enables us to go beyond reuse distance
analysis, which characterizes data locality only for a single specific
execution order of a computation's operations. It can play a
valuable role in identifying promising code regions for manual transformation,
as well as assessing the effectiveness of compiler transformations for data
locality enhancement. We demonstrate the effectiveness of the approach using a
number of benchmarks, including case studies where the potential shown by the
analysis is exploited to achieve lower data movement costs and better
performance.
Comment: ACM Transactions on Architecture and Code Optimization (2014)
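To illustrate the starting point the article builds on, here is a naive Python sketch of reuse (LRU stack) distance computation over an address trace; the quadratic list-based implementation is for clarity only, and production tools use tree-based algorithms.

```python
# Naive reuse (LRU stack) distance computation over a memory address trace.
# A reuse distance d means the access hits in any fully associative LRU cache
# holding more than d lines; real tools use balanced trees for efficiency.

def reuse_distances(trace):
    stack = []                      # most recently used address at the end
    dists = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            dists.append(len(stack) - 1 - pos)   # distinct addrs since last use
            stack.pop(pos)
        else:
            dists.append(float("inf"))           # cold miss
        stack.append(addr)
    return dists

print(reuse_distances(["a", "b", "c", "a", "b", "b"]))
# -> [inf, inf, inf, 2, 2, 0]
```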
Workload-Aware Database Monitoring and Consolidation
In most enterprises, databases are deployed on dedicated database servers, and these servers are underutilized much of the time. For example, in traces from almost 200 production servers from different organizations, we see an average CPU utilization of less than 4%. This unused capacity can potentially be harnessed to consolidate multiple databases on fewer machines, reducing hardware and operational costs. Virtual machine (VM) technology is one popular way to approach this problem. However, as we demonstrate in this paper, VMs fail to adequately support database consolidation, because databases place a unique and challenging set of demands on hardware resources that is poorly matched to the assumptions made by VM-based consolidation.
Instead, our system for database consolidation, named Kairos, uses novel techniques to measure the hardware requirements of database workloads, as well as models to predict the combined resource utilization of those workloads. We formalize the consolidation problem as a non-linear optimization program, aiming to minimize the number of servers and balance load while achieving near-zero performance degradation. We compare Kairos against virtual machines, showing up to 12× higher throughput on a TPC-C-like benchmark. We also tested the effectiveness of our approach on real-world data collected from production servers at Wikia.com, Wikipedia, Second Life, and MIT CSAIL, showing absolute consolidation ratios ranging between 5.5:1 and 17:1.
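As a simplified illustration of the placement side of the problem (not the paper's actual formulation), the sketch below packs workloads onto servers greedily by estimated CPU demand, assuming demands combine additively; Kairos instead solves a non-linear optimization with measured models of combined resource utilization.

```python
# Simplified illustration of database consolidation as bin packing:
# first-fit-decreasing placement of workloads by estimated CPU demand.
# Assuming additive CPU demands is a deliberate simplification; Kairos
# uses measured models of combined resource utilization instead.

def consolidate(cpu_demands, server_capacity=1.0):
    servers = []                                  # per-server used capacity
    placement = {}
    for wl, demand in sorted(cpu_demands.items(),
                             key=lambda kv: kv[1], reverse=True):
        for i, used in enumerate(servers):
            if used + demand <= server_capacity:  # fits on an existing server
                servers[i] += demand
                placement[wl] = i
                break
        else:                                     # open a new server
            servers.append(demand)
            placement[wl] = len(servers) - 1
    return placement, len(servers)

# Ten workloads at ~4% CPU each pack onto a single server.
demo = {f"db{i}": 0.04 for i in range(10)}
print(consolidate(demo))
```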
High Performance Computing for DNA Sequence Alignment and Assembly
Recent advances in DNA sequencing technology have dramatically increased the scale and scope of DNA sequencing. These data are used for a wide variety of important biological analyses, including genome sequencing, comparative genomics, transcriptome analysis, and personalized medicine, but are complicated by the volume and complexity of the data involved. Given the massive size of these datasets, computational biology must draw on the advances of high performance computing.
Two fundamental computations in computational biology are read alignment and genome assembly. Read alignment maps short DNA sequences to a reference genome to discover conserved and polymorphic regions of the genome. Genome assembly computes the sequence of a genome from many short DNA sequences. Both computations benefit from recent advances in high performance computing to efficiently process the huge datasets involved, including using highly parallel graphics processing units (GPUs) as high performance desktop processors, and using the MapReduce framework coupled with cloud computing to parallelize computation across large compute grids. This dissertation demonstrates how these technologies can accelerate these computations by orders of magnitude and make otherwise infeasible computations practical.
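To give a flavor of the alignment computation, here is a minimal Python sketch of the seeding step used by many read aligners: index every k-mer of the reference, then look up a read's leading k-mer to obtain candidate mapping positions. The function names and the toy k = 4 are illustrative assumptions; real aligners extend seeds while tolerating mismatches, and the GPU and MapReduce approaches above parallelize exactly such work.

```python
# Sketch of the indexing step behind seed-and-extend read alignment:
# hash every k-mer of the reference, then look up a read's first k-mer
# to find candidate mapping positions for later extension.

from collections import defaultdict

def build_kmer_index(reference: str, k: int = 4):
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def candidate_positions(read: str, index, k: int = 4):
    return index.get(read[:k], [])

ref = "ACGTACGTTAGC"
idx = build_kmer_index(ref)
print(candidate_positions("ACGTTAGC", idx))
# -> [0, 4]; extending each seed would keep the true position 4
```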
Earth Observatory Satellite system definition study. Report no. 3: Design/cost tradeoff studies. Appendix D: EOS configuration design data. Part 2: Data management system configuration
The Earth Observatory Satellite (EOS) data management system (DMS) is discussed. The DMS is composed of several subsystems or system elements which have basic purposes and are connected together so that the DMS can support the EOS program by providing the following: (1) payload data acquisition and recording, (2) data processing and product generation, (3) spacecraft and processing management and control, and (4) data user services. The configuration and purposes of the primary or high-data-rate system and the secondary or local user system are explained. Diagrams of the systems are provided to support the systems analysis.
Finding Safety in Numbers with Secure Allegation Escrows
For fear of retribution, the victim of a crime may be willing to report it
only if other victims of the same perpetrator also step forward. Common
examples include 1) identifying oneself as the victim of sexual harassment,
especially by a person in a position of authority or 2) accusing an influential
politician, an authoritarian government, or one's own employer of corruption. To
handle such situations, legal literature has proposed the concept of an
allegation escrow: a neutral third-party that collects allegations anonymously,
matches them against each other, and de-anonymizes allegers only after
de-anonymity thresholds (in terms of number of co-allegers), pre-specified by
the allegers, are reached.
An allegation escrow can be realized as a single trusted third party;
however, this party must be trusted to keep the identity of the alleger and
content of the allegation private. To address this problem, this paper
introduces Secure Allegation Escrows (SAE, pronounced "say"). A SAE is a group
of parties with independent interests and motives, acting jointly as an escrow
for collecting allegations from individuals, matching the allegations, and
de-anonymizing the allegations when designated thresholds are reached. By
design, SAEs provide a very strong property: no subset smaller than a majority
of the parties constituting a SAE can de-anonymize or disclose the content of
an allegation without a sufficient number of matching allegations (even in
collusion with any number of other allegers). Once a sufficient number of
matching allegations exists, the escrow discloses the allegation along with the
allegers' identities.
We describe how SAEs can be constructed using a novel authentication protocol
and a novel allegation matching and bucketing algorithm, provide formal proofs
of the security of our constructions, and evaluate a prototype implementation,
demonstrating feasibility in practice.
Comment: To appear in NDSS 2020. New version includes improvements to writing
and proof. The protocol is unchanged.
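The threshold behavior at the heart of the design can be illustrated with generic (k, n) Shamir secret sharing: fewer than k shares reveal nothing about a value, while any k reconstruct it. The Python sketch below shows only this standard building block, not the paper's actual authentication or matching protocols.

```python
# Generic (k, n) Shamir threshold sharing over a prime field, illustrating
# the property SAEs rely on: fewer than k shares reveal nothing, any k
# reconstruct. This is NOT the SAE protocol itself, only the threshold idea.

import random

P = 2**127 - 1   # a Mersenne prime, large enough for short identifiers

def share(secret: int, k: int, n: int):
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):        # evaluate the degree-(k-1) polynomial at x, mod P
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):   # Lagrange interpolation at x = 0
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P
    return total

shares = share(secret=42, k=3, n=5)   # 5 escrow parties, threshold 3
print(reconstruct(shares[:3]))        # any 3 shares recover 42
```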
Insightful classification of crystal structures using deep learning
Computational methods that automatically extract knowledge from data are
critical for enabling data-driven materials science. A reliable identification
of lattice symmetry is a crucial first step for materials characterization and
analytics. Current methods require a user-specified threshold, and are unable
to detect average symmetries for defective structures. Here, we propose a
machine-learning-based approach to automatically classify structures by crystal
symmetry. First, we represent crystals by calculating a diffraction image, then
construct a deep-learning neural-network model for classification. Our approach
is able to correctly classify a dataset comprising more than 100,000 simulated
crystal structures, including heavily defective ones. The internal operations
of the neural network are unraveled through attentive response maps,
demonstrating that it uses the same landmarks a materials scientist would use,
although never explicitly instructed to do so. Our study paves the way for
crystal-structure recognition of possibly noisy and incomplete
three-dimensional structural data in big-data materials science.
Comment: Nature Communications, in press (2018)
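As a rough illustration of the second stage of such a pipeline, the following PyTorch sketch defines a small convolutional classifier over single-channel diffraction images; the image size, layer widths, and eight-class output are placeholder assumptions, not the architecture from the paper.

```python
# Minimal CNN for classifying (simulated) diffraction images by crystal
# symmetry class. Image size, layer widths, and the 8-class output are
# placeholders, not the architecture used in the paper.

import torch
import torch.nn as nn

class DiffractionNet(nn.Module):
    def __init__(self, n_classes: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = DiffractionNet()
logits = model(torch.randn(4, 1, 64, 64))   # batch of 4 fake images
print(logits.shape)                         # torch.Size([4, 8])
```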