The Cost of Address Translation
Modern computers are not random access machines (RAMs). They have a memory
hierarchy, multiple cores, and virtual memory. In this paper, we address the
computational cost of address translation in virtual memory. The starting point
for our work is the observation that the analysis of some simple algorithms (random
scan of an array, binary search, heapsort) in either the RAM model or the EM
model (external memory model) does not correctly predict growth rates of actual
running times. We propose the VAT model (virtual address translation) to
account for the cost of address translations and analyze the algorithms
mentioned above and others in the model. The predictions agree with the
measurements. We also analyze the VAT-cost of cache-oblivious algorithms.
Comment: An extended abstract of this paper was published in the proceedings of
ALENEX13, New Orleans, US.
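The translation-cost effect the abstract describes can be illustrated with a toy simulation (the tree radix, depth, and cache policy below are hypothetical parameters, not the paper's model): each page access walks a translation tree, and one unit is charged for every level not served by a small LRU translation cache.

```python
import random
from collections import OrderedDict

BITS_PER_LEVEL = 5   # radix-32 translation tree (hypothetical parameter)
DEPTH = 3            # levels in the tree (hypothetical parameter)
CACHE_SIZE = 64      # entries in the LRU translation cache

def translation_cost(page, cache):
    """Walk the translation tree for `page`, charging one unit for every
    level whose node is missing from the translation cache."""
    cost = 0
    for level in range(DEPTH, 0, -1):                  # root level first
        node = (level, page >> (level * BITS_PER_LEVEL))
        if node in cache:
            cache.move_to_end(node)                    # LRU refresh
        else:
            cost += 1
            cache[node] = True
            if len(cache) > CACHE_SIZE:
                cache.popitem(last=False)              # evict oldest entry
    return cost

def scan_cost(pages):
    cache = OrderedDict()
    return sum(translation_cost(p, cache) for p in pages)

random.seed(0)
n = 1 << 14
sequential = list(range(n))
shuffled = list(range(n))
random.shuffle(shuffled)

# A random scan touches many more distinct tree nodes between reuses, so
# the translation cache misses far more often than on a sequential scan.
print("sequential:", scan_cost(sequential))
print("random:    ", scan_cost(shuffled))
```

This mirrors the abstract's point that a RAM-style count of memory accesses predicts the same cost for both scans, while charging for translation separates them sharply.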
Cache-Oblivious VAT-Algorithms
The VAT-model (virtual address translation model) extends the EM-model
(external memory model) and takes the cost of address translation in virtual
memories into account. In this model, the cost of a single memory access may be
logarithmic in the largest address used. We show that the VAT-cost of
cache-oblivious algorithms is only a constant factor larger than their
EM-cost; this requires a somewhat more stringent tall-cache assumption than for
the EM-model.
Near-Memory Address Translation
Memory and logic integration on the same chip is becoming increasingly cost
effective, creating the opportunity to offload data-intensive functionality to
processing units placed inside memory chips. The introduction of memory-side
processing units (MPUs) into conventional systems faces virtual memory as the
first big showstopper: without efficient hardware support for address
translation, MPUs have severely limited applicability. Unfortunately, conventional
translation mechanisms fall short of providing fast translations as
contemporary memories exceed the reach of TLBs, making expensive page walks
common.
In this paper, we are the first to show that the historically important
flexibility to map any virtual page to any page frame is unnecessary in today's
servers. We find that limiting the associativity of the virtual-to-physical
mapping incurs no performance penalty, and that, combined with careful data
placement in the MPU's memory, it breaks the translate-then-fetch
serialization, allowing translation and data fetch to proceed
independently and in parallel. We propose the Distributed Inverted Page Table
(DIPTA), a near-memory structure in which the smallest memory partition keeps
the translation information for its data share, ensuring that the translation
completes together with the data fetch. DIPTA completely eliminates the
performance overhead of translation, achieving speedups of up to 3.81x and
2.13x over conventional translation using 4KB and 1GB pages, respectively.
Comment: 15 pages, 9 figures.
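The key idea can be sketched as follows (the partition count, page size, and table layout below are hypothetical, not the paper's hardware design): with a direct-mapped virtual-to-physical mapping, the partition holding a page follows directly from the virtual address, so the fetch can be sent to the right partition before translation finishes, and that partition's slice of the inverted page table resolves the frame alongside the data read.

```python
# Hypothetical parameters: 4 KB pages, 8 memory partitions, direct-mapped.
PAGE_SHIFT = 12
NUM_PARTITIONS = 8

def partition_of(vaddr):
    """With a direct-mapped mapping, the partition holding a page is a
    fixed function of the virtual page number alone."""
    vpn = vaddr >> PAGE_SHIFT
    return vpn % NUM_PARTITIONS

def access(vaddr, dipta):
    part = partition_of(vaddr)           # known immediately from the VA
    # Fetch and translation both target the same partition; its slice of
    # the distributed inverted page table resolves the physical frame
    # while the data is being read.
    frame = dipta[part].get(vaddr >> PAGE_SHIFT)
    return part, frame

# Toy DIPTA: each partition keeps translations only for its own pages.
dipta = [dict() for _ in range(NUM_PARTITIONS)]
for vpn in range(32):
    dipta[vpn % NUM_PARTITIONS][vpn] = 1000 + vpn   # hypothetical frames

part, frame = access(5 << PAGE_SHIFT, dipta)
print(part, frame)   # 5 1005
```

The point of the restricted associativity is visible in `partition_of`: no page-table lookup stands between the virtual address and the choice of partition.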
Optimality of the genetic code with respect to protein stability and amino acid frequencies
How robust is the natural genetic code with respect to mistranslation errors?
It has long been known that the genetic code is very efficient in limiting the
effect of point mutations. A misread codon will commonly code either for the
same amino acid or for a similar one in terms of its biochemical properties, so
the structure and function of the coded protein remain relatively unaltered.
Previous studies have attempted to address this question more quantitatively,
namely by statistically estimating the fraction of randomly generated codes
that do better than the genetic code in terms of overall robustness. In this
paper, we extend these results by investigating the role of amino acid
frequencies in the optimality of the genetic code. When measuring the relative
fitness of the natural code with respect to a random code, it is indeed natural
to assume that a translation error affecting a frequent amino acid is less
favorable than one affecting a rare amino acid, at equal mutation cost. We find that taking
the amino acid frequency into account accordingly decreases the fraction of
random codes that beat the natural code, making the latter comparatively even
more robust. This effect is particularly pronounced when measures of the
amino acid substitution cost more refined than hydrophobicity are used. To show this,
we devise a new cost function by evaluating with computer experiments the
change in folding free energy caused by all possible single-site mutations in a
set of known protein structures. With this cost function, we estimate that on
the order of one random code out of 100 million is fitter than the natural
code when amino acid frequencies are taken into account. The genetic code seems
therefore structured so as to minimize the consequences of translation errors
on the 3D structure and stability of proteins.
Comment: 31 pages, 2 figures, postscript file
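A minimal Monte Carlo sketch of this kind of estimate, using a toy 16-codon code with made-up amino acid properties and frequencies (all values below are hypothetical, not the paper's data): random codes are scored by a frequency-weighted substitution cost over all single-base misreadings and compared against a block-structured reference code.

```python
import random

BASES = "ACGU"
CODONS = [a + b for a in BASES for b in BASES]      # toy 16-codon code

# Hypothetical hydrophobicity-like property and usage frequency for four
# toy amino acids (made-up values, not the paper's data).
PROPERTY = {"W": 0.1, "X": 0.4, "Y": 0.7, "Z": 0.9}
FREQ     = {"W": 0.40, "X": 0.30, "Y": 0.20, "Z": 0.10}

def neighbors(codon):
    """All codons reachable by one single-base misreading."""
    for i, base in enumerate(codon):
        for b in BASES:
            if b != base:
                yield codon[:i] + b + codon[i + 1:]

def code_cost(code):
    """Frequency-weighted squared property change summed over all
    single-base misreadings: errors on frequent amino acids weigh more."""
    return sum(FREQ[code[c]] * (PROPERTY[code[c]] - PROPERTY[code[n]]) ** 2
               for c in CODONS for n in neighbors(c))

def random_code():
    aas = list(PROPERTY) * 4              # four codons per toy amino acid
    random.shuffle(aas)
    return dict(zip(CODONS, aas))

# Reference code: the first base alone determines the amino acid, so
# second-base misreadings are silent -- a crude stand-in for the natural
# code's block structure.
reference = {a + b: aa for a, aa in zip(BASES, PROPERTY) for b in BASES}
ref_cost = code_cost(reference)

random.seed(1)
trials = 5000
better = sum(code_cost(random_code()) < ref_cost for _ in range(trials))
print("fraction of random codes beating the reference:", better / trials)
```

The frequency weighting in `code_cost` is the abstract's refinement: the same property change costs more when it hits a frequently used amino acid.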
Light field super resolution through controlled micro-shifts of light field sensor
Light field cameras enable new capabilities, such as post-capture refocusing
and aperture control, through capturing directional and spatial distribution of
light rays in space. Micro-lens array based light field camera design is often
preferred due to its light transmission efficiency, cost-effectiveness and
compactness. One drawback of the micro-lens array based light field cameras is
low spatial resolution, because a single sensor is shared to capture both
spatial and angular information. To address the low spatial
resolution issue, we present a light field imaging approach, where multiple
light fields are captured and fused to improve the spatial resolution. For each
capture, the light field sensor is shifted by a pre-determined fraction of a
micro-lens size using an XY translation stage for optimal performance.
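The fusion step can be sketched as follows (the toy scene and parameters are hypothetical): k × k captures, each shifted by 1/k of a pixel, are interleaved onto a grid k times finer in each dimension.

```python
def capture(scene, h, w, k, dy, dx):
    """Sample an h x w low-resolution image from `scene`, shifted by
    (dy, dx) sub-pixel steps, each step being 1/k of a low-res pixel."""
    return [[scene(y * k + dy, x * k + dx) for x in range(w)]
            for y in range(h)]

def fuse(captures, h, w, k):
    """Interleave k*k shifted captures into one (h*k) x (w*k) image."""
    hi = [[0] * (w * k) for _ in range(h * k)]
    for (dy, dx), img in captures.items():
        for y in range(h):
            for x in range(w):
                hi[y * k + dy][x * k + dx] = img[y][x]
    return hi

scene = lambda y, x: 10 * y + x        # toy scene: value encodes position
h = w = 2
k = 2                                  # 2x2 shifts -> 2x super-resolution
captures = {(dy, dx): capture(scene, h, w, k, dy, dx)
            for dy in range(k) for dx in range(k)}
hi = fuse(captures, h, w, k)
print(hi)
```

With an ideal point-sampling scene like this toy one, the interleaved result reproduces the fine grid exactly; real sensors integrate over a pixel area, so practical fusion also needs deblurring or interpolation.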
Dynamic partitioned global address spaces for high-efficiency computing
The current trend of ever larger clusters and data centers has coincided with a dramatic increase in the cost and
power of these installations. While many efficiency improvements have focused on processor power and cooling costs,
reducing the cost and power consumption of high-performance memory has mostly been overlooked. This thesis proposes
a new address translation model called Dynamic Partitioned Global Address Space (DPGAS) that extends the ideas of
NUMA and software-based approaches to create a high-performance hardware model that can be used to reduce the
overall cost and power of memory in larger server installations. A memory model and hardware implementation of
DPGAS are developed, and simulations of memory-intensive workloads are used to show potential cost and power
reductions when DPGAS is integrated into a server environment.
M.S. Committee Chair: Yalamanchili, Sudhakar; Committee Member: Riley, George; Committee Member: Schimmel, Davi
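A minimal sketch of the address split such a model implies (the field widths and node names below are hypothetical): the upper bits of a global address select a partition, a small table maps partitions to nodes, and updating the table re-homes memory without changing application addresses.

```python
# Hypothetical field widths for a DPGAS-style global address.
OFFSET_BITS = 32                 # per-partition local address space
PART_BITS = 8                    # up to 256 partitions

def split(gaddr):
    """Split a global address into (partition index, local offset)."""
    part = (gaddr >> OFFSET_BITS) & ((1 << PART_BITS) - 1)
    offset = gaddr & ((1 << OFFSET_BITS) - 1)
    return part, offset

# Dynamic partition-to-node mapping (hypothetical node names).
partition_to_node = {0: "node0", 1: "node0", 2: "node1"}

def route(gaddr):
    """Resolve a global address to the node that currently owns it."""
    part, offset = split(gaddr)
    return partition_to_node[part], offset

print(route((2 << OFFSET_BITS) | 0x1234))   # ('node1', 4660)

# A node running low on memory can borrow a partition from a remote node
# by updating the table, without touching application addresses:
partition_to_node[1] = "node1"
print(route((1 << OFFSET_BITS) | 0x10))     # ('node1', 16)
```

The remap at the end is the "dynamic" part of the model: ownership moves by rewriting one table entry rather than by changing any global address.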