RowHammer: Reliability Analysis and Security Implications
As process technology scales down to smaller dimensions, DRAM chips become
more vulnerable to disturbance, a phenomenon in which different DRAM cells
interfere with each other's operation. For the first time in academic
literature, our ISCA paper exposes the existence of disturbance errors in
commodity DRAM chips that are sold and used today. We show that repeatedly
reading from the same address could corrupt data in nearby addresses. More
specifically: When a DRAM row is opened (i.e., activated) and closed (i.e.,
precharged) repeatedly (i.e., hammered), it can induce disturbance errors in
adjacent DRAM rows. This failure mode is popularly called RowHammer. We tested
129 DRAM modules manufactured within the past six years (2008-2014) and found
110 of them to exhibit RowHammer disturbance errors, the earliest of which
dates back to 2010. In particular, all modules from the past two years
(2012-2013) were vulnerable, which implies that the errors are a recent
phenomenon affecting more advanced generations of process technology.
Importantly, disturbance errors pose an easily-exploitable security threat
since they are a breach of memory protection, wherein accesses to one page
(mapped to one row) modify the data stored in another page (mapped to an
adjacent row).
Comment: This is the summary of the paper titled "Flipping Bits in Memory
Without Accessing Them: An Experimental Study of DRAM Disturbance Errors"
which appeared in ISCA in June 2014
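The hammering effect the summary describes can be caricatured in a few lines of code. This is a toy software model, not real hardware: the row array, the per-row disturbance counters, and the 50,000-activation flip threshold are all invented for illustration (real thresholds vary by chip and module).

```python
FLIP_THRESHOLD = 50_000  # hypothetical activations-per-refresh needed to flip a bit

class ToyDram:
    def __init__(self, num_rows):
        self.rows = [0xFF] * num_rows   # one byte of data per row, all ones
        self.disturb = [0] * num_rows   # disturbance accumulated since last refresh

    def activate(self, r):
        """Opening/closing (hammering) row r disturbs its physical neighbors."""
        for victim in (r - 1, r + 1):
            if 0 <= victim < len(self.rows):
                self.disturb[victim] += 1
                if self.disturb[victim] > FLIP_THRESHOLD:
                    self.rows[victim] &= 0xFE   # charge leaked: lowest bit flips

    def refresh(self):
        """Refresh restores cell charge, resetting the disturbance counters."""
        self.disturb = [0] * len(self.rows)

dram = ToyDram(num_rows=8)
for _ in range(60_000):   # hammer row 3 with no intervening refresh
    dram.activate(3)

print(dram.rows)  # the hammered row's neighbors (rows 2 and 4) now hold corrupted data
```

Note that the hammered row itself stays intact in this model, matching the summary: the corruption appears in the *adjacent* rows, which is why the errors breach memory protection boundaries.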
Improving DRAM Performance by Parallelizing Refreshes with Accesses
Modern DRAM cells are periodically refreshed to prevent data loss due to
leakage. Commodity DDR DRAM refreshes cells at the rank level. This degrades
performance significantly because it prevents an entire rank from serving
memory requests while being refreshed. DRAM designed for mobile platforms,
LPDDR DRAM, supports an enhanced mode, called per-bank refresh, that refreshes
cells at the bank level. This enables a bank to be accessed while another in
the same rank is being refreshed, alleviating part of the negative performance
impact of refreshes. However, there are two shortcomings of per-bank refresh.
First, the per-bank refresh scheduling scheme does not exploit the full
potential of overlapping refreshes with accesses across banks because it
restricts the banks to be refreshed in a sequential round-robin order. Second,
accesses to a bank that is being refreshed have to wait.
To mitigate the negative performance impact of DRAM refresh, we propose two
complementary mechanisms, DARP (Dynamic Access Refresh Parallelization) and
SARP (Subarray Access Refresh Parallelization). The goal is to address the
drawbacks of per-bank refresh by building more efficient techniques to
parallelize refreshes and accesses within DRAM. First, instead of issuing
per-bank refreshes in a round-robin order, DARP issues per-bank refreshes to
idle banks in an out-of-order manner. Furthermore, DARP schedules refreshes
during intervals when a batch of writes is draining to DRAM. Second, SARP
exploits the existence of mostly-independent subarrays within a bank. With
minor modifications to DRAM organization, it allows a bank to serve memory
accesses to an idle subarray while another subarray is being refreshed.
Extensive evaluations show that our mechanisms improve system performance and
energy efficiency compared to state-of-the-art refresh policies and the benefit
increases as DRAM density increases.
Comment: The original paper published in the International Symposium on
High-Performance Computer Architecture (HPCA) contains an error. The arXiv
version has an erratum that describes the error and the fix for it
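The difference between the baseline round-robin order and DARP's out-of-order refresh pick can be sketched as a one-function scheduling choice. Everything here (the `pending`/`busy` sets and the two-pick comparison) is a simplified assumption for illustration, not the paper's actual scheduler, and SARP's subarray-level parallelism is not modeled at all.

```python
def round_robin_pick(pending, busy):
    """Baseline per-bank refresh: banks are refreshed in a fixed sequential
    order, even if the next bank in line is busy serving memory requests."""
    return pending[0]

def darp_pick(pending, busy):
    """DARP-style out-of-order pick (simplified): prefer an idle bank so that
    busy banks keep serving demand requests; fall back to the head of the
    queue only if every pending bank is busy."""
    for bank in pending:
        if bank not in busy:
            return bank
    return pending[0]

pending = [0, 1, 2, 3]   # banks that still owe a refresh this interval
busy = {0, 1}            # banks currently serving demand requests

print(round_robin_pick(pending, busy))  # 0 -> refresh stalls bank 0's requests
print(darp_pick(pending, busy))         # 2 -> refresh overlaps with accesses
```

The point of the contrast is the one in the abstract: round-robin order forces a refresh onto a bank that has waiting accesses, while an out-of-order pick hides the refresh behind activity on other banks.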
Impact of Dietary Fat Source on Beef Display Life
This study was conducted to evaluate the effects of dietary fat source with modified distillers grains plus solubles (MDGS) on beef display life. Steers were fed either a corn control, full-fat MDGS, de-oiled MDGS, or de-oiled MDGS plus corn oil diet. Strip loins were aged for 2, 9, 16 and 23 days and placed under retail conditions for 7 days. Results suggest that feeding MDGS to cattle increases the polyunsaturated fatty acid content of beef and has the potential to reduce beef color and lipid stability in comparison to corn diets. These data indicate that feeding MDGS to cattle may decrease beef display life. Addition of corn oil to de-oiled MDGS decreased redness and increased discoloration and lipid oxidation in comparison to corn control diets.
FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption
Fully Homomorphic Encryption (FHE) is a technique that allows arbitrary
computations to be performed on encrypted data without the need for decryption,
making it ideal for securing many emerging applications. However, FHE
computation is significantly slower than computation on plain data due to the
increase in data size after encryption. Processing In-Memory (PIM) is a
promising technology that can accelerate data-intensive workloads with
extensive parallelism. However, FHE is challenging for PIM acceleration due to
the long-bitwidth multiplications and complex data movements involved. We
propose a PIM-based FHE accelerator, FHEmem, which exploits a novel processing
in-memory architecture to achieve high-throughput and efficient acceleration
for FHE. We propose an optimized end-to-end processing flow, from low-level
hardware processing to high-level application mapping, that fully exploits the
high throughput of FHEmem hardware. Our evaluation shows FHEmem achieves
significant speedup and efficiency improvement over state-of-the-art FHE
accelerators.
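The core idea the abstract builds on, computing on ciphertexts without ever decrypting them, can be illustrated with a toy additively homomorphic scheme. This is emphatically not FHE (it supports a single addition under one-time keys and has none of FHE's generality, long-bitwidth arithmetic, or ciphertext expansion), and it has nothing to do with FHEmem's hardware; it only demonstrates the homomorphism property that such accelerators speed up at scale.

```python
import random

# Toy additively homomorphic "encryption": c = (m + key) mod N.
# A one-time additive pad, used here only to show that ciphertexts
# can be combined without revealing the plaintexts.
N = 2**32

def encrypt(m, key):
    return (m + key) % N

def decrypt(c, key):
    return (c - key) % N

k1, k2 = random.randrange(N), random.randrange(N)
c1, c2 = encrypt(7, k1), encrypt(35, k2)

# An untrusted server adds the ciphertexts; it learns nothing about 7 or 35.
c_sum = (c1 + c2) % N

# Only the key holder can decrypt, using the combined key.
print(decrypt(c_sum, (k1 + k2) % N))  # 42
```

Real FHE schemes replace this pad with lattice-based ciphertexts whose size and multiplication cost are exactly the "increase in data size" and "long-bitwidth multiplications" the abstract identifies as the PIM acceleration challenge.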
The regulation of equatorial Pacific new production and pCO2 by silicate-limited diatoms
Modeling and data from the JGOFS EqPac program suggested that the eastern equatorial Pacific upwelling ecosystem includes a quasi-chemostat culture system dominated by diatoms and limited by Si(OH)4 due to a low ratio of Si(OH)4 to NO3 in the upwelling source water, the Equatorial Undercurrent. Diatoms were hypothesized to be the major users of NO3 in this system, with the amount assimilated limited by the low amount of Si(OH)4 available. As a consequence, NO3 is left in the surface waters along with unused CO2. Two cruises to the eastern equatorial Pacific (EB04 and EB05) were made to test the existing hypothesis of Si(OH)4 limitation, and to study the roles of source concentrations of Si(OH)4 and Fe, and nutrient uptake kinetics, for comparison with model predictions. Fractionated nitrogen uptake measurements showed that diatoms at times take up the major portion of the NO3. Picoplankton and some phytoplankton in the >5-μm size group carried out primarily regenerated production, i.e. NH4 uptake in a grazing-dominated system. Equatorial diatoms followed uptake kinetics for Si(OH)4 and NO3 as observed in laboratory investigations of diatoms under Si(OH)4 and Fe limitation. Si(OH)4 uptake responded to additions of Si(OH)4 on a time scale of hours in uptake kinetic experiments, while NO3 uptake was unaffected by added NO3. The uptake of Si(OH)4 varied in a narrow range on a Michaelis-Menten hyperbola of Si(OH)4 uptake vs. Si(OH)4 concentration, with a maximal Si(OH)4 uptake rate, VmaxSi, set to a relatively low value by some factor(s) other than Fe on a longer time scale, i.e., days in shipboard enclosures. Simply enclosing water collected from the mid euphotic zone and incubating for some days on deck at 50% surface irradiance increased
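The Michaelis-Menten hyperbola referred to above has the standard closed form V = Vmax * S / (Ks + S). A minimal sketch with made-up parameter values (the cruises' actual VmaxSi and half-saturation constants are not given here):

```python
def uptake_rate(S, v_max, K_s):
    """Michaelis-Menten nutrient uptake: V = Vmax * S / (Ks + S).
    S: substrate concentration (e.g. Si(OH)4); v_max: maximal uptake rate;
    K_s: half-saturation constant (S at which uptake is Vmax/2)."""
    return v_max * S / (K_s + S)

# Illustrative (invented) parameters, arbitrary units:
v_max, K_s = 1.0, 2.0

print(uptake_rate(2.0, v_max, K_s))    # at S = Ks, uptake is exactly Vmax/2 -> 0.5
print(uptake_rate(200.0, v_max, K_s))  # at high S, uptake saturates toward Vmax
```

The abstract's observation that uptake "varied in a narrow range" on this hyperbola, with a low VmaxSi, corresponds to the system sitting on the curve with the ceiling Vmax itself held down by some factor other than Fe.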
ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data
Background: Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results.
Results: Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and better at discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy.
Conclusion: ReQON is an open-source software package, written in R and available through Bioconductor, for recalibrating base quality scores from next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration.
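The general approach described here, regressing observed errors on reported quality and mapping predicted error probabilities back to the Phred scale (Q = -10*log10(p)), can be sketched in miniature. This is a simplified stand-in, not ReQON's actual R implementation: the one-feature logistic fit, the synthetic training data, and the gradient-descent settings are all assumptions for illustration.

```python
import math

def fit_logistic(xs, ys, lr=0.01, steps=20000):
    """Fit error-probability = sigmoid(w*quality + b) by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += (p - y)
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def recalibrate(q, w, b):
    """Map a reported quality score to a recalibrated Phred score."""
    p_err = 1.0 / (1.0 + math.exp(-(w * q + b)))  # predicted error probability
    return -10.0 * math.log10(max(p_err, 1e-10))  # Phred: Q = -10*log10(p)

# Synthetic training data: reported quality, and whether the base was an error.
quals  = [10, 10, 10, 10, 30, 30, 30, 30]
errors = [1, 1, 0, 0, 0, 0, 0, 1]   # Q10 bases err ~50%, Q30 bases ~25% here

w, b = fit_logistic(quals, errors)
print(recalibrate(10, w, b), recalibrate(30, w, b))
```

With this synthetic data both reported scores are optimistic (Q10 implies a 10% error rate but half these bases are errors), so the recalibrated scores come out lower than reported, which is exactly the kind of correction the abstract describes.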
Parallel depth first vs. work stealing schedulers on CMP architectures
In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in which concurrently scheduled threads share a largely overlapping working set. In this brief announcement, we highlight our ongoing study [4] comparing the performance of two schedulers designed for fine-grained multithreaded programs: Parallel Depth First (PDF) [2], which is designed for constructive sharing, and Work Stealing (WS) [3], which takes a more traditional approach.
Overview of schedulers. In PDF, processing cores are allocated ready-to-execute program tasks such that higher scheduling priority is given to those tasks the sequential program would have executed earlier. As a result, PDF tends to co-schedule threads in a way that tracks the sequential execution. Hence, the aggregate working set is (provably) not much larger than the single-thread working set [1]. In WS, each processing core maintains a local work queue of ready-to-execute threads. Whenever its local queue is empty, the core steals a thread from the bottom of the first non-empty queue it finds. WS is an attractive scheduling policy because when there is plenty of parallelism, stealing is quite rare. However, WS is not designed for constructive cache sharing, because the cores tend to have disjoint working sets.
CMP configurations studied. We evaluated the performance of PDF and WS across a range of simulated CMP configurations. We focused on designs that have fixed-size private L1 caches and a shared L2 cache on chip. For a fixed die size (240 mm^2), we varied the number of cores from 1 to 32. For a given number of cores, we used a (default) configuration based on current CMPs and realistic projections of future CMPs, as process technologies decrease from 90 nm to 32 nm.
Summary of findings. We studied a variety of benchmark programs to show the following findings. For several application classes, PDF enables significant constructive sharing between threads, leading to better utilization of the on-chip caches and reducing off-chip traffic compared to WS. In particular, bandwidth-limited irregular programs and parallel divide-and-conquer programs show a relative speedup of 1.3-1.6X over WS, with a 13-41% reduction in off-chip traffic. An example is shown in Figure 1, for parallel merge sort. For each schedule, the number of L2 misses (i.e., the off-chip traffic) is shown on the left and the speed-up over running on one core is shown on the right, for 1 to 32 cores. Note that reducing the off-chip traffic has the additional benefit of reducing the power consumption. Moreover, PDF's smaller working sets provide opportunities to power down segments of the cache without increasing the running time. Furthermore, when multiple programs are active concurrently, the PDF version is also less of a cache hog and its smaller working set is more likely to remain in the cache across context switches. For several other application classes, PDF and WS have roughly the same execution times, either because there is only limited data reuse that can be exploited or because the programs are not limited by off-chip bandwidth. In the latter case, the constructive sharing PDF enables still provides the power and multiprogramming benefits discussed above. Finally, most parallel benchmarks to date, written for SMPs, use such coarse-grained threading that they cannot exploit the constructive cache behavior inherent in PDF. We find that mechanisms for fine-grained multithreading of applications are crucial to achieving good performance on CMPs.
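The WS policy described above (local queues of ready threads, with idle cores stealing from the bottom of the first non-empty queue) can be sketched as follows. The `Core` class and the two-core example are invented for illustration; the study itself ran real schedulers inside a simulator.

```python
from collections import deque

class Core:
    """Toy model of one core's local work queue under Work Stealing (WS)."""
    def __init__(self):
        self.queue = deque()

    def push(self, task):
        self.queue.append(task)       # new work goes on the local (top) end

    def next_task(self, all_cores):
        if self.queue:
            return self.queue.pop()   # run own newest task first (good locality)
        for other in all_cores:       # local queue empty: go stealing...
            if other is not self and other.queue:
                return other.queue.popleft()  # ...from the bottom (oldest task)
        return None                   # nothing to run or steal

cores = [Core(), Core()]
for t in ["a", "b", "c"]:
    cores[0].push(t)

print(cores[0].next_task(cores))  # 'c': core 0 runs its own newest task
print(cores[1].next_task(cores))  # 'a': core 1 steals core 0's oldest task
```

The sketch makes the working-set contrast with PDF visible: the victim keeps its recently pushed (cache-warm) tasks while the thief takes the oldest one, so the two cores naturally end up working on disjoint parts of the computation.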
Patterns of woodboring beetle activity following fires and bark beetle outbreaks in montane forests of California, USA
Increasingly frequent and severe drought in the western United States has contributed to more frequent and severe wildfires, longer fire seasons, and more frequent bark beetle outbreaks that kill large numbers of trees. Climate change is expected to perpetuate these trends, especially in montane ecosystems, calling for improved strategies for managing Western forests and conserving the wildlife that they support. Woodboring beetles (e.g., Buprestidae and Cerambycidae) colonize dead and weakened trees and speed succession of habitats altered by fire or bark beetles, while serving as prey for some early-seral habitat specialists, including several woodpecker species. To understand how these ecologically important beetles respond to different sources of tree mortality, we sampled woodborers in 16 sites affected by wildfire or bark beetle outbreak in the previous one to eight years. Study sites were located in the Sierra Nevada, Modoc Plateau, Warner Mountains, and southern Cascades of California, USA. We used generalized linear mixed models to evaluate hypotheses concerning the response of woodboring beetles to disturbance type, severity, and timing; forest stand composition and structure; and tree characteristics.