101 research outputs found

    RowHammer: Reliability Analysis and Security Implications

    Full text link
    As process technology scales down to smaller dimensions, DRAM chips become more vulnerable to disturbance, a phenomenon in which different DRAM cells interfere with each other's operation. For the first time in academic literature, our ISCA paper exposes the existence of disturbance errors in commodity DRAM chips that are sold and used today. We show that repeatedly reading from the same address could corrupt data in nearby addresses. More specifically: When a DRAM row is opened (i.e., activated) and closed (i.e., precharged) repeatedly (i.e., hammered), it can induce disturbance errors in adjacent DRAM rows. This failure mode is popularly called RowHammer. We tested 129 DRAM modules manufactured within the past six years (2008-2014) and found 110 of them to exhibit RowHammer disturbance errors, the earliest of which dates back to 2010. In particular, all modules from the past two years (2012-2013) were vulnerable, which implies that the errors are a recent phenomenon affecting more advanced generations of process technology. Importantly, disturbance errors pose an easily-exploitable security threat since they are a breach of memory protection, wherein accesses to one page (mapped to one row) modifies the data stored in another page (mapped to an adjacent row).Comment: This is the summary of the paper titled "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors" which appeared in ISCA in June 201

    Improving DRAM Performance by Parallelizing Refreshes with Accesses

    Full text link
    Modern DRAM cells are periodically refreshed to prevent data loss due to leakage. Commodity DDR DRAM refreshes cells at the rank level. This degrades performance significantly because it prevents an entire rank from serving memory requests while being refreshed. DRAM designed for mobile platforms, LPDDR DRAM, supports an enhanced mode, called per-bank refresh, that refreshes cells at the bank level. This enables a bank to be accessed while another in the same rank is being refreshed, alleviating part of the negative performance impact of refreshes. However, there are two shortcomings of per-bank refresh. First, the per-bank refresh scheduling scheme does not exploit the full potential of overlapping refreshes with accesses across banks because it restricts the banks to be refreshed in a sequential round-robin order. Second, accesses to a bank that is being refreshed have to wait. To mitigate the negative performance impact of DRAM refresh, we propose two complementary mechanisms, DARP (Dynamic Access Refresh Parallelization) and SARP (Subarray Access Refresh Parallelization). The goal is to address the drawbacks of per-bank refresh by building more efficient techniques to parallelize refreshes and accesses within DRAM. First, instead of issuing per-bank refreshes in a round-robin order, DARP issues per-bank refreshes to idle banks in an out-of-order manner. Furthermore, DARP schedules refreshes during intervals when a batch of writes are draining to DRAM. Second, SARP exploits the existence of mostly-independent subarrays within a bank. With minor modifications to DRAM organization, it allows a bank to serve memory accesses to an idle subarray while another subarray is being refreshed. Extensive evaluations show that our mechanisms improve system performance and energy efficiency compared to state-of-the-art refresh policies and the benefit increases as DRAM density increases.Comment: The original paper published in the International Symposium on High-Performance Computer Architecture (HPCA) contains an error. The arxiv version has an erratum that describes the error and the fix for i

    Impact of Dietary Fat Source on Beef Display Life

    Get PDF
    This study was conducted to evaluate the effects of dietary fat source with modified distillers grains plus solubles (MDGS) on beef display life. Steers were fed either a corn control, full-fat MDGS, de-oiled MDGS, or de-oiled MDGS plus corn oil diet. Strip loins were aged for 2, 9, 16 and 23 days and placed under retail conditions for 7 days. Results suggest that feeding MDGS to cattle increases polyunsaturated fatty acid content of beef and has the potential to reduce beef color and lipid stability in comparison to corn diets. These data indicate that feeding MDGS to cattle may decrease beef display life. Addition of corn oil to de-oiled MDGS decreased redness and increased discoloration and lipid oxidation in comparison to corn control diets

    FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption

    Full text link
    Fully Homomorphic Encryption (FHE) is a technique that allows arbitrary computations to be performed on encrypted data without the need for decryption, making it ideal for securing many emerging applications. However, FHE computation is significantly slower than computation on plain data due to the increase in data size after encryption. Processing In-Memory (PIM) is a promising technology that can accelerate data-intensive workloads with extensive parallelism. However, FHE is challenging for PIM acceleration due to the long-bitwidth multiplications and complex data movements involved. We propose a PIM-based FHE accelerator, FHEmem, which exploits a novel processing in-memory architecture to achieve high-throughput and efficient acceleration for FHE. We propose an optimized end-to-end processing flow, from low-level hardware processing to high-level application mapping, that fully exploits the high throughput of FHEmem hardware. Our evaluation shows FHEmem achieves significant speedup and efficiency improvement over state-of-the-art FHE accelerators

    The regulation of equatorial Pacific new production and pCO 2 by silicate-limited diatoms

    Get PDF
    a b s t r a c t Modeling and data from the JGOFS EqPac program suggested that the eastern equatorial Pacific upwelling ecosystem includes a quasi-chemostat culture system dominated by diatoms and limited by Si(OH) 4 due to a low ratio of Si(OH) 4 to NO 3 in the upwelling source water, the Equatorial Undercurrent. Diatoms were hypothesized to be the major users of NO 3 in this system and the amount assimilated limited by the low amount of Si(OH) 4 available. As a consequence NO 3 is left in the surface waters along with unused CO 2 . Two cruises to the eastern equatorial Pacific (EB04 and EB05) were made to test the existing hypothesis of Si(OH) 4 limitation, and study the roles of source concentrations of Si(OH) 4 and Fe, and nutrient uptake kinetics for comparison with model predictions. Fractionated nitrogen uptake measurements showed that diatoms at times take up the major portion of the NO 3 . Picoplankton and some phytoplankton in the 4 5-mm size group carried out primarily regenerated production, i.e. NH 4 uptake in a grazing dominated system. Equatorial diatoms followed uptake kinetics for Si(OH) 4 and NO 3 uptake as observed in laboratory investigations of diatoms under Si(OH) 4 and Fe limitations. Si(OH) 4 uptake responded to additions of Si(OH) 4 on a time scale of hours in uptake kinetic experiments while NO 3 uptake was unaffected by added NO 3 . The uptake of Si(OH) 4 varied in a narrow range on a Michaelis-Menten hyperbola of Si(OH) 4 uptake vs. Si(OH) 4 concentration, with a maximal Si(OH) 4 uptake rate, V 0 maxSi set to a relatively low value by some factor(s) other than Fe on a longer time scale, i.e., days in shipboard enclosures. Simply enclosing water collected from the mid euphotic zone and incubating for some days on deck at 50% surface irradiance increased

    ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data

    Get PDF
    AbstractBackgroundNext-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results.ResultsHere we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and do a better job of discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy.ConclusionReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration

    Parallel depth first vs. work stealing schedulers on CMP architectures

    Get PDF
    In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in which concurrently scheduled threads share a largely overlapping working set. In this brief announcement, we highlight our ongoing study [4] comparing the performance of two schedulers designed for fine-grained multithreaded programs: Parallel Depth First (PDF) [2], which is designed for constructive sharing, and Work Stealing (WS) [3], which takes a more traditional approach.Overview of schedulers. In PDF, processing cores are allocated ready-to-execute program tasks such that higher scheduling priority is given to those tasks the sequential program would have executed earlier. As a result, PDF tends to co-schedule threads in a way that tracks the sequential execution. Hence, the aggregate working set is (provably) not much larger than the single thread working set [1]. In WS, each processing core maintains a local work queue of readyto-execute threads. Whenever its local queue is empty, the core steals a thread from the bottom of the first non-empty queue it finds. WS is an attractive scheduling policy because when there is plenty of parallelism, stealing is quite rare. However, WS is not designed for constructive cache sharing, because the cores tend to have disjoint working sets.CMP configurations studied. We evaluated the performance of PDF and WS across a range of simulated CMP configurations. We focused on designs that have fixed-size private L1 caches and a shared L2 cache on chip. For a fixed die size (240 mm2), we varied the number of cores from 1 to 32. For a given number of cores, we used a (default) configuration based on current CMPs and realistic projections of future CMPs, as process technologies decrease from 90nm to 32nm.Summary of findings. We studied a variety of benchmark programs to show the following findings.For several application classes, PDF enables significant constructive sharing between threads, leading to better utilization of the on-chip caches and reducing off-chip traffic compared to WS. In particular, bandwidth-limited irregular programs and parallel divide-and-conquer programs present a relative speedup of 1.3-1.6X over WS, observing a 13- 41% reduction in off-chip traffic. An example is shown in Figure 1, for parallel merge sort. For each schedule, the number of L2 misses (i.e., the off-chip traffic) is shown on the left and the speed-up over running on one core is shown on the right, for 1 to 32 cores. Note that reducing the offchip traffic has the additional benefit of reducing the power consumption. Moreover, PDF's smaller working sets provide opportunities to power down segments of the cache without increasing the running time. Furthermore, when multiple programs are active concurrently, the PDF version is also less of a cache hog and its smaller working set is more likely to remain in the cache across context switches.For several other applications classes, PDF and WS have roughly the same execution times, either because there is only limited data reuse that can be exploited or because the programs are not limited by off-chip bandwidth. In the latter case, the constructive sharing PDF enables does provide the power and multiprogramming benefits discussed above.Finally, most parallel benchmarks to date, written for SMPs, use such a coarse-grained threading that they cannot exploit the constructive cache behavior inherent in PDF.We find that mechanisms to finely grain multithreaded applications are crucial to achieving good performance on CMPs
    • …
    corecore