25 research outputs found
Programmable DARPin-based receptors for the detection of thrombotic markers
Cellular therapies remain constrained by the limited availability of sensors for disease markers. Here we present an integrated target-to-receptor pipeline for constructing a customizable advanced modular bispecific extracellular receptor (AMBER) that combines our generalized extracellular molecule sensor (GEMS) system with a high-throughput platform for generating designed ankyrin repeat proteins (DARPins). For proof of concept, we chose human fibrin degradation products (FDPs) as markers with high clinical relevance and screened a DARPin library for FDP binders. We built AMBERs equipped with 19 different DARPins selected from 160 hits, and found 4 of them to be functional as heterodimers with a known single-chain variable fragments binder. Tandem receptors consisting of combinations of the validated DARPins are also functional. We demonstrate applications of these AMBER receptors in vitro and in vivo by constructing designer cell lines that detect pathological concentrations of FDPs and respond with the production of a reporter and a therapeutic anti-thrombotic protein
Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories
Modern computing systems are embracing hybrid memory comprising of DRAM and
non-volatile memory (NVM) to combine the best properties of both memory
technologies, achieving low latency, high reliability, and high density. A
prominent characteristic of DRAM-NVM hybrid memory is that it has NVM access
latency much higher than DRAM access latency. We call this inter-memory
asymmetry. We observe that parasitic components on a long bitline are a major
source of high latency in both DRAM and NVM, and a significant factor
contributing to high-voltage operations in NVM, which impact their reliability.
We propose an architectural change, where each long bitline in DRAM and NVM is
split into two segments by an isolation transistor. One segment can be accessed
with lower latency and operating voltage than the other. By introducing tiers,
we enable non-uniform accesses within each memory type (which we call
intra-memory asymmetry), leading to performance and reliability trade-offs in
DRAM-NVM hybrid memory. We extend existing NVM-DRAM OS in three ways. First, we
exploit both inter- and intra-memory asymmetries to allocate and migrate memory
pages between the tiers in DRAM and NVM. Second, we improve the OS's page
allocation decisions by predicting the access intensity of a newly-referenced
memory page in a program and placing it to a matching tier during its initial
allocation. This minimizes page migrations during program execution, lowering
the performance overhead. Third, we propose a solution to migrate pages between
the tiers of the same memory without transferring data over the memory channel,
minimizing channel occupancy and improving performance. Our overall approach,
which we call MNEME, to enable and exploit asymmetries in DRAM-NVM hybrid
tiered memory improves both performance and reliability for both single-core
and multi-programmed workloads.Comment: 15 pages, 29 figures, accepted at ACM SIGPLAN International Symposium
on Memory Managemen
Improving Phase Change Memory Performance with Data Content Aware Access
A prominent characteristic of write operation in Phase-Change Memory (PCM) is
that its latency and energy are sensitive to the data to be written as well as
the content that is overwritten. We observe that overwriting unknown memory
content can incur significantly higher latency and energy compared to
overwriting known all-zeros or all-ones content. This is because all-zeros or
all-ones content is overwritten by programming the PCM cells only in one
direction, i.e., using either SET or RESET operations, not both. In this paper,
we propose data content aware PCM writes (DATACON), a new mechanism that
reduces the latency and energy of PCM writes by redirecting these requests to
overwrite memory locations containing all-zeros or all-ones. DATACON operates
in three steps. First, it estimates how much a PCM write access would benefit
from overwriting known content (e.g., all-zeros, or all-ones) by
comprehensively considering the number of set bits in the data to be written,
and the energy-latency trade-offs for SET and RESET operations in PCM. Second,
it translates the write address to a physical address within memory that
contains the best type of content to overwrite, and records this translation in
a table for future accesses. We exploit data access locality in workloads to
minimize the address translation overhead. Third, it re-initializes unused
memory locations with known all-zeros or all-ones content in a manner that does
not interfere with regular read and write accesses. DATACON overwrites unknown
content only when it is absolutely necessary to do so. We evaluate DATACON with
workloads from state-of-the-art machine learning applications, SPEC CPU2017,
and NAS Parallel Benchmarks. Results demonstrate that DATACON significantly
improves system performance and memory system energy consumption compared to
the best of performance-oriented state-of-the-art techniques.Comment: 18 pages, 21 figures, accepted at ACM SIGPLAN International Symposium
on Memory Management (ISMM
Recommended from our members
Mitigating DRAM complexities through coordinated scheduling policies
textContemporary DRAM systems have maintained impressive scaling by managing a careful balance between performance, power, and storage density. In achieving these goals, a significant sacrifice has been made in DRAM's operational complexity. To realize good performance, systems must properly manage the significant number of structural and timing restrictions of the DRAM devices. DRAM's efficient use is further complicated in many-core systems where the memory interface has to be shared among multiple cores/threads competing for memory bandwidth. In computer architecture, caches have primarily been viewed as a means to hide memory latency from the CPU. Cache policies have focused on anticipating the CPU's data needs, and are mostly oblivious to the main memory. This work demonstrates that the era of many-core architectures has created new main memory bottlenecks, and mandates a new approach: coordination of cache policy with main memory characteristics. Using the cache for memory optimization purposes dramatically expands the memory controller's visibility of processor behavior, at low implementation overhead. Through memory-centric modification of existing policies, such as scheduled writebacks, this work demonstrates that performance-limiting effects of highly-threaded architectures combined with complex DRAM operation can be overcome. This work shows that an awareness of the physical main memory layout and by focusing on writes, both read and write average latency can be shortened, memory power reduced, and overall system performance improved. The use of the "Page-Mode" feature of DRAM devices can mitigate many DRAM constraints. Current open-page policies attempt to garner the highest level of page hits. In an effort to achieve this, such greedy schemes map sequential address sequences to a single DRAM resource. This non-uniform resource usage pattern introduces high levels of conflict when multiple workloads in a many-core system map to the same set of resources. This work presents a scheme that provides a careful balance between the benefits (increased performance and decreased power), and the detractors (unfairness) of page-mode accesses. In the proposed Minimalist approach, the system targets "just enough" page-mode accesses to garner page-mode benefits, avoiding system unfairness. This is accomplished with the use of a fair memory hashing scheme to control the maximum number of page mode hits. High density memory is becoming ever more important as many execution streams are consolidated onto single chip many-core processors. DRAM is ubiquitous as a main memory technology, but while DRAM's per-chip density and frequency continue to scale, the time required to refresh its dynamic cells has grown at an alarming rate. This work shows how currently-employed methods to schedule refresh operations are ineffective in mitigating the significant performance degradation caused by longer refresh times. Current approaches are deficient -- they do not effectively exploit the flexibility of DRAMs to postpone refresh operations. This work proposes dynamically reconfigurable predictive mechanisms that exploit the full dynamic range allowed in the industry standard DRAM memory specifications. The proposed mechanisms are shown to mitigate much of the penalties seen with dense DRAM devices. In summary this work presents a significant improvement in the ability to exploit the capabilities of high density, high frequency, DRAM devices in a many-core environment. This is accomplished though coordination of previously disparate system components, exploiting integration of such components into highly integrated system designs.Electrical and Computer Engineerin
Bank-aware Dynamic Cache Partitioning for Multicore Architectures
Abstract—As Chip-Multiprocessor systems (CMP) have be-come the predominant topology for leading microprocessors, critical components of the system are now integrated on a single chip. This enables sharing of computation resources that was not previously possible. In addition, the virtualization of these computational resources exposes the system to a mix of diverse and competing workloads. Cache is a resource of primary concern as it can be dominant in controlling overall throughput. In order to prevent destructive interference between divergent workloads, the last level of cache must be partitioned. In the past, many solutions have been proposed but most of them are assuming either simplified cache hierarchies with no realistic restrictions or complex cache schemes that are difficult to integrate in a real design. To address this problem, we propose a dynamic partitioning strategy based on realistic last level cache designs of CMP processors. We used a cycle accurate, full system simulator based on Simics and Gems to evaluate our partitioning scheme on an 8-core DNUCA CMP system. Results for an 8-core system show that our proposed scheme provides on average a 70 % reduction in misses compared to non-partitioned shared caches, and a 25 % misses reduction compared to static equally partitioned (private) caches. I