25 research outputs found

Programmable DARPin-based receptors for the detection of thrombotic markers

Author: Bertschi Adrian
Freitag Patrick C
Fussenegger Martin
Plückthun Andreas
Ray Preetam Guha
Reinberg Thomas
Schaefer Jonas V
Scheller Leo
Strittmatter Tobias
Stuecheli Pascal
Tsakiris Dimitrios
Wang Yidan
Ye Haifeng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2022

Cellular therapies remain constrained by the limited availability of sensors for disease markers. Here we present an integrated target-to-receptor pipeline for constructing a customizable advanced modular bispecific extracellular receptor (AMBER) that combines our generalized extracellular molecule sensor (GEMS) system with a high-throughput platform for generating designed ankyrin repeat proteins (DARPins). For proof of concept, we chose human fibrin degradation products (FDPs) as markers with high clinical relevance and screened a DARPin library for FDP binders. We built AMBERs equipped with 19 different DARPins selected from 160 hits, and found 4 of them to be functional as heterodimers with a known single-chain variable fragment binder. Tandem receptors consisting of combinations of the validated DARPins are also functional. We demonstrate applications of these AMBER receptors in vitro and in vivo by constructing designer cell lines that detect pathological concentrations of FDPs and respond with the production of a reporter and a therapeutic anti-thrombotic protein.

ZORA

Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories

Author: Antognetti P.
Arafa M.
Arjomand M.
Bhattacharyya A.
Blagodurov S.
Cao Y.
Chang Y.-M.
Cho B.-H.
Das A.
Das A.
Dray C.
Goda A.
Huang Y.
Jayasena N. S.
Kang U.
Kim Y.
Lee D.
Mallik A.
Mutlu O.
Mutlu O.
Pourshirazi B.
Qureshi M. K.
Qureshi M. K.
Redaelli A.
Rixner S.
Sandhu B. S.
Seong N. H.
Seshadri V.
Srinivasan J.
Stuecheli J.
Yoon H.
Yue J.
Zhang L.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/05/2020

Modern computing systems are embracing hybrid memory comprising DRAM and non-volatile memory (NVM) to combine the best properties of both memory technologies, achieving low latency, high reliability, and high density. A prominent characteristic of DRAM-NVM hybrid memory is that it has NVM access latency much higher than DRAM access latency. We call this inter-memory asymmetry. We observe that parasitic components on a long bitline are a major source of high latency in both DRAM and NVM, and a significant factor contributing to high-voltage operations in NVM, which impact their reliability. We propose an architectural change, where each long bitline in DRAM and NVM is split into two segments by an isolation transistor. One segment can be accessed with lower latency and operating voltage than the other. By introducing tiers, we enable non-uniform accesses within each memory type (which we call intra-memory asymmetry), leading to performance and reliability trade-offs in DRAM-NVM hybrid memory. We extend the existing NVM-DRAM OS in three ways. First, we exploit both inter- and intra-memory asymmetries to allocate and migrate memory pages between the tiers in DRAM and NVM. Second, we improve the OS's page allocation decisions by predicting the access intensity of a newly-referenced memory page in a program and placing it in a matching tier during its initial allocation. This minimizes page migrations during program execution, lowering the performance overhead. Third, we propose a solution to migrate pages between the tiers of the same memory without transferring data over the memory channel, minimizing channel occupancy and improving performance. Our overall approach, which we call MNEME, to enable and exploit asymmetries in DRAM-NVM hybrid tiered memory improves both performance and reliability for both single-core and multi-programmed workloads. Comment: 15 pages, 29 figures, accepted at ACM SIGPLAN International Symposium on Memory Management
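
The second extension above (placing a newly referenced page in a tier that matches its predicted access intensity) can be illustrated with a minimal Python sketch. This is not MNEME's actual policy: the tier names, the `Tier` class, the `pick_tier` function, and the threshold values are all hypothetical, chosen only to show the idea of matching predicted intensity to the four tiers that result from splitting DRAM and NVM bitlines.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """One memory tier with a fixed page capacity (hypothetical model)."""
    name: str
    capacity_pages: int
    used_pages: int = 0

    def has_room(self) -> bool:
        return self.used_pages < self.capacity_pages


def pick_tier(predicted_accesses, tiers, thresholds=(1000, 100, 10)):
    """Place a new page in the fastest tier its predicted intensity justifies.

    `tiers` is ordered fastest to slowest. `thresholds` are assumed cut-offs:
    a prediction meeting thresholds[i] may start at tiers[i]; a prediction
    below every threshold starts at the slowest tier. Full tiers are skipped
    in favor of the next slower one.
    """
    start = next(
        (i for i, t in enumerate(thresholds) if predicted_accesses >= t),
        len(thresholds),
    )
    for tier in tiers[start:]:
        if tier.has_room():
            tier.used_pages += 1
            return tier.name
    raise MemoryError("no free page in any eligible tier")


# Illustrative tier layout: near/far bitline segments in DRAM and NVM.
tiers = [
    Tier("dram_near", 1),
    Tier("dram_far", 1),
    Tier("nvm_near", 1),
    Tier("nvm_far", 1),
]
```

Choosing the tier at allocation time, rather than after observing accesses, is what lets a scheme of this kind avoid later migrations; the sketch leaves out the predictor itself and simply takes its output as an input.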

arXiv.org e-Print Archive

Crossref

Improving Phase Change Memory Performance with Data Content Aware Access

Author: Ahn S. J.
Alshboul M.
Awad A.
Awad A.
Bock S.
Bock S.
Bondurant D.
Boroumand A.
Burr G. W.
Chen J.
Chhabra S.
Dogan H.
Du Y.
Ferreira A. P.
Frigo P.
Gueron S.
Guerra J.
Ham T. J.
Hashemi M.
Hsieh K.
Hwang W.
Jia Y.
Jiang L.
Joo Y.
Kang U.
Karlsson M.
Kim J.
Kim Y.
Kim Y.
Lalam A.
Lam C. H.
Lee J. I.
Mallik A.
Marathe V. J.
Meza J.
Morikawa T.
Mutlu O.
Mutlu O.
Pourshirazi B.
Qureshi M. K.
Qureshi M. K.
Saileshwar G.
Seong N. H.
Seshadri V.
Stuecheli J.
Villa C.
Wang Y.
Wang Z.
Wuttig M.
Yamada N.
Yang J.
Yue J.
Zhang L.
Zhou M.
Zhou M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/05/2020

A prominent characteristic of write operation in Phase-Change Memory (PCM) is that its latency and energy are sensitive to the data to be written as well as the content that is overwritten. We observe that overwriting unknown memory content can incur significantly higher latency and energy compared to overwriting known all-zeros or all-ones content. This is because all-zeros or all-ones content is overwritten by programming the PCM cells only in one direction, i.e., using either SET or RESET operations, not both. In this paper, we propose data content aware PCM writes (DATACON), a new mechanism that reduces the latency and energy of PCM writes by redirecting these requests to overwrite memory locations containing all-zeros or all-ones. DATACON operates in three steps. First, it estimates how much a PCM write access would benefit from overwriting known content (e.g., all-zeros, or all-ones) by comprehensively considering the number of set bits in the data to be written, and the energy-latency trade-offs for SET and RESET operations in PCM. Second, it translates the write address to a physical address within memory that contains the best type of content to overwrite, and records this translation in a table for future accesses. We exploit data access locality in workloads to minimize the address translation overhead. Third, it re-initializes unused memory locations with known all-zeros or all-ones content in a manner that does not interfere with regular read and write accesses. DATACON overwrites unknown content only when it is absolutely necessary to do so. We evaluate DATACON with workloads from state-of-the-art machine learning applications, SPEC CPU2017, and NAS Parallel Benchmarks. 
Results demonstrate that DATACON significantly improves system performance and memory system energy consumption compared to the best performance-oriented state-of-the-art techniques. Comment: 18 pages, 21 figures, accepted at ACM SIGPLAN International Symposium on Memory Management (ISMM)
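
The first DATACON step described above (estimating which known content is cheaper to overwrite from the number of set bits) can be sketched in Python as follows. This is a simplified illustration, not the paper's estimator: the function name `choose_overwrite_target` and the per-bit costs `set_cost` and `reset_cost` are assumptions in arbitrary units, and a real model would weigh latency and energy separately.

```python
def choose_overwrite_target(data: bytes,
                            set_cost: float = 2.0,
                            reset_cost: float = 1.0) -> str:
    """Pick the known content that is cheaper to overwrite with `data`.

    Overwriting an all-zeros location only needs SET operations for the
    1 bits; overwriting an all-ones location only needs RESET operations
    for the 0 bits. The costs are hypothetical per-bit values.
    """
    total_bits = 8 * len(data)
    ones = sum(bin(byte).count("1") for byte in data)
    zeros = total_bits - ones

    cost_over_zeros = ones * set_cost      # program only the 1 bits
    cost_over_ones = zeros * reset_cost    # program only the 0 bits

    # Ties go to all-zeros; this tie-break is arbitrary.
    return "all_zeros" if cost_over_zeros <= cost_over_ones else "all_ones"
```

Data with few set bits is steered toward all-zeros locations and data with many set bits toward all-ones locations, with the crossover point determined by the assumed cost ratio.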

arXiv.org e-Print Archive

Crossref

Mitigating DRAM complexities through coordinated scheduling policies

Author: Stuecheli Jeffrey Adam
Publication date: 04/06/2012

Contemporary DRAM systems have maintained impressive scaling by managing a careful balance between performance, power, and storage density. In achieving these goals, a significant sacrifice has been made in DRAM's operational complexity. To realize good performance, systems must properly manage the significant number of structural and timing restrictions of the DRAM devices. DRAM's efficient use is further complicated in many-core systems where the memory interface has to be shared among multiple cores/threads competing for memory bandwidth. In computer architecture, caches have primarily been viewed as a means to hide memory latency from the CPU. Cache policies have focused on anticipating the CPU's data needs, and are mostly oblivious to the main memory. This work demonstrates that the era of many-core architectures has created new main memory bottlenecks, and mandates a new approach: coordination of cache policy with main memory characteristics. Using the cache for memory optimization purposes dramatically expands the memory controller's visibility of processor behavior, at low implementation overhead. Through memory-centric modification of existing policies, such as scheduled writebacks, this work demonstrates that performance-limiting effects of highly-threaded architectures combined with complex DRAM operation can be overcome. This work shows that, with an awareness of the physical main memory layout and a focus on writes, both average read and write latency can be shortened, memory power reduced, and overall system performance improved. The use of the "Page-Mode" feature of DRAM devices can mitigate many DRAM constraints. Current open-page policies attempt to garner the highest level of page hits. In an effort to achieve this, such greedy schemes map sequential address sequences to a single DRAM resource. 
This non-uniform resource usage pattern introduces high levels of conflict when multiple workloads in a many-core system map to the same set of resources. This work presents a scheme that provides a careful balance between the benefits (increased performance and decreased power), and the detractors (unfairness) of page-mode accesses. In the proposed Minimalist approach, the system targets "just enough" page-mode accesses to garner page-mode benefits, avoiding system unfairness. This is accomplished with the use of a fair memory hashing scheme to control the maximum number of page-mode hits. High-density memory is becoming ever more important as many execution streams are consolidated onto single-chip many-core processors. DRAM is ubiquitous as a main memory technology, but while DRAM's per-chip density and frequency continue to scale, the time required to refresh its dynamic cells has grown at an alarming rate. This work shows how currently-employed methods to schedule refresh operations are ineffective in mitigating the significant performance degradation caused by longer refresh times. Current approaches are deficient -- they do not effectively exploit the flexibility of DRAMs to postpone refresh operations. This work proposes dynamically reconfigurable predictive mechanisms that exploit the full dynamic range allowed in the industry standard DRAM memory specifications. The proposed mechanisms are shown to mitigate much of the penalties seen with dense DRAM devices. In summary, this work presents a significant improvement in the ability to exploit the capabilities of high-density, high-frequency DRAM devices in a many-core environment. This is accomplished through coordination of previously disparate system components, exploiting integration of such components into highly integrated system designs. Electrical and Computer Engineering
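
The idea of capping page-mode hits through address hashing, as in the Minimalist approach described above, can be illustrated with a small Python sketch. It shows one plausible shape of such a mapping and is not the dissertation's actual hash: the function `map_address`, the parameters `lines_per_burst` and `num_banks`, and the XOR fold are all assumptions. `num_banks` is assumed to be a power of two so that the XOR result stays a valid bank index.

```python
def map_address(line_addr: int, lines_per_burst: int = 4, num_banks: int = 8):
    """Map a cache-line address to (bank, row_group, column_in_burst).

    Only `lines_per_burst` consecutive lines land in the same bank and
    row group, which bounds the page-mode hits a sequential stream can
    collect before it moves to another bank.
    """
    column = line_addr % lines_per_burst   # position inside the short burst
    chunk = line_addr // lines_per_burst   # which burst this line belongs to
    row_group = chunk // num_banks
    # XOR-fold higher address bits into the bank index so that strided
    # streams do not all start on the same bank.
    bank = (chunk % num_banks) ^ (row_group % num_banks)
    return bank, row_group, column
```

With the default values, lines 0 to 3 share one bank and row group, and line 4 already maps to the next bank, so a single sequential stream is spread across all banks instead of monopolizing one open row.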

Texas ScholarWorks

IBM POWER9 systems designed for commercial, cognitive, and cloud

Author: Arimilli
Chun
Heller
Heller
Rosedahl
Stuecheli
Publication venue: 'IBM'

Crossref

The IBM z13 memory subsystem for big data

Author: Lastras-Montano
Sonnelitter
Stuecheli
Warnock
Warnock
Publication venue: 'IBM'

Crossref

Bank-aware Dynamic Cache Partitioning for Multicore Architectures

Author: Dimitris Kaseridis
Jeffrey Stuecheli
Lizy K. John
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

As Chip-Multiprocessor systems (CMP) have become the predominant topology for leading microprocessors, critical components of the system are now integrated on a single chip. This enables sharing of computation resources that was not previously possible. In addition, the virtualization of these computational resources exposes the system to a mix of diverse and competing workloads. Cache is a resource of primary concern as it can be dominant in controlling overall throughput. In order to prevent destructive interference between divergent workloads, the last level of cache must be partitioned. In the past, many solutions have been proposed, but most of them assume either simplified cache hierarchies with no realistic restrictions or complex cache schemes that are difficult to integrate in a real design. To address this problem, we propose a dynamic partitioning strategy based on realistic last-level cache designs of CMP processors. We used a cycle-accurate, full-system simulator based on Simics and GEMS to evaluate our partitioning scheme on an 8-core DNUCA CMP system. Results for an 8-core system show that our proposed scheme provides on average a 70% reduction in misses compared to non-partitioned shared caches, and a 25% reduction in misses compared to static equally partitioned (private) caches.
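
Dynamic last-level cache partitioning of the kind described above is often driven by how many misses each core would save with one more cache way. The Python sketch below uses a common greedy marginal-utility heuristic for that decision; it is not necessarily the paper's algorithm, and it ignores the bank-level placement constraints the paper addresses. The function name `partition_ways` and the miss-curve input format are assumptions.

```python
def partition_ways(miss_curves, total_ways):
    """Greedily split `total_ways` cache ways among cores.

    miss_curves[c][w] is the (profiled or estimated) miss count of core c
    when it owns w ways, for w = 0..total_ways. Every core starts with one
    way; each remaining way goes to the core whose misses drop the most
    from receiving it.
    """
    num_cores = len(miss_curves)
    alloc = [1] * num_cores
    for _ in range(total_ways - num_cores):
        # Miss reduction each core would see from one additional way.
        gains = [curve[ways] - curve[ways + 1]
                 for curve, ways in zip(miss_curves, alloc)]
        best = max(range(num_cores), key=gains.__getitem__)
        alloc[best] += 1
    return alloc


# Hypothetical miss curves: core 0 is cache-sensitive, core 1 is not.
curves = [
    [100, 50, 20, 10, 5],
    [100, 90, 88, 87, 86],
]
```

In this made-up example the cache-sensitive core ends up with most of the ways, which is the behavior a utility-based scheme aims for when workloads with very different cache needs share one last-level cache.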

CiteSeerX

Crossref