35 research outputs found
Development and Application of Software to Understand 3D Chromatin Structure and Gene Regulation
Nearly all cells contain the same 2 meters of DNA that must be systematically organized into their nucleus for timely access to genes in response to stimuli. Proteins and biomolecular condensates make this possible by dynamically shaping chromatin into 3D structures that connect regulators to their genes. Chromatin loops are structures that are partly responsible for forming these connections and can result in disease when disrupted or aberrantly formed. In this work, I describe three studies centered on using 3D chromatin structure to understand gene regulation. Using multi-omic data from a macrophage activation time course, we show that regulation temporally precedes gene expression and that chromatin loops play a key role in connecting enhancers to their target genes. In the next study, we investigated the role of biomolecular condensates in loop formation by mapping 3D chromatin structure in cell lines before and after disruption of NUP98-HOXA9 condensate formation. Differential analysis revealed evidence of CTCF-independent loop formation sensitive to condensate disruption. In the last study, we used 3D chromatin structure and multi-omic data in chondrocytes to link variant-gene pairs associated with Osteoarthritis (OA). Computational analysis suggests that a specific variant may disrupt transcription factor binding and misregulate inflammatory pathways in OA. To carry out these analyses I built computational pipelines and two R/Bioconductor packages to support the processing and analysis of genomic data. The nullranges package contains functions for performing covariate-matched subsampling to generate null-hypothesis genomic data and mitigate the effects of confounding. The mariner package is designed for working with large chromatin contact data. It extends existing Bioconductor tools to allow fast and efficient extraction and manipulation of chromatin interactions for better understanding 3D chromatin structure and its impact on gene regulation.Doctor of Philosoph
Reduced pin-count testing, 3D SICs, time division multiplexing, test access mechanism, simultaneous bidirectional signaling
3D Stacked Integrated Circuits (SICs) offer a promising way to cope with the technology scaling; however, the test access requirements are highly complicated due to increased transistor density and a limited number of test channels. Moreover, although the vertical interconnects in 3D SIC are capable of high-speed data transfer, the overall test speed is restricted by scan-chains that are not optimized for timing. Reduced Pin-Count Testing (RPCT) has been effectively used under these scenarios. In particular, Time Division Multiplexing (TDM) allows full utilization of interconnect bandwidth while providing low scan frequencies supported by the scan chains. However, these methods rely on Uni-Directional Signaling (UDS), in which a chip terminal (pin or a TSV) can either be used to transmit or receive data at a given time. This requires that at least two chip terminals are available at every die interface (Tester-Die or Die-Die) to form a single test channel. In this paper, we propose Simultaneous Bi-Directional Signaling (SBS), which allows a chip terminal to be used simultaneously to send and receive data, thus forming a test channel using one pin instead of two. We demonstrate how SBS can be used in conjunction with TDM to achieve reduced pin count testing while using only half the number of pins compared to conventional TDM based methods, consuming only 22.6% additional power. Alternatively, the advantage could be manifested as a test time reduction by utilizing all available test channels, allowing more parallelism and test time reduction down to half compared to UDS-based TDM. Experiments using 45nm technology suggest that the proposed method can operate at up to 1.2 GHz test clock for a stack of 3-dies, whereas for higher frequencies, a binary-weighted transmitter is proposed capable of up to 2.46 GHz test clock
Network-on-Chip
Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoC—its research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems
REDUCING POWER DURING MANUFACTURING TEST USING DIFFERENT ARCHITECTURES
Power during manufacturing test can be several times higher than power consumption in functional mode. Excessive power during test can cause IR drop, over-heating, and early aging of the chips. In this dissertation, three different architectures have been introduced to reduce test power in general cases as well as in certain scenarios, including field test.
In the first architecture, scan chains are divided into several segments. Every segment needs a control bit to enable capture in a segment when new faults are detectable on that segment for that pattern. Otherwise, the segment should be disabled to reduce capture power. We group the control bits together into one or more control chains.
To address the extra pin(s) required to shift data into the control chain(s) and significant post processing in the first architecture, we explored a second architecture. The second architecture stitches the control bits into the chains they control as EECBs (embedded enable capture bits) in between the segments. This allows an ATPG software tool to automatically generate the appropriate EECB values for each pattern to maintain the fault coverage. This also works in the presence of an on-chip decompressor.
The last architecture focuses primarily on the self-test of a device in a 3D stacked IC when an existing FPGA in the stack can be programmed as a tester. We show that the energy expended during test is significantly less than would be required using low power patterns fed by an on-chip decompressor for the same very short scan chains
Recommended from our members
Test and security in a System-on-Chip environment
This dissertation outlines new approaches for test and security in a System-on-Chip (SoC) environment. A methodology is proposed for designing a single test access mechanism (TAM) architecture on each die with a "bandwidth adapter" that allows it to be efficiently used for multiple different test data bandwidths in three-dimensional integrated circuits (3D-IC) using through-silicon vias (TSVs). In this way, a single test architecture can be re-used for pre-bond, partial stack, and post-bond testing while minimizing test time across all phases of test. Unlike previous approaches, this methodology does not need multiple TAM architectures or reconfigurable wrappers in order to be efficient when the test data bandwidth changes. In industry, sequential linear decompression is widely used to reduce data and bandwidth requirements. A new scheme using a multiple polynomial linear feedback shift register (LFSR) with rotating polynomial is proposed here to increase encoding flexibility resulting in higher compression ratios. An algorithm is described to assign test cubes to polynomials in a way that enhances encoding efficiency. For hardware security, a new attack strategy against logic obfuscation is described here. It is based on applying brute force iteratively to each logic cone one at a time and is shown to significantly reduce the number of brute force key combinations that need to be tried by an attacker. It is shown that inserting key gates based on MUXes is an effective approach to increase security against this type of attack. In data security for hardware, existing techniques for computing with encrypted operands are either prohibitively expense (e.g., fully homomorphic encryption) or only work for special cases (e.g., linear circuits). A lightweight scheme implemented at the gate-level is proposed for computing with noise-obfuscated data. By carefully selecting internal locations for noise cancellation in arbitrary logic circuits, the overhead can be greatly minimized. One important application of the proposed scheme is for protecting data inside a computing unit obtained from a third party IP provider where a hidden backdoor access mechanism or hardware Trojan could be maliciously inserted.Electrical and Computer Engineerin
Three-Dimensional Processing-In-Memory-Architectures: A Holistic Tool For Modeling And Simulation
Die gemeinhin als Memory Wall bekannte, sich stetig weitende Leistungslücke zwischen Prozessor- und Speicherarchitekturen erfordert neue Konzepte, um weiterhin eine Skalierung der Rechenleistung zu ermöglichen. Da Speicher als die Beschränkung innerhalb einer Von-Neumann-Architektur identifiziert wurden, widmet sich die Arbeit dieser Problemstellung. Obgleich dreidimensionale Speicher zu einer Linderung der Memory Wall beitragen können, sind diese alleinig für die zukünftige Skalierung ungenügend. Aufgrund höherer Effizienzen stellt die Integration von Rechenkapazität in den Speicher (Processing-In-Memory, PIM) ein vielversprechender Ausweg dar, jedoch existiert ein Mangel an PIM-Simulationsmodellen. Daher wurde ein flexibles Simulationswerkzeug für dreidimensionale Speicherstapel geschaffen, welches zur Modellierung von dreidimensionalen PIM erweitert wurde. Dieses kann Speicherstapel wie etwa Hybrid Memory Cube standardkonform simulieren und bietet zugleich eine hohe Genauigkeit indem auf elementaren Datenpaketen in Kombination mit dem Hardware validierten Simulator BOBSim modelliert wird. Ein eigens entworfener Simulationstaktbaum ermöglicht zugleich eine schnelle Ausführung. Messungen weisen im funktionalen Modus eine 100-fache Beschleunigung auf, wohingegen eine Verdoppelung der Ausführungsgeschwindigkeit mit Taktgenauigkeit erzielt wird. Anhand eines eigens implementierten, binärkompatiblen GPU-Beschleunigers wird die Modellierung einer vollständig dreidimensionalen PIM-Architektur demonstriert. Dabei orientieren sich die maximalen Hardwareressourcen an einem PIM-Beschleuniger aus der Literatur. Evaluiert wird einerseits das GPU-Simulationsmodell eigenständig, andererseits als PIM-Verbund jeweils mit Hilfe einer repräsentativ gewählten, speicherbeschränkten geophysikalischen Bildverarbeitung. Bei alleiniger Betrachtung des GPU-Simulationsmodells weist dieses eine signifikant gesteigerte Simulationsgeschwindigkeit auf, bei gleichzeitiger Abweichung von 6% gegenüber dem Verilator-Modell. Nachfolgend werden innerhalb dieser Arbeit unterschiedliche Konfigurationen des integrierten PIM-Beschleunigers evaluiert. Je nach gewählter Konfiguration kann der genutzte Algorithmus entweder bis zu 140GFLOPS an tatsächlicher Rechenleistung abrufen oder eine maximale Recheneffizienz von synthetisch 30% bzw. real 24,5% erzielen. Letzteres stellt eine Verdopplung des Stands der Technik dar. Eine anknüpfende Diskussion erläutert eingehend die Resultate.The steadily widening performance gap between processor- and memory-architectures - commonly known as the Memory Wall - requires novel concepts to achieve further scaling in processing performance. As memories were identified as the limitation within a Von-Neumann-architecture, this work addresses this constraining issue. Although three-dimensional memories alleviate the effects of the Memory Wall, the sole utilization of such memories would be insufficient. Due to higher efficiencies, the integration of processing capacity into memories (so-called Processing-In-Memory, PIM) depicts a promising alternative. However, a lack of PIM simulation models still remains. As a consequence, a flexible simulation tool for three-dimensional stacked memories was established, which was extended for modeling three-dimensional PIM architectures. This tool can simulate stacked memories such as Hybrid Memory Cube standard-compliant and simultaneously offers high accuracy by modeling on elementary data packets (FLIT) in combination with the hardware validated BOBSim simulator. To this, a specifically designed simulation clock tree enables an rapid simulation execution. A 100x speed up in simulation execution can be measured while utilizing the functional mode, whereas a 2x speed up is achieved during clock-cycle accuracy mode. With the aid of a specifically implemented, binary compatible GPU accelerator and the established tool, the modeling of a holistic three-dimensional PIM architecture is demonstrated within this work. Hardware resources used were constrained by a PIM architecture from literature. A representative, memory-bound, geophysical imaging algorithm was leveraged to evaluate the GPU model as well as the compound PIM simulation model. The sole GPU simulation model depicts a significantly improved simulation performance with a deviation of 6% compared to a Verilator model. Subsequently, various PIM accelerator configurations with the integrated GPU model were evaluated. Depending on the chosen PIM configuration, the utilized algorithm achieves 140GFLOPS of processing performance or a maximum computing efficiency of synthetically 30% or realistically 24.5%. The latter depicts a 2x improvement compared to state-of-the-art. A following discussion showcases the results in depth