Design techniques for clocking high performance signaling systems

Author: Hanumolu Pavan Kumar
Publication venue: 'Oregon State University'

Scaling of CMOS technology has progressed relentlessly for the past several decades. For this unprecedented scaling to benefit the performance of large digital systems, the communication bandwidth between integrated circuits (ICs) must scale accordingly. However, interconnect technology does not scale as aggressively, which makes chip-to-chip communication the major bottleneck in overall system performance. In addition, supply voltage scaling, increasing device leakage, and increased noise make existing signaling circuits inefficient and difficult to scale. This thesis discusses both analog and digital enhancement techniques that mitigate scaling-related issues and improve the performance of building blocks used in high-speed signaling systems. It presents a digital-to-phase converter (DPC) with a resolution better than 100 fs; a hybrid analog/digital clock and data recovery (CDR) architecture that improves the tracking range of traditional CDRs by an order of magnitude; and a digital CDR architecture that eliminates the charge pump and the large, area-consuming loop filter while achieving error-free operation. Measured results from the prototype chips illustrate the proposed design techniques.

Keywords: CDR, PL
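The digital CDR idea above (a digital loop filter in place of the charge pump and analog loop filter) can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the thesis's architecture: it assumes a bang-bang (early/late) phase detector, a proportional-integral digital loop filter with hypothetical gains `kp` and `ki`, and an ideal phase accumulator standing in for the DPC and oscillator. Phases are in unit intervals (UI).

```python
def simulate_digital_cdr(freq_offset_ui, n_steps=5000, kp=1 / 64, ki=1 / 4096):
    """Behavioral model of a digital bang-bang CDR (hypothetical parameters).

    freq_offset_ui: input phase drift per step, modeling a frequency offset.
    Returns the phase error (data phase minus recovered clock phase) per step.
    """
    data_phase = 0.0
    clock_phase = 0.0
    integrator = 0.0  # integral path: slowly learns the frequency offset
    errors = []
    for _ in range(n_steps):
        data_phase += freq_offset_ui
        err = data_phase - clock_phase
        errors.append(err)
        # Bang-bang phase detector: only the sign of the error is observed.
        bb = 1.0 if err > 0 else -1.0
        integrator += ki * bb
        # Digital loop filter output (proportional + integral) steers the clock.
        clock_phase += kp * bb + integrator
    return errors


if __name__ == "__main__":
    errs = simulate_digital_cdr(1e-3)
    print("max |error| after settling (UI):", max(abs(e) for e in errs[-1000:]))
```

With `ki=0` only the proportional path remains, and a frequency offset larger than `kp` per step can no longer be tracked; the integral path is what absorbs the offset.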

ScholarsArchive@OSU

TuRaN: True Random Number Generation Using Supply Voltage Underscaling in SRAMs

Author: Bostancı F. Nisa
Ergin Oğuz
Ghiasi Nika Mansouri
Mutlu Onur
Olgun Ataberk
Salami Behzad
Tuğrul Yahya Can
Yağlıkçı A. Giray
Yüksel İsmail Emir
Publication date: 20/11/2022

Prior works propose SRAM-based true random number generators (TRNGs) that extract entropy from SRAM arrays. SRAM arrays store on-chip data in most specialized and general-purpose computing chips, so SRAM-based TRNGs are a low-cost alternative to dedicated hardware TRNGs. However, existing SRAM-based TRNGs suffer from 1) low throughput, 2) high energy consumption, 3) high latency, and 4) the inability to generate true random numbers continuously, which limits their application space. Our goal in this paper is to design an SRAM-based TRNG that overcomes these four key limitations and thus extends the application space of SRAM-based TRNGs. To this end, we propose TuRaN, a new high-throughput, energy-efficient, and low-latency SRAM-based TRNG that can sustain continuous operation. TuRaN leverages the key observation that accessing SRAM cells results in random access failures when the supply voltage is reduced below the manufacturer-recommended level. TuRaN generates random numbers at high throughput by repeatedly accessing SRAM cells at reduced supply voltage and post-processing the resulting random faults with the SHA-256 hash function. To demonstrate the feasibility of TuRaN, we conduct SPICE simulations on different process nodes and analyze the potential of access failures as an entropy source. We verify and support our simulation results with real-world experiments on two commercial off-the-shelf FPGA boards. We evaluate the quality of the random numbers generated by TuRaN using the widely adopted NIST standard randomness tests and observe that TuRaN passes all tests. TuRaN generates true random numbers with (i) an average (maximum) throughput of 1.6 Gbps (1.812 Gbps), (ii) 0.11 nJ/bit energy consumption, and (iii) 278.46 µs latency.
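The SHA-256 post-processing step can be sketched with Python's standard `hashlib` module. This is a minimal illustration, not TuRaN's implementation: the 512-bit block size, the MSB-first bit packing, and the function name `postprocess_failures` are assumptions, and the demo uses a seeded software PRNG as a stand-in for real access-failure bits, so it provides no actual entropy.

```python
import hashlib
import random


def postprocess_failures(raw_bits, block_bits=512):
    """Condense raw SRAM access-failure bits into 256-bit outputs with SHA-256.

    raw_bits: sequence of 0/1 values (e.g. 1 = access failure observed).
    block_bits: hypothetical number of raw bits hashed per output (multiple of 8).
    Returns a list of 32-byte digests; an incomplete trailing block is dropped.
    """
    if block_bits % 8 != 0:
        raise ValueError("block_bits must be a multiple of 8")
    outputs = []
    for start in range(0, len(raw_bits) - block_bits + 1, block_bits):
        block = raw_bits[start:start + block_bits]
        packed = bytearray()
        for i in range(0, block_bits, 8):
            byte = 0
            for bit in block[i:i + 8]:  # pack 8 bits, most significant first
                byte = (byte << 1) | bit
            packed.append(byte)
        outputs.append(hashlib.sha256(bytes(packed)).digest())
    return outputs


if __name__ == "__main__":
    rng = random.Random(42)  # stand-in only; real bits would come from SRAM reads
    raw = [rng.getrandbits(1) for _ in range(2048)]
    for digest in postprocess_failures(raw):
        print(digest.hex())
```

As a back-of-the-envelope check on the reported figures, 0.11 nJ/bit at an average of 1.6 Gbps corresponds to about 0.11e-9 J/bit × 1.6e9 bit/s ≈ 0.18 W while generating.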

arXiv.org e-Print Archive

Repository for Publications and Research Data

Design of a Highly Energy-Efficient Memory System Considering the Power-Performance Trade-off of DRAM

Author: 조현윤
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2017

Thesis (Master's), Seoul National University Graduate School: Department of Transdisciplinary Studies (Intelligent Convergence Systems major), February 2017. Advisor: 안정호.

As servers are equipped with more memory modules, each with larger capacity, main-memory systems have become the second most energy-consuming component in big-memory servers, and in some servers their energy consumption is even comparable to that of the processors. It is therefore critical for big-memory servers and their main-memory systems to offer high energy efficiency. In pursuit of energy-efficient main-memory systems, prior work exploited the advantage of mobile LPDDR devices (lower power than DDR devices) while attempting to overcome their limitations (longer latency, lower bandwidth, or both). However, we demonstrate that such main-memory architectures (based on the latest LPDDR4 devices) are no longer effective: on memory-intensive workloads they even reduce the overall energy efficiency of servers by 49% compared with architectures based on DDR4 devices. This is because the power consumption of present DDR4 devices has decreased substantially by adopting the strengths of mobile and graphics memory, whereas LPDDR4 has sacrificed energy efficiency to focus on higher data transfer rates. We also show that the power consumption of DDR4 devices can vary substantially across manufacturers. Moreover, investigating the new energy-saving features of DDR4 devices in depth, we show that activating these features often hurts the overall energy efficiency of servers because of their performance penalties. Finally, we propose a simple but effective scheme that adaptively exploits DRAM power-down modes, improving the system energy-delay product by 4.0%.

Contents: Introduction; Background and Related Work (2.1 DRAM Organization and Operation; 2.2 Breaking Down DRAM Power Dissipation; 2.3 Recent Progresses in Improving the Energy Efficiency of Main Memory Systems); Energy Efficiency and Performance Trade-Offs of Modern Main Memory Devices (3.1 DDR4 is not Energy Inefficient Any More; 3.2 Saving Standby Power by Exploiting Power-down Modes; 3.3 Saving Data Transfer Energy with DBI/TSV: 3.3.1 Benefits of DBI, 3.3.2 Energy savings by DBI considering its cost, 3.3.3 Impact of module types); Improving Main-Memory Efficiency Without Compromising Performance: Exploiting Power-Down Modes Adaptively; Experimental Setup; Evaluation; Conclusion; Bibliography; Korean Abstract.
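The trade-off behind power-down modes (lower idle power versus an exit-latency penalty) can be made concrete with a toy timeout model. Everything here is hypothetical: the normalized power values, the exit latency, the function names, and the "predict from the previous idle interval" rule are illustrative assumptions, not the adaptive scheme proposed in the thesis, and only idle-period energy is counted.

```python
def idle_cost(length, timeout, p_standby, p_powerdown, exit_latency):
    """Energy and extra delay for one idle interval under a timeout policy.

    timeout=None means the rank never enters power-down during this interval.
    """
    if timeout is not None and length > timeout:
        energy = timeout * p_standby + (length - timeout) * p_powerdown
        return energy, exit_latency  # waking from power-down delays the next request
    return length * p_standby, 0


def run_policy(intervals, choose_timeout, p_standby=1.0, p_powerdown=0.25, exit_latency=10):
    """Total idle energy and added delay; choose_timeout sees the previous interval."""
    energy, delay, prev = 0.0, 0, None
    for length in intervals:
        e, d = idle_cost(length, choose_timeout(prev), p_standby, p_powerdown, exit_latency)
        energy += e
        delay += d
        prev = length
    return energy, delay


def make_adaptive_policy(threshold):
    """Hypothetical rule: power down immediately only after a long idle interval."""
    return lambda prev: 0 if prev is not None and prev >= threshold else None


def energy_delay_product(energy, base_cycles, delay):
    return energy * (base_cycles + delay)


if __name__ == "__main__":
    trace = [100, 100, 5, 5, 200, 3]
    for name, policy in [("never", lambda p: None), ("always", lambda p: 0),
                         ("adaptive", make_adaptive_policy(50))]:
        print(name, run_policy(trace, policy))
```

The point of the comparison is that neither "never" nor "always" minimizes energy and delay together once exit latency is charged, which is why an adaptive choice can improve the energy-delay product.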

SNU Open Repository and Archive

Implementation of Ultra-Low Latency and High-Speed Communication Channels for an FPGA-Based HPC Cluster

Author: Sanchez Correa Roberto
Publication date: 01/05/2017

An FPGA-based cluster benefits from the flexibility and performance potential of FPGA technology. Since price and power consumption are becoming increasingly important in the high-performance computing (HPC) market, multi-FPGA exploration is getting more popular each year. Network latency has failed to keep up with other improvements in computer performance. Complex communication stacks have sacrificed latency and increased overhead to achieve other goals, such as higher abstraction and functionality, and are often inefficient or unnecessary. Existing commercial off-the-shelf communication protocols and network interface controllers for FPGA-to-FPGA interconnection lack the performance to support time-critical applications and tightly coupled System-on-Chip partitioning, so custom communication approaches are preferred instead. In this work, ultra-low-latency, high-speed communication channels for an FPGA-based cluster are presented. The target platform consists of two BEE3 boards, each with four Virtex-5 FPGAs, for a total of eight FPGAs interconnected in a ring topology. Our approach exploits Multi-Gigabit Transceiver technology to achieve a reliable 8 Gbps channel bandwidth. The proposed communication IP supports data transfer among thousands of coprocessors over the network by means of a direct network implementation with hop-by-hop packet routing capability. Experimental results showed a latency of only 34 clock cycles between two neighboring nodes, one of the lowest reported in the literature. In addition, an architecture suitable for HPC is proposed that performs scalable, parallel, and distributed processing. For an 8-FPGA platform, the architecture provides 35.6 GB/s of off-chip memory throughput, 128 Gbps of aggregate network bandwidth, and 8.9 GFLOPS of computing power. A large, dense matrix-vector solver is partitioned and implemented across the cluster. We achieved competitive performance and computational efficiency thanks to the low communication overhead among the distributed processing elements. This work supports further research in compute-intensive parallel computing and enables the partitioning and scaling of large Systems-on-Chip on FPGA-based clusters.
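Hop-by-hop routing on the eight-node ring can be sketched as follows. This assumes a bidirectional ring where each node forwards a packet to the neighbor on the shorter side, and that end-to-end latency adds up linearly at the reported 34 cycles per hop; both are simplifying assumptions, and `ring_route` and `link_latency_cycles` are hypothetical helpers, not the IP's interface.

```python
def ring_route(src, dst, n_nodes):
    """Shortest-direction hop-by-hop route on a bidirectional ring.

    Returns the nodes visited after src, ending at dst (empty if src == dst).
    """
    if not (0 <= src < n_nodes and 0 <= dst < n_nodes):
        raise ValueError("node index out of range")
    cw = (dst - src) % n_nodes   # hops going clockwise
    ccw = (src - dst) % n_nodes  # hops going counter-clockwise
    step = 1 if cw <= ccw else -1
    path, node = [], src
    while node != dst:
        node = (node + step) % n_nodes  # each node forwards to one neighbor
        path.append(node)
    return path


def link_latency_cycles(src, dst, n_nodes, cycles_per_hop=34):
    # Assumes per-hop latency simply accumulates (no contention or queuing).
    return len(ring_route(src, dst, n_nodes)) * cycles_per_hop


if __name__ == "__main__":
    print(ring_route(0, 5, 8), link_latency_cycles(0, 5, 8))
```

On an 8-node ring the worst case under these assumptions is 4 hops, so choosing the shorter direction roughly halves the worst-case latency compared with a unidirectional ring.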

PolyPublie

A 10Gb/s Full On-chip Bang-Bang Clock and Data Recovery System Using an Adaptive Loop Bandwidth Strategy

Author: Jeon Hyung-Joon

As demand for higher-bandwidth I/O grows, the front-end design of serial links becomes critical for meeting stringent timing requirements over noisy, bandwidth-limited channels. Because the clock and data recovery (CDR) circuit reconstructs the clock in the receiver, the quality of its recovered clock largely determines receiver performance. However, because the incoming jitter is unknown, it is difficult to optimize the loop dynamics to minimize both steady-state and dynamic jitter. This thesis presents a 10 Gb/s adaptive-loop-bandwidth CDR circuit with an on-chip loop filter. The proposed system adaptively optimizes the loop bandwidth to minimize jitter, which improves jitter tolerance. The architecture tunes the loop bandwidth over a factor of eight based on the phase information of the incoming data. As a result, it performs as well as a CDR with the maximum fixed loop bandwidth when tracking fast-varying input jitter, and as well as a CDR with the minimum fixed loop bandwidth when suppressing wideband steady-state jitter. A mixed-mode predictor enables a high loop-bandwidth update rate with low power consumption. In addition, the typically large off-chip loop filter is integrated on chip using a capacitance multiplication technique that employs dual charge pumps. The functionality of the proposed architecture has been verified through schematic and behavioral-model simulations, which confirm that the proposed solution improves jitter tolerance and is robust to variations in the jitter profile. Its applicability to industrial standards is also verified: it passes the SONET OC-192 jitter tolerance requirement.
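One way to derive a bandwidth decision from bang-bang phase information is to watch run lengths of the phase detector output: long same-sign runs suggest the loop is lagging the input, while frequent sign flips suggest it is dithering around lock. The sketch below is a hypothetical rule under that assumption, not the thesis's mixed-mode predictor; only the 1x to 8x gain range comes from the text, and `run_threshold` is an illustrative parameter.

```python
def adapt_gain(pd_outputs, run_threshold=4, g_min=1, g_max=8):
    """Adjust a loop-gain multiplier from a stream of bang-bang PD outputs (+1/-1).

    Returns the gain in effect after each PD decision.
    """
    gain, run, prev = g_min, 0, None
    gains = []
    for s in pd_outputs:
        if s == prev:
            run += 1
        else:
            if prev is not None:
                gain = max(gain // 2, g_min)  # sign flip: dithering, narrow the bandwidth
            run = 1
        prev = s
        if run >= run_threshold:
            gain = min(gain * 2, g_max)  # long same-sign run: lagging, widen the bandwidth
            run = 0
        gains.append(gain)
    return gains


if __name__ == "__main__":
    slewing = [1] * 12           # input jitter the loop cannot yet follow
    locked = [1, -1] * 6         # alternating decisions around lock
    print(adapt_gain(slewing + locked))
```

Doubling and halving keeps the gain on power-of-two steps, which maps naturally onto binary-weighted charge-pump or digital gain settings, but that mapping is also an assumption of this sketch.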

Texas A&M Repository

Highly digital power efficient techniques for serial links

Author: Inti Rajesh
Publication venue: 'Oregon State University'

Low-power, high-speed serial transceivers are employed in a wide range of applications, including chip-to-chip, backplane, and optical interconnects. Besides handling a wide range of data rates, such transceivers should have low power consumption (mW/Gbps) and be fully integrated. This work discusses enabling techniques for implementing such transceivers. Specifically, three designs are presented: (1) a 0.5-4 Gbps serial link that uses current recycling to reduce power dissipation; (2) a 0.5-2.5 Gbps reference-less clock and data recovery circuit that uses a novel frequency detector to achieve unlimited acquisition range; and (3) a 2-4 Gbps low-power receiver architecture, capable of resolving multiple signaling formats, with a simplified XOR-based phase-rotating PLL. All three circuit topologies are highly digital and target a wide operating range and low power dissipation while remaining fully integrated. Measured results from the prototypes illustrate the effectiveness of the proposed design techniques.
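The mW/Gbps figure of merit used above is numerically identical to energy per bit in pJ/bit, since 1 mW / 1 Gb/s = 1e-3 J/s / 1e9 bit/s = 1e-12 J/bit. A minimal helper, with the example power and data-rate values chosen purely for illustration:

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Link energy efficiency in pJ/bit (numerically equal to mW/Gbps)."""
    if data_rate_gbps <= 0:
        raise ValueError("data rate must be positive")
    return power_mw / data_rate_gbps


if __name__ == "__main__":
    # Hypothetical operating points: the same 10 mW is efficient at 4 Gb/s
    # but poor at 0.5 Gb/s, which is why power should scale with data rate.
    for rate in (0.5, 2.0, 4.0):
        print(rate, "Gb/s ->", energy_per_bit_pj(10.0, rate), "pJ/bit")
```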

ScholarsArchive@OSU

Experimental Evaluation and Comparison of Time-Multiplexed Multi-FPGA Routing Architectures

Author: Kashif Asmeen
Publication venue: 'University of Windsor Leddy Library'
Publication date: 05/10/2017

Emulating large, complex designs requires multi-FPGA systems (MFS). However, inter-FPGA communication is hampered by a lack of interconnect capacity due to the limited number of FPGA input/output (I/O) pins. Serializing parallel signals onto a single trace effectively addresses this pin limitation. Besides the multiplexing scheme and the multiplexing ratio (number of inter-FPGA signals per trace), the choice of MFS routing architecture also affects the critical path latency. The routing architecture of an MFS is the interconnection pattern of FPGAs, fixed wires, and/or programmable interconnect chips. The performance of existing MFS routing architectures is also limited by the choice of off-chip interface. In this dissertation, we propose novel 2D and 3D latency-optimized time-multiplexed MFS routing architectures. We use a rigorous experimental approach and real sequential benchmark circuits to evaluate and compare the proposed and existing MFS routing architectures. This research provides new insight into the benefits of off-chip optical interfaces and three-dimensional MFS routing architectures. Vertical stacking results in shorter off-chip links, improving the overall system frequency with the additional advantage of a smaller footprint. The proposed 3D architectures employ serialized interconnect between intra-plane and inter-plane FPGAs to address the pin limitation problem. Additionally, all off-chip links are replaced by optical fibers, which improve latency and result in a faster MFS. Results indicate that exploiting the third dimension provides latency and area improvements compared with 2D MFS. We also propose latency-optimized planar 2D MFS architectures in which electrical interconnections are replaced by optical interfaces in the same spatial arrangement.
Performance evaluation and comparison show that the proposed architectures reduce critical path delay and improve system frequency compared with conventional MFS. We also experimentally evaluate and compare the system performance of three inter-FPGA communication schemes (logic multiplexing, SERDES, and MGT) in conjunction with two routing architectures (Completely Connected Graph (CCG) and TORUS). Experimental results show that SERDES attains a higher maximum frequency than the other two schemes. However, at very high multiplexing ratios, the performance of SERDES and MGT becomes comparable.
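The pin-count and frequency trade-off of time multiplexing can be seen in a first-order model: with a multiplexing ratio r, n signals need ceil(n / r) traces, but one system clock cycle must now fit r serialized transfers. This is a deliberately simplified sketch that ignores wire and logic delay on the critical path (which the dissertation measures); `overhead_cycles` is a hypothetical allowance for synchronization and latching.

```python
import math


def traces_needed(n_signals, mux_ratio):
    """Physical traces needed when mux_ratio inter-FPGA signals share one trace."""
    return math.ceil(n_signals / mux_ratio)


def max_system_freq_mhz(transfer_clock_mhz, mux_ratio, overhead_cycles=0):
    """First-order bound: one system cycle must fit mux_ratio transfers plus overhead."""
    return transfer_clock_mhz / (mux_ratio + overhead_cycles)


if __name__ == "__main__":
    # Hypothetical: 1000 cut signals, 400 MHz transfer clock.
    for r in (1, 4, 8, 16):
        print(r, traces_needed(1000, r), max_system_freq_mhz(400, r, overhead_cycles=2))
```

Under this model, raising the ratio saves pins at the cost of system frequency, so the faster per-trace transfer of SERDES or MGT links is what keeps high multiplexing ratios practical.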

Scholarship at UWindsor

A LPDDR4 MEMORY CONTROLLER DESIGN WITH EYE CENTER DETECTION ALGORITHM

Author: 홍기문
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2016

Thesis (Ph.D.), Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2016. Advisor: 김수환.

The demand for higher bandwidth with reduced power consumption in mobile memory is increasing. In this thesis, an LPDDR4 memory controller architecture that operates with LPDDR4 memory is proposed and designed, together with an efficient training algorithm suited to this architecture for memory training and verification. The LPDDR4 memory specification covers operating speeds from 533 Mbps to 4266 Mbps, and the controller is designed to support that full range. The phase-locked loop (PLL) in the controller is designed to operate between 1333 MHz and 2133 MHz; to cover the LPDDR4 range, a selectable frequency divider provides the operating clock, so the PLL output frequency after the divider spans 266 MHz to 2133 MHz. The delay-locked loop (DLL) in the controller is designed to operate between 266 MHz and 2133 MHz with 180° phase locking. The DLL is used in each training operation: command training, read training, and write training. An eye center detection algorithm is used to complete each training stage, and the circuits it requires, such as the delay line, phase interpolator, and reference generator, are designed and validated. The proposed 1x2y3x eye center detection algorithm is 23 times faster than the conventional two-dimensional eye center detection algorithm and is simple to implement. Implemented in a 65 nm CMOS process, the proposed LPDDR4 memory controller occupies 12 mm². The controller is verified with commodity LPDDR4 memory, including the full training sequence: power-on, initialization, boot-up, command training, write leveling, read training, and write training. The low-voltage swing terminated logic driver and several other functions, including write leveling and data transmission, are verified at 4266 Mbps, and complete LPDDR4 memory controller operation is verified from 566 Mbps to 1600 Mbps. The proposed eye center detection algorithm is verified from 566 Mbps to 2843 Mbps.

Contents: Chapter 1 Introduction (1.1 Motivation; 1.2 Introduction; 1.3 Thesis Organization); Chapter 2 LPDDR4 Memory Controller Design (2.1 Difference Between LPDDR3 and LPDDR4 Memory: 2.1.1 Architectural Difference Between LPDDR3 and LPDDR4 Memory, 2.1.2 Source Synchronous Matched Scheme and Unmatched Scheme, 2.1.3 Low Voltage Swing Terminated Logic Driver and Termination Scheme; 2.2 LPDDR4 Memory Controller Specification; 2.3 Design Procedure); Chapter 3 LPDDR4 Memory Controller Architecture Based on Memory Training (3.1 LPDDR4 Memory Training Sequence; 3.2 LPDDR4 Memory Training Eye Detection Algorithm: 3.2.1 Eye Center Detection, 3.2.2 1x2y3x Eye Center Detection Algorithm; 3.3 LPDDR4 Memory Controller Design Based on Memory Training: 3.3.1 Architecture for Memory Boot Up and Power Up, 3.3.2 Clock Path Architecture and Clock Tree, 3.3.3 Command Training and Command Path Architecture, 3.3.4 Write Leveling and Data Strobe Transmission Path Architecture, 3.3.5 Read Training and Read Path Architecture, 3.3.6 Write Training and Write Path Architecture, 3.3.7 Normal Read/Write Operation and Margin Test); Chapter 4 LPDDR4 Memory Controller Architecture Modeling and Circuit Design (4.1 Overall LPDDR4 Memory Controller Architecture Modeling; 4.2 Simulation Result of LPDDR4 Memory Controller Modeling; 4.3 LPDDR4 Memory Controller Circuit Design: 4.3.1 Phase-Locked Loop, 4.3.2 Delay-Locked Loop, 4.3.3 Transmitter of LPDDR4 Memory Controller: Write Path, 4.3.4 De-Serializer with Clock Domain Crossing); Chapter 5 Measurement Result of LPDDR4 Memory Controller (5.1 LPDDR4 Memory Controller Measurement Setup: 5.1.1 LPDDR4 Memory Controller Floor Plan and Layout, 5.1.2 Package and Test Board; 5.2 LPDDR4 Memory Controller Sub-Block Measurement: 5.2.1 Phase-Locked Loop, 5.2.2 Delay-Locked Loop, 5.2.3 200 ps and 800 ps Delay Line, 5.2.4 Voltage Reference Generator, 5.2.5 Phase Interpolator; 5.3 LPDDR4 Memory System Operation Measurement); Chapter 6 Conclusion; Appendix: Operation Flow Chart of the Proposed LPDDR4 Memory Controller; Bibliography; Korean Abstract.
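The speed-up from replacing a full two-dimensional eye scan with a few one-dimensional sweeps can be illustrated as follows. This is one plausible reading of the name "1x2y3x" (sweep time at a starting reference voltage, then voltage at the resulting time center, then time again at the voltage center); the thesis's actual procedure may differ, and `passes_at` is a hypothetical pass/fail test at one (delay tap, reference-voltage code) setting.

```python
def center_of_pass_window(passes):
    """Center index of the longest contiguous run of True values, or None."""
    best_start, best_len, start = None, 0, None
    for i, ok in enumerate(list(passes) + [False]):  # sentinel closes the last run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


def eye_center_1x2y3x(passes_at, n_t, n_v, v_start):
    """Three 1-D sweeps; returns ((t_center, v_center), number_of_tests)."""
    tests = 0

    def probe(t, v):
        nonlocal tests
        tests += 1
        return passes_at(t, v)

    t_c = center_of_pass_window([probe(t, v_start) for t in range(n_t)])  # 1: x sweep
    if t_c is None:
        return None, tests
    v_c = center_of_pass_window([probe(t_c, v) for v in range(n_v)])      # 2: y sweep
    t_c = center_of_pass_window([probe(t, v_c) for t in range(n_t)])      # 3: x again
    return (t_c, v_c), tests


if __name__ == "__main__":
    # Synthetic diamond-shaped eye centered at (t=20, v=10) on a 41 x 21 grid.
    def diamond(t, v):
        return abs(t - 20) * 10 + abs(v - 10) * 20 < 200

    print(eye_center_1x2y3x(diamond, 41, 21, v_start=12), "vs", 41 * 21, "tests for a full scan")
```

On this synthetic eye the three sweeps need 103 tests instead of 861; the ratio on real hardware depends on grid size and is not the 23x figure reported in the thesis.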

SNU Open Repository and Archive

Characterization and optimization of the prototype DEPFET modules for the Belle II Pixel Vertex Detector

Author: Müller Felix
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 19/07/2017

The Belle detector was located at the electron-positron collider KEKB at KEK in Tsukuba, Japan. KEKB operated from 1999 to 2010, running mostly at the Y(4S) resonance (10.58 GeV), and achieved an integrated luminosity of 1041 fb^-1. The main research topic was CP violation in the B meson system. The measured results on B meson decays confirmed the theory of Kobayashi and Maskawa (Nobel Prize 2008) on the origin of CP violation within the Standard Model. Since the Standard Model nevertheless leaves many questions open, KEKB is being upgraded to SuperKEKB, which has the potential to find New Physics beyond the Standard Model. SuperKEKB will increase the world-record instantaneous luminosity of KEKB by a factor of 40 to 8x10^35 cm^-2 s^-1 using the nano-beam scheme, in which the beams are squeezed to about 50 nm at the interaction point. The physics goals are the precise measurement of CP violation, the search for rare or even "forbidden" B meson decays, and the detection of small deviations from the Standard Model with larger statistics and more precise measurements than ever before. To cope with the high luminosity of SuperKEKB, various components of Belle need to be upgraded, forming the Belle II detector.
At high luminosity, not only the number of events but also the background increases, in particular from the unavoidable two-photon process. To minimize the extrapolation errors of the B meson decay vertices, the vertex detector should be placed as close as possible to the beam pipe. A silicon strip detector, as used in Belle, cannot cope with the high background at that small distance. Therefore, a novel pixel vertex detector (PXD) will be installed, featuring monolithic sensors based on DEPFET (DEPleted p-channel Field Effect Transistor) technology. The sensors can be thinned to only 75 µm to minimize multiple scattering, offer a high signal-to-noise ratio, provide a high intrinsic position resolution of ~15 µm, support fast readout within 20 µs, and have low power consumption. The PXD consists of 40 sensor modules, each equipped with 14 custom-made ASICs for control and readout, mounted in two layers around the beam pipe. This thesis focuses on the characterization and optimization of the first full-size prototypes of the final PXD sensor modules. The combined control and readout electronics were investigated, improved, and optimized on prototype modules equipped with the complete set of ASICs: six Switchers per module enable the pixel rows one after another (rolling shutter mode) to measure the signal-amplified drain currents of the DEPFETs and to reset the pixels. A total of 1000 ADCs on each module sample the drain currents, resulting in a PXD readout frequency of 50 kHz. Switcher control sequences were simulated and applied to the prototypes to drive the pixels correctly. System-related aspects such as inter-ASIC communication, control sequences, and synchronization issues were studied and optimized. Measurements with radioactive sources and lasers were performed to determine the optimal voltages for the different operation modes.
The 20 µs rolling shutter readout is problematic when transient, intermittent high background is present, for instance during top-up injection at SuperKEKB. To address this issue, a new operation mode is proposed and investigated that allows a "gated" (shutter-controlled) operation of the detector, making it blind for a short interval of 1-2 µs in which high background is expected. A prototype module was operated in this Gated Mode; the causes of the problems encountered were identified, and the resulting improvements were incorporated into the final module layout. Two different kinds of prototype modules were operated successfully in a beam test campaign. The studied cluster charge distributions, position resolutions, and efficiencies show that the sensors are well suited for operation at Belle II.
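The readout timing figures quoted above are related by simple arithmetic: a 20 µs rolling-shutter frame gives a 50 kHz readout frequency, and a 1-2 µs gate blinds 5-10% of one frame. The sketch below encodes those relations; the number of row groups `n_groups` is a hypothetical parameter, since the abstract does not state how many rows are read per Switcher step.

```python
FRAME_TIME_US = 20.0  # full rolling-shutter readout time quoted in the abstract


def readout_frequency_khz(frame_time_us=FRAME_TIME_US):
    """Frame rate in kHz: 1 / (20 us) = 50 kHz."""
    return 1e3 / frame_time_us


def active_row_group(t_us, n_groups, frame_time_us=FRAME_TIME_US):
    """Index of the row group being read at time t (rows enabled one after another)."""
    slot_us = frame_time_us / n_groups
    return int((t_us % frame_time_us) // slot_us)


def blind_fraction(gate_us, frame_time_us=FRAME_TIME_US):
    """Fraction of one frame during which a gated detector is insensitive."""
    return gate_us / frame_time_us


if __name__ == "__main__":
    print(readout_frequency_khz(), "kHz")
    print([blind_fraction(g) for g in (1.0, 2.0)])
    print(active_row_group(25.0, n_groups=4))  # hypothetical 4 row groups
```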

Performance enhancement techniques for low power digital phase locked loops

Author: Elshazly Amr
Publication venue: 'Oregon State University'

The desire for low-power, high-performance computing has been at the core of the symbiotic union between digital circuits and CMOS scaling. While digital circuit performance improves with device scaling, analog circuits have not gained these benefits. As a result, it has become necessary to leverage increased digital circuit performance to mitigate analog circuit deficiencies in nanometer-scale CMOS in order to realize world-class analog solutions. This thesis discusses both circuit and system enhancement techniques that improve the performance of clock generators. The following techniques are presented: (1) a digital PLL (DPLL) that employs an adaptive and highly efficient way to cancel the effect of supply noise; (2) a supply-regulated DPLL that uses a low-power regulator and improves supply noise rejection; (3) a digital multiplying DLL that obviates the need for a high-resolution time-to-digital converter (TDC) while achieving sub-picosecond jitter and excellent supply noise immunity; and (4) a high-resolution TDC based on a switched ring oscillator. Measured results from the prototype chips illustrate the proposed design techniques.
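The benefit of a ring-oscillator-based TDC that preserves its phase between measurements is that the quantization residue of one measurement carries into the next, so the error is first-order noise shaped. The model below is a simplification of the switched-ring-oscillator idea: it assumes the oscillator stops completely between measurements (gated-oscillator behavior) and has an ideal, constant period; `gated_ro_tdc` is a hypothetical name, not the thesis's circuit.

```python
import math


def gated_ro_tdc(intervals_ps, ro_period_ps):
    """Behavioral ring-oscillator TDC whose phase is never reset.

    The oscillator advances only during each input interval and holds its
    phase otherwise; each output is the number of new edges counted.
    """
    phase = 0.0   # accumulated oscillator phase, in periods
    counted = 0   # edges already reported in earlier measurements
    out = []
    for t in intervals_ps:
        phase += t / ro_period_ps
        total = math.floor(phase)
        out.append(total - counted)  # residue (phase - total) carries over
        counted = total
    return out


if __name__ == "__main__":
    # A 12.5 ps interval with a 10 ps period: single readings are 1 or 2,
    # but their average converges to 1.25 periods.
    codes = gated_ro_tdc([12.5] * 8, 10.0)
    print(codes, sum(codes) / len(codes))
```

Because the residue is never discarded, the running sum of output codes stays within one count of the true accumulated interval, so averaging the codes resolves intervals finer than one oscillator period.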

ScholarsArchive@OSU