816 research outputs found

    Synthesis of Clock Gating Based on Accurate and Learning-Based Power Analysis

    Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2023. 2. Taewhan Kim.
    In this paper, we introduce two techniques to efficiently apply clock gating in the synthesis stage. First, we propose a new clock gating methodology based on a precise power saving analysis to overcome the ineffectiveness of conventional clock gating based on logic structure. Two new features exploited in our proposed clock gating are (i) the multiplexer selection signal probability, i.e., the probability that a flip-flop with a multiplexer feedback loop receives a new input, and (ii) the joint probability of selection signals, i.e., the probability that two flip-flops with different multiplexer selection signals both receive new inputs in the same clock cycle. In summary, our method reduces the total power consumption by 2.46% on average (up to 5.00%) over the conventional clock gating method. In the second work, we address a new problem of transforming the long toggling/untoggling sequences of flip-flops' cycle-accurate activities into short embedding vectors, so that flip-flop grouping for clock gating is practically feasible in terms of the memory usage and run time needed to check activity similarity among flip-flops. To this end, we propose a machine learning based generation of embedding vectors which are accurate enough to predict the original flip-flop toggling sequences. Precisely, we develop a neural network model consisting of an LSTM (long short-term memory) based AE (autoencoder) combined with an SDAE (stacked denoising autoencoder) to take into account the time-series (i.e., clock cycle) similarity among the toggling sequences, which is essential to determine which flip-flops should be grouped together for clock gating.
Building on (1) our LSTM-based embedding vector generation model, we propose two additional ML models for clock gating: (2) a joint state probability predictor (JSP) model for generating the 0-state probability of two embedding vectors, and (3) a joint feature predictor (JFP) model for generating a new embedding vector that combines two embedding vectors. Experiments confirm that our proposed LSTM combined with the autoencoder improves the toggling sequence prediction accuracy up to 0.88, while an LSTM (long short-term memory) based AE model reaches an accuracy of 0.72. This enables our ML-based clock gating framework to save more dynamic power than the state-of-the-art commercial clock gating tool, which relies on the flip-flops' toggling probability for grouping flip-flops. Experiments with IWLS benchmark circuits show that our method reduces the dynamic power by 14.0% on average over the conventional toggling-driven clock gating.
[Korean abstract, translated] In this paper, we introduce two techniques to efficiently apply clock gating in the synthesis stage. First, to overcome the inefficiency of conventional clock gating based on logic structure, we propose a new clock gating methodology based on a precise power saving analysis. The two new features used in the proposed clock gating method are (i) the multiplexer selection signal probability of a flip-flop with a feedback loop and (ii) the joint probability of the multiplexer selection signals of two flip-flops that have different multiplexer selection signals. We aim to reduce the overall dynamic power by applying clock gating only where it yields a power gain and by merging different clock gating groups. Experiments confirm that the method reduces total power consumption by 2.46% on average (up to 5.00%) compared with the conventional clock gating method.
Second, we solve the problem of transforming the long toggling/untoggling sequences that represent the per-clock-cycle state of flip-flops into short embedding vectors. Applying this to flip-flop grouping for toggling-based clock gating makes checking the state similarity among flip-flops practically feasible in terms of memory usage and run time. To this end, we propose a machine learning based generation of low-dimensional embedding vectors that are accurate enough to predict the original flip-flop toggle sequences. To take the time-series similarity among toggling sequences into account, we develop a neural network model that uses a denoising autoencoder to compress toggling sequences of 5000 clock cycles into 10 dimensions and feeds the result into a long short-term memory autoencoder, producing a low-dimensional embedding vector that represents the entire sequence. We also propose two additional neural network models for clock gating: (1) a joint probability prediction model for generating the 0-state probability of two embedding vectors and (2) a joint feature prediction model that predicts a new embedding vector combining two embedding vectors. Experiments with IWLS benchmark circuits confirm that combining the long short-term memory based autoencoder reconstructs the input data more accurately than using the denoising autoencoder alone.
We also confirm that our method can reduce dynamic power by 14.0% on average compared with conventional toggling-based clock gating.
1 Selective Clock Gating Based on Comprehensive Power Saving Analysis -- 1.1 Introduction -- 1.2 Preliminary and Motivation -- 1.3 Selective Clock Gating -- 1.3.1 Concept of Selective Clock Gating -- 1.3.2 Joint probability of selection signals -- 1.4 Experimental Results -- 1.4.1 Experimental Setup -- 1.4.2 Experimental Result -- 1.5 Conclusion
2 Machine Learning Based Flip-Flop Grouping for Toggling Driven Clock Gating -- 2.1 Introduction -- 2.2 Preliminaries and Prior Works -- 2.2.1 Preliminary and Motivation -- 2.2.2 Prior Works -- 2.3 Machine Learning Based Clock Gating Framework -- 2.3.1 Primary Model: Embedding Vector Generation -- 2.3.2 Secondary Models: Joint State Probability and Joint Feature Prediction -- 2.3.3 Distance Analysis Between Embedding Vectors -- 2.3.4 Power Analysis Model -- 2.3.5 Overall Flow of Flip-flop Grouping -- 2.4 Experimental Results -- 2.4.1 Comparison of Dynamic Power Saving -- 2.4.2 Performance of Auto-encoder Reconstruction Model -- 2.5 Conclusion
Abstract (In Korean)
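
The two quantities named in the clock gating abstract above, the selection-signal probability of one flip-flop and the joint probability for a pair of flip-flops, can be illustrated with a small Python sketch. This is not the thesis's implementation: the function names, the per-cycle 0/1 trace format, and the cost model in `gating_saves_power` (a fixed clock power per flip-flop and a fixed overhead for the gating cell) are assumptions made purely for illustration.

```python
from typing import Sequence


def selection_probability(sel: Sequence[int]) -> float:
    """Fraction of cycles in which the mux select is 1 (the flip-flop loads a new input)."""
    return sum(sel) / len(sel)


def joint_selection_probability(sel_a: Sequence[int], sel_b: Sequence[int]) -> float:
    """Fraction of cycles in which both flip-flops load a new input in the same cycle."""
    if len(sel_a) != len(sel_b):
        raise ValueError("traces must cover the same number of clock cycles")
    return sum(a & b for a, b in zip(sel_a, sel_b)) / len(sel_a)


def shared_enable_probability(sel_a: Sequence[int], sel_b: Sequence[int]) -> float:
    """Probability that a clock gate shared by both flip-flops must pass the clock.

    The shared enable is the OR of the two selects, so by inclusion-exclusion
    P(a or b) = P(a) + P(b) - P(a and b). This is where the joint probability enters.
    """
    return (selection_probability(sel_a)
            + selection_probability(sel_b)
            - joint_selection_probability(sel_a, sel_b))


def gating_saves_power(p_enable: float, n_ffs: int,
                       ff_clock_power: float, gate_overhead: float) -> bool:
    """Hypothetical criterion: gate only if the clock power saved in idle cycles
    exceeds the overhead of the inserted gating cell (illustrative cost model)."""
    saved = (1.0 - p_enable) * n_ffs * ff_clock_power
    return saved > gate_overhead


if __name__ == "__main__":
    # Made-up select traces over 8 clock cycles (1 = load a new value).
    sel_a = [1, 0, 0, 1, 0, 0, 0, 1]
    sel_b = [1, 0, 0, 0, 0, 1, 0, 1]
    p_shared = shared_enable_probability(sel_a, sel_b)
    print(selection_probability(sel_a), joint_selection_probability(sel_a, sel_b), p_shared)
    print(gating_saves_power(p_shared, n_ffs=2, ff_clock_power=1.0, gate_overhead=0.8))
```

With these made-up traces, the two flip-flops would share a gate that is enabled in half of the cycles, and the toy criterion accepts the merged group. A real flow would take the probabilities from simulation activity and the power numbers from the cell library.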

    STATISTICAL MACHINE LEARNING BASED MODELING FRAMEWORK FOR DESIGN SPACE EXPLORATION AND RUN-TIME CROSS-STACK ENERGY OPTIMIZATION FOR MANY-CORE PROCESSORS

    The complexity of many-core processors continues to grow as a larger number of heterogeneous cores are integrated on a single chip. Such systems-on-chip contain computing structures ranging from complex out-of-order cores and simple in-order cores to digital signal processors (DSPs), graphics processing units (GPUs), application-specific processors, hardware accelerators, I/O subsystems, network-on-chip interconnects, and large caches arranged in complex hierarchies. While the industry focus is on putting a higher number of cores on a single chip, the key challenge is to optimally architect these many-core processors such that performance, energy and area constraints are satisfied. The traditional approach to processor design through extensive cycle-accurate simulations is ill-suited for designing many-core processors due to the large microarchitecture design space that must be explored. Additionally, it is hard to optimize such complex processors and the applications that run on them statically at design time such that performance and energy constraints are met under dynamically changing operating conditions. The dissertation establishes a statistical machine learning based modeling framework that enables the efficient design and operation of many-core processors that meet performance, energy and area constraints. We apply the proposed framework to rapidly design the microarchitecture of a many-core processor for multimedia, computer graphics rendering, finance, and data mining applications derived from the Parsec benchmark. We further demonstrate the application of the framework in the joint run-time adaptation of both the application and microarchitecture such that energy availability constraints are met.

    Recent Trends in Communication Networks

    In recent years there have been many developments in communication technology. These have greatly enhanced the computing power of small, handheld, resource-constrained mobile devices. Different generations of communication technology have evolved. This has led to new research on communicating large volumes of data over different transmission media and on the design of different communication protocols. Another direction of research concerns secure and error-free communication between sender and receiver despite the possible presence of an eavesdropper. To meet the communication requirements of huge amounts of multimedia streaming data, a lot of research has been carried out on the design of suitable overlay networks. The book addresses new research techniques that have evolved to handle these challenges.

    On Energy Efficient Computing Platforms

    In accordance with Moore's law, the increasing number of on-chip integrated transistors has enabled modern computing platforms with not only higher processing power but also more affordable prices. As a result, these platforms, including portable devices, workstations and data centres, are becoming an inevitable part of human society. However, with the demand for portability and the rising cost of power, energy efficiency has emerged as a major concern for modern computing platforms. As the complexity of on-chip systems increases, Network-on-Chip (NoC) has proven to be an efficient communication architecture which can further improve system performance and scalability while reducing the design cost. Therefore, in this thesis, we study and propose energy optimization approaches based on the NoC architecture, with special focus on the following aspects. As the architectural trend of future computing platforms, 3D systems have many benefits including higher integration density, smaller footprint, heterogeneous integration, etc. Moreover, 3D technology can significantly improve network communication and effectively avoid long wirings, and therefore provide higher system performance and energy efficiency. With the dynamic nature of on-chip communication in large-scale NoC-based systems, run-time system optimization is of crucial importance in order to achieve higher system reliability and, essentially, energy efficiency. In this thesis, we propose an agent-based system design approach where agents are on-chip components which monitor and control system parameters such as supply voltage, operating frequency, etc. With this approach, we have analysed the implementation alternatives for dynamic voltage and frequency scaling and power gating techniques at different granularities, which reduce both dynamic and leakage energy consumption. Topologies, being one of the key factors for NoCs, are also explored for energy saving purposes.
A Honeycomb NoC architecture is proposed in this thesis with turn-model based deadlock-free routing algorithms. Our analysis and simulation-based evaluation show that Honeycomb NoCs outperform their Mesh-based counterparts in terms of network cost, system performance and energy efficiency. Transferred from Doria.
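
As background for the dynamic voltage and frequency scaling mentioned in the abstract above, the following Python sketch uses the standard first-order CMOS switching-power model, P = a * C * V^2 * f. It is a textbook approximation, not the thesis's power model; the function names and the example scaling factors are assumptions chosen for illustration, and leakage (which power gating targets) is deliberately left out.

```python
def dynamic_power(activity: float, capacitance: float,
                  voltage: float, frequency: float) -> float:
    """First-order switching power: activity factor * capacitance * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency


def dvfs_power_ratio(v_scale: float, f_scale: float) -> float:
    """Dynamic power after scaling voltage and frequency, relative to the baseline."""
    return v_scale ** 2 * f_scale


def dvfs_energy_per_op_ratio(v_scale: float) -> float:
    """Dynamic energy per operation scales with V^2 only: lowering f alone
    stretches the run time without reducing switching energy per operation."""
    return v_scale ** 2


if __name__ == "__main__":
    # Hypothetical operating point: 80% of nominal voltage, 70% of nominal frequency.
    print(dvfs_power_ratio(0.8, 0.7))
    print(dvfs_energy_per_op_ratio(0.8))
```

The quadratic dependence on voltage is the reason voltage is usually lowered together with frequency rather than scaling frequency alone.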

    On-chip Voltage Regulator – Circuit Design and Automation

    Title from PDF of title page, viewed May 24, 2021. Dissertation advisors: Masud H Chowdhury and Yugyung Lee. Vita. Includes bibliographical references (pages 106-121). Thesis (Ph.D.)--School of Computing and Engineering. University of Missouri--Kansas City, 2021.
    With the increase in density and complexity of high-performance integrated circuits and systems, including many-core chips and system-on-chip (SoC), it is becoming difficult to meet the power delivery and regulation requirements with off-chip regulators. Off-chip regulators become a less attractive choice because of the higher overheads and complexity imposed by the additional wires, pins, and pads. The increased I²R loss makes it challenging to maintain the integrity of different voltage domains under a lower supply voltage environment in the smaller technology nodes. Fully integrated on-chip voltage regulators have proven to be an effective solution to mitigate power delivery and integrity issues. Two types of regulators are considered most promising for on-chip implementation: (i) the low-drop-out (LDO) regulator and (ii) the switched-capacitor (SC) regulator. The first part of our research mainly focused on the LDO regulator. Inspired by the recent surge of interest in cap-less voltage regulators, we presented two fully on-chip external capacitor-less low-dropout voltage regulator designs. The second part of this proposal explores the complexity of designing each block of the regulator/analog circuit and proposes a design methodology for analog circuit synthesis using a simulation and learning-based approach. As the complexity of analog circuits increases day by day, a hierarchical flow is mostly used for design automation. In this work, we focused mainly on the circuit level, one of the significant steps in the flow. We presented a novel, efficient circuit synthesis flow based on simulation and learning-based optimization methods.
The proposed methodology has two phases: the learning phase and the evaluation phase. Random forest, a supervised learning method, is used to reduce the number of sample points in the design space and the number of iterations during the learning phase. Additionally, symmetric constraints are used to further reduce the number of iterations during the sizing process. We introduced a three-step circuit synthesis flow to automate analog circuit design. We used H-spice as the simulation tool during the evaluation phase of the proposed methodology. Three of the most common analog circuits were chosen to verify the algorithm: a single-stage differential amplifier, an operational transconductance amplifier, and a two-stage differential amplifier. The tool is developed in Python, and the technology we used is 0.6 µm. We also verified the optimized result in Cadence Virtuoso.
Introduction -- On-chip power delivery system -- Fundamentals of on-chip voltage regulator -- LDO design in 45 nm technology -- LDO design in technology -- Analog design automation -- Proposed analog design methodology -- Energy efficient FDSOI and FINFET based power gating circuit using data retention transistor -- Conclusion and future work

    Molecular-genetic analysis of natural variation in photoperiodic flowering of Arabidopsis thaliana

    In Arabidopsis thaliana, the focus of my research, three developmental switches controlling the life cycle can be recognised. The first is germination, which separates embryonic from post-embryonic development. The second signals the transition from the juvenile to the adult vegetative phase, while the third, flowering, marks the initiation of the reproductive phase (Isabel Baurle and Caroline Dean, Cell 2006). All three exhibit both external (environmental) and endogenous (hormonal) regulation. Natural genetic variation, namely phenotypic diversity due to genetic differences between individuals of the same species, has been reported both for germination and flowering initiation (Bentsink et al., PNAS 2006; O'Neill et al., TAG 2008). Since individuals of Arabidopsis, commonly referred to as accessions, are collected from a variety of locations, it is believed that this genetic diversity reflects differences in the seasonal oscillations of environmental cues among the collection sites, leading to local adaptation. Although natural genetic variation has been used as a tool in the study of flowering initiation in Arabidopsis (Alonso-Blanco and Maarten Koornneef, Trends in Plant Science 2000), a systematic survey that focuses mainly on the photoperiodic aspect of this regulation has been lacking. In order to expand the current knowledge, two approaches were designed. First, a survey for natural genetic variation in the flowering responses of phylogenetically distant Arabidopsis accessions under six different photoperiods was made. In parallel, the transgenic equivalents of the same accessions, carrying a promoter fusion of the flowering time and circadian clock gene GIGANTEA (GI), were screened in the same photoperiods as for flowering time in order to detect for the first time trans-specific natural variation in the circadian regulation of an evening gene.
Here I present evidence that natural genetic variation is present in a wide range of photoperiods, both for the circadian clock and for flowering initiation per se. The flowering time responses are compared with those of mutants and transgenic lines of previously identified flowering time genes, and I show that the affected known genes cannot fully cover the different patterns of day length discrimination that the natural accessions exhibit. Five different mapping populations were constructed by selecting interesting accessions from both screens, which led to the identification of new as well as known QTL that alter various circadian and flowering responses between short and long days of similar duration. Generating advanced genetic material allows fine mapping and eventually cloning of some of the loci, while identification of genome-wide patterns of genetic interactions reveals additional loci that classical QTL mapping approaches cannot detect. Using RT-PCR and in situ hybridisation, I link this novel natural genetic variation between similar long day lengths with molecular variability in the temporal and spatial expression of the flowering time genes FT and SOC1, thereby also demonstrating the tight dependence of the SAM floral commitment on the FT florigen. Finally, I show that in nature, genetic variability in the property of enhanced photoperiod discrimination under similar long days is enough to prevent winter flowering in a plant without any requirement for vernalization. Cologne, 200