16 research outputs found

    Datacenter Design for Future Cloud Radio Access Network.

    Full text link
    Cloud radio access network (C-RAN), an emerging cloud service that combines the traditional radio access network (RAN) with cloud computing technology, has been proposed as a solution to handle the growing energy consumption and cost of the traditional RAN. Through aggregating baseband units (BBUs) in a centralized cloud datacenter, C-RAN reduces energy and cost, and improves wireless throughput and quality of service. However, designing a datacenter for C-RAN has not yet been studied. In this dissertation, I investigate how a datacenter for C-RAN BBUs should be built on commodity servers. I first design WiBench, an open-source benchmark suite containing the key signal processing kernels of many mainstream wireless protocols, and study its characteristics. The characterization study shows that there is abundant data level parallelism (DLP) and thread level parallelism (TLP). Based on this result, I then develop high performance software implementations of C-RAN BBU kernels in C++ and CUDA for both CPUs and GPUs. In addition, I generalize the GPU parallelization techniques of the Turbo decoder to the trellis algorithms, an important family of algorithms that are widely used in data compression and channel coding. Then I evaluate the performance of commodity CPU servers and GPU servers. The study shows that the datacenter with GPU servers can meet the LTE standard throughput with 4ร— to 16ร— fewer machines than with CPU servers. A further energy and cost analysis show that GPU servers can save on average 13ร— more energy and 6ร— more cost. Thus, I propose the C-RAN datacenter be built using GPUs as a server platform. Next I study resource management techniques to handle the temporal and spatial traffic imbalance in a C-RAN datacenter. I propose a โ€œhill-climbingโ€ power management that combines powering-off GPUs and DVFS to match the temporal C-RAN traffic pattern. Under a practical traffic model, this technique saves 40% of the BBU energy in a GPU-based C-RAN datacenter. For spatial traffic imbalance, I propose three workload distribution techniques to improve load balance and throughput. Among all three techniques, pipelining packets has the most throughput improvement at 10% and 16% for balanced and unbalanced loads, respectively.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/120825/1/qizheng_1.pd

    Turbo-Entzerrung: Implementierungsaspekte fรผr Software Defined Radios

    Get PDF
    Damit mobile Funkkommunikation mit SDR-basierten Funksystemen zuverlรคssig mithilfe eines Turbo-Entzerrers in Echtzeit durchgefรผhrt werden kann, muss dessen Implementierung an die Hardware eines SDRs angepasst werden. Die Arbeit befasst sich mit den Implementierungsaspekten der Turbo-Entzerrung fรผr SDRs und analysiert die Parallelisierbarkeit und die Festkommaaspekte des Verfahrens im Detail. Die Ergebnisse kรถnnen auf ein breites Spektrum von Prozessoren oder Logiken angewendet werden

    Reconfigurable Antenna Systems: Platform implementation and low-power matters

    Get PDF
    Antennas are a necessary and often critical component of all wireless systems, of which they share the ever-increasing complexity and the challenges of present and emerging trends. 5G, massive low-orbit satellite architectures (e.g. OneWeb), industry 4.0, Internet of Things (IoT), satcom on-the-move, Advanced Driver Assistance Systems (ADAS) and Autonomous Vehicles, all call for highly flexible systems, and antenna reconfigurability is an enabling part of these advances. The terminal segment is particularly crucial in this sense, encompassing both very compact antennas or low-profile antennas, all with various adaptability/reconfigurability requirements. This thesis work has dealt with hardware implementation issues of Radio Frequency (RF) antenna reconfigurability, and in particular with low-power General Purpose Platforms (GPP); the work has encompassed Software Defined Radio (SDR) implementation, as well as embedded low-power platforms (in particular on STM32 Nucleo family of micro-controller). The hardware-software platform work has been complemented with design and fabrication of reconfigurable antennas in standard technology, and the resulting systems tested. The selected antenna technology was antenna array with continuously steerable beam, controlled by voltage-driven phase shifting circuits. Applications included notably Wireless Sensor Network (WSN) deployed in the Italian scientific mission in Antarctica, in a traffic-monitoring case study (EU H2020 project), and into an innovative Global Navigation Satellite Systems (GNSS) antenna concept (patent application submitted). The SDR implementation focused on a low-cost and low-power Software-defined radio open-source platform with IEEE 802.11 a/g/p wireless communication capability. In a second embodiment, the flexibility of the SDR paradigm has been traded off to avoid the power consumption associated to the relevant operating system. Application field of reconfigurable antenna is, however, not limited to a better management of the energy consumption. The analysis has also been extended to satellites positioning application. A novel beamforming method has presented demonstrating improvements in the quality of signals received from satellites. Regarding those who deal with positioning algorithms, this advancement help improving precision on the estimated position

    Design of large polyphase filters in the Quadratic Residue Number System

    Full text link

    ์žฌ๊ตฌ์„ฑํ˜• ๊ตฌ์กฐ์—์„œ์˜ ํšจ์œจ์ ์ธ ์กฐ๊ฑด์‹คํ–‰ ๊ธฐ๋ฒ•

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2013. 8. ์ตœ๊ธฐ์˜.์žฌ๊ตฌ์„ฑํ˜• ๊ตฌ์กฐ๋Š” ์—ฐ์‚ฐ๋Ÿ‰์ด ๋งŽ์€ ํ”„๋กœ๊ทธ๋žจ์„ ๋‚ด์žฅํ˜• ์‹œ์Šคํ…œ์—์„œ ๊ฐ€์†์‹œํ‚ค๋Š” ๋ฐ ์ ํ•ฉํ•œ ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜์ด๋‹ค. ์ด๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ๋งŽ์€ ์—ฐ์‚ฐ์œ ๋‹›๋“ค๊ณผ ํ•˜๋‚˜์˜ ์ปจํŠธ๋กค๋Ÿฌ๋กœ ๊ตฌ์„ฑ๋˜์–ด ๊ณ ์„ฑ๋Šฅ, ์œ ์—ฐ์„ฑ, ์ €์ „๋ ฅ์„ ๋™์‹œ์— ๋‹ฌ์„ฑํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•ด์ค€๋‹ค. ๋งŽ์€ ์—ฐ์‚ฐ์œ ๋‹›์„ ๋ฐ”ํƒ•์œผ๋กœ ํ•œ ๋ณ‘๋ ฌ์ฒ˜๋ฆฌ๋Š” ์‘์šฉํ”„๋กœ๊ทธ๋žจ์˜ ์‹คํ–‰์†๋„๋ฅผ ๋น ๋ฅด๊ฒŒ ํ•˜๋ฉฐ, ์žฌ๊ตฌ์„ฑ ๊ธฐ๋Šฅ์€ ๋‹ค์–‘ํ•œ ์‘์šฉํ”„๋กœ๊ทธ๋žจ์—์˜ ํ™œ์šฉ์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ด์ค€๋‹ค. ๋˜ํ•œ, ๋ช…๋ น์–ด์™€ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์Šค์ผ€์ฅด์„ ๋ฏธ๋ฆฌ ์ •ํ•ด๋†“์Œ์œผ๋กœ์จ ์ œ์–ด๊ตฌ์กฐ๋ฅผ ๋‹จ์ˆœํ™”์‹œํ‚ฌ ์ˆ˜ ์žˆ์œผ๋ฉฐ ์ด๋Š” ์—ฐ์‚ฐ๋Ÿ‰ ๋Œ€๋น„ ์ „๋ ฅ์†Œ๋ชจ๋ฅผ ์ตœ์†Œํ•œ์œผ ๋กœ ์ค„์—ฌ์ค€๋‹ค. ํ•˜์ง€๋งŒ ์‘์šฉํ”„๋กœ๊ทธ๋žจ์ด ๋ณต์žกํ•ด์ง์— ๋”ฐ๋ผ ์—ฐ์‚ฐ๋Ÿ‰์ด ๋งŽ์€ ๋ถ€๋ถ„๋“ค์— ๋ถ„๊ธฐ๋ฌธ์ด ์ƒ๊ธฐ๊ฒŒ ๋˜์—ˆ์œผ๋ฉฐ ์ด๋Š” ์žฌ๊ตฌ์„ฑํ˜• ๊ตฌ์กฐ๋ฅผ ์‚ฌ์šฉํ•จ์— ์žˆ์–ด ํฐ ์œ„ํ˜‘์ด ๋˜๊ณ  ์žˆ๋‹ค. ๋ถ„๊ธฐ๋ฌธ์„ ๋‹ค๋ฃฐ ์ˆ˜ ์žˆ๋Š” ์ปจํŠธ๋กค๋Ÿฌ๊ฐ€ ํ•˜๋‚˜์ด๊ธฐ ๋•Œ๋ฌธ์— ์ปจํŠธ๋กค๋Ÿฌ์— ๋ณ‘๋ชฉํ˜„์ƒ์ด ๋ฐœ์ƒํ•˜๊ฑฐ๋‚˜ ๋™์‹œ์— ์„œ๋กœ ๋‹ค๋ฅธ ์ œ์–ด๋ฅผ ์š”๊ตฌํ•˜๊ฒŒ ๋˜๋ฉด ํ•ด๋‹น ํ”„๋กœ๊ทธ๋žจ์€ ๊ฐ€์†์ด ๋ถˆ๊ฐ€๋Šฅํ•ด์ง„๋‹ค. ์กฐ๊ฑด์‹คํ–‰์ด๋ผ๋Š” ๊ธฐ์ˆ ์„ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ ์ด๋ฅผ ๋ถ€๋ถ„์ ์œผ๋กœ ํ•ด์†Œํ•  ์ˆ˜ ์žˆ์ง€๋งŒ ๊ธฐ์กด์— ๊ฐœ๋ฐœ๋˜์–ด ์žˆ๋Š” ์กฐ๊ฑด์‹คํ–‰ ๊ธฐ์ˆ ๋“ค์€ ์žฌ๊ตฌ์„ฑํ˜• ๊ตฌ์กฐ์— ์„ฑ๋Šฅ ๋ฐ ์ „๋ ฅ์†Œ๋ชจ ๋ฉด์—์„œ ๋ถ€์ •์ ์ธ ์˜ํ–ฅ์„ ๋ผ์นœ๋‹ค. ๋”ฐ๋ผ์„œ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์—ฐ์‚ฐ๋Ÿ‰์ด ๋งŽ์ง€๋งŒ ๋ถ„๊ธฐ๋ฌธ์„ ๊ฐ€์ง„ ์‘์šฉํ”„๋กœ๊ทธ๋žจ์—์„œ ์กฐ๊ฑด์‹คํ–‰์ด ์„ฑ๋Šฅ๊ณผ ์ „๋ ฅ ๋ฉด์—์„œ ์–ด๋– ํ•œ ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š”์ง€ ๋ฐํžˆ๋ฉฐ ์ด๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ๊ณ ์„ฑ๋Šฅ๊ณผ ์ €์ „๋ ฅ์„ ๊ฐ€์ง„ ์กฐ๊ฑด์‹คํ–‰ ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์‹คํ—˜ ๊ฒฐ๊ณผ์— ๋”ฐ๋ฅด๋ฉด ์ œ์•ˆํ•œ ๋ฐฉ์‹์€ ๊ธฐ์กด์˜ ์„ธ๊ฐ€์ง€ ๋ฐฉ์‹๋ณด๋‹ค ์„ฑ๋Šฅ๊ณผ ์ „๋ ฅ์†Œ๋ชจ๋ฅผ ๊ณฑ์œผ๋กœ ํ‘œํ˜„ํ•œ ์ˆ˜์น˜์— ์žˆ์–ด์„œ 11.9%, 14.7%, 23.8% ๋งŒํผ์˜ ์ด๋“์„ ๋ณด์˜€๋‹ค. ๋˜ํ•œ, ์ œ์•ˆํ•œ ์กฐ๊ฑด์‹คํ–‰ ๋ฐฉ๋ฒ•์— ์ ํ•ฉํ•œ ์ปดํŒŒ์ผ ์ฒด๊ณ„๋„ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ œ์•ˆํ•œ ์กฐ๊ฑด์‹คํ–‰์€ ์ ˆ์ „๋ชจ๋“œ๋ฅผ ์‚ฌ์šฉํ•จ์— ๋”ฐ๋ผ ์ „๋ ฅ์„ ์•„๋‚„ ์ˆ˜ ์žˆ์ง€๋งŒ ๊ธฐ์กด์˜ ์ปดํŒŒ์ผ๋ฐฉ์‹์œผ๋กœ๋Š” ์—ฌ๋Ÿฌ ์กฐ๊ฑด๋ฌธ์„ ๋ณ‘๋ ฌ์ ์œผ๋กœ ์ˆ˜ํ–‰ํ•˜๋„๋ก ์ปดํŒŒ์ผํ•  ์ˆ˜ ์—†๋Š” ๋ฌธ์ œ๊ฐ€ ์ƒ๊ธด๋‹ค. ๋”ฐ๋ผ์„œ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ด๋Ÿฐ ๋ฌธ์ œ๋ฅผ ๋ฐํžˆ๊ณ  ์กฐ๊ฑด๋ฌธ๋“ค์„ ์„œ๋กœ ๋‹ค๋ฅธ ์—ฐ์‚ฐ์œ ๋‹›์— ํ• ๋‹นํ•จ์œผ๋กœ์จ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐฉ์‹์„ ์ œ์•ˆํ•˜๊ณ  ์žˆ๋‹ค. ์ œ์•ˆํ•œ ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ ๋‹จ์ˆœํ•˜๊ณ  ์ง๊ด€์ ์ธ ๋ฐฉ๋ฒ•์— ๋น„ํ•˜์—ฌ ํ‰๊ท ์ ์œผ๋กœ 2.21๋ฐฐ์˜ ๋†’์€ ์„ฑ๋Šฅ์„ ์–ป์„ ์ˆ˜ ์žˆ์—ˆ๋‹ค.Coarse-Grained Reconfigurable Architecture (CGRA) is one of viable solutions in embedded systems to accelerate data-intensive applications. It typically consists of an array of processing elements (PEs) and a centralized controller, which can provide high performance, flexibility, and low power. Parallel array processing reduces execution time of applications, reconfigurability of PEs allows changing its functionality, and simplified control structure with static scheduling for instruction fetching and data communication minimizes power consumption. However, as applications become complex so that data-intensive parts are having control flows in them, CGRAs face a challenge for its effectiveness. Since the entire PEs are controlled by a centralized unit, it is impossible to execute programs having control divergence among PEs. To overcome the problem, we can adopt the technique called predicated execution, which is the unique solution known so far, but conventional predication techniques have a negative impact on both performance and power consumption due to longer instruction words and unnecessary instruction-fetching/decoding/nullifying steps. Thus, this thesis reveals performance and power issues in predicated execution when a CGRA executes both data- and control-intensive applications, which have not been well-addressed yet. Then it proposes high-performance and low-power predication mechanisms. Experiments conducted through gate-level simulation show that the proposed mechanism improves energy-delay product by 11.9%, 14.7%, and 23.8% compared to three conventional techniques. In addition, this thesis also reveals mapping issues when mapping applications on CGRAs using the proposed predication. A power-saving mode introduced into PEs prohibits multiple conditionals from being parallelized if conventional mapping algorithms are used. Thus, this thesis proposes the framework to release this problem by mapping conditionals to different PEs. Experiments show that mapping results from the proposed approach lead to 2.21 times higher performance than those of the naรฏve approach.Abstract i Chapter 1 Introduction 1 Chapter 2 Background and Related Work 5 2.1 Coarse-Grained Reconfigurable Architecture . . . . . . . . . . . . 5 2.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1.2 Target Domain . . . . . . . . . . . . . . . . . . . . . . . . 6 2.1.3 Comparison with Other Architectures . . . . . . . . . . . 6 2.1.4 Application Mapping . . . . . . . . . . . . . . . . . . . . . 8 2.1.5 Target CGRA . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2 Predicated Execution Technique . . . . . . . . . . . . . . . . . . 11 2.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2.2 Classification . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.2.3 Different Roles in ILP and DLP processors . . . . . . . . 13 2.2.4 Predication Support on CGRAs . . . . . . . . . . . . . . . 14 Chapter 3 Conventional Predicated Execution Techniques 15 3.1 Partial Predication (Partial) . . . . . . . . . . . . . . . . . . . . 16 3.2 Condition-Based Full Predication (CondFull) . . . . . . . . . . 18 Chapter 4 State-Based Full Predication 23 4.1 Previous Approach (PseudoBranch) . . . . . . . . . . . . . . . 24 4.2 Counter-Based Approach (StateFull) . . . . . . . . . . . . . . 25 4.3 Dual-Issue-Single-Execution (DISE) . . . . . . . . . . . . . . . . 28 4.4 Hybrid Predication . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.4.2 StateFull+Partial . . . . . . . . . . . . . . . . . . . . 34 4.4.3 StateFull+Partial+DISE . . . . . . . . . . . . . . . . 35 Chapter 5 Evaluation 39 5.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.1.1 Conventional Techniques . . . . . . . . . . . . . . . . . . . 39 5.1.2 Proposed Techniques . . . . . . . . . . . . . . . . . . . . . 40 5.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . 43 5.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 46 5.3.1 Effect of Predication Mechanism on Power Consumption of a PE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 5.3.2 Quantitative Definitions of short-if and long-if . . . . . . 48 5.3.3 Compilation Strategy in StateFull+Partial . . . . . . 48 5.3.4 Conventional Techniques (Partial, CondFull, and PseudoBranch) vs. Proposed StateFull Technique . . . . . 49 5.3.5 Proposed Hybrid Predication Techniques . . . . . . . . . 53 5.3.6 Putting Together . . . . . . . . . . . . . . . . . . . . . . . 54 5.3.7 Speedup of Applications . . . . . . . . . . . . . . . . . . . 57 Chapter 6 Mapping Framework 61 6.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 6.2 Proposed Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 63 6.2.1 Overall Flow . . . . . . . . . . . . . . . . . . . . . . . . . 63 6.2.2 From IR to CDFG . . . . . . . . . . . . . . . . . . . . . . 64 6.2.3 Separation . . . . . . . . . . . . . . . . . . . . . . . . . . 65 6.2.4 CDFG Mapping . . . . . . . . . . . . . . . . . . . . . . . 68 6.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 6.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 6.4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . 69 6.4.2 Verification of Mapping Framework . . . . . . . . . . . . . 70 6.4.3 Quality of Mapping Results . . . . . . . . . . . . . . . . . 70 Chapter 7 Conclusion 73 7.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 7.2 Applicable Scope and Future Work . . . . . . . . . . . . . . . . . 75 Appendix 77 ๊ตญ๋ฌธ์ดˆ๋ก 93 ๊ฐ์‚ฌ์˜ ๊ธ€ 95Docto
    corecore