Controllable Singing Voice Synthesis Using a Conditional Autoregressive Neural Network
Doctoral dissertation -- Seoul National University Graduate School: Department of Intelligence and Information Convergence, Graduate School of Convergence Science and Technology, August 2022. Advisor: Kyogu Lee. Singing voice synthesis aims to synthesize a natural singing voice from a given input score. A successful singing synthesis system matters not only because it can significantly reduce the cost of music production, but also because it helps creators reflect their intentions more easily and conveniently. However, designing such a system poses three challenges: 1) the various elements that make up singing must be independently controllable; 2) the system must generate high-quality audio; and 3) sufficient training data is difficult to secure. To address these problems, we first turned to source-filter theory, a representative model of speech production. By modeling a singing voice as the convolution of a source, carrying pitch information, with a filter, carrying pronunciation information, and designing a structure that models each independently, we sought to secure training-data efficiency and controllability at the same time. In addition, we used a deep neural network based on a conditional autoregressive model to effectively model sequential data given conditional inputs such as pronunciation, pitch, and speaker. To make the framework generate high-quality audio whose distribution is closer to that of real singing, we applied adversarial training. Finally, we applied a self-supervised style modeling technique to capture detailed, unlabeled musical expression. We confirmed that the proposed model can flexibly control elements such as pronunciation, pitch, timbre, singing style, and musical expression while synthesizing high-quality singing that is difficult to distinguish from ground-truth recordings.
Furthermore, we proposed a generation-and-modification framework that reflects the actual music production process, and confirmed that it can expand the limits of a creator's imagination through applications such as new voice design and cross-generation.
1 Introduction 1
1.1 Motivation 1
1.2 Problems in singing voice synthesis 4
1.3 Task of interest 8
1.3.1 Single-singer SVS 9
1.3.2 Multi-singer SVS 10
1.3.3 Expressive SVS 11
1.4 Contribution 11
2 Background 13
2.1 Singing voice 14
2.2 Source-filter theory 18
2.3 Autoregressive model 21
2.4 Related works 22
2.4.1 Speech synthesis 25
2.4.2 Singing voice synthesis 29
3 Adversarially Trained End-to-end Korean Singing Voice Synthesis System 31
3.1 Introduction 31
3.2 Related work 33
3.3 Proposed method 35
3.3.1 Input representation 35
3.3.2 Mel-synthesis network 36
3.3.3 Super-resolution network 38
3.4 Experiments 42
3.4.1 Dataset 42
3.4.2 Training 42
3.4.3 Evaluation 43
3.4.4 Analysis on generated spectrogram 46
3.5 Discussion 49
3.5.1 Limitations of input representation 49
3.5.2 Advantages of using super-resolution network 53
3.6 Conclusion 55
4 Disentangling Timbre and Singing Style with multi-singer Singing Synthesis System 57
4.1 Introduction 57
4.2 Related works 59
4.2.1 Multi-singer SVS system 60
4.3 Proposed Method 60
4.3.1 Singer identity encoder 62
4.3.2 Disentangling timbre & singing style 64
4.4 Experiment 64
4.4.1 Dataset and preprocessing 64
4.4.2 Training & inference 65
4.4.3 Analysis on generated spectrogram 65
4.4.4 Listening test 66
4.4.5 Timbre & style classification test 68
4.5 Discussion 70
4.5.1 Query audio selection strategy for singer identity encoder 70
4.5.2 Few-shot adaptation 72
4.6 Conclusion 74
5 Expressive Singing Synthesis Using Local Style Token and Dual-path Pitch Encoder 77
5.1 Introduction 77
5.2 Related work 79
5.3 Proposed method 80
5.3.1 Local style token module 80
5.3.2 Dual-path pitch encoder 85
5.3.3 Bandwidth extension vocoder 85
5.4 Experiment 86
5.4.1 Dataset 86
5.4.2 Training 86
5.4.3 Qualitative evaluation 87
5.4.4 Dual-path reconstruction analysis 89
5.4.5 Qualitative analysis 90
5.5 Discussion 93
5.5.1 Difference between midi pitch and f0 93
5.5.2 Considerations for use in the actual music production process 94
5.6 Conclusion 95
6 Conclusion 97
6.1 Thesis summary 97
6.2 Limitations and future work 99
6.2.1 Improvements to a faster and robust system 99
6.2.2 Explainable and intuitive controllability 101
6.2.3 Extensions to common speech synthesis tools 103
6.2.4 Towards a collaborative and creative tool 104
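The source-filter decomposition described in this thesis's abstract, a singing voice modeled as the convolution of a pitched source with a pronunciation filter, can be sketched numerically. The pulse period and filter coefficients below are illustrative assumptions, not values from the thesis:

```python
import numpy as np

def glottal_source(n_samples, period):
    """Impulse train modeling the pitched glottal source."""
    src = np.zeros(n_samples)
    src[::period] = 1.0
    return src

def apply_filter(source, fir):
    """Vocal-tract 'pronunciation' filter applied by convolution (source-filter theory)."""
    return np.convolve(source, fir)[: len(source)]

# Toy example: 100 Hz pitch at 16 kHz sampling -> a pulse every 160 samples.
sr = 16000
src = glottal_source(sr, sr // 100)
fir = np.array([0.5, 0.3, 0.2])  # illustrative 3-tap filter
voice = apply_filter(src, fir)
```

Changing the pulse period changes pitch while the filter (and hence pronunciation) stays fixed, which is exactly the independent controllability the abstract argues for.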
Genetic Analysis of a Non-pungent EMS Mutant
Master's thesis -- Seoul National University Graduate School: Department of Plant Science, College of Agriculture and Life Sciences, February 2018. Advisor: Byoung-Cheorl Kang. Capsaicinoids are alkaloid compounds produced in peppers (Capsicum spp.). They are responsible for pepper pungency (hotness), one of the important traits in breeding programs. Although many studies have sought to elucidate their biosynthesis, the proposed pathway is largely based on studies of similar pathways in other plants. To understand capsaicinoid biosynthesis, a non-pungent mutant, 221-2-1a, developed from the pungent cultivar 'Yuwol-cho', was analyzed. 221-2-1a was found to have no mutation in the coding sequence of Pun1, yet the capsaicinoid levels in its fruits were drastically decreased compared to those of Yuwol-cho. To identify the gene(s) responsible for the non-pungent trait in 221-2-1a, the expression of 12 genes involved in capsaicinoid biosynthesis was compared between 221-2-1a, Yuwol-cho, and several selected cultivars. Seven of the 12 genes (pAMT, BCAT, ACL, KAS, FatA, PAL, and Pun1) showed significantly decreased expression in 221-2-1a compared to pungent cultivars. Furthermore, the inheritance of pungency was studied in a population derived from a cross between Yuwol-cho and 221-2-1a; this study showed that the non-pungency of 221-2-1a is controlled by two recessive genes. To identify the genes responsible, DNA from Yuwol-cho and a bulked F3 sample was sequenced and analyzed by MutMap. A total of 11 SNPs were identified in intergenic sequences and the candidates were annotated. Although the candidate genes are not known capsaicinoid biosynthetic genes, they are believed to be ideal targets for future studies.
I. LITERATURE REVIEW 01
II. INTRODUCTION 07
III. MATERIALS AND METHODS 10
3.1 Plant materials 10
3.2 Sample preparation for capsaicinoid analysis 10
3.3 Gibbs screening and HPLC analysis 11
3.4 Isolation of RNA and cDNA synthesis 13
3.5 Quantitative real-time PCR analysis 13
3.6 Polymerase chain reaction (PCR) amplification and sequence analysis of Pun1 17
3.7 Genomic DNA extraction 17
3.8 Whole genome sequencing of wild type and mutant bulk 18
3.9 Alignment of reference sequence and MutMap 18
IV. RESULTS 20
4.1 Capsaicinoid measurement in placenta tissues 20
4.2 Expression analysis of capsaicinoid pathway genes 23
4.3 Comparison of Pun1 exon sequences 28
4.4 Evaluation of pungency segregation 30
4.5 Sequencing of Yuwol-cho and bulked F3 DNA 34
4.6 MutMap analysis for candidate region identification 36
V. DISCUSSION 40
VI. REFERENCES 44
VII. ABSTRACT IN KOREAN 55
VIII. APPENDIX 57
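MutMap, used in the thesis above to locate the candidate region, scores each variant with a SNP-index: the fraction of bulked-F3 reads carrying the mutant allele, which approaches 1 where the causal mutation is fixed in the bulk. A minimal sketch with hypothetical positions, read counts, and threshold (none taken from the thesis):

```python
def snp_index(alt_reads, total_reads):
    """SNP-index used in MutMap: fraction of reads carrying the mutant allele."""
    return alt_reads / total_reads

def candidate_snps(snps, threshold=0.9):
    """Keep SNPs whose index approaches 1, i.e., nearly fixed in the mutant bulk."""
    return [pos for pos, alt, total in snps if snp_index(alt, total) >= threshold]

# Hypothetical data: (position, mutant-allele reads, total reads).
snps = [(101, 30, 30), (202, 14, 28), (303, 27, 29)]
print(candidate_snps(snps))  # -> [101, 303]
```

A heterozygous (non-causal) position such as 202 sits near index 0.5 and is filtered out, which is how the 11 candidate SNPs would be separated from background variation.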
Real-Time Image Super-Resolution on Mobile Devices Using Offloading
Master's thesis -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, College of Engineering, August 2018. Advisor: Sunghyun Choi. The rapid enhancement of camera performance in smartphones has allowed users to take high-quality pictures without high-end digital cameras. However, a large gap remains between smartphone cameras and digital cameras when it comes to zoom functionality. Most smartphones provide only digital zoom, where image-quality degradation is inevitable when the user enlarges the image. Even high-end smartphones with embedded optical lenses provide limited optical zoom, leaving users with great inconvenience. While users can attach an external optical lens, carrying extra hardware incurs great overhead, not to mention its price.
Image Super-Resolution (SR) can overcome this limitation by recovering the quality lost to digital zoom. Image SR, a technique to restore high-frequency details from a Low-Resolution (LR) image to obtain a High-Resolution (HR) image, is a traditional field of research in computer vision. Deep learning based methods, especially those based on Convolutional Neural Networks (CNNs), have been shown to outperform traditional methods and have been actively researched in recent years.
In this paper, we exploit deep learning based image SR to replace the optical zoom functionality in smartphones without embedded optical lenses. Because smartphones face several resource constraints (e.g., computing power, energy, memory), providing real-time performance through local execution alone is challenging. To tackle this challenge, we propose a server-offloading-based approach that provides a higher frame rate. Through a prototype implementation on Android and extensive experiments in real-world environments, we show that our proposed system can provide at least 10 fps.
1 Introduction 1
2 Preliminaries 4
2.1 Image Super Resolution (SR) . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Mobile deep learning framework . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Local execution based framework . . . . . . . . . . . . . . . 5
2.2.2 Server offloading based framework . . . . . . . . . . . . . . 6
2.3 What is different about SR in smartphones . . . . . . . . . . . . . . 6
2.3.1 Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3.2 Resource constraints . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Local or server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Implementation 8
3.1 SR model implementation . . . . . . . . . . . . . . . . . . . . . . . 8
3.2 Prototype implementation on Android . . . . . . . . . . . . . . . . . 9
4 Evaluation 11
4.1 SR model performance . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.2 Inference time measured on smartphone and server . . . . . . . . . . 12
4.3 Latency analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.3.1 Offloading latency . . . . . . . . . . . . . . . . . . . . . . . 14
4.3.2 Overall latency breakdown . . . . . . . . . . . . . . . . . . . 17
5 Discussion 19
5.1 Perceptual quality of generated images . . . . . . . . . . . . . . . . . 19
5.2 Managing high data rate . . . . . . . . . . . . . . . . . . . . . . . . 20
6 Conclusion 22
Abstract (In Korean) 28
Acknowledgements (In Korean) 31
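The offloading decision in the thesis above comes down to a latency budget: a frame is uploaded, super-resolved on the server, and the result downloaded, and the sum must stay under the per-frame deadline. A back-of-the-envelope sketch in which all sizes, bandwidths, and server times are hypothetical, not measurements from the thesis:

```python
def frame_rate(latency_ms):
    """Frames per second achievable at a given per-frame latency."""
    return 1000.0 / latency_ms

def offload_latency(frame_kb, uplink_mbps, server_ms, result_kb, downlink_mbps):
    """End-to-end offloading latency: upload + server inference + download.
    KB * 8 gives kilobits; kilobits / Mbps gives milliseconds."""
    upload_ms = frame_kb * 8 / uplink_mbps
    download_ms = result_kb * 8 / downlink_mbps
    return upload_ms + server_ms + download_ms

# Hypothetical numbers: 40 KB input frame, 20 Mbps uplink, 30 ms server-side SR,
# 160 KB upscaled result, 40 Mbps downlink.
latency = offload_latency(40, 20, 30, 160, 40)
print(latency, frame_rate(latency))  # 78 ms -> about 12.8 fps, above the 10 fps target
```

Offloading wins whenever this sum is below the local inference time, which is the comparison behind the "local or server" discussion in the table of contents.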
A Numerical Simulation of Near-Cloud Turbulence Associated with Tropical Cyclone Hagibis
Master's thesis -- Seoul National University Graduate School: Interdisciplinary Program in Computational Science, College of Natural Sciences, February 2023. Advisor: Jung-Hoon Kim. From 0840 to 0900 UTC on 11 October 2019, light-to-moderate turbulence events were observed, via in situ eddy dissipation rate data provided by Aircraft Meteorological Data Relay, at 11 km within the anticyclonic outflow of tropical cyclone (TC) Hagibis over the northwestern Pacific Ocean. The area of turbulence was more than 500 km from the center of the TC and showed low cloud density. The generation mechanism of the near-cloud turbulence (NCT) that occurred on the northwestern side of the TC was examined using the Weather Research and Forecasting (WRF) model.
Four nested model domains with horizontal grid spacings of 15, 5, 1, and 0.2 km were used, with 112 hybrid layers giving a vertical grid spacing of about 280 m within z = 8-13 km, near the altitudes where the NCT encounters occurred. The Mellor-Yamada-Janjić scheme was applied in each domain to parameterize local vertical mixing by computing subgrid-scale turbulent kinetic energy (SGS TKE) in the free atmosphere with the Mellor-Yamada 2.5-level turbulence closure method. The results showed three distinct areas of simulated turbulence with SGS TKE larger than 0.25 m2s-2, in the 1) z = 13-15 km, 2) z = 10-12 km, and 3) z = 6-8 km layers. We focused on the 10-12 km layer, in which the turbulence was observed. A Richardson (Ri) number smaller than 0.25 was found consistently before the time of the incident, which implies that Kelvin-Helmholtz instability occurred due to strong vertical wind shear induced by the anticyclonic outflow of the TC beneath the cirrus anvil cloud. From 0800 UTC, static stability started to decrease, and convective instability occurred during 0840-0900 UTC, producing the light-to-moderate turbulence. At the same time, the intensity of inertial instability in the z = 10-12 km layer, where neutral or weak inertial instability had consistently existed due to the anticyclonic outflow of the TC, increased with the strengthened upper-level outflow. Consequently, we suggest that inertial instability was responsible for the occurrence of convective instability, given that the strengthening period of inertial instability coincided with the manifestation period of convective instability. The SGS TKE simulated at z = 13-15 km was due to convective instability induced by differential thermal advection within the anticyclonic outflow of the TC, and the SGS TKE found at z = 6-8 km was also induced by convective instability, from the sublimation of precipitating snow beneath the cirrus anvil cloud.
1. Introduction 1
2. Case selection 6
3. Experimental design 13
4. Overview of results 17
5. Analysis of turbulence mechanisms 24
5.1 Turbulence within transverse cirrus bands 24
5.2 Turbulence within the upper-level anticyclonic outflow of the TC 40
5.3 Turbulence beneath the anvil cloud 61
6. Summary and conclusions 66
References 71
Abstract 78
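The Kelvin-Helmholtz criterion used in the abstract above (Richardson number below 0.25 under strong vertical shear) can be computed directly from static stability and wind shear. The profile values below are illustrative, not output from the WRF simulation:

```python
def richardson(g, theta, dtheta_dz, du_dz, dv_dz):
    """Gradient Richardson number Ri = N^2 / S^2."""
    n_sq = (g / theta) * dtheta_dz      # Brunt-Vaisala frequency squared (static stability)
    s_sq = du_dz ** 2 + dv_dz ** 2      # vertical wind shear squared
    return n_sq / s_sq

# Illustrative values near z = 11 km: strong shear beneath the outflow jet
# combined with weakening static stability drives Ri below the 0.25 threshold.
ri = richardson(g=9.81, theta=340.0, dtheta_dz=1.0e-3, du_dz=1.2e-2, dv_dz=4.0e-3)
print(ri < 0.25)  # Kelvin-Helmholtz instability criterion satisfied
```

With the stronger stratification typical away from the outflow (larger dtheta_dz), the same shear leaves Ri above 0.25, which is why the turbulence in the abstract is localized.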
(The) cross-section of equity returns in the KOSPI market using financial leverage
Master's thesis - Korea Advanced Institute of Science and Technology (KAIST): Graduate Program in Financial Engineering, 2020.2, [iii, 45 p.] Building on market risk, the firm-size discount, the value premium, and volatility, the factors known to explain the cross-section of stock returns in the Korean KOSPI market, this study examines the role of these factors in the cross-section of unlevered stock returns. Unlike in the cross-section of levered stock returns, the value effect diminishes in the cross-section of unlevered returns, while the size effect persists. We also show that financial leverage induces heteroskedasticity in stock returns.
Mobile telemedicine system via PSTN
Department of Health Information Management / Master's thesis
[Korean abstract]
Telemedicine, a key element of health-care informatization, is currently being attempted in many forms, but because most attempts rely on communication channels that are not widely deployed, they fail to fully realize the advantages of telemedicine for medically underserved areas and homebound patients. We therefore operated, for eight months, a telemedicine system for home-nursing patients of the Gwacheon public health center, in which a visiting nurse carried a mobile telemedicine device using the ordinary public telephone network to the patient's residence and exchanged video and audio with a physician at the health center.
To evaluate the program, we surveyed patient satisfaction and the system's utility as a substitute for conventional care. Of the 50 patients, 38 (76.0%) responded that the telemedicine system had helped them, and the average number of visits to medical institutions per patient fell from 0.64 per month under visiting nursing alone to 0.42 once telemedicine was combined with visiting nursing. Satisfaction was higher among patients living in private homes than in nursing homes, while system performance (video and audio quality, connection time), patients' computer experience, and clinical status showed no significant association with satisfaction.
Through this system, we expect telemedicine to become commonplace within the current communication environment, allowing patients in underserved areas to receive health-care services easily, and the accumulated experience to inform the design of health-information systems for future communication environments.
[English abstract]
There is a need to care for elderly and disabled people without increasing the cost of on-site expertise at the locations where home health nursing services are provided.
Recently, telemedicine has been changing the traditional form of health-care delivery by providing cost-effective technical means of communication between patients and doctors. Despite their high reliability, ISDN-based telemedicine systems are not widely used in home health care because of high communication costs.
In this study, an efficient and inexpensive electronic system to transmit audio and video signals for telemedicine via the PSTN is presented. The Mobile Telemedicine System via PSTN was developed and implemented to provide health care for elderly and disabled people in Kwachon as a demonstration project. The system consisted of a notebook computer and a digital camera interconnected via the PSTN, enabling communication between the patient at home and the doctor at the health center.
After the eight-month trial, 50 patients were being cared for at home or in a nursing home for the elderly. The proportion of satisfied patients was 76%, and the number of patient visits to the clinic was reduced. The quality of video and audio did not influence satisfaction. The results suggest that telemedicine via the PSTN is both feasible and acceptable.
An Edge-Cloud Cooperative Platform for Live Video Analytics Applications
Doctoral dissertation -- Seoul National University Graduate School: Department of Computer Science and Engineering, College of Engineering, February 2024. Advisor: Youngki Lee. Live video analytics enable various useful services including traffic monitoring, surveillance, person identification, and AR/MR. Despite the huge potential, enabling robust and efficient live video analytics is non-trivial. The core challenge lies in running the unique workload of analyzing a live video stream in real time, and seamlessly delivering the analysis results to the user for interaction, on resource-constrained mobile devices. Such a workload often requires continuous and simultaneous execution of multiple Deep Neural Networks (DNNs) on high-resolution video. In this dissertation, we (i) characterize the workload of emerging live video analytics apps, and (ii) design an edge-cloud cooperative platform to support it. Specifically, we perform end-to-end optimization across the edge, network, and cloud to support the workload with real-time throughput, low per-frame latency, and high accuracy.
We first design EagleEye, an AR system to identify missing persons in large, crowded urban spaces in real time. Person identification imposes a unique workload of running a series of complex DNNs multiple times for each high-resolution video frame. Our key approach is Content-Adaptive Parallel Execution, which adapts the multi-DNN face identification pipeline depending on recognition difficulty (e.g., face resolution, pose) and cooperatively executes the workload at low latency using heterogeneous processors on mobile and cloud. We also design a novel Identity Clarification Network (ICN) and its training methodology, which utilize probe images of the target to recover missing facial details in low-resolution (LR) faces and improve the accuracy of state-of-the-art face identification techniques. Our results show that ICN significantly enhances LR face recognition accuracy (true positives by 78% with only 14% false positives), and EagleEye accelerates latency by 9.07× with only 108 KB of data offloaded to the cloud per frame.
We next design Pendulum, an end-to-end live video analytics system with network-compute joint scheduling. In practical scenarios, the resource bottleneck frequently alternates between the network (video streaming) and compute (DNN inference) stages due to dynamic scene content and resource availability. However, prior systems are mostly designed for network-only or compute-only scheduling, and so suffer from latency/accuracy fluctuation as well as resource wastage. To overcome these limitations, we newly identify the tradeoff relationship between video bitrate and DNN complexity. Leveraging this, we design an end-to-end system composed of (i) an efficient and scalable knob-control mechanism, (ii) a lightweight tradeoff profiler, and (iii) a multi-user joint resource scheduler. Extensive evaluation on various datasets and state-of-the-art DNNs shows that Pendulum achieves up to 0.64 mIoU gain (from 0.17 to 0.81) and 1.29× higher throughput compared to state-of-the-art single-stage scheduling systems.
Finally, we design Heimdall, a mobile platform to support multi-DNN and rendering concurrency on mobile GPUs. We show that existing mobile deep learning frameworks are designed for single-DNN execution in isolated environments and fail to support concurrent multi-DNN and rendering workloads (e.g., inference latency increases from 59.93 to 1181 ms, and the rendering frame rate drops from 30 to 12 fps). While multi-task scheduling has been actively studied for desktop GPUs (e.g., parallelization, preemption), applying it to mobile GPUs is challenging due to limited architectural support and memory bandwidth. To tackle this challenge, we design a Pseudo-Preemption mechanism which (i) breaks a bulky DNN down into smaller units, and (ii) prioritizes and flexibly schedules concurrent GPU tasks. Heimdall efficiently supports multiple MR app scenarios, enhancing the frame rate from 11.99 to 29.96 fps while reducing the worst-case DNN inference latency by up to ≈15× compared to the baseline multi-threading approach.
Abstract i
Contents iii
List of Tables viii
List of Figures ix
1 Introduction 1
1.1 Motivation and Challenges . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Dissertation Statement and Contributions . . . . . . . . . . . . . . . 3
1.2.1 Dissertation Statement . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Proposed Platform Architecture . . . . . . . . . . . . . . . . 4
1.2.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Organization of the Dissertation . . . . . . . . . . . . . . . . . . . . 6
2 Motivational Studies 7
2.1 Applications and Requirements . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Application Scenarios . . . . . . . . . . . . . . . . . . . . . 7
2.1.2 Workload Characterization . . . . . . . . . . . . . . . . . . . 8
2.2 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.1 Complexity of the State-of-the-art DNNs . . . . . . . . . . . 9
2.2.2 Large Data Size and Compute of Each Analysis Task . . . . . 11
2.2.3 Dynamic Resource Availability and Workload . . . . . . . . . 12
2.2.4 Multi-Task Resource Contention . . . . . . . . . . . . . . . . 14
3 Related Work 17
3.1 Live Video Analytics Applications . . . . . . . . . . . . . . . . . . . 17
3.2 On-Device Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2.1 Mobile Deep Learning Frameworks . . . . . . . . . . . . . . 17
3.2.2 On-Device Continuous Mobile Vision . . . . . . . . . . . . . 18
3.3 Cloud Offloading Systems . . . . . . . . . . . . . . . . . . . . . . . 18
3.3.1 Offloading for Continuous Mobile Vision . . . . . . . . . . . 18
3.3.2 Adaptive Bitrate for Live Video Analytics . . . . . . . . . . . 19
3.3.3 ML Serving in Edge/Cloud Server . . . . . . . . . . . . . . . 19
3.3.4 Edge-Cloud cooperative Inference Systems . . . . . . . . . . 19
3.4 Tiny ML/Efficient Deep Learning . . . . . . . . . . . . . . . . . . . 20
4 EagleEye: AR-based Person Identification in Crowded Urban Spaces 21
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2 Motivating Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3 Preliminary Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3.1 How Fast Can Humans Identify Faces? . . . . . . . . . . . . 27
4.3.2 How Accurate Can DNNs Identify Faces? . . . . . . . . . . . 28
4.3.3 How Fast Can DNNs Identify Faces? . . . . . . . . . . . . . 31
4.3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.4 EagleEye: System Overview . . . . . . . . . . . . . . . . . . . . . . 31
4.4.1 Design Considerations . . . . . . . . . . . . . . . . . . . . . 31
4.4.2 Operational Flow . . . . . . . . . . . . . . . . . . . . . . . . 32
4.5 Identity Clarification-Enabled Face Identification Pipeline . . . . . . 34
4.5.1 Face Detection . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.5.2 Identity Clarification Network . . . . . . . . . . . . . . . . . 34
4.5.3 Face Recognition and Service Provision . . . . . . . . . . . . 38
4.6 Real-Time Multi-DNN Execution . . . . . . . . . . . . . . . . . . . 39
4.6.1 Workload Characterization . . . . . . . . . . . . . . . . . . . 39
4.6.2 Content-Adaptive Parallel Execution . . . . . . . . . . . . . . 39
4.7 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.8 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.8.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . 45
4.8.2 Performance Overview . . . . . . . . . . . . . . . . . . . . . 47
4.8.3 Identity Clarification Network . . . . . . . . . . . . . . . . . 48
4.8.4 Content-Adaptive Parallel Execution . . . . . . . . . . . . . . 49
4.8.5 Performance for Varying Crowdedness . . . . . . . . . . . . 52
4.8.6 Performance on Other Mobile Devices . . . . . . . . . . . . . 53
5 Pendulum: Network-Compute Joint Scheduling for Scalable Live Video Analytics 54
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2.1 Target Scenarios and System Goals . . . . . . . . . . . . . . 58
5.2.2 Limitations of Single-Stage Scheduling . . . . . . . . . . . . 59
5.3 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.3.1 Key Idea: Joint Scheduling . . . . . . . . . . . . . . . . . . . 60
5.3.2 Why is Joint Scheduling Possible? . . . . . . . . . . . . . . . 61
5.3.3 Generality of Joint Scheduling . . . . . . . . . . . . . . . . . 63
5.4 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.5 Joint Scheduling Mechanism . . . . . . . . . . . . . . . . . . . . . . 65
5.5.1 Joint Scheduling Knob Selection . . . . . . . . . . . . . . . . 65
5.5.2 Network-Compute Tradeoff Profiler . . . . . . . . . . . . . . 68
5.5.3 Resource Availability Estimator . . . . . . . . . . . . . . . . 71
5.5.4 Other Design Considerations . . . . . . . . . . . . . . . . . . 71
5.6 Multi-User Joint Scheduling . . . . . . . . . . . . . . . . . . . . . . 73
5.6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.6.2 Scheduling Problem Formulation . . . . . . . . . . . . . . . 74
5.6.3 Scheduling Algorithm . . . . . . . . . . . . . . . . . . . . . 74
5.7 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.7.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.7.2 End-to-End Improvement . . . . . . . . . . . . . . . . . . . 78
5.7.3 Joint Scheduling on SOTA Systems . . . . . . . . . . . . . . 79
5.7.4 Performance on Other Models & Tasks . . . . . . . . . . . . 79
5.7.5 Performance in Compute Bottleneck . . . . . . . . . . . . . . 80
5.7.6 System Microbenchmarks . . . . . . . . . . . . . . . . . . . 81
6 Heimdall: Mobile GPU Coordination Platform for AR Applications 83
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.2 Analysis on GPU Contention . . . . . . . . . . . . . . . . . . . . . . 86
6.3 Heimdall System Overview . . . . . . . . . . . . . . . . . . . . . . . 88
6.3.1 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.3.2 Design Considerations . . . . . . . . . . . . . . . . . . . . . 90
6.3.3 System Architecture . . . . . . . . . . . . . . . . . . . . . . 91
6.4 Preemption-Enabling DNN Analyzer . . . . . . . . . . . . . . . . . . 92
6.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.4.2 Latency Profiling . . . . . . . . . . . . . . . . . . . . . . . . 93
6.4.3 DNN Partitioning . . . . . . . . . . . . . . . . . . . . . . . . 94
6.5 Pseudo-Preemptive GPU Coordinator . . . . . . . . . . . . . . . . . 95
6.5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.5.2 Utility Function . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.5.3 Scheduling Problem and Policy . . . . . . . . . . . . . . . . 97
6.5.4 Greedy Scheduling Algorithm . . . . . . . . . . . . . . . . . 99
6.6 Additional Optimizations . . . . . . . . . . . . . . . . . . . . . . . . 100
6.6.1 Preprocessing and postprocessing . . . . . . . . . . . . . . . 100
6.6.2 CPU Fallback Operators . . . . . . . . . . . . . . . . . . . . 101
6.7 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.8 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.8.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . 102
6.8.2 Performance Overview . . . . . . . . . . . . . . . . . . . . . 103
6.8.3 DNN Partitioning/Coordination Overhead . . . . . . . . . . . 104
6.8.4 Pseudo-Preemptive GPU Coordinator . . . . . . . . . . . . . 105
6.8.5 Performance for Various App Scenarios . . . . . . . . . . . . 106
6.8.6 DNN Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.8.7 Energy Consumption Overhead . . . . . . . . . . . . . . . . 108
7 Conclusion 109
7.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7.2.1 Scalability of EagleEye to Other Workloads . . . . . . . . . 110
7.2.2 Generality of Pendulum to Wider Network and System Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.2.3 Impact of Hardware Evolution on Heimdall . . . . . . . . . . 112
7.2.4 Practicality of Proposed System Optimization Techniques . . 113
7.3 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.3.1 Joint Scheduling Extension to App-RAN Cross-Layer Control 114
7.3.2 System Support for 3D Point Cloud Videos . . . . . . . . . . 115
Abstract (In Korean) 144
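The Pseudo-Preemption mechanism summarized in the abstract above, breaking a bulky DNN into smaller units so rendering can be scheduled between them, can be sketched as a greedy partitioner over per-layer latencies. The latencies and budget below are hypothetical, not Heimdall's actual profiler output:

```python
def partition(layer_latencies_ms, budget_ms):
    """Greedily group consecutive DNN layers into units, each within the latency
    budget, so the GPU can switch to rendering between units (pseudo-preemption)."""
    units, current, current_ms = [], [], 0.0
    for i, lat in enumerate(layer_latencies_ms):
        if current and current_ms + lat > budget_ms:
            units.append(current)          # close the unit before it overruns
            current, current_ms = [], 0.0
        current.append(i)
        current_ms += lat
    if current:
        units.append(current)
    return units

# Hypothetical per-layer latencies; an 8 ms budget leaves rendering headroom
# inside a 33 ms (30 fps) frame interval.
print(partition([5.0, 4.0, 2.0, 6.0, 3.0, 3.0], budget_ms=8.0))  # -> [[0], [1, 2], [3], [4, 5]]
```

The worst-case wait a rendering task experiences is then one unit's latency rather than the whole DNN's, which is the effect behind the 15× latency reduction reported above.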
Does Inertia Exist in Momentum Strategies? Evidence from International Stock Markets
Master's thesis -- Seoul National University Graduate School: Department of Business Administration, February 2015. Advisor: Kuan-Hui Lee. I examine the claim by Novy-Marx (2012) that the intermediate past performance of a stock does a better job of explaining conventional momentum profits. Using equity-level data from 49 countries, I find that there is cross-country variation in the relative performance of momentum portfolios based on intermediate past returns, and that this variation can be explained by the individualism index established by Hofstede (2001). Partly confirming the findings of Chui, Titman, and Wei (2010), I argue that Novy-Marx's finding is a manifestation of cross-country cultural differences that directly affect investor behavior, expressed in terms of momentum profits.
1. Introduction 1
2. Data Selection and Methodology 4
3. Term Structure of Momentum around the world 5
4. Behavioral explanation – Individualism index 9
5. Conclusion 14
6. Reference 16
7. Tables and Figures 19
8. Appendix 29
Abstract in Korean 39
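Novy-Marx's intermediate-horizon momentum, examined in the thesis above, sorts stocks on returns from months t-12 through t-7 rather than the recent t-6 through t-2 window of conventional momentum (month t-1 is skipped as usual). A minimal sketch of the two signals, with made-up sample returns:

```python
def cumulative_return(monthly_returns):
    """Compound a sequence of simple monthly returns."""
    r = 1.0
    for m in monthly_returns:
        r *= 1.0 + m
    return r - 1.0

def momentum_signals(past12):
    """past12: simple returns for months t-12 ... t-1, oldest first.
    Intermediate signal: months t-12..t-7; recent signal: months t-6..t-2."""
    intermediate = cumulative_return(past12[0:6])
    recent = cumulative_return(past12[6:11])   # month t-1 is excluded
    return intermediate, recent

# Made-up history: a strong intermediate past followed by a weak recent past.
past12 = [0.02] * 6 + [-0.01] * 5 + [0.05]
inter, recent = momentum_signals(past12)
print(round(inter, 4), round(recent, 4))
```

A stock like this one is a "winner" under the intermediate sort but not under the recent sort, which is exactly where the two momentum definitions disagree across countries in the thesis.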
Exploring the Potential of Seoul-style Resident Self-governance through Outreach Community Service Centers = From local government to citizen initiative: a search for Seoul's model of self-government
- …
