1,927 research outputs found

    Predictive Coding for Dynamic Visual Processing: Development of Functional Hierarchy in a Multiple Spatio-Temporal Scales RNN Model

    This paper proposes a novel predictive-coding neural network model, the predictive multiple spatio-temporal scales recurrent neural network (P-MSTRNN). The P-MSTRNN learns to predict visually perceived human whole-body cyclic movement patterns by exploiting multiscale spatio-temporal constraints imposed on the network dynamics through differently sized receptive fields and different time constant values for each layer. After learning, the network can proactively imitate target movement patterns by inferring or recognizing the corresponding intentions by means of the regression of prediction error. Results show that the network develops a functional hierarchy, with a different type of dynamic structure emerging at each layer. The paper examines how model performance during pattern generation and predictive imitation varies with the stage of learning. The number of limit cycle attractors corresponding to target movement patterns increases as learning proceeds. Moreover, transient dynamics that develop early in the learning process successfully perform the pattern generation and predictive imitation tasks. The paper concludes that exploiting transient dynamics facilitates successful task performance during early learning periods. Comment: Accepted in Neural Computation (MIT Press).
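    The effect of giving each layer its own time constant can be illustrated with the leaky-integrator update commonly used in multiple-timescale RNNs. The sketch below is not the P-MSTRNN itself: it omits the receptive fields and the learning procedure, and the function names, the time-constant values, and the constant input drive are all assumptions chosen for illustration.

```python
import math

def leaky_step(u, drive, tau):
    """One leaky-integrator update: u <- (1 - 1/tau) * u + (1/tau) * drive."""
    return (1.0 - 1.0 / tau) * u + drive / tau

def simulate(tau, steps, drive=1.0):
    """Output trajectory of a single unit under a constant input drive."""
    u = 0.0
    trace = []
    for _ in range(steps):
        u = leaky_step(u, drive, tau)
        trace.append(math.tanh(u))  # output is the squashed internal state
    return trace

# Hypothetical time constants: a fast lower-layer unit and a slow higher-layer unit.
fast = simulate(tau=2.0, steps=20)
slow = simulate(tau=50.0, steps=20)
```

    Under the same input, the unit with the small time constant settles within a few steps while the unit with the large one changes slowly. This is the kind of timescale separation that the abstract describes as a constraint on the network dynamics.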

    Error resilience and concealment techniques for high-efficiency video coding

    This thesis investigates robust coding and error concealment in High Efficiency Video Coding (HEVC). After a review of the current state of the art, a simulation study of error robustness revealed that HEVC has weak protection against network losses, which significantly degrade video quality. Based on this evidence, the first contribution of this work is a new method that reduces the temporal dependencies between motion vectors, improving the decoded video quality without compromising compression efficiency. The second contribution is a two-stage approach for reducing the mismatch of temporal predictions when video streams are received with errors or lost data. At the encoding stage, the reference pictures are dynamically distributed, based on a constrained Lagrangian rate-distortion optimization, to reduce the number of predictions from a single reference. At the streaming stage, a prioritization algorithm based on spatial dependencies selects a reduced set of motion vectors to be transmitted as side information, to reduce mismatched motion predictions at the decoder. Error concealment-aware video coding is also investigated to enhance overall error robustness. A new approach based on scalable coding and optimal error concealment selection is proposed, in which the optimal error concealment modes are found by simulating transmission losses, followed by a saliency-weighted optimisation. Moreover, recovery residual information is encoded using a rate-controlled enhancement layer. Both are transmitted to the decoder to be used in case of data loss. Finally, an adaptive error resilience scheme is proposed to dynamically predict the video stream that achieves the highest decoded quality for a particular loss case. A neural network selects among the various video streams, encoded with different levels of compression efficiency and error protection, based on information from the video signal, the coded stream, and the transmission network. Overall, the robust video coding methods investigated in this thesis yield consistent quality gains over existing methods, including those implemented in the HEVC reference software. The proposed methods also achieve a better trade-off between coding efficiency and error robustness.
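    For orientation, Lagrangian rate-distortion optimization in video coding generally selects, from a set of candidate coding options, the one that minimizes a weighted sum of distortion and rate. The abstract does not give the specific constraint used in the thesis, so the notation below (candidate set $\mathcal{M}$, distortion $D$, rate $R$, multiplier $\lambda$) is the generic textbook form, not the author's formulation.

```latex
m^{*} = \arg\min_{m \in \mathcal{M}} J(m),
\qquad
J(m) = D(m) + \lambda \, R(m), \quad \lambda \ge 0
```

    A larger $\lambda$ favours options that cost fewer bits, while a smaller $\lambda$ favours options with lower distortion.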

    A New H.264/AVC Error Resilience Model Based on Regions of Interest

    Video transmission over the Internet can be subject to packet loss, which reduces the end user's Quality of Experience (QoE). Solutions that improve the robustness of a video bitstream can be used to mitigate this problem. In this paper, we propose a new Region of Interest-based error resilience model that protects the most important part of the picture from distortions. We conduct eye tracking tests to collect the Region of Interest (RoI) data. We then apply, in the encoder, an intra-prediction restriction algorithm to the macroblocks belonging to the RoI. Results show that, with no significant overhead, the perceived quality of the video's RoI, measured with a perceptual video quality metric, increases in the presence of packet loss compared with the traditional encoding approach.

    Understanding user experience of mobile video: Framework, measurement, and optimization

    Since users became the focus of product and service design in the last decade, the term User eXperience (UX) has been used frequently in the field of Human-Computer Interaction (HCI). Research on UX facilitates a better understanding of the various aspects of the user's interaction with a product or service. Mobile video, as a new and promising service and research field, has attracted great attention. Because UX is significant to the success of mobile video (Jordan, 2002), many researchers have focused on this area, examining users' expectations, motivations, requirements, and usage context. As a result, many influencing factors have been explored (Buchinger, Kriglstein, Brandt & Hlavacs, 2011; Buchinger, Kriglstein & Hlavacs, 2009). However, a general framework specific to mobile video services, which could structure such a large number of factors, is lacking. To measure the user experience of multimedia services such as mobile video, quality of experience (QoE) has recently become a prominent concept. In contrast to the traditionally used concept of quality of service (QoS), QoE not only involves objectively measuring the delivered service but also takes into account the user's needs and desires when using the service, emphasizing the user's overall acceptance of the service. Many QoE metrics can estimate the user-perceived quality or acceptability of mobile video, but they may not be accurate enough to predict overall UX because of its complexity. Only a few QoE frameworks have addressed more aspects of UX for mobile multimedia applications, and these still need to be transformed into practical measures. The challenge in optimizing UX remains adapting to resource constraints (e.g., network conditions, mobile device capabilities, and heterogeneous usage contexts) while meeting complicated user requirements (e.g., usage purposes and personal preferences).
    In this chapter, we investigate the important existing UX frameworks, compare their similarities, and discuss some important features that fit the mobile video service. Based on previous research, we propose a simple UX framework for mobile video applications by mapping a variety of UX influencing factors onto a typical mobile video delivery system. Each component and its factors are explored through comprehensive literature reviews. The proposed framework may benefit the user-centred design of mobile video by taking complete account of UX influences, and may improve mobile video service quality by adjusting the values of certain factors to produce a positive user experience. It may also facilitate related research by locating important issues to study, clarifying research scopes, and setting up proper study procedures. We then review a large body of research on UX measurement, including QoE metrics and QoE frameworks for mobile multimedia. Finally, we discuss how to achieve an optimal quality of user experience by focusing on the issues raised by various aspects of the UX of mobile video. In the conclusion, we suggest some open issues for future study.

    Recognizing recurrent neural networks (rRNN): Bayesian inference for recurrent neural networks

    Recurrent neural networks (RNNs) are widely used in computational neuroscience and machine learning applications. In an RNN, each neuron computes its output as a nonlinear function of its integrated input. While the importance of RNNs, especially as models of brain processing, is undisputed, it is also widely acknowledged that the computations in standard RNN models may be an over-simplification of what real neuronal networks compute. Here, we suggest that the RNN approach may be made both neurobiologically more plausible and computationally more powerful by its fusion with Bayesian inference techniques for nonlinear dynamical systems. In this scheme, we use an RNN as a generative model of dynamic input caused by the environment, e.g. of speech or kinematics. Given this generative RNN model, we derive Bayesian update equations that can decode its output. Critically, these updates define a 'recognizing RNN' (rRNN), in which neurons compute and exchange prediction and prediction error messages. The rRNN has several desirable features that a conventional RNN does not have, for example, fast decoding of dynamic stimuli and robustness to initial conditions and noise. Furthermore, it implements a predictive coding scheme for dynamic inputs. We suggest that the Bayesian inversion of recurrent neural networks may be useful both as a model of brain function and as a machine learning tool. We illustrate the use of the rRNN by an application to the online decoding (i.e. recognition) of human kinematics

    Dynamic bandwidth allocation in ATM networks

    Includes bibliographical references. This thesis investigates bandwidth allocation methodologies for transporting emerging bursty traffic types in ATM networks. Existing ATM traffic management solutions are not readily able to handle the congestion that inevitably results from the bursty traffic of these emerging services. The research addresses bandwidth allocation for bursty traffic by proposing and exploring the concept of dynamic bandwidth allocation and comparing it with traditional static bandwidth allocation schemes.

    ๋”ฅ ์ŠคํŒŒ์ดํ‚น ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ์˜ ๋น ๋ฅด๊ณ  ์ •ํ™•ํ•œ ์ •๋ณด ์ „๋‹ฌ

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2021. Sungroh Yoon.
    One of the primary reasons behind the recent success of deep neural networks (DNNs) lies in the development of high-performance parallel computing systems and the availability of enormous amounts of data for training complex models. Nonetheless, solving advanced machine learning problems in real-world applications requires more sophisticated models with a vast number of parameters and more training data, which leads to substantial computational overhead and power consumption. Given these circumstances, spiking neural networks (SNNs) have attracted growing interest as the third generation of neural networks because of their event-driven, low-power nature. SNNs were introduced to mimic how information is encoded and processed in the human brain by employing spiking neurons as computation units. SNNs exploit temporal aspects of information transmission, as biological neural systems do, and thus provide sparse yet powerful computing ability. SNNs have been applied successfully in several applications, but these include only relatively simple tasks such as image classification and are limited to shallow networks and simple datasets. One of the primary reasons for this limited application scope is the lack of scalable training algorithms, which stems from the non-differentiable nature of spiking neurons. In this dissertation, we investigate deep SNNs in a much more challenging regression problem (i.e., object detection) and propose the first object detection model in deep SNNs that achieves results comparable to those of DNNs on non-trivial datasets. Furthermore, we introduce novel approaches to improve the object detection model in terms of accuracy, latency, and energy efficiency. The dissertation contains two main approaches: (a) an object detection model in deep SNNs, and (b) improving the performance of the object detection model in deep SNNs. Together, the two approaches enable fast and accurate object detection in deep SNNs.
    The first approach is an object detection model in deep SNNs. We present a spike-based object detection model called Spiking-YOLO. To the best of our knowledge, Spiking-YOLO is the first spike-based object detection model that achieves results comparable to those of DNNs on non-trivial datasets, namely PASCAL VOC and MS COCO. In doing so, we introduce two novel methods: channel-wise weight normalization and a signed neuron with imbalanced threshold, both of which provide fast and accurate information transmission in deep SNNs. Our experiments show that Spiking-YOLO achieves results comparable (up to 98%) to those of Tiny YOLO (a DNN) on PASCAL VOC and MS COCO. Furthermore, Spiking-YOLO on a neuromorphic chip consumes approximately 280 times less energy than Tiny YOLO and converges 2.3 to 4 times faster than previous DNN-to-SNN conversion methods.
    The second approach aims to make computation in SNNs more efficient. Even though SNNs enable sparse yet efficient information transmission through spike trains, leading to exceptional computational and energy efficiency, two critical challenges remain: (a) latency, the number of time steps required to achieve competitive results, and (b) synaptic operations, the total number of spikes generated during inference. Unless these challenges are addressed properly, the potential impact of SNNs in terms of energy and power efficiency may be diminished. We present a threshold voltage balancing method for object detection in SNNs, which uses Bayesian optimization to find optimal threshold voltages. We specifically design the Bayesian optimization to consider important characteristics of SNNs, such as latency and the number of synaptic operations. Furthermore, we introduce two-phase threshold voltages to provide faster and more accurate object detection while maintaining high energy efficiency. According to the experimental results, the proposed methods achieve state-of-the-art object detection accuracy in SNNs and converge 2x and 1.85x faster than conventional methods on PASCAL VOC and MS COCO, respectively. Moreover, the total number of synaptic operations is reduced by 40.33% and 45.31% on PASCAL VOC and MS COCO, respectively.
    Contents: Abstract; List of Figures; List of Tables.
    1 Introduction.
    2 Background: 2.1 Object detection; 2.2 Spiking Neural Networks; 2.3 DNN-to-SNN conversion; 2.4 Hyper-parameter optimization.
    3 Object detection model in deep SNNs: 3.1 Introduction; 3.2 Channel-wise weight normalization (3.2.1 Conventional weight normalization methods; 3.2.2 Analysis of limitations in layer-wise weight normalization; 3.2.3 Proposed weight normalization method; 3.2.4 Analysis of the improved firing rate); 3.3 Signed neuron with imbalanced threshold (3.3.1 Limitation of leaky-ReLU implementation in SNNs; 3.3.2 The notion of imbalanced threshold); 3.4 Evaluation (3.4.1 Spiking-YOLO detection results; 3.4.2 Spiking-YOLO energy efficiency).
    4 Improving performance and efficiency of deep SNNs: 4.1 Introduction; 4.2 Threshold voltage balancing through Bayesian optimization (4.2.1 Motivation; 4.2.2 Overall process and setup; 4.2.3 Design of Bayesian optimization for SNNs); 4.3 Fast and accurate object detection with two-phase threshold voltages (4.3.1 Motivation; 4.3.2 Phase-1 threshold voltages: fast object detection; 4.3.3 Phase-2 threshold voltages: accurate detection); 4.4 Evaluation (4.4.1 Experimental setup; 4.4.2 Experimental results).
    5 Conclusion: 5.1 Dissertation summary; 5.2 Discussion (5.2.1 Overview of the proposed methods and their usages); 5.3 Challenges in SNNs; 5.4 Future Work (5.4.1 Extension to various applications and DNN models; 5.4.2 Further improve efficiency of SNNs; 5.4.3 Optimization of deep SNNs).
    Bibliography; Abstract (in Korean).
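    The abstract names a signed neuron with imbalanced threshold but gives no equations. The sketch below shows one way such a neuron could behave. It assumes an integrate-and-fire unit with reset by subtraction, a positive threshold `v_th`, and a negative threshold `-v_th / alpha`, where `alpha` plays the role of a leaky-ReLU slope. The function name, the parameter values, and the reset rule are assumptions for illustration, not the dissertation's implementation.

```python
def run_signed_neuron(inputs, v_th=1.0, alpha=0.1):
    """Integrate-and-fire neuron that emits +1 or -1 spikes with asymmetric thresholds."""
    v_neg = -v_th / alpha  # assumed negative threshold; larger in magnitude than v_th
    v_mem = 0.0            # membrane potential
    spikes = []
    for x in inputs:
        v_mem += x
        if v_mem >= v_th:
            spikes.append(1)
            v_mem -= v_th   # reset by subtraction
        elif v_mem <= v_neg:
            spikes.append(-1)
            v_mem -= v_neg  # v_neg is negative, so this raises v_mem back toward 0
        else:
            spikes.append(0)
    return spikes

# Constant positive and negative inputs of equal magnitude over 100 time steps.
pos = run_signed_neuron([0.5] * 100)
neg = run_signed_neuron([-0.5] * 100)
```

    With these assumed values, the negative input produces one tenth as many spikes as the positive input of the same magnitude (5 versus 50), so the spike rate on the negative side is scaled by `alpha`, as a leaky-ReLU activation with that slope would scale negative values.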

    Application and Theory of Multimedia Signal Processing Using Machine Learning or Advanced Methods

    This Special Issue is a book that collects peer-reviewed papers on advanced technologies related to the application and theory of signal processing for multimedia systems using machine learning (ML) or other advanced methods. Its scope covers image, video, and audio signals, character recognition, and the optimization of communication channels for networks. The specific topics included in this book are data hiding, encryption, object detection, image classification, and character recognition. Academics and colleagues interested in these topics will find it worth reading.