20 research outputs found

    Development of Features for Recognition of Handwritten Odia Characters

    Get PDF
    In this thesis, we propose four different schemes for recognition of handwritten atomic Odia characters, which include forty-seven alphabets and ten numerals. Odia is the mother tongue of the state of Odisha in the Republic of India. Optical character recognition (OCR) for many languages is quite mature and industry-standard OCR systems are already available, but OCR for the Odia language is still a challenging task. Further, the features described for other languages cannot be directly utilized for Odia character recognition, for either printed or handwritten text. Thus, the prime thrust has been to propose features and utilize a classifier to achieve significant recognition accuracy. Due to the non-availability of a handwritten Odia database for validation of the proposed schemes, we have collected samples from individuals through a digital note maker to generate a large database. The database consists of a total of 17,100 samples (150 × 2 × 57), collected from 150 individuals who each wrote the 57 characters at two different times. This database has been named the Odia handwritten character set version 1.0 (OHCS v1.0) and is made available at http://nitrkl.ac.in/Academic/Academic_Centers/Centre_For_Computer_Vision.aspx for the use of researchers. The first scheme divides the contour of each character into thirty segments. Taking the centroid of the character as the base point, three primary features, namely length, angle, and chord-to-arc ratio, are extracted from each segment. Thus, there are 30 feature values for each primary attribute, giving a total of 90 feature points. A back-propagation neural network has been employed for recognition, and performance comparisons are made with competing schemes. The second contribution concerns reduction of the primary features derived in the first contribution. A fuzzy inference system has been employed to generate an aggregated feature vector of size 30 from the 90 feature points, representing the most significant features of each character. For recognition, a six-state hidden Markov model (HMM) is employed for each character; consequently, we have fifty-seven ergodic HMMs with six states each. An accuracy of 84.5% has been achieved on our dataset. The third contribution involves selection of evidences, which are the most informative local shape-contour features. A dedicated distance metric, namely far_count, is used to compute information gain values for candidate segments of different lengths extracted from the whole shape contour of a character. The segment with the highest information gain value is treated as the evidence and mapped to the corresponding class. An evidence dictionary is built from these evidences over all character classes and is used for testing. An overall testing accuracy of 88% is obtained. The final contribution deals with the development of a hybrid feature derived from the discrete wavelet transform (DWT) and the discrete cosine transform (DCT). Experimentally, it has been observed that a 3-level DWT decomposition with 72 DCT coefficients from each high-frequency component as features gives a testing accuracy of 86% with a neural classifier. The suggested features are studied in isolation, and extensive simulations have been carried out along with other existing schemes using the same dataset. Further, to study the generalization behavior of the proposed schemes, they are applied to English and Bangla handwritten datasets. Performance parameters such as recognition rate and misclassification rate are computed and compared. Further, as we progress from one contribution to the next, the proposed scheme is compared with the earlier proposed schemes.
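
    A rough sketch of the first scheme's segment features may help make the description concrete. The fragment below is an illustrative assumption, not the thesis code: the contour is split into thirty segments and, for each, a length (here taken as the segment's arc length), the angle of the segment relative to the character centroid, and the chord-to-arc ratio are computed, giving 3 × 30 = 90 feature values.

        import numpy as np

        def segment_features(contour, n_segments=30):
            """Length, angle and chord-to-arc ratio for each contour segment.

            contour: (N, 2) array of (x, y) points along the character outline.
            Returns an (n_segments, 3) array, i.e. 90 values for 30 segments.
            """
            centroid = contour.mean(axis=0)
            pieces = np.array_split(contour, n_segments)
            feats = []
            for piece in pieces:
                steps = np.linalg.norm(np.diff(piece, axis=0), axis=1)
                arc_len = steps.sum()                            # arc length of the segment
                chord = np.linalg.norm(piece[-1] - piece[0])     # straight-line chord
                mid = piece[len(piece) // 2] - centroid
                angle = np.arctan2(mid[1], mid[0])               # angle w.r.t. the centroid
                ratio = chord / arc_len if arc_len > 0 else 1.0  # chord-to-arc ratio
                feats.append([arc_len, angle, ratio])
            return np.asarray(feats)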

    NON-LINEAR AND SPARSE REPRESENTATIONS FOR MULTI-MODAL RECOGNITION

    Get PDF
    In the first part of this dissertation, we address the problem of representing 2D and 3D shapes. In particular, we introduce a novel implicit shape representation based on Support Vector Machine (SVM) theory. Each shape is represented by an analytic decision function obtained by training an SVM with a Radial Basis Function (RBF) kernel so that the interior shape points are given higher values. This gives the support vector shape (SVS) representation several advantages. First, the representation uses a sparse subset of feature points determined by the support vectors, which significantly improves the discriminative power against noise, fragmentation and other artifacts that often come with the data. Second, the use of the RBF kernel provides scale, rotation, and translation invariant features, and allows a shape to be represented accurately regardless of its complexity. Finally, the decision function can be used to select reliable feature points. These features are described using gradients computed from highly consistent decision functions instead of conventional edges. Our experiments on 2D and 3D shapes demonstrate promising results. The availability of inexpensive 3D sensors like the Kinect necessitates the design of new representations for this type of data. We present a 3D feature descriptor that represents local topologies within a set of folded concentric rings by distances from local points to a projection plane. This feature, called Concentric Ring Signature (CORS), possesses computational advantages similar to point signatures yet provides more accurate matches. CORS produces compact and discriminative descriptors, which makes it more robust to noise and occlusions. It is also well known to computer vision researchers that there is no universal representation that is optimal for all types of data or tasks. Sparsity has proved to be a good criterion for working with natural images. This motivates us to develop efficient sparse and non-linear learning techniques for automatically extracting useful information from visual data. Specifically, we present dictionary learning methods for sparse and redundant representations in a high-dimensional feature space. Using the kernel method, we describe how well-known dictionary learning approaches such as the method of optimal directions and K-SVD can be made non-linear. We analyse their kernel constructions and demonstrate their effectiveness through several experiments on classification problems. It is shown that non-linear dictionary learning approaches can provide significantly better discrimination than their linear counterparts and kernel PCA, especially when the data are corrupted by different types of degradations. Visual descriptors are often high dimensional, which results in high computational complexity for sparse learning algorithms. Motivated by this observation, we introduce a novel framework, called sparse embedding (SE), for simultaneous dimensionality reduction and dictionary learning. We formulate an optimization problem for learning a transformation from the original signal domain to a lower-dimensional one in a way that preserves the sparse structure of the data. We propose an efficient optimization algorithm and present its non-linear extension based on kernel methods. One of the key features of our method is that it is computationally efficient, as the learning is done in the lower-dimensional space, and it discards the irrelevant part of the signal that derails the dictionary learning process.
    Various experiments show that our method is able to capture the meaningful structure of data and can perform significantly better than many competitive algorithms on signal recovery and object classification tasks. In many practical applications, we are often confronted with the situation where the data used to train our models differ from the data presented during testing. In the final part of this dissertation, we present a novel framework for domain adaptation using a sparse and hierarchical network (DASH-N), which makes use of the old data to improve the performance of a system operating on a new domain. Our network jointly learns a hierarchy of features together with transformations that rectify the mismatch between different domains. The building block of DASH-N is the latent sparse representation. It employs a dimensionality reduction step that prevents the data dimension from increasing too fast as we traverse deeper into the hierarchy. Experimental results show that our method consistently outperforms the current state of the art by a significant margin. Moreover, we found that a multi-layer DASH-N has an edge over the single-layer DASH-N.
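
    The support vector shape (SVS) construction described above can be illustrated with a small sketch using scikit-learn. This is only an assumed toy construction (the point sampling, labels, kernel width and regularization constant are not taken from the dissertation): interior points are labeled positive so that the learned RBF-SVM decision function takes higher values inside the shape.

        import numpy as np
        from sklearn.svm import SVC

        def fit_svs(interior_pts, exterior_pts, gamma=5.0):
            """Fit an RBF-SVM whose decision function is higher inside the shape."""
            X = np.vstack([interior_pts, exterior_pts])
            y = np.hstack([np.ones(len(interior_pts)), -np.ones(len(exterior_pts))])
            svm = SVC(kernel="rbf", gamma=gamma, C=10.0)
            svm.fit(X, y)
            return svm

        # Toy example: a unit disk as the "shape".
        rng = np.random.default_rng(0)
        pts = rng.uniform(-2, 2, size=(2000, 2))
        inside = pts[np.linalg.norm(pts, axis=1) < 1.0]
        outside = pts[np.linalg.norm(pts, axis=1) >= 1.0]
        svs = fit_svs(inside, outside)

        # The analytic decision function: positive inside, negative outside.
        print(svs.decision_function([[0.0, 0.0], [1.5, 1.5]]))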

    Irish Machine Vision and Image Processing Conference Proceedings 2017

    Get PDF

    Learning Low-Dimensional Models for Heterogeneous Data

    Full text link
    Modern data analysis increasingly involves extracting insights, trends and patterns from large and messy data collected from myriad heterogeneous sources. The scale and heterogeneity present exciting new opportunities for discovery, but also create a need for new statistical techniques and theory tailored to these settings. Traditional intuitions often no longer apply, e.g., when the number of variables measured is comparable to the number of samples obtained. A deeper theoretical understanding is needed to develop principled methods and guidelines for statistical data analysis. This dissertation studies the low-dimensional modeling of high-dimensional data in three heterogeneous settings. The first heterogeneity is in the quality of samples, and we consider the standard and ubiquitous low-dimensional modeling technique of Principal Component Analysis (PCA). We analyze how well PCA recovers underlying low-dimensional components from high-dimensional data when some samples are noisier than others (i.e., have heteroscedastic noise). Our analysis characterizes the penalty of heteroscedasticity for PCA, and we consider a weighted variant of PCA that explicitly accounts for heteroscedasticity by giving less weight to samples with more noise. We characterize the performance of weighted PCA for all choices of weights and derive optimal weights. The second heterogeneity is in the statistical properties of data, and we generalize the (increasingly) standard method of Canonical Polyadic (CP) tensor decomposition to allow for general statistical assumptions. Traditional CP tensor decomposition is most natural for data with all entries having Gaussian noise of homogeneous variance. Instead, the Generalized CP (GCP) tensor decomposition we propose allows for other statistical assumptions, and we demonstrate its flexibility on various datasets arising in social networks, neuroscience studies and weather patterns. Fitting GCP with alternative statistical assumptions provides new ways to explore trends in the data and yields improved predictions, e.g., of social network and mouse neural data. The third heterogeneity is in the class of samples, and we consider learning a mixture of low-dimensional subspaces. This model supposes that each sample comes from one of several (unknown) low-dimensional subspaces, that taken together form a union of subspaces (UoS). Samples from the same class come from the same subspace in the union. We consider an ensemble algorithm that clusters the samples, and analyze the approach to provide recovery guarantees. Finally, we propose a sequence of unions of subspaces (SUoS) model that systematically captures samples with heterogeneous complexity, and we describe some early ideas for learning and using SUoS models in patch-based image denoising.
    PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/150043/1/dahong_1.pd
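
    The weighted PCA analyzed in the first setting can be sketched briefly. This is a minimal illustration, not the dissertation's derivation: the inverse-noise-variance weights used here are only the common heuristic choice (the dissertation characterizes performance for all weights and derives optimal ones), and the function and variable names are assumptions.

        import numpy as np

        def weighted_pca(X, noise_var, k):
            """Weighted PCA for heteroscedastic samples.

            X: (n_samples, n_features) data matrix, one sample per row.
            noise_var: (n_samples,) noise variance of each sample.
            k: number of principal components to recover.
            """
            w = 1.0 / np.asarray(noise_var)          # illustrative inverse-variance weights
            Xc = X - np.average(X, axis=0, weights=w)
            C = (Xc * w[:, None]).T @ Xc / w.sum()   # weighted sample covariance
            eigvals, eigvecs = np.linalg.eigh(C)
            order = np.argsort(eigvals)[::-1][:k]
            return eigvecs[:, order], eigvals[order]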

    Putting artificial intelligence into wearable human-machine interfaces – towards a generic, self-improving controller

    Get PDF
    The standard approach to creating a machine-learning-based controller is to provide users with a number of gestures that they need to make; record multiple instances of each gesture using specific sensors; extract the relevant sensor data and pass it through a supervised learning algorithm until the algorithm can successfully identify the gestures; and map each gesture to a control signal that performs a desired outcome. This approach is both inflexible and time consuming. The primary contribution of this research was to investigate a new approach to putting artificial intelligence into wearable human-machine interfaces by creating a Generic, Self-Improving Controller. It was shown to learn two user-defined static gestures with an accuracy of 100% in less than 10 samples per gesture; three in less than 20 samples per gesture; and four in less than 35 samples per gesture. Pre-defined dynamic gestures were more difficult to learn: it learnt two with an accuracy of 90% in less than 6,000 samples per gesture, and four with an accuracy of 70% after 50,000 samples per gesture. The research has resulted in a number of additional contributions:
    • The creation of a source-independent hardware data capture, processing, fusion and storage tool for standardising the capture and storage of historical copies of data captured from multiple different sensors.
    • An improved Attitude and Heading Reference System (AHRS) algorithm for calculating orientation quaternions that is five orders of magnitude more precise.
    • The reformulation of the regularised TD learning algorithm; the reformulation of the TD learning algorithm applied to the artificial neural network back-propagation algorithm; and the combination of the two reformulations into a new, regularised TD learning algorithm applied to the artificial neural network back-propagation algorithm (a generic sketch follows below).
    • The creation of a Generic, Self-Improving Predictor that can use different learning algorithms and a Flexible Artificial Neural Network.
    Open Access
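
    The abstract does not give the regularised TD reformulation itself; the fragment below is only a generic sketch of a regularised TD(0) update with a linear value approximator (the learning rate, discount factor, L2 coefficient and all names are illustrative assumptions), intended to indicate the kind of update the thesis reformulates for neural-network back-propagation.

        import numpy as np

        def td0_update(w, phi_s, phi_s_next, reward, alpha=0.01, gamma=0.95, lam=1e-3):
            """One regularised TD(0) step for a linear value function V(s) = w . phi(s)."""
            td_error = reward + gamma * np.dot(w, phi_s_next) - np.dot(w, phi_s)
            # Semi-gradient step on the TD error, shrunk by an L2 regularisation term.
            return w + alpha * (td_error * phi_s - lam * w)

        # Minimal usage example with 4-dimensional state features.
        w = np.zeros(4)
        w = td0_update(w, phi_s=np.array([1.0, 0.0, 0.0, 1.0]),
                       phi_s_next=np.array([0.0, 1.0, 0.0, 1.0]), reward=1.0)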

    Quantization of Deep Neural Networks for Improving Generalization Capability

    Get PDF
    Thesis (Ph.D.) -- Graduate School of Seoul National University, College of Engineering, Department of Electrical and Computer Engineering, February 2020. Advisor: Wonyong Sung.
    Deep neural networks (DNNs) achieve state-of-the-art performance in various applications such as image recognition and speech synthesis. However, their implementation in embedded systems is difficult owing to the large number of associated parameters and high computational costs. DNNs have the potential to operate well with low-precision parameters because they mimic the operation of human neurons, which themselves work at low precision; quantization of DNNs exploits this property. In general, fixed-point quantization with a word-length of 8 bits or more yields DNN performance comparable to that of a full-precision model, whereas shorter word-lengths such as 1 or 2 bits can cause significant performance degradation. To alleviate this problem, previous works have employed more elaborate quantization methods such as asymmetric or adaptive quantizers. In contrast, this study takes a different approach: we focus on improving the generalization capability of quantized DNNs (QDNNs) instead of employing complex quantizers. To this end, we first analyze the performance characteristics of QDNNs trained with a retraining algorithm, employing layer-wise sensitivity analysis to investigate the quantization characteristics of each layer. We also analyze how QDNN performance depends on the width and depth of the quantized network. Based on these analyses, two simple quantization training techniques, namely adaptive step size retraining and gradual quantization, are proposed. Furthermore, a new training scheme for QDNNs, referred to as high-low-high-low-precision (HLHLp) training, is proposed; it allows the network to reach flat minima on its loss surface with the aid of quantization noise. As the name suggests, the proposed method alternates between high and low precision during training, and the learning rate is changed accordingly at each stage. Our analyses show that the proposed training technique yields considerably better performance for QDNNs than previously reported fine-tuning-based quantization schemes. Moreover, the knowledge distillation (KD) technique, which uses a pre-trained teacher model to train a student network, is exploited for the optimization of QDNNs. We explore the effect of teacher network selection and of different KD hyperparameters on the quantization of DNNs. In particular, we use several large floating-point and quantized models as teacher networks. Our experiments indicate that, for effective KD training, the softmax distribution produced by a teacher network is more important than its performance.
    Furthermore, because the softmax distribution of a teacher network can be controlled using KD hyperparameters, we analyze the interrelationship of each KD component for QDNN training. We show that even a small teacher model can achieve the same distillation performance as a larger teacher model. We also propose the gradual soft loss reducing (GSLR) technique for robust KD-based QDNN optimization, wherein the mixing ratio of hard and soft losses during training is controlled. In addition, we present a new QDNN optimization approach, namely stochastic quantized weight averaging (SQWA), to design low-precision DNNs with good generalization capability using model averaging. The proposed approach includes (1) floating-point model training, (2) direct quantization of weights, (3) capture of multiple low-precision models during retraining with a cyclical learning rate, (4) averaging of the captured models, and (5) re-quantization of the averaged model and fine-tuning with a low learning rate. Additionally, we present a loss-visualization technique for the quantized weight domain to elucidate the behavior of the proposed method. Our visualization results indicate that a QDNN optimized using our proposed approach is located near the center of the flat minimum of the loss surface.
    Contents:
    1. Introduction
        1.1 Quantization of Deep Neural Networks
        1.2 Generalization Capability of DNNs
        1.3 Improved Generalization Capability of QDNNs
        1.4 Outline of the Dissertation
    2. Analysis of Fixed-point Quantization of Deep Neural Networks
        2.1 Introduction
        2.2 Fixed-point Performance Analysis of Deep Neural Networks
            2.2.1 Model Design of Deep Neural Networks
            2.2.2 Retrain-based Weight Quantization
            2.2.3 Quantization Sensitivity Analysis
            2.2.4 Empirical Analysis
        2.3 Step Size Adaptation and Gradual Quantization for Retraining of Deep Neural Networks
            2.3.1 Step-size adaptation during retraining
            2.3.2 Gradual quantization scheme
            2.3.3 Experimental Results
        2.4 Concluding remarks
    3. HLHLp: Quantized Neural Networks Training for Reaching Flat Minima in Loss Surface
        3.1 Introduction
        3.2 Related Works
            3.2.1 Quantization of Deep Neural Networks
            3.2.2 Flat Minima in Loss Surfaces
        3.3 Training QDNN for Improved Generalization Capability
            3.3.1 Analysis of Training with Quantized Weights
            3.3.2 High-low-high-low-precision Training
        3.4 Experimental Results
            3.4.1 Image Classification with CNNs
            3.4.2 Language Modeling on PTB and WikiText-2
            3.4.3 Speech Recognition on WSJ Corpus
            3.4.4 Discussion
        3.5 Concluding Remarks
    4. Knowledge Distillation for Optimization of Quantized Deep Neural Networks
        4.1 Introduction
        4.2 Quantized Deep Neural Network Training Using Knowledge Distillation
            4.2.1 Quantization of deep neural networks and knowledge distillation
            4.2.2 Teacher model selection for KD
            4.2.3 Discussion on hyperparameters of KD
        4.3 Experimental Results
            4.3.1 Experimental setup
            4.3.2 Results on CIFAR-10 and CIFAR-100
            4.3.3 Model size and temperature
            4.3.4 Gradual Soft Loss Reducing
        4.4 Concluding Remarks
    5. SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks
        5.1 Introduction
        5.2 Related works
            5.2.1 Quantization of deep neural networks for efficient implementations
            5.2.2 Stochastic weight averaging and loss-surface visualization
        5.3 Quantization of DNN and loss surface visualization
            5.3.1 Quantization of deep neural networks
            5.3.2 Loss surface visualization for QDNNs
        5.4 SQWA algorithm
        5.5 Experimental results
            5.5.1 CIFAR-100
            5.5.2 ImageNet
        5.6 Concluding remarks
    6. Conclusion
    Abstract (In Korean)
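
    The retraining-based weight quantization analyzed in Chapter 2 of the outline above can be illustrated with a small sketch. This is a generic illustration, not the dissertation's implementation: the uniform quantizer, the 4-bit word-length, the toy regression layer and all variable names are assumptions. A full-precision weight copy is kept, the forward pass uses quantized weights, and the gradient is applied to the full-precision copy (the straight-through estimator); schemes such as adaptive step size retraining, HLHLp and SQWA build on this kind of loop.

        import numpy as np

        def quantize(w, step, bits=4):
            """Uniform symmetric quantization of weights with a given step size."""
            levels = 2 ** (bits - 1)
            return step * np.clip(np.round(w / step), -levels, levels - 1)

        # Toy retraining step with the straight-through estimator (STE):
        # forward with quantized weights, update the full-precision copy.
        rng = np.random.default_rng(0)
        w_fp = rng.normal(scale=0.1, size=(4, 8))      # full-precision master weights
        x = rng.normal(size=(8,))
        target = rng.normal(size=(4,))
        step, lr = 0.05, 0.1

        for _ in range(100):
            w_q = quantize(w_fp, step)                 # quantize for the forward pass
            y = w_q @ x
            grad_y = 2 * (y - target)                  # d(mse)/dy
            grad_w = np.outer(grad_y, x)               # gradient w.r.t. quantized weights
            w_fp -= lr * grad_w                        # STE: apply it to w_fp directly

        print(np.mean((quantize(w_fp, step) @ x - target) ** 2))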

    Deep Neural Networks and Data for Automated Driving

    Get PDF
    This open access book brings together the latest developments from industry and research on automated driving and artificial intelligence. Environment perception for highly automated driving heavily employs deep neural networks, facing many challenges. How much data do we need for training and testing? How can synthetic data be used to save labeling costs for training? How do we increase robustness and decrease memory usage? For inevitably poor conditions: how do we know that the network is uncertain about its decisions? Can we understand a bit more about what actually happens inside neural networks? This leads to a very practical problem, particularly for DNNs employed in automated driving: what are useful validation techniques, and what about safety? This book unites the views of academia and industry, where computer vision and machine learning meet environment perception for highly automated driving. Naturally, aspects of data, robustness, uncertainty quantification, and, last but not least, safety are at its core. The book is unique in that its first part provides an extended survey of all the relevant aspects, while the second part contains the detailed technical elaboration of the various questions mentioned above.

    Efficient and Accurate Spiking Neural Networks

    Get PDF