
    Automated license plate recognition: a survey on methods and techniques

    With the explosive growth in the number of vehicles in use, automated license plate recognition (ALPR) systems are required for a wide range of tasks such as law enforcement, surveillance, and toll booth operations. The operational specifications of these systems are diverse due to differences in the intended application. For instance, they may need to run on handheld devices or cloud servers, or operate in low light and adverse weather conditions. To meet these requirements, a variety of techniques have been developed for license plate recognition. Even though current ALPR methods have improved notably, a gap remains in ALPR techniques for complex environments: many approaches are sensitive to changes in illumination and operate mostly in daylight. This study explores the methods and techniques used in ALPR in recent literature. We present a critical and constructive analysis of related studies in the field of ALPR and identify the open challenges faced by researchers and developers. Further, we provide future research directions and recommendations for optimizing current solutions to work under extreme conditions.

    ํ•™์Šต๋œ ๋ชจ๋ธ ๊ฐฑ์‹  ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•œ ๊ฐ•์ธํ•œ ๋ฌผ์ฒด ์ถ”์ 

    Doctoral dissertation -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2021. Advisor: Kyoung Mu Lee. In this dissertation, we address the model adaptation problem of visual tracking algorithms. Conventional tracking algorithms regard the visual tracking problem as a tracking-by-detection problem, which can be solved by formulating a target-specific detection model at the initial frame of a given video, and evaluating the model for the subsequent video frames. However, various challenges are associated with the model due to changes in circumstances such as target deformation, scale change, occlusion, illumination change, background clutter, etc. To deal with the aforementioned challenges, conventional tracking algorithms incorporate a model adaptation strategy to provide the model with new information regarding the target appearance and background distractor objects. Nonetheless, since these approaches are often conducted on a handful of self-labeled training examples through solving an optimization task involving hand-crafted regularization schemes, the risks of overfitting and error accumulation persist throughout the tracking process. In order to address the aforementioned problems, we introduce novel approaches to the model adaptation strategy for the visual tracking problem. Three types of model adaptation approaches are proposed, based on the following: (1) reinforcement learning based exemplar selection, (2) deep meta-learning based feature space update, and (3) deep adaptive continual meta-learning based adaptation.
The proposed approaches introduce deep neural network based meta-learners that can handle various scenes and circumstances with reduced overfitting and error accumulation, while the meta-learners are designed to be light-weight and can achieve real-time speeds for the overall visual tracking framework. First, we propose a deep reinforcement learning based exemplar selection method that incorporates a policy network for its meta-learner. The policy network is trained to make decisions on selecting the adequate target exemplar that can be used to locate the target given a scene. Next, a deep meta-learning based method, which utilizes a meta-learner network to construct the target-specific feature space using the loss gradient information, is proposed. The meta-learner network provides the tracker with new information in the form of adaptive weights and channel attention. Finally, a deep continual meta-learning based method simultaneously models the initial and online adaptations under the adaptive continual meta-learning framework. The meta-learner is trained to adaptively regulate the learning process where the tracker can choose between learning new examples and retaining the previous knowledge. Applying the proposed methods to visual tracking algorithms, significant performance gains are achieved and the effectiveness is validated by the extensive experimental evaluations and component-wise ablation analyses. 
Additionally, comparisons on well-known, widely used visual tracking benchmarks demonstrate competitive performance against other state-of-the-art tracking algorithms, while efficiently running at real-time speeds. Contents: 1 Introduction; 2 Model Selection by Reinforcement Learning; 3 Model Update by Meta-Learning; 4 Model Update by Continual Meta-Learning; 5 Conclusion; Bibliography; Korean Abstract.
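The gradient-based feature-space update described in the abstract above (a meta-learner turns loss-gradient information into adaptive weights and channel attention) can be illustrated with a minimal sketch. The dissertation does not provide this code; the `meta_w` matrix below is a hypothetical stand-in for the learned meta-learner network, and the tensor shapes are illustrative assumptions:

```python
import numpy as np

def meta_update(features, grad, meta_w):
    # The meta-learner maps the per-channel loss gradient to channel
    # attention, modulating the target-specific feature space.
    # meta_w is a learned (C x C) matrix standing in for a network.
    delta = meta_w @ grad                       # (C,) adaptive response
    attention = 1.0 / (1.0 + np.exp(-delta))    # squash to (0, 1) per channel
    return features * attention[:, None, None]  # reweight C x H x W features

feats = np.ones((2, 3, 3))                      # toy C x H x W feature map
out = meta_update(feats, np.zeros(2), np.eye(2))
```

With a zero gradient the sigmoid yields 0.5 for every channel, so the features are uniformly damped; a trained meta-learner would instead emphasize channels whose gradients carry target-specific information.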

    License Plate Recognition using Convolutional Neural Networks Trained on Synthetic Images

    In this thesis, we propose a license plate recognition system and study the feasibility of using synthetic training samples to train convolutional neural networks for a practical application. First, we develop a modular framework for synthetic license plate generation; to generate different license plate types (or other objects), only the first module needs to be adapted. The other modules apply variations to the training samples, such as background, occlusions, camera perspective projection, object noise, and camera acquisition noise, with the aim of achieving enough variation of the object that the trained networks will also recognize real objects of the same class. Then we design two low-complexity convolutional neural networks for license plate detection and character recognition. Both are designed for simultaneous classification and localization by branching the networks into a classification and a regression branch, and are trained end-to-end simultaneously over both branches, on only our synthetic training samples. To recognize real license plates, we design a pipeline for scale-invariant license plate detection with a scale pyramid and a fully convolutional application of the license plate detection network, in order to detect any number of license plates of any scale in an image. Before character classification is applied, potential plate regions are un-skewed based on the detected plate location in order to achieve as optimal a representation of the characters as possible. The character classification is also performed with a fully convolutional sweep to find all characters simultaneously. Both the plate and the character stages apply a refinement classification in which initial classifications are first centered and rescaled. We show that this simple yet effective trick greatly improves the accuracy of our classifications, at only a small increase in complexity. To our knowledge, this trick has not been exploited before.
To show the effectiveness of our system, we first apply it to a dataset of photos of Italian license plates to evaluate the different stages of our system and the effect the classification thresholds have on accuracy. We also find robust training parameters and thresholds that are reliable for classification without any need for calibration on a validation set of real annotated samples (which may not always be available), and achieve balanced precision and recall on the set of Italian license plates, both in excess of 98%. Finally, to show that our system generalizes to new plate types, we compare our system to two reference systems on a dataset of Taiwanese license plates. For this, we only modify the first module of the synthetic plate generation algorithm to produce Taiwanese license plates and adjust parameters regarding plate dimensions; then we train our networks and apply the classification pipeline, using the robust parameters, on the Taiwanese reference dataset. We achieve state-of-the-art performance on plate detection (99.86% precision and 99.1% recall), single character detection (99.6%) and full license reading (98.7%).
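The modular generation idea above (swap only the first, plate-type-specific module; reuse the variation modules unchanged) can be sketched as simple function composition. The module names and the toy image representation below are hypothetical illustrations, not the thesis code:

```python
import random

def make_generator(render_plate, variation_modules):
    # Only render_plate is plate-type specific; the variation modules
    # (background, occlusion, perspective, noise, ...) are reused as-is.
    def sample(rng):
        image = render_plate(rng)
        for module in variation_modules:
            image = module(image, rng)
        return image
    return sample

# Toy stand-ins operating on a list of pixel rows:
def render_italian(rng):
    # a blank 2 x 4 "plate"; a real module would draw characters
    return [[0.0] * 4 for _ in range(2)]

def add_noise(image, rng):
    # camera acquisition noise, here uniform jitter
    return [[px + rng.uniform(-0.1, 0.1) for px in row] for row in image]

gen = make_generator(render_italian, [add_noise])
sample = gen(random.Random(0))
```

Retargeting to Taiwanese plates would then mean replacing only `render_italian`, exactly as the abstract describes for the first module.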

    Multimodal sentiment analysis in real-life videos

    This thesis extends the emerging field of multimodal sentiment analysis of real-life videos, taking two components into consideration: the emotion and the emotion's target. The emotion component of media is traditionally represented as a segment-based intensity model of emotion classes. This representation is replaced here by a value- and time-continuous view. Adjacent research fields, such as affective computing, have largely neglected the linguistic information available from automatic transcripts of audio-video material. As is demonstrated here, this text modality is well-suited for time- and value-continuous prediction. Moreover, source-specific problems, such as trustworthiness, have been largely unexplored so far. This work examines perceived trustworthiness of the source, and its quantification, in user-generated video data and presents a possible modelling path. Furthermore, the transfer between the continuous and discrete emotion representations is explored in order to summarise the emotional context at a segment level. The other component deals with the target of the emotion, for example, the topic the speaker is addressing. Emotion targets in a video dataset can, as is shown here, be coherently extracted based on automatic transcripts without limiting a priori parameters, such as the expected number of targets. Furthermore, alternatives to purely linguistic investigation in predicting targets, such as knowledge-bases and multimodal systems, are investigated. A new dataset is designed for this investigation, and, in conjunction with proposed novel deep neural networks, extensive experiments are conducted to explore the components described above. The developed systems show robust prediction results and demonstrate strengths of the respective modalities, feature sets, and modelling techniques. 
Finally, foundations are laid for cross-modal information prediction systems with applications to the correction of corrupted in-the-wild signals from real-life videos.

    3D-3D Deformable Registration and Deep Learning Segmentation based Neck Diseases Analysis in MRI

    Whiplash, cervical dystonia (CD), neck pain and work-related upper limb disorder (WRULD) are the most common diseases in the cervical region. Headaches, stiffness, sensory disturbance to the legs and arms, optical problems, aching in the back and shoulder, and auditory and visual problems are common symptoms seen in patients with these diseases. CD patients may also suffer tormenting spasticity in some neck muscles, with the symptoms possibly being acute and persisting for a long time, sometimes a lifetime. Whiplash-associated disorders (WADs) may occur due to sudden forward and backward movements of the head and neck during a sporting activity or a vehicle or domestic accident. These diseases affect private industries, insurance companies and governments, with the socio-economic costs significantly related to work absences, long-term sick leave, early disability and disability support pensions, health care expenses, reduced productivity and insurance claims. Therefore, diagnosing and treating neck-related diseases are important issues in clinical practice. When these afflictions result from an accident, the underlying cause is impairment of the cervical muscles, which undergo atrophy or pseudo-hypertrophy due to fat infiltrating them. These morphological changes have to be determined by identifying and quantifying their bio-markers before applying any medical intervention. Volumetric studies of neck muscles are reliable indicators of the proper treatments to apply. Radiation therapy, chemotherapy, injection of a toxin or surgery could be possible ways of treating these diseases. However, the dosages required should be precise because the neck region contains sensitive organs such as nerves, blood vessels, the trachea and the spinal cord. Image registration and deep learning-based segmentation can help to determine appropriate treatments by analyzing the neck muscles.
However, this is a challenging task for medical images due to complexities such as many muscles crossing multiple joints and attaching to many bones. Also, their shapes and sizes vary greatly across populations, whereas their cross-sectional areas (CSAs) do not change in proportion to the heights and weights of individuals, with their sizes varying more significantly between males and females than across ages. Therefore, the neck's anatomical variabilities are much greater than those of other parts of the human body. Other challenges that make analyzing neck muscles very difficult are their compactness, similar gray-level appearances, intra-muscular fat, sliding due to cardiac and respiratory motions, false boundaries created by intramuscular fat, low resolution and contrast in medical images, noise, inhomogeneity, and background clutter with the same composition and intensity. Furthermore, a patient's mode, position and neck movements during the capture of an image create variability. However, very little significant research work has been conducted on analyzing neck muscles. Although previous image registration efforts form a strong basis for many medical applications, none can satisfy the requirements of all of them because of the challenges associated with their implementation and low accuracy, which could be due to anatomical complexities and variabilities or the artefacts of imaging devices. Among existing methods, multi-resolution- and heuristic-based methods are popular. However, the above issues cause conventional multi-resolution-based registration methods to become trapped in local minima due to the low degrees of freedom in their geometrical transforms. Although heuristic-based methods are good at handling large mismatches, they require pre-segmentation and are computationally expensive. Also, current deformable methods often face statistical instability problems and many local optima when dealing with small mismatches.
On the other hand, deep learning-based methods have achieved significant success over the last few years. Although a deeper network can learn more complex features and yield better performance, its depth cannot be increased indefinitely, as this would cause the gradient to vanish during training and result in training difficulties. Recently, researchers have focused on attention mechanisms for deep learning, but current attention models face a challenge in applications with compact and similar small multiple classes, large variability, low contrast and noise. The focus of this dissertation is on the design of 3D-3D image registration approaches as well as deep learning-based semantic segmentation methods for analyzing neck muscles. In the first part of this thesis, a novel object-constrained hierarchical registration framework for aligning inter-subject neck muscles is proposed. Firstly, to handle large-scale local minima, it uses a coarse registration technique which optimizes a new edge position difference (EPD) similarity measure to align large mismatches. Also, a new transformation based on the discrete periodic spline wavelet (DPSW), together with affine and free-form deformation (FFD) transformations, is exploited. Secondly, to avoid the monotonous nature of using transformations in multiple stages, an affine registration technique, which uses a double-pushing system by changing the edges in the EPD and switching the transformation's resolutions, is designed to align small mismatches. The EPD helps both the coarse and fine techniques to implement object-constrained registration by controlling edges, which is not possible using traditional similarity measures. Experiments are performed on clinical 3D magnetic resonance imaging (MRI) scans of the neck, with the results showing that the EPD is more effective than the mutual information (MI) and sum of squared differences (SSD) measures in terms of the volumetric Dice similarity coefficient (DSC).
Also, the proposed method is compared with two state-of-the-art approaches, with ablation studies of inter-subject deformable registration, and achieves better accuracy, robustness and consistency. However, as this method is computationally complex and has a problem handling large-scale anatomical variabilities, another 3D-3D registration framework with two novel contributions is proposed in the second part of this thesis. Firstly, a two-stage heuristic search optimization technique for handling large mismatches, which uses a minimal user hypothesis regarding these mismatches and is computationally fast, is introduced. It brings a moving image hierarchically closer to a fixed one using the MI and EPD similarity measures in the coarse and fine stages, respectively, while the images do not require pre-segmentation as is necessary in traditional heuristic optimization-based techniques. Secondly, a region of interest (ROI) EPD-based registration framework for handling small mismatches using salient anatomical information (AI), in which a convex objective function is formed through a unique shape created from the desired objects in the ROI, is proposed. It is compared with two state-of-the-art methods on a neck dataset, with the results showing that it is superior in terms of accuracy and is computationally fast. In the last part of this thesis, an evaluation study of recent U-Net-based convolutional neural networks (CNNs) is performed on a neck dataset. It comprises 6 recent models (the U-Net, U-Net with a conditional random field (CRF-Unet), attention U-Net (A-Unet), nested U-Net or U-Net++, multi-feature pyramid (MFP)-Unet and recurrent residual U-Net (R2Unet)) and 4 with more comprehensive modifications (the multi-scale U-Net (MS-Unet), parallel multi-scale U-Net (PMSUnet), recurrent residual attention U-Net (R2A-Unet) and R2A-Unet++) for neck muscle segmentation, with analyses of the numerical results indicating that the R2Unet architecture achieves the best accuracy.
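The abstract does not give a closed form for the EPD similarity measure used in the registration frameworks above. As a hedged illustration only, one plausible reading (comparing the positions of detected edges rather than raw intensities, shown in 2D for brevity; the threshold and distance aggregation are assumptions) might look like:

```python
import numpy as np

def edge_positions(img, thresh=0.4):
    # Edge pixels: intensity-gradient magnitude above a threshold.
    gy, gx = np.gradient(img.astype(float))
    return np.argwhere(np.hypot(gx, gy) > thresh)

def epd(fixed, moving, thresh=0.4):
    # For each edge pixel of the moving image, take the distance to the
    # nearest edge pixel of the fixed image, then average. A registration
    # would minimize this over transformation parameters.
    ef = edge_positions(fixed, thresh)
    em = edge_positions(moving, thresh)
    if len(ef) == 0 or len(em) == 0:
        return float("inf")
    d = np.linalg.norm(em[:, None, :] - ef[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())
```

Unlike MI or SSD, such a measure depends only on edge locations, which is consistent with the abstract's point that controlling edges enables object-constrained registration.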
Also, two deep learning-based semantic segmentation approaches are proposed. In the first, a new two-stage U-Net++ (TS-UNet++) uses two different types of deep CNNs (DCNNs) rather than one, unlike the traditional multi-stage approach, with the U-Net++ in the first stage and the U-Net in the second. More convolutional blocks are added after the input and before the output layers of the multi-stage approach to better extract the low- and high-level features. A new concatenation-based fusion structure, which is incorporated in the architecture to allow deep supervision, helps to increase the depth of the network without aggravating the gradient-vanishing problem. Then, more convolutional layers are added after each concatenation of the fusion structure to extract more representative features. The proposed network is compared with the U-Net, U-Net++ and two-stage U-Net (TS-UNet) on the neck dataset, with the results indicating that it outperforms the others. In the second approach, an explicit attention method is proposed in which the attention is performed through an ROI evolved from the ground truth via dilation. It does not require any additional CNN, as a cascaded approach does, to localize the ROI. Attention in a CNN is sensitive to the area of the ROI. The dilated ROI is more capable of capturing relevant regions and suppressing irrelevant ones than a bounding box or region-level coarse annotation, and is used during training of any CNN. Coarse annotation, which does not require detailed pixel-wise delineation and can be performed by a novice, is used during testing. This proposed ROI-based attention method, which can handle compact and similar small multiple classes with objects with large variabilities, is compared with the automatic A-Unet and U-Net, and performs best.
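The dilated-ROI attention idea above (grow the ground-truth mask, then gate feature responses with it) can be sketched in a few lines. This is a minimal 2D illustration under assumed shapes, not the dissertation's implementation:

```python
import numpy as np

def dilate(mask, r):
    # Binary dilation with a (2r+1) x (2r+1) square structuring element,
    # implemented with shifted windows over a padded copy of the mask.
    h, w = mask.shape
    p = np.pad(mask, r)
    out = mask.copy()
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out |= p[r + dy:r + dy + h, r + dx:r + dx + w]
    return out

def roi_attention(features, gt_mask, r=1):
    # Keep feature responses inside the dilated ROI, suppress the rest.
    # The dilation radius r controls how much context around the ground
    # truth is retained, matching the sensitivity to ROI area noted above.
    return features * dilate(gt_mask.astype(bool), r)
```

At test time, the same gating would be driven by a coarse annotation instead of the ground-truth mask, as the abstract describes.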

    Acta Cybernetica: Volume 17, Number 3.


    Artificial Intelligence Technology

    This open access book aims to give our readers a basic outline of today's research and technology developments in artificial intelligence (AI), help them gain a general understanding of this trend, and familiarize them with the current research hotspots, as well as some of the fundamental and common theories and methodologies that are widely accepted in AI research and application. This book is written in comprehensible and plain language, featuring clearly explained theories and concepts and extensive analysis and examples. Some of the traditional findings are skipped in the narration on the premise of a relatively comprehensive introduction to the evolution of artificial intelligence technology. The book provides a detailed elaboration of the basic concepts of AI and machine learning, as well as other relevant topics, including deep learning, deep learning frameworks, the Huawei MindSpore AI development framework, the Huawei Atlas computing platform, the Huawei AI open platform for smart terminals, and the Huawei CLOUD Enterprise Intelligence application platform. As the world's leading provider of ICT (information and communication technology) infrastructure and smart terminals, Huawei's products range from digital data communication, cyber security, wireless technology, data storage, cloud computing, and smart computing to artificial intelligence.