500 research outputs found

    ์‹ฌ์ธต์‹ ๊ฒฝ๋ง์„ ์ด์šฉํ•œ ์ž๋™ํ™”๋œ ์น˜๊ณผ ์˜๋ฃŒ์˜์ƒ ๋ถ„์„

    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ์น˜๊ณผ๋Œ€ํ•™ ์น˜์˜๊ณผํ•™๊ณผ, 2021.8. ํ•œ์ค‘์„.๋ชฉ ์ : ์น˜๊ณผ ์˜์—ญ์—์„œ๋„ ์‹ฌ์ธต์‹ ๊ฒฝ๋ง(Deep Neural Network) ๋ชจ๋ธ์„ ์ด์šฉํ•œ ๋ฐฉ์‚ฌ์„ ์‚ฌ์ง„์—์„œ์˜ ์ž„ํ”Œ๋ž€ํŠธ ๋ถ„๋ฅ˜, ๋ณ‘์†Œ ์œ„์น˜ ํƒ์ง€ ๋“ฑ์˜ ์—ฐ๊ตฌ๋“ค์ด ์ง„ํ–‰๋˜์—ˆ์œผ๋‚˜, ์ตœ๊ทผ ๊ฐœ๋ฐœ๋œ ํ‚คํฌ์ธํŠธ ํƒ์ง€(keypoint detection) ๋ชจ๋ธ ๋˜๋Š” ์ „์ฒด์  ๊ตฌํšํ™”(panoptic segmentation) ๋ชจ๋ธ์„ ์˜๋ฃŒ๋ถ„์•ผ์— ์ ์šฉํ•œ ์—ฐ๊ตฌ๋Š” ์•„์ง ๋ฏธ๋น„ํ•˜๋‹ค. ๋ณธ ์—ฐ๊ตฌ์˜ ๋ชฉ์ ์€ ์น˜๊ทผ๋‹จ ๋ฐฉ์‚ฌ์„ ์‚ฌ์ง„์—์„œ ํ‚คํฌ์ธํŠธ ํƒ์ง€๋ฅผ ์ด์šฉํ•ด ์ž„ํ”Œ๋ž€ํŠธ ๊ณจ ์†Œ์‹ค ์ •๋„๋ฅผ ํŒŒ์•…ํ•˜๋Š” ๋ชจ๋ธ๊ณผ panoptic segmentation์„ ํŒŒ๋…ธ๋ผ๋งˆ์˜์ƒ์— ์ ์šฉํ•˜์—ฌ ๋‹ค์–‘ํ•œ ๊ตฌ์กฐ๋ฌผ๋“ค์„ ๊ตฌํšํ™”ํ•˜๋Š” ๋ชจ๋ธ์„ ํ•™์Šต์‹œ์ผœ ์ง„๋ฃŒ์— ๋ณด์กฐ์ ์œผ๋กœ ํ™œ์šฉ๋˜๋„๋ก ๋งŒ๋“ค์–ด๋ณด๊ณ , ์ด ๋ชจ๋ธ๋“ค์˜ ์ถ”๋ก ๊ฒฐ๊ณผ๋ฅผ ํ‰๊ฐ€ํ•ด๋ณด๋Š” ๊ฒƒ์ด๋‹ค. ๋ฐฉ ๋ฒ•: ๊ฐ์ฒด ํƒ์ง€ ๋ฐ ๊ตฌํšํ™”์— ์žˆ์–ด ๋„๋ฆฌ ์—ฐ๊ตฌ๋œ ํ•ฉ์„ฑ๊ณฑ ์‹ ๊ฒฝ๋ง ๋ชจ๋ธ์ธ Mask-RCNN์„ ํ‚คํฌ์ธํŠธ ํƒ์ง€๊ฐ€ ๊ฐ€๋Šฅํ•œ ํ˜•ํƒœ๋กœ ์ค€๋น„ํ•˜์—ฌ ์น˜๊ทผ๋‹จ ๋ฐฉ์‚ฌ์„ ์‚ฌ์ง„์—์„œ ์ž„ํ”Œ๋ž€ํŠธ์˜ top, apex, ๊ทธ๋ฆฌ๊ณ  bone level ์ง€์ ์„ ์ขŒ์šฐ๋กœ ์ด 6์ง€์  ํƒ์ง€ํ•˜๊ฒŒ๋” ํ•™์Šต์‹œํ‚จ ๋’ค, ํ•™์Šต์— ์‚ฌ์šฉ๋˜์ง€ ์•Š์€ ์‹œํ—˜ ๋ฐ์ดํ„ฐ์…‹์„ ๋Œ€์ƒ์œผ๋กœ ํƒ์ง€์‹œํ‚จ๋‹ค. ํ‚คํฌ์ธํŠธ ํƒ์ง€ ํ‰๊ฐ€์šฉ ์ง€ํ‘œ์ธ object keypoint similarity (OKS) ๋ฐ ์ด๋ฅผ ์ด์šฉํ•œ average precision (AP) ๊ฐ’์„ ๊ณ„์‚ฐํ•˜๊ณ , ํ‰๊ท  OKS๊ฐ’์„ ํ†ตํ•ด ๋ชจ๋ธ ๋ฐ ์น˜๊ณผ์˜์‚ฌ์˜ ๊ฒฐ๊ณผ๋ฅผ ๋น„๊ตํ•œ๋‹ค. ๋˜ํ•œ, ํƒ์ง€๋œ ํ‚คํฌ์ธํŠธ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ๋ฐฉ์‚ฌ์„ ์‚ฌ์ง„์ƒ์—์„œ์˜ ๊ณจ ์†Œ์‹ค ์ •๋„๋ฅผ ์ˆ˜์น˜ํ™”ํ•œ๋‹ค. Panoptic segmentation์„ ์œ„ํ•ด์„œ๋Š” ๊ธฐ์กด์˜ ๋ฒค์น˜๋งˆํฌ์—์„œ ์šฐ์ˆ˜ํ•œ ์„ฑ์ ์„ ๊ฑฐ๋‘” ์‹ ๊ฒฝ๋ง ๋ชจ๋ธ์ธ Panoptic DeepLab์„ ํŒŒ๋…ธ๋ผ๋งˆ์˜์ƒ์—์„œ ์ฃผ์š” ๊ตฌ์กฐ๋ฌผ(์ƒ์•…๋™, ์ƒ์•…๊ณจ, ํ•˜์•…๊ด€, ํ•˜์•…๊ณจ, ์ž์—ฐ์น˜, ์น˜๋ฃŒ๋œ ์น˜์•„, ์ž„ํ”Œ๋ž€ํŠธ)์„ ๊ตฌํšํ™”ํ•˜๋„๋ก ํ•™์Šต์‹œํ‚จ ๋’ค, ์‹œํ—˜ ๋ฐ์ดํ„ฐ์…‹์—์„œ์˜ ๊ตฌํšํ™” ๊ฒฐ๊ณผ์— panoptic / semantic / instance segmentation ๊ฐ๊ฐ์˜ ํ‰๊ฐ€์ง€ํ‘œ๋“ค์„ ์ ์šฉํ•˜๊ณ , ํ”ฝ์…€๋“ค์˜ ์ •๋‹ต(ground truth) ํด๋ž˜์Šค์™€ ๋ชจ๋ธ์ด ์ถ”๋ก ํ•œ ํด๋ž˜์Šค์— ๋Œ€ํ•œ confusion matrix๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค. ๊ฒฐ ๊ณผ: OKS๊ฐ’์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ณ„์‚ฐํ•œ ํ‚คํฌ์ธํŠธ ํƒ์ง€ AP๋Š”, ๋ชจ๋“  OKS threshold์— ๋Œ€ํ•œ ํ‰๊ท ์˜ ๊ฒฝ์šฐ, ์ƒ์•… ์ž„ํ”Œ๋ž€ํŠธ์—์„œ๋Š” 0.761, ํ•˜์•… ์ž„ํ”Œ๋ž€ํŠธ์—์„œ๋Š” 0.786์ด์—ˆ๋‹ค. ํ‰๊ท  OKS๋Š” ๋ชจ๋ธ์ด 0.8885, ์น˜๊ณผ์˜์‚ฌ๊ฐ€ 0.9012๋กœ, ํ†ต๊ณ„์ ์œผ๋กœ ์œ ์˜๋ฏธํ•œ ์ฐจ์ด๊ฐ€ ์—†์—ˆ๋‹ค (p = 0.41). ๋ชจ๋ธ์˜ ํ‰๊ท  OKS ๊ฐ’์€ ์‚ฌ๋žŒ์˜ ํ‚คํฌ์ธํŠธ ์–ด๋…ธํ…Œ์ด์…˜ ์ •๊ทœ๋ถ„ํฌ์ƒ์—์„œ ์ƒ์œ„ 66.92% ์ˆ˜์ค€์ด์—ˆ๋‹ค. ํŒŒ๋…ธ๋ผ๋งˆ์˜์ƒ ๊ตฌ์กฐ๋ฌผ ๊ตฌํšํ™”์—์„œ๋Š”, panoptic segmentation ํ‰๊ฐ€์ง€ํ‘œ์ธ panoptic quality ๊ฐ’์˜ ๊ฒฝ์šฐ ๋ชจ๋“  ํด๋ž˜์Šค์˜ ํ‰๊ท ์€ 80.47์ด์—ˆ์œผ๋ฉฐ, ์น˜๋ฃŒ๋œ ์น˜์•„๊ฐ€ 57.13์œผ๋กœ ๊ฐ€์žฅ ๋‚ฎ์•˜๊ณ  ํ•˜์•…๊ด€์ด 65.97๋กœ ๋‘๋ฒˆ์งธ๋กœ ๋‚ฎ์€ ๊ฐ’์„ ๋ณด์˜€๋‹ค. Semantic segmentation ํ‰๊ฐ€์ง€ํ‘œ์ธ globalํ•œ Intersection over Union (IoU) ๊ฐ’์€ ๋ชจ๋“  ํด๋ž˜์Šค ํ‰๊ท  0.795์˜€์œผ๋ฉฐ, ํ•˜์•…๊ด€์ด 0.639๋กœ ๊ฐ€์žฅ ๋‚ฎ์•˜๊ณ  ์น˜๋ฃŒ๋œ ์น˜์•„๊ฐ€ 0.656์œผ๋กœ ๋‘๋ฒˆ์งธ๋กœ ๋‚ฎ์€ ๊ฐ’์„ ๋ณด์˜€๋‹ค. Confusion matrix ๊ณ„์‚ฐ ๊ฒฐ๊ณผ, ground truth ํ”ฝ์…€๋“ค ์ค‘ ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ์ถ”๋ก ๋œ ํ”ฝ์…€๋“ค์˜ ๋น„์œจ์€ ํ•˜์•…๊ด€์ด 0.802๋กœ ๊ฐ€์žฅ ๋‚ฎ์•˜๋‹ค. ๊ฐœ๋ณ„ ๊ฐ์ฒด์— ๋Œ€ํ•œ IoU๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ณ„์‚ฐํ•œ Instance segmentation ํ‰๊ฐ€์ง€ํ‘œ์ธ AP๊ฐ’์€, ๋ชจ๋“  IoU threshold์— ๋Œ€ํ•œ ํ‰๊ท ์˜ ๊ฒฝ์šฐ, ์น˜๋ฃŒ๋œ ์น˜์•„๊ฐ€ 0.316, ์ž„ํ”Œ๋ž€ํŠธ๊ฐ€ 0.414, ์ž์—ฐ์น˜๊ฐ€ 0.520์ด์—ˆ๋‹ค. 
๊ฒฐ ๋ก : ํ‚คํฌ์ธํŠธ ํƒ์ง€ ์‹ ๊ฒฝ๋ง ๋ชจ๋ธ์„ ์ด์šฉํ•˜์—ฌ, ์น˜๊ทผ๋‹จ ๋ฐฉ์‚ฌ์„ ์‚ฌ์ง„์—์„œ ์ž„ํ”Œ๋ž€ํŠธ์˜ ์ฃผ์š” ์ง€์ ์„ ์‚ฌ๋žŒ๊ณผ ๋‹ค์†Œ ์œ ์‚ฌํ•œ ์ˆ˜์ค€์œผ๋กœ ํƒ์ง€ํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๋˜ํ•œ, ํƒ์ง€๋œ ์ง€์ ๋“ค์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฐฉ์‚ฌ์„ ์‚ฌ์ง„์ƒ์—์„œ์˜ ์ž„ํ”Œ๋ž€ํŠธ ์ฃผ์œ„ ๊ณจ ์†Œ์‹ค ๋น„์œจ ๊ณ„์‚ฐ์„ ์ž๋™ํ™”ํ•  ์ˆ˜ ์žˆ๊ณ , ์ด ๊ฐ’์€ ์ž„ํ”Œ๋ž€ํŠธ ์ฃผ์œ„์—ผ์˜ ์‹ฌ๋„ ๋ถ„๋ฅ˜์— ์‚ฌ์šฉ๋  ์ˆ˜ ์žˆ๋‹ค. ํŒŒ๋…ธ๋ผ๋งˆ ์˜์ƒ์—์„œ๋Š” panoptic segmentation์ด ๊ฐ€๋Šฅํ•œ ์‹ ๊ฒฝ๋ง ๋ชจ๋ธ์„ ์ด์šฉํ•˜์—ฌ ์ƒ์•…๋™๊ณผ ํ•˜์•…๊ด€์„ ํฌํ•จํ•œ ์ฃผ์š” ๊ตฌ์กฐ๋ฌผ๋“ค์„ ๊ตฌํšํ™”ํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์ด์™€ ๊ฐ™์ด ๊ฐ ์ž‘์—…์— ๋งž๋Š” ์‹ฌ์ธต์‹ ๊ฒฝ๋ง์„ ์ ์ ˆํ•œ ๋ฐ์ดํ„ฐ๋กœ ํ•™์Šต์‹œํ‚จ๋‹ค๋ฉด ์ง„๋ฃŒ ๋ณด์กฐ ์ˆ˜๋‹จ์œผ๋กœ ํ™œ์šฉ๋  ์ˆ˜ ์žˆ๋‹ค.Purpose: In dentistry, deep neural network models have been applied in areas such as implant classification or lesion detection in radiographs. However, few studies have applied the recently developed keypoint detection model or panoptic segmentation model to medical or dental images. The purpose of this study is to train two neural network models to be used as aids in clinical practice and evaluate them: a model to determine the extent of implant bone loss using keypoint detection in periapical radiographs and a model that segments various structures on panoramic radiographs using panoptic segmentation. Methods: Mask-RCNN, a widely studied convolutional neural network for object detection and instance segmentation, was constructed in a form that is capable of keypoint detection, and trained to detect six points of an implant in a periapical radiograph: left and right of the top, apex, and bone level. Next, a test dataset was used to evaluate the inference results. Object keypoint similarity (OKS), a metric to evaluate the keypoint detection task, and average precision (AP), based on the OKS values, were calculated. Furthermore, the results of the model and those arrived at by a dentist were compared using the mean OKS. Based on the detected keypoint, the peri-implant bone loss ratio was obtained from the radiograph. For panoptic segmentation, Panoptic DeepLab, a neural network model ranked high in the previous benchmark, was trained to segment key structures in panoramic radiographs: maxillary sinus, maxilla, mandibular canal, mandible, natural tooth, treated tooth, and dental implant. Then, each evaluation metric of panoptic, semantic, and instance segmentation was applied to the inference results of the test dataset. Finally, the confusion matrix for the ground truth class of pixels and the class inferred by the model was obtained. Results: The AP of keypoint detection for the average of all OKS thresholds was 0.761 for the upper implants and 0.786 for the lower implants. The mean OKS was 0.8885 for the model and 0.9012 for the dentist; thus, the difference was not statistically significant (p = 0.41). The mean OKS of the model was in the top 66.92% of the normal distribution of human keypoint annotations. In panoramic radiograph segmentation, the average panoptic quality (PQ) of all classes was 80.47. The treated teeth showed the lowest PQ of 57.13, and the mandibular canal showed the second lowest PQ of 65.97. The Intersection over Union (IoU) was 0.795 on average for all classes, where the mandibular canal showed the lowest IoU of 0.639, and the treated tooth showed the second lowest IoU of 0.656. In the confusion matrix, the proportion of correctly inferred pixels among the ground truth pixels was the lowest in the mandibular canal at 0.802. 
The AP, averaged for all IoU thresholds, was 0.316 for the treated tooth, 0.414 for the dental implant, and 0.520 for the normal tooth. Conclusion: Using the keypoint detection neural network model, it was possible to detect major landmarks around dental implants in periapical radiographs to a degree similar to that of human experts. In addition, it was possible to automate the calculation of the peri-implant bone loss ratio on periapical radiographs based on the detected keypoints, and this value could be used to classify the degree of peri-implantitis. In panoramic radiographs, the major structures including the maxillary sinus and the mandibular canal could be segmented using a neural network model capable of panoptic segmentation. Thus, if deep neural networks suitable for each task are trained using suitable datasets, the proposed approach can be used to assist dental clinicians.Chapter 1. Introduction 1 Chapter 2. Materials and methods 5 Chapter 3. Results 23 Chapter 4. Discussion 32 Chapter 5. Conclusions 45 Published papers related to this study 46 References 47 Abbreviations 52 Abstract in Korean 53 Acknowledgements 56๋ฐ•
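The two measurements above are standard enough to sketch in code. Below is a minimal Python sketch of the COCO-style OKS computation and of a peri-implant bone loss ratio derived from the detected top, apex, and bone-level keypoints; the per-keypoint constants `k` and the exact ratio definition are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def oks(pred, gt, scale, k):
    """COCO-style object keypoint similarity.

    pred, gt: (N, 2) arrays of keypoint coordinates.
    scale:    object scale, e.g. sqrt of the implant bounding-box area.
    k:        (N,) per-keypoint falloff constants (assumed values).
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)            # squared pixel distances
    return float(np.mean(np.exp(-d2 / (2 * scale**2 * k**2))))

def bone_loss_ratio(top, apex, bone_level):
    """Bone loss ratio along one side of the implant: distance from the
    implant top to the detected bone level, normalized by the implant
    length (top to apex). An assumed definition for illustration."""
    implant_len = np.linalg.norm(apex - top)
    loss = np.linalg.norm(bone_level - top)
    return loss / implant_len
```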

    Scene understanding from 3D point clouds and RGB images for autonomous driving

    Autonomous cars are often equipped with 3D data acquisition sensors, e.g., LiDAR, which provide a 3D point cloud describing the surroundings. Direct acquisition of 3D data from these sensors is commonly used for obstacle avoidance and mapping. Analysing 3D point clouds is complex, since point clouds are unstructured, unordered, and contain a varying number of points. The most common approach to scene understanding in images is the Convolutional Neural Network (CNN). Although CNNs achieve high performance in image analysis, they cannot be applied naturally to point clouds. Several methods for extending CNNs to 3D point cloud analysis have been proposed, such as rasterizing the cloud into a 3D voxel grid so a CNN can be applied directly, or using a Graph Convolutional Network. The main goal of this dissertation is to study and compare different approaches for scene understanding from 3D point clouds within the scope of driving automation systems. Moreover, the project contemplates the study of sensor fusion approaches, namely how to combine 3D point clouds and images. In light of this, the project uses a sensor fusion technique called PointPainting, which uses image segmentation to enhance 3D object detection on point clouds.
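As a rough illustration of the PointPainting idea, the sketch below (assuming a calibrated camera and a known LiDAR-to-camera extrinsic transform) projects each LiDAR point into the image plane and appends the per-class segmentation scores of the pixel it lands on; the names and matrices here are placeholders, not the dissertation's actual code.

```python
import numpy as np

def paint_points(points, seg_scores, T_cam_lidar, K):
    """Decorate LiDAR points with image segmentation scores (PointPainting).

    points:      (N, 3) LiDAR points in the sensor frame.
    seg_scores:  (H, W, C) per-pixel class scores from an image segmenter.
    T_cam_lidar: (4, 4) LiDAR-to-camera extrinsic transform (assumed known).
    K:           (3, 3) camera intrinsic matrix.
    Returns (M, 3 + C) painted points that project inside the image.
    """
    ones = np.ones((points.shape[0], 1))
    pts_cam = (T_cam_lidar @ np.hstack([points, ones]).T).T[:, :3]
    in_front = pts_cam[:, 2] > 0                       # keep points ahead of the camera
    pts_cam = pts_cam[in_front]
    uvw = (K @ pts_cam.T).T
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)        # pixel coordinates (u, v)
    H, W, _ = seg_scores.shape
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    scores = seg_scores[uv[valid, 1], uv[valid, 0]]    # sample scores at (v, u)
    return np.hstack([points[in_front][valid], scores])
```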

    Computer vision for plant and animal inventory

    The population, composition, and spatial distribution of plants and animals in a region are important data for natural resource management, conservation, and farming. Traditional ways to acquire such data require human participation, and data processing by humans is usually cumbersome, expensive, and time-consuming. Hence, algorithms for automatic animal and plant inventory are valuable and have become a hot topic. We propose a series of computer vision methods for automated plant and animal inventory to recognize, localize, categorize, track, and count different objects of interest, including vegetation, trees, fish, and livestock. We make use of different sensors, hardware platforms, neural network architectures, and pipelines to deal with the varied properties and challenges of these objects. (1) For vegetation analysis, we propose a fast multistage method to estimate coverage: the reference board is localized based on its edge and texture features, a K-means color model of the board is generated, and the vegetation is then segmented at the pixel level using the color model. The proposed method is robust to changes in lighting conditions. (2) For tree counting in aerial images, we propose a novel method called density transformer, or DENT, to learn and predict the density of trees at different positions. DENT uses an efficient multi-receptive-field network to extract visual features from different positions, and a transformer encoder filters and transfers useful contextual information across spatial positions. DENT significantly outperformed existing state-of-the-art CNN detectors and regressors on both our own dataset and an existing cross-site dataset. (3) We propose a framework for fish classification using boat cameras. The framework contains two branches: one extracts contextual information from the whole image, while the other localizes individual fish and normalizes their poses. The classification results from the two branches are weighted based on the clearness of the image and the familiarity of the context. Our system ranked in the top 1 percent in The Nature Conservancy Fisheries Monitoring competition. (4) We also propose a video-based pig counting algorithm using an inspection robot, adopting a novel bottom-up keypoint tracking method and a novel spatial-aware temporal response filtering method to count the pigs. The proposed approach outperformed the other methods, and even human competitors, in our experiments.
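The vegetation-coverage step in (1) can be sketched as follows, assuming the reference-board pixels have already been extracted by the localization stage; the cluster count and distance threshold are illustrative assumptions, and the actual method may use the color model differently.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_color_model(reference_pixels, n_clusters=4):
    """Fit a K-means color model on (M, 3) RGB pixels sampled from the
    localized reference board (board extraction assumed done upstream)."""
    return KMeans(n_clusters=n_clusters, n_init=10).fit(reference_pixels)

def segment_vegetation(image, color_model, threshold=40.0):
    """Label a pixel as vegetation when it is far from every cluster
    center of the board's color model. The distance threshold is an
    assumed value for illustration."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)
    centers = color_model.cluster_centers_
    d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    mask = d.min(axis=1) > threshold           # far from all board colors
    return mask.reshape(h, w)
```

Coverage can then be estimated as the mean of the returned mask.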

    Detection and Mosaicing through Deep Learning Models for Low-Quality Retinal Images

    Glaucoma is a severe eye disease that is asymptomatic in its initial stages and, because it is degenerative, can lead to blindness. There is no available cure for it, and it is the second most common cause of blindness in the world. Most people affected by it only discover the disease when it is already too late. Regular visits to the ophthalmologist, with a precise diagnosis performed on professional equipment, are the best way to prevent or contain it. For some individuals or populations, however, this can be difficult to accomplish due to several restrictions, such as low income, geographical adversities, and travelling constraints (distance, lack of means of transportation, etc.). Logistically, relocating the professional equipment is expensive because of its dimensions, so bringing it to remote areas is often not viable. Low-cost products like the D-Eye lens offer an alternative to meet this need: the D-Eye lens can be attached to a smartphone to capture fundus images, but it produces lower-quality images than professional equipment. This work presents and evaluates methods for retina reading from D-Eye recordings, exposing the retina in two steps: object detection and summarization via object mosaicing. Deep learning models of the YOLO family were used as object detectors for retina registration. The summarization methods presented in this work mosaic the best retina images together to produce a more detailed resultant image. After selecting the best workflow, a final inference was performed and visually evaluated; the results were not rich enough to serve as a pre-screening medical assessment, indicating that improvements in the algorithm and imaging technology are needed to retrieve better images.
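The mosaicing step can be illustrated with a generic feature-based stitcher; the sketch below uses ORB matches and a RANSAC homography via OpenCV, which is a common approach but not necessarily the exact pipeline evaluated in this work.

```python
import cv2
import numpy as np

def mosaic_pair(img_a, img_b, min_matches=10):
    """Stitch two retina crops by estimating a homography from ORB
    feature matches (a generic sketch, not the thesis pipeline)."""
    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return None                              # no usable features found
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    if len(matches) < min_matches:
        return None                              # not enough overlap to stitch
    src = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = img_a.shape[:2]
    canvas = cv2.warpPerspective(img_b, H, (w * 2, h * 2))
    canvas[:h, :w] = img_a                       # overlay the reference image
    return canvas
```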

    Nucleus segmentation : towards automated solutions

    Single-nucleus segmentation is a frequent challenge in microscopy image processing, since it is the first step of many quantitative data analysis pipelines. The quality of tracking single cells, extracting features, or classifying cellular phenotypes strongly depends on segmentation accuracy. Worldwide competitions have been held aiming to improve segmentation, and recent years have brought significant improvements: large annotated datasets are now freely available, several 2D segmentation strategies have been extended to 3D, and deep learning approaches have increased accuracy. However, even today, no generally accepted solution or benchmarking platform exists. We review the most recent single-cell segmentation tools and provide an interactive method browser to select the most appropriate solution.
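Most benchmarks of the kind this review surveys score a method by matching predicted nuclei to ground-truth nuclei through their intersection over union (IoU). A minimal sketch of that matching, assuming integer-labeled instance masks:

```python
import numpy as np

def match_nuclei(gt_labels, pred_labels, iou_thresh=0.5):
    """Count true-positive nucleus matches between two label masks,
    where each integer > 0 marks one nucleus instance. A minimal sketch
    of the IoU-based matching common to segmentation benchmarks."""
    tp = 0
    for g in np.unique(gt_labels):
        if g == 0:
            continue                                   # 0 is background
        gt_mask = gt_labels == g
        for p in np.unique(pred_labels[gt_mask]):      # only overlapping candidates
            if p == 0:
                continue
            pred_mask = pred_labels == p
            iou = (gt_mask & pred_mask).sum() / (gt_mask | pred_mask).sum()
            if iou >= iou_thresh:
                tp += 1
                break                                  # one match per GT nucleus
    return tp
```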

    Deep learning for real-world object detection


    Hybrid model for Single-Stage Multi-Person Pose Estimation

    In general, human pose estimation methods are categorized into two approaches according to their architectures: regression (i.e., heatmap-free) and heatmap-based methods. The former directly estimates precise coordinates of each keypoint using convolutional and fully connected layers. Although this approach can detect overlapped and dense keypoints, it can produce unexpected results for keypoints that do not exist in the scene. The latter, on the other hand, can filter out non-existent keypoints by utilizing the predicted heatmap of each keypoint. Nevertheless, it suffers from quantization error when the keypoint coordinates are recovered from the heatmaps, and, unlike regression, it has difficulty distinguishing densely placed keypoints in an image. To this end, we propose a hybrid model for single-stage multi-person pose estimation, named HybridPose, which mutually overcomes the drawbacks of both approaches by maximizing their strengths. Furthermore, we introduce a self-correlation loss to inject spatial dependencies between keypoint coordinates and their visibility. HybridPose is therefore capable of not only detecting densely placed keypoints but also filtering out non-existent keypoints in an image. Experimental results demonstrate that the proposed HybridPose exhibits keypoint visibility without performance degradation in pose estimation accuracy.
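To make the quantization-error point concrete, the sketch below decodes one keypoint from a heatmap: a plain argmax snaps the coordinate to the heatmap grid, and the common quarter-pixel shift toward the larger neighbor only partially compensates. This is a standard decoding trick, not HybridPose's own decoder.

```python
import numpy as np

def decode_keypoint(heatmap, stride=4):
    """Decode one keypoint from a (H, W) heatmap. A plain argmax snaps
    the coordinate to the heatmap grid (quantization error); shifting
    0.25 px toward the larger neighbor is a common mitigation."""
    h, w = heatmap.shape
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    fx, fy = float(x), float(y)
    if 0 < x < w - 1:                          # horizontal sub-pixel shift
        fx += 0.25 * np.sign(heatmap[y, x + 1] - heatmap[y, x - 1])
    if 0 < y < h - 1:                          # vertical sub-pixel shift
        fy += 0.25 * np.sign(heatmap[y + 1, x] - heatmap[y - 1, x])
    return fx * stride, fy * stride            # map back to input-image pixels
```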
    • โ€ฆ
    corecore