333 research outputs found

    Evaluating the anticipated outcomes of MRI seizure image from open-source tool- Prototype approach

    An epileptic seizure is abnormal neuronal activity in the brain, affecting nearly 70 million people worldwide (Ngugi et al., 2010). Many open-source neuroimaging tools are used for metabolic checkups and analysis. This paper explains the scope of open-source tools such as MATLAB, Slicer 3D, BrainSuite 21a, SPM, and MedCalc. MATLAB was used by 60% of the surveyed researchers for their image processing, 10% used proprietary software, and more than 30% used other open-source software tools with their own processing techniques for the study of magnetic resonance seizure images.

    Aligning Figurative Paintings With Their Sources for Semantic Interpretation

    This paper reports steps in probing the artistic methods of figurative painters through computational algorithms. We explore a comparative method that investigates the relation between the source of a painting, typically a photograph or an earlier painting, and the painting itself. A first crucial step in this process is to find the source and to crop, standardize, and align it to the painting so that a comparison becomes possible. The next step is to apply different low-level algorithms to construct difference maps for color, edges, texture, brightness, etc. From this basis, various subsequent operations become possible to detect and compare features of the image, such as facial action units and the emotions they signify. This paper demonstrates a pipeline we have built and tested using paintings by the renowned contemporary painter Luc Tuymans. We focus particularly on the alignment process, on edge difference maps, and on the utility of the comparative method for bringing out the semantic significance of a painting.
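The edge-difference-map step described above can be sketched with a minimal, library-free comparison of two aligned grayscale images. This is an illustrative assumption, not the authors' pipeline: `edge_map` stands in for a real edge detector such as Canny, and the gradient threshold is arbitrary.

```python
import numpy as np

def edge_map(img, thresh=0.2):
    """Binary edge map from gradient magnitude (a crude stand-in for Canny)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    if mag.max() == 0:
        return np.zeros_like(mag, dtype=bool)
    return mag > thresh * mag.max()

def edge_difference_map(source, painting, thresh=0.2):
    """True where exactly one image has an edge: strokes the painter
    added to, or omitted from, the aligned source."""
    return edge_map(source, thresh) ^ edge_map(painting, thresh)
```

Regions where the map is dense then mark places where the painter departed from the source, which is the raw material for the semantic interpretation step.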

    Lane Line Detection and Object Scene Segmentation Using Otsu Thresholding and the Fast Hough Transform for Intelligent Vehicles in Complex Road Conditions

    An Otsu-threshold- and Canny-edge-detection-based fast Hough transform (FHT) approach was proposed to improve the accuracy of lane detection for autonomous driving. Over the last two decades, autonomous vehicles have become very popular, and they can help avoid traffic accidents caused by human error; lane detection is one of the essential functions of a cutting-edge automobile system. This study proposes lane detection through improved (extended) Canny edge detection combined with a fast Hough transform. A Gaussian blur filter smooths the image and reduces noise, which improves edge detection accuracy. The Sobel operator then computes the gradient of the image intensity with a convolutional kernel to identify edges. These techniques were applied in the initial lane detection module to enhance the characteristics of the road lanes, making them easier to detect in the image. The Hough transform was then used to identify the lanes based on the geometric relationship between the lanes and the vehicle: the image is mapped into a polar parameter space and lines are sought within a specific range of contrasting points, allowing the algorithm to distinguish the lanes from other features in the image. The Hough transform also makes it possible to separate left- and right-lane markings; for such traditional approaches to work effectively, a region of interest (ROI) must first be extracted. Least-squares fitting within this region was then used to track the lane. The proposed methodology was tested on several image sequences. 
In experiments, the system achieved high lane detection accuracy, showing that the method performs well in both inference speed and identification accuracy and can satisfy the requirements of lane recognition for lightweight automatic driving systems.

    Synthetic Aperture Radar (SAR) Meets Deep Learning

    This reprint focuses on combining synthetic aperture radar (SAR) with deep learning technology, aiming to further promote the development of intelligent SAR image interpretation. SAR is an important active microwave imaging sensor whose all-day, all-weather operating capability gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in remote sensing, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is therefore valuable and meaningful to study SAR-based remote sensing applications. In recent years, deep learning, represented by convolutional neural networks, has driven significant progress in computer vision, e.g., in face recognition, driverless vehicles, and the Internet of Things (IoT). Deep learning enables computational models with multiple processing layers to learn data representations at multiple levels of abstraction, which can greatly improve the performance of many applications. This reprint provides a platform for researchers to tackle these significant challenges and present innovative, cutting-edge results on applying deep learning to SAR in various manuscript types, e.g., articles, letters, reviews, and technical reports.

    Computer vision strategies for pose estimation in the context of industrial robotic applications: advances in the use of both classical and deep learning models on 2D images

    184 p. Computer vision is an enabling technology that allows robots and autonomous systems to perceive their environment. In the context of Industry 4.0 and 5.0, computer vision is essential for the automation of industrial processes. Among computer vision techniques, object detection and 6D pose estimation are two of the most important for industrial process automation. Two main approaches address these challenges: classical methods and deep learning methods. Classical methods are robust and precise but require a great deal of expert knowledge to develop. Deep learning methods, on the other hand, are easy to develop but require large amounts of training data. This thesis presents a literature review of computer vision techniques for object detection and 6D pose estimation. It also addresses the following challenges: (1) pose estimation using classical vision techniques, (2) transfer learning from 2D to 3D models, (3) the use of synthetic data to train deep learning models, and (4) the combination of classical and deep learning techniques. Contributions addressing these challenges have been published in high-impact journals.

    Computational Efficiency Studies in Computer Vision Tasks

    Computer vision has made massive progress in recent years thanks to advances in hardware and algorithms, but most methods are performance-driven and give little consideration to energy efficiency. This dissertation proposes methods for boosting computational efficiency in three vision tasks: ultra-high-resolution image segmentation, optical character recognition (OCR) for Unmanned Aerial Vehicle (UAV) videos, and multiple-object detection for UAV videos. The pattern distribution of ultra-high-resolution images is usually unbalanced: while part of an image contains complex, fine-grained patterns such as boundaries, most areas are composed of simple, repeated patterns. In the first chapter, we propose to learn a skip map that guides a segmentation network to skip simple patterns and hence reduce computational complexity. Specifically, the skip map highlights simple-pattern areas that can be down-sampled for processing at a lower resolution, while the remaining complex part is still segmented at the original resolution. Applied to the state-of-the-art ultra-high-resolution image segmentation network GLNet, our proposed skip map saves more than 30% of computation while maintaining comparable segmentation performance. In the second chapter, we propose an end-to-end OCR framework for UAV videos. We first revisit RCNN's crop-and-resize training strategy and empirically find that it outperforms aligned RoI sampling on a real-world video text dataset captured by UAV. We further propose a multi-stage image processor that takes videos' redundancy, continuity, and mixed degradation into account to reduce energy consumption. Lastly, the model is pruned and quantized before being deployed on a Raspberry Pi. Our energy-efficient video text spotting solution, dubbed E²VTS, outperforms all previous methods by achieving a competitive tradeoff between energy efficiency and performance. 
In the last chapter, we propose an energy-efficient solution for detecting multiple objects in video. Besides designing a fast multiple-object detector, we propose a data synthesis and knowledge-transfer-based annotation method to overcome class imbalance and domain gap issues. This solution was implemented in the LPCVC 2021 UAV challenge and judged the first-place winner.
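The skip-map idea from the first chapter can be illustrated with a toy criterion: in the dissertation the skip map is learned by a network, but a per-patch variance threshold (an assumption made here purely for illustration) captures the same intuition that low-variance patches carry simple, repeated patterns and can be processed at lower resolution.

```python
import numpy as np

def skip_map(img, patch=8, var_thresh=1e-3):
    """Mark low-variance (simple-pattern) patches that a segmentation
    network could process down-sampled, keeping full resolution only
    for complex patches such as object boundaries."""
    h, w = img.shape
    gh, gw = h // patch, w // patch
    patches = img[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    var = patches.var(axis=(1, 3))          # one variance per patch
    return var < var_thresh                  # True = skip (down-sample)
```

The fraction of `True` entries is an upper bound on the computation saved, mirroring the reported >30% savings when most of an ultra-high-resolution image is simple.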

    Towards Robust Real-Time Scene Text Detection: From Semantic to Instance Representation Learning

    Thanks to their flexible representation of arbitrarily shaped scene text and their simple pipeline, bottom-up segmentation-based methods have become mainstream in real-time scene text detection. Despite great progress, these methods lack robustness and still suffer from false positives and instance adhesion. Unlike existing methods that integrate multi-granularity features or multiple outputs, we take the perspective of representation learning, in which auxiliary tasks enable the encoder to jointly learn robust features alongside the main task of per-pixel classification during optimization. For semantic representation learning, we propose global-dense semantic contrast (GDSC), in which a vector is extracted as a global semantic representation and then contrasted element-wise with the dense grid features. To learn instance-aware representation, we propose combining top-down modeling (TDM) with the bottom-up framework to provide implicit instance-level clues to the encoder. With the proposed GDSC and TDM, the encoder learns stronger representations without introducing any parameters or computation at inference. Equipped with a very light decoder, the detector achieves more robust real-time scene text detection. Experimental results on four public datasets show that the proposed method outperforms or is comparable to the state of the art in both accuracy and speed. Specifically, it achieves 87.2% F-measure at 48.2 FPS on Total-Text and 89.6% F-measure at 36.9 FPS on MSRA-TD500 on a single GeForce RTX 2080 Ti GPU. Comment: Accepted by ACM MM 202
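The global-dense contrast in GDSC can be pictured as comparing one pooled "text" vector against every spatial feature. The sketch below is an interpretation of the abstract, not the authors' code: the masked average pooling and the cosine-similarity map are assumptions standing in for the actual contrastive formulation.

```python
import numpy as np

def gdsc_similarity(features, text_mask):
    """Pool a global text vector from masked encoder features, then
    compare it element-wise (cosine similarity) with every grid feature.
    features: (C, H, W) feature map; text_mask: (H, W) boolean text region."""
    C, H, W = features.shape
    flat = features.reshape(C, -1)                          # (C, H*W)
    m = text_mask.reshape(-1).astype(float)
    g = (flat * m).sum(axis=1) / max(m.sum(), 1.0)          # global text vector
    g = g / (np.linalg.norm(g) + 1e-8)
    dense = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-8)
    return (g[:, None] * dense).sum(axis=0).reshape(H, W)   # similarity map
```

A contrastive loss would then push similarity up inside the text mask and down elsewhere, sharpening the per-pixel classification that the decoder performs.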

    Convolutional Bidirectional Variational Autoencoder for Image Domain Translation of Dotted Arabic Expiration Dates

    This paper proposes a Ladder Bottom-up Convolutional Bidirectional Variational Autoencoder (LCBVAE) architecture for the encoder and decoder, trained on image translation of dotted Arabic expiration dates by reconstructing them into filled-in expiration dates. We employed a customized and adapted Convolutional Recurrent Neural Network (CRNN) model to meet our specific requirements and enhance its performance in our context, and trained it on the filled-in images for the years 2019 to 2027 to extract the expiration dates and assess LCBVAE's performance on expiration date recognition. The (LCBVAE+CRNN) pipeline can then be integrated into automated sorting systems that extract expiry dates and sort products accordingly during manufacturing, and it can replace the manual entry of expiration dates, which is time-consuming and inefficient for merchants. Because dotted Arabic expiration date images are not readily available, we created an Arabic dot-matrix TrueType Font (TTF) to generate synthetic images. We trained the model on 59,902 synthetic images of unrealistic dates and tested it on 3,287 synthetic images of realistic dates from 2019 to 2027, formatted as yyyy/mm/dd. Our study demonstrates the significance of the latent bottleneck layer in improving generalization when its size is increased up to 1024 in downstream transfer learning tasks such as image translation. The proposed approach achieved 97% accuracy on image translation using the LCBVAE architecture, which can be generalized to downstream learning tasks such as image translation and reconstruction. Comment: 15 pages, 10 figures
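The latent bottleneck highlighted above is the standard variational core of any VAE-based translator. A minimal sketch of the reparameterization step follows; only the 1024-dimensional latent size mirrors the abstract, the rest is generic VAE machinery and not the authors' implementation.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, so the sampling step stays
    differentiable with respect to the encoder outputs mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# A 1024-dimensional latent bottleneck, as in the abstract's best configuration.
rng = np.random.default_rng(0)
mu, log_var = np.zeros(1024), np.zeros(1024)
z = reparameterize(mu, log_var, rng)
```

Enlarging `mu`/`log_var` to 1024 dimensions gives the decoder a richer code to reconstruct filled-in glyphs from dotted ones, which is the generalization effect the study reports.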

    Curve Sign Inventorying Method Using Smartphones and Deep Learning Technologies

    The objective of the proposed research is to develop and assess a system using smartphones and deep learning technologies to automatically establish an intelligent and sustainable curve sign inventory from videos. The Manual on Uniform Traffic Control Devices (MUTCD) is the nationwide standard that defines the requirements for transportation asset installation and maintenance. The proposed system is one component of a larger methodology whose purpose is to accomplish frequent and cost-effective MUTCD curve sign compliance checking and other curve safety checking, in order to reduce the number of deadly crashes on curves. To automatically build an effective sign inventory from videos, four modules are needed: sign detection, classification, tracking, and localization. A pipeline for this purpose was developed in the past by former students of the Transportation laboratory of Georgia Tech. However, this pipeline was not accurate enough, and its modules had never been critically tested and assessed. Therefore, the objective of this study is to improve the modules, particularly the detection module, which is the most important in the pipeline, and to critically assess the improved modules to determine the pipeline's ability to build an effective sign inventory. The proposed system has been tested and assessed in real conditions on a mountain road with many curves and curve signs; the detection module was able to detect every single curve sign with a very low number of detected non-curve signs (false positives), resulting in a precision of 0.97 and a recall of 1. The other modules also showed very promising results. Overall, this study demonstrates that the proposed system is suitable for building an accurate curve sign inventory that transportation agencies can use to get a precise idea of the condition of the curve sign network on a particular road. M.S.
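The reported precision of 0.97 with recall of 1 follows from the standard detection-count definitions. As a worked example, hypothetical counts of 97 true positives, 3 false positives, and 0 missed signs (illustrative numbers, not the study's raw counts) reproduce exactly these values:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive,
    and false-negative detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts consistent with the reported figures.
p, r = precision_recall(tp=97, fp=3, fn=0)
```

A recall of 1 means no curve sign was missed, which matters most for a safety inventory: a false positive only costs review time, while a missed sign hides a potentially dangerous curve.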

    Experimental Characterization and Computer Vision-Assisted Detection of Pitting Corrosion on Stainless Steel Structural Members

    Pitting corrosion is a prevalent form of corrosive damage that can weaken, damage, and initiate failure in corrosion-resistant metallic materials. For instance, 304 stainless steel is commonly used in various structures (e.g., miter gates, heat exchangers, and storage tanks) but is prone to failure through pitting corrosion and stress corrosion cracking under mechanical loading, despite its high corrosion resistance. In this study, to better understand how pitting corrosion damage develops, controlled corrosion experiments were conducted to generate pits on 304 stainless steel specimens with and without mechanical loading. Pit development over time was characterized using a high-resolution laser scanner. In addition, to achieve scalable and automatic assessment of pitting corrosion conditions, two convolutional neural network-based computer vision algorithms were implemented to evaluate their efficacy in identifying pitting damage: a newly trained convolutional neural network (CNN) built in MATLAB and a retrained version of GoogLeNet. Overall, the experimental results showed that exposure time is the dominant variable in predicting pit depth, while loading conditions significantly influence pit morphology: under compressive loading, pits form with larger surface opening areas, whereas under tensile loading, pits have smaller surface openings. Deep pits with small openings are dangerous for structural members, as they can cause high stress concentrations and early stress corrosion cracking (SCC). Furthermore, although the training library was limited and consisted of low-resolution images, the retrained GoogLeNet CNN showed promising potential for identifying pitting corrosion, based on the evaluation of its accuracy, loss, recall, precision, and F1-measure.