
    YOLO-LHD: an enhanced lightweight approach for helmet wearing detection in industrial environments

    Establishing a lightweight yet high-precision object detection algorithm is paramount for accurately assessing workers’ helmet-wearing status in intricate industrial settings. Helmet detection is inherently challenging due to factors such as the diminutive target size, intricate backgrounds, and the need to strike a balance between model compactness and detection accuracy. In this paper, we propose YOLO-LHD (You Only Look Once-Lightweight Helmet Detection), an efficient framework built upon the YOLOv8 object detection model. The proposed approach enhances the model’s ability to detect small targets in complex scenes by incorporating the Coordinate Attention mechanism and the focal loss function and by introducing high-resolution features and a large-scale detection head. Additionally, we integrate the improved GhostV2 module into the backbone feature-extraction network to further improve the balance between model accuracy and size. We evaluated our method on the MHWD dataset established in this study and compared it with the baseline YOLOv8n model. The proposed YOLO-LHD model achieved a 66.1% reduction in model size while attaining a best mAP50 of 94.3% with only 0.86M parameters, demonstrating the effectiveness of the proposed approach in achieving lightweight deployment and high-precision helmet detection.
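
    The abstract does not include code; as a rough sketch of the focal loss term it relies on (the standard binary form, with illustrative alpha/gamma values rather than the paper's actual settings), in PyTorch:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss (Lin et al.): down-weights easy examples so
    training focuses on hard cases, e.g. small or occluded helmets."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability for the true class
    return (alpha * (1 - p_t) ** gamma * bce).mean()
```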

    Improved Human Face Recognition by Introducing a New CNN Arrangement and Hierarchical Method

    Human face recognition has become one of the most attractive topics in the field of biometrics due to its wide range of applications. The face is the part of the body that carries the most identifying information in human interactions. Features such as the composition of facial components, skin tone, the face's central axis, and the distance between the eyes, alongside other biometrics, are used unconsciously by the brain to distinguish a person. Indeed, analyzing facial features may well be the first method humans use to identify a person. As one of the main biometric measures, face recognition has been utilized in various commercial applications over the past two decades, from banking to smart advertisement and from border security to mobile applications. These examples show how far the methods have come: the techniques have reached an acceptable level of accuracy for some real-life applications, although other applications could still benefit from improvement. The increasing demand for the topic, and the fact that almost all of the required infrastructure is now in place, make face recognition an appealing research area. When evaluating the quality of a face recognition method, accuracy, speed, and complexity are the main benchmarks. Other aspects of an algorithm, such as size, precision, and cost, can also be measured, but each of these ultimately contributes to one or more of the three main criteria. Although existing algorithms achieve a significant level of accuracy, there is still much room for improvement in speed and complexity. In addition, their accuracy depends heavily on the properties of the face images: uncontrolled variables such as head pose, occlusion, lighting, and image noise can affect the results dramatically. Face recognition systems are used for either identification or verification; in verification, the system's main goal is to check whether an input belongs to a predetermined tag or person's ID. Almost every face recognition system consists of four major steps: pre-processing, face detection, feature extraction, and classification, and improvement in each of these steps leads to overall enhancement of the system. In this work, the main objective is to propose new and enhanced methods for each of these steps, to evaluate the results by comparing them with other existing techniques, and to investigate the outcome of the proposed system.
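
    As a minimal illustration of how the four steps named above compose (the Haar-cascade detector, the caller-supplied embedder, and the nearest-neighbour classifier are assumptions for this sketch, not the thesis's methods):

```python
import cv2
import numpy as np

def recognize(image_bgr, known_embeddings, known_labels, detector, embedder):
    """Illustrative 4-stage pipeline: pre-process -> detect -> extract -> classify."""
    # 1. Pre-processing: grayscale + histogram equalization to reduce lighting effects
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)
    # 2. Face detection (e.g. a Haar cascade loaded by the caller)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    results = []
    for (x, y, w, h) in faces:
        crop = cv2.resize(gray[y:y + h, x:x + w], (112, 112))
        # 3. Feature extraction: `embedder` maps the crop to a feature vector
        vec = embedder(crop)
        # 4. Classification: nearest neighbour over stored embeddings
        dists = np.linalg.norm(known_embeddings - vec, axis=1)
        results.append(known_labels[int(np.argmin(dists))])
    return results
```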

    Deep Representation Learning with Limited Data for Biomedical Image Synthesis, Segmentation, and Detection

    Biomedical imaging requires accurate expert annotation and interpretation that can aid medical staff and clinicians in automating differential diagnosis and solving underlying health conditions. With the advent of deep learning, training with large image datasets has become the standard for reaching expert-level performance in non-invasive biomedical imaging tasks. However, in the absence of large publicly available datasets, training a deep learning model to learn intrinsic representations becomes harder. Representation learning with limited data has introduced new learning techniques, such as generative adversarial networks, semi-supervised learning, and self-supervised learning, that can be applied to various biomedical applications. For example, ophthalmologists use color funduscopy (CF) and fluorescein angiography (FA) to diagnose retinal degenerative diseases. However, fluorescein angiography requires injecting a dye, which can cause adverse reactions in patients; to alleviate this, a non-invasive technique is needed that can translate fundus images into fluorescein angiography. Similarly, color funduscopy and optical coherence tomography (OCT) are utilized to semantically segment the vasculature and fluid build-up in spatial and volumetric retinal imaging, which can help with the future prognosis of diseases. Although many automated techniques have been proposed for medical image segmentation, the main drawback is the model's precision in pixel-wise predictions. Another critical challenge in the biomedical imaging field is accurately segmenting and quantifying the dynamic behavior of calcium signals in cells. Calcium imaging is a widely utilized approach to studying subcellular calcium activity and cell function; however, large datasets have yielded a profound need for fast, accurate, and standardized analyses of calcium signals. For example, image sequences of calcium signals in colonic pacemaker cells (interstitial cells of Cajal, ICC) suffer from motion artifacts and high periodic and sensor noise, making it difficult to accurately segment and quantify calcium signal events. Moreover, it is time-consuming and tedious to annotate such a large volume of calcium image stacks or videos and to extract their associated spatiotemporal maps. To address these problems, we propose various deep representation learning architectures that utilize limited labels and annotations to address the critical challenges in these biomedical applications. To this end, we detail our proposed semi-supervised, generative adversarial network and transformer-based architectures for individual learning tasks such as retinal image-to-image translation, vessel and fluid segmentation from fundus and OCT images, breast micro-mass segmentation, and sub-cellular calcium event tracking from videos with spatiotemporal map quantification. We also illustrate two multi-modal multi-task learning frameworks with applications that can be extended to other domains of biomedical applications. The main idea is to incorporate each of these as individual modules into our proposed multi-modal frameworks to solve the existing challenges with 1) fluorescein angiography synthesis, 2) retinal vessel and fluid segmentation, 3) breast micro-mass segmentation, and 4) dynamic quantification of calcium imaging datasets.
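
    As an illustrative sketch of the paired image-to-image translation setup described above, a pix2pix-style adversarial step with an L1 reconstruction term (the conditional discriminator taking input/output pairs and the L1 weight of 100 are assumptions, not the thesis's actual design):

```python
import torch
import torch.nn.functional as F

def gan_step(gen, disc, opt_g, opt_d, fundus, fa):
    """One adversarial training step for paired fundus -> FA translation."""
    fake_fa = gen(fundus)
    # --- discriminator: real pairs -> 1, generated pairs -> 0 ---
    opt_d.zero_grad()
    d_real = disc(fundus, fa)
    d_fake = disc(fundus, fake_fa.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    loss_d.backward()
    opt_d.step()
    # --- generator: fool the discriminator and stay close to the real FA ---
    opt_g.zero_grad()
    d_fake = disc(fundus, fake_fa)
    loss_g = (F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
              + 100.0 * F.l1_loss(fake_fa, fa))  # L1 weight of 100 is an assumption
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```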

    Generative Adversarial Network and Its Application in Aerial Vehicle Detection and Biometric Identification System

    In recent years, generative adversarial networks (GANs) have shown great potential in advancing the state of the art in many areas of computer vision, most notably in image synthesis and manipulation tasks. A GAN is a generative model that simultaneously trains a generator and a discriminator in an adversarial manner to produce realistic synthetic data by capturing the underlying data distribution. Due to its powerful ability to generate high-quality and visually pleasing results, we apply it to super-resolution and image-to-image translation techniques to address vehicle detection in low-resolution aerial images and cross-spectral cross-resolution iris recognition. First, we develop a multi-scale GAN (MsGAN) with multiple intermediate outputs, which progressively learns the details and features of high-resolution aerial images at different scales. The upscaled super-resolved aerial images are then fed to a You Only Look Once-version 3 (YOLO-v3) object detector, and the detection loss is jointly optimized along with a super-resolution loss to emphasize target vehicles sensitive to the super-resolution process. Another problem remains unsolved when detection takes place at night or in a dark environment, which requires an infrared (IR) detector; training such a detector needs a large number of IR images. To address these challenges, we develop a GAN-based joint cross-modal super-resolution framework where low-resolution (LR) IR images are translated and super-resolved to high-resolution (HR) visible (VIS) images before applying detection. This approach significantly improves the accuracy of aerial vehicle detection by leveraging the benefits of super-resolution techniques in a cross-modal domain. Second, to increase the performance and reliability of deep learning-based biometric identification systems, we focus on developing conditional GAN (cGAN) based cross-spectral cross-resolution iris recognition and offer two different frameworks. The first approach trains a cGAN to jointly translate and super-resolve LR near-infrared (NIR) iris images to HR VIS iris images, so that cross-spectral cross-resolution iris matching is performed at the same resolution and within the same spectrum. In the second approach, we design a coupled GAN (cpGAN) architecture to project both VIS and NIR iris images into a low-dimensional embedding domain; the goal of this architecture is to ensure maximum pairwise similarity between the feature vectors from the two iris modalities of the same subject. We have also proposed a pose attention-guided coupled profile-to-frontal face recognition network to learn discriminative and pose-invariant features in an embedding subspace. To show that the feature vectors learned by this deep subspace can be used for tasks beyond recognition, we implement a GAN architecture that is able to reconstruct a frontal face from its corresponding profile face. This capability can be used in various face analysis tasks, such as emotion detection and expression tracking, where having a frontal face image can improve accuracy and reliability. Overall, our research has shown its efficacy by achieving new state-of-the-art results in extensive experiments on publicly available datasets reported in the literature.
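
    A minimal sketch of the joint optimization described above, in which a super-resolution loss and the downstream detector's loss are combined into one objective so the super-resolution network preserves vehicle-relevant detail (the weighting values are illustrative assumptions, not the paper's settings):

```python
import torch.nn.functional as F

def joint_loss(sr_image, hr_image, detection_loss, alpha=1.0, beta=0.1):
    """Joint objective: pixel-level reconstruction term plus the detection
    loss computed by running the detector on the super-resolved image."""
    sr_loss = F.mse_loss(sr_image, hr_image)  # reconstruction term
    return alpha * sr_loss + beta * detection_loss
```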

    Offline and Online Interactive Frameworks for MRI and CT Image Analysis in the Healthcare Domain: The Case of COVID-19, Brain Tumors and Pancreatic Tumors

    Medical imaging represents the organs, tissues, and structures underneath the outer layers of skin and bone and stores information on normal anatomical structures for abnormality detection and diagnosis. In this thesis, tools and techniques are used to automate the analysis of medical images, emphasizing the detection of brain tumor anomalies from brain MRIs, COVID-19 infections from lung CT images, and pancreatic tumors from pancreatic CT images. Image processing methods such as filtering and thresholding models, geometry models, graph models, region-based analysis, connected component analysis, machine learning models, and recent deep learning models are used. The following problems for medical images are considered in this research: abnormality detection; abnormal region segmentation; an interactive user interface that presents the results of detection and segmentation while receiving feedback from healthcare professionals to improve the analysis procedure; and, finally, report generation. Complete interactive systems containing conventional models, machine learning, and deep learning methods for different types of medical abnormalities have been proposed and developed in this thesis. The experimental results show promising outcomes that have led to the incorporation of the methods into the proposed solutions, based on the observed performance metrics and their comparisons. Although separate systems have currently been developed for brain tumors, COVID-19, and pancreatic cancer, their success shows promising potential for combining them into a generalized system that analyzes medical images of different types, collected from any organ, to detect any type of abnormality.
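
    As a small illustration of the classical thresholding and connected-component stage mentioned above (parameter values are assumptions for the sketch; the thesis's actual pipelines are more elaborate):

```python
import cv2

def candidate_regions(slice_gray, min_area=50):
    """Classical pipeline sketch: denoise -> Otsu threshold -> connected
    components, keeping components above a size threshold as candidates."""
    blur = cv2.GaussianBlur(slice_gray, (5, 5), 0)
    _, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    boxes = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:
            boxes.append((x, y, w, h))
    return boxes
```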

    Vertebral Compression Fracture Detection With Novel 3D Localisation

    Vertebral compression fractures (VCF) often go undetected in radiology images, potentially leading to secondary fractures and permanent disability or even death. The objective of this thesis is to develop a fully automated method for detecting VCF in incidental CT images acquired for other purposes, thereby facilitating better follow-up and treatment. The proposed approach is based on 3D localisation in CT images, followed by VCF detection in the localised regions. The 3D localisation algorithm combines deep reinforcement learning (DRL) with imitation learning (IL) to extract thoracic/lumbar spine regions from chest/abdomen CT scans. The algorithm generates six bounding boxes as regions of interest (ROI) using three different CNN models, with an average Jaccard Index (JI)/Dice Coefficient (DC) of 74.21%/84.71%. The extracted ROIs were then divided into slices, and the slices into patches, to train four convolutional neural network (CNN) models for VCF detection at the patch level. The predictions from the patches were aggregated at the bounding-box level, and majority voting was performed to decide on the presence or absence of VCF for each patient. The best performing model was a six-layered CNN, which together with majority voting achieved a threefold cross-validation accuracy/F1 score of 85.95%/85.94% on 308 chest scans. The same model also achieved a fivefold cross-validation accuracy/F1 score of 86.67%/87.04% on 168 abdomen scans. Because of the success of the 3D localisation algorithm, it was also trained on other abdominal organs, namely the spleen and the left and right kidneys, with promising results. The 3D localisation algorithm was enhanced to work with fused bounding boxes and also in semi-supervised mode, to address the problem of radiologists' annotation time. Experiments using three different proportions of labelled and unlabelled data achieved fairly good performance, although not as good as the fully supervised equivalents. Finally, VCF detection in a weakly supervised multiple instance learning (MIL) setting was performed to further reduce radiologists' annotation time, together with majority voting over the six bounding boxes. The best performing model was the six-layered CNN, which achieved a threefold cross-validation accuracy/F1 score of 81.05%/80.74% on 308 thoracic scans, and a fivefold cross-validation accuracy/F1 score of 85.45%/86.61% on 168 abdomen scans. Overall, the results are comparable to the state of the art, which used an order of magnitude more scans.
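
    A sketch of the two-level aggregation described above: patch predictions vote within each bounding box, and the boxes then vote for the patient-level decision (illustrative only, not the thesis's exact code):

```python
def patient_prediction(patch_preds_per_box):
    """Aggregate patch-level VCF predictions: a bounding box is positive if
    most of its patches are, and the patient is positive if most boxes are."""
    box_votes = []
    for patches in patch_preds_per_box:  # e.g. six ROIs per patient
        positives = sum(patches)
        box_votes.append(1 if positives * 2 > len(patches) else 0)
    return 1 if sum(box_votes) * 2 > len(box_votes) else 0

# Example: three boxes with patch predictions (1 = VCF present)
print(patient_prediction([[1, 1, 0], [0, 1, 1], [0, 0, 1]]))  # -> 1
```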

    Road Sign Board Direction and Location Extraction and Recognition for Autonomous Vehicles

    The problem of direction and location identification is very important for the technologies used in autonomous vehicles, since navigation systems cannot cover all areas due to a lack of signal or to changes made to routes during maintenance or upgrades. This research focuses on recognizing road signs and extracting location names and directions from them; it also helps to better identify road exits and lane directions for better route planning. In this paper, we use YOLOv5 to identify the location and direction of road sign boards. We then extract the direction for each location named on the sign board and pass it to the vehicle: because an autonomous car has no driver, the car must decide by itself which direction to take to reach the target location. The system continuously checks the frames of the video taken by the car’s camera for road sign boards and analyzes each image to find the direction of every location shown on the sign. The proposed system consists of a camera mounted on top of the front mirror of the vehicle and a computer to run the recorded video through the system. In experiments, the YOLOv5 framework achieves a best performance of 98.76% mean average precision (mAP) at an Intersection over Union (IoU) threshold of 0.5, evaluated on our newly developed dataset, and 91.31% across IoU thresholds ranging from 0.5 to 0.95.
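
    For reference, the Intersection over Union criterion behind the reported mAP figures can be computed as follows (a standard formulation, not code from the paper):

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2): intersection area
    divided by union area; detections count as correct above a threshold."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```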

    Occupancy Analysis of the Outdoor Football Fields


    A survey of machine learning-based methods for COVID-19 medical image analysis

    The ongoing COVID-19 pandemic caused by the SARS-CoV-2 virus has already resulted in 6.6 million deaths, with more than 637 million people infected, only 30 months after the first occurrences of the disease in December 2019. Hence, rapid and accurate detection and diagnosis of the disease is the first priority all over the world. Researchers have been working on various methods for COVID-19 detection, and because the disease infects the lungs, lung image analysis has become a popular research area for detecting its presence. Medical images from chest X-rays (CXR), computed tomography (CT), and lung ultrasound have been used by automated image analysis systems in artificial intelligence (AI)- and machine learning (ML)-based approaches. Various existing and novel ML, deep learning (DL), transfer learning (TL), and hybrid models have been applied to detecting and classifying COVID-19, segmenting infected regions, assessing severity, and tracking patient progress from medical images of COVID-19 patients. In this paper, a comprehensive review of recent approaches to COVID-19 image analysis is provided, surveying the contributions of existing research efforts, the available image datasets, and the performance metrics used in recent works. The challenges and the scope of future research for advancing the fight against COVID-19 from the AI perspective are also discussed. The main objective of this paper is therefore to summarize the work done on COVID-19 detection and analysis from medical image datasets using ML, DL, and TL models, analyzing their novelty and efficiency, while also referencing other COVID-19 reviews and surveys so as to deliver as complete an overview of existing COVID-19 research as possible.

    Automatic Attendance System Based on Face Recognition

    In colleges, universities, organizations, schools, and offices, taking attendance is one of the most important tasks that must be done on a timely or daily basis. Most of the time it is done manually, for example by calling names, passing around sheets of paper, or checking identification cards. These methods are time-consuming, and sometimes the records are lost. Hence, there is a need for a computer-based attendance management system that can take attendance automatically and save it so that the records are accessible at any time. The main goal of this project is to create a face recognition-based attendance system that turns the manual process of taking attendance into an automated one. The project meets the requirements for modernizing the way attendance is taken as well as the criteria for time management, and it can be implemented in classrooms, offices, or anywhere an attendance system is required. Users must first be registered in the face recognition attendance system: registration involves taking a picture of each user’s face, building a model from their facial characteristics, and then saving this model to a database. The camera is tasked with taking pictures of users’ faces, and their facial characteristics are extracted using Python and OpenCV. The face recognition process comes in when users want to record their attendance: their facial characteristics are compared with those stored in the database, and upon successful face recognition the system automatically registers the attendance and saves it to another database. An Excel sheet showing all recorded attendance can be generated at any time.
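
    As an illustrative sketch of the registration and recognition steps described above, using OpenCV's LBPH recognizer (the thesis does not name a specific model, so this choice and the confidence threshold are assumptions):

```python
import cv2
import numpy as np

# Requires opencv-contrib-python for the cv2.face module.
recognizer = cv2.face.LBPHFaceRecognizer_create()

def register(face_images, labels):
    """Build a model from users' grayscale face crops and integer IDs."""
    recognizer.train(face_images, np.array(labels))

def mark_attendance(face_gray, threshold=70.0):
    """Compare a probe face against the stored model; LBPH reports lower
    confidence for better matches. Returns the matched user ID or None."""
    user_id, confidence = recognizer.predict(face_gray)
    return user_id if confidence < threshold else None
```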