
    Grounding semantics in robots for Visual Question Answering

    In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and I evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.
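    As a hedged illustration of how detected-object features can feed an end-to-end VQA model, the PyTorch sketch below fuses pooled object features with an encoded question to predict an answer. The module names, layer sizes, and fusion scheme are illustrative assumptions, not the architecture of the thesis.

    ```python
    # Minimal VQA sketch (PyTorch). Assumes object features have already been
    # extracted by a detector; all sizes and the fusion scheme are illustrative.
    import torch
    import torch.nn as nn

    class SimpleVQA(nn.Module):
        def __init__(self, vocab_size=1000, n_answers=28, obj_dim=256, hid=512):
            super().__init__()
            self.q_embed = nn.Embedding(vocab_size, hid)
            self.q_rnn = nn.GRU(hid, hid, batch_first=True)
            self.obj_proj = nn.Linear(obj_dim, hid)
            self.classifier = nn.Linear(hid, n_answers)

        def forward(self, obj_feats, question_tokens):
            # obj_feats: (B, N_objects, obj_dim); question_tokens: (B, T)
            _, q = self.q_rnn(self.q_embed(question_tokens))   # q: (1, B, hid)
            objs = self.obj_proj(obj_feats).mean(dim=1)        # pool over objects
            return self.classifier(q.squeeze(0) * objs)        # fuse, then classify
    ```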

    Statistical Image Analysis for Image Evolution

    This thesis focuses on using genetic programming to evolve images based on lightweight features extracted from a given target image. Its main motivation is research by Lombardi et al., in which an image retrieval system is developed based on lightweight statistical image features for comparing images and classifying them into painting-style categories, primarily by color matching. In this thesis, textures were evolved using automatic fitness scoring of variations of up to 17 lightweight image features under many-objective fitness evaluation. The evolved results were shown to have color characteristics similar to those of the target images, although a human survey conducted to confirm those results was inconclusive.
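    A minimal sketch of the many-objective fitness idea is given below: an evolved image is scored by per-feature distances to the target's lightweight color statistics, one objective per feature. The per-channel mean and standard deviation used here are illustrative stand-ins, not the 17 features of the thesis.

    ```python
    # Hedged sketch: per-feature distances to a target image's lightweight
    # color statistics, consumed as separate objectives by a many-objective GA.
    import numpy as np

    def lightweight_features(img):
        # img: (H, W, 3) float array in [0, 1]; illustrative features only
        chans = img.reshape(-1, 3)
        return np.concatenate([chans.mean(axis=0), chans.std(axis=0)])

    def fitness_vector(evolved, target):
        # One objective per feature; smaller distance means fitter.
        return np.abs(lightweight_features(evolved) - lightweight_features(target))
    ```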

    Detection of Motorcycle Headlights Using YOLOv5 and HSV

    "Electronic Traffic Law Enforcement" (ETLE) denotes a mechanism that employs electronic technologies to implement traffic regulations. This commonly entails utilizing a range of electronic apparatuses like cameras, sensors, and automated setups to oversee and uphold traffic protocols, administer fines, and enhance road security. ETLE systems are frequently utilized for identifying and sanctioning infractions like exceeding speed limits, disregarding red lights, and turning off the headlights. In Indonesia, there is currently no dedicated system designed to detect traffic violation, especially regarding vehicle headlights. Therefore, this research was conducted to detect vehicle headlights using digital images. With the results of this study, it will be possible to develop a system capable of classifying whether vehicle headlights are on or off. This research employed the deep learning method in the form of the YOLOv5 model, which achieved an accuracy of 94.12% in detecting vehicle images. Furthermore, the white color extraction method was performed by projecting the RGB space to HSV to detect the Region of Interest (ROI) of the vehicle headlights, achieving an accuracy of 73.76%. The results of this vehicle headlight detection are influenced by factors such as lighting, image capture angle, and vehicle type

    Investigation on advanced image search techniques

    Content-based image search, which retrieves images whose visual content (such as color, texture, and shape) is similar to that of a query image, is an active research area due to its broad applications. Color, for example, provides powerful information for image search and classification. This dissertation investigates advanced image search techniques and presents new color descriptors for image search and classification, as well as robust image enhancement and segmentation methods for iris recognition.

    First, several new color descriptors have been developed for color image search. Specifically, a new oRGB-SIFT descriptor, which integrates the oRGB color space and the Scale-Invariant Feature Transform (SIFT), is proposed for image search and classification. The oRGB-SIFT descriptor is further integrated with other color SIFT features to produce the novel Color SIFT Fusion (CSF), Color Grayscale SIFT Fusion (CGSF), and CGSF+PHOG descriptors for image category search with applications to biometrics. Image classification is implemented using a novel EFM-KNN classifier, which combines the Enhanced Fisher Model (EFM) and the K Nearest Neighbor (KNN) decision rule. Experimental results on four large-scale, grand-challenge datasets show that the proposed oRGB-SIFT descriptor improves recognition performance over other color SIFT descriptors, and that the CSF, CGSF, and CGSF+PHOG descriptors perform better than the other color SIFT descriptors. The fusion of the Color SIFT descriptors (CSF) with the Color Grayscale SIFT descriptor (CGSF) yields a significant improvement in classification performance, which indicates that the various color SIFT descriptors and the grayscale SIFT descriptor are not redundant for image search.

    Second, four novel color Local Binary Pattern (LBP) descriptors are presented for scene image and image texture classification. Specifically, the oRGB-LBP descriptor is derived in the oRGB color space. The other three color LBP descriptors, namely the Color LBP Fusion (CLF), the Color Grayscale LBP Fusion (CGLF), and the CGLF+PHOG descriptors, are obtained by integrating the oRGB-LBP descriptor with additional image features. Experimental results on three large-scale, grand-challenge datasets show that the proposed descriptors improve scene image and image texture classification performance.

    Finally, a new iris recognition method based on a robust iris segmentation approach is presented for improving iris recognition performance. The proposed segmentation approach applies power-law transformations for more accurate detection of the pupil region, which significantly reduces the candidate limbic boundary search space and thereby increases detection accuracy and efficiency. As the limbic circle, whose center lies within a close range of the pupil center, is selectively detected, the eyelid detection approach leads to improved iris recognition performance. Experiments on the Iris Challenge Evaluation (ICE) database show the effectiveness of the proposed method.
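    Of the techniques above, the power-law transformation used in the iris segmentation stage is simple enough to sketch directly; the gamma value in the usage note is an illustrative assumption.

    ```python
    # Minimal sketch of a power-law (gamma) point operation, s = c * r^gamma,
    # applied to normalized intensities so the dark pupil separates more
    # cleanly before boundary search. Gamma and c are illustrative.
    import numpy as np

    def power_law(img, gamma, c=1.0):
        r = img.astype(np.float64) / 255.0
        return np.clip(c * np.power(r, gamma) * 255.0, 0, 255).astype(np.uint8)

    # e.g. enhanced = power_law(eye_gray, gamma=2.2): gamma > 1 dims mid-tones
    # while the already-dark pupil stays dark, sharpening its contrast.
    ```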

    Facial analysis in video: detection and recognition

    Biometric authentication systems automatically identify or verify individuals using physiological (e.g., face, fingerprint, hand geometry, retina scan) or behavioral (e.g., speaking pattern, signature, keystroke dynamics) characteristics. Among these biometrics, facial patterns have the major advantage of being the least intrusive, so automatic face recognition systems have great potential in a wide spectrum of application areas. Focusing on facial analysis, this dissertation presents a face detection method and several feature extraction methods for face recognition.

    Concerning face detection, a video-based frontal face detection method has been developed that uses motion analysis and color information to derive fields of interest, and distribution-based distance (DBD) and a support vector machine (SVM) for classification. When applied to 92 still images (containing 282 faces), this method achieves a 98.2% face detection rate with two false detections, a performance comparable to state-of-the-art face detection methods; when applied to video streams, it detects faces reliably and efficiently.

    Regarding face recognition, extensive assessments of face recognition performance in twelve color spaces have been performed, and a color feature extraction method defined by color component images across different color spaces is shown to improve the baseline performance of the Face Recognition Grand Challenge (FRGC) problems. The experimental results show that some color configurations, such as YV in the YUV color space and YJ in the YIQ color space, help improve face recognition performance. Building on these improved results, a novel feature extraction method implementing genetic algorithms (GAs) and the Fisher linear discriminant (FLD) is designed to derive the optimal discriminating features that lead to an effective image representation for face recognition. This method noticeably improves the FRGC ver1.0 Experiment 4 baseline recognition rate from 37% to 73%, and significantly elevates the FRGC ver2.0 Experiment 4 baseline verification rate from 12% to 69%. Finally, four two-dimensional (2D) convolution filters are derived for feature extraction, and a 2D+3D face recognition system implementing both 2D and 3D imaging modalities is designed to address the FRGC problems. This method improves the FRGC ver2.0 Experiment 3 baseline performance from 54% to 72%.
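    The color-configuration idea can be sketched with the standard RGB-to-YUV transform: selected component images (here Y and V) are stacked into a hybrid representation. The choice of components follows the abstract; the matrix is the standard YUV definition, and the stacking is illustrative.

    ```python
    # Hedged sketch: standard RGB -> YUV conversion, then a "YV" stack of
    # component images as one candidate color configuration.
    import numpy as np

    RGB2YUV = np.array([[ 0.299,    0.587,    0.114  ],
                        [-0.14713, -0.28886,  0.436  ],
                        [ 0.615,   -0.51499, -0.10001]])

    def yv_configuration(rgb):
        # rgb: (H, W, 3) floats in [0, 1]
        yuv = rgb @ RGB2YUV.T
        return np.stack([yuv[..., 0], yuv[..., 2]], axis=-1)  # Y and V planes
    ```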

    Creating 3D object descriptors using a genetic algorithm

    In the technological world that we live in, the need for computer vision has become almost as important as human vision. We are surrounded by all kinds of machines that need to have their own virtual eyes. The most advanced cars have software that can analyze traffic signs in order to warn the driver about events on the road. When we send a space rover to another planet, it is important that it can analyze the ground in order to avoid obstacles that would lead to its destruction.

    There is still much work to be done in the field of computer vision with a view to improving the performance and speed of recognition tasks. There are many available descriptors used for 3D point cloud recognition, and some of them are explained in this thesis. The aim of this work is to design descriptors that can correctly match 3D point clouds. The idea is to use artificial intelligence, in the form of a genetic algorithm (GA), to obtain optimized parameters for the descriptors. For this purpose the Point Cloud Library (PCL) [RC11] is used, which deals with the manipulation of 3D point data. The created descriptors are explained and experiments are done to illustrate their performance.

    The main conclusions are that there is still much work to be done in shape recognition. The descriptor developed in this thesis that uses only color information is better than the descriptors that use only shape data. Although we have achieved descriptors with good performance in this thesis, there could be a way to improve them even more. As the color-only descriptor outperforms the shape-only descriptors, we can expect that there is a better way to represent the shape of an object. Humans recognize objects better by shape than by color, which makes us wonder whether there is a way to improve the techniques used for shape description.
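    A hedged sketch of the GA loop for tuning descriptor parameters follows. The fitness callable matching_score is a hypothetical placeholder for running a descriptor (for instance a PCL-based one) over validation point-cloud pairs and counting correct matches; population size, mutation scale, and generation count are illustrative.

    ```python
    # Hedged GA sketch: evolve a real-valued parameter vector for a descriptor.
    # matching_score is a hypothetical fitness (higher = more correct matches).
    import numpy as np

    rng = np.random.default_rng(0)

    def evolve(matching_score, n_params=4, pop=20, gens=50, sigma=0.1):
        population = rng.random((pop, n_params))            # parameters in [0, 1]
        for _ in range(gens):
            scores = np.array([matching_score(p) for p in population])
            parents = population[np.argsort(scores)[-pop // 2:]]   # best half
            kids = parents[rng.integers(len(parents), size=pop - len(parents))]
            kids = np.clip(kids + rng.normal(0.0, sigma, kids.shape), 0.0, 1.0)
            population = np.vstack([parents, kids])         # elitism + mutation
        return max(population, key=matching_score)          # best parameter set
    ```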

    Physics methods for image classification with Deep Neural Networks

    The studies performed in this thesis have their origin in an internship carried out at Porini, a dynamic business versed in digital consulting and software development. The ultimate goal of this research is to develop an algorithm to perform product recognition of common items found in supermarkets or grocery shops. The first part of the analysis will consider a simplified toy model, in order to gain deeper insight into the data at disposal. In particular, a manual feature extraction will be designed, consisting of an equalisation procedure and a custom-built cropping for the images. A novel classification model will then be defined, using average RGB histograms as references for each product class and testing out different metrics to quantify the similarity between two images. This implementation will culminate in the realization of a proof of concept in the form of an application for mobile platforms. In the second part of the study, object detection and recognition will be tackled in a more generalized context. This will require the employment of more advanced, pre-built algorithms, particularly in the form of deep convolutional neural networks. Specifically, the focus will be on the single-shot approach, where a duly trained detector observes the image only once, as a whole, before outputting its detection prediction; an exploratory analysis will be performed taking advantage of the YOLO model, a state-of-the-art implementation in the field. The results obtained are very satisfactory: the first part of the study has led to the definition of a new customized classification algorithm that is robust and well optimized, while in the second part promising foundations have been laid for the development of advanced object recognition tools for general use cases.
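    The reference-histogram classifier from the first part of the study can be sketched as follows: each product class is represented by the average RGB histogram of its training images, and a query image is assigned to the class with the most similar histogram. Histogram intersection stands in here for the several similarity metrics the thesis compares.

    ```python
    # Hedged sketch of classification against per-class average RGB histograms;
    # histogram intersection is one of several plausible similarity metrics.
    import numpy as np

    def rgb_histogram(img, bins=32):
        # img: (H, W, 3) uint8; concatenated, normalized per-channel histograms
        hs = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
              for c in range(3)]
        h = np.concatenate(hs).astype(np.float64)
        return h / h.sum()

    def build_references(images_by_class):
        # images_by_class: {label: [image, ...]} -> {label: average histogram}
        return {label: np.mean([rgb_histogram(im) for im in imgs], axis=0)
                for label, imgs in images_by_class.items()}

    def classify(img, references):
        h = rgb_histogram(img)
        # Histogram intersection: larger overlap means more similar.
        return max(references, key=lambda lbl: np.minimum(h, references[lbl]).sum())
    ```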

    Machine Learning in Robotic Navigation: Deep Visual Localization and Adaptive Control

    The work conducted in this thesis contributes to the field of robotic navigation by focusing on different machine learning solutions: supervised learning with (deep) neural networks, unsupervised learning, and reinforcement learning.

    First, we propose a semi-supervised machine learning approach that can dynamically update the robot controller's parameters using situational analysis through feature extraction and unsupervised clustering. The results show that the robot can adapt to changes in its surroundings, resulting in a thirty percent improvement in navigation speed and stability.

    Then, we train multiple deep neural networks to estimate the robot's position in the environment, using ground truth information provided by a classical localization and mapping approach. We prepare two image-based localization datasets in 3D simulation and compare the results of a traditional multilayer perceptron, a stacked denoising autoencoder, and a convolutional neural network (CNN). The experimental results show that our proposed Inception-based CNNs without pooling layers perform very well in all the environments.

    Finally, we propose a two-stage learning framework for visual navigation in which the experience gathered by the agent while exploring one goal is shared to learn to navigate to other goals. The multi-goal Q-function learns to traverse the environment using the provided discretized map. Transfer learning is applied to the multi-goal Q-function from a maze structure to a 2D simulator, and the function is finally deployed in a 3D simulator, where the robot uses the estimated locations from the position-estimator deep CNNs. The results show a significant improvement when multi-goal reinforcement learning is used.
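    A minimal sketch of the multi-goal Q-function idea: indexing a tabular Q-function by (state, goal, action) lets experience gathered while navigating toward one goal update values that are reusable for other goals. Grid size, rewards, and hyperparameters below are illustrative assumptions.

    ```python
    # Hedged sketch of multi-goal tabular Q-learning on a discretized map.
    # All sizes and hyperparameters are illustrative.
    import numpy as np

    n_states, n_goals, n_actions = 100, 5, 4
    Q = np.zeros((n_states, n_goals, n_actions))
    alpha, gamma, eps = 0.1, 0.95, 0.1
    rng = np.random.default_rng(0)

    def act(state, goal):
        if rng.random() < eps:                    # epsilon-greedy exploration
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[state, goal]))

    def update(state, goal, action, reward, next_state, done):
        # Standard one-step Q-learning target, conditioned on the goal index.
        target = reward if done else reward + gamma * Q[next_state, goal].max()
        Q[state, goal, action] += alpha * (target - Q[state, goal, action])
    ```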