8 research outputs found

    Ascertaining the Ideality of Photometric Stereo Datasets under Unknown Lighting

    The standard photometric stereo model makes several assumptions that are rarely verified in experimental datasets. In particular, the observed object should behave as a Lambertian reflector, and the light sources should be positioned at an infinite distance from it, along a known direction. Even when Lambert’s law is approximately fulfilled, an accurate assessment of the relative position between the light source and the target is often unavailable in real situations. The Hayakawa procedure is a computational method for estimating such information directly from the image data. It occasionally breaks down when some of the available images deviate excessively from ideality. This is generally due to observing a non-Lambertian surface, illuminating it from a close distance, or both. Indeed, in narrow shooting scenarios, typical, e.g., of archaeological excavation sites, it is impossible to position a flashlight at a sufficient distance from the observed surface. It is then necessary to determine whether a given dataset is reliable and which images should be selected to better reconstruct the target. In this paper, we propose some algorithms to perform this task and explore their effectiveness.

    Person recognition based on deep gait: a survey.

    Gait recognition, also known as walking pattern recognition, has attracted deep interest in the computer vision and biometrics communities because it can identify individuals from a distance and is non-invasive. Since 2014, deep learning approaches have shown promising results in gait recognition by automatically extracting features. However, recognizing gait accurately is challenging due to covariate factors, the complexity and variability of environments, and human body representations. This paper provides a comprehensive overview of the advancements made in this field along with the challenges and limitations associated with deep learning methods. To that end, it first examines the various gait datasets used in the literature and analyzes the performance of state-of-the-art techniques. After that, a taxonomy of deep learning methods is presented to characterize and organize the research landscape in this field. Furthermore, the taxonomy highlights the basic limitations of deep learning methods in the context of gait recognition. The paper concludes by focusing on the present challenges and suggesting several research directions to improve the performance of gait recognition in the future.

    Towards Safer Robot-Assisted Surgery: A Markerless Augmented Reality Framework

    Robot-assisted surgery is rapidly developing in the medical field, and the integration of augmented reality shows the potential to improve surgeons' operating performance by providing more visual information. In this paper, we propose a markerless augmented reality framework to enhance safety by avoiding intra-operative bleeding, a high risk caused by collisions between the surgical instruments and the blood vessel. Advanced stereo reconstruction and segmentation networks are compared to find the best combination for reconstructing the intra-operative blood vessel in 3D space for registration of the pre-operative model, and minimum distance detection between the instruments and the blood vessel is implemented. A robot-assisted lymphadenectomy is simulated on the da Vinci Research Kit in a dry lab, and ten human subjects performed this operation to explore the usability of the proposed framework. The results show that the augmented reality framework can help users avoid dangerous collisions between the instruments and the blood vessel without introducing an extra load. It provides a flexible framework that integrates augmented reality into the medical robot platform to enhance safety during the operation.

    Towards Detecting, Recognizing, and Parsing the Address Information from Bangla Signboard: A Deep Learning-based Approach

    Retrieving textual information from natural scene images is an active research area in the field of computer vision with numerous practical applications. Detecting text regions and extracting text from signboards is a challenging problem due to special characteristics like reflected light, uneven illumination, or shadows found in real-life natural scene images. With the advent of deep learning-based methods, different sophisticated techniques have been proposed for text detection and text recognition in natural scenes. Though a significant amount of effort has been devoted to extracting natural scene text for high-resource languages like English, little has been done for low-resource languages like Bangla. In this research work, we propose an end-to-end system with deep learning-based models for efficiently detecting, recognizing, correcting, and parsing address information from Bangla signboards. We have created manually annotated datasets and synthetic datasets to train signboard detection, address text detection, address text recognition, address text correction, and address text parser models. We have conducted a comparative study of different CTC-based and Encoder-Decoder model architectures for Bangla address text recognition. Moreover, we have designed a novel address text correction model using a sequence-to-sequence transformer-based network to improve the performance of the Bangla address text recognition model by post-correction. Finally, we have developed a Bangla address text parser using a state-of-the-art transformer-based pre-trained language model.

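    Um estudo comparativo das abordagens de detecção e reconhecimento de texto para cenários de computação restrita (A comparative study of text detection and recognition approaches for restricted computing scenarios)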

    Advisors: Ricardo da Silva Torres, Allan da Silva Pinto. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação. Abstract: Texts are fundamental elements for effective communication in our daily lives. The mobility of people and vehicles in urban environments and the search for a product of interest on a supermarket shelf are examples of activities in which understanding the textual elements present in the environment is essential to completing the task. Recently, several advances in computer vision have been reported in the literature, with the development of algorithms and methods that aim to recognize objects and texts in scenes. However, text detection and recognition are still open problems due to several factors that act as sources of variability during scene text generation and capture, which can significantly impact the detection and recognition rates of current algorithms. Examples of these factors include different shapes of textual elements (e.g., circular or curved), font styles and sizes, texture, color, and brightness and contrast variation, among others. In addition, recent state-of-the-art methods based on deep learning demand high computational processing costs, which hinders their use in restricted computing scenarios. This dissertation presents a comparative study of text detection and recognition techniques, considering both methods based on deep learning and methods that use classical machine learning algorithms. This dissertation also presents an algorithm for fusing bounding boxes, based on genetic programming (GP), developed both to act as a post-processing step for a single text detector and to explore the complementarity of the text detection algorithms investigated in this dissertation. According to the comparative study presented in this work, the methods based on deep learning are more effective and less efficient than the classic text detection methods investigated in this work, considering the adopted metrics. Furthermore, the proposed GP-based fusion algorithm was able to learn complementary information from the methods investigated in this dissertation, which resulted in an improvement in precision and recall rates. The experiments were conducted considering text detection problems involving horizontal, vertical, and arbitrary orientations. Degree: Master's in Computer Science (Mestrado, Ciência da Computação). CAPE

    Efficient and effective objective image quality assessment metrics

    Acquisition, transmission, and storage of images and videos have increased greatly in recent years. At the same time, there has been an increasing demand for high quality images and videos to provide satisfactory quality-of-experience for viewers. In this respect, high dynamic range (HDR) imaging with higher than 8-bit depth has been an attractive approach for capturing more realistic images and videos. Objective image and video quality assessment plays a significant role in monitoring and enhancing image and video quality in several applications such as image acquisition, image compression, multimedia streaming, image restoration, image enhancement, and display. The main contributions of this work are efficient features and similarity maps that can be used to design perceptually consistent image quality assessment tools. In this thesis, perceptually consistent full-reference image quality assessment (FR-IQA) metrics are proposed to assess the quality of natural, synthetic, photo-retouched, and tone-mapped images. In addition, efficient no-reference image quality metrics are proposed to assess JPEG compressed and contrast distorted images. Finally, we propose a perceptually consistent color to gray conversion method, perform a subjective rating, and evaluate existing color to gray assessment metrics. Existing FR-IQA metrics may have the following limitations. First, their performance is not consistent across different distortions and datasets. Second, better performing metrics usually have high complexity. We propose in this thesis an efficient and reliable full-reference image quality evaluator based on new gradient and color similarities. We derive a general deviation pooling formulation and use it to compute a final quality score from the similarity maps. 
Extensive experimental results verify the high accuracy and consistent performance of the proposed metric on natural, synthetic, and photo-retouched datasets, as well as its low complexity. To visualize HDR images on standard low dynamic range (LDR) displays, tone-mapping operators are used to convert HDR into LDR. Given the different bit depths of HDR and LDR images, traditional FR-IQA metrics are not able to assess the quality of tone-mapped images. The existing full-reference metric for tone-mapped images, called TMQI, converts both HDR and LDR to an intermediate color space and measures their similarity in the spatial domain. We propose in this thesis a feature similarity full-reference metric in which the local phase of HDR is compared with the local phase of LDR. Phase carries important image information, and previous studies have shown that the human visual system responds strongly to points in an image where the phase information is ordered. Experimental results on two available datasets show the very promising performance of the proposed metric. No-reference image quality assessment (NR-IQA) metrics are of high interest because in most present and emerging practical real-world applications, the reference signals are not available. In this thesis, we propose two perceptually consistent distortion-specific NR-IQA metrics for JPEG compressed and contrast distorted images. Based on edge statistics of JPEG compressed images, an efficient NR-IQA metric for blockiness artifacts is proposed that is robust to block size and misalignment. Then, we consider the quality assessment of contrast distorted images, a common distortion. Higher orders of Minkowski distance and power transformation are used to train a low complexity model that is able to assess contrast distortion with high accuracy. For the first time, the proposed model is used to classify the type of contrast distortion, which is very useful additional information for image contrast enhancement. 
Beyond its traditional use in assessing distortions, objective IQA can be used in other applications. Examples are the quality assessment of image fusion, color to gray image conversion, inpainting, background subtraction, etc. In the last part of this thesis, a real-time and perceptually consistent color to gray image conversion methodology is proposed. The proposed correlation-based method and state-of-the-art methods are compared by subjective and objective evaluation. Then, a conclusion is drawn on the choice of objective quality assessment metric for color to gray image conversion. The conducted subjective ratings can be used in the development of quality assessment metrics for color to gray image conversion and to test their performance.