3,112 research outputs found

    Reconhecimento automático de moedas medievais usando visão por computador

    Get PDF
    Dissertação de mestrado em Engenharia InformáticaThe use of computer vision for identification and recognition of coins is well studied and of renowned interest. However the focus of research has consistently been on modern coins and the used algorithms present quite disappointing results when applied to ancient coins. This discrepancy is explained by the nature of ancient coins that are manually minted, having plenty variances, failures, ripples and centuries of degradation which further deform the characteristic patterns, making their identification a hard task even for humans. Another noteworthy factor in almost all similar studies is the controlled environments and uniform illumination of all images of the datasets. Though it makes sense to focus on the more problematic variables, this is an impossible premise to find outside the researchers’ laboratory, therefore a problematic that must be approached. This dissertation focuses on medieval and ancient coin recognition in uncontrolled “real world” images, thus trying to pave way to the use of vast repositories of coin images all over the internet that could be used to make our algorithms more robust. The first part of the dissertation proposes a fast and automatic method to segment ancient coins over complex backgrounds using a Histogram Backprojection approach combined with edge detection methods. Results are compared against an automation of GrabCut algorithm. The proposed method achieves a Good or Acceptable rate on 76% of the images, taking an average of 0.29s per image, against 49% in 19.58s for GrabCut. Although this work is oriented to ancient coin segmentation, the method can also be used in other contexts presenting thin objects with uniform colors. In the second part, several state of the art machine learning algorithms are compared in the search for the most promising approach to classify these challenging coins. The best results are achieved using dense SIFT descriptors organized into Bags of Visual Words, and using Support Vector Machine or Naïve Bayes as machine learning strategies.O uso de visão por computador para identificação e reconhecimento de moedas é bastante estudado e de reconhecido interesse. No entanto o foco da investigação tem sido sistematicamente sobre as moedas modernas e os algoritmos usados apresentam resultados bastante desapontantes quando aplicados a moedas antigas. Esta discrepância é justificada pela natureza das moedas antigas que, sendo cunhadas à mão, apresentam bastantes variações, falhas e séculos de degradação que deformam os padrões característicos, tornando a sua identificação dificil mesmo para o ser humano. Adicionalmente, a quase totalidade dos estudos usa ambientes controlados e iluminação uniformizada entre todas as imagens dos datasets. Embora faça sentido focar-se nas variáveis mais problemáticas, esta é uma premissa impossível de encontrar fora do laboratório do investigador e portanto uma problemática que tem que ser estudada. Esta dissertação foca-se no reconhecimento de moedas medievais e clássicas em imagens não controladas, tentando assim abrir caminho ao uso de vastos repositórios de imagens de moedas disponíveis na internet, que poderiam ser usados para tornar os nossos algoritmos mais robustos. Na primeira parte é proposto um método rápido e automático para segmentar moedas antigas sobre fundos complexos, numa abordagem que envolve Histogram Backprojection combinado com deteção de arestas. Os resultados são comparados com uma automação do algoritmo GrabCut. O método proposto obtém uma classificação de Bom ou Aceitável em 76% das imagens, demorando uma média de 0.29s por imagem, contra 49% em 19,58s do GrabCut. Não obstante o foco em segmentação de moedas antigas, este método pode ser usado noutros contextos que incluam objetos planos de cor uniforme. Na segunda parte, o estado da arte de Machine Learning é testado e comparado em busca da abordagem mais promissora para classificar estas moedas. Os melhores resultados são alcançados usando descritores dense SIFT, organizados em Bags of Visual Words e usando Support Vector Machine ou Naive Bayes como estratégias de machine learning

    Framework to Create Cloud-Free Remote Sensing Data Using Passenger Aircraft as the Platform

    Get PDF
    Cloud removal in optical remote sensing imagery is essential for many Earth observation applications.Due to the inherent imaging geometry features in satellite remote sensing, it is impossible to observe the ground under the clouds directly; therefore, cloud removal algorithms are always not perfect owing to the loss of ground truth. Passenger aircraft have the advantages of short visitation frequency and low cost. Additionally, because passenger aircraft fly at lower altitudes compared to satellites, they can observe the ground under the clouds at an oblique viewing angle. In this study, we examine the possibility of creating cloud-free remote sensing data by stacking multi-angle images captured by passenger aircraft. To accomplish this, a processing framework is proposed, which includes four main steps: 1) multi-angle image acquisition from passenger aircraft, 2) cloud detection based on deep learning semantic segmentation models, 3) cloud removal by image stacking, and 4) image quality enhancement via haze removal. This method is intended to remove cloud contamination without the requirements of reference images and pre-determination of cloud types. The proposed method was tested in multiple case studies, wherein the resultant cloud- and haze-free orthophotos were visualized and quantitatively analyzed in various land cover type scenes. The results of the case studies demonstrated that the proposed method could generate high quality, cloud-free orthophotos. Therefore, we conclude that this framework has great potential for creating cloud-free remote sensing images when the cloud removal of satellite imagery is difficult or inaccurate

    VIDEO FOREGROUND LOCALIZATION FROM TRADITIONAL METHODS TO DEEP LEARNING

    Get PDF
    These days, detection of Visual Attention Regions (VAR), such as moving objects has become an integral part of many Computer Vision applications, viz. pattern recognition, object detection and classification, video surveillance, autonomous driving, human-machine interaction (HMI), and so forth. The moving object identification using bounding boxes has matured to the level of localizing the objects along their rigid borders and the process is called foreground localization (FGL). Over the decades, many image segmentation methodologies have been well studied, devised, and extended to suit the video FGL. Despite that, still, the problem of video foreground (FG) segmentation remains an intriguing task yet appealing due to its ill-posed nature and myriad of applications. Maintaining spatial and temporal coherence, particularly at object boundaries, persists challenging, and computationally burdensome. It even gets harder when the background possesses dynamic nature, like swaying tree branches or shimmering water body, and illumination variations, shadows cast by the moving objects, or when the video sequences have jittery frames caused by vibrating or unstable camera mounts on a surveillance post or moving robot. At the same time, in the analysis of traffic flow or human activity, the performance of an intelligent system substantially depends on its robustness of localizing the VAR, i.e., the FG. To this end, the natural question arises as what is the best way to deal with these challenges? Thus, the goal of this thesis is to investigate plausible real-time performant implementations from traditional approaches to modern-day deep learning (DL) models for FGL that can be applicable to many video content-aware applications (VCAA). It focuses mainly on improving existing methodologies through harnessing multimodal spatial and temporal cues for a delineated FGL. The first part of the dissertation is dedicated for enhancing conventional sample-based and Gaussian mixture model (GMM)-based video FGL using probability mass function (PMF), temporal median filtering, and fusing CIEDE2000 color similarity, color distortion, and illumination measures, and picking an appropriate adaptive threshold to extract the FG pixels. The subjective and objective evaluations are done to show the improvements over a number of similar conventional methods. The second part of the thesis focuses on exploiting and improving deep convolutional neural networks (DCNN) for the problem as mentioned earlier. Consequently, three models akin to encoder-decoder (EnDec) network are implemented with various innovative strategies to improve the quality of the FG segmentation. The strategies are not limited to double encoding - slow decoding feature learning, multi-view receptive field feature fusion, and incorporating spatiotemporal cues through long-shortterm memory (LSTM) units both in the subsampling and upsampling subnetworks. Experimental studies are carried out thoroughly on all conditions from baselines to challenging video sequences to prove the effectiveness of the proposed DCNNs. The analysis demonstrates that the architectural efficiency over other methods while quantitative and qualitative experiments show the competitive performance of the proposed models compared to the state-of-the-art

    Interpretable Hyperspectral AI: When Non-Convex Modeling meets Hyperspectral Remote Sensing

    Full text link
    Hyperspectral imaging, also known as image spectrometry, is a landmark technique in geoscience and remote sensing (RS). In the past decade, enormous efforts have been made to process and analyze these hyperspectral (HS) products mainly by means of seasoned experts. However, with the ever-growing volume of data, the bulk of costs in manpower and material resources poses new challenges on reducing the burden of manual labor and improving efficiency. For this reason, it is, therefore, urgent to develop more intelligent and automatic approaches for various HS RS applications. Machine learning (ML) tools with convex optimization have successfully undertaken the tasks of numerous artificial intelligence (AI)-related applications. However, their ability in handling complex practical problems remains limited, particularly for HS data, due to the effects of various spectral variabilities in the process of HS imaging and the complexity and redundancy of higher dimensional HS signals. Compared to the convex models, non-convex modeling, which is capable of characterizing more complex real scenes and providing the model interpretability technically and theoretically, has been proven to be a feasible solution to reduce the gap between challenging HS vision tasks and currently advanced intelligent data processing models
    corecore