233 research outputs found

    A Multiple-Expert Binarization Framework for Multispectral Images

    Full text link
    In this work, a multiple-expert binarization framework for multispectral images is proposed. The framework is based on a constrained subspace selection limited to the spectral bands combined with state-of-the-art gray-level binarization methods. The framework uses a binarization wrapper to enhance the performance of the gray-level binarization. Nonlinear preprocessing of the individual spectral bands is used to enhance the textual information. An evolutionary optimizer is considered to obtain the optimal and some suboptimal 3-band subspaces from which an ensemble of experts is then formed. The framework is applied to a ground truth multispectral dataset with promising results. In addition, a generalization to the cross-validation approach is developed that not only evaluates generalizability of the framework, it also provides a practical instance of the selected experts that could be then applied to unseen inputs despite the small size of the given ground truth dataset.Comment: 12 pages, 8 figures, 6 tables. Presented at ICDAR'1

    Digits Recognition on Medical Device

    Get PDF
    With the rapid development of mobile health, mechanisms for automatic data input are becoming increasingly important for mobile health apps. In these apps, users are often required to input data frequently, especially numbers, from medical devices such as glucometers and blood pressure meters. However, these simple tasks are tedious and prone to error. Even though some Bluetooth devices can make those input operations easier, they are not popular enough due to being expensive and requiring complicated protocol support. Therefore, we propose an automatic procedure to recognize the digits on the screen of medical devices with smartphone cameras. The whole procedure includes several “standard” components in computer vision: image enhancement, the region-of-interest detection, and text recognition. Previous works existed for each component, but they have various weaknesses that lead to a low recognition rate. We proposed several novel enhancements in each component. Experiment results suggest that our enhanced procedure outperforms the procedure of applying optical character recognition directly from 6.2% to 62.1%. This procedure can be adopted (with human verification) to recognize the digits on the screen of medical devices with smartphone cameras

    Visual image processing in various representation spaces for documentary preservation

    Get PDF
    This thesis establishes an advanced image processing framework for the enhancement and restoration of historical document images (HDI) in both intensity (gray-scale or color) and multispectral (MS) representation spaces. It provides three major contributions: 1) the binarization of gray-scale HDI; 2) the visual quality restoration of MS HDI; and 3) automatic reference data (RD) estimation for HDI binarization. HDI binarization is one of the enhancement techniques that produces bi-level information which is easy to handle using methods of analysis (OCR, for instance) and is less computationally costly to process than 256 levels of grey or color images. Restoring the visual quality of HDI in an MS representation space enhances their legibility, which is not possible with conventional intensity-based restoration methods, and HDI legibility is the main concern of historians and librarians wishing to transfer knowledge and revive ancient cultural heritage. The use of MS imaging systems is a new and attractive research trend in the field of numerical processing of cultural heritage documents. In this thesis, these systems are also used for automatically estimating more accurate RD to be used for the evaluation of HDI binarization algorithms in order to track the level of human performance. Our first contribution, which is a new adaptive method of intensity-based binarization, is defined at the outset. Since degradation is present over document images, binarization methods must be adapted to handle degradation phenomena locally. Unfortunately, these methods are not effective, as they are not able to capture weak text strokes, which results in a deterioration of the performance of character recognition engines. The proposed approach first detects a subset of the most probable text pixels, which are used to locally estimate the parameters of the two classes of pixels (text and background), and then performs a simple maximum likelihood (ML) to locally classify the remaining pixels based on their class membership. To the best of our knowledge, this is the first time local parameter estimation and classification in an ML framework has been introduced for HDI binarization with promising results. A limitation of this method in the case with as the intensity-based methods of enhancement is that they are not effective in dealing with severely degraded HDI. Developing more advanced methods based on MS information would be a promising alternative avenue of research. In the second contribution, a novel approach to the visual restoration of HDI is defined. The approach is aimed at providing end users (historians, librarians, etc..) with better HDI visualization, specifically; it aims to restore them from degradations, while keeping the original appearance of the HDI intact. Practically, this problem cannot be solved by conventional intensity-based restoration methods. To cope with these limitations, MS imaging is used to produce additional spectral images in the invisible light (infrared and ultraviolet) range, which gives greater contrast to objects in the documents. The inpainting-based variational framework proposed here for HDI restoration involves isolating the degradation phenomena in the infrared spectral images, and then inpainting them in the visible spectral images. The final color image to visualize is therefore reconstructed from the restored visible spectral images. To the best of our knowledge, this is the first time the inpainting technique has been introduced for MS HDI. The experimental results are promising, and our objective, in collaboration with the BAnQ (Bibliothèque et Archives nationales de Québec), is to push heritage documents into the public domain and build an intelligent engine for accessing them. It is useful to note that the proposed model can be extended to other MS-based image processing tasks. Our third contribution is presented, which is to consider a new problem of RD (reference data) estimation, in order to show the importance of working with MS images rather than gray-scale or color images. RDs are mandatory for comparing different binarization algorithms, and they are usually generated by an expert. However, an expert’s RD is always subject to mislabeling and judgment errors, especially in the case of degraded data in restricted representation spaces (gray-scale or color images). In the proposed method, multiple RD generated by several experts are used in combination with MS HDI to estimate new, more accurate RD. The idea is to include the agreement of experts about labels and the multivariate data fidelity in a single Bayesian classification framework to estimate the a posteriori probability of new labels forming the final estimated RD. Our experiments show that estimated RD are more accurate than an expert’s RD. To the best of our knowledge, no similar work to combine binary data and multivariate data for the estimation of RD has been conducted

    Advances in Image Processing, Analysis and Recognition Technology

    Get PDF
    For many decades, researchers have been trying to make computers’ analysis of images as effective as the system of human vision is. For this purpose, many algorithms and systems have previously been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools, which are sometimes only for entertainment, but quite often, they significantly increase our safety. In fact, the practical implementation of image processing algorithms is particularly wide. Moreover, the rapid growth of computational complexity and computer efficiency has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues still remain, resulting in the need for the development of novel approaches

    SCALING ARTIFICIAL INTELLIGENCE IN ENDOSCOPY: FROM MODEL DEVELOPMENT TO MACHINE LEARNING OPERATIONS FRAMEWORKS

    Get PDF
    Questa tesi esplora l'integrazione dell'intelligenza artificiale (IA) in Otorinolaringoiatria – Chirurgia di Testa e Collo, concentrandosi sui progressi della computer vision per l’endoscopia e le procedure chirurgiche. La ricerca inizia con una revisione completa dello stato dell’arte dell'IA e della computer vision in questo campo, identificando aree per ulteriori sviluppi. L'obiettivo principale è stato quello di sviluppare un sistema di computer vision per l'analisi di immagini e video endoscopici. La ricerca ha coinvolto la progettazione di strumenti per la rilevazione e segmentazione di neoplasie nelle vie aerodigestive superiori (VADS) e la valutazione della motilità delle corde vocali, cruciale nella stadiazione del carcinoma laringeo. Inoltre, lo studio si è focalizzato sul potenziale dei foundation vision models, vision transformers basati su self-supervised learning, per ridurre la necessità di annotazione da parte di esperti, approccio particolarmente vantaggioso in campi con dati limitati. Inoltre, la ricerca ha incluso lo sviluppo di un'applicazione web per migliorare e velocizzare il processo di annotazione in endoscopia delle VADS, nell’ambito generale delle tecniche di MLOps. La tesi copre varie fasi della ricerca, a partire dalla definizione del quadro concettuale e della metodologia, denominata "Videomics". Include una revisione della letteratura sull'IA in endoscopia clinica, focalizzata sulla Narrow Band Imaging (NBI) e sulle reti neurali convoluzionali (CNN). Lo studio progredisce attraverso diverse fasi, dalla valutazione della qualità delle immagini endoscopiche alla caratterizzazione approfondita delle lesioni neoplastiche. Si affronta anche la necessità di standard nel reporting degli studi di computer vision in ambito medico e si valuta l'applicazione dell'IA in setting dinamici come la motilità delle corde vocali. Una parte significativa della ricerca indaga l'uso di algoritmi di computer vision generalizzati (“foundation models”) e la “commoditization” degli algoritmi di machine learning, utilizzando polipi nasali e il carcinoma orofaringeo come casi studio. Infine, la tesi discute lo sviluppo di ENDO-CLOUD, un sistema basato su cloud per l’analisi della videolaringoscopia, evidenziando le sfide e le soluzioni nella gestione dei dati e l’utilizzo su larga scala di modelli di IA nell'imaging medico.This thesis explores the integration of artificial intelligence (AI) in Otolaryngology – Head and Neck Surgery, focusing on advancements in computer vision for endoscopy and surgical procedures. It begins with a comprehensive review of AI and computer vision advancements in this field, identifying areas for further exploration. The primary aim was to develop a computer vision system for endoscopy analysis. The research involved designing tools for detecting and segmenting neoplasms in the upper aerodigestive tract (UADT) and assessing vocal fold motility, crucial in laryngeal cancer staging. Further, the study delves into the potential of vision foundation models, like vision transformers trained via self-supervision, to reduce the need for expert annotations, particularly beneficial in fields with limited cases. Additionally, the research includes the development of a web application for enhancing and speeding up the annotation process in UADT endoscopy, under the umbrella of Machine Learning Operations (MLOps). The thesis covers various phases of research, starting with defining the conceptual framework and methodology, termed "Videomics". It includes a literature review on AI in clinical endoscopy, focusing on Narrow Band Imaging (NBI) and convolutional neural networks (CNNs). The research progresses through different stages, from quality assessment of endoscopic images to in-depth characterization of neoplastic lesions. It also addresses the need for standards in medical computer vision study reporting and evaluates the application of AI in dynamic vision scenarios like vocal fold motility. A significant part of the research investigates the use of "general purpose" vision algorithms and the commoditization of machine learning algorithms, using nasal polyps and oropharyngeal cancer as case studies. Finally, the thesis discusses the development of ENDO-CLOUD, a cloud-based system for videolaryngoscopy, highlighting the challenges and solutions in data management and the large-scale deployment of AI models in medical imaging

    Land use/land cover mapping (1:25000) of Taiwan, Republic of China by automated multispectral interpretation of LANDSAT imagery

    Get PDF
    Three methods were tested for collection of the training sets needed to establish the spectral signatures of the land uses/land covers sought due to the difficulties of retrospective collection of representative ground control data. Computer preprocessing techniques applied to the digital images to improve the final classification results were geometric corrections, spectral band or image ratioing and statistical cleaning of the representative training sets. A minimal level of statistical verification was made based upon the comparisons between the airphoto estimates and the classification results. The verifications provided a further support to the selection of MSS band 5 and 7. It also indicated that the maximum likelihood ratioing technique can achieve more agreeable classification results with the airphoto estimates than the stepwise discriminant analysis

    Automated High-resolution Earth Observation Image Interpretation: Outcome of the 2020 Gaofen Challenge

    Get PDF
    In this article, we introduce the 2020 Gaofen Challenge and relevant scientific outcomes. The 2020 Gaofen Challenge is an international competition, which is organized by the China High-Resolution Earth Observation Conference Committee and the Aerospace Information Research Institute, Chinese Academy of Sciences and technically cosponsored by the IEEE Geoscience and Remote Sensing Society and the International Society for Photogrammetry and Remote Sensing. It aims at promoting the academic development of automated high-resolution earth observation image interpretation. Six independent tracks have been organized in this challenge, which cover the challenging problems in the field of object detection and semantic segmentation. With the development of convolutional neural networks, deep-learning-based methods have achieved good performance on image interpretation. In this article, we report the details and the best-performing methods presented so far in the scope of this challenge
    • …
    corecore