Towards Universal Object Detection
Object detection is one of the most important and challenging research topics in computer vision. It plays an important role in everyday life and has many applications, e.g. surveillance, autonomous driving, robotics, drones, medical imaging, etc. The ultimate goal of object detection is a universal object detector that works well in any case and under any condition, like the human vision system. However, the universality of object detection faces multiple challenges, e.g. scale variance, high-quality requirements, domain shift, and computational constraints. These prevent an object detector from being widely used across various object scales, critical applications requiring extremely accurate localization, scenarios with changing domain priors, and diverse hardware settings. To address these challenges, multiple solutions are proposed in this thesis: an efficient multi-scale architecture to achieve scale-invariant detection, a robust multi-stage framework effective for high-quality requirements, a cross-domain solution to extend universality over various domains, and a design of complexity-aware cascades together with a novel low-precision network to enhance universality under different computational constraints. All these efforts substantially improve the universality of object detection, and the advanced object detector can be applied to broader environments.
Deep neural networks for image quality: a comparison study for identification photos
Many online platforms allow their users to upload images to their account profile.
Because a user is free to upload any image of their liking to a university or a job
platform, profile images are sometimes unprofessional or inadequate in those
contexts. Another problem associated with submitting a profile image is that even
when there is some kind of control over each submitted image, this control is
performed manually by someone, and that process alone can be very tedious and
time-consuming, especially during a large influx of new users joining those
platforms.
Based on international compliance standards used to validate photographs for
machine-readable travel documents, there are SDKs that already perform automatic
classification of the quality of those photographs; however, the classification
is based on traditional computer vision algorithms.
With the growing popularity and strong performance of deep neural networks,
it is worth examining how these would perform in this task.
This dissertation proposes a deep neural network model to classify the quality of
profile images, and a comparison of this model against traditional computer vision
algorithms, with respect to the complexity of the implementation, the quality of
the classifications, and the computation time associated with the classification
process. To the best of our knowledge, this dissertation is the first to study the use
of deep neural networks on image quality classification.
A Survey on Deep Learning in Medical Image Analysis
Deep learning algorithms, in particular convolutional networks, have rapidly
become a methodology of choice for analyzing medical images. This paper reviews
the major deep learning concepts pertinent to medical image analysis and
summarizes over 300 contributions to the field, most of which appeared in the
last year. We survey the use of deep learning for image classification, object
detection, segmentation, registration, and other tasks and provide concise
overviews of studies per application area. Open challenges and directions for
future research are discussed.
Entropy in Image Analysis II
Image analysis is a fundamental task for any application where extracting information from images is required. The analysis requires highly sophisticated numerical and analytical methods, particularly for applications in medicine, security, and other fields where the results of the processing consist of data of vital importance. This fact is evident from all the articles composing the Special Issue "Entropy in Image Analysis II", in which the authors used widely tested methods to verify their results. In reading the present volume, the reader will appreciate the richness of the methods and applications, in particular for medical imaging and image security, and a remarkable cross-fertilization among the proposed research areas.
Concrete Crack Evaluation for Civil Infrastructure Using Computer Vision and Deep Learning
Department of Urban and Environmental Engineering (Urban Infrastructure Engineering)
Surface cracks of civil infrastructure are one of the important indicators of structural durability and integrity. Concrete cracks are typically investigated by manual visual observation of the surface, which is intrinsically subjective because it highly depends on the experience of inspectors. Furthermore, manual visual inspection is time-consuming, expensive, and often unsafe when inaccessible structural components need to be assessed. A computer vision-based approach is recognized as a promising alternative that can automatically extract crack information from images captured by a digital camera. As texts and cracks are similar in that both consist of distinguishable lines and curves, image binarization developed for text detection can be appropriate for crack identification. However, although image binarization is useful for separating cracks from backgrounds, the crack assessment is difficult to standardize owing to its high dependence on binarization parameters determined by users. Another critical challenge in digital image processing for crack detection is to automatically distinguish actual cracks from crack-like noise patterns (e.g., stains, holes, dark shadows, and lumps), which are often seen on the surface of concrete structures. In addition, a tailored camera system and a corresponding strategy are necessary to effectively address practical issues such as skewed viewing angles and the processing of sequential crack images for efficient measurement. This research develops a computer vision-based approach in conjunction with deep learning for accurate crack evaluation of civil infrastructure.
The main contributions of the proposed approach can be summarized as follows: (1) a deep learning-based approach for crack detection, (2) a hybrid image processing method for crack quantification, and (3) camera systems addressing the practical issues on civil infrastructure, namely the skewed-angle problem and efficient measurement with sequential crack images. The proposed research allows accurate crack evaluation to support a proper maintenance strategy for civil infrastructure in practice.
Deep learning for food instance segmentation
Food object detection and instance segmentation are critical in many applications, such as dietary management or food intake monitoring. Food image recognition poses distinct challenges, such as a large number of classes, high inter-class similarity, and high intra-class variance. This, along with the traditional problems associated with object detection and instance segmentation, makes this a very complex computer vision task. Real-world food datasets generally suffer from long-tailed and fine-grained distributions; however, the recent literature fails to address food detection in this regard. In this research, we propose a novel two-stage object detector, which we call Strong LOng-tailed Food object Detection and instance Segmentation (SLOF-DS), to tackle the long-tailed nature of food images. In addition, a multi-task framework that exploits different sources of prior information is proposed to improve the classification of fine-grained classes. Lastly, we also propose a new module based on Graph Neural Networks, which we call Graph Confidence Propagation (GCP), that further improves the performance of both the object detection and instance segmentation modules by combining all the model outputs while considering the global image context. Exhaustive quantitative and qualitative analysis performed on two open-source food benchmarks, namely UECFood-256 (object detection) and the AiCrowd Food Recognition Challenge 2022 dataset (instance segmentation), using different baseline algorithms demonstrates the robust improvements introduced by the different components proposed in this thesis. More concretely, we outperform the state of the art on both public datasets.
License Plate Recognition using Convolutional Neural Networks Trained on Synthetic Images
In this thesis, we propose a license plate recognition system and study the feasibility
of using synthetic training samples to train convolutional neural networks for a
practical application.
First, we develop a modular framework for synthetic license plate generation; to
generate different license plate types (or other objects), only the first module needs
to be adapted. The other modules apply variations to the training samples, such as
background, occlusions, camera perspective projection, object noise, and camera
acquisition noise, with the aim of achieving enough variation of the object that the
trained networks will also recognize real objects of the same class.
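The modular design described above can be sketched as a pipeline in which only the first stage is object-specific, while later stages apply generic variations. All function and field names below are illustrative, not taken from the thesis:

```python
import random

def make_plate(rng):
    # Stage 1: the only object-specific module; swapping this out
    # yields a different plate type (or another object class entirely).
    return {"text": "".join(rng.choice("ABC0123456789") for _ in range(7))}

def add_background(sample, rng):
    # Generic variation module: composite the object onto a random scene.
    sample["background"] = rng.choice(["road", "wall", "garage"])
    return sample

def add_acquisition_noise(sample, rng):
    # Generic variation module: simulate camera sensor noise strength.
    sample["noise_sigma"] = rng.uniform(0.0, 0.05)
    return sample

def generate(n, seed=0):
    """Run every sample through the object module and all variation modules."""
    rng = random.Random(seed)  # seeded for reproducible datasets
    pipeline = [add_background, add_acquisition_noise]
    samples = []
    for _ in range(n):
        s = make_plate(rng)
        for stage in pipeline:
            s = stage(s, rng)
        samples.append(s)
    return samples
```

Because the variation modules never inspect the object itself, retargeting the generator to a new plate type only requires replacing `make_plate`, mirroring the framework's stated design goal.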
Then we design two low-complexity convolutional neural networks for license
plate detection and character recognition. Both are designed for simultaneous
classification and localization by branching the networks into a classification and a
regression branch, and are trained end-to-end simultaneously over both branches, on
only our synthetic training samples.
To recognize real license plates, we design a pipeline for scale-invariant license
plate detection with a scale pyramid and a fully convolutional application of the
license plate detection network, in order to detect any number of license plates,
at any scale, in an image. Before character classification is applied, potential plate
regions are un-skewed based on the detected plate location in order to obtain as
clean a representation of the characters as possible. The character classification is
also performed with a fully convolutional sweep to find all characters
simultaneously.
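As a rough illustration of the scale-pyramid idea (our own sketch, not the thesis's implementation): the input is repeatedly downscaled, the detector sweeps each level, and any detection found at a level is mapped back to input-image coordinates by dividing by that level's scale factor. Nearest-neighbour resampling is used here only for brevity:

```python
import numpy as np

def image_pyramid(img, scale_step=0.75, min_size=64):
    """Return progressively downscaled copies of `img` until the
    shorter side would drop below `min_size` pixels."""
    levels = []
    h, w = img.shape[:2]
    s = 1.0
    while min(h * s, w * s) >= min_size:
        nh, nw = int(h * s), int(w * s)
        ys = (np.arange(nh) / s).astype(int)   # nearest-neighbour row indices
        xs = (np.arange(nw) / s).astype(int)   # nearest-neighbour column indices
        levels.append(img[ys][:, xs])
        s *= scale_step
    return levels

def to_original_coords(box, level, scale_step=0.75):
    """Map an (x, y, w, h) box found at pyramid `level` back to the
    coordinate frame of the full-resolution input image."""
    f = scale_step ** level
    return tuple(v / f for v in box)
```

A real pipeline would use anti-aliased resizing and run the fully convolutional detector over every level, but the coordinate bookkeeping is the same.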
Both the plate and the character stages apply a refinement classification in which
initial classifications are first centered and rescaled. We show that this simple yet
effective trick greatly improves the accuracy of our classifications, at only a small
increase in complexity. To our knowledge, this trick has not been exploited before.
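A toy sketch of the recentering step (our own illustration, not the thesis code): crop a fixed-size window around the initial detection, zero-padding at the image borders, so the refinement classifier always sees the object centred and at a fixed scale:

```python
import numpy as np

def recenter_crop(img, cx, cy, size):
    """Extract a size x size crop centred on (cx, cy) from a 2-D image,
    zero-padding wherever the window extends past the image borders."""
    half = size // 2
    out = np.zeros((size, size), dtype=img.dtype)
    y0, y1 = cy - half, cy - half + size   # window in image coordinates
    x0, x1 = cx - half, cx - half + size
    sy0, sx0 = max(y0, 0), max(x0, 0)      # clip the window to the image
    sy1 = min(y1, img.shape[0])
    sx1 = min(x1, img.shape[1])
    # copy the valid region into the matching position of the padded output
    out[sy0 - y0:sy1 - y0, sx0 - x0:sx1 - x0] = img[sy0:sy1, sx0:sx1]
    return out
```

The refined crop is then passed through the classifier a second time; the gain comes from removing the translation and scale jitter of the initial detection.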
To show the effectiveness of our system, we first apply it to a dataset of photos
of Italian license plates to evaluate the different stages of our system and the
effect the classification thresholds have on accuracy. We also find robust training
parameters and thresholds that are reliable for classification without any need for
calibration on a validation set of real annotated samples (which may not always be
available), and achieve balanced precision and recall on the set of Italian license
plates, both in excess of 98%.
Finally, to show that our system generalizes to new plate types, we compare our
system to two reference systems on a dataset of Taiwanese license plates. For this, we
only modify the first module of the synthetic plate generation algorithm to produce
Taiwanese license plates and adjust parameters regarding plate dimensions; then we
train our networks and apply the classification pipeline, using the robust parameters,
on the Taiwanese reference dataset. We achieve state-of-the-art performance on plate
detection (99.86% precision and 99.1% recall), single character detection (99.6%),
and full license plate reading (98.7%).
Artificial Intelligence Algorithms for Eye Banking
Eye banking plays a critical role in modern medicine by providing cornea tissues for transplantation to restore vision for millions of people worldwide. The evaluation of the corneal endothelium is done by measuring the corneal endothelial cell density (ECD). Unfortunately, the current system to measure ECD is manual, time-consuming, and error-prone. Furthermore, the impact of social behaviors and biological conditions on the corneal endothelium and corneal transplant success is largely unexplored. To overcome these challenges, this dissertation aims to develop tools for corneal endothelial image and data analysis that enhance the efficiency and quality of cornea transplants.
In the first study, an image processing algorithm is developed to analyze corneal endothelial images captured by a Konan CellChek specular microscope. The algorithm successfully identifies the region of interest, filters the image, and employs stochastic watershed segmentation to determine cell boundaries and evaluate endothelial cell density (ECD). The proposed algorithm achieves a high correlation with manual counts (R2 = 0.98) and has an average analysis time of 2.5 seconds.
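For intuition, once the watershed segmentation has delineated cell boundaries, ECD reduces to a cell count per calibrated area. A minimal sketch (the function name and calibration value are illustrative; the stochastic watershed itself is not reproduced here):

```python
import numpy as np

def endothelial_cell_density(labels, um_per_px):
    """Estimate ECD (cells per mm^2) from a labelled segmentation mask.

    labels    : 2-D integer array, 0 = background, 1..N = one region per cell
    um_per_px : microscope calibration, micrometres per pixel
    """
    has_background = bool((labels == 0).any())
    n_cells = len(np.unique(labels)) - (1 if has_background else 0)
    # total imaged area in mm^2 (1 mm = 1000 um)
    area_mm2 = labels.size * (um_per_px / 1000.0) ** 2
    return n_cells / area_mm2
```

In practice the density would be computed over the analyzed region of interest rather than the whole frame, but the count-over-calibrated-area principle is the same.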
In the second study, a deep learning-based cell segmentation algorithm called Mobile-CellNet is proposed to estimate ECD. This technique addresses the limitations of classical algorithms and creates a more robust and highly efficient algorithm. The approach achieves a mean absolute error of 4.06% for ECD on the test set, similar to U-Net but with significantly fewer floating-point operations and parameters.
The third study explores the correlation between alcohol abuse and corneal endothelial morphology in a donor pool of 5,624 individuals. Multivariable regression analysis shows that alcohol abuse is associated with a reduction in endothelial cell density, an increase in the coefficient of variation, and a decrease in percent hexagonality.
These studies highlight the potential of big data and artificial intelligence algorithms in accurately and efficiently analyzing corneal images and donor medical data to improve the efficiency of eye banking and patient outcomes. By automating the analysis of corneal images and exploring the impact of social behaviors and biological conditions on corneal endothelial morphology, we can enhance the quality and availability of cornea transplants and ultimately improve the lives of millions of people worldwide.