Organising a photograph collection based on human appearance
This thesis describes a complete framework for organising digital photographs in an unsupervised manner, based on the appearance of people captured in the photographs. Organising a collection of photographs manually, especially providing the identities of the people captured in them, is a time-consuming task. Unsupervised grouping of images containing similar persons makes annotating names easier (as a group of images can be named at once) and enables quick search based on query by example.
The full process of unsupervised clustering is discussed in this thesis. Methods for locating facial components are discussed, and a technique based on colour image segmentation is proposed and tested. A method based on a Principal Component Analysis template is also tested. These provide the eye locations required for acquiring a normalised facial image. This image is then preprocessed by histogram equalisation and feathering, and the features of the MPEG-7 face recognition descriptor are extracted. The distance measure proposed in the MPEG-7 standard is used as the similarity measure.
Three approaches to grouping that use only face recognition features for clustering are analysed. These are modified k-means, single-link and a method based on a nearest neighbour classifier. The nearest neighbour-based technique is chosen for further experiments with fusing information from several sources. These sources are context-based such as events (party, trip, holidays), the ownership of photographs, and content-based such as information about the colour and texture of the bodies of humans appearing in photographs. Two techniques are proposed for fusing event and ownership (user) information with the face recognition features: a Transferable Belief Model (TBM) and three level clustering. The three level clustering is carried out at “event” level, “user” level and “collection” level. The latter technique proves to be most efficient.
For combining body information with the face recognition features, three probabilistic fusion methods are tested. These are the average sum, the generalised product and the maximum rule. Combinations are tested within events and within user collections. This work concludes with a brief discussion on the extraction of key images for a representation of each cluster.
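The three fusion rules named above can be illustrated with a minimal sketch. It assumes each source (face and body) yields a probability vector over the same candidate clusters. The function name `fuse_scores` and the example values are hypothetical, and the plain product rule is used here as a stand-in for the thesis's "generalised product", whose exact form the abstract does not give.

```python
import numpy as np

def fuse_scores(face_probs, body_probs, rule="sum"):
    """Fuse two per-cluster probability vectors with a fixed combination rule.

    Hypothetical helper: an illustration of the rule families named in the
    abstract, not the thesis implementation.
    """
    stacked = np.stack([np.asarray(face_probs, dtype=float),
                        np.asarray(body_probs, dtype=float)])
    if rule == "sum":          # average of the two sources
        fused = stacked.mean(axis=0)
    elif rule == "product":    # plain product (assumed stand-in for "generalised product")
        fused = stacked.prod(axis=0)
    elif rule == "max":        # most confident source wins per cluster
        fused = stacked.max(axis=0)
    else:
        raise ValueError(f"unknown rule: {rule}")
    total = fused.sum()
    # Renormalise so the fused scores sum to one (skipped if all scores are zero).
    return fused / total if total > 0 else fused

face = [0.6, 0.3, 0.1]   # made-up face-based scores for three clusters
body = [0.2, 0.7, 0.1]   # made-up body-based scores for the same clusters
for rule in ("sum", "product", "max"):
    print(rule, fuse_scores(face, body, rule).round(3))
```

In this toy case all three rules pick the second cluster, but they can disagree when one source is confident and the other is close to uniform, which is one reason to compare them empirically.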
Deep learning in melanoma screening (Aprendizado profundo em triagem de melanoma)
Advisors: Eduardo Alves do Valle Junior, Lin Tzy Li
Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação
Abstract: Of all skin cancers, melanoma represents just 1% of cases but 75% of deaths. Melanoma's prognosis is good when detected early, but deteriorates fast as the disease progresses. Automated tools may play an essential role in providing timely screening, helping doctors focus on patients or lesions at risk. However, the disease's characteristics (rarity, lethality, fast progression, and diagnostic subtlety) make automated screening for melanoma particularly challenging.
The objective of this work is to better understand how Deep Learning, more precisely Convolutional Neural Networks (CNNs), can be used to correctly classify images of skin lesions. The work is divided into two lines of investigation. The first focuses on the transferability of features from pretrained CNNs: its primary objective is to study how the transferred features behave under different schemes, aiming to generate better features for the classifier layer. The second aims to improve the classification metrics, which is the overall objective. On the transferability of features, we performed experiments to analyze how different transfer schemes affect the Area Under the ROC Curve (AUC): training a CNN from scratch; transferring from a CNN pretrained on general or specific image databases; and performing a double transfer, a training sequence that goes from general images to specific images and finally to melanoma images. From those experiments, we learned that transfer learning is a good practice, as is fine-tuning. The results also suggest that deeper models lead to better results. We hypothesized that transfer learning from a medically related task (in this case, from a retinopathy image dataset) would lead to better outcomes, especially in the double transfer scheme, but the results showed the opposite, suggesting that adaptation from very specific tasks poses specific challenges. On the improvement of metrics, we discuss the winning pipeline used in the International Skin Imaging Collaboration (ISIC) Challenge 2017, which reached state-of-the-art results on melanoma classification with 87.4% AUC. The solution is based on stacking/meta-learning of Inception v4 and Resnet101 models, fine-tuning them while performing data augmentation on the train and test sets.
We also compare different segmentation techniques (elementwise multiplication of the skin lesion image and its segmentation mask, and feeding the segmentation mask as a fourth channel) with a network trained without segmentation. The network without segmentation performs best (96.0% AUC), ahead of the segmentation mask as a fourth channel (94.5% AUC). We made available a reproducible reference implementation with all source code developed for the contributions of this thesis.
Master's (Mestrado); Computer Engineering; Master in Electrical Engineering; 133530/2016-7; CNP
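The two ways of feeding a segmentation mask to the network that this abstract compares can be sketched as array operations. This is a minimal illustration, assuming an H x W x 3 image and an H x W mask with values in [0, 1]. The function names and the random toy data are hypothetical and are not taken from the dissertation's code.

```python
import numpy as np

def mask_by_multiplication(image, mask):
    """Zero out pixels outside the lesion: elementwise product of image and mask."""
    # Add a trailing axis so the H x W mask broadcasts over the colour channels.
    return image * mask[..., np.newaxis]

def mask_as_fourth_channel(image, mask):
    """Keep the image intact and append the mask as an extra input channel."""
    return np.concatenate([image, mask[..., np.newaxis]], axis=-1)

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))   # toy RGB image
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0            # toy lesion region in the centre

multiplied = mask_by_multiplication(image, mask)
stacked = mask_as_fourth_channel(image, mask)
print(multiplied.shape, stacked.shape)
```

The multiplication variant discards everything outside the mask, while the fourth-channel variant leaves that decision to the network but requires a first layer that accepts four input channels.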
Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases
We present a simple but effective method to measure and mitigate model biases
caused by reliance on spurious cues. Instead of requiring costly changes to
one's data or model training, our method better utilizes the data one already
has by sorting them. Specifically, we rank images within their classes based on
spuriosity (the degree to which common spurious cues are present), proxied via
deep neural features of an interpretable network. With spuriosity rankings, it
is easy to identify minority subpopulations (i.e. low spuriosity images) and
assess model bias as the gap in accuracy between high and low spuriosity
images. One can even efficiently remove a model's bias at little cost to
accuracy by finetuning its classification head on low spuriosity images,
resulting in fairer treatment of samples regardless of spuriosity. We
demonstrate our method on ImageNet, annotating class-feature
dependencies (some of which we find to be spurious) and generating a dataset
of soft segmentations for these features along the way. Having computed
spuriosity rankings via the identified spurious neural features, we assess
biases for diverse models and find that class-wise biases are highly
correlated across models. Our results suggest that model bias due to spurious
feature reliance is influenced far more by what the model is trained on than
how it is trained.
Comment: Accepted to NeurIPS '23 (Spotlight). Camera-ready version.
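The bias measure described in this abstract, the accuracy gap between high- and low-spuriosity images within a class, can be sketched as follows. This is an illustration under stated assumptions: a per-image spuriosity score is already available for one class, the function name `spuriosity_gap`, the cutoff `k`, and the toy numbers are all hypothetical, and the paper's own procedure for obtaining the scores from neural features is not reproduced.

```python
import numpy as np

def spuriosity_gap(spuriosity, correct, k):
    """Accuracy on the k highest-spuriosity images minus accuracy on the k lowest.

    spuriosity: per-image scores for the images of one class (higher = more spurious cues).
    correct:    1 if the model classified the image correctly, else 0.
    """
    spuriosity = np.asarray(spuriosity, dtype=float)
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(spuriosity)            # ascending: lowest spuriosity first
    low_acc = correct[order[:k]].mean()       # minority subpopulation
    high_acc = correct[order[-k:]].mean()     # images with strong spurious cues
    return high_acc - low_acc

scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]   # made-up spuriosity scores
hits = [1, 0, 1, 1, 1, 0]                 # made-up per-image correctness
gap = spuriosity_gap(scores, hits, k=3)
print(round(gap, 3))
```

A large positive gap indicates that the model does much better when the spurious cue is present, which is the kind of reliance the ranking is meant to expose.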
Using contour information and segmentation for object registration, modeling and retrieval
This thesis considers different aspects of the utilization of contour information and syntactic and semantic image segmentation for object registration, modeling and retrieval in the context of content-based indexing and retrieval in large collections of images. Target applications include retrieval in collections of closed silhouettes, holistic word recognition in handwritten historical manuscripts and shape registration. The thesis also explores the feasibility of contour-based syntactic features for improving the correspondence between the output of bottom-up segmentation and the semantic objects present in the scene, and discusses the feasibility of different strategies for image analysis utilizing contour information in selected application scenarios, e.g. segmentation driven by visual features versus segmentation driven by shape models, or semi-automatic segmentation.
There are three contributions in this thesis. The first contribution considers structure analysis based on the shape and spatial configuration of image regions (so-called syntactic visual features) and their utilization for automatic image segmentation. The second contribution is the study of novel shape features, matching algorithms and similarity measures. Various applications of the proposed solutions are presented throughout the thesis, providing the basis for the third contribution, which is a discussion of the feasibility of different recognition strategies utilizing contour information. In each case, the performance and generality of the proposed approach have been analyzed through extensive, rigorous experimentation using test collections that are as large as possible.
Facial Expression Analysis under Partial Occlusion: A Survey
Automatic machine-based Facial Expression Analysis (FEA) has made substantial
progress in the past few decades driven by its importance for applications in
psychology, security, health, entertainment and human computer interaction. The
vast majority of completed FEA studies are based on non-occluded faces
collected in a controlled laboratory environment. Automatic expression
recognition tolerant to partial occlusion remains less understood, particularly
in real-world scenarios. In recent years, efforts investigating techniques to
handle partial occlusion for FEA have increased. The time is right
for a comprehensive review of these developments and of the state of the
art. This survey provides such a comprehensive review of
recent advances in dataset creation, algorithm development, and investigations
of the effects of occlusion critical for robust performance in FEA systems. It
outlines existing challenges in overcoming partial occlusion and discusses
possible opportunities in advancing the technology. To the best of our
knowledge, it is the first FEA survey dedicated to occlusion and aimed at
promoting better informed and benchmarked future work.
Comment: Authors' pre-print of the article accepted for publication in ACM
Computing Surveys (accepted on 02-Nov-2017)
AttMOT: Improving Multiple-Object Tracking by Introducing Auxiliary Pedestrian Attributes
Multi-object tracking (MOT) is a fundamental problem in computer vision with
numerous applications, such as intelligent surveillance and automated driving.
Despite the significant progress made in MOT, pedestrian attributes, such as
gender, hairstyle, body shape, and clothing features, which contain rich and
high-level information, have been less explored. To address this gap, we
propose a simple, effective, and generic method to predict pedestrian
attributes to support general Re-ID embedding. We first introduce AttMOT, a
large, highly enriched synthetic dataset for pedestrian tracking, containing
over 80k frames and 6 million pedestrian IDs with different time, weather
conditions, and scenarios. To the best of our knowledge, AttMOT is the first
MOT dataset with semantic attributes. Subsequently, we explore different
approaches to fuse Re-ID embedding and pedestrian attributes, including
attention mechanisms, which we hope will stimulate the development of
attribute-assisted MOT. The proposed method AAM demonstrates its effectiveness
and generality on several representative pedestrian multi-object tracking
benchmarks, including MOT17 and MOT20, through experiments on the AttMOT
dataset. When applied to state-of-the-art trackers, AAM achieves consistent
improvements in MOTA, HOTA, AssA, IDs, and IDF1 scores. For instance, on MOT17,
the proposed method yields a +1.1 MOTA, +1.7 HOTA, and +1.8 IDF1 improvement
when used with FairMOT. To encourage further research on attribute-assisted
MOT, we will release the AttMOT dataset.