997 research outputs found

    Transferable Belief Model for hair mask segmentation


    Organising a photograph collection based on human appearance

    This thesis describes a complete framework for organising digital photographs in an unsupervised manner, based on the appearance of people captured in the photographs. Organising a collection of photographs manually, especially providing the identities of people captured in photographs, is a time-consuming task. Unsupervised grouping of images containing similar persons makes annotating names easier (as a group of images can be named at once) and enables quick search based on query by example. The full process of unsupervised clustering is discussed in this thesis. Methods for locating facial components are discussed, and a technique based on colour image segmentation is proposed and tested. Additionally, a method based on a Principal Component Analysis template is tested. These provide the eye locations required for acquiring a normalised facial image. This image is then preprocessed by histogram equalisation and feathering, and the features of the MPEG-7 face recognition descriptor are extracted. A distance measure proposed in the MPEG-7 standard is used as a similarity measure. Three approaches to grouping that use only face recognition features for clustering are analysed: modified k-means, single-link, and a method based on a nearest neighbour classifier. The nearest neighbour-based technique is chosen for further experiments with fusing information from several sources. These sources are context-based, such as events (party, trip, holidays) and the ownership of photographs, and content-based, such as information about the colour and texture of the bodies of people appearing in the photographs. Two techniques are proposed for fusing event and ownership (user) information with the face recognition features: a Transferable Belief Model (TBM) and three-level clustering, carried out at "event" level, "user" level and "collection" level. The latter technique proves to be the most efficient. For combining body information with the face recognition features, three probabilistic fusion methods are tested: the average sum, the generalised product and the maximum rule. Combinations are tested within events and within user collections. The work concludes with a brief discussion of the extraction of key images to represent each cluster.
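    As a minimal sketch of the three score-fusion rules named in this abstract (average sum, generalised product, maximum), combining face and body similarity scores: the function name and the assumption that scores are normalised to [0, 1] are illustrative, not taken from the thesis.

```python
import numpy as np

def fuse_similarities(face_sim: np.ndarray, body_sim: np.ndarray,
                      rule: str = "average") -> np.ndarray:
    """Fuse face and body similarity scores (assumed normalised to [0, 1])."""
    stacked = np.stack([face_sim, body_sim])
    if rule == "average":   # average sum rule: arithmetic mean of the scores
        return stacked.mean(axis=0)
    if rule == "product":   # generalised product rule: scores multiplied
        return stacked.prod(axis=0)
    if rule == "max":       # maximum rule: the more confident source wins
        return stacked.max(axis=0)
    raise ValueError(f"unknown fusion rule: {rule}")
```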

    Deep learning in melanoma screening

    Advisors: Eduardo Alves do Valle Junior, Lin Tzy Li. Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
    Abstract: Of all skin cancers, melanoma represents just 1% of cases but 75% of deaths. Melanoma's prognosis is good when detected early, but deteriorates fast as the disease progresses. Automated tools may play an essential role in providing timely screening, helping doctors focus on patients or lesions at risk. However, the disease's characteristics (rarity, lethality, fast progression, and diagnostic subtlety) make automated screening for melanoma particularly challenging. The objective of this work is to better understand how Deep Learning, more precisely Convolutional Neural Networks (CNNs), can be used to correctly classify images of skin lesions. To achieve this, the work is divided into two lines of investigation. First, the study focuses on the transferability of features from pretrained CNNs; the primary objective of this thread is to study how the transferred features behave in different schemes, aiming at generating better features for the classifier layer. Second, the study seeks to improve the classification metrics themselves, which is the overall objective. On the transferability of features, we performed experiments to analyse how different transfer schemes affect the Area Under the ROC Curve (AUC): training a CNN from scratch; transferring from a CNN pretrained on general or specific image databases; and performing a double transfer, a training sequence from general images to specific images and finally to melanoma images. From those experiments we learned that transfer learning is good practice, as is fine-tuning. The results also suggest that deeper models lead to better results. We expected transfer learning from a medically related task (here, a retinopathy image database) to lead to better outcomes, especially in the double-transfer scheme, but the results showed the opposite, suggesting that adaptation from very specific tasks poses challenges of its own. On the improvement of metrics, we discuss the winning pipeline of the International Skin Imaging Collaboration (ISIC) Challenge 2017, which reached state-of-the-art results on melanoma classification with 87.4% AUC. The solution is based on stacking/meta-learning over Inception v4 and ResNet-101 models, fine-tuning them while performing data augmentation on both the training and test sets. We also compare different segmentation techniques (elementwise multiplication of the skin lesion image by its segmentation mask, and feeding the segmentation mask as a fourth input channel) against a network trained without segmentation. The network without segmentation performs best (96.0% AUC), against 94.5% AUC for the segmentation mask as a fourth channel. We make available a reproducible reference implementation with all the source code developed for the contributions of this dissertation.
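    To make the segmentation comparison concrete, here is a hedged sketch of the three input variants compared above (no segmentation, elementwise multiplication, mask as a fourth channel); tensor shapes and the function name are assumptions, not the dissertation's code.

```python
import torch

def prepare_input(image: torch.Tensor, mask: torch.Tensor,
                  mode: str = "none") -> torch.Tensor:
    """image: (3, H, W) float tensor; mask: (1, H, W) binary tensor."""
    if mode == "none":            # baseline: the network never sees the mask
        return image
    if mode == "multiply":        # zero out everything outside the lesion
        return image * mask
    if mode == "fourth_channel":  # append the mask as an extra input channel
        # Note: a (4, H, W) input requires widening the first conv layer
        # of a pretrained network before fine-tuning.
        return torch.cat([image, mask], dim=0)
    raise ValueError(f"unknown mode: {mode}")
```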

    Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases

    We present a simple but effective method to measure and mitigate model biases caused by reliance on spurious cues. Instead of requiring costly changes to one's data or model training, our method better utilizes the data one already has by sorting them. Specifically, we rank images within their classes based on spuriosity (the degree to which common spurious cues are present), proxied via deep neural features of an interpretable network. With spuriosity rankings, it is easy to identify minority subpopulations (i.e. low-spuriosity images) and assess model bias as the gap in accuracy between high- and low-spuriosity images. One can even efficiently remove a model's bias at little cost to accuracy by finetuning its classification head on low-spuriosity images, resulting in fairer treatment of samples regardless of spuriosity. We demonstrate our method on ImageNet, annotating 5000 class-feature dependencies (630 of which we find to be spurious) and generating a dataset of 325k soft segmentations for these features along the way. Having computed spuriosity rankings via the identified spurious neural features, we assess biases for 89 diverse models and find that class-wise biases are highly correlated across models. Our results suggest that model bias due to spurious feature reliance is influenced far more by what the model is trained on than by how it is trained. Comment: Accepted to NeurIPS '23 (Spotlight). Camera-ready version.
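    As an illustration of the ranking step, a minimal sketch under assumed shapes: spuriosity is proxied here by averaging the activations of the identified spurious neural features, and bias by the accuracy gap; the exact aggregation the authors use may differ.

```python
import numpy as np

def rank_by_spuriosity(features: np.ndarray, spurious_idx: list) -> np.ndarray:
    """Order the images of one class from low to high spuriosity.

    features: (n_images, n_features) activations of an interpretable network
    spurious_idx: indices of the features deemed spurious for this class
    """
    spuriosity = features[:, spurious_idx].mean(axis=1)
    return np.argsort(spuriosity)  # low-spuriosity (minority) images first

def bias_gap(acc_high_spuriosity: float, acc_low_spuriosity: float) -> float:
    # Class-wise model bias: accuracy gap between high- and low-spuriosity images.
    return acc_high_spuriosity - acc_low_spuriosity
```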

    Using contour information and segmentation for object registration, modeling and retrieval

    This thesis considers different aspects of the utilization of contour information and of syntactic and semantic image segmentation for object registration, modeling and retrieval, in the context of content-based indexing and retrieval in large collections of images. Target applications include retrieval in collections of closed silhouettes, holistic word recognition in handwritten historical manuscripts, and shape registration. The thesis also explores the feasibility of contour-based syntactic features for improving the correspondence of the output of bottom-up segmentation to the semantic objects present in the scene, and discusses the feasibility of different strategies for image analysis utilizing contour information, e.g. segmentation driven by visual features versus segmentation driven by shape models, or semi-automatic segmentation, in selected application scenarios. There are three contributions in this thesis. The first considers structure analysis based on the shape and spatial configuration of image regions (so-called syntactic visual features) and their utilization for automatic image segmentation. The second is the study of novel shape features, matching algorithms and similarity measures. Various applications of the proposed solutions are presented throughout the thesis, providing the basis for the third contribution, which is a discussion of the feasibility of different recognition strategies utilizing contour information. In each case, the performance and generality of the proposed approach has been analyzed through extensive, rigorous experimentation using test collections as large as possible.
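    As a concrete, if toy, instance of contour-based shape matching of the kind studied here, a sketch using a standard pipeline (largest external contour plus Hu-moment matching via OpenCV); this is a stand-in, not the thesis's own matching algorithm.

```python
import cv2
import numpy as np

def silhouette_distance(binary_a: np.ndarray, binary_b: np.ndarray) -> float:
    """Contour-based dissimilarity of two binary silhouettes (0 = identical)."""
    def largest_contour(binary: np.ndarray):
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        return max(contours, key=cv2.contourArea)

    # Hu-moment matching is invariant to translation, scale and rotation.
    return cv2.matchShapes(largest_contour(binary_a), largest_contour(binary_b),
                           cv2.CONTOURS_MATCH_I1, 0.0)
```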

    Facial Expression Analysis under Partial Occlusion: A Survey

    Automatic machine-based Facial Expression Analysis (FEA) has made substantial progress in the past few decades, driven by its importance for applications in psychology, security, health, entertainment and human-computer interaction. The vast majority of completed FEA studies are based on non-occluded faces collected in controlled laboratory environments. Automatic expression recognition tolerant to partial occlusion remains less understood, particularly in real-world scenarios. In recent years, efforts investigating techniques to handle partial occlusion for FEA have increased, and the time is right for a comprehensive review of these developments and of the state of the art. This survey provides such a review of recent advances in dataset creation, algorithm development, and investigations of the effects of occlusion critical for robust performance in FEA systems. It outlines existing challenges in overcoming partial occlusion and discusses possible opportunities for advancing the technology. To the best of our knowledge, it is the first FEA survey dedicated to occlusion and aimed at promoting better informed and benchmarked future work. Comment: Authors' pre-print of the article accepted for publication in ACM Computing Surveys (accepted on 02-Nov-2017).

    AttMOT: Improving Multiple-Object Tracking by Introducing Auxiliary Pedestrian Attributes

    Multi-object tracking (MOT) is a fundamental problem in computer vision with numerous applications, such as intelligent surveillance and automated driving. Despite the significant progress made in MOT, pedestrian attributes such as gender, hairstyle, body shape, and clothing features, which carry rich, high-level information, have been less explored. To address this gap, we propose a simple, effective, and generic method to predict pedestrian attributes in support of general Re-ID embeddings. We first introduce AttMOT, a large, highly enriched synthetic dataset for pedestrian tracking, containing over 80k frames and 6 million pedestrian IDs across different times, weather conditions, and scenarios. To the best of our knowledge, AttMOT is the first MOT dataset with semantic attributes. Subsequently, we explore different approaches to fusing Re-ID embeddings and pedestrian attributes, including attention mechanisms, which we hope will stimulate the development of attribute-assisted MOT. Through experiments on the AttMOT dataset, the proposed method, AAM, demonstrates its effectiveness and generality on several representative pedestrian multi-object tracking benchmarks, including MOT17 and MOT20. When applied to state-of-the-art trackers, AAM achieves consistent improvements in MOTA, HOTA, AssA, IDs, and IDF1 scores. For instance, on MOT17, the proposed method yields a +1.1 MOTA, +1.7 HOTA, and +1.8 IDF1 improvement when used with FairMOT. To encourage further research on attribute-assisted MOT, we will release the AttMOT dataset.
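    The abstract does not spell out AAM's architecture; as one plausible shape for attention-based fusion of a Re-ID embedding with predicted attributes, here is a hypothetical gating module. All dimensions, the class name, and the gating design are assumptions, not the paper's method.

```python
import torch
import torch.nn as nn

class AttributeFusion(nn.Module):
    """Hypothetical gated fusion of a Re-ID embedding with attribute predictions."""

    def __init__(self, reid_dim: int = 128, attr_dim: int = 32):
        super().__init__()
        # Channel-wise attention weights computed from both inputs.
        self.gate = nn.Sequential(nn.Linear(reid_dim + attr_dim, reid_dim),
                                  nn.Sigmoid())
        self.attr_proj = nn.Linear(attr_dim, reid_dim)

    def forward(self, reid_emb: torch.Tensor, attr_emb: torch.Tensor) -> torch.Tensor:
        weights = self.gate(torch.cat([reid_emb, attr_emb], dim=-1))
        # Re-weight the appearance embedding, then add projected attribute cues.
        return weights * reid_emb + self.attr_proj(attr_emb)
```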