1,046 research outputs found

    Bayesian Joint Modelling for Object Localisation in Weakly Labelled Images

    Abstract—We address the problem of localisation of objects as bounding boxes in images and videos with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. In this paper, a novel framework based on Bayesian joint topic modelling is proposed, which differs significantly from the existing ones in that: (1) All foreground object classes are modelled jointly in a single generative model that encodes multiple object co-existence, so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) Image backgrounds are shared across classes to better learn varying surroundings and “push out” objects of interest. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Moreover, the Bayesian formulation enables the exploitation of various types of prior knowledge to compensate for the limited supervision offered by weakly labelled data, as well as Bayesian domain adaptation for transfer learning. Extensive experiments on the PASCAL VOC, ImageNet and YouTube-Object video datasets demonstrate the effectiveness of our Bayesian joint model for weakly supervised object localisation.
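    The abstract's core ingredient is Bayesian joint topic modelling over image features. A minimal sketch of the underlying topic-modelling step is shown below, assuming plain LDA from scikit-learn and synthetic bag-of-visual-words counts; the paper's actual joint model (shared background topics, priors, Bayesian domain adaptation) is considerably richer and is not implemented here.

```python
# Hedged sketch: plain LDA over synthetic bag-of-visual-words histograms,
# standing in for the topic-modelling step the paper builds on.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_images, vocab_size, n_topics = 200, 500, 10      # hypothetical sizes
# Poisson counts stand in for quantised local-feature (visual word) histograms.
X = rng.poisson(lam=2.0, size=(n_images, vocab_size))

lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
doc_topics = lda.fit_transform(X)                  # per-image topic proportions

# In a weakly supervised setting, some topics would be tied to image-level
# class labels (foreground) and the remaining topics treated as background.
print(doc_topics.shape)                            # (200, 10)
```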

    Automatic annotation for weakly supervised learning of detectors

    Object detection in images and action detection in videos are among the most widely studied computer vision problems, with applications in consumer photography, surveillance, and automatic media tagging. Typically, these standard detectors are fully supervised; that is, they require a large body of training data in which the locations of the objects/actions in images/videos have been manually annotated. With the emergence of digital media and the rise of high-speed internet, raw images and video are available at little to no cost. However, the manual annotation of object and action locations remains tedious, slow, and expensive. As a result, there has been great interest in training detectors with weak supervision, where only the presence or absence of an object/action in an image/video is needed, not its location. This thesis presents approaches for weakly supervised learning of object/action detectors, with a focus on automatically annotating object and action locations in images/videos using only binary weak labels indicating the presence or absence of an object/action. First, a framework for weakly supervised learning of object detectors in images is presented. In the proposed approach, a variation of the multiple instance learning (MIL) technique for automatically annotating object locations in weakly labelled data is presented which, unlike existing approaches, uses inter-class and intra-class cue fusion to obtain the initial annotation. The initial annotation is then used to start an iterative process in which standard object detectors are used to refine the location annotation. Finally, to ensure that the iterative training of detectors does not drift from the object of interest, a scheme for detecting model drift is also presented. Furthermore, unlike most other methods, our weakly supervised approach is evaluated on data without manual pose (object orientation) annotation. Second, an analysis of the initial annotation of objects, using inter-class and intra-class cues, is carried out. From the analysis, a new method based on negative mining (NegMine) is presented for the initial annotation of both object and action data. The NegMine-based approach is a much simpler formulation using only an inter-class measure and requires no complex combinatorial optimisation, yet can still meet or outperform existing approaches, including the previously presented inter-intra class cue fusion approach. Furthermore, NegMine can be fused with existing approaches to boost their performance. Finally, the thesis takes a step back and looks at the use of generic object detectors as prior knowledge in weakly supervised learning of object detectors. These generic object detectors are typically based on sampling saliency maps that indicate whether a pixel belongs to the background or foreground. A new approach to generating saliency maps is presented that, unlike existing approaches, looks beyond the current image of interest and into images similar to it. We show that our generic object proposal method can be used by itself to annotate the weakly labelled object data with surprisingly high accuracy.
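    The annotate–train–refine loop described above follows the generic multiple instance learning (MIL) recipe for weakly supervised detection: pick the current best candidate box in each positively labelled image, train a detector on those picks against negative images, then re-localise with the trained detector and repeat. The sketch below illustrates that generic loop only, with a linear SVM and random features as stand-ins; it is not the thesis's inter/intra-class cue fusion, drift detection, or NegMine method.

```python
# Generic MIL re-localisation loop for weakly supervised detector learning
# (illustrative sketch; features and data here are synthetic stand-ins).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def mil_relocalise(pos_bags, neg_feats, n_iters=5):
    """pos_bags: list of (n_boxes_i, d) arrays, one per weakly positive image.
       neg_feats: (n_neg, d) box features drawn from negative images."""
    # Initialise with a random candidate box per positive image.
    selected = [bag[rng.integers(len(bag))] for bag in pos_bags]
    clf = None
    for _ in range(n_iters):
        X = np.vstack([np.stack(selected), neg_feats])
        y = np.concatenate([np.ones(len(selected)), np.zeros(len(neg_feats))])
        clf = LinearSVC(C=1.0, max_iter=5000).fit(X, y)
        # Re-localise: keep the highest-scoring candidate box in each image.
        selected = [bag[np.argmax(clf.decision_function(bag))] for bag in pos_bags]
    return clf, selected

# Toy usage: 30 positive images with 20 candidate boxes each, 300 negatives.
pos_bags = [rng.normal(size=(20, 64)) for _ in range(30)]
neg_feats = rng.normal(size=(300, 64))
clf, boxes = mil_relocalise(pos_bags, neg_feats)
print(len(boxes), boxes[0].shape)
```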

    Weakly supervised segmentation of polyps on colonoscopy images

    Colorectal cancer (CRC) is one of the leading causes of death worldwide and continues to pose a critical public health challenge, demanding precise early detection and intervention. Colonoscopy, the diagnostic examination aimed at exploring the inner walls of the colon to discover any tumour masses, is an effective method for decreasing mortality. Emerging techniques, such as advanced image analysis driven by neural networks, hold promise for accurate diagnosis. However, studies have reported that, for various reasons, a certain percentage of polyps are not correctly detected during colonoscopy. One of the most important reasons is the dependency on pixel-level annotations, which requires a lot of computational resources, making innovative solutions necessary. This thesis introduces strategies for improving polyp identification. For this purpose, the main techniques involve so-called Explainable AI tools for analysing saliency maps and activation maps, through several state-of-the-art visual saliency detection algorithms and Gradient-weighted Class Activation Mapping (Grad-CAM). In addition, a neural network for segmentation with the DeepLabV3+ architecture is used, in which bounding boxes are provided on the training images, within a weakly supervised framework.
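    Of the techniques named above, Grad-CAM is the easiest to show in isolation: the class score is back-propagated to the last convolutional block, the gradients are global-average-pooled into channel weights, and the weighted feature maps are combined into a heat map. The sketch below assumes a torchvision ResNet-50 with random weights and a random tensor as stand-ins for the trained polyp classifier and a colonoscopy frame.

```python
# Minimal Grad-CAM sketch (stand-in model and input, not the thesis's network).
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights=None).eval()     # in practice, load trained weights

feats, grads = {}, {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)           # stand-in for a colonoscopy frame
scores = model(x)
scores[0, scores.argmax()].backward()      # gradient of the top-scoring class

# Channel weights = global-average-pooled gradients; heat map = weighted sum.
w = grads["a"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((w * feats["a"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalise to [0, 1]
print(cam.shape)                           # torch.Size([1, 1, 224, 224])
```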

    Visual object category discovery in images and videos

    The current trend in visual recognition research is to place a strict division between the supervised and unsupervised learning paradigms, which is problematic for two main reasons. On the one hand, supervised methods require training data for each and every category that the system learns; training data may not always be available and is expensive to obtain. On the other hand, unsupervised methods must determine the optimal visual cues and distance metrics that distinguish one category from another in order to group images into semantically meaningful categories; however, for unlabeled data, these are unknown a priori. I propose a visual category discovery framework that transcends the two paradigms and learns accurate models with few labeled exemplars. The main insight is to automatically focus on the prevalent objects in images and videos, and to learn models from them for category grouping, segmentation, and summarization. To implement this idea, I first present a context-aware category discovery framework that discovers novel categories by leveraging context from previously learned categories. I devise a novel object-graph descriptor to model the interaction between a set of known categories and the unknown to-be-discovered categories, and group regions that have similar appearance and similar object-graphs. I then present a collective segmentation framework that simultaneously discovers the segmentations and groupings of objects by leveraging the shared patterns in the unlabeled image collection. It discovers an ensemble of representative instances for each unknown category, and builds top-down models from them to refine the segmentation of the remaining instances. Finally, building on these techniques, I show how to produce compact visual summaries for first-person egocentric videos that focus on the important people and objects. The system leverages novel egocentric and high-level saliency features to predict important regions in the video, and produces a concise visual summary that is driven by those regions. I compare against existing state-of-the-art methods for category discovery and segmentation on several challenging benchmark datasets. I demonstrate that we can discover visual concepts more accurately by focusing on the prevalent objects in images and videos, and show clear advantages of departing from the status quo division between the supervised and unsupervised learning paradigms. The main impact of my thesis is that it lays the groundwork for building large-scale visual discovery systems that can automatically discover visual concepts with minimal human supervision.
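    The object-graph idea — group regions by their own appearance together with the layout of known categories around them — can be caricatured with a simple clustering step, as sketched below on synthetic data. The descriptor, weighting, and grouping used in the thesis are considerably richer; the context weight and feature sizes here are arbitrary assumptions.

```python
# Toy category-discovery sketch: cluster regions on appearance + context.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_regions = 400
appearance = rng.normal(size=(n_regions, 128))   # stand-in region descriptors
# Context histograms: distribution of known-category labels around each region,
# a crude stand-in for the object-graph descriptor.
context = rng.dirichlet(np.ones(10), size=n_regions)

# Group regions that look alike *and* sit in similar surroundings.
X = np.hstack([appearance, 5.0 * context])       # assumed context weighting
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))                       # sizes of the discovered groups
```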

    Visual attention and swarm cognition for off-road robots

    Doctoral thesis, Informatics (Computer Engineering), Universidade de Lisboa, Faculdade de CiĂȘncias, 2011. This thesis addresses the problem of modelling visual attention in the context of autonomous off-road robots. The goal of using visual attention mechanisms is to focus perception on the aspects of the environment most relevant to the robot's task. This thesis shows that, in the detection of obstacles and trails, this capability promotes robustness and computational parsimony, which are key characteristics for the speed and efficiency of off-road robots. One of the major challenges in modelling visual attention stems from the need to manage the speed-accuracy trade-off in the presence of context or task variations. This thesis shows that this trade-off is resolved if the visual attention process is modelled as a self-organised process whose operation is modulated by the action-selection module responsible for controlling the robot. By closing the loop between the action-selection and perception processes, the latter is able to operate only where necessary, anticipating the robot's actions. To endow visual attention with self-organised properties, this work draws inspiration from Nature. Specifically, the mechanisms responsible for army ants' ability to forage in a self-organised way are used as a metaphor for solving the task of searching, also in a self-organised way, for obstacles and trails in the robot's visual field. The solution proposed in this thesis is to have several foci of covert attention operating as a swarm through pheromone-based interactions. This work represents the first embodied realisation of swarm cognition, a new field of research that seeks to uncover the basic principles of cognition by inspecting the self-organised properties of the collective intelligence exhibited by social insects. Hence, this thesis contributes to robotics as an engineering discipline and to robotics as a modelling discipline, capable of supporting the study of adaptive behaviour. Fundação para a CiĂȘncia e a Tecnologia (FCT, SFRH/BD/27305/2006); Laboratory of Agent Modelling (LabMag).
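    The pheromone-coupled swarm of covert attention foci can be illustrated with a small toy simulation: each focus steps towards neighbouring cells that are salient or already marked with pheromone by other foci, deposits pheromone where it lands, and evaporation keeps the swarm mobile. This is only an illustration of the self-organised search idea on a random saliency map, not the thesis's actual attention controller.

```python
# Toy pheromone-coupled swarm of attention foci over a saliency map.
import numpy as np

rng = np.random.default_rng(0)
H, W, n_foci, n_steps = 64, 64, 20, 200
saliency = rng.random((H, W))             # stand-in for obstacle/trail cues
pheromone = np.zeros((H, W))
foci = rng.integers(0, [H, W], size=(n_foci, 2))

for _ in range(n_steps):
    for i, (r, c) in enumerate(foci):
        # Candidate moves: 4-neighbourhood, clipped to the map borders.
        cand = np.clip([(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)],
                       0, [H - 1, W - 1])
        # Prefer salient cells reinforced by pheromone left by other foci.
        score = saliency[cand[:, 0], cand[:, 1]] + pheromone[cand[:, 0], cand[:, 1]]
        foci[i] = cand[rng.choice(len(cand), p=score / score.sum())]
        pheromone[tuple(foci[i])] += 1.0
    pheromone *= 0.95                      # evaporation keeps the swarm mobile

# Frequently revisited cells indicate where perception should be focused.
print(np.unravel_index(pheromone.argmax(), pheromone.shape))
```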
