
    SIDU:Similarity Difference And Uniqueness Method for Explainable AI

    A new branch of artificial intelligence research, Explainable AI (XAI), has focused on opening up the 'black box' and providing some explainability. This paper presents a novel visual explanation method for deep learning networks in the form of a saliency map that can effectively localize entire object regions. In contrast to current state-of-the-art methods, the proposed method shows quite promising visual explanations that can earn greater trust from human experts. Both quantitative and qualitative evaluations are carried out on general and clinical data sets to confirm the effectiveness of the proposed method. Comment: Accepted manuscript at the IEEE International Conference on Image Processing.
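    To make the method's name concrete, here is a minimal NumPy sketch of a SIDU-style pipeline, hedged as an illustration rather than the authors' reference implementation: upsampled feature-activation masks are scored by a similarity difference (how close the masked prediction stays to the original) and a uniqueness term (how distinct it is from the other masked predictions), and the saliency map is their weighted sum. The mean-threshold binarization, the kernel width sigma, and the nearest-neighbour upsampling are assumptions made for brevity.

```python
import numpy as np

def sidu_saliency(model, image, feature_maps, sigma=0.25):
    """SIDU-style saliency sketch.

    model        : callable mapping an (H, W, C) image to a class-probability vector
    image        : (H, W, C) float array
    feature_maps : (N, h, w) activations from the last conv layer,
                   with H, W assumed to be multiples of h, w
    """
    H, W = image.shape[:2]
    n, h, w = feature_maps.shape

    # 1. Binarize each activation map and upsample it to image size.
    masks = np.empty((n, H, W))
    for k in range(n):
        m = (feature_maps[k] > feature_maps[k].mean()).astype(float)
        masks[k] = np.kron(m, np.ones((H // h, W // w)))  # nearest-neighbour upsample

    # 2. Predict on the original image and on each masked copy.
    p_orig = model(image)
    preds = np.stack([model(image * masks[k][..., None]) for k in range(n)])

    # 3. Similarity difference: Gaussian kernel on the distance to p_orig.
    sd = np.exp(-np.linalg.norm(preds - p_orig, axis=1) ** 2 / (2 * sigma ** 2))

    # 4. Uniqueness: average distance of each masked prediction to the others.
    u = np.linalg.norm(preds[:, None, :] - preds[None, :, :], axis=2).sum(axis=1) / n

    # 5. Saliency map as the mask combination weighted by both scores.
    sal = ((sd * u)[:, None, None] * masks).sum(axis=0)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
```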

    Introducing and assessing the explainable AI (XAI)method: SIDU

    Explainable Artificial Intelligence (XAI) has in recent years become a well-suited framework for generating human-understandable explanations of black-box models. In this paper, we present a novel XAI visual explanation algorithm, denoted SIDU, that can effectively localize, to their full extent, the entire object regions responsible for a prediction. We analyze its robustness and effectiveness through various computational and human-subject experiments. In particular, we assess the SIDU algorithm using three different types of evaluations (application-grounded, human-grounded, and functionally grounded) to demonstrate its superior performance. The robustness of SIDU is further studied in the presence of adversarial attacks on black-box models to better understand its performance. Comment: Preprint submitted to Pattern Recognition (Elsevier).
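    As an illustration of what a functionally-grounded robustness check under adversarial attack can look like, the sketch below compares the explanation of a clean input with that of a perturbed one; the correlation metric and the generic perturbation argument are illustrative choices, not necessarily the protocol used in the paper.

```python
import numpy as np

def explanation_robustness(explainer, model, image, perturbation):
    """Correlate explanations of a clean and an adversarially perturbed input.

    explainer(model, image) -> 2-D saliency map for the model's prediction.
    Returns the Pearson correlation of the flattened maps; values near 1.0
    mean the explanation is stable under the attack.
    """
    sal_clean = explainer(model, image).ravel()
    sal_adv = explainer(model, image + perturbation).ravel()
    return np.corrcoef(sal_clean, sal_adv)[0, 1]
```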

    Spatio-temporal saliency detection in dynamic scenes using color and texture features

    Many computer vision applications require detecting, localizing, and tracking regions or objects of interest in an image or an image sequence. Numerous visual attention models, inspired by human vision, that automatically detect regions of interest in an image or a video have recently been developed and used successfully in various applications. Nevertheless, most existing approaches are limited to the analysis of static scenes, and very few methods exploit the temporal nature of image sequences: detecting visual saliency in videos is a more challenging task due to the additional temporal information, since a video contains strong spatio-temporal correlation between the regions of consecutive frames, and the motion of foreground objects dramatically changes the importance of objects in a scene. The main objective of this thesis is therefore to develop a spatio-temporal saliency method that works well for complex dynamic scenes.
    A spatio-temporal saliency map is usually obtained by fusing a static saliency map (spatial saliency within a frame) with a dynamic saliency map (temporal saliency across frames). In our work, we model the dynamic textures of a scene with Local Binary Patterns on Three Orthogonal Planes (LBP-TOP) to compute the dynamic saliency map, and we use color features to compute the static saliency map. Both maps are computed using a bio-inspired mechanism of the Human Visual System (HVS) with a discriminant formulation known as center-surround saliency, and are then fused into a single spatio-temporal saliency map.
    The proposed models have been extensively evaluated on diverse publicly available datasets containing several videos of dynamic scenes. The evaluation is performed in two parts: first, locating interesting foreground objects in complex scenes; second, predicting human observers' fixations. The proposed method is also compared against state-of-the-art methods, and the results show that it achieves better or comparable performance. Finally, because fusion plays a critical role in the accuracy of the spatio-temporal saliency map, we evaluate the performance of different fusion techniques on a large and diverse complex dataset; the results show that the fusion method must be selected according to the characteristics of a sequence in terms of color and motion contrasts. Overall, fusion techniques that take the best of each saliency map (static and dynamic) in the final spatio-temporal map achieve the best results.
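    Since the abstract singles out the fusion step, the following is a minimal NumPy sketch of the fusion family it describes, assuming both input maps are already normalized to [0, 1]; the 'weighted' contrast heuristic and the function name fuse_saliency are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def fuse_saliency(static_map, dynamic_map, method="max"):
    """Fuse a static (color) and a dynamic (LBP-TOP) saliency map.

    'max' keeps the best of each map at every pixel, matching the fusion
    family the abstract reports as strongest; 'mean' and 'weighted' are
    common baselines.
    """
    if method == "max":
        fused = np.maximum(static_map, dynamic_map)
    elif method == "mean":
        fused = 0.5 * (static_map + dynamic_map)
    elif method == "weighted":
        # Weight each map by its own contrast: an illustrative heuristic
        # for sequences dominated by either color or motion contrast.
        ws, wd = static_map.std(), dynamic_map.std()
        fused = (ws * static_map + wd * dynamic_map) / (ws + wd + 1e-8)
    else:
        raise ValueError(f"unknown fusion method: {method}")
    return fused
```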


    Face Alignment using Boosted Appearance Model (Discriminative Appearance Model)

    This thesis explores discriminative face alignment using a Boosted Appearance Model (BAM). In this method, face alignment is performed by maximizing the score of a trained two-class classifier that learns both correct and incorrect alignments and can distinguish between them, so that a correct alignment receives the maximum positive score. During the training stage, we train a Point Distribution Model (PDM), which acts as the shape model, and a boosting-based classifier built on Haar-like rectangular features, which we call the Boosted Appearance Model (BAM). The algorithm iteratively updates the shape parameters of the PDM by gradient ascent so that the classification score is maximized. At test time, the initial parameters will likely have a negative score; they are updated until the final parameters attain the maximum positive score, which indicates that alignment is complete. The main applications are tracking, medical image interpretation, and industrial inspection.
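    The core loop the abstract describes, updating PDM shape parameters by gradient ascent on the classifier score, can be sketched as follows; the finite-difference gradient and the function names are illustrative simplifications (the BAM formulation derives the gradient analytically through the warp and the Haar-like features).

```python
import numpy as np

def align_shape(score_fn, p_init, lr=0.1, steps=100, eps=1e-3):
    """Gradient-ascent alignment in the spirit of the abstract.

    score_fn(p) -> scalar BAM classifier score for PDM shape parameters p
                   (positive for correct alignment, negative otherwise).
    """
    p = np.asarray(p_init, dtype=float).copy()
    for _ in range(steps):
        grad = np.zeros_like(p)
        for i in range(p.size):
            dp = np.zeros_like(p)
            dp[i] = eps
            # Central finite difference in place of the analytic gradient.
            grad[i] = (score_fn(p + dp) - score_fn(p - dp)) / (2 * eps)
        p += lr * grad  # move the shape parameters toward a higher score
        if score_fn(p) > 0:  # a positive score signals correct alignment
            break
    return p
```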
