
    Evaluation and Mitigation of Agnosia in Multimodal Large Language Models

    While Multimodal Large Language Models (MLLMs) are widely used for a variety of vision-language tasks, they sometimes misinterpret visual inputs or fail to follow textual instructions even in straightforward cases, leading to irrelevant responses, mistakes, and ungrounded claims. This observation is analogous to a phenomenon in neuropsychology known as agnosia, an inability to correctly process sensory modalities and recognize things (e.g., objects, colors, relations). In our study, we adapt this concept to define "agnosia in MLLMs", and our goal is to comprehensively evaluate and mitigate such agnosia in MLLMs. Inspired by the diagnosis and treatment process in neuropsychology, we propose a novel framework, EMMA (Evaluation and Mitigation of Multimodal Agnosia). In EMMA, we develop an evaluation module that automatically creates fine-grained and diverse visual question answering examples to comprehensively assess the extent of agnosia in MLLMs. We also develop a mitigation module that reduces agnosia in MLLMs through multimodal instruction tuning on fine-grained conversations. To verify the effectiveness of our framework, we evaluate and analyze agnosia in seven state-of-the-art MLLMs using 9K test samples. The results reveal that most of them exhibit agnosia across various aspects and to varying degrees. We further develop a fine-grained instruction set and tune MLLMs to mitigate agnosia, which led to notable improvements in accuracy.
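To make the idea of automatically generated, fine-grained VQA probes more concrete, the following Python sketch builds question-answer pairs from object annotations. It is not EMMA's evaluation module: the `ObjectAnnotation` record, the `generate_vqa` function, and the question templates are hypothetical, and the sketch assumes each image already comes with object names and colors. It illustrates one common pattern, pairing positive questions with negative ones so that a model that simply answers "yes" to everything is caught.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ObjectAnnotation:
    """Hypothetical per-image annotation: an object name and its color."""
    name: str
    color: str


def generate_vqa(objects: List[ObjectAnnotation],
                 color_vocab: List[str]) -> List[Tuple[str, str]]:
    """Build (question, answer) pairs that probe object and color recognition."""
    pairs: List[Tuple[str, str]] = []
    for obj in objects:
        # Existence probe: every annotated object is present, so the answer is "yes".
        pairs.append((f"Is there a {obj.name} in the image?", "yes"))
        # Open-ended attribute probe.
        pairs.append((f"What color is the {obj.name}?", obj.color))
        # Negative probe: ask about the first vocabulary color the object does not have.
        for color in color_vocab:
            if color != obj.color:
                pairs.append((f"Is the {obj.name} {color}?", "no"))
                break
    return pairs


# Example usage with made-up annotations.
pairs = generate_vqa(
    [ObjectAnnotation("cup", "red"), ObjectAnnotation("book", "blue")],
    ["red", "green", "blue"],
)
for question, answer in pairs:
    print(question, "->", answer)
```

A real evaluation set would draw on far richer annotations (relations, counts, spatial layout); the templated approach simply shows how ground-truth answers can be derived without manual labeling.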

    Self-correction of 3D reconstruction from multi-view stereo images

    We present a self-correction approach to improving the 3D reconstruction of a multi-view 3D photogrammetry system. The approach can repair reconstructed 3D surfaces damaged by depth discontinuities. Because of self-occlusion, multi-view range images must be acquired and integrated into a watertight, nonredundant mesh model to cover the entire surface of an imaged object. The integrated surface often suffers from “dent” artifacts produced by depth discontinuities in the multi-view range images. In this paper, we propose a novel approach to correcting the integrated 3D surface so that these dent artifacts are repaired automatically. We show examples of 3D reconstruction to demonstrate the improvement achieved by the self-correction approach. The approach can also be extended to integrate range images obtained from alternative range capture devices.
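Since the dent artifacts are attributed to depth discontinuities in the range images, a minimal Python sketch of how such discontinuities might be located is shown below. It assumes a range image stored as a 2D grid of depth values and a hypothetical jump `threshold`; the function name `depth_discontinuities` is illustrative. It only flags discontinuity pixels and does not implement the paper's self-correction step.

```python
from typing import List, Set, Tuple


def depth_discontinuities(depth: List[List[float]],
                          threshold: float) -> Set[Tuple[int, int]]:
    """Return (row, col) pixels whose depth jumps by more than `threshold`
    relative to a 4-connected neighbor."""
    rows, cols = len(depth), len(depth[0])
    flagged: Set[Tuple[int, int]] = set()
    for r in range(rows):
        for c in range(cols):
            # Compare only with the neighbor below and to the right,
            # so each adjacent pair is checked exactly once.
            for dr, dc in ((1, 0), (0, 1)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    if abs(depth[r][c] - depth[nr][nc]) > threshold:
                        # Flag both sides of the jump.
                        flagged.add((r, c))
                        flagged.add((nr, nc))
    return flagged


# A tiny range image: a near surface (depth ~1) next to a far one (depth 5).
range_image = [
    [1.0, 1.0, 5.0],
    [1.0, 1.1, 5.0],
]
edges = depth_discontinuities(range_image, threshold=0.5)
print(sorted(edges))  # [(0, 1), (0, 2), (1, 1), (1, 2)]
```

Pixels along such a jump are where neighboring range samples belong to different surfaces, which is why naive integration across them can pull the merged mesh inward.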

    Adaptive Methods for Color Vision Impaired Users

    Color plays a key role in understanding information in computer environments. About 5% of the world population is affected by color vision deficiency (CVD), also called color blindness. This visual impairment hampers color perception, ultimately limiting the overall perception that people with CVD have of their surroundings, whether real or virtual. In fact, a person with CVD may be unable to distinguish between two different colors, which often causes confusion or a biased understanding of reality, including in web environments, whose pages are full of media elements such as text, still images, video, and sprites. Aware of the difficulties that color-blind people may face in interpreting colored content, researchers have proposed a significant number of recoloring algorithms intended to improve these people's visual perception. However, most of those algorithms lack a systematic subjective assessment, which undermines their validity, if not their usefulness. Thus, the central question addressed by the research behind this Ph.D. thesis is whether recoloring algorithms are actually useful to colorblind people. With this in mind, we conceived a few preliminary recoloring algorithms that were published in conference proceedings. Except for the algorithm detailed in Chapter 3, these conference algorithms are not described in this thesis, though they were important in shaping those presented here. The first algorithm (Chapter 3) was designed and implemented to improve color perception for people with dichromacy. The idea is to project reddish hues onto other hues that dichromats perceive better.
The second algorithm (Chapter 4) is also intended to improve color perception for people with dichromacy, but it covers the adaptation of both text and images in HTML5-compliant web environments. It enhances the color contrast of text and images in web pages while preserving the naturalness of color as much as possible. To the best of our knowledge, this is the first web recoloring approach for dichromats that treats text and image recoloring in an integrated manner. The third algorithm (Chapter 5) focuses primarily on enhancing some of the object contours in still images, instead of recoloring the pixels of the regions bounded by those contours. Enhancing contours is particularly suited to increasing contrast in images that contain adjacent regions whose colors are indistinguishable to dichromats. To the best of our knowledge, this is one of the first algorithms to take advantage of image analysis and processing techniques for region contours. After rigorous subjective assessment studies with color-blind people, we concluded that CVD adaptation methods are useful in general. Nevertheless, no single method is effective for all sorts of images; that is, the adequacy of each method depends on the type of image (photographs, graphical representations, etc.). Furthermore, we noted that the perceptual learning colorblind people acquire through experience over their lives determines their visual perception. That is, color adaptation algorithms must satisfy requirements such as color naturalness and consistency, to ensure that dichromats' visual perception improves without artifacts. Moreover, CVD adaptation algorithms should be object-oriented rather than pixel-oriented (as is typically done), so that the pixels to be adapted can be selected judiciously.
This perspective opens a window of opportunity for future research on color accessibility in the field of human-computer interaction (HCI).
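The Chapter 3 idea of projecting reddish hues onto hues that dichromats perceive better can be illustrated with a hedged Python sketch built on the standard `colorsys` module. The function `remap_reddish_hue`, the band width `red_band_deg`, and the blue target hue `target_hue_deg` are assumptions for illustration, not the thesis's algorithm. Choosing blue as the target relies on the fact that red-green dichromats generally retain blue-yellow discrimination.

```python
import colorsys
from typing import Tuple

RGB = Tuple[float, float, float]  # components in [0, 1]


def remap_reddish_hue(rgb: RGB,
                      red_band_deg: float = 30.0,
                      target_hue_deg: float = 240.0) -> RGB:
    """Move hues within `red_band_deg` of pure red to `target_hue_deg`,
    keeping saturation and value; all other colors pass through unchanged."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    if s == 0.0:
        return rgb  # grays have no hue to remap
    hue_deg = h * 360.0
    # Angular distance from red (0 degrees), accounting for wrap-around at 360.
    dist_from_red = min(hue_deg, 360.0 - hue_deg)
    if dist_from_red > red_band_deg:
        return rgb
    return colorsys.hsv_to_rgb(target_hue_deg / 360.0, s, v)


print(remap_reddish_hue((1.0, 0.0, 0.0)))  # reddish: remapped toward blue
print(remap_reddish_hue((0.0, 1.0, 0.0)))  # green: unchanged
```

The hard cutoff at the band edge can create visible seams in color gradients; a smoother variant would blend toward the target hue according to the distance from red, which is where the naturalness and consistency requirements discussed above become relevant.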

    DAugNet: Unsupervised, Multi-source, Multi-target, and Life-long Domain Adaptation for Semantic Segmentation of Satellite Images

    The domain adaptation of satellite images has recently gained increasing attention as a way to overcome the limited generalization abilities of machine learning models when segmenting large-scale satellite images. Most existing approaches seek to adapt a model from one domain to another. However, such a single-source, single-target setting prevents these methods from being scalable solutions, since multiple source and target domains with different data distributions are now usually available. Moreover, the continuous proliferation of satellite images requires classifiers to adapt to continuously increasing data. We propose a novel approach, coined DAugNet, for unsupervised, multi-source, multi-target, and life-long domain adaptation of satellite images. It consists of a classifier and a data augmentor. The data augmentor, a shallow network, can perform style transfer between multiple satellite images in an unsupervised manner, even when new data are added over time. In each training iteration, it provides the classifier with diversified data, which makes the classifier robust to large differences in data distribution between domains. Our extensive experiments show that DAugNet generalizes to new geographic locations significantly better than existing approaches.
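DAugNet's augmentor is a learned shallow network, and the abstract does not describe its internals. As a rough, non-learned stand-in, the Python sketch below performs a classic form of color style transfer: each channel of a content image is shifted and scaled so that its mean and standard deviation match those of a style image. The channel-major list layout and the `match_channel_stats` name are assumptions for illustration only.

```python
from statistics import fmean, pstdev
from typing import List

# Hypothetical layout: one flat list of pixel values per channel.
Image = List[List[float]]


def match_channel_stats(content: Image, style: Image,
                        eps: float = 1e-8) -> Image:
    """Shift and scale each channel of `content` so its mean and (population)
    standard deviation match the corresponding channel of `style`."""
    out: Image = []
    for c_chan, s_chan in zip(content, style):
        mu_c, sd_c = fmean(c_chan), pstdev(c_chan)
        mu_s, sd_s = fmean(s_chan), pstdev(s_chan)
        scale = sd_s / (sd_c + eps)  # eps guards against flat channels
        out.append([(x - mu_c) * scale + mu_s for x in c_chan])
    return out


# One-channel toy example: content has mean 1, std 1; style has mean 12, std 2.
content = [[0.0, 2.0]]
style = [[10.0, 14.0, 10.0, 14.0]]
stylized = match_channel_stats(content, style)
print(stylized)  # approximately [[10.0, 14.0]]
```

Applying such a transform with style images drawn from several domains produces appearance-diversified copies of the same labeled scene, which is the general role the abstract assigns to the augmentor; a learned network can capture far richer differences than first- and second-order channel statistics.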