61 research outputs found

    Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting

    Images contain rich relational knowledge that can help machines understand the world. Existing methods for visual knowledge extraction often rely on a pre-defined format (e.g., subject-verb-object tuples) or vocabulary (e.g., relation types), which restricts the expressiveness of the extracted knowledge. In this work, we take a first step toward a new paradigm: open visual knowledge extraction. To achieve this, we present OpenVik, which consists of an open relational region detector that detects regions potentially containing relational knowledge, and a visual knowledge generator that produces format-free knowledge by prompting a large multimodality model with the detected regions of interest. We also explore two data enhancement techniques for diversifying the generated format-free visual knowledge. Extensive knowledge quality evaluations highlight the correctness and uniqueness of the open visual knowledge extracted by OpenVik. Moreover, integrating the extracted knowledge into various visual reasoning applications yields consistent improvements, indicating the real-world applicability of OpenVik. Comment: Accepted to NeurIPS 202

    Multi-modal preference alignment remedies regression of visual instruction tuning on language model

    In production, multi-modal large language models (MLLMs) are expected to support multi-turn queries that interleave image and text modalities. However, current MLLMs trained on visual-question-answering (VQA) datasets can suffer from degradation, because VQA datasets lack the diversity and complexity of the original text instruction datasets on which the underlying language model was trained. To address this degradation, we first collect a lightweight (6k entries) VQA preference dataset whose answers were annotated by Gemini on five quality metrics at a fine granularity, and we investigate standard Supervised Fine-tuning, rejection sampling, Direct Preference Optimization (DPO), and SteerLM. Our findings indicate that with DPO we are able to surpass the instruction-following capability of the language model, achieving a score of 6.73 on MT-Bench, compared to 6.57 for Vicuna and 5.99 for LLaVA, despite the small data scale. This gain in textual instruction proficiency correlates with improved visual instruction performance (+4.9% on MM-Vet, +6% on LLaVA-Bench), with minimal alignment tax on visual knowledge benchmarks compared to the previous RLHF approach. In conclusion, we propose a distillation-based multi-modal alignment model with fine-grained annotations on a small dataset that reconciles the textual and visual performance of MLLMs, restoring and boosting language capability after visual instruction tuning.
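The abstract lists Direct Preference Optimization (DPO) among the compared methods. As a minimal sketch of the standard per-pair DPO objective (not the authors' training code), assuming the summed log-probabilities of the preferred and rejected answers are already available under both the trained policy and a frozen reference model; the `beta = 0.1` default is an illustrative assumption, not a value reported in the abstract:

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Each argument is a summed log-probability log pi(y | x) of a full answer,
    under either the trainable policy or the frozen reference model.
    `beta` (assumed 0.1 here) scales the implicit reward margin.
    """
    # Implicit reward margin: how much more the policy favours the chosen
    # answer over the rejected one, relative to the reference model.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)
```

With a zero margin the loss equals log 2; it falls below that as the policy shifts probability toward the preferred answer relative to the reference.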

    Don’t touch! hands off! art, blindness and the conservation of expertise

    The embargo on touching in museums is increasingly being brought into question, not least by blind activists who are calling for greater access to collections. The provision of opportunities to touch could be read as a potential conflict between established optic knowledge and illicit haptic experience, and between the conservation of objects and access to collections. Instead, I suggest that touch is not necessarily other to the museum; rather, what is crucial is the status of who does the touching and knowing, not the use of touch per se. It is expert territory and vested academic interests that are at stake here. Using Bruno Latour’s (1993) conceptions of hybrid networks and purified zones of academic practice, I then explore what the unacknowledged existence of touch means for museums and for notions of authority more generally. I suggest that if the apparent boundaries of disciplines are unconvincing in practice, then the possibility of expert knowledge is seriously undermined. Blind people’s demand for access through touch is therefore not a challenge of one paradigm to another; it implicitly questions the accreditation of authority itself. As such, it forms part of a wider institutional shift with regard to expertise and an increased need for negotiation between different conceptual frameworks. The ocularcentric bias of museums is increasingly being questioned by blind and visually impaired visitors who emphasize touch as a learning and aesthetic experience. This challenge is contentious not least because it ostensibly brings individuals’ rights of access into direct conflict with museum conservation. I argue, however, that concerns over conservation can mask and serve to legitimate preconceptions about who should have access to collections, what counts as damage or dirt, and the means by which art and artefacts can be understood or enjoyed. It is expertise rather than the conservation of objects which is at stake. 
This article suggests that in campaigning for access through touch, blind people physically move beyond the barriers which reserve contact for the museum elite and simultaneously establish the viability of learning in a way that is not sanctioned by the art historical community. Thus, resistance to touch in museums is not so much a concern for preservation as a defence of territory and expertise.

    Exploratory Cluster Analysis from Ubiquitous Data Streams using Self-Organizing Maps

    This thesis addresses the use of Self-Organizing Maps (SOM) for exploratory cluster analysis over ubiquitous data streams, where two complementary problems arise: first, generating (local) SOM models over potentially unbounded, multi-dimensional, non-stationary data streams; second, extending these capabilities to ubiquitous environments. To address these problems, original contributions are made in terms of algorithms and methodologies. Two different methods are proposed for the first problem. By focusing on visual knowledge discovery, these methods fill an existing gap among current methods for cluster analysis over data streams. Moreover, the original SOM capability to cluster both observations and features is transposed to data streams, making these contributions more versatile than existing methods, which each target a single clustering problem. For the second problem, additional methodologies that tackle the ubiquitous aspect of data streams are proposed, allowing distributed and collaborative learning strategies. Experimental evaluations attest to the effectiveness of the proposed methods, and real-world applications are exemplified, namely electric consumption data, air quality monitoring networks, and financial data, motivating their practical use. This research is the first to clearly address the use of the SOM for ubiquitous data streams and opens several further research opportunities.
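The abstract does not detail the two proposed streaming methods. For orientation only, the sketch below shows a generic online SOM update (Kohonen's rule with a Gaussian neighborhood) that consumes one observation at a time, the basic building block for SOMs over unbounded streams. The class name, the 1-D map topology, and the fixed `lr` and `sigma` values are hypothetical choices, not the thesis's algorithms:

```python
import math
import random


class OnlineSOM:
    """Minimal 1-D self-organizing map trained one observation at a time.

    Hypothetical illustration: learning rate and neighborhood width are
    fixed (no decay), so prototypes can keep tracking a drifting stream.
    """

    def __init__(self, n_units: int, dim: int, lr: float = 0.1,
                 sigma: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        # One prototype vector per map unit, initialised uniformly in [0, 1).
        self.weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
        self.lr = lr
        self.sigma = sigma

    def best_matching_unit(self, x):
        """Index of the prototype closest to x (squared Euclidean distance)."""
        return min(range(len(self.weights)),
                   key=lambda i: sum((w - v) ** 2 for w, v in zip(self.weights[i], x)))

    def update(self, x):
        """One streaming step: pull the BMU and its grid neighbors toward x."""
        bmu = self.best_matching_unit(x)
        for i, w in enumerate(self.weights):
            # Gaussian neighborhood on the 1-D grid of unit indices.
            h = math.exp(-((i - bmu) ** 2) / (2 * self.sigma ** 2))
            for d in range(len(w)):
                w[d] += self.lr * h * (x[d] - w[d])
        return bmu
```

Each new observation is passed to `update` as it arrives, so no past data needs to be stored; the resulting prototypes can then be visualized for exploratory clustering.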

    Semantics of the visual environment encoded in parahippocampal cortex

    Semantic representations capture the statistics of experience and store this information in memory. A fundamental component of this memory system is knowledge of the visual environment, including knowledge of objects and their associations. Visual semantic information underlies a range of behaviors, from perceptual categorization to cognitive processes such as language and reasoning. Here we examine the neuroanatomic system that encodes visual semantics. Across three experiments, we found converging evidence indicating that knowledge of verbally mediated visual concepts relies on information encoded in a region of the ventral-medial temporal lobe centered on parahippocampal cortex. In an fMRI study, this region was strongly engaged by the processing of concepts relying on visual knowledge but not by concepts relying on other sensory modalities. In a study of patients with the semantic variant of primary progressive aphasia (semantic dementia), atrophy that encompassed this region was associated with a specific impairment in verbally mediated visual semantic knowledge. Finally, in a structural study of healthy adults from the fMRI experiment, gray matter density in this region was related to individual variability in the processing of visual concepts. The anatomic location of these findings aligns with recent work linking the ventral-medial temporal lobe with high-level visual representation, contextual associations, and reasoning through imagination. Together, this work suggests a critical role for parahippocampal cortex in linking the visual environment with knowledge systems in the human brain.

    From the Ground Up: Designerly Knowledge in Human-Drone Interaction

    There are flying robots out there; you may have seen and heard them droning over your head. Drones have expanded our human capacities, lifting our sight to the skies, but not without generating intricate experiences. How are these machines being designed and researched? What design methods, approaches, and philosophies are relevant to the study of the development (or decline) of drones in society? In this thesis, I argue that we must re-frame how drones are studied, from the ground up, through a design stance. I invite you to take a journey with me, changing lenses from the work of others to my own intimate relationship with this technology. My work relies on exploring the fringes of design research: understudied groups such as children, alternative design approaches such as soma design, and peripheral methods such as autoethnography. This thesis includes four articles discussing perspectives on designerly knowledge, which together frame the notion that we may be missing some aspects of the wicked nature of human-drone interaction (HDI) design. The methods rest on phenomenology and narratives, and are supported by the assumption that any subject of study is a sociotechnical assemblage. Starting from a first-person perspective, I address the gap in research through a longitudinal autoethnographic study conducted with my children. The second paper takes the form of a pictorial expressing a first-person experience during a design research workshop, and what that meant for my relationship with drones as a research material. The third paper leaps into a Research through Design project, challenging the solutionist drone and offering instead the first steps in a concept-driven design of the unlikely pairing of drones and breathing. The fourth paper returns to the pictorial form, suggesting a method for visual conversations between researchers through the tangible qualities of sketches and illustrations. 
Central to this thesis is the argument for designerly approaches in HDI and for alternative forms of publication and research. To that end, I include two publications in the form of pictorials: a publication format that relies on visual knowledge and is attracting growing interest in the HCI community.

    Building a Wiki resource on digital 3D reconstruction related knowledge assets

    Purpose – While individual theoretical approaches related to visual humanities research, and in particular to digital 3D reconstruction – the virtual, interpretative 3D modeling and visualization of historical objects – are widely described in compendia like Wikipedia, and various publications discuss approaches from specific disciplinary perspectives, a comprehensive and multidisciplinary systematization is still missing. Against this background, the research activity described in this article aims to gain a broad, multidisciplinary overview of research approaches, theories, and methods that are relevant to investigating or explaining knowledge-related phenomena in the context of visual humanities research and education. Design/methodology/approach – To this end, we intend to set up a Wiki resource as a structured repository. The content will be based on (a) interactive workshops held at conferences, involving experts from different domains, to collect and structure knowledge assets on visual knowledge. Moreover, (b) a student seminar starting in early 2017 is designated to describe some typical research designs and to amend related methods and theories in the Wiki resource based on Wikipedia articles. The content structuring principle for the Wiki resource follows the Wikimedia guidelines, and the results are planned to be fed back into Wikipedia. Originality/value – While Wiki approaches are frequently used in the context of visual humanities, these resources are primarily created by experts. Furthermore, Wiki-based approaches related to visualization often focus on a particular disciplinary context, for example, art history. A unique aspect of the described setting is building a Wiki on digital 3D reconstruction that includes expertise from different knowledge domains, i.e., perception and cognition, didactics, information sciences, and computing and visual humanities. 
Moreover, the combination of student work and expert assessments also provides novel insights for educational research. Practical implications – The intended product is a comprehensive and multidisciplinary structured repository on digital 3D reconstruction research approaches, methods, theories, publication bodies, and good practice examples. Editing the project results into Wikipedia will lead to wide dissemination and visibility of group activities and outcomes, and will enhance all contributors' competencies in collaborative work.

    UTAUT Model, Smart Exhibition Sorted by Relevance: Word Cloud Visualization Review

    The aim of the paper is to introduce a word cloud visualization method that illustrates, within a set of documents, the evolution of smart exhibitions, of other related exhibition modes with virtual presentation, and of articles applying the UTAUT model. The relationships between the UTAUT model and smart areas, and between smart exhibitions and other smart areas, are presented quickly and clearly. This article provides an interactive visual analysis of smart exhibitions sorted by relevance, and of the industries or fields in which the UTAUT model is applied, using a set of keywords at different time points; it is based on a D2 or D3 presentation that highlights the core words so that the trend of smart exhibitions can be clearly understood.
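The abstract does not describe how the "D2 or D3" presentation is built, so rendering is out of scope here. Assuming only that each time point has its own set of documents, a minimal sketch of the data step behind a per-period word cloud could count keywords in each time slice and scale the most frequent ones linearly to font sizes. The stopword list, `top_k`, and the size range are hypothetical parameters:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "for", "on", "with", "by"}


def keyword_weights(documents, top_k=10, min_size=10, max_size=48):
    """Map the top_k most frequent keywords to font sizes for one word cloud."""
    counts = Counter(
        token
        for doc in documents
        for token in re.findall(r"[a-z]+", doc.lower())
        if token not in STOPWORDS
    )
    top = counts.most_common(top_k)
    if not top:
        return {}
    hi, lo = top[0][1], top[-1][1]
    span = max(hi - lo, 1)  # avoid division by zero when all counts are equal
    # Linear scaling: most frequent keyword gets max_size, least gets min_size.
    return {word: min_size + (count - lo) * (max_size - min_size) / span
            for word, count in top}


def clouds_by_period(docs_by_period, **kwargs):
    """One word cloud per time point, so keyword trends can be compared."""
    return {period: keyword_weights(docs, **kwargs)
            for period, docs in docs_by_period.items()}
```

Comparing the resulting size maps across periods shows which core words grow or fade over time; a charting library would then draw each map.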

    Knowledge and Reasoning in Spatial Analysis

    Reasoning is an essential part of any analysis process. Especially in visual analytics, the quality of the results depends heavily on the knowledge and reasoning skills of the analyst. In this study, we consider how to make the results transparent by visualizing the reasoning and the knowledge, so that outside parties can trace and verify them. The focus of this study is on spatial analysis, and a case study was carried out on a process of off-road mobility analysis. In the case study, linked views of a map and a parallel coordinates plot (PCP) were identified as reasoning artifacts. The knowledge used by the analyst was formed by these artifacts and the tangible pieces of information identified in them, along with the mental models in the analyst’s mind. To make the results transparent, the tangible pieces of information were marked with sketches, and the mental models were presented as causal graphs, because causality was found to be central to the reasoning process in the case study. The causal graph allows the reasoning of the analyst to be studied, as well as traced back to its origin. Peer reviewed
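The abstract does not specify how the causal graphs are stored. Assuming a simple adjacency map from each conclusion to the evidence or intermediate conclusions that support it, tracing the analyst's reasoning back to its origin amounts to a backward breadth-first search that collects the nodes with no recorded causes. The function name and data layout are hypothetical:

```python
from collections import deque


def trace_to_origins(causes, conclusion):
    """Return the origin nodes reachable backwards from `conclusion`.

    `causes` maps each node to the list of nodes that causally support it;
    a node with no recorded causes is treated as an origin (e.g. a tangible
    piece of information marked in a map or PCP view).
    """
    origins, seen = [], {conclusion}
    queue = deque([conclusion])
    while queue:
        node = queue.popleft()
        parents = causes.get(node, [])
        if not parents:
            origins.append(node)
        for parent in parents:
            if parent not in seen:  # guards against cycles and shared causes
                seen.add(parent)
                queue.append(parent)
    return origins
```

For example, with the hypothetical graph `causes = {"route impassable": ["soft soil"], "soft soil": ["map: wetland"]}`, tracing from `"route impassable"` returns `["map: wetland"]`.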

    Evaluating a framework of theoretical hypotheses for animation learning

    This paper presents a set of theoretical hypotheses suggesting various relationships between didactical setting and learning effects with animations. In particular, we investigated whether individual flow control provides an adequate didactical means of reducing the cognitive load imposed by animations. We did not find an effect of individual flow control, probably because this learning condition was embedded in a setting where not enough verbal information was offered together with the graphical animation. Overall, the multimedia effects found in this study are in line with known principles of didactical multimedia design. Further, this study sheds light on theoretical aspects involved in the complex interaction between learning content, presentation, learning, and the resulting knowledge.