424,181 research outputs found
Model-driven performance analysis of rule-based domain specific visual models
Context: Domain-Specific Visual Languages (DSVLs) play a crucial role in Model-Driven Engineering
(MDE). Most DSVLs already allow the specification of the structure and behavior of systems. However,
there is also an increasing need to model, simulate and reason about their non-functional properties.
In particular, QoS usage and management constraints (performance, reliability, etc.) are essential characteristics
of any non-trivial system.
Objective: Very few DSVLs currently offer support for modeling these kinds of properties, and those
that do tend to require expert knowledge of specialized notations, which clashes with the intuitive
nature of DSVLs. In this paper we present an alternative approach to specify QoS properties in a high-level
and platform-independent manner.
Method: We propose the use of special objects (observers) that can be added to the graphical specification
of a system for describing and monitoring some of its non-functional properties.
Results: Observers extend the global state of the system with the variables that the designer
wants to analyze, making it possible to capture the performance properties of interest. A performance evaluation
tool has also been developed as a proof of concept for the proposal.
Conclusion: The results show how non-functional properties can be specified in DSVLs using observers,
and how the performance of systems specified in this way can be evaluated in a flexible and effective
way.
Ministerio de Ciencia e Innovación TIN2008-031087; Ministerio de Ciencia e Innovación TIN2011-2379
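As a rough illustration of the observer idea (not the paper's actual notation or tooling), the following sketch shows a hypothetical response-time observer that extends a simulated system's state with the variable a designer wants to analyze; all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResponseTimeObserver:
    """Records the time each request spends in the system."""
    samples: List[float] = field(default_factory=list)

    def on_request_completed(self, arrival_time: float, completion_time: float) -> None:
        # The simulation engine notifies the observer whenever a request finishes.
        self.samples.append(completion_time - arrival_time)

    @property
    def mean_response_time(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0


# Usage: attach the observer to the simulated system and query it afterwards.
observer = ResponseTimeObserver()
observer.on_request_completed(arrival_time=0.0, completion_time=1.5)
observer.on_request_completed(arrival_time=1.0, completion_time=2.2)
print(f"mean response time: {observer.mean_response_time:.2f} time units")
```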
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering
The emergence of multimodal large models (MLMs) has significantly advanced
the field of visual understanding, offering remarkable capabilities in the
realm of visual question answering (VQA). Yet, the true challenge lies in the
domain of knowledge-intensive VQA tasks, which necessitate not just recognition
of visual elements, but also a deep comprehension of the visual information in
conjunction with a vast repository of learned knowledge. To uncover such
capabilities of MLMs, particularly the newly introduced GPT-4V, we provide an
in-depth evaluation from three perspectives: 1) Commonsense Knowledge, which
assesses how well models can understand visual cues and connect to general
knowledge; 2) Fine-grained World Knowledge, which tests the model's skill in
reasoning out specific knowledge from images, showcasing their proficiency
across various specialized fields; 3) Comprehensive Knowledge with
Decision-making Rationales, which examines the model's capability to provide
logical explanations for its inference, facilitating a deeper analysis from the
interpretability perspective. Extensive experiments indicate that GPT-4V
achieves SOTA performance on the above three tasks. Interestingly, we find that: a)
GPT-4V demonstrates enhanced reasoning and explanation when using composite
images as few-shot examples; b) GPT-4V produces severe hallucinations when dealing with
world knowledge, highlighting the future need for advancements in this research
direction.
Comment: 18 pages, 13 figures; work in progress
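For readers who want to try this style of evaluation, a minimal sketch of querying a GPT-4V-class model through the OpenAI Python SDK is shown below; the model name, prompt text, and image URLs are placeholders, and the composite few-shot image is assumed to have been prepared beforehand.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

COMPOSITE_FEWSHOT_URL = "https://example.com/fewshot_composite.png"  # hypothetical
TEST_IMAGE_URL = "https://example.com/test_image.png"                # hypothetical

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # placeholder vision-capable model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "The first image contains solved examples (question, image, "
                     "answer) stitched into one composite. Answer the question "
                     "about the second image and explain your reasoning."},
            {"type": "image_url", "image_url": {"url": COMPOSITE_FEWSHOT_URL}},
            {"type": "image_url", "image_url": {"url": TEST_IMAGE_URL}},
            {"type": "text", "text": "Question: which landmark is shown, and in which country is it?"},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```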
From Data to Knowledge Graphs: A Multi-Layered Method to Model User's Visual Analytics Workflow for Analytical Purposes
The importance of knowledge generation drives much of Visual Analytics (VA).
User-tracking and behavior graphs have shown the value of understanding users'
knowledge generation while performing VA workflows. Works in theoretical
models, ontologies, and provenance analysis have greatly described means to
structure and understand the connection between knowledge generation and VA
workflows. Yet, two concepts are typically intermixed: the temporal aspect,
which indicates sequences of events, and the atemporal aspect, which indicates
the workflow state space. Works that separate these concepts do not
discuss how to analyze the user's recorded knowledge-gathering process against
the VA workflow itself. This paper presents the Visual Analytic
Knowledge Graph (VAKG), a conceptual framework that generalizes existing
knowledge models and ontologies by focusing on how humans relate to computer
processes temporally and how this relationship maps onto the workflow's state space. Our
proposal structures this relationship as a 4-way temporal knowledge graph with
specific emphasis on modeling the human and computer aspects of VA as separate
but interconnected graphs for analytical purposes, among others. We compare
VAKG with relevant literature to show that VAKG's contribution allows VA
applications to use it as a provenance model and a state space graph, allowing
for analytics of domain-specific processes, usage patterns, and users'
knowledge gain performance. We also interviewed two domain experts to check, in
the wild, whether real practice and our contributions are aligned.
Comment: 9 pages, submitted to VIS 202
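A rough sketch of the VAKG idea, assuming a plain networkx graph rather than the authors' implementation: human knowledge events and computer (VA tool) states form separate node sets, with temporal edges within each side and cross-links between them. Node names and attributes below are illustrative.

```python
import networkx as nx

vakg = nx.DiGraph()

# Computer-side nodes: states of the VA workflow (the atemporal state space).
vakg.add_node("C1", kind="computer", state="overview scatterplot")
vakg.add_node("C2", kind="computer", state="filtered by region")

# Human-side nodes: knowledge/insight events gathered while using the tool.
vakg.add_node("H1", kind="human", insight="outlier cluster noticed")
vakg.add_node("H2", kind="human", insight="outliers tied to one region")

# Temporal edges inside each subgraph (sequence of events).
vakg.add_edge("C1", "C2", relation="temporal")
vakg.add_edge("H1", "H2", relation="temporal")

# Cross-links between computer states and the human insights they triggered.
vakg.add_edge("C1", "H1", relation="triggered")
vakg.add_edge("H1", "C2", relation="led-to-interaction")

# Simple provenance query: which computer state preceded each insight?
for node, data in vakg.nodes(data=True):
    if data["kind"] == "human":
        preceding = [u for u, _, d in vakg.in_edges(node, data=True) if d["relation"] == "triggered"]
        print(node, "triggered by", preceding)
```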
Self-trained Panoptic Segmentation
Panoptic segmentation is an important computer vision task which combines
semantic and instance segmentation. It plays a crucial role in domains such as
medical image analysis, self-driving vehicles, and robotics by providing a
comprehensive understanding of visual environments. Traditionally, deep
learning panoptic segmentation models have relied on dense and accurately
annotated training data, which is expensive and time consuming to obtain.
Recent advancements in self-supervised learning approaches have shown great
potential in leveraging synthetic and unlabelled data to generate pseudo-labels
using self-training to improve the performance of instance and semantic
segmentation models. The three available methods for self-supervised panoptic
segmentation use proposal-based transformer architectures which are
computationally expensive, complicated and engineered for specific tasks. The
aim of this work is to develop a framework to perform embedding-based
self-supervised panoptic segmentation using self-training in a
synthetic-to-real domain adaptation problem setting.
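As a generic illustration of the self-training ingredient mentioned above (not the specific framework developed in this work), the sketch below has a teacher segmentation model produce pseudo-labels on unlabeled real images, keeps only high-confidence pixels, and trains a student on them; the model definitions and threshold value are assumed.

```python
import torch
import torch.nn.functional as F

CONFIDENCE_THRESHOLD = 0.9  # pixels below this confidence are ignored


@torch.no_grad()
def make_pseudo_labels(teacher, images):
    """Return per-pixel pseudo-labels, with low-confidence pixels set to the ignore index."""
    logits = teacher(images)                        # (B, num_classes, H, W)
    probs = F.softmax(logits, dim=1)
    confidence, pseudo = probs.max(dim=1)           # (B, H, W) each
    pseudo[confidence < CONFIDENCE_THRESHOLD] = -1  # -1 = ignore index
    return pseudo


def self_training_step(student, teacher, images, optimizer):
    # Student learns from the teacher's confident predictions on unlabeled images.
    pseudo = make_pseudo_labels(teacher, images)
    logits = student(images)
    loss = F.cross_entropy(logits, pseudo, ignore_index=-1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```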
Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery
Purpose: Depth estimation in robotic surgery is vital in 3D reconstruction,
surgical navigation and augmented reality visualization. Although the
foundation model exhibits outstanding performance in many vision tasks,
including depth estimation (e.g., DINOv2), recent works observed its
limitations in medical and surgical domain-specific applications. This work
presents a low-ranked adaptation (LoRA) of the foundation model for surgical
depth estimation. Methods: We design a foundation model-based depth estimation
method, referred to as Surgical-DINO, a low-rank adaptation of DINOv2 for
depth estimation in endoscopic surgery. We build LoRA layers and integrate them
into DINO to adapt it to surgery-specific domain knowledge instead of using
conventional fine-tuning. During training, we freeze the DINO image encoder,
which shows excellent visual representation capacity, and only optimize the
LoRA layers and depth decoder to integrate features from the surgical scene.
Results: Our model is extensively validated on the MICCAI challenge dataset
SCARED, which was collected from da Vinci Xi endoscopic surgery. We empirically
show that Surgical-DINO significantly outperforms all the state-of-the-art
models in endoscopic depth estimation tasks. The analysis with ablation studies
has shown evidence of the remarkable effect of our LoRA layers and adaptation.
Conclusion: Surgical-DINO sheds some light on the successful adaptation of
foundation models to the surgical domain for depth estimation. There is clear
evidence in the results that zero-shot prediction with weights pre-trained on
computer vision datasets, or naive fine-tuning, is not sufficient to use a
foundation model in the surgical domain directly. Code is available at
https://github.com/BeileiCui/SurgicalDINO.
Comment: Accepted by IPCAI 2024 (IJCARS Special Issue)
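A simplified, hypothetical sketch of the LoRA-style adaptation described in the abstract (not the released Surgical-DINO code): a frozen pre-trained linear projection is augmented with a small trainable low-rank update, so only the LoRA parameters (and, in the full method, the depth decoder) receive gradients.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # freeze the pre-trained weights
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)       # start as an identity adaptation
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen projection plus scaled low-rank update.
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


# Wrapping one frozen encoder projection: only lora_a/lora_b are trainable.
layer = LoRALinear(nn.Linear(768, 768), rank=8)
out = layer(torch.randn(1, 197, 768))           # e.g. ViT token embeddings
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable, "trainable params")
```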
Understanding of Visual Domains via the Lens of Natural Language
A joint understanding of vision and language can enable intelligent systems to perceive, act, and communicate with humans for a wide range of applications. For example, they can assist a human to navigate in an environment, edit the content of an image through natural language commands, or search through image collections using natural language queries. In this thesis, we aim to improve our understanding of visual domains through the lens of natural language. We specifically look into (1) images of categories within a fine-grained taxonomy such as species of birds or variants of aircraft, (2) images of textures that describe local color, shape, and patterns, and (3) regions in images that correspond to objects, materials, and textures.
In one line of work, we investigate ways to discover a domain-specific language by asking annotators to describe visual differences between instances within a fine-grained taxonomy. We show that a system trained to describe these differences leads to an accurate and interpretable basis for categorization. In another line of work, we investigate the effectiveness of language and vision models for describing textures, a problem that, despite the ubiquity of textures, has not been sufficiently studied in the literature. Textures are diverse, yet their local nature allows for the description of appearance of a wide range of visual categories. The locality also allows us to systematically generate synthetic variations to investigate how disentangled visual representations are for properties such as shape, color, and figure-ground segmentation. Finally, instead of modeling an image as a whole, we design a system that allows descriptions of regions within an image. A challenge is to handle the long-tail distribution of names and appearances of concepts within natural scenes. We design a modular framework that integrates object detection, semantic segmentation, and contextual reasoning with language that leads to better performance. In addition to methods and analysis, we contribute datasets and benchmarks to evaluate the performance of models in each of these domains.
The availability of large-scale pre-trained models for vision (e.g., ResNet) and language (e.g., BERT) has catalyzed improvements and novel applications in computer vision and natural language processing, but until recently similar models that could jointly reason about language and vision were not available. This has changed with the availability of models such as CLIP, which have been trained on a massive number of images with associated texts. Therefore, we analyze the effectiveness of CLIP-based representations for the tasks posed in our earlier work. By comparing and contrasting these with the domain-specific ones we presented in the earlier chapters, we shed some light on the nature of the learned representations and the biases they encode.
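As a small example of the kind of CLIP-based analysis discussed above, the sketch below scores a few texture descriptions against an image using the off-the-shelf openai/CLIP package; the image path and the texture phrases are illustrative and not drawn from the thesis' datasets.

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

descriptions = ["a striped texture", "a dotted texture", "a woven texture"]
image = preprocess(Image.open("example_texture.jpg")).unsqueeze(0).to(device)  # hypothetical file
text = clip.tokenize(descriptions).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image and each description, softmax-normalized.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    similarity = (image_features @ text_features.T).softmax(dim=-1)

for desc, score in zip(descriptions, similarity[0].tolist()):
    print(f"{desc}: {score:.3f}")
```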
Domain adaptive learning with disentangled features
Recognizing visual information is crucial for many real-world artificial-intelligence-based applications, ranging from domestic robots to autonomous vehicles. However, the success of deep learning methods on visual recognition tasks is highly dependent on access to large-scale labeled datasets, which are expensive and cumbersome to collect. Transfer learning provides a way to alleviate the burden of annotating data by transferring the knowledge learned from a richly labeled source domain to a scarcely labeled target domain. However, the performance of deep learning models degrades significantly when testing on novel domains due to the presence of domain shift. To tackle domain shift, conventional domain adaptation methods diminish the discrepancy between the two domains with a distribution matching loss or an adversarial loss. These models align the domain-specific feature distribution and the domain-invariant feature distribution simultaneously, which is sub-optimal for solving deep domain adaptation tasks, given that deep neural networks are known to extract features in which multiple hidden factors are highly entangled.
This thesis explores how to learn effective transferable features by disentangling the deep features. The following questions are studied: (1) how to disentangle the deep features into domain-invariant and domain-specific features? (2) how would feature disentanglement help to learn transferable features under a synthetic-to-real domain adaptation scenario? (3) how would feature disentanglement facilitate transfer learning with multiple source or target domains? (4) how to leverage feature disentanglement to boost the performance in a federated system?
To address these needs, this thesis proposes deep adversarial feature disentanglement: a class/domain identifier is trained on the labeled source domain and the disentangler generates features to fool the class/domain identifier. Extensive experiments and empirical analysis demonstrate the effectiveness of the feature disentanglement method on many real-world domain adaptation tasks. Specifically, the following three unsupervised domain adaptation scenarios are explored: (1) domain-agnostic learning with disentangled representations, (2) unsupervised federated domain adaptation, and (3) multi-source domain adaptation.
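A condensed sketch of the adversarial feature disentanglement idea as described above (layer sizes, loss terms, and the two-domain setup are placeholders): deep features are split into a domain-invariant and a domain-specific part, and a domain identifier is used adversarially so that the invariant part carries no domain information.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Disentangler(nn.Module):
    def __init__(self, feat_dim: int = 2048, hidden: int = 256):
        super().__init__()
        self.invariant = nn.Linear(feat_dim, hidden)
        self.specific = nn.Linear(feat_dim, hidden)

    def forward(self, features):
        return self.invariant(features), self.specific(features)


disentangler = Disentangler()
domain_identifier = nn.Linear(256, 2)     # source vs. target
features = torch.randn(8, 2048)           # deep backbone features (placeholder)
domain_labels = torch.randint(0, 2, (8,))

inv, spec = disentangler(features)

# Step 1: the identifier learns to predict the domain from the invariant branch.
identifier_loss = F.cross_entropy(domain_identifier(inv.detach()), domain_labels)

# Step 2: the disentangler is updated to fool the identifier (push it toward
# uniform predictions), driving domain information out of the invariant branch.
# In practice the two steps use separate optimizers in an alternating loop.
log_probs = F.log_softmax(domain_identifier(inv), dim=1)
confusion_loss = -log_probs.mean()
```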
Budget-Aware Adapters for Multi-Domain Learning
Multi-Domain Learning (MDL) refers to the problem of learning a set of models
derived from a common deep architecture, each one specialized to perform a task
in a certain domain (e.g., photos, sketches, paintings). This paper tackles MDL
with a particular interest in obtaining domain-specific models with an
adjustable budget in terms of the number of network parameters and
computational complexity. Our intuition is that, since in real applications the
number of domains and tasks can be very large, an effective MDL approach should
focus not only on accuracy but also on having as few parameters as possible. To
implement this idea we derive specialized deep models for each domain by
adapting a pre-trained architecture but, differently from other methods, we
propose a novel strategy to automatically adjust the computational complexity
of the network. To this aim, we introduce Budget-Aware Adapters that select the
most relevant feature channels to better handle data from a novel domain. Some
constraints on the number of active switches are imposed in order to obtain a
network respecting the desired complexity budget. Experimentally, we show that
our approach leads to recognition accuracy competitive with state-of-the-art
approaches but with much lighter networks both in terms of storage and
computation.
Comment: ICCV 2019
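A rough sketch of the budget-aware adapter idea (the gate parameterization and budget penalty below are illustrative, not the paper's exact formulation): learnable per-channel switches decide which feature channels of a pre-trained layer are kept for a new domain, and a penalty keeps the fraction of active switches near the desired budget.

```python
import torch
import torch.nn as nn


class BudgetAwareAdapter(nn.Module):
    def __init__(self, num_channels: int, budget: float = 0.5):
        super().__init__()
        self.gate_logits = nn.Parameter(torch.zeros(num_channels))
        self.budget = budget  # target fraction of active channels

    def forward(self, x):
        # x: (B, C, H, W); soft per-channel switches in [0, 1].
        gates = torch.sigmoid(self.gate_logits)
        return x * gates.view(1, -1, 1, 1)

    def budget_penalty(self):
        # Penalize exceeding the desired fraction of active channels.
        active_fraction = torch.sigmoid(self.gate_logits).mean()
        return (active_fraction - self.budget).clamp(min=0) ** 2


adapter = BudgetAwareAdapter(num_channels=64, budget=0.25)
features = torch.randn(2, 64, 32, 32)      # activations from a frozen backbone layer
out = adapter(features)
loss = out.pow(2).mean() + 0.1 * adapter.budget_penalty()  # task loss is a placeholder
```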