
    Model-driven performance analysis of rule-based domain specific visual models

    Context: Domain-Specific Visual Languages (DSVLs) play a crucial role in Model-Driven Engineering (MDE). Most DSVLs already allow the specification of the structure and behavior of systems. However, there is also an increasing need to model, simulate and reason about their non-functional properties. In particular, QoS usage and management constraints (performance, reliability, etc.) are essential characteristics of any non-trivial system. Objective: Very few DSVLs currently offer support for modeling these kinds of properties, and those that do tend to require skilled knowledge of specialized notations, which clashes with the intuitive nature of DSVLs. In this paper we present an alternative approach to specify QoS properties in a high-level and platform-independent manner. Method: We propose the use of special objects (observers) that can be added to the graphical specification of a system for describing and monitoring some of its non-functional properties. Results: Observers allow extending the global state of the system with the variables that the designer wants to analyze, which makes it possible to capture the performance properties of interest. A performance evaluation tool has also been developed as a proof of concept for the proposal. Conclusion: The results show how non-functional properties can be specified in DSVLs using observers, and how the performance of systems specified in this way can be evaluated in a flexible and effective way. Ministerio de Ciencia e Innovación TIN2008-031087; Ministerio de Ciencia e Innovación TIN2011-2379

    A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering

    The emergence of multimodal large models (MLMs) has significantly advanced the field of visual understanding, offering remarkable capabilities in the realm of visual question answering (VQA). Yet, the true challenge lies in the domain of knowledge-intensive VQA tasks, which necessitate not just recognition of visual elements, but also a deep comprehension of the visual information in conjunction with a vast repository of learned knowledge. To uncover such capabilities of MLMs, particularly the newly introduced GPT-4V, we provide an in-depth evaluation from three perspectives: 1) Commonsense Knowledge, which assesses how well models can understand visual cues and connect them to general knowledge; 2) Fine-grained World Knowledge, which tests the model's skill in reasoning out specific knowledge from images, showcasing its proficiency across various specialized fields; 3) Comprehensive Knowledge with Decision-making Rationales, which examines the model's capability to provide logical explanations for its inference, facilitating a deeper analysis from the interpretability perspective. Extensive experiments indicate that GPT-4V achieves SOTA performance on the above three tasks. Interestingly, we find that: a) GPT-4V demonstrates enhanced reasoning and explanation when using composite images as few-shot examples; b) GPT-4V produces severe hallucinations when dealing with world knowledge, highlighting the future need for advancements in this research direction. Comment: 18 pages, 13pages; working in progres

    From Data to Knowledge Graphs: A Multi-Layered Method to Model User's Visual Analytics Workflow for Analytical Purposes

    The importance of knowledge generation drives much of Visual Analytics (VA). User-tracking and behavior graphs have shown the value of understanding users' knowledge generation while performing VA workflows. Work on theoretical models, ontologies, and provenance analysis has described at length means to structure and understand the connection between knowledge generation and VA workflows. Yet, two concepts are typically intermixed: the temporal aspect, which indicates sequences of events, and the atemporal aspect, which indicates the workflow state space. Works that do separate these concepts do not discuss how to analyze the recorded user's knowledge gathering process when compared to the VA workflow itself. This paper presents Visual Analytic Knowledge Graph (VAKG), a conceptual framework that generalizes existing knowledge models and ontologies by focusing on how humans relate to computer processes temporally and how this relates to the workflow's state space. Our proposal structures this relationship as a 4-way temporal knowledge graph with specific emphasis on modeling the human and computer aspects of VA as separate but interconnected graphs for, among others, analytical purposes. We compare VAKG with relevant literature to show that VAKG's contribution allows VA applications to use it as a provenance model and a state space graph, allowing for analytics of domain-specific processes, usage patterns, and users' knowledge gain performance. We also interviewed two domain experts to check, in the wild, whether real practice and our contributions are aligned. Comment: 9 pgs, submitted to VIS 202

    Self-trained Panoptic Segmentation

    Panoptic segmentation is an important computer vision task which combines semantic and instance segmentation. It plays a crucial role in the domains of medical image analysis, self-driving vehicles, and robotics by providing a comprehensive understanding of visual environments. Traditionally, deep learning panoptic segmentation models have relied on dense and accurately annotated training data, which is expensive and time-consuming to obtain. Recent advancements in self-supervised learning approaches have shown great potential in leveraging synthetic and unlabelled data to generate pseudo-labels using self-training to improve the performance of instance and semantic segmentation models. The three available methods for self-supervised panoptic segmentation use proposal-based transformer architectures which are computationally expensive, complicated and engineered for specific tasks. The aim of this work is to develop a framework to perform embedding-based self-supervised panoptic segmentation using self-training in a synthetic-to-real domain adaptation problem setting.

    Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery

    Purpose: Depth estimation in robotic surgery is vital in 3D reconstruction, surgical navigation and augmented reality visualization. Although the foundation model exhibits outstanding performance in many vision tasks, including depth estimation (e.g., DINOv2), recent works observed its limitations in medical and surgical domain-specific applications. This work presents a low-rank adaptation (LoRA) of the foundation model for surgical depth estimation. Methods: We design a foundation model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of DINOv2 for depth estimation in endoscopic surgery. We build LoRA layers and integrate them into DINO to adapt it with surgery-specific domain knowledge instead of conventional fine-tuning. During training, we freeze the DINO image encoder, which shows excellent visual representation capacity, and only optimize the LoRA layers and depth decoder to integrate features from the surgical scene. Results: Our model is extensively validated on SCARED, a MICCAI challenge dataset collected from da Vinci Xi endoscope surgery. We empirically show that Surgical-DINO significantly outperforms all the state-of-the-art models in endoscopic depth estimation tasks. The analysis with ablation studies has shown evidence of the remarkable effect of our LoRA layers and adaptation. Conclusion: Surgical-DINO sheds some light on the successful adaptation of foundation models into the surgical domain for depth estimation. There is clear evidence in the results that zero-shot prediction on pre-trained weights in computer vision datasets or naive fine-tuning is not sufficient to use the foundation model in the surgical domain directly. Code is available at https://github.com/BeileiCui/SurgicalDINO. Comment: Accepted by IPCAI 2024 (IJCAR Special Issue
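
    The freeze-and-adapt idea in the abstract above can be illustrated with a minimal, generic LoRA sketch. This is not the Surgical-DINO code (see the repository linked in the abstract for that); the class name `LoRALinear`, the `alpha / rank` scaling, and the zero initialisation of `B` are assumptions taken from the common LoRA formulation, and plain Python lists stand in for tensors.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """A frozen weight W (out x in) plus a trainable low-rank update B @ A.

    Hypothetical illustration only: W stands in for a pre-trained encoder
    weight that is never updated; only A (rank x in) and B (out x rank)
    would be optimised during adaptation.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight            # frozen pre-trained weight
        self.scale = alpha / rank       # assumed scaling convention
        # A starts small and random, B starts at zero, so the adapted layer
        # initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def effective_weight(self):
        """Return W + scale * (B @ A) without modifying W."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted layer to an input vector x."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_parameter_count(self):
        """Count only the LoRA parameters (A and B), not the frozen W."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


layer = LoRALinear([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], rank=1)
frozen_output = layer.forward([1.0, 0.0, -1.0])
```

    With `B` at zero the layer matches the frozen weight, and the number of trainable values grows with `rank * (in + out)` rather than `in * out`, which is why only the small adapter matrices need to be optimised.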

    Domain adaptive learning with disentangled features

    Recognizing visual information is crucial for many real artificial-intelligence-based applications, ranging from domestic robots to autonomous vehicles. However, the success of deep learning methods on visual recognition tasks is highly dependent on access to large-scale labeled datasets, which are expensive and cumbersome to collect. Transfer learning provides a way to alleviate the burden of annotating data, which transfers the knowledge learned from a rich-labeled source domain to a scarce-labeled target domain. However, the performance of deep learning models degrades significantly when testing on novel domains due to the presence of domain shift. To tackle the domain shift, conventional domain adaptation methods diminish the domain shift between two domains with a distribution matching loss or adversarial loss. These models align the domain-specific feature distribution and the domain-invariant feature distribution simultaneously, which is sub-optimal towards solving deep domain adaptation tasks, given that deep neural networks are known to extract features in which multiple hidden factors are highly entangled. This thesis explores how to learn effective transferable features by disentangling the deep features. The following questions are studied: (1) how to disentangle the deep features into domain-invariant and domain-specific features? (2) how would feature disentanglement help to learn transferable features under a synthetic-to-real domain adaptation scenario? (3) how would feature disentanglement facilitate transfer learning with multiple source or target domains? (4) how to leverage feature disentanglement to boost the performance in a federated system? To address these needs, this thesis proposes deep adversarial feature disentanglement: a class/domain identifier is trained on the labeled source domain and the disentangler generates features to fool the class/domain identifier. 
    Extensive experiments and empirical analysis demonstrate the effectiveness of the feature disentanglement method on many real-world domain adaptation tasks. Specifically, the following three unsupervised domain adaptation scenarios are explored: (1) domain agnostic learning with disentangled representations, (2) unsupervised federated domain adaptation, (3) multi-source domain adaptation.

    Budget-Aware Adapters for Multi-Domain Learning

    Multi-Domain Learning (MDL) refers to the problem of learning a set of models derived from a common deep architecture, each one specialized to perform a task in a certain domain (e.g., photos, sketches, paintings). This paper tackles MDL with a particular interest in obtaining domain-specific models with an adjustable budget in terms of the number of network parameters and computational complexity. Our intuition is that, since in real applications the number of domains and tasks can be very large, an effective MDL approach should focus not only on accuracy but also on having as few parameters as possible. To implement this idea we derive specialized deep models for each domain by adapting a pre-trained architecture but, unlike other methods, we propose a novel strategy to automatically adjust the computational complexity of the network. To this end, we introduce Budget-Aware Adapters that select the most relevant feature channels to better handle data from a novel domain. Some constraints on the number of active switches are imposed in order to obtain a network respecting the desired complexity budget. Experimentally, we show that our approach leads to recognition accuracy competitive with state-of-the-art approaches but with much lighter networks both in terms of storage and computation. Comment: ICCV 201