
1,727 research outputs found

Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

Author: Fang Wei
Lee Chung-Wei
Wang Yu-Chiang Frank
Yeh Chih-Kuan
Publication date: 26/05/2018

In this paper, we propose a novel deep learning architecture for multi-label zero-shot learning (ML-ZSL), which is able to predict multiple unseen class labels for each input instance. Inspired by the way humans utilize semantic knowledge between objects of interest, we propose a framework that incorporates knowledge graphs for describing the relationships between multiple labels. Our model learns an information propagation mechanism from the semantic label space, which can be applied to model the interdependencies between seen and unseen class labels. With such investigation of structured knowledge graphs for visual reasoning, we show that our model can be applied to multi-label classification and ML-ZSL tasks. Compared to state-of-the-art approaches, our method achieves comparable or improved performance. Comment: CVPR 201
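To make the idea of propagating information between seen and unseen labels concrete, here is a minimal sketch. It is not the authors' model: the abstract describes a learned propagation mechanism, whereas this example swaps in a fixed neighbour-averaging rule. The function name `propagate`, the parameters `steps` and `alpha`, and the labels "dog", "cat" and "wolf" are all hypothetical.

```python
def propagate(scores, neighbours, steps=2, alpha=0.5):
    """Blend each label's score with the mean score of its graph neighbours.

    Assumption: a fixed averaging rule stands in for the learned
    propagation mechanism described in the abstract.
    """
    current = dict(scores)
    for _ in range(steps):
        updated = {}
        for label, value in current.items():
            linked = neighbours.get(label, [])
            if linked:
                mean_nb = sum(current[n] for n in linked) / len(linked)
                updated[label] = (1 - alpha) * value + alpha * mean_nb
            else:
                # Isolated labels keep their score unchanged.
                updated[label] = value
        current = updated
    return current


# Hypothetical example: "dog" and "cat" are seen classes with classifier
# scores; "wolf" is unseen and receives evidence only through the graph.
scores = {"dog": 0.8, "cat": 0.0, "wolf": 0.0}
neighbours = {"dog": ["wolf"], "wolf": ["dog"], "cat": []}
result = propagate(scores, neighbours, steps=1)
```

In this toy graph the unseen label "wolf" ends up with a non-zero score purely because it is linked to "dog", which is the intuition behind using a knowledge graph to reach labels that have no training images.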

arXiv.org e-Print Archive

Crossref

Visual Question Answering: A Survey of Methods and Datasets

Author: Dick Anthony
Hengel Anton van den
Shen Chunhua
Teney Damien
Wang Peng
Wu Qi
Publication date: 20/07/2016

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires reasoning over visual elements of the image and general knowledge to infer the correct answer. In the first part of this survey, we examine the state of the art by comparing modern approaches to the problem. We classify methods by their mechanism to connect the visual and textual modalities. In particular, we examine the common approach of combining convolutional and recurrent neural networks to map images and questions to a common feature space. We also discuss memory-augmented and modular architectures that interface with structured knowledge bases. In the second part of this survey, we review the datasets available for training and evaluating VQA systems. The various datasets contain questions at different levels of complexity, which require different capabilities and types of reasoning. We examine in depth the question/answer pairs from the Visual Genome project, and evaluate the relevance of the structured annotations of images with scene graphs for VQA. Finally, we discuss promising future directions for the field, in particular the connection to structured knowledge bases and the use of natural language processing models. Comment: 25 page

arXiv.org e-Print Archive

Adelaide Research & Scholarship

ANALYZING PULMONARY ABNORMALITY WITH SUPERPIXEL BASED GRAPH NEURAL NETWORKS IN CHEST X-RAY

Author: Pradhan Ronaj
Publication venue: USD RED
Publication date: 01/01/2023

In recent years, graph-based deep learning has gained prominence, yet its potential in medical diagnosis remains relatively unexplored. Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in areas such as computer vision, particularly for grid-like data such as images. However, they require a huge dataset to reach top-level performance, and challenges arise when learning from the inherently irregular, unordered nature of physiological data. This thesis focuses primarily on abnormality screening: classifying Chest X-Rays (CXRs) as Tuberculosis positive or negative using Graph Neural Networks (GNNs) that operate on Region Adjacency Graphs (RAGs), in which each superpixel serves as a dedicated graph node. For graph classification, provided that the different classes are distinct enough, GNNs can often classify graphs using the graph structure alone. This study investigates whether incorporating node features, such as coordinate points and pixel intensity, along with the structured data representing the graph can enhance the learning process. By integrating residual and concatenation structures, this methodology captures essential features and relationships among superpixels, thereby contributing to advancements in tuberculosis identification. We achieved our best performance, an accuracy of 0.80 and an AUC of 0.79, through the union of state-of-the-art neural network architectures and innovative graph-based representations. This work introduces a new perspective to medical image analysis.
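As an illustration of the graph representation the abstract describes, the sketch below builds a region adjacency graph from a superpixel label map, with node features made of the mean coordinates and mean pixel intensity of each superpixel. It is a simplified example under stated assumptions, not the thesis's pipeline: the function `build_rag`, the 4-connectivity rule, and the tiny hand-written `labels` and `image` arrays are all hypothetical, and a real workflow would obtain the label map from a superpixel algorithm.

```python
from collections import defaultdict


def build_rag(labels, image):
    """Build a region adjacency graph (RAG) from a superpixel label map.

    labels: 2D list of integer superpixel ids (hypothetical input).
    image:  2D list of pixel intensities with the same shape as labels.
    Returns (features, edges): features maps each id to
    (mean_row, mean_col, mean_intensity); edges is a set of sorted id pairs.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])  # row, col, intensity, count
    edges = set()
    height, width = len(labels), len(labels[0])
    for r in range(height):
        for c in range(width):
            lab = labels[r][c]
            acc = sums[lab]
            acc[0] += r
            acc[1] += c
            acc[2] += image[r][c]
            acc[3] += 1
            # Compare with the right and lower neighbours (4-connectivity).
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < height and nc < width and labels[nr][nc] != lab:
                    edges.add(tuple(sorted((lab, labels[nr][nc]))))
    features = {
        lab: (s[0] / s[3], s[1] / s[3], s[2] / s[3]) for lab, s in sums.items()
    }
    return features, edges


# Hypothetical 3x4 image split into three superpixels.
labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 2],
]
image = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [90, 90, 90, 90],
]
features, edges = build_rag(labels, image)
```

Each superpixel becomes one node, and an edge joins two nodes whenever their regions share a pixel border. The resulting `features` and `edges` are the kind of inputs a graph classifier would consume.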

USD RED (University of South Dakota)