Two Stream Scene Understanding on Graph Embedding

Author: Huang Runxaing
Sun Wenyuan
Yang Wenkai
Publication date: 12/11/2023

The paper presents a novel two-stream network architecture for enhancing scene understanding in computer vision. The architecture combines a graph feature stream and an image feature stream, aiming to merge the strengths of both modalities for improved performance in image classification and scene graph generation tasks. The graph feature stream network comprises a segmentation structure, scene graph generation, and a graph representation module. The segmentation structure employs the UPSNet architecture with a backbone that can be a residual network, ViT, or Swin Transformer. The scene graph generation component extracts object labels and neighborhood relationships from the semantic map to create a scene graph. Graph Convolutional Networks (GCN), GraphSAGE, and Graph Attention Networks (GAT) are employed for graph representation, with an emphasis on capturing node features and their interconnections. The image feature stream network, on the other hand, performs image classification using Vision Transformer and Swin Transformer models. The two streams are fused using various data fusion methods, a design intended to leverage the complementary strengths of graph-based and image-based features. Experiments conducted on the ADE20K dataset demonstrate the effectiveness of the proposed two-stream network in improving image classification accuracy compared to conventional methods. This research provides a significant contribution to the field of computer vision, particularly in the areas of scene understanding and image classification, by effectively combining graph-based and image-based approaches.
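To make the two-stream idea concrete, here is a minimal sketch in plain Python. It applies one graph-convolution step to a toy scene graph, pools the node features into a single graph vector, and fuses that vector with an image feature vector. The abstract only mentions "various data fusion methods", so fusion by concatenation is an assumption here; the toy graph, the identity weight matrix, and the `image_vec` values are likewise hypothetical placeholders, not the paper's configuration.

```python
import math

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    n_in, n_out = len(weight), len(weight[0])
    # Aggregate neighbor features (norm @ feats).
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n)) for f in range(n_in)] for i in range(n)]
    # Linear transform followed by ReLU (agg @ weight).
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(n_in))) for o in range(n_out)]
            for i in range(n)]

def mean_pool(node_feats):
    """Average node features into one graph-level vector."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]

def fuse_concat(graph_vec, image_vec):
    """Assumed fusion method: simple concatenation of the two streams."""
    return list(graph_vec) + list(image_vec)

# Hypothetical scene graph: three objects in a chain (e.g. sky - building - road).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # one-hot object labels
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # untrained identity weights
image_vec = [0.2, 0.7]  # stand-in for an image-stream embedding

graph_vec = mean_pool(gcn_layer(adj, feats, weight))
fused = fuse_concat(graph_vec, image_vec)
```

In a trained system the fused vector would feed a classifier head; here it only illustrates how the two feature sources end up in one representation.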

arXiv.org e-Print Archive


Image processing and understanding based on graph similarity testing: algorithm design and software development

Author: Kang Jieqi
Publication venue: ScholarWorks@UMass Amherst
Publication date: 20/03/2017

Image processing and understanding is a key task in the human visual system. Among all related topics, content-based image retrieval and classification is the most typical and important problem. Successful image retrieval and classification models require an effective fundamental step of image representation and feature extraction. While traditional methods cannot capture all of the structural information in an image, using a graph to represent the image is not only biologically plausible but also has certain advantages. Graphs have been widely used in image-related applications. Traditional graph-based image analysis models include pixel-based graph-cut techniques for image segmentation, low-level and high-level image feature extraction based on graph statistics, and other related approaches that use the idea of graph similarity testing. To compare images through their graph representations, a graph similarity testing algorithm is essential. Most existing graph similarity measurement tools are not designed for generic tasks such as image classification and retrieval, and some other models are either not scalable or not always effective. Graph spectral theory is a powerful analytical tool for capturing and representing the structural information of a graph, but applying it to image understanding remains a challenge. In this dissertation, we focus on developing fast and effective image analysis models based on spectral graph theory and other graph-related mathematical tools. We first propose a fast graph similarity testing method based on the idea of the heat content and the mathematical theory of diffusion over manifolds. We then demonstrate the ability of our similarity testing model by comparing random graphs and power-law graphs. Based on our graph analysis model, we develop a graph-based image representation and understanding framework.
We first propose the image heat content feature and then discuss several approaches to further improve the model. The first component in our improved framework is a novel graph generation model. The proposed model greatly reduces the size of the traditional pixel-based image graph representation and is shown to still be effective in representing an image. Meanwhile, we propose and discuss several low-level and high-level image features based on spectral graph information, including oscillatory image heat content, weighted eigenvalues, and weighted heat content spectrum. Experiments show that the proposed models are invariant to non-structural changes in images and perform well on standard image classification benchmarks. Furthermore, our image features are robust to small distortions and changes of viewpoint. The model is also capable of capturing important structural information in the image and performs well alone or in combination with other traditional techniques. We then introduce two real-world software development projects that use graph-based image processing techniques. Finally, we discuss the pros, cons, and intuition of our proposed model by demonstrating the properties of the proposed image feature and the correlation between different image features.
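The heat-content idea can be illustrated with a small diffusion simulation. The sketch below is not the dissertation's algorithm: it assumes the combinatorial Laplacian, boundary nodes held at zero temperature, and a forward-Euler time step, all chosen here for simplicity. Every interior node starts with one unit of heat, and the total heat remaining in the interior is recorded over time. Two graphs can then be compared through their curves.

```python
def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)] for i in range(n)]

def heat_content_curve(adj, boundary, dt=0.05, steps=40):
    """Total interior heat over time, with boundary nodes clamped to zero.

    Uses explicit Euler steps u <- u - dt * L u (an assumed discretization).
    """
    n = len(adj)
    lap = laplacian(adj)
    interior = [i for i in range(n) if i not in boundary]
    u = [0.0 if i in boundary else 1.0 for i in range(n)]
    curve = [sum(u[i] for i in interior)]
    for _ in range(steps):
        lu = [sum(lap[i][j] * u[j] for j in range(n)) for i in range(n)]
        u = [0.0 if i in boundary else u[i] - dt * lu[i] for i in range(n)]
        curve.append(sum(u[i] for i in interior))
    return curve

# Two toy graphs on five nodes; nodes 0 and 4 act as the boundary in both.
path = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
dense = [[0.0 if i == j else 1.0 for j in range(5)] for i in range(5)]
boundary = {0, 4}

path_curve = heat_content_curve(path, boundary)
dense_curve = heat_content_curve(dense, boundary)
# A crude dissimilarity score: largest gap between the two curves.
gap = max(abs(p - d) for p, d in zip(path_curve, dense_curve))
```

The densely connected graph loses heat to the boundary faster than the path graph, so the two curves separate; the gap between them serves as a simple structural distance in this toy setting.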

ScholarWorks@UMass Amherst

Semi-Supervised Learning with Graphs: Covariance Based Superpixels for Hyperspectral Image Classification

Author: Aviles-Rivero AI
Coomes D
Faul A
Papadakis N
Schonlieb CB
Sellars P
Publication venue: International Geoscience and Remote Sensing Symposium (IGARSS)
Publication date: 01/01/2019

In this paper, we present a graph-based semi-supervised framework for hyperspectral image classification. We first introduce a novel superpixel algorithm based on the spectral covariance matrix representation of pixels to provide a better representation of our data. We then construct a superpixel graph, based on carefully considered feature vectors, before performing classification. We demonstrate, through a set of experimental results using two benchmarking datasets, that our approach outperforms three state-of-the-art classification frameworks, especially when an extremely small amount of labelled data is used. Case Studentship with the NP
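As a small illustration of a spectral covariance representation, the sketch below computes the sample covariance matrix of the spectral vectors in one pixel region. How the paper builds its superpixels and feature vectors from such matrices is not described in the abstract, so the function name, the region, and the two-band toy data are all hypothetical.

```python
def spectral_covariance(pixels):
    """Sample covariance matrix (bands x bands) of the spectra in one region.

    `pixels` is a list of spectral vectors, one per pixel, all of equal length.
    """
    n = len(pixels)
    bands = len(pixels[0])
    mean = [sum(p[b] for p in pixels) / n for b in range(bands)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pixels) / (n - 1)
             for j in range(bands)]
            for i in range(bands)]

# Hypothetical region of three pixels with two spectral bands each.
region = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov = spectral_covariance(region)
```

The resulting matrix summarizes how the bands vary together inside the region, which is the kind of descriptor a covariance-based grouping step could compare between neighboring regions.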

arXiv.org e-Print Archive

Crossref

Apollo (Cambridge)

Oskar Bordeaux

Symbolic Music Representations for Classification Tasks: A Systematic Evaluation

Author: Cancino-Chacón Carlos Eduardo
Dixon Simon
Karystinaios Emmanouil
Widmer Gerhard
Zhang Huan
Publication date: 05/09/2023

Music Information Retrieval (MIR) has seen a recent surge in deep learning-based approaches, which often involve encoding symbolic music (i.e., music represented in terms of discrete note events) in an image-like or language-like fashion. However, symbolic music is neither an image nor a sentence, and research in the symbolic domain lacks a comprehensive overview of the different available representations. In this paper, we investigate matrix (piano roll), sequence, and graph representations and their corresponding neural architectures, in combination with symbolic scores and performances, on three piece-level classification tasks. We also introduce a novel graph representation for symbolic performances and explore the capability of graph representations in global classification tasks. Our systematic evaluation shows the advantages and limitations of each input representation. Our results suggest that the graph representation, as the newest and least explored among the three approaches, exhibits promising performance while being more lightweight in training.
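The three input representations can be sketched from the same list of note events. This is an illustrative toy, not the paper's encoding: the `Note` fields, the time grid, the pitch range, and the two edge types ("onset" for simultaneous notes, "consecutive" when one note ends exactly as another begins) are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Note:
    onset: int     # start time in grid steps
    duration: int  # length in grid steps
    pitch: int     # MIDI pitch number

def to_piano_roll(notes, n_steps, low=60, high=72):
    """Matrix view: rows are pitches in [low, high), columns are time steps."""
    roll = [[0] * n_steps for _ in range(high - low)]
    for n in notes:
        for t in range(n.onset, min(n.onset + n.duration, n_steps)):
            roll[n.pitch - low][t] = 1
    return roll

def to_sequence(notes):
    """Sequence view: note tuples ordered by onset, then pitch."""
    ordered = sorted(notes, key=lambda n: (n.onset, n.pitch))
    return [(n.onset, n.duration, n.pitch) for n in ordered]

def to_graph(notes):
    """Graph view: notes are nodes; typed edges link related notes."""
    edges = []
    for i, a in enumerate(notes):
        for j, b in enumerate(notes):
            if i == j:
                continue
            if a.onset == b.onset:
                edges.append((i, j, "onset"))
            elif a.onset + a.duration == b.onset:
                edges.append((i, j, "consecutive"))
    return edges

# Toy fragment: a two-note chord (C4, E4) followed by G4.
notes = [Note(0, 2, 60), Note(0, 2, 64), Note(2, 2, 67)]
roll = to_piano_roll(notes, n_steps=4)
sequence = to_sequence(notes)
edges = to_graph(notes)
```

The piano roll grows with the time grid and pitch range regardless of how many notes are present, while the sequence and graph views grow with the number of notes, which is one reason a graph input can be cheaper to process.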

arXiv.org e-Print Archive

Novel deep learning methods for track reconstruction

Author: Anderson Dustin
Bendavid Josh
Calafiura Paolo
Cerati Giuseppe
Farrell Steven
Gray Lindsey
Kowalkowski Jim
Mudigonda Mayur
Prabhat
Spentzouris Panagiotis
Spiropulu Maria
Tsaris Aristeidis
Vlimant Jean-Roch
Zheng Stephan
Publication date: 14/10/2018

For the past year, the HEP.TrkX project has been investigating machine learning solutions to LHC particle track reconstruction problems. A variety of models were studied that drew inspiration from computer vision applications and operated on an image-like representation of tracking detector data. While these approaches have shown some promise, image-based methods face challenges in scaling up to realistic HL-LHC data due to high dimensionality and sparsity. In contrast, models that can operate on the space-point representation of track measurements ("hits") can exploit the structure of the data to solve tasks efficiently. In this paper we will show two sets of new deep learning models for reconstructing tracks using space-point data arranged as sequences or connected graphs. In the first set of models, Recurrent Neural Networks (RNNs) are used to extrapolate, build, and evaluate track candidates akin to Kalman Filter algorithms. Such models can express their own uncertainty when trained with an appropriate likelihood loss function. The second set of models uses Graph Neural Networks (GNNs) for the tasks of hit classification and segment classification. These models read a graph of connected hits and compute features on the nodes and edges. They adaptively learn which hit connections are important and which are spurious. The models are scalable, with simple architecture and relatively few parameters. Results for all models will be presented on ACTS generic detector simulated data. Comment: CTD 2018 proceeding
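The idea of scoring hit connections and then updating node features can be sketched as a single message-passing step. This is a toy under stated assumptions, not the HEP.TrkX architecture: each hit carries a hypothetical `[layer, phi]` feature pair, the edge scorer is one linear map plus a sigmoid, and the weights `w` and bias `b` are hand-picked rather than learned.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def edge_scores(nodes, edges, w, b):
    """Score each candidate segment from the features of its two hits."""
    scores = []
    for i, j in edges:
        pair = nodes[i] + nodes[j]  # concatenate endpoint features
        scores.append(sigmoid(sum(wk * xk for wk, xk in zip(w, pair)) + b))
    return scores

def node_update(nodes, edges, scores):
    """Add score-weighted neighbor features to each hit."""
    new = [list(x) for x in nodes]
    for (i, j), s in zip(edges, scores):
        for k in range(len(nodes[i])):
            new[j][k] += s * nodes[i][k]
            new[i][k] += s * nodes[j][k]
    return new

# Hypothetical hits: [detector layer, azimuthal angle phi].
hits = [[1.0, 0.1], [2.0, 0.1], [2.0, 0.9]]
# Candidate segments from the layer-1 hit to each layer-2 hit.
segments = [(0, 1), (0, 2)]
w = [0.0, 4.0, 0.0, -4.0]  # hand-picked: penalizes a phi increase along the segment
b = 2.0

scores = edge_scores(hits, segments, w, b)
updated = node_update(hits, segments, scores)
```

With these illustrative weights the segment between hits at the same phi scores above 0.5 and the other scores below it, so the low-scoring connection contributes less to the node update. A trained model would learn such weights from labelled segments.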

arXiv.org e-Print Archive

eScholarship - University of California