Two Stream Scene Understanding on Graph Embedding
The paper presents a novel two-stream network architecture for enhancing
scene understanding in computer vision. This architecture utilizes a graph
feature stream and an image feature stream, aiming to merge the strengths of
both modalities for improved performance in image classification and scene
graph generation tasks. The graph feature stream network comprises a
segmentation structure, scene graph generation, and a graph representation
module. The segmentation structure employs the UPSNet architecture with a
backbone that can be a residual network, ViT, or Swin Transformer. The scene
graph generation component focuses on extracting object labels and neighborhood
relationships from the semantic map to create a scene graph. Graph
Convolutional Networks (GCN), GraphSAGE, and Graph Attention Networks (GAT) are
employed for graph representation, with an emphasis on capturing node features
and their interconnections. The image feature stream network, on the other
hand, focuses on image classification through the use of Vision Transformer and
Swin Transformer models. The two streams are fused using various data fusion
methods. This fusion is designed to leverage the complementary strengths of
graph-based and image-based features. Experiments conducted on the ADE20K
dataset demonstrate the effectiveness of the proposed two-stream network in
improving image classification accuracy compared to conventional methods. This
research provides a significant contribution to the field of computer vision,
particularly in the areas of scene understanding and image classification, by
effectively combining graph-based and image-based approaches.
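The scene graph generation step described in the abstract (object labels plus neighborhood relationships taken from the semantic map) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `scene_graph_from_semantic_map`, the use of 4-connectivity to decide which regions are neighbors, and the toy label grid are all assumptions made for illustration.

```python
def scene_graph_from_semantic_map(semantic_map):
    """Build (nodes, edges) from a 2D grid of integer class labels.

    Assumption: two labels are "neighbors" if any of their pixels touch
    horizontally or vertically (4-connectivity).
    """
    nodes = set()
    edges = set()
    rows = len(semantic_map)
    cols = len(semantic_map[0]) if rows else 0
    for r in range(rows):
        for c in range(cols):
            label = semantic_map[r][c]
            nodes.add(label)
            # Look right and down only, so each adjacent pixel pair is visited once.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    other = semantic_map[nr][nc]
                    if other != label:
                        # Store undirected edges with the smaller label first.
                        edges.add((min(label, other), max(label, other)))
    return sorted(nodes), sorted(edges)


# Hypothetical 3x3 semantic map with three classes (0, 1, 2).
toy_map = [
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
]
nodes, edges = scene_graph_from_semantic_map(toy_map)
```

A graph built this way could then be passed to a GCN, GraphSAGE, or GAT encoder, with node features derived from the object labels.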
Image processing and understanding based on graph similarity testing: algorithm design and software development
Image processing and understanding is a key task in the human visual system. Among all related topics, content-based image retrieval and classification is the most typical and important problem. Successful image retrieval/classification models require an effective fundamental step of image representation and feature extraction. While traditional methods are not capable of capturing all structural information in the image, using a graph to represent the image is not only biologically plausible but also has certain advantages.
Graphs have been widely used in image-related applications. Traditional graph-based image analysis models include pixel-based graph-cut techniques for image segmentation, low-level and high-level image feature extraction based on graph statistics, and other related approaches that utilize the idea of graph similarity testing. To compare images through their graph representations, a graph similarity testing algorithm is essential. Most of the existing graph similarity measurement tools are not designed for generic tasks such as image classification and retrieval, and some other models are either not scalable or not always effective. Graph spectral theory is a powerful analytical tool for capturing and representing the structural information of a graph, but applying it to image understanding remains a challenge.
In this dissertation, we focus on developing fast and effective image analysis models based on spectral graph theory and other graph-related mathematical tools. We first propose a fast graph similarity testing method based on the idea of the heat content and the mathematical theory of diffusion over manifolds. We then demonstrate the ability of our similarity testing model by comparing random graphs and power-law graphs. Based on our graph analysis model, we develop a graph-based image representation and understanding framework. We first propose the image heat content feature and then discuss several approaches to further improve the model. The first component in our improved framework is a novel graph generation model. The proposed model greatly reduces the size of the traditional pixel-based image graph representation and is shown to still be effective in representing an image. Meanwhile, we propose and discuss several low-level and high-level image features based on spectral graph information, including oscillatory image heat content, weighted eigenvalues and weighted heat content spectrum. Experiments show that the proposed models are invariant to non-structural changes in images and perform well on standard image classification benchmarks. Furthermore, our image features are robust to small distortions and changes of viewpoint. The model is also capable of capturing important structural information in the image and performs well alone or in combination with other traditional techniques. We then introduce two real-world software development projects using graph-based image processing techniques. Finally, we discuss the pros, cons and the intuition of our proposed model by demonstrating the properties of the proposed image feature and the correlation between different image features.
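To make the heat content idea concrete, here is a minimal sketch under stated assumptions. It uses one common definition from the spectral graph literature: the heat kernel is exp(-t L), where L is the symmetric normalized Laplacian, and the heat content is the sum of all heat kernel entries. The dissertation's exact definition (for example its boundary conditions or its fast approximation) may differ. The helper names and the truncated Taylor series used for the matrix exponential are illustrative choices suited only to small graphs.

```python
import math


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}.

    Assumes an undirected graph with non-negative weights and no self-loops.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                lap[i][j] = 1.0 if deg[i] > 0 else 0.0
            elif adj[i][j]:
                lap[i][j] = -adj[i][j] / math.sqrt(deg[i] * deg[j])
    return lap


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def heat_content(adj, t, terms=40):
    """Sum of all entries of exp(-t L), via a truncated Taylor series."""
    n = len(adj)
    scaled = [[-t * x for x in row] for row in normalized_laplacian(adj)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    kernel = [row[:] for row in term]
    for k in range(1, terms):
        # After this step, term equals (-t L)^k / k!.
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        for i in range(n):
            for j in range(n):
                kernel[i][j] += term[i][j]
    return sum(sum(row) for row in kernel)


# Two toy graphs on three nodes: a triangle and a path.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
q_triangle = heat_content(triangle, 1.0)
q_path = heat_content(path, 1.0)
```

Under this definition the two toy graphs give different heat content values at t = 1, which is the kind of signal a similarity test could compare across a range of t values.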
Semi-Supervised Learning with Graphs: Covariance Based Superpixels for Hyperspectral Image Classification
In this paper, we present a graph-based semi-supervised framework for
hyperspectral image classification. We first introduce a novel superpixel
algorithm based on the spectral covariance matrix representation of pixels to
provide a better representation of our data. We then construct a superpixel
graph, based on carefully considered feature vectors, before performing
classification. We demonstrate, through a set of experimental results using two
benchmarking datasets, that our approach outperforms three state-of-the-art
classification frameworks, especially when an extremely small amount of
labelled data is used. Case Studentship with the NP
Symbolic Music Representations for Classification Tasks: A Systematic Evaluation
Music Information Retrieval (MIR) has seen a recent surge in deep
learning-based approaches, which often involve encoding symbolic music (i.e.,
music represented in terms of discrete note events) in an image-like or
language-like fashion. However, symbolic music is neither an image nor a
sentence, and research in the symbolic domain lacks a comprehensive overview of
the different available representations. In this paper, we investigate matrix
(piano roll), sequence, and graph representations and their corresponding
neural architectures, in combination with symbolic scores and performances on
three piece-level classification tasks. We also introduce a novel graph
representation for symbolic performances and explore the capability of graph
representations in global classification tasks. Our systematic evaluation shows
advantages and limitations of each input representation. Our results suggest
that the graph representation, as the newest and least explored among the three
approaches, exhibits promising performance, while being more lightweight in
training.
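The three input representations compared in the abstract can be illustrated on a toy example. This is a hedged sketch, not the paper's encoding: the `(pitch, onset, duration)` note format, the integer time grid, and the single "consecutive" edge type used for the graph are assumptions chosen for brevity.

```python
def piano_roll(notes, num_steps, num_pitches=128):
    """Matrix representation: roll[pitch][step] is 1 while a note sounds."""
    roll = [[0] * num_steps for _ in range(num_pitches)]
    for pitch, onset, duration in notes:
        for step in range(onset, min(onset + duration, num_steps)):
            roll[pitch][step] = 1
    return roll


def note_sequence(notes):
    """Sequence representation: note events ordered by onset, then pitch."""
    return sorted(notes, key=lambda note: (note[1], note[0]))


def note_graph(notes):
    """Graph representation: nodes are notes; an edge (i, j) is added when
    note j starts exactly where note i ends (an illustrative edge type)."""
    edges = []
    for i, (_, onset_i, dur_i) in enumerate(notes):
        for j, (_, onset_j, _) in enumerate(notes):
            if i != j and onset_j == onset_i + dur_i:
                edges.append((i, j))
    return edges


# Hypothetical fragment: C4 for two steps, then E4 and G4 starting together.
notes = [(60, 0, 2), (64, 2, 2), (67, 2, 1)]
roll = piano_roll(notes, num_steps=4)
sequence = note_sequence(notes)
graph_edges = note_graph(notes)
```

The piano roll stores one cell per pitch and time step, most of them zero, whereas the graph stores one node per note, which gives some intuition for why a graph input can be cheaper to train on.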
Novel deep learning methods for track reconstruction
For the past year, the HEP.TrkX project has been investigating machine
learning solutions to LHC particle track reconstruction problems. A variety of
models were studied that drew inspiration from computer vision applications and
operated on an image-like representation of tracking detector data. While these
approaches have shown some promise, image-based methods face challenges in
scaling up to realistic HL-LHC data due to high dimensionality and sparsity. In
contrast, models that can operate on the space-point representation of track
measurements ("hits") can exploit the structure of the data to solve tasks
efficiently. In this paper we will show two sets of new deep learning models
for reconstructing tracks using space-point data arranged as sequences or
connected graphs. In the first set of models, Recurrent Neural Networks (RNNs)
are used to extrapolate, build, and evaluate track candidates akin to Kalman
Filter algorithms. Such models can express their own uncertainty when trained
with an appropriate likelihood loss function. The second set of models uses
Graph Neural Networks (GNNs) for the tasks of hit classification and segment
classification. These models read a graph of connected hits and compute
features on the nodes and edges. They adaptively learn which hit connections
are important and which are spurious. The models are scalable, with a simple
architecture and relatively few parameters. Results for all models will be
presented on ACTS generic detector simulated data. Comment: CTD 2018 proceeding
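The "graph of connected hits" that the GNN models read can be sketched as follows. This is an illustrative construction under assumptions, not the HEP.TrkX code: hits are reduced to `(layer, phi)` pairs, and a candidate segment is created between hits on adjacent layers whose azimuthal angles differ by less than a hypothetical `max_dphi` window.

```python
import math


def build_hit_graph(hits, max_dphi):
    """Return candidate segments (i, j) between hits on adjacent layers.

    hits: list of (layer, phi) tuples, with phi in radians.
    max_dphi: assumed angular window for accepting a segment.
    """
    edges = []
    for i, (layer_i, phi_i) in enumerate(hits):
        for j, (layer_j, phi_j) in enumerate(hits):
            if layer_j != layer_i + 1:
                continue
            # Smallest angular difference, accounting for wrap-around at 2*pi.
            dphi = abs(phi_i - phi_j) % (2 * math.pi)
            dphi = min(dphi, 2 * math.pi - dphi)
            if dphi < max_dphi:
                edges.append((i, j))
    return edges


# Toy event: two hits on layer 0, two on layer 1, one on layer 2.
hits = [(0, 0.10), (0, 2.00), (1, 0.12), (1, 2.05), (2, 0.15)]
segments = build_hit_graph(hits, max_dphi=0.1)
```

In a segment classification setting, a model would then score each candidate segment as genuine or spurious; the geometric window only limits how many candidates it has to consider.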