AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching
Despite significant progress of deep learning in recent years,
state-of-the-art semantic matching methods still rely on legacy features such
as SIFT or HoG. We argue that the strong invariance properties that are key to
the success of recent deep architectures on the classification task make them
unfit for dense correspondence tasks, unless a large amount of supervision is
used. In this work, we propose a deep network, termed AnchorNet, that produces
image representations that are well-suited for semantic matching. It relies on
a set of filters whose response is geometrically consistent across different
object instances, even in the presence of strong intra-class, scale, or
viewpoint variations. Trained only with weak image-level labels, the final
representation successfully captures information about the object structure and
improves results of state-of-the-art semantic matching methods such as the
deformable spatial pyramid or the proposal flow methods. We show positive
results on the cross-instance matching task where different instances of the
same object category are matched as well as on a new cross-category semantic
matching task aligning pairs of instances each from a different object class.
Comment: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition. 201
Object priors for classifying and localizing unseen actions
This work strives for the classification and localization of human actions in
videos, without the need for any labeled video training examples. Where
existing work relies on transferring global attribute or object information
from seen to unseen action videos, we seek to classify and spatio-temporally
localize unseen actions in videos from image-based object information only. We
propose three spatial object priors, which encode local person and object
detectors along with their spatial relations. On top we introduce three
semantic object priors, which extend semantic matching through word embeddings
with three simple functions that tackle semantic ambiguity, object
discrimination, and object naming. A video embedding combines the spatial and
semantic object priors. It enables us to introduce a new video retrieval task
that retrieves action tubes in video collections based on user-specified
objects, spatial relations, and object size. Experimental evaluation on five
action datasets shows the importance of spatial and semantic object priors for
unseen actions. We find that persons and objects have preferred spatial
relations that benefit unseen action localization, while using multiple
languages and simple object filtering directly improves semantic matching,
leading to state-of-the-art results for both unseen action classification and
localization.
Comment: Accepted to IJC
Recognition of Object Categories in Realistic Scenes
Image classification maps image content to a semantic term such as a category, domain, or object.
An image classifier should automatically check for the presence of a given object (e.g. car,
animal, or scene) in the image content. The task remains challenging in computer vision because it must
deal with realistic images. The objective of this work is to find image classification methods
that combine existing techniques to obtain the best classification results. In this work, we
implemented sparse coding with spatial pyramid matching to classify the images. Besides gray
SIFT, four color SIFT descriptors were used as local descriptors. A linear Support Vector Machine (SVM)
was used for training and testing. The results show that the color descriptors
significantly improve the classification rate compared to gray SIFT.
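The combination of sparse coding and spatial pyramid matching described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `spatial_pyramid_max_pool`, the grid levels, and the use of max pooling over absolute code values are all assumptions, and the sparse codes are taken as given rather than computed from SIFT descriptors.

```python
def spatial_pyramid_max_pool(points, codes, levels=(1, 2, 4)):
    """Pool per-descriptor sparse codes over a spatial pyramid.

    points: list of (x, y) descriptor locations, normalised to [0, 1].
    codes:  list of sparse code vectors, one per location (assumed given).
    levels: grid sizes per pyramid level (hypothetical default: 1x1, 2x2, 4x4).
    Returns the concatenation of the max-pooled code of every grid cell.
    """
    dim = len(codes[0])
    feature = []
    for n in levels:
        # One pooled vector per cell of the n x n grid.
        cells = [[0.0] * dim for _ in range(n * n)]
        for (x, y), code in zip(points, codes):
            # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
            col = min(int(x * n), n - 1)
            row = min(int(y * n), n - 1)
            cell = cells[row * n + col]
            for j, value in enumerate(code):
                cell[j] = max(cell[j], abs(value))
        for cell in cells:
            feature.extend(cell)
    return feature
```

The resulting vector has length `dim * sum(n * n for n in levels)` and could then be passed to a linear classifier such as the SVM mentioned in the abstract.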
Deep Learning for Medical Image Segmentation using Prior Knowledge and Topology
Image segmentation refers to the division of a digital image into distinct segments or groups of pixels/voxels. However, most of the existing deep learning approaches lack the utilization of prior knowledge, such as shape information, which could improve segmentation accuracy. In addition, conventional image segmentation frequently falls short in preserving intricate spatial details, motivating the innovation of strategies for multi-scaled feature integration. Furthermore, traditional image segmentation methods primarily concentrate on pixel-level or region-level analysis. However, given the inherent morphological similarities among various image objects, the significance of topology information surpasses that of pixel-level data in the realm of medical image semantic segmentation, and the incorporation of topology information for image segmentation is important.
The first aim of this dissertation is to incorporate shape priors into medical image segmentation. A shape-prior-V-Net (SP-V-Net) is proposed, which contains a shape transformation module to refine the segmentation results according to the shape prior. SP-V-Net has been applied to lung segmentation and proximal femur segmentation.
The second aim is to improve image segmentation by leveraging hierarchical features. Two approaches are proposed: the feature pyramid U-Net++ (FP-U-Net++), which dynamically aggregates the feature pyramid in the decoder of U-Net++, and the multi-input multi-scale U-Net (MIMS U-Net), which integrates the features in the encoder of the U-Net.
The third aim explores topology-based image semantic segmentation using graph neural networks. Three graph-matching networks have been developed: an association graph-based network, an edge attention graph matching network, and a hyper-association graph matching network. The proposed networks convert the graph-matching problem into a vertex classification problem using an association graph, where a positive vertex indicates that the corresponding nodes from the two individual graphs are matched. These models were applied to coronary artery semantic labeling on invasive coronary angiograms. Moreover, this study presents a pioneering approach for topology-based image semantic labeling using graph matching.
The successful completion of these aims contributes technically accurate and clinically applicable algorithms and techniques for medical image segmentation. The outcomes of this dissertation provide valuable tools for the medical imaging and computer vision communities, advancing the field and improving patient care through accurate and efficient medical image segmentation.
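The reduction from graph matching to vertex classification described in the third aim can be illustrated with a small sketch. It shows only the generic association-graph construction, under the assumption that both graphs are undirected; the function names `association_graph` and `vertex_labels` are hypothetical, and the dissertation's networks, which would learn to predict these labels, are not reproduced here.

```python
from itertools import product


def association_graph(nodes_a, edges_a, nodes_b, edges_b):
    """Build the association graph of two undirected graphs.

    Each vertex is a candidate match (i, a) between node i of graph A and
    node a of graph B. Two vertices (i, a) and (j, b) are connected when
    (i, j) is an edge of A and (a, b) is an edge of B.
    """
    vertices = list(product(nodes_a, nodes_b))
    undirected_a = {frozenset(e) for e in edges_a}
    undirected_b = {frozenset(e) for e in edges_b}
    edges = set()
    for (i, a), (j, b) in product(vertices, repeat=2):
        if i == j or a == b:
            continue  # a node cannot be matched twice
        if frozenset((i, j)) in undirected_a and frozenset((a, b)) in undirected_b:
            edges.add(frozenset(((i, a), (j, b))))
    return vertices, edges


def vertex_labels(vertices, matching):
    """Label a vertex 1 (positive) when its node pair is matched, else 0."""
    return [1 if matching.get(i) == a else 0 for i, a in vertices]
```

With these labels as targets, a matching between the two graphs corresponds to the set of positive vertices, which is what a vertex classifier would be trained to recover.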
SFNet: Learning Object-aware Semantic Correspondence
We address the problem of semantic correspondence, that is, establishing a
dense flow field between images depicting different instances of the same
object or scene category. We propose to use images annotated with binary
foreground masks and subjected to synthetic geometric deformations to train a
convolutional neural network (CNN) for this task. Using these masks as part of
the supervisory signal offers a good compromise between semantic flow methods,
where the amount of training data is limited by the cost of manually selecting
point correspondences, and semantic alignment ones, where the regression of a
single global geometric transformation between images may be sensitive to
image-specific details such as background clutter. We propose a new CNN
architecture, dubbed SFNet, which implements this idea. It leverages a new and
differentiable version of the argmax function for end-to-end training, with a
loss that combines mask and flow consistency with smoothness terms.
Experimental results demonstrate the effectiveness of our approach, which
significantly outperforms the state of the art on standard benchmarks.
Comment: cvpr 2019 oral pape
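The abstract does not say how SFNet makes argmax differentiable, so the following is only a sketch of the standard softmax-weighted "soft argmax", shown in one dimension for brevity. The function name `soft_argmax` and the temperature parameter `beta` are assumptions for illustration and may differ from the formulation used in the paper.

```python
import math


def soft_argmax(scores, beta=10.0):
    """Differentiable surrogate for argmax over a 1-D list of matching scores.

    Returns the expected index under a softmax distribution with temperature
    parameter beta; larger beta moves the result closer to the hard argmax.
    """
    peak = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(beta * (s - peak)) for s in scores]
    total = sum(weights)
    return sum(index * w for index, w in enumerate(weights)) / total
```

Because the output is a weighted average of indices, two equally strong peaks produce a position midway between them, which is one reason refinements of this basic form are used in practice.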
Deep learning in remote sensing: a review
Standing at the paradigm shift towards data-intensive science, machine
learning techniques are becoming increasingly important. In particular, as a
major breakthrough in the field, deep learning has proven to be an extremely
powerful tool in many fields. Shall we embrace deep learning as the key to all?
Or, should we resist a 'black-box' solution? There are controversial opinions
in the remote sensing community. In this article, we analyze the challenges of
using deep learning for remote sensing data analysis, review the recent
advances, and provide resources to make deep learning in remote sensing
ridiculously simple to start with. More importantly, we encourage remote sensing
scientists to bring their expertise into deep learning, and use it as an
implicit general model to tackle unprecedented large-scale influential
challenges, such as climate change and urbanization.
Comment: Accepted for publication IEEE Geoscience and Remote Sensing Magazin
Aggregated Deep Local Features for Remote Sensing Image Retrieval
Remote Sensing Image Retrieval remains a challenging topic due to the special
nature of Remote Sensing Imagery. Such images contain various different
semantic objects, which clearly complicates the retrieval task. In this paper,
we present an image retrieval pipeline that uses attentive, local convolutional
features and aggregates them using the Vector of Locally Aggregated Descriptors
(VLAD) to produce a global descriptor. We study various system parameters such
as the multiplicative and additive attention mechanisms and descriptor
dimensionality. We propose a query expansion method that requires no external
inputs. Experiments demonstrate that even without training, the local
convolutional features and global representation outperform other systems.
After system tuning, we can achieve state-of-the-art or competitive results.
Furthermore, we observe that our query expansion method increases overall
system performance by about 3%, using only the top-three retrieved images.
Finally, we show how dimensionality reduction produces compact descriptors with
increased retrieval performance and fast retrieval computation times, e.g. 50%
faster than the current systems.
Comment: Published in Remote Sensing. The first two authors have equal
contributio
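The VLAD aggregation step named in this abstract can be sketched as below. This is a plain, unweighted version under stated assumptions: the centroids are taken as given (in practice they would come from clustering training descriptors), the attention weighting and query expansion from the paper are omitted, and the function name `vlad` is hypothetical.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one global VLAD descriptor.

    Each descriptor is assigned to its nearest centroid, the residuals
    (descriptor minus centroid) are summed per centroid, and the
    concatenated result is L2-normalised.
    """
    k, dim = len(centroids), len(centroids[0])
    residuals = [[0.0] * dim for _ in range(k)]
    for x in descriptors:
        # Nearest centroid by squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[c])),
        )
        for j in range(dim):
            residuals[nearest][j] += x[j] - centroids[nearest][j]
    flat = [value for row in residuals for value in row]
    norm = math.sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm > 0 else flat
```

The output has length `k * dim`, so a dimensionality reduction step such as the one the abstract mentions would typically follow when compact descriptors are needed.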