46,083 research outputs found

AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching

Authors: Larlus Diane; Novotny David; Vedaldi Andrea
Publication date: 01/01/2017

Despite significant progress of deep learning in recent years, state-of-the-art semantic matching methods still rely on legacy features such as SIFT or HoG. We argue that the strong invariance properties that are key to the success of recent deep architectures on the classification task make them unfit for dense correspondence tasks, unless a large amount of supervision is used. In this work, we propose a deep network, termed AnchorNet, that produces image representations that are well-suited for semantic matching. It relies on a set of filters whose response is geometrically consistent across different object instances, even in the presence of strong intra-class, scale, or viewpoint variations. Trained only with weak image-level labels, the final representation successfully captures information about the object structure and improves results of state-of-the-art semantic matching methods such as the deformable spatial pyramid or the proposal flow methods. We show positive results on the cross-instance matching task where different instances of the same object category are matched as well as on a new cross-category semantic matching task aligning pairs of instances each from a different object class.
Comment: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 201

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Object priors for classifying and localizing unseen actions

Authors: Mettes P.; Snoek C.G.M.; Thong W.
Publication venue: Springer Science and Business Media LLC
Publication date: 10/04/2021

This work strives for the classification and localization of human actions in videos, without the need for any labeled video training examples. Where existing work relies on transferring global attribute or object information from seen to unseen action videos, we seek to classify and spatio-temporally localize unseen actions in videos from image-based object information only. We propose three spatial object priors, which encode local person and object detectors along with their spatial relations. On top we introduce three semantic object priors, which extend semantic matching through word embeddings with three simple functions that tackle semantic ambiguity, object discrimination, and object naming. A video embedding combines the spatial and semantic object priors. It enables us to introduce a new video retrieval task that retrieves action tubes in video collections based on user-specified objects, spatial relations, and object size. Experimental evaluation on five action datasets shows the importance of spatial and semantic object priors for unseen actions. We find that persons and objects have preferred spatial relations that benefit unseen action localization, while using multiple languages and simple object filtering directly improves semantic matching, leading to state-of-the-art results for both unseen action classification and localization.
Comment: Accepted to IJC

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Recognition of Object Categories in Realistic Scenes

Authors: Fauvert Eric; Suhatril Ruddy J.; Suhendra Adang; Voon Lew F.C. Lew Yan
Publication venue: Gunadarma University

Image classification maps image content to a semantic term such as a category, domain, or object. A classifier should be able to check automatically for the existence of a certain object (e.g. car, animal, or scene) in the image content. This task is still challenging in computer vision because it has to deal with realistic images. The objective of this work is to find image classification methods that combine existing techniques to obtain the best classification results. In this work, we implemented a sparse coding method with spatial pyramid matching to classify the images. Besides gray SIFT, four color SIFT descriptors were also used as local descriptors. A linear Support Vector Machine (SVM) is used for training and testing. The results show that color descriptors significantly improve the classification rate compared to gray SIFT.

Gunadarma University Repository

Deep Learning for Medical Image Segmentation using Prior Knowledge and Topology

Author: Zhao Chen
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2023

Image segmentation refers to the division of a digital image into distinct segments or groups of pixels/voxels. However, most existing deep learning approaches make little use of prior knowledge, such as shape information, which could improve segmentation accuracy. In addition, conventional image segmentation frequently falls short in preserving intricate spatial details, motivating strategies for multi-scale feature integration. Furthermore, traditional image segmentation methods primarily concentrate on pixel-level or region-level analysis. However, given the inherent morphological similarities among various image objects, the significance of topology information surpasses that of pixel-level data in the realm of medical image semantic segmentation, and the incorporation of topology information for image segmentation is important. The first aim of this dissertation is to incorporate shape priors into medical image segmentation. A shape-prior-V-Net (SP-V-Net) is proposed, which contains a shape transformation module to refine the segmentation results according to the shape prior. SP-V-Net has been applied to lung segmentation and proximal femur segmentation. The second aim is to improve image segmentation by leveraging hierarchical features. Two approaches are proposed: the feature pyramid U-Net++ (FP-U-Net++), which dynamically aggregates the feature pyramid in the decoder of U-Net++, and the multi-input multi-scale U-Net (MIMS U-Net), which integrates the features in the encoder of the U-Net. The third aim explores topology-based image semantic segmentation using graph neural networks. Three graph-matching networks have been developed: association graph-based, edge attention, and hyper-association graph matching networks. The proposed graph-matching networks convert the graph-matching problem into a vertex classification problem on an association graph, where a positive vertex indicates that the corresponding nodes from the two individual graphs are matched. These models were applied to coronary artery semantic labeling on invasive coronary angiograms. Moreover, this study presents a pioneering approach for topology-based image semantic labeling using graph matching. The successful completion of these aims contributes technically accurate and clinically applicable algorithms and techniques for medical image segmentation. The outcomes of this dissertation provide valuable tools for the medical imaging and computer vision communities, advancing the field and improving patient care through accurate and efficient medical image segmentation.
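The association-graph idea mentioned in this abstract can be illustrated with a small sketch. This is a generic textbook-style construction, not the dissertation's implementation: the function name `association_graph`, the undirected edge rule, and the toy graphs are all assumptions made for illustration. Each vertex of the association graph is a candidate match between a node of the first graph and a node of the second, so deciding which vertices are "positive" amounts to deciding the matching.

```python
from itertools import product


def association_graph(nodes1, edges1, nodes2, edges2):
    """Build an association graph from two undirected graphs (illustrative only)."""
    # One vertex per candidate match (u, v): u from graph 1, v from graph 2.
    vertices = list(product(nodes1, nodes2))
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    edges = set()
    # Two candidate matches are connected when both underlying node pairs
    # are connected by an edge in their own graph.
    for (u1, v1), (u2, v2) in product(vertices, repeat=2):
        if frozenset((u1, u2)) in e1 and frozenset((v1, v2)) in e2:
            edges.add(frozenset(((u1, v1), (u2, v2))))
    return vertices, edges


# Toy example: two single-edge graphs and a hypothetical ground-truth matching.
vertices, edges = association_graph(["a", "b"], [("a", "b")], [1, 2], [(1, 2)])
matching = {("a", 1), ("b", 2)}
# Vertex classification targets: True marks a matched pair.
labels = {v: v in matching for v in vertices}
```

In this toy case the association graph has four vertices and two edges, and a classifier over the vertices would be trained to predict `labels`.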

Michigan Technological University

SFNet: Learning Object-aware Semantic Correspondence

Authors: Ham Bumsub; Kim Dohyung; Lee Junghyup; Ponce Jean
Publication date: 04/04/2019

We address the problem of semantic correspondence, that is, establishing a dense flow field between images depicting different instances of the same object or scene category. We propose to use images annotated with binary foreground masks and subjected to synthetic geometric deformations to train a convolutional neural network (CNN) for this task. Using these masks as part of the supervisory signal offers a good compromise between semantic flow methods, where the amount of training data is limited by the cost of manually selecting point correspondences, and semantic alignment ones, where the regression of a single global geometric transformation between images may be sensitive to image-specific details such as background clutter. We propose a new CNN architecture, dubbed SFNet, which implements this idea. It leverages a new and differentiable version of the argmax function for end-to-end training, with a loss that combines mask and flow consistency with smoothness terms. Experimental results demonstrate the effectiveness of our approach, which significantly outperforms the state of the art on standard benchmarks.
Comment: CVPR 2019 oral paper
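For readers unfamiliar with differentiable argmax surrogates, the sketch below shows the standard soft-argmax: the expected index under a softmax distribution. It is a generic illustration and is not claimed to be the variant introduced in the SFNet paper; the function name `soft_argmax` and the `temperature` parameter are assumptions chosen for this example.

```python
import math


def soft_argmax(scores, temperature=1.0):
    """Softmax-weighted average of indices, a smooth stand-in for argmax."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scores)
    weights = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(weights)
    # Expected index under the softmax distribution.
    return sum(i * w for i, w in enumerate(weights)) / total


scores = [0.1, 0.2, 5.0, 0.3]
hard = max(range(len(scores)), key=scores.__getitem__)  # ordinary argmax
soft = soft_argmax(scores, temperature=0.1)             # close to the hard index
tie = soft_argmax([1.0, 1.0])                           # ties average the indices
```

A low temperature pushes the result toward the hard argmax, while a high temperature blends neighbouring indices. Unlike the hard version, the output varies smoothly with the scores, which is what makes end-to-end training possible.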

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Deep learning in remote sensing: a review

Authors: Fraundorfer Friedrich; Mou Lichao; Tuia Devis; Xia Gui-Song; Xu Feng; Zhang Liangpei; Zhu Xiao Xiang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

Standing at the paradigm shift towards data-intensive science, machine learning techniques are becoming increasingly important. In particular, as a major breakthrough in the field, deep learning has proven to be an extremely powerful tool in many fields. Shall we embrace deep learning as the key to all? Or should we resist a 'black-box' solution? Opinions in the remote sensing community are divided. In this article, we analyze the challenges of using deep learning for remote sensing data analysis, review the recent advances, and provide resources to make deep learning in remote sensing ridiculously simple to start with. More importantly, we encourage remote sensing scientists to bring their expertise into deep learning, and use it as an implicit general model to tackle unprecedented large-scale influential challenges, such as climate change and urbanization.
Comment: Accepted for publication IEEE Geoscience and Remote Sensing Magazine

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Crossref

Wageningen University & Research Publications

Carolina Digital Repository

Aggregated Deep Local Features for Remote Sensing Image Retrieval

Authors: Bondarev Egor; de With Peter H. N.; Imbriaco Raffaele; Sebastian Clint
Publication venue: MDPI AG
Publication date: 01/02/2019

Remote Sensing Image Retrieval remains a challenging topic due to the special nature of Remote Sensing Imagery. Such images contain various different semantic objects, which clearly complicates the retrieval task. In this paper, we present an image retrieval pipeline that uses attentive, local convolutional features and aggregates them using the Vector of Locally Aggregated Descriptors (VLAD) to produce a global descriptor. We study various system parameters such as the multiplicative and additive attention mechanisms and descriptor dimensionality. We propose a query expansion method that requires no external inputs. Experiments demonstrate that even without training, the local convolutional features and global representation outperform other systems. After system tuning, we can achieve state-of-the-art or competitive results. Furthermore, we observe that our query expansion method increases overall system performance by about 3%, using only the top-three retrieved images. Finally, we show how dimensionality reduction produces compact descriptors with increased retrieval performance and fast retrieval computation times, e.g. 50% faster than the current systems.
Comment: Published in Remote Sensing. The first two authors have equal contribution
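The VLAD aggregation step named in this abstract can be sketched in a few lines. This is plain VLAD with hard assignment and a single global L2 normalization, written in pure Python for clarity. It omits the attention weighting and dimensionality reduction the paper studies, and it assumes the centroids are already available (for example from k-means). The function name `vlad` and the toy data are hypothetical.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalized VLAD vector."""
    dim = len(centroids[0])
    residuals = [[0.0] * dim for _ in centroids]
    for d in descriptors:
        # Hard-assign the descriptor to its nearest centroid
        # (squared Euclidean distance).
        k = min(
            range(len(centroids)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(d, centroids[j])),
        )
        # Accumulate the residual between the descriptor and that centroid.
        for i in range(dim):
            residuals[k][i] += d[i] - centroids[k][i]
    # Concatenate the per-centroid residual sums and L2-normalize.
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy data: two 2-D centroids and three local descriptors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
global_descriptor = vlad(descriptors, centroids)
```

The resulting vector has length (number of centroids) × (descriptor dimension), so two images can be compared with a single dot product regardless of how many local features each one produced.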

arXiv.org e-Print Archive

Pure OAI Repository

Directory of Open Access Journals