15,890 research outputs found
Could Multimedia approaches help Remote Sensing Analysis?
International audienceThe paper explores how multimedia approaches used in image understanding tasks could be adapted and used in remote sensing image analysis. Two approaches are investigated: the classical Bag of Visual Words (BoVW) approach and the Deep Learning approach. Tests are performed for the classification of the UC Merced Land Use Dataset which provide better results than the state of the art
Revising Knowledge Discovery for Object Representation with Spatio-Semantic Feature Integration
In large social networks, web objects become increasingly popular. Multimedia object classification and representation is a necessary step of multimedia information retrieval. Indexing and organizing these web objects for the purpose of convenient browsing and search of the objects, and to effectively reveal interesting patterns from the objects. For all these tasks, classifying the web objects into manipulable semantic categories is an essential procedure. One important issue for classification of objects is the representation of images. To perform supervised classification tasks, the knowledge is extracted from unlabeled objects through unsupervised learning. In order to represent the images in a more meaningful and effective way rather than using the basic Bag-of-words (BoW) model, a novel image representation model called Bag-of-visual phrases(BoP) is used. In this model visual words are obtained using hierarchical clustering and visual phrases are generated by vector classifier of visual words. To obtain the Spatio-semantic correlation knowledge the frequently co-occurring pairs are calculated from visual vocabulary. After the successful object representation, the tags, comments, and descriptions of web objects are separated by using most likelihood method. The spatial and semantic differentiation power of image features can be enhanced via this BoP model and likelihood method.
DOI: 10.17762/ijritcc2321-8169.15065
Bag-of-Features Image Indexing and Classification in Microsoft SQL Server Relational Database
This paper presents a novel relational database architecture aimed to visual
objects classification and retrieval. The framework is based on the
bag-of-features image representation model combined with the Support Vector
Machine classification and is integrated in a Microsoft SQL Server database.Comment: 2015 IEEE 2nd International Conference on Cybernetics (CYBCONF),
Gdynia, Poland, 24-26 June 201
Learning Local Feature Aggregation Functions with Backpropagation
This paper introduces a family of local feature aggregation functions and a
novel method to estimate their parameters, such that they generate optimal
representations for classification (or any task that can be expressed as a cost
function minimization problem). To achieve that, we compose the local feature
aggregation function with the classifier cost function and we backpropagate the
gradient of this cost function in order to update the local feature aggregation
function parameters. Experiments on synthetic datasets indicate that our method
discovers parameters that model the class-relevant information in addition to
the local feature space. Further experiments on a variety of motion and visual
descriptors, both on image and video datasets, show that our method outperforms
other state-of-the-art local feature aggregation functions, such as Bag of
Words, Fisher Vectors and VLAD, by a large margin.Comment: In Proceedings of the 25th European Signal Processing Conference
(EUSIPCO 2017
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework which puts in evidence the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypothesis assumed and thus, the constraints imposed on the type of video
that each technique is able to address. Expliciting the hypothesis and
constraints makes the framework particularly useful to select a method, given
an application. Another advantage of the proposed organization is that it
allows categorizing newest approaches seamlessly with traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion in the end of
the paper, where we also present the main open issues in the area.Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
table
- …