View and Illumination Invariant Object Classification Based on 3D Color Histogram Using Convolutional Neural Networks
Object classification is an important step in visual recognition and semantic analysis of visual content. In this paper, we propose a method, based on a 3D color histogram, for classifying objects that is invariant to illumination color, illumination direction and viewpoint. The 3D color histogram of an image is represented as a 2D image, which captures the color composition while preserving the neighborhood information of color bins and so provides the visual cues needed for classification. The ability of a convolutional neural network (CNN) to learn invariant visual patterns is also exploited for object classification. The efficacy of the proposed method is demonstrated on the Amsterdam Library of Object Images (ALOI) dataset, captured under various illumination conditions and angles of view.
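As a rough illustration of the histogram-as-image idea described in this abstract, the sketch below builds a 3D RGB histogram with NumPy and lays its slices side by side to form a 2D array. The abstract does not say how the 3D-to-2D mapping is done, so the tiling layout, the bin count of 8, and the function names `color_histogram_3d` and `histogram_to_image` are all assumptions made for illustration; the CNN stage is not shown.

```python
import numpy as np


def color_histogram_3d(image, bins=8):
    """Return a normalized (bins, bins, bins) RGB histogram of an HxWx3 uint8 image.

    The bin count is an illustrative assumption, not a value from the paper.
    """
    pixels = image.reshape(-1, 3).astype(np.float64)
    hist, _ = np.histogramdd(
        pixels, bins=(bins, bins, bins), range=((0, 256),) * 3
    )
    total = hist.sum()
    return hist / total if total > 0 else hist


def histogram_to_image(hist):
    """Tile the slices along the third (blue) axis side by side.

    Maps a (b, b, b) histogram to a (b, b * b) array. This layout is a
    hypothetical choice: it keeps red/green neighbors adjacent inside each
    tile, while blue neighbors sit in adjacent tiles.
    """
    bins = hist.shape[2]
    return np.concatenate([hist[:, :, k] for k in range(bins)], axis=1)
```

The resulting 2D array could then be fed to any image classifier; which network the authors used is not stated in the abstract.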
Hallucinating optimal high-dimensional subspaces
Linear subspace representations of appearance variation are pervasive in
computer vision. This paper addresses the problem of robustly matching such
subspaces (computing the similarity between them) when they are used to
describe the scope of variations within sets of images of different (possibly
greatly so) scales. A naive solution of projecting the low-scale subspace into
the high-scale image space is described first and subsequently shown to be
inadequate, especially at large scale discrepancies. A successful approach is
proposed instead. It consists of (i) an interpolated projection of the
low-scale subspace into the high-scale space, which is followed by (ii) a
rotation of this initial estimate within the bounds of the imposed
"downsampling constraint". The rotation which best aligns the high-scale
reconstruction of the low-scale subspace with the reference it is compared to
is found in closed form. The method is evaluated on the problem of
matching sets of (i) face appearances under varying illumination and (ii)
object appearances under varying viewpoint, using two large data sets. In
comparison to the naive matching, the proposed algorithm is shown to greatly
increase the separation of between-class and within-class similarities, as well
as produce far more meaningful modes of common appearance on which the match
score is based. Comment: Pattern Recognition, 201
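For readers unfamiliar with subspace matching, the sketch below shows one common way to compare two linear subspaces of the same ambient dimension: the cosines of the principal angles between them. This is a generic illustration only. The abstract does not state which similarity measure the paper uses, and the interpolated projection and constrained rotation it proposes are not reproduced here. The function names `orthonormal_basis` and `principal_cosines` are hypothetical.

```python
import numpy as np


def orthonormal_basis(samples, dim):
    """Return a (n_features, dim) orthonormal basis for the span of the samples.

    `samples` has shape (n_features, n_samples); the basis is the first `dim`
    left singular vectors. The data are not centered, so this is a linear
    (not affine) subspace.
    """
    u, _, _ = np.linalg.svd(samples, full_matrices=False)
    return u[:, :dim]


def principal_cosines(basis_a, basis_b):
    """Cosines of the principal angles between two subspaces.

    Both inputs must have orthonormal columns and the same number of rows.
    Values near 1 mean the subspaces share a direction; values near 0 mean
    they are close to orthogonal.
    """
    singular_values = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    # Clip to guard against round-off slightly outside [0, 1].
    return np.clip(singular_values, 0.0, 1.0)
```

A set-to-set similarity score could then be taken as, for example, the largest cosine or the mean of the cosines; that choice is likewise an assumption rather than something the abstract specifies.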
Descriptor transition tables for object retrieval using unconstrained cluttered video acquired using a consumer level handheld mobile device
Visual recognition and vision-based retrieval of objects from large databases are tasks with a wide spectrum of potential applications. In this paper we propose a novel method for recognition from video sequences, suitable for retrieval from databases acquired in highly unconstrained conditions, e.g. using a mobile consumer-level device such as a phone. On the lowest level, we represent each sequence as a 3D mesh of densely packed local appearance descriptors. While image plane geometry is captured implicitly by a large overlap of neighbouring regions from which the descriptors are extracted, 3D information is extracted by means of a descriptor transition table, learnt from a single sequence for each known gallery object. These tables allow us to connect local descriptors along the third dimension (which corresponds to viewpoint changes), resulting in a set of variable-length Markov chains for each video. The matching of two sets of such chains is formulated as a statistical hypothesis test, whereby a subset of each is chosen to maximize the likelihood that the corresponding video sequences show the same object. The effectiveness of the proposed algorithm is empirically evaluated on the Amsterdam Library of Object Images and a new, highly challenging video data set acquired using a mobile phone. On both data sets our method is shown to recognize objects successfully in the presence of background clutter and large viewpoint changes. Postprin
Accurator: Nichesourcing for Cultural Heritage
With more and more cultural heritage data being published online, their
usefulness in this open context depends on the quality and diversity of
descriptive metadata for collection objects. In many cases, existing metadata
is not adequate for a variety of retrieval and research tasks and more specific
annotations are necessary. However, eliciting such annotations is a challenge
since it often requires domain-specific knowledge. Where crowdsourcing can be
successfully used for eliciting simple annotations, identifying people with the
required expertise might prove troublesome for tasks requiring more complex or
domain-specific knowledge. Nichesourcing addresses this problem, by tapping
into the expert knowledge available in niche communities. This paper presents
Accurator, a methodology for conducting nichesourcing campaigns for cultural
heritage institutions, by addressing communities, organizing events and
tailoring a web-based annotation tool to a domain of choice. The contribution
of this paper is threefold: 1) a nichesourcing methodology, 2) an annotation
tool for experts and 3) validation of the methodology and tool in three case
studies. The three domains of the case studies are birds on art, bible prints
and fashion images. We compare the quality and quantity of obtained annotations
in the three case studies, showing that the nichesourcing methodology in
combination with the image annotation tool can be used to collect high quality
annotations in a variety of domains and annotation tasks. A user evaluation
indicates that the tool is suitable and usable for domain-specific annotation tasks.
Optimising the number of channels in EEG-augmented image search
Recent proof-of-concept research has shown the applicability of Brain Computer Interface (BCI) technology, in combination with the human visual system, to classifying images. The basic premise is that images that arouse a participant’s attention generate a detectable response in their brainwaves, measurable using an electroencephalograph (EEG). When a participant is given a target class of images to search for, each image belonging to that target class presented within a stream of images should elicit a distinctly detectable neural response. Previous work in this domain has primarily focused on validating the technique on proof-of-concept image sets that demonstrate desired properties and on examining the capabilities of the technique at various image presentation speeds. In this paper we expand on this by examining the capability of the technique when using a reduced number of EEG channels, and the impact of that reduction on detection accuracy.
Extending the 5S Framework of Digital Libraries to support Complex Objects, Superimposed Information, and Content-Based Image Retrieval Services
Advanced services in digital libraries (DLs) have been developed and widely used to address the required capabilities of an assortment of systems as DLs expand into diverse application domains. These systems may require support for images (e.g., Content-Based Image Retrieval), Complex (information) Objects, and use of content at fine grain (e.g., Superimposed Information). Due to the lack of consensus on precise theoretical definitions for those services, implementation efforts often involve ad hoc development, leading to duplication and interoperability problems. This article presents a methodology to address those problems by extending a precisely specified minimal digital library (in the 5S framework) with formal definitions of the aforementioned services. The theoretical extensions of digital library functionality presented here are reinforced with practical case studies as well as scenarios for the individual and integrative use of services, to balance theory and practice. This methodology implies that other advanced services can be integrated into our current extended framework as they are identified. The theoretical definitions and case study we present may impact future development efforts and a wide range of digital library researchers, designers, and developers.