Perceptual-based textures for scene labeling: a bottom-up and a top-down approach
Due to the semantic gap, the automatic interpretation of digital images is a very challenging task. Both segmentation and classification are intricate because of the high variation in the data. Therefore, the application of appropriate features is of utmost importance. This paper presents biologically inspired texture features for material classification and for the interpretation of outdoor scenery images. Experiments show that the presented texture features obtain the best classification results for material recognition compared to other well-known texture features, with an average classification rate of 93.0%. For scene analysis, both a bottom-up and a top-down strategy are employed to bridge the semantic gap. First, images are segmented into regions based on the perceptual texture; next, a semantic label is calculated for these regions. Since this emerging interpretation is still error prone, domain knowledge is incorporated to achieve a more accurate description of the depicted scene. By applying both strategies, 91.9% of the pixels from outdoor scenery images obtained a correct label.
Person re-identification via efficient inference in fully connected CRF
In this paper, we address the person re-identification problem,
i.e., retrieving instances from gallery which are generated by the same person
as the given probe image. This is very challenging because the person's
appearance usually undergoes significant variations due to changes in
illumination, camera angle and view, background clutter, and occlusion over the
camera network. In this paper, we assume that the matched gallery images should
not only be similar to the probe, but also be similar to each other, under a
suitable metric. We express this assumption with a fully connected CRF model in
which each node corresponds to a gallery image and every pair of nodes is
connected by an edge. A label variable is associated with each node to indicate
whether the corresponding image is from the target person. We define a unary
potential for each node using existing feature calculation and matching
techniques, which reflects the similarity between the probe and the gallery
image, and define a pairwise potential for each edge in terms of a weighted
combination of Gaussian kernels, which encodes appearance similarity between
pairs of gallery images. The specific form of the pairwise potential allows us
to exploit an efficient inference algorithm to calculate the marginal
distribution of each label variable for this densely connected CRF. We show the
superiority of our method by applying it to public datasets and comparing with
the state of the art. Comment: 7 pages, 4 figures
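The model described in this abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a Potts-style pairwise term (a penalty when two gallery images receive different labels, scaled by their kernel similarity) and uses naive mean-field updates that cost O(n^2) per iteration, rather than the efficient inference the abstract refers to. The function names, the `(weight, bandwidth)` kernel parameterisation and the toy feature vectors are all hypothetical.

```python
import math


def gaussian_kernel(f_i, f_j, bandwidth):
    """Gaussian similarity between two appearance feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(f_i, f_j))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_field_marginals(unary_match, features, kernels, iterations=10):
    """Approximate P(label_i = match) for every gallery image.

    unary_match[i]: probe-to-gallery similarity, read as a probability.
    features[i]:    appearance feature vector of gallery image i.
    kernels:        list of (weight, bandwidth) pairs (assumed form).
    """
    n = len(unary_match)
    eps = 1e-9
    # Clip the unary terms so that the logarithms below stay finite.
    p = [min(max(u, eps), 1.0 - eps) for u in unary_match]

    # Pairwise strength: weighted combination of Gaussian kernels.
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                k[i][j] = sum(w * gaussian_kernel(features[i], features[j], b)
                              for w, b in kernels)

    q = list(p)  # initialise the marginals from the unary terms
    for _ in range(iterations):
        new_q = []
        for i in range(n):
            # Potts penalty: pay k[i][j] when labels i and j disagree.
            score_match = math.log(p[i]) - sum(
                k[i][j] * (1.0 - q[j]) for j in range(n))
            score_other = math.log(1.0 - p[i]) - sum(
                k[i][j] * q[j] for j in range(n))
            new_q.append(1.0 / (1.0 + math.exp(score_other - score_match)))
        q = new_q  # synchronous update of all nodes
    return q


# Toy usage: image 1 is ambiguous on its own but looks like image 0.
marginals = mean_field_marginals(
    unary_match=[0.9, 0.5, 0.2],
    features=[[0.0], [0.1], [5.0]],
    kernels=[(1.0, 1.0)],
)
```

In this toy case the ambiguous gallery image is pulled towards the "match" label because it resembles a confident match, which is the behaviour the abstract's assumption is meant to capture.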
Disconnected Skeleton: Shape at its Absolute Scale
We present a new skeletal representation along with a matching framework to
address the deformable shape recognition problem. The disconnectedness arises
as a result of excessive regularization that we use to describe a shape at an
attainably coarse scale. Our motivation is to rely on the stable properties of
the shape instead of inaccurately measured secondary details. The new
representation does not suffer from the common instability problems of
traditional connected skeletons, and the matching process gives quite
successful results on a diverse database of 2D shapes. An important difference
of our approach from the conventional use of the skeleton is that we replace
the local coordinate frame with a global Euclidean frame supported by
additional mechanisms to handle articulations and local boundary deformations.
As a result, we can produce descriptions that are sensitive to any combination
of changes in scale, position, orientation and articulation, as well as
invariant ones. Comment: The work excluding §V and §VI first appeared in ICCV 2005:
Aslan, C., Tari, S.: An Axis-Based Representation for Recognition. In
ICCV (2005) 1339-1346; Aslan, C.: Disconnected Skeletons for Shape
Recognition. Masters thesis, Department of Computer Engineering, Middle East
Technical University, May 200
A Multi-scale colour and Keypoint Density-based Approach for Visual Saliency Detection.
In the first seconds of observing an image, several visual attention processes are involved in identifying the visual targets that pop out from the scene. Saliency is the quality that makes certain regions of an image stand out from the visual field and grab our attention. Saliency detection models, inspired by visual cortex mechanisms, employ both colour and luminance features. Furthermore, both the locations of pixels and the presence of objects influence visual attention processes. In this paper, we propose a new saliency method based on the combination of the distribution of interest points in the image with multiscale analysis, a centre bias module and a machine learning approach. We use perceptually uniform colour spaces to study how colour affects the extraction of saliency. To investigate eye movements and assess the performance of saliency methods on object-based images, we conduct experimental sessions on our dataset ETTO (Eye Tracking Through Objects). Experiments show our approach to be accurate in detecting saliency compared with state-of-the-art methods on accessible eye-movement datasets. Performance on object-based images is excellent and remains consistent on generic pictures. In addition, our work reveals interesting findings on some relationships between saliency and perceptually uniform colour spaces.
Local, Semi-Local and Global Models for Texture, Object and Scene Recognition
This dissertation addresses the problems of recognizing textures, objects, and scenes in photographs. We present approaches to these recognition tasks that combine salient local image features with spatial relations and effective discriminative learning techniques. First, we introduce a bag of features image model for recognizing textured surfaces under a wide range of transformations, including viewpoint changes and non-rigid deformations. We present results of a large-scale comparative evaluation indicating that bags of features can be effective not only for texture, but also for object categorization, even in the presence of substantial clutter and intra-class variation. We also show how to augment the purely local image representation with statistical co-occurrence relations between pairs of nearby features, and develop a learning and classification framework for the task of classifying individual features in a multi-texture image. Next, we present a more structured alternative to bags of features for object recognition, namely, an image representation based on semi-local parts, or groups of features characterized by stable appearance and geometric layout. Semi-local parts are automatically learned from small sets of unsegmented, cluttered images. Finally, we present a global method for recognizing scene categories that works by partitioning the image into increasingly fine sub-regions and computing histograms of local features found inside each sub-region. The resulting spatial pyramid representation demonstrates significantly improved performance on challenging scene categorization tasks.
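The global method mentioned at the end of this abstract (partition the image into increasingly fine sub-regions and histogram the local features in each) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: features are assumed to be already quantised into visual-word indices, coordinates are assumed to be normalised to [0, 1), and the per-level weighting and matching kernel that a full method would need are omitted. The function name and argument layout are hypothetical.

```python
def spatial_pyramid_histogram(features, vocab_size, levels):
    """Concatenate visual-word histograms over a grid pyramid.

    features:   list of (x, y, word) with x, y in [0, 1) and
                0 <= word < vocab_size (assumed input format).
    vocab_size: number of visual words.
    levels:     finest level L; level l splits the image into
                2**l by 2**l cells, for l = 0 .. L.
    """
    descriptor = []
    for level in range(levels + 1):
        cells = 2 ** level
        hist = [0] * (cells * cells * vocab_size)
        for x, y, word in features:
            # Locate the grid cell; clamp in case a coordinate equals 1.0.
            cx = min(int(x * cells), cells - 1)
            cy = min(int(y * cells), cells - 1)
            hist[(cy * cells + cx) * vocab_size + word] += 1
        descriptor.extend(hist)
    return descriptor


# Toy usage: three features, two visual words, levels 0 and 1.
toy = [(0.1, 0.1, 0), (0.9, 0.9, 1), (0.6, 0.2, 0)]
print(spatial_pyramid_histogram(toy, vocab_size=2, levels=1))
# prints [2, 1, 1, 0, 1, 0, 0, 0, 0, 1]
```

The first two entries are the ordinary bag-of-features histogram (level 0); the remaining eight record which quadrant each visual word fell into, which is the spatial information a plain bag of features discards.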
Towards Smarter Fluorescence Microscopy: Enabling Adaptive Acquisition Strategies With Optimized Photon Budget
Fluorescence microscopy is an invaluable technique for studying the intricate process of organism development. The acquisition process, however, is associated with a fundamental trade-off between the quality and reliability of the acquired data. On one hand, the goal of capturing the development in its entirety, often across multiple spatial and temporal scales, requires extended acquisition periods. On the other hand, the high doses of light required for such experiments are harmful to living samples and can introduce non-physiological artifacts into the normal course of development. Conventionally, a single set of acquisition parameters is chosen at the beginning of the acquisition and constitutes the experimenter’s best guess of the overall optimal configuration within the aforementioned trade-off. In the paradigm of adaptive microscopy, in turn, one aims at achieving a more efficient photon budget distribution by dynamically adjusting the acquisition parameters to the changing properties of the sample. In this thesis, I explore the principles of adaptive microscopy and propose a range of improvements for two real imaging scenarios.
Chapter 2 summarizes the design and implementation of an adaptive pipeline for efficient observation of the asymmetrically dividing neurogenic progenitors in Zebrafish retina. In the described approach the fast and expensive acquisition mode is automatically activated only when the mitotic cells are present in the field of view. The method illustrates the benefits of the adaptive acquisition in the common scenario of the individual events of interest being sparsely distributed throughout the duration of the acquisition.
Chapter 3 focuses on computational aspects of segmentation-based adaptive schemes for efficient acquisition of the developing Drosophila pupal wing. Fast sample segmentation is shown to provide a valuable output for the accurate evaluation of the sample morphology and dynamics in real time. This knowledge proves instrumental for adjusting the acquisition parameters to the current properties of the sample and reducing the required photon budget with minimal effect on the quality of the acquired data.
Chapter 4 addresses the generation of synthetic training data for learning-based methods in bioimage analysis, making them more practical and accessible for smart microscopy pipelines. State-of-the-art deep learning models trained exclusively on the generated synthetic data are shown to yield powerful predictions when applied to real microscopy images. Finally, an in-depth evaluation of the segmentation quality of both real and synthetic data-based models illustrates the important practical aspects of the approach and outlines directions for further research.
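The event-triggered scheme summarised for Chapter 2 (switch to the fast, expensive mode only while mitotic cells are in the field of view) can be illustrated with a toy planner. This is a sketch under stated assumptions, not the thesis pipeline: the per-frame detection counts are taken as given, and the relative photon costs `slow_cost` and `fast_cost` are hypothetical placeholders rather than measured values.

```python
def plan_acquisition(mitotic_counts, slow_cost=1.0, fast_cost=10.0):
    """Choose an acquisition mode per time point and total the light dose.

    mitotic_counts: number of mitotic cells detected at each time point
                    (assumed to come from an upstream detector).
    slow_cost, fast_cost: hypothetical relative photon cost per time point.
    Returns (modes, adaptive_cost, always_fast_cost).
    """
    # Fast mode only when at least one event of interest is in view.
    modes = ["fast" if count > 0 else "slow" for count in mitotic_counts]
    adaptive_cost = sum(fast_cost if mode == "fast" else slow_cost
                        for mode in modes)
    always_fast_cost = fast_cost * len(modes)
    return modes, adaptive_cost, always_fast_cost


# Toy usage: events of interest appear in two of five time points.
modes, adaptive, baseline = plan_acquisition([0, 0, 2, 1, 0])
print(modes)               # ['slow', 'slow', 'fast', 'fast', 'slow']
print(adaptive, baseline)  # 23.0 50.0
```

With sparse events, the adaptive plan spends most time points in the cheap mode, which is the photon-budget saving the abstract describes.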
An attention model and its application in man-made scene interpretation
The ultimate aim of research into computer vision is to design a system which interprets
its surrounding environment as effortlessly as a human does. However, the
state of technology is far from achieving such a goal. In this thesis different components of
a computer vision system that are designed for the task of interpreting man-made scenes,
in particular images of buildings, are described. The flow of information in the proposed
system is bottom-up, i.e., the image is first segmented into its meaningful components and
subsequently the regions are labelled using a contextual classifier.
Starting from simple observations concerning the human vision system and the gestalt laws
of human perception, like the law of “good (simple) shape” and “perceptual grouping”, a
blob detector is developed that identifies components in a 2D image. These components
are convex regions of interest, with interest being defined as significant gradient magnitude
content. An eye tracking experiment is conducted, which shows that the regions identified
by the blob detector correlate significantly with the regions which drive the attention of
viewers.
Having identified these blobs, it is postulated that a blob represents an object, linguistically
identified with its own semantic name. In other words, a blob may contain a window, a
door or a chimney in a building. These regions are used to identify and segment higher
order structures in a building, like facades and window arrays, and also environmental regions
like sky and ground.
Because of inconsistency in the unary features of buildings, a contextual learning algorithm
is used to classify the segmented regions. A model is used which learns spatial and
topological relationships between different objects from a set of hand-labelled data. This
model utilises this information in an MRF to achieve consistent labellings of new scenes
- …