    Image Retrieval Using Image Captioning

    The rapid growth in the availability of the Internet and smartphones has led to increased use of social media in recent years. This increased usage has in turn driven exponential growth in the number of available digital images. Image retrieval systems therefore play a major role in fetching images relevant to a user's query. These systems should also be able to handle the massive growth of data and take advantage of emerging technologies such as deep learning and image captioning. This report examines the purpose of image retrieval and surveys previous research in the field. It also analyzes gaps in that research and discusses the role of image captioning in retrieval systems. Finally, the report proposes a new image retrieval method based on image captioning, presents its results, and compares them with those of previous work.
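The idea of retrieving images through their captions can be illustrated with a small text-matching sketch. It assumes each image already has a caption produced by some captioning model (not shown) and ranks images by TF-IDF cosine similarity between the query and the captions. The function names, the smoothed IDF and the toy captions are illustrative assumptions, not the report's actual pipeline.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(captions: dict[str, str]):
    """Compute a TF-IDF vector per image caption plus the shared IDF table."""
    docs = {img: Counter(tokenize(cap)) for img, cap in captions.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for counts in docs.values() for term in counts)
    # Smoothed IDF (an assumption): terms in every caption still get a small weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {img: {t: tf * idf[t] for t, tf in c.items()} for img, c in docs.items()}
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, captions: dict[str, str], top_k: int = 3):
    """Return the top_k (image, score) pairs ranked by caption similarity."""
    vectors, idf = build_index(captions)
    # Query words that never occur in any caption carry no weight.
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [(img, cosine(q_vec, vec)) for img, vec in vectors.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_k]


captions = {
    "img1.jpg": "a dog running on the beach",
    "img2.jpg": "a plate of pasta on a table",
    "img3.jpg": "two dogs playing in the snow",
}
print(retrieve("dog on a beach", captions))
```

Plain word matching misses near-synonyms and inflections ("dogs" does not match "dog" here), which is one reason a real system might use learned text embeddings instead.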

    Retinal vessel segmentation using textons

    Segmenting vessels from retinal images, like segmentation in many other medical imaging domains, is a challenging task, as there is no unified approach that extracts the vessels accurately. However, it is the most critical stage in the automatic assessment of various diseases (e.g. glaucoma, age-related macular degeneration, diabetic retinopathy and cardiovascular disease). Our research investigates retinal image segmentation approaches based on textons, as they provide a compact description of texture that can be learnt from a training set. This thesis presents a brief review of these diseases, including their current situation, future trends and the techniques used for their automatic diagnosis in routine clinical applications. The importance of retinal vessel segmentation in such applications is particularly emphasized. An extensive review of previous work on retinal vessel segmentation and salient texture analysis methods is also presented. Five automatic retinal vessel segmentation methods are proposed in this thesis. The first method addresses the problem of removing pathological anomalies (drusen, exudates), which other researchers have identified as a common source of error in retinal vessel segmentation. The results show that the modified method offers some improvement over a previously published method. The second, a novel supervised segmentation method, employs textons. We propose a new filter bank (MR11) that includes bar detectors for vascular feature extraction and other kernels to detect edges and photometric variations in the image. The k-means clustering algorithm is adopted for texton generation based on the vessel and non-vessel elements identified by the ground truth.
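Supervised texton learning of this kind can be sketched as follows: cluster the filter-response vectors of ground-truth vessel pixels and of background pixels separately with k-means, keep the cluster centres as class-tagged textons, and label new pixels by their nearest texton. This is a minimal sketch under stated assumptions: the 2-D `responses` below stand in for per-pixel filter-bank outputs (MR11 itself is not implemented), and `k_per_class`, the iteration count and the seeded initialisation are illustrative choices.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means with centres initialised from a seeded random sample."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centres[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centre if a cluster becomes empty
                centres[j] = [sum(c) / len(members) for c in zip(*members)]
    return centres


def learn_textons(responses, labels, k_per_class=2):
    """Cluster each ground-truth class separately; labels: 1 = vessel, 0 = background."""
    textons = []
    for cls in (0, 1):
        class_points = [r for r, l in zip(responses, labels) if l == cls]
        textons += [(centre, cls) for centre in kmeans(class_points, k_per_class)]
    return textons


def segment(responses, textons):
    """Label each pixel with the class of its nearest texton."""
    return [min(textons, key=lambda t: math.dist(r, t[0]))[1] for r in responses]


# Hypothetical 2-D filter responses: background near (0, 0), vessels near (5, 5).
responses = [(0, 0), (0.5, 0.2), (0.1, 0.6), (0.3, 0.3),
             (5, 5), (5.5, 4.8), (4.7, 5.2), (5.2, 5.1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
textons = learn_textons(responses, labels)
print(segment([(0.2, 0.1), (5.1, 4.9)], textons))
```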
The third, improved supervised method builds on the second: textons are generated by k-means clustering, and texton maps representing vessels are derived by back-projecting pixel clusters onto hand-labelled ground truth. A further step ensures that the best combinations of textons are represented in the map and subsequently used to identify vessels in the test set. Experimental results on two benchmark datasets show that the proposed method performs well compared with other published work and with human experts. A further test on an independent set of optical fundus images confirmed its consistent performance. Statistical analysis of the experimental results also reveals that it is possible to train unified textons for retinal vessel segmentation. The fourth method is a novel scheme, inspired by the human visual system, that uses a Gabor filter bank for vessel feature extraction. Machine learning is used to optimize the Gabor filter parameters. The experimental results demonstrate that this method significantly increases the true positive rate while maintaining specificity comparable with other approaches. Finally, we propose a new unsupervised texton-based retinal vessel segmentation method using a derivative of SIFT and multi-scale Gabor filters. The motivation for this approach is the lack of sufficient hand-labelled ground truth and the high variability of ground-truth labels among experts. The evaluation results reveal that our unsupervised segmentation method is comparable with the best supervised and state-of-the-art methods.
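A Gabor filter bank for line-like structures can be sketched by building the standard real Gabor kernel at several orientations and keeping the strongest response at each pixel. This sketch assumes bright vessels on a dark background (for example an inverted green channel); the values of `sigma`, `lambd`, `gamma` and the number of orientations are hand-picked placeholders, whereas the thesis optimizes the parameters with machine learning.

```python
import math


def gabor_kernel(sigma, theta, lambd, gamma=0.5, psi=0.0, half=None):
    """Real Gabor kernel: Gaussian envelope times a cosine carrier rotated by theta."""
    half = half if half is not None else int(3 * sigma)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lambd + psi))
        kernel.append(row)
    return kernel


def response_at(image, kernel, cy, cx):
    """Correlate the kernel with the image around (cy, cx); outside pixels count as 0."""
    half = len(kernel) // 2
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                total += image[y][x] * kernel[dy + half][dx + half]
    return total


def max_orientation_response(image, cy, cx, sigma=2.0, lambd=8.0, n_orient=8):
    """Return (best response, best angle) over n_orient evenly spaced orientations."""
    return max(
        (response_at(image, gabor_kernel(sigma, k * math.pi / n_orient, lambd), cy, cx),
         k * math.pi / n_orient)
        for k in range(n_orient)
    )


# A 15x15 toy image containing one bright vertical line at column 7.
image = [[1.0 if x == 7 else 0.0 for x in range(15)] for _ in range(15)]
print(max_orientation_response(image, 7, 7))
```

With `theta = 0` the carrier oscillates along x, so this orientation responds most strongly to the vertical line; taking the maximum over orientations makes the feature insensitive to vessel direction.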

    Patch-based semantic labelling of images

    The work presented in this PhD thesis focuses on associating semantics with the content of an image, linking that content to high-level semantic categories. This can take place at two levels: at image level, for image categorisation, or at pixel level, for semantic segmentation or semantic labelling. To this end, an analysis framework is proposed, and the steps of part (or patch) extraction, description and probabilistic modelling are detailed. Parts of different kinds are used, and one contribution is a method to complement the information associated with them. Context for parts has to be considered at different scales. Short-range pixel dependencies are accounted for by associating pixels with larger patches. A Conditional Random Field, that is, a probabilistic discriminative graphical model, is used to model medium-range dependencies between neighbouring patches. Another contribution is an efficient method to consider rich neighbourhoods without introducing loops into the inference graph. To this end, weak neighbours are introduced: neighbours whose label probability distribution is pre-estimated rather than updated during inference. Longer-range dependencies, which tend to make the inference problem intractable, are addressed as well. A novel descriptor based on local histograms of visual words is proposed, meant both to complement the feature descriptor of the patches and to increase context awareness in the patch labelling process. Finally, an alternative approach to handling multiple scales in a hierarchical framework based on image pyramids is proposed. An image pyramid is a compositional representation of the image based on hierarchical clustering. All contributions are detailed throughout the thesis, and experimental results on publicly available datasets are reported to assess their validity.
A critical comparison with the state of the art in this research area is also presented, highlighting the advantages of the proposed improvements.
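A descriptor built from local histograms of visual words can be sketched as follows. It assumes each patch has already been quantised to a visual-word index on a regular grid; the histogram of word indices in a square window around a patch is appended to that patch's own feature vector. The window shape, `radius`, `vocab_size` and the normalisation are illustrative assumptions, not the thesis's exact descriptor.

```python
def local_word_histogram(word_grid, row, col, radius, vocab_size):
    """Normalised histogram of visual-word indices in a square window, clipped at borders."""
    counts = [0] * vocab_size
    for r in range(max(0, row - radius), min(len(word_grid), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(word_grid[0]), col + radius + 1)):
            counts[word_grid[r][c]] += 1
    total = sum(counts)
    return [n / total for n in counts]


def augmented_descriptor(patch_feature, word_grid, row, col, radius=1, vocab_size=4):
    """Concatenate a patch's own features with the word histogram of its neighbourhood."""
    return list(patch_feature) + local_word_histogram(word_grid, row, col, radius, vocab_size)


# Hypothetical 3x3 grid of patches, each assigned one of 4 visual words.
grid = [[0, 0, 1],
        [0, 2, 1],
        [3, 3, 1]]
print(local_word_histogram(grid, 0, 0, 1, 4))
print(augmented_descriptor([1.0, 2.0], grid, 1, 1))
```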

    Characterisation of Dynamic Process Systems by Use of Recurrence Texture Analysis

    This thesis proposes a method to analyse the dynamic behaviour of process systems using sets of textural features extracted from distance matrices obtained from time series data. Algorithms based on grey-level co-occurrence matrices, wavelet transforms, local binary patterns, textons and pretrained convolutional neural networks (AlexNet and VGG16) were used to extract features. The method was shown to capture the dynamics of mineral process systems effectively and was able to outperform competing approaches.
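The first of the listed feature extractors can be sketched end to end: embed the time series, build a distance matrix between states, quantise it to grey levels, and compute two classic grey-level co-occurrence matrix (GLCM) statistics. The embedding dimension, delay, number of grey levels and the single horizontal offset are assumptions chosen for brevity; the wavelet, LBP, texton and CNN features are not shown.

```python
import math


def distance_matrix(series, dim=2, delay=1):
    """Pairwise Euclidean distances between time-delay embedded states."""
    n = len(series) - (dim - 1) * delay
    states = [[series[i + k * delay] for k in range(dim)] for i in range(n)]
    return [[math.dist(a, b) for b in states] for a in states]


def quantise(matrix, levels=8):
    """Map distances linearly onto integer grey levels 0..levels-1."""
    hi = max(max(row) for row in matrix)
    if hi == 0:
        return [[0] * len(row) for row in matrix]
    return [[min(int(v / hi * levels), levels - 1) for v in row] for row in matrix]


def glcm(img, levels, dx=1, dy=0):
    """Symmetric, normalised grey-level co-occurrence matrix for a non-negative offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    for y in range(len(img) - dy):
        for x in range(len(img[0]) - dx):
            a, b = img[y][x], img[y + dy][x + dx]
            counts[a][b] += 1
            counts[b][a] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def texture_features(series, levels=8):
    """GLCM contrast and homogeneity of the series' quantised distance matrix."""
    p = glcm(quantise(distance_matrix(series), levels), levels)
    pairs = [(i, j) for i in range(levels) for j in range(levels)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in pairs)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i, j in pairs)
    return {"contrast": contrast, "homogeneity": homogeneity}


print(texture_features([math.sin(0.3 * t) for t in range(60)]))
```

A perfectly steady signal gives a uniform distance matrix (zero contrast, homogeneity of 1), so changes in these statistics over successive windows can flag shifts in process dynamics.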

    Enhancing spatio-chromatic representation with more-than-three color coding for image description

    The extraction of spatio-chromatic features from color images is usually performed independently on each color channel. Common 3D color spaces, such as RGB, exhibit high inter-channel correlation for natural images. This correlation can be reduced using color-opponent representations, but the spatial structure of regions with small color differences is not fully captured in the two generic Red-Green and Blue-Yellow channels. To overcome these problems, we propose a new color coding that is adapted to the specific content of each image. Our proposal is based on two steps: (a) setting the number of channels to the number of distinctive colors found in each image (avoiding the problem of channel correlation), and (b) building a channel representation that maximizes contrast differences within each color channel (avoiding the problem of low local contrast). We call this approach more-than-three color coding (MTT) to emphasize that the number of channels is adapted to the image content: the higher the color complexity of an image, the more channels can be used to represent it. Here we select the most predominant colors in the image as distinctive colors, which we call color pivots, and build the new color coding using these color pivots as a basis. To evaluate the proposed approach, we measure its efficiency on an image categorization task. We show how a generic descriptor improves performance at the description level when applied to the MTT coding.
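The two MTT steps can be approximated with a rough sketch: (a) pick color pivots as the coarse RGB bins that hold a sizeable share of the pixels, so the channel count follows the image's color content, and (b) build one channel per pivot. The binning, the `min_share` threshold and the Gaussian similarity used for step (b) are all assumptions; the paper's contrast-maximizing channel construction is not reproduced here.

```python
import math
from collections import Counter


def color_pivots(pixels, bin_size=64, min_share=0.1):
    """Predominant colours: centres of coarse RGB bins holding at least min_share of pixels."""
    bins = Counter(tuple(v // bin_size for v in p) for p in pixels)
    n = len(pixels)
    return [tuple(b * bin_size + bin_size // 2 for b in key)
            for key, count in bins.most_common() if count / n >= min_share]


def mtt_channels(pixels, pivots, scale=60.0):
    """One channel per pivot (assumed form): Gaussian similarity of each pixel to the pivot."""
    return [[math.exp(-math.dist(p, q) ** 2 / (2 * scale ** 2)) for p in pixels]
            for q in pivots]


# Hypothetical image as a flat pixel list: mostly red, some blue, one grey pixel.
pixels = [(250, 10, 10)] * 6 + [(10, 10, 250)] * 3 + [(128, 128, 128)]
pivots = color_pivots(pixels, min_share=0.2)
channels = mtt_channels(pixels, pivots)
print(pivots, len(channels))
```

Raising `min_share` yields fewer channels for the same image, which mirrors the idea that the number of channels tracks how many distinctive colors the image contains.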

    Perceptual texture similarity estimation

    This thesis evaluates the ability of computational features to estimate perceptual texture similarity. In the first part of this thesis, we conducted two experiments evaluating the ability of 51 computational feature sets to estimate perceptual texture similarity using two different evaluation methods, namely pair-of-pairs based and retrieval based evaluations. These experiments compared the computational features with two sets of human-derived ground-truth data, both of higher resolution than those commonly used. The first was obtained by free-grouping and the second by pair-of-pairs experiments. Using these higher-resolution data, we found that the feature sets do not perform well compared with human judgements. Our analysis shows that these computational feature sets either (1) only exploit power spectrum information or (2) only compute higher-order statistics (HoS) on, at most, small local neighbourhoods. In other words, they cannot capture aperiodic, long-range spatial relationships. As we hypothesise that these long-range interactions are important for the human perception of texture similarity, we carried out two more pair-of-pairs experiments, the results of which indicate that long-range interactions do provide humans with important cues for the perception of texture similarity. In the second part of this thesis, we develop new texture features that can encode such information. We first examine the importance of three different types of visual information for human perception of texture. Our results show that contours are the most critical type of information for human discrimination of textures. Finally, we report the development of a new set of contour-based features, which performed well on the free-grouping data and outperformed the 51 feature sets and another contour-type feature set on the pair-of-pairs data.
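A pair-of-pairs evaluation can be scored by checking, for each trial, whether the feature distance picks the same pair as the human observers. In this sketch the trial format `(pair_a, pair_b, human_choice)`, the Euclidean feature distance and the toy feature vectors are assumptions for illustration; the thesis's exact protocol and data are not reproduced.

```python
import math


def pair_of_pairs_agreement(features, trials):
    """Fraction of trials where the feature distance agrees with the human choice.

    Each trial is ((a1, a2), (b1, b2), choice), where choice is 0 if humans judged
    pair a more similar and 1 if they judged pair b more similar.
    """
    agree = 0
    for (a1, a2), (b1, b2), choice in trials:
        d_a = math.dist(features[a1], features[a2])
        d_b = math.dist(features[b1], features[b2])
        predicted = 0 if d_a < d_b else 1  # the smaller distance means "more similar"
        agree += predicted == choice
    return agree / len(trials)


# Hypothetical 2-D feature vectors for four textures and three human judgements.
features = {"t1": (0, 0), "t2": (0, 1), "t3": (5, 5), "t4": (0, 0.5)}
trials = [
    (("t1", "t2"), ("t1", "t3"), 0),
    (("t1", "t3"), ("t2", "t4"), 1),
    (("t1", "t4"), ("t1", "t2"), 1),
]
print(pair_of_pairs_agreement(features, trials))
```

The third trial disagrees with the feature distance, so the agreement is two out of three; a feature set matching human perception well would approach 1.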