30,729 research outputs found

    Exploiting multimedia content : a machine learning based approach

    Get PDF
    Advisors: Prof. M Gopal, Prof. Santanu Chaudhury. Date and location of PhD thesis defense: 10 September 2013, Indian Institute of Technology DelhiThis thesis explores use of machine learning for multimedia content management involving single/multiple features, modalities and concepts. We introduce shape based feature for binary patterns and apply it for recognition and retrieval application in single and multiple feature based architecture. The multiple feature based recognition and retrieval frameworks are based on the theory of multiple kernel learning (MKL). A binary pattern recognition framework is presented by combining the binary MKL classifiers using a decision directed acyclic graph. The evaluation is shown for Indian script character recognition, and MPEG7 shape symbol recognition. A word image based document indexing framework is presented using the distance based hashing (DBH) defined on learned pivot centres. We use a new multi-kernel learning scheme using a Genetic Algorithm for developing a kernel DBH based document image retrieval system. The experimental evaluation is presented on document collections of Devanagari, Bengali and English scripts. Next, methods for document retrieval using multi-modal information fusion are presented. Text/Graphics segmentation framework is presented for documents having a complex layout. We present a novel multi-modal document retrieval framework using the segmented regions. The approach is evaluated on English magazine pages. A document script identification framework is presented using decision level aggregation of page, paragraph and word level prediction. Latent Dirichlet Allocation based topic modelling with modified edit distance is introduced for the retrieval of documents having recognition inaccuracies. A multi-modal indexing framework for such documents is presented by a learning based combination of text and image based properties. Experimental results are shown on Devanagari script documents. Finally, we have investigated concept based approaches for multimedia analysis. A multi-modal document retrieval framework is presented by combining the generative and discriminative modelling for exploiting the cross-modal correlation between modalities. The combination is also explored for semantic concept recognition using multi-modal components of the same document, and different documents over a collection. An experimental evaluation of the framework is shown for semantic event detection in sport videos, and semantic labelling of components of multi-modal document images

    The application of user log for online business environment using content-based Image retrieval system

    Get PDF
    Over the past few years, inter-query learning has gained much attention in the research and development of content-based image retrieval (CBIR) systems. This is largely due to the capability of inter-query approach to enable learning from the retrieval patterns of previous query sessions. However, much of the research works in this field have been focusing on analyzing image retrieval patterns stored in the database. This is not suitable for a dynamic environment such as the World Wide Web (WWW) where images are constantly added or removed. A better alternative is to use an image's visual features to capture the knowledge gained from the previous query sessions. Based on the previous work (Chung et al., 2006), the aim of this paper is to propose a framework of inter-query learning for the WWW-CBIR systems. Such framework can be extremely useful for those online companies whose core business involves providing multimedia content-based services and products to their customers

    Direct kernel biased discriminant analysis: a new content-based image retrieval relevance feedback algorithm

    Get PDF
    In recent years, a variety of relevance feedback (RF) schemes have been developed to improve the performance of content-based image retrieval (CBIR). Given user feedback information, the key to a RF scheme is how to select a subset of image features to construct a suitable dissimilarity measure. Among various RF schemes, biased discriminant analysis (BDA) based RF is one of the most promising. It is based on the observation that all positive samples are alike, while in general each negative sample is negative in its own way. However, to use BDA, the small sample size (SSS) problem is a big challenge, as users tend to give a small number of feedback samples. To explore solutions to this issue, this paper proposes a direct kernel BDA (DKBDA), which is less sensitive to SSS. An incremental DKBDA (IDKBDA) is also developed to speed up the analysis. Experimental results are reported on a real-world image collection to demonstrate that the proposed methods outperform the traditional kernel BDA (KBDA) and the support vector machine (SVM) based RF algorithms

    Convolutional Sparse Kernel Network for Unsupervised Medical Image Analysis

    Full text link
    The availability of large-scale annotated image datasets and recent advances in supervised deep learning methods enable the end-to-end derivation of representative image features that can impact a variety of image analysis problems. Such supervised approaches, however, are difficult to implement in the medical domain where large volumes of labelled data are difficult to obtain due to the complexity of manual annotation and inter- and intra-observer variability in label assignment. We propose a new convolutional sparse kernel network (CSKN), which is a hierarchical unsupervised feature learning framework that addresses the challenge of learning representative visual features in medical image analysis domains where there is a lack of annotated training data. Our framework has three contributions: (i) We extend kernel learning to identify and represent invariant features across image sub-patches in an unsupervised manner. (ii) We initialise our kernel learning with a layer-wise pre-training scheme that leverages the sparsity inherent in medical images to extract initial discriminative features. (iii) We adapt a multi-scale spatial pyramid pooling (SPP) framework to capture subtle geometric differences between learned visual features. We evaluated our framework in medical image retrieval and classification on three public datasets. Our results show that our CSKN had better accuracy when compared to other conventional unsupervised methods and comparable accuracy to methods that used state-of-the-art supervised convolutional neural networks (CNNs). Our findings indicate that our unsupervised CSKN provides an opportunity to leverage unannotated big data in medical imaging repositories.Comment: Accepted by Medical Image Analysis (with a new title 'Convolutional Sparse Kernel Network for Unsupervised Medical Image Analysis'). The manuscript is available from following link (https://doi.org/10.1016/j.media.2019.06.005

    Multitraining support vector machine for image retrieval

    Get PDF
    Relevance feedback (RF) schemes based on support vector machines (SVMs) have been widely used in content-based image retrieval (CBIR). However, the performance of SVM-based RF approaches is often poor when the number of labeled feedback samples is small. This is mainly due to 1) the SVM classifier being unstable for small-size training sets because its optimal hyper plane is too sensitive to the training examples; and 2) the kernel method being ineffective because the feature dimension is much greater than the size of the training samples. In this paper, we develop a new machine learning technique, multitraining SVM (MTSVM), which combines the merits of the cotraining technique and a random sampling method in the feature space. Based on the proposed MTSVM algorithm, the above two problems can be mitigated. Experiments are carried out on a large image set of some 20 000 images, and the preliminary results demonstrate that the developed method consistently improves the performance over conventional SVM-based RFs in terms of precision and standard deviation, which are used to evaluate the effectiveness and robustness of a RF algorithm, respectively
    corecore