    Autoencoding the Retrieval Relevance of Medical Images

    Content-based image retrieval (CBIR) of medical images is a crucial task that can contribute to a more reliable diagnosis if applied to big data. Recent advances in feature extraction and classification have enormously improved CBIR results for digital images. However, considering the increasing accessibility of big data in medical imaging, we still need to reduce both the memory requirements and the computational expense of image retrieval systems. This work proposes to exclude the features of image blocks that exhibit a low encoding error when learned by an n/p/n autoencoder (p < n). We examine the histogram of autoencoding errors of image blocks for each image class to help decide which image regions, or roughly what percentage of an image, shall be declared relevant for the retrieval task. This reduces feature dimensionality and speeds up the retrieval process. To validate the proposed scheme, we employ local binary patterns (LBP) and support vector machines (SVM), both well-established approaches in the CBIR research community. We also use the IRMA dataset with 14,410 x-ray images as test data. The results show that the dimensionality of annotated feature vectors can be reduced by up to 50%, resulting in speedups greater than 27% at the expense of a less than 1% decrease in retrieval accuracy when validating the precision and recall of the top 20 hits. Comment: To appear in proceedings of The 5th International Conference on Image Processing Theory, Tools and Applications (IPTA'15), Nov 10-13, 2015, Orleans, France
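The block-selection step described in this abstract can be illustrated with a minimal sketch. It assumes the per-block autoencoder reconstruction errors have already been computed, and it uses a single hypothetical `keep_fraction` parameter in place of the per-class histogram analysis from the paper. The function names are illustrative and are not taken from the authors' code.

```python
from typing import List, Sequence


def select_relevant_blocks(errors: Sequence[float], keep_fraction: float) -> List[int]:
    """Return the indices of the blocks with the largest reconstruction errors.

    Blocks that the autoencoder reconstructs with low error are treated as
    irrelevant for retrieval and dropped (assumption: one global fraction).
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(keep_fraction * len(errors)))
    # Rank block indices by error, highest first.
    ranked = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    # Restore the original block order among the kept blocks.
    return sorted(ranked[:n_keep])


def reduced_feature_vector(block_features: Sequence[Sequence[float]],
                           kept: Sequence[int]) -> List[float]:
    """Concatenate the per-block features of the kept blocks only."""
    return [value for i in kept for value in block_features[i]]


# Hypothetical errors and per-block features for a four-block image.
errors = [0.02, 0.40, 0.05, 0.31]
features = [[1, 2], [3, 4], [5, 6], [7, 8]]
kept = select_relevant_blocks(errors, keep_fraction=0.5)
print(kept, reduced_feature_vector(features, kept))
```

With `keep_fraction=0.5` the concatenated feature vector has half its original length, which corresponds to the "up to 50%" dimensionality reduction the abstract reports.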

    MinMax Radon Barcodes for Medical Image Retrieval

    Content-based medical image retrieval can support diagnostic decisions by clinical experts. Examining similar images may provide clues to the expert to remove uncertainties in his/her final diagnosis. Beyond conventional feature descriptors, binary features of various kinds have recently been proposed to encode the image content. A recent proposal is "Radon barcodes", which employ binarized Radon projections to tag/annotate medical images with content-based binary vectors, called barcodes. In this paper, MinMax Radon barcodes are introduced, which are superior to the "local thresholding" scheme suggested in the literature. Using the IRMA dataset with 14,410 x-ray images from 193 different classes, the advantage of using MinMax Radon barcodes over thresholded Radon barcodes is demonstrated. The retrieval error for direct search drops by more than 15%. In addition, SURF, a well-established non-binary approach, and BRISK, a recent binary method, are examined to compare their results with MinMax Radon barcodes when retrieving images from the IRMA dataset. The results demonstrate that MinMax Radon barcodes are faster and more accurate when applied to IRMA images. Comment: To appear in proceedings of the 12th International Symposium on Visual Computing, December 12-14, 2016, Las Vegas, Nevada, US
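As a rough illustration of the baseline that this abstract compares against, the sketch below builds a thresholded barcode from two axis-aligned projections and compares barcodes by Hamming distance. It makes several assumptions: only the vertical and horizontal projections are used (a real Radon transform covers more angles), each projection is binarized at its own median, and all function names are hypothetical. The MinMax scheme itself is not shown, since the abstract does not specify it.

```python
from statistics import median
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def projections(image: Image) -> List[List[float]]:
    """Two axis-aligned projections of a 2D grid: column sums and row sums."""
    column_sums = [sum(col) for col in zip(*image)]
    row_sums = [sum(row) for row in image]
    return [column_sums, row_sums]


def thresholded_barcode(image: Image) -> List[int]:
    """Binarize each projection at its own median and concatenate the bits."""
    bits: List[int] = []
    for proj in projections(image):
        threshold = median(proj)  # assumption: per-projection median threshold
        bits.extend(1 if value >= threshold else 0 for value in proj)
    return bits


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of differing bits between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Two tiny hypothetical images that are mirror images of each other.
left = [[0, 1, 2], [0, 3, 4], [0, 5, 6]]
right = [[2, 1, 0], [4, 3, 0], [6, 5, 0]]
print(thresholded_barcode(left), thresholded_barcode(right))
print(hamming(thresholded_barcode(left), thresholded_barcode(right)))
```

Because the barcodes are short binary vectors, a direct search reduces to ranking the database by Hamming distance to the query barcode.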

    Image Area Reduction for Efficient Medical Image Retrieval

    Content-based image retrieval (CBIR) has been one of the most active areas in medical image analysis in the last two decades because of the steady increase in the number of digital images used. Efficient diagnosis and treatment planning can be supported by developing retrieval systems to provide high-quality healthcare. Extensive research has attempted to improve image retrieval efficiency. The critical factors when searching in large databases are time and storage requirements. In general, although many methods have been suggested to increase accuracy, fast retrieval has been investigated only sporadically. In this thesis, two different approaches are proposed to reduce both time and space requirements for medical image retrieval. The IRMA dataset is used to validate the proposed methods. Both methods use Local Binary Pattern (LBP) histogram features, which are extracted from 14,410 X-ray images of the IRMA dataset. The first method is image folding, which operates based on salient regions in an image. Saliency is determined by a context-aware saliency algorithm which includes folding the image. After the folding process, the reduced image area is used to extract multi-block and multi-scale LBP features and to classify these features with a multi-class support vector machine (SVM). The other method consists of classification and distance-based feature similarity. Images are first classified into general classes by using LBP features. Subsequently, the retrieval is performed within the class to locate the most similar images. Between the classification and retrieval processes, LBP features are eliminated by employing the error histogram of a shallow (n/p/n) autoencoder to quantify the retrieval relevance of image blocks. If a region is relevant, the autoencoder gives a large error for its decoding. Hence, by examining the autoencoder error of image blocks, irrelevant regions can be detected and eliminated. In order to calculate similarity within general classes, the distance between the LBP features of relevant regions is calculated. The results show that the retrieval time can be reduced and the storage requirements can be lowered without a significant decrease in accuracy.
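For readers unfamiliar with the LBP histogram features that both methods rely on, here is a minimal sketch of the basic 8-neighbour, radius-1 operator on a grayscale grid. It is an assumption-laden simplification: the thesis uses multi-block and multi-scale LBP, which this sketch does not reproduce, and the function names are illustrative.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]

# Clockwise neighbour offsets (row, column), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Image, r: int, c: int) -> int:
    """8-bit LBP code: bit k is set when neighbour k is >= the centre pixel."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: Image) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


# A hypothetical 3x3 patch with a single interior pixel.
patch = [[1, 2, 3],
         [8, 5, 4],
         [7, 6, 9]]
print(lbp_code(patch, 1, 1))
```

A per-block histogram of this kind is the feature that would be kept or discarded depending on the block's autoencoder error.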

    Deep Perceptual Similarity is Adaptable to Ambiguous Contexts

    The concept of image similarity is ambiguous, meaning that images that are considered similar in one context might not be in another. This ambiguity motivates the creation of metrics for specific contexts. This work explores the ability of the successful deep perceptual similarity (DPS) metrics to adapt to a given context. Recently, DPS metrics have emerged that use the deep features of neural networks for comparing images. These metrics have been successful on datasets that leverage the average human perception in limited settings. But the question remains whether they can be adapted to specific contexts of similarity. No single metric can suit all definitions of similarity, and previous metrics have been rule-based, which makes them labor-intensive to rewrite for new contexts. DPS metrics, on the other hand, use neural networks which might be retrained for each context. However, retraining networks takes resources and might ruin performance on previous tasks. This work examines the adaptability of DPS metrics by training positive scalars for the deep features of pretrained CNNs to correctly measure similarity for different contexts. Evaluation is performed on contexts defined by randomly ordering six image distortions (e.g. rotation) according to which should be considered more similar when applied to an image. This also gives insight into whether the features in the CNN are enough to discern different distortions without retraining. Finally, the trained metrics are evaluated on a perceptual similarity dataset to evaluate whether adapting to an ordering affects their performance on established scenarios. The findings show that DPS metrics can be adapted with high performance. While the adapted metrics have difficulties with the same contexts as baselines, performance is improved in 99% of cases. Finally, it is shown that the adaptation is not significantly detrimental to prior performance on perceptual similarity.
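The idea of adapting a metric through positive scalars on fixed deep features can be sketched as a weighted squared distance. This is a minimal sketch under stated assumptions: the feature vectors are hypothetical stand-ins for pretrained CNN activations, positivity is enforced by exponentiating free parameters `theta` (the abstract does not say how the authors enforce it), and the training loop is omitted.

```python
import math
from typing import List, Sequence


def positive_weights(theta: Sequence[float]) -> List[float]:
    """Map unconstrained parameters to strictly positive per-feature scalars."""
    return [math.exp(t) for t in theta]


def weighted_distance(fx: Sequence[float], fy: Sequence[float],
                      theta: Sequence[float]) -> float:
    """Weighted squared distance between two fixed feature vectors."""
    weights = positive_weights(theta)
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, fx, fy))


# Hypothetical features of a reference image and two distorted versions.
reference = [0.0, 0.0]
distorted_a = [1.0, 0.0]
distorted_b = [0.0, 1.2]

uniform = [0.0, 0.0]              # all weights equal to 1
adapted = [math.log(4.0), 0.0]    # first feature weighted four times as much

for theta in (uniform, adapted):
    d_a = weighted_distance(reference, distorted_a, theta)
    d_b = weighted_distance(reference, distorted_b, theta)
    print("a closer" if d_a < d_b else "b closer")
```

Changing only the scalars flips which distortion is judged closer to the reference, while the underlying features stay untouched; this is the sense in which such a metric can follow a different ordering of distortions without retraining the network.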

    Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion

    Recent years have seen the rapid development of large generative models for text; however, much less research has explored the connection between text and another "language" of communication: music. Music, much like text, can convey emotions, stories, and ideas, and has its own unique structure and syntax. In our work, we bridge text and music via a text-to-music generation model that is highly efficient, expressive, and can handle long-term structure. Specifically, we develop Moûsai, a cascading two-stage latent diffusion model that can generate multiple minutes of high-quality stereo music at 48kHz from textual descriptions. Moreover, our model is highly efficient, which enables real-time inference on a single consumer GPU. Through experiments and property analyses, we show our model's competence over a variety of criteria compared with existing music generation models. Lastly, to promote the open-source culture, we provide a collection of open-source libraries with the hope of facilitating future work in the field. We open-source the following: Code: https://github.com/archinetai/audio-diffusion-pytorch; music samples for this paper: http://bit.ly/44ozWDH; all music samples for all models: https://bit.ly/audio-diffusion