Autoencoding the Retrieval Relevance of Medical Images
Content-based image retrieval (CBIR) of medical images is a crucial task that
can contribute to a more reliable diagnosis if applied to big data. Recent
advances in feature extraction and classification have enormously improved CBIR
results for digital images. However, considering the increasing accessibility
of big data in medical imaging, we are still in need of reducing both memory
requirements and computational expenses of image retrieval systems. This work
proposes to exclude the features of image blocks that exhibit a low encoding
error when learned by an autoencoder. We examine the histogram of
autoencoding errors of image blocks for each image class to decide which
image regions, or roughly what percentage of an image, should be declared
relevant for the retrieval task. This leads to a reduction in feature
dimensionality and speeds up the retrieval process. To
validate the proposed scheme, we employ local binary patterns (LBP) and support
vector machines (SVM), both well-established approaches in the CBIR
research community. We also use the IRMA dataset with 14,410 x-ray images as
test data. The results show that the dimensionality of annotated feature
vectors can be reduced by up to 50% resulting in speedups greater than 27% at
the expense of less than a 1% decrease in retrieval accuracy when validating
the precision and recall of the top 20 hits.
Comment: To appear in the proceedings of the 5th International Conference on
Image Processing Theory, Tools and Applications (IPTA'15), Nov 10-13, 2015,
Orléans, France
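As a rough illustration of the block-exclusion idea above, the sketch below scores flattened image blocks by their reconstruction error under a shallow autoencoder and drops the low-error half. The tied linear weights, block size, and 50% cut-off are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_decode(block, W_enc, W_dec):
    """A shallow (n -> p -> n) autoencoder: encode then decode a flattened block."""
    return W_dec @ (W_enc @ block)

# Illustrative setup: 8x8 blocks (n = 64) compressed to p = 8 hidden units.
n, p = 64, 8
W_enc = rng.normal(scale=0.1, size=(p, n))
W_dec = W_enc.T  # tied weights for simplicity (an assumption)

# Stand-in for an image split into flattened blocks.
blocks = rng.normal(size=(100, n))

# Per-block reconstruction (encoding) error.
errors = np.array([np.linalg.norm(b - encode_decode(b, W_enc, W_dec))
                   for b in blocks])

# Keep only blocks whose error exceeds, e.g., the 50th percentile of the
# error histogram -- low-error blocks are deemed irrelevant and excluded.
threshold = np.percentile(errors, 50)
relevant = blocks[errors > threshold]
print(relevant.shape[0], "of", blocks.shape[0], "blocks kept")
```

In the paper's pipeline, LBP features would then be extracted only from the retained blocks, shrinking the feature vector before classification.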
MinMax Radon Barcodes for Medical Image Retrieval
Content-based medical image retrieval can support diagnostic decisions by
clinical experts. Examining similar images may provide clues to the expert to
remove uncertainties in his/her final diagnosis. Beyond conventional feature
descriptors, binary features in different ways have been recently proposed to
encode the image content. A recent proposal is "Radon barcodes" that employ
binarized Radon projections to tag/annotate medical images with content-based
binary vectors, called barcodes. In this paper, MinMax Radon barcodes are
introduced that are superior to the "local thresholding" scheme suggested in the
literature. Using the IRMA dataset with 14,410 x-ray images from 193 different
classes, the advantage of MinMax Radon barcodes over \emph{thresholded}
Radon barcodes is demonstrated: the retrieval error for direct search drops by
more than 15\%. In addition, SURF, a well-established non-binary approach, and
BRISK, a recent binary method, are examined to compare their results with
MinMax Radon barcodes when retrieving images from IRMA dataset. The results
demonstrate that MinMax Radon barcodes are faster and more accurate when
applied to IRMA images.
Comment: To appear in the proceedings of the 12th International Symposium on
Visual Computing, December 12-14, 2016, Las Vegas, Nevada, US
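For context, here is a minimal sketch of the conventional "local thresholding" Radon barcode that MinMax improves upon: project the image and binarize each projection at its own median. Real Radon barcodes use projections at many angles; below, only the two axis-aligned projections stand in, and the MinMax variant would instead derive bits from the local minima and maxima of each projection.

```python
import numpy as np

def radon_barcode(image, bits_per_projection=16):
    """Binarize axis-aligned projections of an image at their medians.

    A true Radon barcode projects at many angles; summing along rows and
    columns gives only the 0-degree and 90-degree projections.
    """
    projections = [image.sum(axis=0), image.sum(axis=1)]
    code = []
    for proj in projections:
        # Resample the projection to a fixed number of bins.
        idx = np.linspace(0, len(proj) - 1, bits_per_projection).astype(int)
        proj = proj[idx]
        # "Local thresholding": one bit per bin, thresholded at the median.
        code.append((proj > np.median(proj)).astype(np.uint8))
    return np.concatenate(code)

# Toy image: a bright square on a dark background.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
barcode = radon_barcode(img)
print(barcode)  # a 32-bit binary vector tagging the image
```

Such barcodes support fast retrieval because candidate images can be compared by Hamming distance on short binary vectors.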
Image Area Reduction for Efficient Medical Image Retrieval
Content-based image retrieval (CBIR) has been one of the most active areas in medical image analysis over the last two decades because of the steady increase in the number of digital images in use. Retrieval systems can support efficient diagnosis and treatment planning and thereby help provide high-quality healthcare. Extensive research has attempted to improve image retrieval efficiency. The critical factors when searching large databases are time and storage requirements. In general, although many methods have been suggested to increase accuracy, fast retrieval has been investigated only sporadically. In this thesis, two different approaches are proposed to reduce both the time and space requirements of medical image retrieval. The IRMA dataset is used to validate the proposed methods. Both methods use local binary pattern (LBP) histogram features extracted from the 14,410 x-ray images of the IRMA dataset. The first method is image folding, which operates on salient regions of an image. Saliency is determined by a context-aware saliency algorithm that guides the folding of the image. After the folding process, the reduced image area is used to extract multi-block and multi-scale LBP features, which are classified by a multi-class support vector machine (SVM). The second method consists of classification and distance-based feature similarity. Images are first classified into general classes using LBP features; retrieval is then performed within the class to locate the most similar images. Between the classification and retrieval processes, LBP features are pruned by employing the error histogram of a shallow (n/p/n) autoencoder to quantify the retrieval relevance of image blocks. If a region is relevant, the autoencoder produces a large error when decoding it; hence, by examining the autoencoder error of image blocks, irrelevant regions can be detected and eliminated.
To calculate similarity within a general class, the distance between the LBP features of the relevant regions is computed. The results show that retrieval time can be reduced and storage requirements lowered without a significant decrease in accuracy.
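Both approaches rest on LBP histogram features; a minimal single-scale 3x3 LBP sketch follows. This is the basic variant, not the multi-block, multi-scale setup used in the thesis.

```python
import numpy as np

def lbp_histogram(image):
    """Basic 3x3 local binary patterns: compare each pixel's 8 neighbors
    to its center and histogram the resulting 8-bit codes."""
    # Offsets of the 8 neighbors, in a fixed clockwise order.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = image[1:-1, 1:-1]
    codes = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = image[1 + dy:image.shape[0] - 1 + dy,
                         1 + dx:image.shape[1] - 1 + dx]
        codes |= (neighbor >= center).astype(np.uint8) << bit
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()  # normalized 256-bin feature vector

img = np.random.default_rng(1).integers(0, 256, size=(64, 64))
features = lbp_histogram(img)
print(features.shape)  # (256,)
```

A multi-block variant would compute one such histogram per image block and concatenate them, which is exactly why pruning irrelevant blocks shrinks the feature vector.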
Deep Perceptual Similarity is Adaptable to Ambiguous Contexts
The concept of image similarity is ambiguous, meaning that images that are
considered similar in one context might not be in another. This ambiguity
motivates the creation of metrics for specific contexts. This work explores the
ability of the successful deep perceptual similarity (DPS) metrics to adapt to
a given context. Recently, DPS metrics have emerged using the deep features of
neural networks for comparing images. These metrics have been successful on
datasets that leverage the average human perception in limited settings. But
the question remains whether they can be adapted to specific contexts of
similarity. No single metric can suit all definitions of similarity, and
previous metrics have been rule-based, making them labor-intensive to rewrite
for new contexts. DPS metrics, on the other hand, use neural networks, which might
be retrained for each context. However, retraining networks takes resources and
might ruin performance on previous tasks. This work examines the adaptability
of DPS metrics by training positive scalars for the deep features of pretrained
CNNs to correctly measure similarity for different contexts. Evaluation is
performed on contexts defined by randomly ordering six image distortions (e.g.
rotation) by how similar each should be considered when applied to an image.
This also gives insight into whether the features in the CNN are enough to
discern different distortions without retraining. Finally, the trained metrics
are evaluated on a perceptual similarity dataset to evaluate if adapting to an
ordering affects their performance on established scenarios. The findings show
that DPS metrics can be adapted with high performance. While the adapted
metrics have difficulties with the same contexts as baselines, performance is
improved in 99% of cases. Finally, it is shown that the adaption is not
significantly detrimental to prior performance on perceptual similarity.
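The adaptation scheme described above, positive scalars reweighting frozen deep features, can be sketched without a deep-learning framework. The softplus parameterization and the squared-difference distance form below are illustrative assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))  # keeps the learned scalars strictly positive

def dps_distance(feats_a, feats_b, raw_scalars):
    """Weighted deep-feature distance: one positive scalar per channel
    reweights the squared differences of (frozen) CNN feature maps."""
    w = softplus(raw_scalars)                # (channels,)
    diff = (feats_a - feats_b) ** 2          # (channels, H, W)
    return float((w[:, None, None] * diff).mean())

rng = np.random.default_rng(2)
# Stand-ins for feature maps of two images from a frozen CNN layer.
fa = rng.normal(size=(8, 4, 4))
fb = fa + rng.normal(scale=0.1, size=(8, 4, 4))
theta = np.zeros(8)  # raw scalars; only these would be trained

d = dps_distance(fa, fb, theta)
print(d)  # small distance: the two feature maps are close
```

During adaptation, only `theta` would be updated by gradient descent so that the distances respect the context's distortion ordering, leaving the CNN itself untouched.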
Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
Recent years have seen the rapid development of large generative models for
text; however, much less research has explored the connection between text and
another "language" of communication -- music. Music, much like text, can convey
emotions, stories, and ideas, and has its own unique structure and syntax. In
our work, we bridge text and music via a text-to-music generation model that is
highly efficient, expressive, and can handle long-term structure. Specifically,
we develop Moûsai, a cascading two-stage latent diffusion model that can
generate multiple minutes of high-quality stereo music at 48kHz from textual
descriptions. Moreover, our model features high efficiency, which enables
real-time inference on a single consumer GPU with a reasonable speed. Through
experiments and property analyses, we show our model's competence over a
variety of criteria compared with existing music generation models. Lastly, to
promote open-source culture, we provide a collection of open-source
libraries in the hope of facilitating future work in the field. We
open-source the following: code:
https://github.com/archinetai/audio-diffusion-pytorch; music samples for this
paper: http://bit.ly/44ozWDH; all music samples for all models:
https://bit.ly/audio-diffusion
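The cascading two-stage control flow can be sketched abstractly; every function, shape, and constant below is a hypothetical stand-in (the real model uses trained diffusion networks, a text encoder, and an audio autoencoder).

```python
import numpy as np

rng = np.random.default_rng(3)

def denoise_step(x, t, cond):
    """Stand-in for a learned denoiser; a real model would be a trained
    network conditioned on the given embedding (hypothetical here)."""
    return x * 0.9 + 0.01 * cond.mean()

def diffuse(shape, cond, steps=10):
    """Generic diffusion sampling loop: start from noise and iteratively
    denoise, conditioned on `cond`."""
    x = rng.normal(size=shape)
    for t in reversed(range(steps)):
        x = denoise_step(x, t, cond)
    return x

# Stage 1: text embedding -> compressed music latent.
text_embedding = rng.normal(size=(64,))            # stand-in for a text encoder
latent = diffuse(shape=(2, 256), cond=text_embedding)

# Stage 2: latent -> stereo waveform.
waveform = diffuse(shape=(2, 48000), cond=latent)  # 1 s of 48 kHz stereo
print(waveform.shape)  # (2, 48000)
```

The cascade is what makes long-context generation tractable: the first stage works in a heavily compressed latent space, and only the second stage touches raw-audio resolution.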