8,033 research outputs found
Automatic Query Image Disambiguation for Content-Based Image Retrieval
Query images presented to content-based image retrieval systems often have
various different interpretations, making it difficult to identify the search
objective pursued by the user. We propose a technique for overcoming this
ambiguity, while keeping the amount of required user interaction at a minimum.
To achieve this, the neighborhood of the query image is divided into coherent
clusters from which the user may choose the relevant ones. A novel feedback
integration technique is then employed to re-rank the entire database with
regard to both the user feedback and the original query. We evaluate our
approach on the publicly available MIRFLICKR-25K dataset, where it leads to a
relative improvement of average precision by 23% over the baseline retrieval,
which does not distinguish between different image senses.Comment: VISAPP 2018 paper, 8 pages, 5 figures. Source code:
https://github.com/cvjena/ai
Spectral gene set enrichment (SGSE)
Motivation: Gene set testing is typically performed in a supervised context
to quantify the association between groups of genes and a clinical phenotype.
In many cases, however, a gene set-based interpretation of genomic data is
desired in the absence of a phenotype variable. Although methods exist for
unsupervised gene set testing, they predominantly compute enrichment relative
to clusters of the genomic variables with performance strongly dependent on the
clustering algorithm and number of clusters. Results: We propose a novel
method, spectral gene set enrichment (SGSE), for unsupervised competitive
testing of the association between gene sets and empirical data sources. SGSE
first computes the statistical association between gene sets and principal
components (PCs) using our principal component gene set enrichment (PCGSE)
method. The overall statistical association between each gene set and the
spectral structure of the data is then computed by combining the PC-level
p-values using the weighted Z-method with weights set to the PC variance scaled
by Tracey-Widom test p-values. Using simulated data, we show that the SGSE
algorithm can accurately recover spectral features from noisy data. To
illustrate the utility of our method on real data, we demonstrate the superior
performance of the SGSE method relative to standard cluster-based techniques
for testing the association between MSigDB gene sets and the variance structure
of microarray gene expression data. Availability:
http://cran.r-project.org/web/packages/PCGSE/index.html Contact:
[email protected] or [email protected]
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.Comment: 13 figures, 35 reference
Caching Improvement Using Adaptive User Clustering
In this article we explore one of the most promising technologies for 5G
wireless networks using an underlay small cell network, namely proactive
caching. Using the increase in storage technologies and through studying the
users behavior, peak traffic can be reduced through proactive caching of the
content that is most probable to be requested. We propose a new method, in
which, instead of caching the most popular content, the users within the
network are clustered according to their content popularity and the caching is
done accordingly. We present also a method for estimating the number of
clusters within the network based on the Akaike information criterion. We
analytically derive a closed form expression of the hit probability and we
propose an optimization problem in which the small base stations association
with clusters is optimized
Manifold Learning in MR spectroscopy using nonlinear dimensionality reduction and unsupervised clustering
Purpose To investigate whether nonlinear dimensionality reduction improves unsupervised classification of 1H MRS brain tumor data compared with a linear method. Methods In vivo single-voxel 1H magnetic resonance spectroscopy (55 patients) and 1H magnetic resonance spectroscopy imaging (MRSI) (29 patients) data were acquired from histopathologically diagnosed gliomas. Data reduction using Laplacian eigenmaps (LE) or independent component analysis (ICA) was followed by k-means clustering or agglomerative hierarchical clustering (AHC) for unsupervised learning to assess tumor grade and for tissue type segmentation of MRSI data. Results An accuracy of 93% in classification of glioma grade II and grade IV, with 100% accuracy in distinguishing tumor and normal spectra, was obtained by LE with unsupervised clustering, but not with the combination of k-means and ICA. With 1H MRSI data, LE provided a more linear distribution of data for cluster analysis and better cluster stability than ICA. LE combined with k-means or AHC provided 91% accuracy for classifying tumor grade and 100% accuracy for identifying normal tissue voxels. Color-coded visualization of normal brain, tumor core, and infiltration regions was achieved with LE combined with AHC. Conclusion Purpose To investigate whether nonlinear dimensionality reduction improves unsupervised classification of 1H MRS brain tumor data compared with a linear method. Methods In vivo single-voxel 1H magnetic resonance spectroscopy (55 patients) and 1H magnetic resonance spectroscopy imaging (MRSI) (29 patients) data were acquired from histopathologically diagnosed gliomas. Data reduction using Laplacian eigenmaps (LE) or independent component analysis (ICA) was followed by k-means clustering or agglomerative hierarchical clustering (AHC) for unsupervised learning to assess tumor grade and for tissue type segmentation of MRSI data. Results An accuracy of 93% in classification of glioma grade II and grade IV, with 100% accuracy in distinguishing tumor and normal spectra, was obtained by LE with unsupervised clustering, but not with the combination of k-means and ICA. With 1H MRSI data, LE provided a more linear distribution of data for cluster analysis and better cluster stability than ICA. LE combined with k-means or AHC provided 91% accuracy for classifying tumor grade and 100% accuracy for identifying normal tissue voxels. Color-coded visualization of normal brain, tumor core, and infiltration regions was achieved with LE combined with AHC. Conclusion The LE method is promising for unsupervised clustering to separate brain and tumor tissue with automated color-coding for visualization of 1H MRSI data after cluster analysis
- …