197 research outputs found
Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models
The advent of large pre-trained models has brought about a paradigm shift in
both visual representation learning and natural language processing. However,
clustering unlabeled images, as a fundamental and classic machine learning
problem, still lacks effective solution, particularly for large-scale datasets.
In this paper, we propose a novel image clustering pipeline that leverages the
powerful feature representation of large pre-trained models such as CLIP and
cluster images effectively and efficiently at scale. We show that the
pre-trained features are significantly more structured by further optimizing
the rate reduction objective. The resulting features may significantly improve
the clustering accuracy, e.g., from 57\% to 66\% on ImageNet-1k. Furthermore,
by leveraging CLIP's image-text binding, we show how the new clustering method
leads to a simple yet effective self-labeling algorithm that successfully works
on unlabeled large datasets such as MS-COCO and LAION-Aesthetics. We will
release the code in https://github.com/LeslieTrue/CPP.Comment: 21 pages, 13 figure
Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)
The implicit objective of the biennial "international - Traveling Workshop on
Interactions between Sparse models and Technology" (iTWIST) is to foster
collaboration between international scientific teams by disseminating ideas
through both specific oral/poster presentations and free discussions. For its
second edition, the iTWIST workshop took place in the medieval and picturesque
town of Namur in Belgium, from Wednesday August 27th till Friday August 29th,
2014. The workshop was conveniently located in "The Arsenal" building within
walking distance of both hotels and town center. iTWIST'14 has gathered about
70 international participants and has featured 9 invited talks, 10 oral
presentations, and 14 posters on the following themes, all related to the
theory, application and generalization of the "sparsity paradigm":
Sparsity-driven data sensing and processing; Union of low dimensional
subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph
sensing/processing; Blind inverse problems and dictionary learning; Sparsity
and computational neuroscience; Information theory, geometry and randomness;
Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?;
Sparse machine learning and inference.Comment: 69 pages, 24 extended abstracts, iTWIST'14 website:
http://sites.google.com/site/itwist1
An Unsupervised Approach to Modelling Visual Data
For very large visual datasets, producing expert ground-truth data for training supervised algorithms can represent a substantial human effort. In these situations there is scope for the use of unsupervised approaches that can model collections of images and automatically summarise their content. The primary motivation for this thesis comes from the problem of labelling large visual datasets of the seafloor obtained by an Autonomous Underwater Vehicle (AUV) for ecological analysis. It is expensive to label this data, as taxonomical experts for the specific region are required, whereas automatically generated summaries can be used to focus the efforts of experts, and inform decisions on additional sampling. The contributions in this thesis arise from modelling this visual data in entirely unsupervised ways to obtain comprehensive visual summaries. Firstly, popular unsupervised image feature learning approaches are adapted to work with large datasets and unsupervised clustering algorithms. Next, using Bayesian models the performance of rudimentary scene clustering is boosted by sharing clusters between multiple related datasets, such as regular photo albums or AUV surveys. These Bayesian scene clustering models are extended to simultaneously cluster sub-image segments to form unsupervised notions of âobjectsâ within scenes. The frequency distribution of these objects within scenes is used as the scene descriptor for simultaneous scene clustering. Finally, this simultaneous clustering model is extended to make use of whole image descriptors, which encode rudimentary spatial information, as well as object frequency distributions to describe scenes. This is achieved by unifying the previously presented Bayesian clustering models, and in so doing rectifies some of their weaknesses and limitations. Hence, the final contribution of this thesis is a practical unsupervised algorithm for modelling images from the super-pixel to album levels, and is applicable to large datasets
Discovering motion hierarchies via tree-structured coding of trajectories
International audienceThe dynamic content of physical scenes is largely compositional, that is, the movements of the objects and of their parts are hierarchically organised and relate through composition along this hierarchy. This structure also prevails in the apparent 2D motion that a video captures. Accessing this visual motion hierarchy is important to get a better understanding of dynamic scenes and is useful for video manipulation. We propose to capture it through learned, tree-structured sparse coding of point trajectories. We leverage this new representation within an unsupervised clustering scheme to partition hierarchically the trajectories into meaningful groups. We show through experiments on motion capture data that our model is able to extract moving segments along with their organisation. We also present competitive results on the task of segmenting objects in video sequences from trajectories
Towards an Architecture for Efficient Distributed Search of Multimodal Information
The creation of very large-scale multimedia search engines, with more than one billion
images and videos, is a pressing need of digital societies where data is generated by multiple connected devices. Distributing search indexes in cloud environments is the inevitable solution to deal with the increasing scale of image and video collections. The distribution of such indexes in this setting raises multiple challenges such as the even partitioning of data space, load balancing across index nodes and the fusion of the results computed over multiple nodes. The main question behind this thesis is how to reduce and distribute the multimedia retrieval computational complexity?
This thesis studies the extension of sparse hash inverted indexing to distributed settings.
The main goal is to ensure that indexes are uniformly distributed across computing nodes while keeping similar documents on the same nodes. Load balancing is performed at both node and index level, to guarantee that the retrieval process is not delayed by nodes that have to inspect larger subsets of the index.
Multimodal search requires the combination of the search results from individual modalities and document features. This thesis studies rank fusion techniques focused on reducing complexity by automatically selecting only the features that improve retrieval effectiveness.
The achievements of this thesis span both distributed indexing and rank fusion research.
Experiments across multiple datasets show that sparse hashes can be used to distribute documents and queries across index entries in a balanced and redundant manner across nodes. Rank fusion results show that is possible to reduce retrieval complexity and improve efficiency by searching only a subset of the feature indexes
A REVIEW ON MULTIPLE-FEATURE-BASED ADAPTIVE SPARSE REPRESENTATION (MFASR) AND OTHER CLASSIFICATION TYPES
A new technique Multiple-feature-based adaptive sparse representation (MFASR) has been demonstrated for Hyperspectral Images (HSI's) classification. This method involves mainly in four steps at the various stages. The spectral and spatial information reflected from the original Hyperspectral Images with four various features. A shape adaptive (SA) spatial region is obtained in each pixel region at the second step. The algorithm namely sparse representation has applied to get the coefficients of sparse for each shape adaptive region in the form of matrix with multiple features. For each test pixel, the class label is determined with the help of obtained coefficients. The performances of MFASR have much better classification results than other classifiers in the terms of quantitative and qualitative percentage of results. This MFASR will make benefit of strong correlations that are obtained from different extracted features and this make use of effective features and effective adaptive sparse representation. Thus, the very high classification performance was achieved through this MFASR technique
- âŠ