137 research outputs found
Accessibility-based reranking in multimedia search engines
Traditional multimedia search engines retrieve results based mostly on the query submitted by the user, or use a log of previous searches to provide personalized results, without considering the accessibility of the results for users with vision or other types of impairments. In this paper, a novel approach is presented which incorporates the accessibility of images for users with various vision impairments, such as color blindness, cataract and glaucoma, in order to rerank the results of an image search engine. The accessibility of individual images is measured through the use of vision simulation filters. Multi-objective optimization techniques utilizing the image accessibility scores are used to handle users with multiple vision impairments, while the impairment profile of a specific user is used to select one of the Pareto-optimal solutions. The proposed approach has been tested on two image datasets, using both simulated and real impaired users, and the results verify its applicability. Although the proposed method has been used for vision accessibility-based reranking, it can also be extended to other types of personalization context.
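The combination described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each result carries a relevance score and per-impairment accessibility scores in [0, 1] (as would come from vision-simulation filters), and it scalarizes the multi-objective problem with user-profile weights, which corresponds to picking one point on the Pareto front.

```python
# Hedged sketch of accessibility-based reranking (illustrative only).
def rerank(results, profile, alpha=0.5):
    """Rerank results by blending relevance with a user-specific
    accessibility score.

    results: list of dicts {"id", "relevance", "accessibility": {impairment: score}}
    profile: dict mapping impairment name -> weight (a scalarization of
             the multi-objective problem via the user's impairment profile)
    alpha:   trade-off between relevance and accessibility
    """
    def combined(r):
        acc = sum(w * r["accessibility"].get(imp, 0.0)
                  for imp, w in profile.items())
        return alpha * r["relevance"] + (1 - alpha) * acc
    return sorted(results, key=combined, reverse=True)

# Toy example: "b" is less relevant but far more accessible for this profile.
results = [
    {"id": "a", "relevance": 0.9,
     "accessibility": {"color_blindness": 0.2, "glaucoma": 0.3}},
    {"id": "b", "relevance": 0.7,
     "accessibility": {"color_blindness": 0.9, "glaucoma": 0.8}},
]
profile = {"color_blindness": 0.7, "glaucoma": 0.3}
reranked = rerank(results, profile)
```

With these toy numbers the accessible image "b" overtakes the more relevant "a"; raising `alpha` shifts the balance back toward pure relevance.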
A Review on Video Search Engine Ranking
Search reranking is considered an effective and fundamental approach to improving retrieval accuracy. Videos are typically retrieved using associated textual information, such as surrounding text from the web page, so the performance of such systems depends largely on the relevance between the text and the videos. However, the two may not always match well, which leads to noisy ranking results; for example, visually similar videos may receive very different ranks. Reranking has therefore been proposed to address this problem. Video reranking is an effective way to improve the results of web-based video search, but the problem is non-trivial, especially when multiple features or modalities are considered for video search and retrieval. This paper proposes a new kind of reranking algorithm, circular reranking, which supports the mutual exchange of information across multiple modalities to improve search performance, following the philosophy that a strong-performing modality can teach weaker ones.
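The cross-modal exchange idea can be sketched as an iterative score fusion. This is an illustration of the general technique, not the paper's exact algorithm: each modality holds a score vector over the same result list, and in every round each modality's scores are smoothed toward a strength-weighted fusion of the other modalities, so stronger modalities pull weaker ones toward their ranking.

```python
import numpy as np

# Hedged sketch of circular multimodal reranking (illustrative only).
def circular_rerank(scores, strengths, rounds=5, mu=0.5):
    """scores:    dict modality -> array of per-result scores
       strengths: dict modality -> prior confidence in that modality
       mu:        how strongly each modality absorbs the others' view"""
    s = {m: v / v.sum() for m, v in scores.items()}   # normalize per modality
    for _ in range(rounds):
        new = {}
        for m in s:
            # Fuse the *other* modalities, weighted by their strengths.
            fused = sum(strengths[o] * s[o] for o in s if o != m)
            fused /= sum(strengths[o] for o in s if o != m)
            new[m] = (1 - mu) * s[m] + mu * fused
        s = new
    total = sum(strengths[m] * s[m] for m in s)        # final fusion
    return np.argsort(-total)                          # result indices, best first

# Toy example: text and visual modalities over three results.
text = np.array([0.8, 0.1, 0.1])
visual = np.array([0.5, 0.4, 0.1])
order = circular_rerank({"text": text, "visual": visual},
                        {"text": 0.7, "visual": 0.3})
```

Here the text modality is trusted more (`strength` 0.7), so the final ranking stays closest to its ordering while still absorbing the visual scores.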
Learning Object Categories From Internet Image Searches
In this paper, we describe a simple approach to learning models of visual object categories from images gathered from Internet image search engines. The images for a given keyword are typically highly variable, with a large fraction being unrelated to the query term, and thus pose a challenging environment from which to learn. By training our models directly from Internet images, we remove the need to laboriously compile training data sets, required by most other recognition approaches; this opens up the possibility of learning object category models “on-the-fly.” We describe two simple approaches, derived from the probabilistic latent semantic analysis (pLSA) technique for text document analysis, that can be used to automatically learn object models from these data. We show two applications of the learned model: first, to rerank the images returned by the search engine, thus improving the quality of the search results; and second, to recognize objects in other image data sets.
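The pLSA-based reranking technique can be sketched compactly. This is a generic illustration of pLSA with EM, not the authors' code: images are treated as "documents" of quantized visual words, EM fits P(w|z) and P(z|d), and results are reranked by the weight of the topic assumed to capture the query category.

```python
import numpy as np

# Hedged sketch of pLSA fit by EM, used for reranking (illustrative only).
def plsa(counts, n_topics, n_iter=50, seed=0):
    """counts: (n_docs, n_words) word-count matrix. Returns P(z|d), P(w|z)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) proportional to P(z|d) P(w|z)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]          # (d, z, w)
        joint /= joint.sum(1, keepdims=True) + 1e-12
        # M-step: re-estimate P(w|z) and P(z|d) from weighted counts
        weighted = counts[:, None, :] * joint                  # (d, z, w)
        p_w_z = weighted.sum(0); p_w_z /= p_w_z.sum(1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(2); p_z_d /= p_z_d.sum(1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Toy data over 2 visual words: the first three "images" share one word
# distribution (on-topic), the last three another (off-topic noise).
counts = np.array([[9, 1], [8, 2], [9, 0],
                   [1, 9], [2, 8], [0, 9]], float)
p_z_d, _ = plsa(counts, n_topics=2)
topic = p_z_d[0].argmax()            # topic dominant in a trusted top result
order = np.argsort(-p_z_d[:, topic]) # rerank images by that topic's weight
```

The reranking step mirrors the paper's first application: once a topic is identified with the query category, search results are reordered by how strongly they express it.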
Understanding of Visual Domains via the Lens of Natural Language
A joint understanding of vision and language can enable intelligent systems to perceive, act, and communicate with humans for a wide range of applications. For example, they can assist a human to navigate in an environment, edit the content of an image through natural language commands, or search through image collections using natural language queries. In this thesis, we aim to improve our understanding of visual domains through the lens of natural language. We specifically look into (1) images of categories within a fine-grained taxonomy such as species of birds or variants of aircraft, (2) images of textures that describe local color, shape, and patterns, and (3) regions in images that correspond to objects, materials, and textures.
In one line of work, we investigate ways to discover a domain-specific language by asking annotators to describe visual differences between instances within a fine-grained taxonomy. We show that a system trained to describe these differences leads to an accurate and interpretable basis for categorization. In another line of work, we investigate the effectiveness of language and vision models for describing textures, a problem that, despite the ubiquity of textures, has not been sufficiently studied in the literature. Textures are diverse, yet their local nature allows for the description of the appearance of a wide range of visual categories. The locality also allows us to systematically generate synthetic variations to investigate how disentangled visual representations are for properties such as shape, color, and figure-ground segmentation. Finally, instead of modeling an image as a whole, we design a system that allows descriptions of regions within an image. A challenge is to handle the long-tail distribution of names and appearances of concepts within natural scenes. We design a modular framework that integrates object detection, semantic segmentation, and contextual reasoning with language, which leads to better performance. In addition to methods and analysis, we contribute datasets and benchmarks to evaluate the performance of models in each of these domains.
The availability of large-scale pre-trained models for vision (e.g., ResNet) and language (e.g., BERT) has catalyzed improvements and novel applications in computer vision and natural language processing, but until recently similar models that could jointly reason about language and vision were not available. This has changed with the availability of models such as CLIP, which have been trained on a massive number of images with associated texts. We therefore analyze the effectiveness of CLIP-based representations for tasks posed in our earlier work. By comparing and contrasting these with the domain-specific representations presented in the earlier chapters, we shed some light on the nature of the learned representations and the biases they encode.
Image Retrieval based on Bag-of-Words model
This article gives a survey of the bag-of-words (BoW), or bag-of-features, model in image retrieval systems. In recent years, large-scale image retrieval has shown significant potential in both industry applications and research problems. As local descriptors like SIFT demonstrate great discriminative power in solving vision problems like object recognition, image classification and annotation, more and more state-of-the-art large-scale image retrieval systems rely on them. A common way to achieve this is to first quantize local descriptors into visual words, and then apply scalable textual indexing and retrieval schemes. We call this the bag-of-words or bag-of-features model. The goal of this survey is to give an overview of this model and to introduce different strategies for building a system based on it.
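The quantize-then-index pipeline the survey covers can be sketched end to end. This is a minimal illustration under stated assumptions: 2-D vectors stand in for SIFT descriptors, the visual vocabulary is a fixed toy array (in practice it comes from k-means over a large descriptor sample), and retrieval ranks images by cosine similarity of L2-normalized term-frequency histograms rather than a full inverted index.

```python
import numpy as np

# Hedged sketch of the bag-of-words retrieval pipeline (illustrative only).
vocab = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # 3 "visual words"

def quantize(descriptors):
    """Hard-assign each local descriptor to its nearest visual word."""
    d = np.linalg.norm(descriptors[:, None, :] - vocab[None, :, :], axis=2)
    return d.argmin(axis=1)

def bow_histogram(descriptors):
    """Build an L2-normalized term-frequency vector over the vocabulary."""
    h = np.bincount(quantize(descriptors), minlength=len(vocab)).astype(float)
    return h / (np.linalg.norm(h) + 1e-12)

def rank(query_descriptors, database):
    """Rank database images by cosine similarity to the query histogram."""
    q = bow_histogram(query_descriptors)
    sims = [q @ bow_histogram(d) for d in database]
    return np.argsort(sims)[::-1]   # image indices, best first

# Toy database: image 0 is dominated by word 0, image 1 by word 1.
db = [np.array([[0.1, 0.0], [0.0, 0.1]]),
      np.array([[0.9, 1.0], [1.0, 0.9]])]
order = rank(np.array([[0.05, 0.05]]), db)  # query descriptor near word 0
```

Because the histograms are sparse and the similarity is a dot product, the same scoring can be served at scale through an inverted index keyed by visual word, which is the textual-indexing analogy the survey draws.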