Memorable Maps: A Framework for Re-defining Places in Visual Place Recognition
This paper presents a cognition-inspired, agnostic framework for building a map for Visual Place Recognition. The framework draws inspiration from human memorability, uses the traditional concept of image entropy, and computes the static content in an image, thereby presenting a three-fold criterion to assess the 'memorability' of an image for visual place recognition. A dataset named 'ESSEX3IN1' is created for analysis, composed of highly confusing images from indoor, outdoor and natural scenes. When used in conjunction with state-of-the-art visual place recognition methods, the proposed framework provides a significant performance boost to these techniques, as evidenced by results on ESSEX3IN1 and other public datasets.
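The abstract above names image entropy as one of its three criteria but does not say how it is computed. As a rough illustration only, the sketch below computes the standard Shannon entropy of a grayscale image's intensity distribution. The function name `image_entropy` and the list-of-rows image representation are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def image_entropy(pixels):
    """Shannon entropy (in bits) of the intensity values in a grayscale image.

    `pixels` is assumed to be a list of rows, each a list of integer
    intensities. This is a generic textbook definition, not necessarily
    the formulation used by the framework described above.
    """
    counts = Counter(value for row in pixels for value in row)
    total = sum(counts.values())
    # Sum -p * log2(p) over every intensity value that actually occurs.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# A uniform image carries no intensity information.
flat = [[128] * 8 for _ in range(8)]

# A checkerboard of two equally frequent intensities carries one bit per pixel.
checker = [[0 if (r + c) % 2 == 0 else 255 for c in range(8)] for r in range(8)]

flat_entropy = image_entropy(flat)
checker_entropy = image_entropy(checker)
```

Under this definition a low-entropy image, such as a blank wall, offers little content to distinguish one place from another, which is presumably why an information-content term appears among the criteria.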
What Makes Natural Scene Memorable?
Recent studies on image memorability have shed light on the visual features
that make generic images, object images or face photographs memorable. However,
a clear understanding and reliable estimation of natural scene memorability
remain elusive. In this paper, we attempt to answer the question: "what exactly
makes a natural scene memorable?". Specifically, we first build LNSIM, a
large-scale natural scene image memorability database (containing 2,632 images
and memorability annotations). Then, we mine our database to investigate how
low-, middle- and high-level handcrafted features affect the memorability of
natural scenes. In particular, we find that the high-level feature of scene category
is rather correlated with natural scene memorability. Thus, we propose a deep
neural network based natural scene memorability (DeepNSM) predictor, which
takes advantage of scene category. Finally, the experimental results validate
the effectiveness of DeepNSM. Comment: Accepted to ACM MM Workshop
Visual Place Recognition for Autonomous Robots
Autonomous robotics has been the subject of great interest within the research community over the past few decades. Its applications are widespread, ranging from healthcare to manufacturing, goods transportation to home deliveries, site maintenance to construction, and planetary exploration to rescue operations, along with many others, including but not limited to agriculture, defence, commerce, leisure and extreme environments. At the core of robot autonomy lies the problem of localisation, i.e., a robot knowing where it is; within the robotics community, this problem is termed place recognition. Place recognition using only visual input is termed Visual Place Recognition (VPR) and refers to the ability of an autonomous system to recall a previously visited place under changing viewpoint, illumination and seasonal conditions, and given computational and storage constraints.
This thesis is a collection of 4 inter-linked, mutually-relevant but branching-out topics within VPR: 1) What makes a place/image worthy for VPR?, 2) How to define a state-of-the-art in VPR?, 3) Do VPR techniques designed for ground-based platforms extend to aerial platforms? and 4) Can a handcrafted VPR technique outperform deep-learning-based VPR techniques? Each of these questions is a dedicated, peer-reviewed chapter in this thesis and the author attempts to answer these questions to the best of his abilities.
The worthiness of a place essentially refers to the salience and distinctiveness of the content in the image of this place. This salience is modelled as a framework, namely memorable-maps, comprising 3 conjoint criteria: 1) Human-memorability of an image, 2) Staticity and 3) Information content. Because a large number of VPR techniques have been proposed over the past 10-15 years, and due to the variation in the VPR datasets and evaluation metrics employed, the correct state-of-the-art remains ambiguous. The author levels this playing field by deploying 10 contemporary techniques on a common platform and using the most challenging VPR datasets to provide a holistic performance comparison. This platform is then extended to aerial place recognition datasets to answer the 3rd question above. Finally, the author designs a novel, handcrafted, compute-efficient and training-free VPR technique that outperforms state-of-the-art VPR techniques on 5 different VPR datasets.
Long-Term Memorability On Advertisements
Marketers spend billions of dollars on advertisements but to what end? At the
purchase time, if customers cannot recognize a brand for which they saw an ad,
the money spent on the ad is essentially wasted. Despite its importance in
marketing, until now, there has been no study on the memorability of ads in the
ML literature. Most studies have been conducted on short-term recall (<5 mins)
on specific content types like object and action videos. On the other hand, the
advertising industry only cares about long-term memorability (a few hours or
longer), and advertisements are almost always highly multimodal, depicting a
story through their different modalities (text, images, and videos). With this
motivation, we conduct the first large-scale memorability study consisting of
1203 participants and 2205 ads covering 276 brands. Running statistical tests
over different participant subpopulations and ad-types, we find many
interesting insights into what makes an ad memorable - both content and human
factors. For example, we find that brands which use commercials with fast
moving scenes are more memorable than those with slower scenes (p=8e-10) and
that people who use ad-blockers remember fewer ads than those who
don't (p=5e-3). Further, with the motivation of simulating the memorability of
marketing materials for a particular audience, ultimately helping create one,
we present a novel model, Sharingan, trained to leverage real-world knowledge
of LLMs and visual knowledge of visual encoders to predict the memorability of
a piece of content. We test our model on all the prominent memorability datasets in
the literature (both images and videos) and achieve state-of-the-art results across all of
them. We conduct extensive ablation studies across memory types, modality,
brand, and architectural choices to find insights into what drives memory
- …