10 research outputs found
Zero-Shot Multi-View Indoor Localization via Graph Location Networks
Indoor localization is a fundamental problem in location-based applications.
Current approaches to this problem typically rely on Radio Frequency
technology, which requires not only supporting infrastructure but also human
effort to measure and calibrate the signal. Moreover, data collection for all
locations is indispensable in existing methods, which in turn hinders their
large-scale deployment. In this paper, we propose a novel neural-network-based
architecture, Graph Location Networks (GLN), to perform infrastructure-free,
multi-view image-based indoor localization. GLN makes location predictions
based on robust location representations extracted from images through
message-passing networks. Furthermore, we introduce a novel zero-shot indoor
localization setting and tackle it by extending the proposed GLN to a dedicated
zero-shot version, which exploits a novel mechanism, Map2Vec, to train
location-aware embeddings and make predictions on novel unseen locations. Our
extensive experiments show that the proposed approach outperforms
state-of-the-art methods in the standard setting, and achieves promising
accuracy even in the zero-shot setting where data for half of the locations are
not available. The source code and datasets are publicly available at
https://github.com/coldmanck/zero-shot-indoor-localization-release.
Comment: Accepted at ACM MM 2020. 10 pages, 7 figures. Code and datasets
available at
https://github.com/coldmanck/zero-shot-indoor-localization-releas
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features
The task of open-vocabulary object-centric image retrieval involves the
retrieval of images containing a specified object of interest, delineated by an
open-set text query. As working on large image datasets becomes standard,
solving this task efficiently has gained significant practical importance.
Applications include targeted performance analysis of retrieved images using
ad-hoc queries and hard example mining during training. Recent advancements in
contrastive-based open vocabulary systems have yielded remarkable
breakthroughs, facilitating large-scale open vocabulary image retrieval.
However, these approaches use a single global embedding per image, thereby
constraining the system's ability to retrieve images containing relatively
small object instances. Alternatively, incorporating local embeddings from
detection pipelines faces scalability challenges, making it unsuitable for
retrieval from large databases.
In this work, we present a simple yet effective approach to object-centric
open-vocabulary image retrieval. Our approach aggregates dense embeddings
extracted from CLIP into a compact representation, essentially combining the
scalability of image retrieval pipelines with the object identification
capabilities of dense detection methods. We demonstrate the effectiveness of our
scheme on this task by achieving significantly better results than global
feature approaches on three datasets, increasing accuracy by up to 15 mAP
points. We further integrate our scheme into a large scale retrieval framework
and demonstrate our method's advantages in terms of scalability and
interpretability.
Comment: BMVC 202
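To illustrate why a single global embedding can miss small objects, here is a minimal toy sketch in Python. It is not the paper's method: the abstract does not say how the dense embeddings are aggregated, so the sketch assumes plain average pooling over consecutive groups of patch embeddings. All function names and the 2-D toy vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def aggregate(patches, group_size):
    """Assumed pooling: average each consecutive group of patch embeddings."""
    return [mean_vector(patches[i:i + group_size])
            for i in range(0, len(patches), group_size)]

def score_global(patches, query):
    """Score an image using one global (mean) embedding."""
    return cosine(mean_vector(patches), query)

def score_aggregated(patches, query, group_size):
    """Score an image by its best-matching aggregated embedding."""
    return max(cosine(v, query) for v in aggregate(patches, group_size))

# Toy image: six "background" patches and two "object" patches.
patches = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 2
query = [0.0, 1.0]  # stand-in for a text-query embedding of the object

global_score = score_global(patches, query)
aggregated_score = score_aggregated(patches, query, group_size=2)
```

In this toy case the global embedding is dominated by background patches, while one of the four aggregated embeddings matches the query exactly, so the aggregated score is higher.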
Collaborative Recommendation Model Based on Multi-modal Multi-view Attention Network: Movie and literature cases
The existing collaborative recommendation models that use multi-modal
information emphasize the representation of users' preferences but easily
ignore the representation of users' dislikes. Nevertheless, modelling users'
dislikes facilitates comprehensively characterizing user profiles. Thus, the
representation of users' dislikes should be integrated into the user modelling
when we construct a collaborative recommendation model. In this paper, we
propose a novel Collaborative Recommendation Model based on Multi-modal
multi-view Attention Network (CRMMAN), in which the users are represented from
both preference and dislike views. Specifically, the users' historical
interactions are divided into positive and negative interactions, used to model
the user's preference and dislike views, respectively. Furthermore, the
semantic and structural information extracted from the scene is employed to
enrich the item representation. We validate CRMMAN with comparative
experiments on two benchmark datasets, MovieLens-1M and Book-Crossing.
MovieLens-1M has about a million ratings, and Book-Crossing has about 300,000
ratings. Compared with the state-of-the-art knowledge-graph-based and
multi-modal recommendation methods, the AUC, NDCG@5 and NDCG@10 improve by
2.08%, 2.20% and 2.26%, respectively, averaged over the two datasets. We also conduct controlled
experiments to explore the effects of multi-modal information and multi-view
mechanism. The experimental results show that both of them enhance the model's
performance.
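For readers unfamiliar with the NDCG@k metric reported above, a minimal Python sketch follows. It assumes the common linear-gain definition with a log2 rank discount. The abstract does not state which NDCG variant the authors use, and the function names are illustrative.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering.

    `relevances` lists the relevance of each recommended item in ranked
    order; the ideal ordering is computed from the same list.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevant items at ranks 1 and 3 out of five recommendations.
example = ndcg_at_k([1, 0, 1, 0, 0], k=5)
```

A ranking that places all relevant items first scores 1.0, and pushing a relevant item down the list lowers the score, which is why the metric rewards getting the top of the list right.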
Privacy Intelligence: A Survey on Image Sharing on Online Social Networks
Image sharing on online social networks (OSNs) has become an indispensable
part of daily social activities, but it has also led to an increased risk of
privacy invasion. The recent image leaks from popular OSN services and the
abuse of personal photos using advanced algorithms (e.g. DeepFake) have
prompted the public to rethink individual privacy needs when sharing images on
OSNs. However, OSN image sharing itself is relatively complicated, and systems
currently in place to manage privacy in practice are labor-intensive yet fail
to provide personalized, accurate and flexible privacy protection. As a result,
a more intelligent environment for privacy-friendly OSN image sharing is in
demand. To fill the gap, we contribute a systematic survey of 'privacy
intelligence' solutions that target modern privacy issues related to OSN image
sharing. Specifically, we present a high-level analysis framework based on the
entire lifecycle of OSN image sharing to address the various privacy issues and
solutions facing this interdisciplinary field. The framework is divided into
three main stages: local management, online management and social experience.
At each stage, we identify typical sharing-related user behaviors and the privacy
issues generated by those behaviors, and review representative intelligent
solutions. The resulting analysis describes an intelligent privacy-enhancing
chain for closed-loop privacy management. We also discuss the challenges and
future directions existing at each stage, as well as in publicly available
datasets.
Comment: 32 pages, 9 figures. Under review