How low can you go? Privacy-preserving people detection with an omni-directional camera
In this work, we use a ceiling-mounted omni-directional camera to detect
people in a room. This can be used as a sensor to measure the occupancy of
meeting rooms and count the number of available flex-desk workspaces. If
these devices can be integrated into an embedded low-power sensor, they would form
an ideal extension of automated room reservation systems in office
environments. The main challenge we target here is ensuring the privacy of the
people filmed. Our approach is to reduce the input to extremely low image
resolutions, such that it becomes impossible to recognise individuals or read
potentially confidential documents. To this end, we retrained a single-shot low-resolution
person detection network with automatically generated ground truth. In this
paper, we demonstrate the viability of this approach and explore how low we can
go in resolution, to determine the optimal trade-off between recognition
accuracy and privacy preservation. Because of the low resolution, the result is
a lightweight network that can potentially be deployed on embedded hardware.
Such embedded implementation enables the development of a decentralised smart
camera which only outputs the required meta-data (i.e. the number of persons in
the meeting room).
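The resolution-reduction step at the heart of this approach can be sketched as simple block-average pooling. The function below is purely illustrative (the abstract does not specify the exact downsampling pipeline or target resolutions):

```python
import numpy as np

def downsample(image: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Average-pool a (H, W) grayscale frame to a very low resolution.

    At resolutions on the order of 32x32 for an entire room, faces and
    document text are no longer recoverable, while person-sized blobs
    remain detectable by a small network.
    """
    h, w = image.shape
    # For simplicity, require the output grid to divide the frame evenly.
    assert h % out_h == 0 and w % out_w == 0
    bh, bw = h // out_h, w // out_w
    # Group pixels into (bh x bw) blocks and average each block.
    return image.reshape(out_h, bh, out_w, bw).mean(axis=(1, 3))

frame = np.random.rand(480, 640)   # stand-in omnidirectional frame
low = downsample(frame, 32, 32)    # privacy-preserving detector input
print(low.shape)                   # (32, 32)
```

The detector is then trained directly on these low-resolution frames, so the full-resolution image never needs to leave the sensor.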
PlaNet - Photo Geolocation with Convolutional Neural Networks
Is it possible to build a system to determine the location where a photo was
taken using just its pixels? In general, the problem seems exceptionally
difficult: it is trivial to construct situations where no location can be
inferred. Yet images often contain informative cues such as landmarks, weather
patterns, vegetation, road markings, and architectural details, which in
combination may allow one to determine an approximate location and occasionally
an exact location. Websites such as GeoGuessr and View from your Window suggest
that humans are relatively good at integrating these cues to geolocate images,
especially en masse. In computer vision, the photo geolocation problem is
usually approached using image retrieval methods. In contrast, we pose the
problem as one of classification by subdividing the surface of the earth into
thousands of multi-scale geographic cells, and train a deep network using
millions of geotagged images. While previous approaches only recognize
landmarks or perform approximate matching using global image descriptors, our
model is able to use and integrate multiple visible cues. We show that the
resulting model, called PlaNet, outperforms previous approaches and even
attains superhuman levels of accuracy in some cases. Moreover, we extend our
model to photo albums by combining it with a long short-term memory (LSTM)
architecture. By learning to exploit temporal coherence to geolocate uncertain
photos, we demonstrate that this model achieves a 50% performance improvement
over the single-image model.
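The classification formulation rests on an adaptive multi-scale partition of the earth's surface, with fine cells where photos are dense and coarse cells where they are sparse. A minimal quadtree sketch of that idea follows; the real system uses Google's S2 cell hierarchy, and the function and parameters here are illustrative assumptions:

```python
def adaptive_cells(coords, max_photos=2, max_depth=8,
                   bounds=(-90.0, 90.0, -180.0, 180.0), depth=0):
    """Quadtree-subdivide the globe until no cell holds more than
    `max_photos` geotagged photos; returns the leaf cell bounds
    (lat0, lat1, lon0, lon1). Each leaf becomes one class label.
    """
    lat0, lat1, lon0, lon1 = bounds
    inside = [(la, lo) for la, lo in coords
              if lat0 <= la < lat1 and lon0 <= lo < lon1]
    if not inside:
        return []                      # drop empty cells entirely
    if len(inside) <= max_photos or depth >= max_depth:
        return [bounds]                # sparse enough: keep as one cell
    mlat, mlon = (lat0 + lat1) / 2, (lon0 + lon1) / 2
    cells = []
    for b in ((lat0, mlat, lon0, mlon), (lat0, mlat, mlon, lon1),
              (mlat, lat1, lon0, mlon), (mlat, lat1, mlon, lon1)):
        cells += adaptive_cells(inside, max_photos, max_depth, b, depth + 1)
    return cells

# Two photos near Paris force finer cells there than around the
# isolated London and Sydney photos.
photos = [(48.85, 2.35), (48.86, 2.36), (51.5, -0.12), (-33.87, 151.21)]
cells = adaptive_cells(photos, max_photos=1)
```

Because the leaves form a disjoint partition, every training photo maps to exactly one cell, and the network's softmax over cells yields a probability distribution over locations.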
On Correlated Knowledge Distillation for Monitoring Human Pose with Radios
In this work, we propose and develop a simple experimental testbed to study
the feasibility of a novel idea by coupling radio frequency (RF) sensing
technology with Correlated Knowledge Distillation (CKD) theory towards
designing lightweight, near real-time and precise human pose monitoring
systems. The proposed CKD framework transfers and fuses pose knowledge from a
robust "Teacher" model to a parameterized "Student" model, which can be a
promising technique for obtaining accurate yet lightweight pose estimates. To
assure its efficacy, we implemented CKD for distilling logits in our integrated
Software Defined Radio (SDR)-based experimental setup and investigated the
RF-visual signal correlation. Our CKD-RF sensing technique is characterized by
two modes -- a camera-fed Teacher Class Network (e.g., images, videos) and an
SDR-fed Student Class Network (e.g., RF signals). Specifically, our CKD model
trains a dual multi-branch teacher and student network by distilling and fusing
knowledge bases. The resulting CKD models are then used to
identify the multimodal correlation and teach the student branch in reverse.
Instead of simply aggregating their outputs, CKD training comprises multiple
parallel transformations across the two domains, i.e., visual images and RF
signals. Once trained, our CKD model can efficiently preserve privacy and
utilize the multimodal correlated logits from the two different neural networks
for estimating poses without using visual signals/video frames (by using only
the RF signals).
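The logit-distillation step that the CKD framework builds on can be illustrated with the standard temperature-softened KL objective. This sketch is an assumption about the underlying loss, not the paper's exact formulation, and it omits CKD's cross-modal fusion terms:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T yields softer targets."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max()                       # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between the teacher's and student's softened
    distributions, scaled by T^2 as in standard logit distillation.
    """
    p = softmax(teacher_logits, T)     # soft targets from the teacher
    q = softmax(student_logits, T)     # student predictions
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

When the RF-fed student's logits match the camera-fed teacher's, the loss is zero; any disagreement is penalized, which is what lets the trained student estimate poses from RF signals alone.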
Visual Place Recognition under Severe Viewpoint and Appearance Changes
Over the last decade, the efforts of the robotics and computer vision research communities have yielded extensive advancements in long-term robotic vision. Visual localization is a core constituent of this active research domain: the ability of a robot to localize itself while simultaneously mapping its environment, technically termed Simultaneous Localization and Mapping (SLAM).
Visual Place Recognition (VPR), a core component of SLAM, is a well-known paradigm. In layman's terms, at a certain place/location within an environment, a robot needs to decide whether it is revisiting a place it has experienced before. Visual Place Recognition utilizing Convolutional Neural Networks (CNNs) has made major strides in the last few years. However, image retrieval-based VPR becomes more challenging when the same places undergo strong viewpoint and seasonal transitions. This thesis concentrates on improving the retrieval performance of VPR systems, generally targeting place correspondence.
Despite the remarkable performance of state-of-the-art deep CNNs for VPR, their significant computation and memory overhead limits their practical deployment on resource-constrained mobile robots. This thesis investigates the utility of shallow CNNs for power-efficient VPR applications. The proposed VPR frameworks focus on novel image regions that can contribute to recognizing places under severe environmental and viewpoint variations.
Employing challenging place recognition benchmark datasets, this thesis further illustrates and evaluates the robustness of shallow CNN-based regional features against viewpoint and appearance changes coupled with dynamic instances, such as pedestrians and vehicles. Finally, the presented computation-efficient and lightweight VPR methodologies show a boost in matching performance, in terms of the Area under the Precision-Recall curve (AUC-PR), over state-of-the-art deep-neural-network-based place recognition and SLAM algorithms.
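The AUC-PR metric used for these comparisons can be sketched as follows. This is a generic implementation of the metric, not the thesis's evaluation code; scores and labels here stand for descriptor similarities and ground-truth place matches:

```python
import numpy as np

def auc_pr(scores, labels):
    """Area under the precision-recall curve for a place matcher.

    scores: similarity score for each query-reference pair;
    labels: 1 if the pair is truly the same place, 0 otherwise.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]        # sort pairs by score, best first
    tp = np.cumsum(labels)                    # true positives at each cutoff
    precision = tp / np.arange(1, len(labels) + 1)
    recall = tp / labels.sum()
    # Anchor the curve at recall 0 and integrate with the trapezoid rule.
    precision = np.concatenate(([1.0], precision))
    recall = np.concatenate(([0.0], recall))
    return float(np.sum((recall[1:] - recall[:-1])
                        * (precision[1:] + precision[:-1]) / 2))

# A matcher that ranks every true match above every false one scores 1.0.
perfect = auc_pr([0.9, 0.8, 0.1], [1, 1, 0])
```

A single scalar like AUC-PR summarizes the whole precision-recall trade-off, which makes it convenient for comparing VPR methods across viewpoint and appearance conditions.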