Ono: an open platform for social robotics
In recent times, the focal point of research in robotics has shifted from industrial robots toward robots that interact with humans in an intuitive and safe manner. This evolution has resulted in the subfield of social robotics, which pertains to robots that function in a human environment and that can communicate with humans in an intuitive way, e.g. with facial expressions. Social robots have the potential to impact many different aspects of our lives, but one particularly promising application is the use of robots in therapy, such as the treatment of children with autism. Unfortunately, many of the existing social robots are suited neither for practical use in therapy nor for large-scale studies, mainly because they are expensive, one-of-a-kind robots that are hard to modify to suit a specific need. We created Ono, a social robotics platform, to tackle these issues. Ono is composed entirely of off-the-shelf components and cheap materials, and can be built at a local FabLab at a fraction of the cost of other robots. Ono is also entirely open source, and the modular design further encourages modification and reuse of parts of the platform.
Removing useless APs and fingerprints from WiFi indoor positioning radio maps
Maintaining consistent radio maps for WiFi fingerprinting-based indoor positioning systems is an essential step to improve the performance of the positioning engines. The radio maps consist of WiFi fingerprints collected at a predefined set of positions/places within a positioning area. Each fingerprint consists of the identification and radio signal level of the surrounding Access Points (APs). Due to the wide proliferation of WiFi networks, it is very common to observe 10 to 20 APs at a single position and more than 50 APs across a single building. However, in practice, not all of the detected APs are useful for the position estimation process. Some of them might have weak signals at certain positions or might have little significance for a position's fingerprint. Those useless APs add computational overhead during the position estimation, and consequently they reduce the overall performance of the positioning engines. A similar phenomenon also occurs with some of the collected fingerprints. While it is widely accepted that the larger and more detailed the radio map is, the better the accuracy of the positioning system, we found that some of the fingerprint samples in the radio maps do not contribute significantly to the estimation process. In this paper, we propose two methods for filtering the positioning radio maps: AP filtering and fingerprint filtering. We then report on the results of a set of experiments that evaluate the performance of a WiFi positioning radio map before and after applying the filtering approaches. The results show that it is possible to simplify the radio maps of the positioning engines without significant degradation of the positioning precision and accuracy, and therefore to reduce the processing time for estimating the position of a tracked WiFi tag. This result has an important impact on increasing the number of tags a single instance of a WiFi positioning engine can handle at a time.
This work was supported by the FEDER program through COMPETE and the Portuguese Science and Technology Foundation (FCT), within the context of the AAL4ALL (COMPETE 13852) and FCOMP-01-FEDER-0124-022674 projects.
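To make the two filtering steps described above concrete, here is a minimal sketch in Python. The thresholds, the mean/variance criterion for AP usefulness, and the near-duplicate test for fingerprints are illustrative assumptions, not the paper's exact methods.

```python
import numpy as np

def filter_aps(radio_map, min_mean_rssi=-85.0, min_std=2.0):
    """radio_map: (n_fingerprints, n_aps) RSSI matrix in dBm, with
    np.nan where an AP was not observed at a position."""
    mean_rssi = np.nanmean(radio_map, axis=0)   # average strength per AP
    std_rssi = np.nanstd(radio_map, axis=0)     # variability per AP
    # Keep APs that are both strong enough and discriminative enough.
    keep = (mean_rssi > min_mean_rssi) & (std_rssi > min_std)
    return radio_map[:, keep], np.flatnonzero(keep)

def filter_fingerprints(radio_map, min_dist=4.0):
    """Greedily drop fingerprints that are near-duplicates (in signal
    space) of a fingerprint that has already been kept."""
    filled = np.nan_to_num(radio_map, nan=-100.0)  # unseen APs = very weak
    kept = []
    for i, fp in enumerate(filled):
        if all(np.linalg.norm(fp - filled[j]) >= min_dist for j in kept):
            kept.append(i)
    return radio_map[kept], kept
```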
A comprehensive survey on deep active learning and its applications in medical image analysis
Deep learning has achieved widespread success in medical image analysis, leading to an increasing demand for large-scale expert-annotated medical image datasets. Yet, the high cost of annotating medical images severely hampers the development of deep learning in this field. To reduce annotation costs, active learning aims to select the most informative samples for annotation and train high-performance models with as few labeled samples as possible. In this survey, we review the core methods of active learning, including the evaluation of informativeness and the sampling strategy. For the first time, we provide a detailed summary of the integration of active learning with other label-efficient techniques, such as semi-supervised and self-supervised learning. Additionally, we highlight active learning works that are specifically tailored to medical image analysis. Finally, we offer our perspectives on the future trends and challenges of active learning and its applications in medical image analysis.
Comment: Paper List on Github: https://github.com/LightersWang/Awesome-Active-Learning-for-Medical-Image-Analysis
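Entropy-based uncertainty sampling is one classic informativeness criterion of the kind this survey covers; the sketch below shows a single query round. The `model.predict_proba` interface and the budget value are assumptions for illustration.

```python
import numpy as np

def entropy_sampling(probs, budget):
    """probs: (n_unlabeled, n_classes) softmax outputs of the current
    model on the unlabeled pool; returns the indices of the `budget`
    most uncertain (highest-entropy) samples to send for annotation."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(entropy)[-budget:]

# One query round of a hypothetical active-learning loop:
#   probs = model.predict_proba(unlabeled_pool)
#   query_ids = entropy_sampling(probs, budget=100)
#   labeled_set += annotate(unlabeled_pool[query_ids]); retrain(model)
```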
Unobtrusive and pervasive video-based eye-gaze tracking
Eye-gaze tracking has long been considered a desktop technology that finds its use inside the traditional office setting, where the operating conditions may be controlled. Nonetheless, recent advancements in mobile technology and a growing interest in capturing natural human behaviour have motivated an emerging interest in tracking eye movements within unconstrained real-life conditions, referred to as pervasive eye-gaze tracking. This critical review focuses on emerging passive and unobtrusive video-based eye-gaze tracking methods in recent literature, with the aim of identifying the different research avenues that are being followed in response to the challenges of pervasive eye-gaze tracking. Different eye-gaze tracking approaches are discussed in order to bring out their strengths and weaknesses, and to identify any limitations, within the context of pervasive eye-gaze tracking, that have yet to be considered by the computer vision community.
Hand Keypoint Detection in Single Images using Multiview Bootstrapping
We present an approach that uses a multi-camera system to train fine-grained detectors for keypoints that are prone to occlusion, such as the joints of a hand. We call this procedure multiview bootstrapping: first, an initial keypoint detector is used to produce noisy labels in multiple views of the hand. The noisy detections are then triangulated in 3D using multiview geometry or marked as outliers. Finally, the reprojected triangulations are used as new labeled training data to improve the detector. We repeat this process, generating more labeled data in each iteration. We derive a result analytically relating the minimum number of views needed to achieve target true and false positive rates for a given detector. The method is used to train a hand keypoint detector for single images. The resulting keypoint detector runs in real time on RGB images and has accuracy comparable to methods that use depth sensors. The single-view detector, triangulated over multiple views, enables 3D markerless hand motion capture with complex object interactions.
Comment: CVPR 2017
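The triangulation step at the core of multiview bootstrapping can be sketched with standard linear (DLT) triangulation, shown below for a single keypoint. Outlier marking (e.g., thresholding reprojection error across views) and detector retraining are omitted; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def triangulate_dlt(points2d, proj_mats):
    """Linear (DLT) triangulation of one keypoint observed in several
    views. points2d: list of (x, y); proj_mats: list of 3x4 matrices."""
    rows = []
    for (x, y), P in zip(points2d, proj_mats):
        # x ~ P X implies x*(P[2]·X) - P[0]·X = 0, and likewise for y.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]                       # null vector of the stacked system
    return X[:3] / X[3]              # homogeneous -> Euclidean

def reprojection_error(X, point2d, P):
    """Pixel distance between a reprojected 3D point and a 2D detection;
    a large error in some view would mark that detection as an outlier."""
    p = P @ np.append(X, 1.0)
    return np.linalg.norm(p[:2] / p[2] - np.asarray(point2d))
```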
Learning to See with Minimal Human Supervision
Deep learning has significantly advanced computer vision in the past decade, paving the way for practical applications such as facial recognition and autonomous driving. However, current techniques depend heavily on human supervision, limiting their broader deployment. This dissertation tackles this problem by introducing algorithms and theories to minimize human supervision in three key areas: data, annotations, and neural network architectures, in the context of various visual understanding tasks such as object detection, image restoration, and 3D generation.
First, we present self-supervised learning algorithms to handle in-the-wild images and videos that traditionally require time-consuming manual curation and labeling. We demonstrate that when a deep network is trained to be invariant to geometric and photometric transformations, representations from its intermediate layers are highly predictive of semantic object parts such as eyes and noses. This insight offers a simple unsupervised learning framework that significantly improves the efficiency and accuracy of few-shot landmark prediction and matching. We then present a technique for learning single-view 3D object pose estimation models by utilizing in-the-wild videos where objects turn (e.g., cars in roundabouts). This technique achieves performance competitive with the existing state of the art without requiring any manual labels during training. We also contribute the Accidental Turntables Dataset, a challenging set of 41,212 images of cars with cluttered backgrounds, motion blur, and illumination changes, which serves as a benchmark for 3D pose estimation.
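The transformation-invariance idea above can be sketched as a simple self-supervised loss. The choice of a horizontal flip as the geometric transform and brightness scaling as the photometric one is an assumption for this minimal sketch; real pipelines use richer warps.

```python
import torch
import torch.nn.functional as F

def equivariance_loss(net, images):
    """Encourage the feature maps of `net` to transform with the image:
    features of a flipped, brightness-jittered image, flipped back,
    should match features of the original image."""
    bright = torch.empty(images.size(0), 1, 1, 1).uniform_(0.8, 1.2)
    transformed = torch.flip((images * bright).clamp(0, 1), dims=[3])
    feats = net(images)                 # (B, C, H, W) feature maps
    feats_t = net(transformed)
    # Undo the flip on the transformed features; they should then match.
    return F.mse_loss(torch.flip(feats_t, dims=[3]), feats)
```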
Second, we address variations in labeling styles across different annotators, which lead to a type of noisy label referred to as heterogeneous labels. This variability in human annotation can cause subpar performance during both the training and testing phases. To mitigate this, we develop a framework that models the labeling style of each individual annotator, reducing the impact of annotation variation and enhancing the performance of standard object detection models. We also apply this framework to analyze ecological data, which are often collected opportunistically across different case studies without consistent annotation guidelines. Through this application, we obtain several insights into large-scale bird migration behaviors and their relationship to climate change.
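One simple way to model per-annotator labeling styles, in the spirit of the framework above, is to condition predictions on a learned style embedding. The architecture below is a hypothetical sketch, not the dissertation's actual model, which targets full object detectors.

```python
import torch
import torch.nn as nn

class AnnotatorAwareHead(nn.Module):
    """Condition predictions on a per-annotator style embedding so the
    shared backbone can learn a style-independent representation."""
    def __init__(self, feat_dim, n_annotators, emb_dim=16):
        super().__init__()
        self.style = nn.Embedding(n_annotators, emb_dim)
        self.head = nn.Linear(feat_dim + emb_dim, 4)  # e.g. box offsets

    def forward(self, feats, annotator_id):
        style = self.style(annotator_id)              # (B, emb_dim)
        return self.head(torch.cat([feats, style], dim=1))

# At test time one might feed a canonical "consensus" annotator id, or
# average predictions over the learned annotator embeddings.
```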
Our next study explores the challenges of designing neural networks, an area that lacks a comprehensive theoretical understanding. By linking deep neural networks with Gaussian processes, we propose a novel Bayesian interpretation of the deep image prior, which parameterizes a natural image as the output of a convolutional network with random parameters and random input. This approach offers valuable insights to optimize the design of neural networks for various image restoration tasks.
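The deep image prior described above can be sketched in a few lines: fit a randomly initialized convolutional network, driven by a fixed random input, to a single corrupted image. The step count, learning rate, and input channel count are assumptions; `net` is any ConvNet mapping the random input to an image-shaped output.

```python
import torch

def deep_image_prior(net, corrupted, steps=2000, lr=0.01, z_channels=32):
    """Early-stopped fitting of a random-weight ConvNet to one corrupted
    image; the network's structure biases the fit toward natural images."""
    _, _, h, w = corrupted.shape                     # (1, C, H, W) target
    z = torch.randn(1, z_channels, h, w)             # fixed random input
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(z) - corrupted) ** 2).mean()    # fit the corruption
        loss.backward()
        opt.step()
    return net(z).detach()                           # restored image
```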
Lastly, we introduce several machine-learning techniques to reconstruct and edit 3D shapes from 2D images with minimal human effort. We first present a generic multi-modal generative model that bridges 2D images and 3D shapes via a shared latent space, and demonstrate its applications in versatile 3D shape generation and manipulation tasks. Additionally, we develop a framework for the joint estimation of a 3D neural scene representation and camera poses. This approach outperforms prior works and, unlike the baselines, operates in the general SE(3) camera pose setting. The results also indicate this method can be complementary to classical structure-from-motion (SfM) pipelines, as it compares favorably to SfM on low-texture and low-resolution images.