Benchmarking Self-Supervised Learning on Diverse Pathology Datasets
Computational pathology can lead to saving human lives, but models are annotation-hungry and pathology images are notoriously expensive to annotate. Self-supervised learning (SSL) has been shown to be an effective method for utilizing unlabeled data, and its application to pathology could greatly benefit its downstream tasks. Yet, there are no principled studies that compare SSL methods and discuss how to adapt them for pathology. To address this need, we execute the largest-scale study of SSL pre-training on pathology image data to date. Our study is conducted using 4 representative SSL methods on diverse downstream tasks. We establish that large-scale domain-aligned pre-training in pathology consistently outperforms ImageNet pre-training in standard SSL settings such as linear and fine-tuning evaluations, as well as in low-label regimes. Moreover, we propose a set of domain-specific techniques that we experimentally show lead to a performance boost. Lastly, for the first time, we apply SSL to the challenging task of nuclei instance segmentation and show large and consistent performance improvements under diverse settings.
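The linear and fine-tuning evaluations mentioned above follow the standard SSL protocol: a pre-trained encoder is kept frozen (or fully unfrozen) and a task head is trained on labeled downstream data. The sketch below illustrates the linear-probing variant in PyTorch; the ResNet-50 backbone, class count, and hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal linear-evaluation (linear probing) sketch; backbone, dataset and
# hyperparameters are placeholders, not the paper's setup.
import torch
import torch.nn as nn
import torchvision

# Hypothetical backbone pre-trained with an SSL method on pathology images.
backbone = torchvision.models.resnet50(weights=None)
backbone.fc = nn.Identity()            # expose 2048-d features
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False            # linear evaluation: backbone stays frozen

num_classes = 9                        # e.g. tissue classes of a downstream task
linear_head = nn.Linear(2048, num_classes)
optimizer = torch.optim.SGD(linear_head.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    with torch.no_grad():              # features come from the frozen encoder
        feats = backbone(images)
    logits = linear_head(feats)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the fine-tuning variant, the same setup is used but the backbone parameters are left trainable, typically with a smaller learning rate.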
ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation
Gaze estimation is a fundamental task in many applications of computer
vision, human computer interaction and robotics. Many state-of-the-art methods
are trained and tested on custom datasets, making comparison across methods
challenging. Furthermore, existing gaze estimation datasets have limited head
pose and gaze variations, and the evaluations are conducted using different
protocols and metrics. In this paper, we propose a new gaze estimation dataset
called ETH-XGaze, consisting of over one million high-resolution images of
varying gaze under extreme head poses. We collect this dataset from 110
participants with a custom hardware setup including 18 digital SLR cameras and
adjustable illumination conditions, and a calibrated system to record ground
truth gaze targets. We show that our dataset can significantly improve the
robustness of gaze estimation methods across different head poses and gaze
angles. Additionally, we define a standardized experimental protocol and
evaluation metric on ETH-XGaze, to better unify gaze estimation research going
forward. The dataset and benchmark website are available at
https://ait.ethz.ch/projects/2020/ETH-XGaze
Comment: Accepted at ECCV 2020 (Spotlight)
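Gaze estimation benchmarks such as this one are typically scored by the angular error between predicted and ground-truth gaze directions. A minimal NumPy sketch of that metric follows; the pitch/yaw-to-vector convention shown is one common choice and an assumption here, not necessarily the exact convention of the ETH-XGaze protocol.

```python
# Hedged sketch of the standard angular-error metric for gaze estimation.
import numpy as np

def pitchyaw_to_vector(pitchyaw):
    """Convert (pitch, yaw) angles in radians to unit 3D gaze vectors."""
    pitch, yaw = pitchyaw[..., 0], pitchyaw[..., 1]
    return np.stack([
        -np.cos(pitch) * np.sin(yaw),
        -np.sin(pitch),
        -np.cos(pitch) * np.cos(yaw),
    ], axis=-1)

def angular_error_deg(pred_pitchyaw, true_pitchyaw):
    """Mean angle (degrees) between predicted and ground-truth gaze vectors."""
    a = pitchyaw_to_vector(np.asarray(pred_pitchyaw))
    b = pitchyaw_to_vector(np.asarray(true_pitchyaw))
    cos_sim = np.sum(a * b, axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))
    return np.degrees(np.arccos(np.clip(cos_sim, -1.0, 1.0))).mean()
```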
Representation Learning for Webcam-based Gaze Estimation
Knowing where the user is looking allows intelligent systems to keep track of the user's needs and respond immediately to changes. The availability of eye gaze information enables new applications in adaptive user interfaces, increased context awareness in intelligent assistants, and direct assistance in the execution of complex tasks in settings such as augmented reality, cockpits, mobile devices in the field, and the office. Recent works have proposed large-scale in-the-wild datasets and deep-learning-based approaches for estimating eye gaze from webcam images, making gaze estimation possible in unmodified and unconstrained environments. However, to truly bring gaze estimation to the masses, large gains in performance must still be made while requiring as few manual interventions from the user as possible. We suggest that by incorporating prior knowledge of eye shape and eyeball anatomy into the design and training of deep neural networks, we can yield meaningful improvements in performance both with and without a few manually labeled samples from the end-user.
In this thesis, we explore the space of learning-based representations for webcam-based gaze estimation, in particular by proposing novel explicitly defined representations, as well as training methods and neural network architectures for learning implicitly defined representations. We show through evaluations on publicly available datasets that our representations improve gaze estimator performance both in the absence of labeled samples from the final user and when only a few such labeled samples are available. Our contributions thus push the boundaries of what is possible with webcam-based gaze estimation, making novel applications more accessible and practical.
First, we propose to learn eye-region landmarks as an intermediate representation for conventional gaze estimation methods. The description of eyelid and iris shape via landmark coordinates allows for better generalization across datasets and in adapting to previously unseen users when given just a few labeled samples.
Second, we propose a pictorial intermediate representation which can be produced from gaze direction labels alone. This representation effectively decomposes the gaze direction estimation problem into two easier parts, showing large performance improvements in cross-person gaze estimation.
Third, we explore the learning of equivariance to gaze direction changes, while disentangling the effects of head orientation from gaze direction via a novel disentangling and transforming neural network architecture. Furthermore, this representation is used as input to a meta-learning scheme, yielding large performance improvements when adapting to a target user with as little as a single labeled sample.
Last, we suggest that the spatio-temporal relation between the visual stimulus presented to a user and their apparent eye movements should be modeled jointly by a neural network. To achieve this, we collect a novel video-based dataset from 54 participants with synchronized multi-camera views and screen content video. Our proposed architecture for automatic gaze estimate refinement yields large performance improvements even in the absence of any labeled samples from the target user. This is enabled in particular by a data augmentation scheme that mimics the unique person-specific offsets in the definition of the line of sight.
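The person-specific offsets mentioned above are one reason a few labeled calibration samples can help so much. As a rough illustration only, a per-user constant bias can be estimated from a handful of samples and subtracted from later predictions; this generic sketch is an assumption-level example and not the adaptation method proposed in the thesis.

```python
# Hedged illustration of few-shot per-person calibration via a constant
# gaze-angle offset; not the thesis's specific method.
import numpy as np

def estimate_offset(pred_pitchyaw, true_pitchyaw):
    """Mean (pitch, yaw) bias of the estimator for one user, from a few samples."""
    return np.mean(np.asarray(pred_pitchyaw) - np.asarray(true_pitchyaw), axis=0)

def apply_calibration(pred_pitchyaw, offset):
    """Subtract the per-person bias from new predictions."""
    return np.asarray(pred_pitchyaw) - offset

# Usage with e.g. three labeled calibration samples from the target user.
preds_cal = [[0.10, 0.05], [0.02, -0.12], [-0.05, 0.20]]
truth_cal = [[0.08, 0.02], [0.00, -0.15], [-0.07, 0.18]]
offset = estimate_offset(preds_cal, truth_cal)
corrected = apply_calibration([[0.12, -0.03]], offset)
```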
Deep Pictorial Gaze Estimation
Design and Vehicle Implementation of Autonomous Lane Change Algorithm based on Probabilistic Prediction
This paper describes the design, vehicle implementation, and validation of a motion planning and control algorithm for autonomous lane change. Autonomous lane change is necessary for high-level autonomous driving. A vehicle equipped with sensors and an on-board computer is used for implementation and validation of autonomous driving. The autonomous driving system consists of three parts: perception, motion planning, and control. In the perception part, surrounding vehicles' states and lane information are estimated. In the motion planning part, this information and chassis information are used to conduct probabilistic prediction for the ego vehicle and the surrounding vehicles separately. The driving mode is then decided among three modes: lane keeping, lane change, and traffic pressure. The driving mode is determined based on a safety distance by predicting the states of the surrounding vehicles and the ego vehicle. If the ego vehicle cannot perform a lane change when one is required, the most suitable space is selected considering the probabilistic prediction information and the safety distance. Target states are defined based on the driving mode and the behavior of surrounding vehicles. In the control part, a distributed control architecture is used for real-time implementation on the vehicle. A linear quadratic regulator (LQR) optimal controller and a model predictive controller (MPC) are used to obtain the longitudinal acceleration and the desired steering angle. The proposed automated driving algorithm has been evaluated via vehicle tests using one autonomous vehicle and two normal vehicles.
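The control layer described above pairs LQR and MPC to produce longitudinal acceleration and steering commands. As a hedged illustration of the LQR side only, the sketch below computes a discrete-time state-feedback gain for a toy lateral-error model; the model, weights, and speed are assumptions for illustration, not the paper's vehicle model or tuning.

```python
# Hedged sketch of a discrete-time LQR gain for lateral (steering) control.
import numpy as np
from scipy.linalg import solve_discrete_are

dt, v = 0.02, 15.0                     # sample time [s], longitudinal speed [m/s]
# State: [lateral error, heading error]; input: commanded yaw rate.
A = np.array([[1.0, v * dt],
              [0.0, 1.0]])
B = np.array([[0.0],
              [dt]])
Q = np.diag([1.0, 0.5])                # state-error weights (illustrative)
R = np.array([[0.1]])                  # control-effort weight (illustrative)

P = solve_discrete_are(A, B, Q, R)     # discrete algebraic Riccati equation
K = np.linalg.inv(R + B.T @ P @ B) @ (B.T @ P @ A)

def steering_command(state):
    """u = -K x: feedback that drives lateral and heading errors to zero."""
    return float(-K @ state)

# Example: 0.5 m lateral offset and 0.05 rad heading error.
print(steering_command(np.array([0.5, 0.05])))
```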
Self-Learning Transformations for Improving Gaze and Head Redirection
Many computer vision tasks rely on labeled data. Rapid progress in generative modeling has led to the ability to synthesize photorealistic images. However, controlling specific aspects of the generation process such that the data can be used for supervision of downstream tasks remains challenging. In this paper we propose a novel generative model for images of faces that is capable of producing high-quality images under fine-grained control over eye gaze and head orientation angles. This requires disentangling many appearance-related factors, including not only gaze and head orientation but also lighting, hue, etc. We propose a novel architecture which learns to discover, disentangle, and encode these extraneous variations in a self-learned manner. We further show that explicitly disentangling task-irrelevant factors results in more accurate modelling of gaze and head orientation. A novel evaluation scheme shows that our method improves upon the state of the art in redirection accuracy and disentanglement between gaze direction and head orientation changes. Furthermore, we show that in the presence of limited amounts of real-world training data, our method allows for improvements in the downstream task of semi-supervised cross-dataset gaze estimation. Please check our project page at: https://ait.ethz.ch/projects/2020/STED-gaze