Towards Making Deep Transfer Learning Never Hurt
Transfer learning has frequently been used to improve deep neural network
training by incorporating the weights of pre-trained networks as the starting
point of optimization and as a reference for regularization. While deep transfer learning
can usually boost the performance with better accuracy and faster convergence,
transferring weights from an inappropriate network can hurt the training
procedure and may lead to even lower accuracy. In this paper, we consider deep transfer
learning as minimizing a linear combination of empirical loss and regularizer
based on pre-trained weights, where the regularizer can prevent the training
procedure from lowering the empirical loss when the two terms have conflicting
descent directions (i.e., derivatives). Following this view, we propose a novel
strategy that makes regularization-based Deep Transfer learning Never Hurt
(DTNH): at each iteration of the training procedure, it computes the derivatives of the two terms
separately, then re-estimates a new descent direction that does not hurt the
empirical loss minimization while preserving the regularization effects of
the pre-trained weights. Extensive experiments have been done using common
transfer learning regularizers, such as L2-SP and knowledge distillation, on
top of a wide range of deep transfer learning benchmarks including Caltech, MIT
indoor 67, CIFAR-10 and ImageNet. The empirical results show that the proposed
descent direction estimation strategy DTNH can always improve the performance
of deep transfer learning tasks with all of the above regularizers, even when
transferring pre-trained weights from inappropriate networks. All in all, the
DTNH strategy improves state-of-the-art regularizers in all cases, with
0.1%--7% higher accuracy across all experiments.
Comment: 10 pages
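The core of DTNH is the per-iteration re-estimation of the descent direction. The sketch below illustrates one plausible reading of that idea, where the conflicting component of the regularizer gradient is projected out whenever it opposes the empirical-loss gradient; the function name, the projection rule, and the scaling are assumptions for illustration, not the authors' exact update.

```python
import numpy as np

def dtnh_direction(grad_loss: np.ndarray, grad_reg: np.ndarray,
                   reg_weight: float = 1.0) -> np.ndarray:
    """Combine the two gradients so the update never opposes the empirical loss.

    A minimal sketch of the idea described in the abstract, not the authors'
    exact rule: when the regularizer gradient conflicts with the empirical-loss
    gradient (negative inner product), its conflicting component is removed.
    """
    g_l = grad_loss.ravel()
    g_r = reg_weight * grad_reg.ravel()
    if g_l @ g_r < 0.0:  # conflicting descent directions
        # project out the component of g_r that points against g_l
        g_r = g_r - (g_l @ g_r) / (g_l @ g_l + 1e-12) * g_l
    return (g_l + g_r).reshape(grad_loss.shape)

# toy usage: the combined direction always has a non-negative inner product
# with the empirical-loss gradient, so a small step cannot increase that loss
# to first order.
g_loss = np.array([1.0, 0.0])
g_reg = np.array([-2.0, 1.0])
d = dtnh_direction(g_loss, g_reg)
assert d @ g_loss >= 0.0
```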
EdgeSense: Edge-Mediated Spatial-Temporal Crowdsensing
Edge computing has recently become increasingly popular due to the growth of data sizes and the need for sensing with reduced reliance on a central server. Based on an edge computing architecture, we propose a novel crowdsensing framework called EdgeSense (Edge-Mediated Spatial-Temporal Crowdsensing). It aims to recover environmental information, such as air pollution, temperature, and traffic flow, in parts of the target area without aggregating sensor data together with its location information. Specifically, EdgeSense works on top of a secured peer-to-peer network consisting of the participants, and we propose a novel Decentralized Spatial-Temporal Crowdsensing framework based on Parallelized Stochastic Gradient Descent. To approximate the sensing data in each part of the target area in each sensing cycle, EdgeSense uses the local sensor data on participants' mobile devices to learn the low-rank structure of the data and then recovers the sensing data from it. We evaluate EdgeSense on real-world data sets (temperature [1] and PM2.5 [2]), where our algorithm achieves low approximation error and is competitive with a baseline algorithm designed using a centralized and aggregated mechanism.
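To make the recovery step concrete, here is a minimal single-machine sketch of low-rank recovery via stochastic gradient descent on the observed entries of a (regions x sensing cycles) matrix. It assumes a simple matrix-factorization model and omits the secured peer-to-peer coordination and the parallelization across participants that EdgeSense actually uses; all names and hyperparameters are illustrative.

```python
import numpy as np

def recover_low_rank(observed: np.ndarray, mask: np.ndarray, rank: int = 3,
                     lr: float = 0.01, epochs: int = 200, seed: int = 0) -> np.ndarray:
    """Recover a (regions x cycles) sensing matrix from a few local readings.

    A hedged, single-machine sketch of the low-rank recovery step described in
    the abstract; EdgeSense runs this as parallelized SGD across peers, which
    is not reproduced here.
    """
    rng = np.random.default_rng(seed)
    n, m = observed.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    rows, cols = np.nonzero(mask)          # indices of the observed entries
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = observed[i, j] - U[i] @ V[j]
            U[i] += lr * err * V[j]        # SGD step on one observed entry
            V[j] += lr * err * U[i]
    return U @ V.T                         # dense estimate for every region and cycle
```

Factorizing into U and V of small rank encodes the low-rank assumption directly, so the unobserved entries are filled in by the learned factors rather than by averaging nearby sensors.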
On the Noisy Gradient Descent that Generalizes as SGD
The gradient noise of SGD is considered to play a central role in the
observed strong generalization abilities of deep learning. While past studies
confirm that the magnitude and the covariance structure of gradient noise are
critical for regularization, it remains unclear whether or not the class of
noise distributions is important. In this work, we provide negative results by
showing that noise from classes different from the SGD noise can also
effectively regularize gradient descent. Our finding is based on a novel
observation about the structure of the SGD noise: it is the product of the
gradient matrix and a sampling noise that arises from the mini-batch sampling
procedure. Moreover, the sampling noise unifies two kinds of
gradient-regularizing noise that belong to the Gaussian class: one using the
(scaled) Fisher information as covariance and one using the gradient covariance of SGD as
covariance. Finally, thanks to the flexibility of choosing the noise class, we
propose an algorithm that performs noisy gradient descent and generalizes well,
a variant of which even benefits large-batch SGD training without hurting
generalization.
Comment: ICML 2020 near camera-ready version
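A minimal sketch of the kind of noisy gradient descent step the abstract alludes to: full-batch gradient descent plus zero-mean Gaussian noise whose covariance is the empirical covariance of the per-example gradients, scaled by the inverse batch size. The exact noise scaling and covariance choice in the paper may differ; this only illustrates injecting gradient-covariance noise.

```python
import numpy as np

def noisy_gd_step(params, per_example_grads, lr=0.1, batch_size=32, rng=None):
    """One full-batch gradient descent step with injected Gaussian noise.

    A hedged sketch: the noise covariance below is the empirical covariance of
    the per-example gradients divided by the batch size, which mimics the
    second moment of mini-batch sampling noise.
    """
    rng = rng or np.random.default_rng(0)
    G = np.asarray(per_example_grads)        # shape (n_examples, n_params)
    mean_grad = G.mean(axis=0)
    centered = G - mean_grad
    cov = centered.T @ centered / len(G)     # gradient covariance of SGD
    noise = rng.multivariate_normal(np.zeros(len(mean_grad)), cov / batch_size)
    return params - lr * (mean_grad + noise)
```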
Early Detection of Disease using Electronic Health Records and Fisher's Wishart Discriminant Analysis
Linear Discriminant Analysis (LDA) is a simple and effective technique for pattern classification, and it is also widely used for early detection of diseases from Electronic Health Records (EHR) data. However, the performance of LDA for EHR data classification is frequently affected by two main factors: ill-posed estimation of the LDA parameters (e.g., the covariance matrix), and linear inseparability of the EHR data. To handle these two issues, in this paper we propose a novel classifier, FWDA -- Fisher's Wishart Discriminant Analysis -- developed as a faster and more robust nonlinear classifier. Specifically, FWDA first approximates the distribution of potential inverse covariance matrix estimates with a Wishart distribution estimated from the training data. Then, FWDA samples a group of inverse covariance matrices from the Wishart distribution, predicts using LDA classifiers based on the sampled inverse covariance matrices, and takes a weighted average of the prediction results via a Bayesian voting scheme. The voting weights are optimally updated to adapt to each new input, enabling nonlinear classification.
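The following sketch illustrates the sampling-and-voting structure described above, assuming SciPy's Wishart distribution and equal voting weights; the degrees of freedom, the scale matrix, and the hard-vote aggregation are simplifying assumptions, whereas FWDA adapts the voting weights to each new input.

```python
import numpy as np
from scipy.stats import wishart

def fwda_predict(X_train, y_train, x_new, n_samples=50, seed=0):
    """Ensemble-of-LDA prediction in the spirit of FWDA (hedged sketch)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y_train)
    means = {c: X_train[y_train == c].mean(axis=0) for c in classes}
    d = X_train.shape[1]
    pooled = np.cov(X_train.T) + 1e-3 * np.eye(d)   # regularized covariance
    # sample plausible inverse covariance (precision) matrices from a Wishart
    # whose mean matches the inverse of the pooled covariance (an assumption)
    W = wishart(df=d + 2, scale=np.linalg.inv(pooled) / (d + 2))
    votes = np.zeros(len(classes))
    for _ in range(n_samples):
        precision = W.rvs(random_state=rng)
        # LDA discriminant score for each class under this precision matrix
        scores = [x_new @ precision @ means[c]
                  - 0.5 * means[c] @ precision @ means[c] for c in classes]
        votes[np.argmax(scores)] += 1.0 / n_samples  # equal weights here
    return classes[np.argmax(votes)]
```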
RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot
A key challenge in robotic manipulation in open domains is how to acquire
diverse and generalizable skills for robots. Recent research in one-shot
imitation learning has shown promise in transferring trained policies to new
tasks based on demonstrations. This feature is attractive for enabling robots
to acquire new skills and improve task and motion planning. However, due to
limitations in the training dataset, the current focus of the community has
mainly been on simple cases, such as pushing or pick-and-place tasks, relying solely
on visual guidance. In reality, there are many complex skills, some of which
may even require both visual and tactile perception to solve. This paper aims
to unlock the potential for an agent to generalize to hundreds of real-world
skills with multi-modal perception. To achieve this, we have collected a
dataset comprising over 110,000 contact-rich robot manipulation sequences
across diverse skills, contexts, robots, and camera viewpoints, all collected
in the real world. Each sequence in the dataset includes visual, force, audio,
and action information. Moreover, we also provide a corresponding human
demonstration video and a language description for each robot sequence. We have
invested significant effort in calibrating all the sensors and ensuring a
high-quality dataset. The dataset is publicly available at rh20t.github.io.
Comment: RSS 2023 workshop on LTAMP. The project page is at rh20t.github.io
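For readers who want a mental model of what one RH20T sequence might contain, the hypothetical container below mirrors the modalities listed in the abstract (visual, force, audio, action, plus the paired human video and language description). Every field name and shape here is an assumption; the released format at rh20t.github.io is authoritative.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class RH20TSequence:
    """Hypothetical container for one contact-rich manipulation sequence.

    Field names and shapes are illustrative assumptions based only on the
    modalities listed in the abstract, not the released data format.
    """
    rgb_frames: List[np.ndarray]   # visual stream from one camera viewpoint
    force_torque: np.ndarray       # (T, 6) wrist force/torque readings
    audio: np.ndarray              # raw audio waveform
    actions: np.ndarray            # (T, dof) commanded robot actions
    human_demo_video: str          # path to the paired human demonstration
    language_description: str      # natural-language task description
```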
AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time
Accurate whole-body multi-person pose estimation and tracking is an important
yet challenging topic in computer vision. To capture the subtle actions of
humans for complex behavior analysis, whole-body pose estimation, including
the face, body, hands, and feet, is essential and goes beyond conventional
body-only pose estimation. In this paper, we present AlphaPose, a system that can perform
accurate whole-body pose estimation and tracking jointly while running in
real time. To this end, we propose several new techniques: Symmetric Integral
Keypoint Regression (SIKR) for fast and fine localization, Parametric Pose
Non-Maximum-Suppression (P-NMS) for eliminating redundant human detections and
Pose Aware Identity Embedding for joint pose estimation and tracking. During
training, we resort to Part-Guided Proposal Generator (PGPG) and multi-domain
knowledge distillation to further improve the accuracy. Our method is able to
localize whole-body keypoints accurately and track humans simultaneously given
inaccurate bounding boxes and redundant detections. We show a significant
improvement over current state-of-the-art methods in both speed and accuracy on
COCO-wholebody, COCO, PoseTrack, and our proposed Halpe-FullBody pose
estimation dataset. Our model, source codes and dataset are made publicly
available at https://github.com/MVIG-SJTU/AlphaPose.
Comment: Documents for AlphaPose, accepted to TPAMI
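As a rough illustration of pose-level redundancy elimination, the toy routine below performs greedy non-maximum suppression using a simple keypoint-distance similarity. It is not the paper's Parametric Pose NMS, whose distance function and thresholds are learned; the similarity measure and threshold here are placeholders.

```python
import numpy as np

def pose_nms(poses, scores, sim_threshold=0.7):
    """Greedy pose-level non-maximum suppression (simplified illustration).

    poses:  (N, K, 2) array of keypoint coordinates
    scores: (N,) detection confidences
    Returns the indices of the poses kept after suppression.
    """
    order = np.argsort(scores)[::-1]          # highest-confidence pose first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # mean keypoint distance to the best pose, normalized by pose scale
        scale = np.ptp(poses[best], axis=0).max() + 1e-6
        dists = np.linalg.norm(poses[rest] - poses[best], axis=-1).mean(axis=-1) / scale
        sims = np.exp(-dists)
        order = rest[sims < sim_threshold]    # drop poses too similar to the best
    return keep
```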
H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
Solving real-world complex tasks using reinforcement learning (RL) without
high-fidelity simulation environments or large amounts of offline data can be
quite challenging. Online RL agents trained in imperfect simulation
environments can suffer from severe sim-to-real issues. Offline RL approaches,
although they bypass the need for simulators, often impose demanding
requirements on the size and quality of the offline datasets. The recently emerged hybrid
offline-and-online RL provides an attractive framework that enables joint use
of limited offline data and an imperfect simulator for transferable policy
learning. In this paper, we develop a new algorithm, called H2O+, which offers
great flexibility to bridge various choices of offline and online learning
methods, while also accounting for dynamics gaps between the real and
simulated environments. Through extensive simulation and real-world robotics
experiments, we demonstrate superior performance and flexibility over advanced
cross-domain online and offline RL algorithms.
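The hybrid setting above boils down to training on batches drawn from both the limited real (offline) data and the imperfect simulator (online) rollouts. The sketch below shows only that batch-mixing skeleton with a fixed mixing ratio; H2O+'s treatment of the dynamics gap (e.g., how simulated samples are weighted or penalized) is not reproduced, and all names are illustrative.

```python
import numpy as np

def sample_hybrid_batch(offline_buffer, online_buffer, batch_size=256,
                        offline_ratio=0.5, rng=None):
    """Draw one training batch mixing offline (real) and online (simulated) data.

    A hedged sketch of the hybrid offline-and-online setup: the mixing ratio is
    a fixed assumption, and dynamics-gap reweighting is intentionally omitted.
    """
    rng = rng or np.random.default_rng(0)
    n_off = int(batch_size * offline_ratio)
    off_idx = rng.integers(0, len(offline_buffer), size=n_off)
    on_idx = rng.integers(0, len(online_buffer), size=batch_size - n_off)
    batch = [offline_buffer[i] for i in off_idx] + [online_buffer[i] for i in on_idx]
    rng.shuffle(batch)   # interleave real and simulated transitions
    return batch
```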
- …