72 research outputs found
Towards Making Deep Transfer Learning Never Hurt
Transfer learning has been frequently used to improve deep neural network
training through incorporating weights of pre-trained networks as the
starting-point of optimization for regularization. While deep transfer learning
can usually boost the performance with better accuracy and faster convergence,
transferring weights from inappropriate networks hurts the training procedure and
may lead to even lower accuracy. In this paper, we consider deep transfer
learning as minimizing a linear combination of empirical loss and regularizer
based on pre-trained weights, where the regularizer would restrict the training
procedure from lowering the empirical loss, with conflicting descent directions
(e.g., derivatives). Following this view, we propose a novel strategy making
regularization-based Deep Transfer learning Never Hurt (DTNH) that, for each
iteration of the training procedure, computes the derivatives of the two terms
separately, then re-estimates a new descent direction that does not hurt the
empirical loss minimization while preserving the regularization effects from
the pre-trained weights. Extensive experiments have been done using common
transfer learning regularizers, such as L2-SP and knowledge distillation, on
top of a wide range of deep transfer learning benchmarks including Caltech, MIT
indoor 67, CIFAR-10 and ImageNet. The empirical results show that the proposed
descent direction estimation strategy DTNH can always improve the performance
of deep transfer learning tasks based on all above regularizers, even when
transferring pre-trained weights from inappropriate networks. All in all, the DTNH
strategy can improve state-of-the-art regularizers in all cases with 0.1%--7%
higher accuracy in all experiments.
Comment: 10 pages
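The idea of re-estimating a descent direction from two separately computed gradients can be illustrated with a minimal sketch. The abstract does not specify the exact re-estimation rule, so the projection used here is an assumption: when the regularizer gradient conflicts with the empirical-loss gradient (negative inner product), its conflicting component is removed before the two are combined. The names `combine_directions` and `alpha` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def combine_directions(grad_loss, grad_reg, alpha=0.1):
    """Combine loss and regularizer gradients without opposing the loss.

    Assumed rule (not taken from the paper): if the regularizer gradient
    has a negative inner product with the loss gradient, project it onto
    the subspace orthogonal to the loss gradient, then add it with
    weight alpha.
    """
    conflict = dot(grad_loss, grad_reg)
    if conflict < 0:
        # Remove the component of grad_reg that points against grad_loss.
        scale = conflict / dot(grad_loss, grad_loss)
        grad_reg = [r - scale * g for r, g in zip(grad_reg, grad_loss)]
    return [g + alpha * r for g, r in zip(grad_loss, grad_reg)]


# Conflicting case: the regularizer component along grad_loss is dropped.
print(combine_directions([1.0, 0.0], [-2.0, 1.0], alpha=0.5))
```

With this rule the combined direction always has a non-negative inner product with the loss gradient, which is one simple reading of "does not hurt the empirical loss minimization".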
On the Noisy Gradient Descent that Generalizes as SGD
The gradient noise of SGD is considered to play a central role in the
observed strong generalization abilities of deep learning. While past studies
confirm that the magnitude and the covariance structure of gradient noise are
critical for regularization, it remains unclear whether or not the class of
noise distributions is important. In this work we provide negative results by
showing that noises in classes different from the SGD noise can also
effectively regularize gradient descent. Our finding is based on a novel
observation on the structure of the SGD noise: it is the multiplication of the
gradient matrix and a sampling noise that arises from the mini-batch sampling
procedure. Moreover, the sampling noises unify two kinds of gradient
regularizing noises that belong to the Gaussian class: the one using (scaled)
Fisher as covariance and the one using the gradient covariance of SGD as
covariance. Finally, thanks to the flexibility of choosing noise class, an
algorithm is proposed to perform noisy gradient descent that generalizes well,
the variant of which even benefits large batch SGD training without hurting
generalization.
Comment: ICML 2020 near camera ready version
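The structural observation in this abstract, that SGD noise is the product of the gradient matrix and a sampling noise, can be checked on a toy example. This is a sketch under stated assumptions: the gradient matrix `G` stores one per-sample gradient per column, and the sampling noise is the mini-batch weight vector minus the uniform weights 1/n. The function names are hypothetical.

```python
def matvec(G, w):
    """Multiply a d x n matrix (list of rows) by a length-n vector."""
    return [sum(g * x for g, x in zip(row, w)) for row in G]


def sgd_noise(G, batch_indices):
    """Return the SGD gradient noise as G times a sampling-noise vector.

    G[k][i] is the k-th coordinate of the gradient of sample i.
    The mini-batch gradient is G @ w with w_i = 1/b for sampled i,
    the full gradient is G @ (1/n, ..., 1/n), so their difference is
    G @ (w - 1/n).
    """
    n = len(G[0])
    b = len(batch_indices)
    w = [0.0] * n
    for i in batch_indices:
        w[i] += 1.0 / b
    sampling_noise = [w_i - 1.0 / n for w_i in w]
    return matvec(G, sampling_noise)


# Toy gradient matrix: d = 2 parameters, n = 4 samples.
G = [[1.0, 2.0, 3.0, 4.0],
     [0.0, 1.0, 0.0, 1.0]]
print(sgd_noise(G, [0, 2]))
```

The printed vector equals the mini-batch gradient over samples 0 and 2 minus the full-batch gradient, so replacing `sampling_noise` with a vector drawn from another distribution gives a different noisy gradient descent built on the same matrix.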
EdgeSense: Edge-Mediated Spatial-Temporal Crowdsensing
Edge computing has recently become increasingly popular due to the growth of data size and the need for sensing with a reduced center. Based on the edge computing architecture, we propose a novel crowdsensing framework called Edge-Mediated Spatial-Temporal Crowdsensing (EdgeSense). The framework aims to obtain environment information such as air pollution, temperature, and traffic flow in some parts of the target area, and does not aggregate sensor data with its location information. Specifically, EdgeSense works on top of a secured peer-to-peer network consisting of participants and proposes a novel Decentralized Spatial-Temporal Crowdsensing framework based on Parallelized Stochastic Gradient Descent. To approximate the sensing data in each part of the target area in each sensing cycle, EdgeSense uses the local sensor data in participants' mobile devices to learn the low-rank characteristic and then recovers the sensing data from it. We evaluate EdgeSense on real-world data sets (temperature [1] and PM2.5 [2] data sets), where our algorithm achieves low approximation error and is competitive with the baseline algorithm, which is designed using a centralized and aggregated mechanism.
Early Detection of Disease using Electronic Health Records and Fisher's Wishart Discriminant Analysis
Linear Discriminant Analysis (LDA) is a simple and effective technique for pattern classification, and it is also widely used for early detection of diseases using Electronic Health Records (EHR) data. However, the performance of LDA for EHR data classification is frequently affected by two main factors: ill-posed estimation of LDA parameters (e.g., covariance matrix), and linear inseparability of the EHR data for classification. To handle these two issues, in this paper, we propose a novel classifier FWDA -- Fisher's Wishart Discriminant Analysis, which is developed as a faster and more robust nonlinear classifier. Specifically, FWDA first surrogates the distribution of potential inverse covariance matrix estimates using a Wishart distribution estimated from the training data. Then, FWDA samples a group of inverse covariance matrices from the Wishart distribution, predicts using LDA classifiers based on the sampled inverse covariance matrices, and weighted-averages the prediction results via a Bayesian voting scheme. The weights for voting are optimally updated to adapt to each new input, so as to enable nonlinear classification.
Beyond Two-Tower Matching: Learning Sparse Retrievable Cross-Interactions for Recommendation
Two-tower models are a prevalent matching framework for recommendation, which
have been widely deployed in industrial applications. The success of two-tower
matching is attributed to its efficiency in retrieval among a large number of
items, since the item tower can be precomputed and used for fast Approximate
Nearest Neighbor (ANN) search. However, it suffers from two main challenges,
including limited feature interaction capability and reduced accuracy in online
serving. Existing approaches attempt to design novel late interactions instead
of dot products, but they still fail to support complex feature interactions or
lose retrieval efficiency. To address these challenges, we propose a new
matching paradigm named SparCode, which supports not only sophisticated feature
interactions but also efficient retrieval. Specifically, SparCode introduces an
all-to-all interaction module to model fine-grained query-item interactions.
Besides, we design a discrete code-based sparse inverted index jointly trained
with the model to achieve effective and efficient model inference. Extensive
experiments have been conducted on open benchmark datasets to demonstrate the
superiority of our framework. The results show that SparCode significantly
improves the accuracy of candidate item matching while retaining the same level
of retrieval efficiency as two-tower models. Our source code will be
available at MindSpore/models.
Comment: Accepted by SIGIR 2023. Code will be available at
https://reczoo.github.io/SparCod
RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot
A key challenge in robotic manipulation in open domains is how to acquire
diverse and generalizable skills for robots. Recent research in one-shot
imitation learning has shown promise in transferring trained policies to new
tasks based on demonstrations. This feature is attractive for enabling robots
to acquire new skills and improving task and motion planning. However, due to
limitations in the training dataset, the current focus of the community has
mainly been on simple cases, such as push or pick-place tasks, relying solely
on visual guidance. In reality, there are many complex skills, some of which
may even require both visual and tactile perception to solve. This paper aims
to unlock the potential for an agent to generalize to hundreds of real-world
skills with multi-modal perception. To achieve this, we have collected a
dataset comprising over 110,000 contact-rich robot manipulation sequences
across diverse skills, contexts, robots, and camera viewpoints, all collected
in the real world. Each sequence in the dataset includes visual, force, audio,
and action information. Moreover, we also provide a corresponding human
demonstration video and a language description for each robot sequence. We have
invested significant efforts in calibrating all the sensors and ensuring a
high-quality dataset. The dataset is made publicly available at rh20t.github.io
Comment: RSS 2023 workshop on LTAMP. The project page is at rh20t.github.io
AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time
Accurate whole-body multi-person pose estimation and tracking is an important
yet challenging topic in computer vision. To capture the subtle actions of
humans for complex behavior analysis, whole-body pose estimation including the
face, body, hand and foot is essential over conventional body-only pose
estimation. In this paper, we present AlphaPose, a system that can perform
accurate whole-body pose estimation and tracking jointly while running in
realtime. To this end, we propose several new techniques: Symmetric Integral
Keypoint Regression (SIKR) for fast and fine localization, Parametric Pose
Non-Maximum-Suppression (P-NMS) for eliminating redundant human detections and
Pose Aware Identity Embedding for joint pose estimation and tracking. During
training, we resort to Part-Guided Proposal Generator (PGPG) and multi-domain
knowledge distillation to further improve the accuracy. Our method is able to
localize whole-body keypoints accurately and track humans simultaneously given
inaccurate bounding boxes and redundant detections. We show a significant
improvement over current state-of-the-art methods in both speed and accuracy on
COCO-wholebody, COCO, PoseTrack, and our proposed Halpe-FullBody pose
estimation dataset. Our model, source code and dataset are made publicly
available at https://github.com/MVIG-SJTU/AlphaPose.
Comment: Documents for AlphaPose, accepted to TPAM