Minimizing Negative Transfer of Knowledge in Multivariate Gaussian Processes: A Scalable and Regularized Approach

Author: Kontar Raed
Raskutti Garvesh
Zhou Shiyu
Publication date: 31/03/2019

Recently there has been increasing interest in the multivariate Gaussian process (MGP), which extends the Gaussian process (GP) to deal with multiple outputs. One approach to constructing the MGP and accounting for non-trivial commonalities amongst outputs employs a convolution process (CP). The CP is based on the idea of sharing latent functions across several convolutions. Despite the elegance of the CP construction, it poses new challenges that have yet to be tackled. First, even with a moderate number of outputs, model building is prohibitive due to the huge increase in computational demands and in the number of parameters to be estimated. Second, negative transfer of knowledge may occur when some outputs do not share commonalities. In this paper we address these issues. We propose a regularized pairwise modeling approach for the MGP established using CP. The key feature of our approach is to distribute the estimation of the full multivariate model into a group of bivariate GPs which are individually built. Interestingly, pairwise modeling turns out to possess unique characteristics, which allow us to tackle the challenge of negative transfer by penalizing the latent function that facilitates information sharing in each bivariate model. Predictions are then made by combining predictions from the bivariate models within a Bayesian framework. The proposed method has excellent scalability when the number of outputs is large and minimizes the negative transfer of knowledge between uncorrelated outputs. Statistical guarantees for the proposed method are studied and its advantageous features are demonstrated through numerical studies.

arXiv.org e-Print Archive

On Indexed Data Broadcast

Author: Khanna Sanjeev
Zhou Shiyu
Publication venue: ScholarlyCommons
Publication date: 01/06/2000

We consider the problem of efficient information retrieval in asymmetric communication environments where multiple clients with limited resources retrieve information from a powerful server that periodically broadcasts its information repository over a communication medium. The cost of a retrieving client consists of two components: (a) access time, defined as the total amount of time spent by a client in retrieving the information of interest; and (b) tuning time, defined as the time spent by the client in actively listening to the communication medium, measuring a certain efficiency in resource usage. A probability distribution is associated with the data items in the broadcast, representing the likelihood of a data item's being requested at any point in time. The problem of indexed data broadcast is to schedule the data items interleaved with certain indexing information in the broadcast so as to minimize simultaneously the mean access time and the mean tuning time. Prior work on this problem has focused only on some special cases. In this paper we study the indexed data broadcast problem in its full generality and design a broadcast scheme that achieves a mean access time of at most (1.5 + ε) times the optimal and a mean tuning time bounded by O(log n).
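The access-time definition above can be illustrated with a minimal Python sketch. It is not the paper's scheme: it assumes unit-length slots, a purely cyclic schedule with no index slots, and a client that tunes in at the start of a uniformly random slot. The function name `mean_access_time` and the example schedules are hypothetical. Without an index the client listens continuously, so its tuning time equals its access time in this simplified setting.

```python
from fractions import Fraction

def mean_access_time(schedule, probs):
    """Expected number of slots until a client has received its item.

    schedule: list of item ids, one per unit-length slot, repeated cyclically.
    probs: dict mapping item id -> request probability (should sum to 1).
    Assumptions (not from the paper): the client arrives at the start of a
    uniformly random slot, and every requested item appears in the schedule.
    """
    n = len(schedule)
    total = Fraction(0)
    for item, p in probs.items():
        waits = 0
        for start in range(n):
            # Scan forward from the arrival slot to the next broadcast of `item`.
            offset = next(k for k in range(n) if schedule[(start + k) % n] == item)
            waits += offset + 1  # +1 for the slot that carries the item itself
        total += Fraction(p) * Fraction(waits, n)
    return total

# Hypothetical request distribution: item "a" is much more popular.
probs = {"a": Fraction(3, 4), "b": Fraction(1, 8), "c": Fraction(1, 8)}
flat = mean_access_time(["a", "b", "c"], probs)
skewed = mean_access_time(["a", "b", "a", "c"], probs)
print(flat, skewed)
```

Repeating the popular item more often lowers the mean access time in this toy case, which is the kind of trade-off a broadcast schedule has to balance against the extra slots spent on indexing information.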

Elsevier - Publisher Connector

ScholarlyCommons@Penn

DeepICP: An End-to-End Deep Neural Network for 3D Point Cloud Registration

Author: Fu Xiangyu
Lu Weixin
Song Shiyu
Wan Guowei
Yuan Pengfei
Zhou Yao
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 16/09/2019

We present DeepICP - a novel end-to-end learning-based 3D point cloud registration framework that achieves comparable registration accuracy to prior state-of-the-art geometric methods. Unlike other keypoint-based methods, where a RANSAC procedure is usually needed, we use various deep neural network structures to establish an end-to-end trainable network. Our keypoint detector is trained through this end-to-end structure and enables the system to avoid the inference of dynamic objects, leverages the help of sufficiently salient features on stationary objects, and as a result, achieves high robustness. Rather than searching for the corresponding points among existing points, the key contribution is that we generate them based on learned matching probabilities among a group of candidates, which can boost the registration accuracy. Our loss function incorporates both the local similarity and the global geometric constraints to ensure that all of the above network designs converge in the right direction. We comprehensively validate the effectiveness of our approach using both the KITTI dataset and the Apollo-SouthBay dataset. Results demonstrate that our method achieves comparable or better performance than the state-of-the-art geometry-based methods. Detailed ablation and visualization analyses are included to further illustrate the behavior and insights of our network. The low registration error and high robustness of our method make it attractive for substantial applications relying on the point cloud registration task. Comment: 10 pages, 6 figures, 3 tables, typos corrected, experimental results updated, accepted by ICCV 201
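The idea of generating a corresponding point from matching probabilities over candidates, rather than picking an existing point, can be sketched as follows. This is only an illustration under stated assumptions: the similarity scores are hypothetical stand-ins for learned values, a softmax is assumed as the way to turn them into probabilities, and the function name `soft_correspondence` is not from the paper.

```python
import math

def soft_correspondence(similarities, candidates):
    """Return a virtual corresponding point as a probability-weighted mean.

    similarities: one score per candidate (hypothetical learned values).
    candidates: list of 3D points as (x, y, z) tuples.
    Assumption: a softmax converts scores into matching probabilities.
    """
    m = max(similarities)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in similarities]
    z = sum(weights)
    probs = [w / z for w in weights]
    dim = len(candidates[0])
    # Weighted average of the candidate coordinates, one axis at a time.
    return tuple(
        sum(p * c[d] for p, c in zip(probs, candidates)) for d in range(dim)
    )

candidates = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(soft_correspondence([0.0, 0.0], candidates))   # equal scores: midpoint
print(soft_correspondence([50.0, 0.0], candidates))  # one dominant score
```

Because the output is a smooth function of the scores, a construction like this stays differentiable, which is what makes it usable inside an end-to-end trainable network.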

arXiv.org e-Print Archive

Crossref

Long-Running Speech Recognizer: An End-to-End Multi-Task Learning Framework for Online ASR and VAD

Author: Li Meng
Xu Bo
Zhou Shiyu
Publication date: 02/03/2021

When an end-to-end automatic speech recognition (E2E-ASR) system is used for real-world applications, a voice activity detection (VAD) system is usually needed to improve performance and to reduce the computational cost by discarding non-speech parts in the audio. This paper presents a novel end-to-end (E2E), multi-task learning (MTL) framework that integrates ASR and VAD into one model. The proposed system, which we refer to as Long-Running Speech Recognizer (LR-SR), learns ASR and VAD jointly from two separate task-specific datasets in the training stage. With the assistance of VAD, the ASR performance improves, as its connectionist temporal classification (CTC) loss function can leverage the VAD alignment information. In the inference stage, the LR-SR system removes non-speech parts at low computational cost and recognizes speech parts with high robustness. Experimental results on segmented speech data show that the proposed MTL framework outperforms the baseline single-task learning (STL) framework in the ASR task. On unsegmented speech data, we find that the LR-SR system outperforms the baseline ASR systems that build an extra GMM-based or DNN-based voice activity detector. Comment: 5 pages, 2 figure

arXiv.org e-Print Archive