14,820 research outputs found

Nonlinear Online Learning with Adaptive Nyström Approximation

Author: Kumar Sanjiv
Li Yang
Si Si
Publication date: 23/02/2018

Nonlinear feature maps obtained via kernel approximation have led to success in many online learning tasks. Nyström approximation, a popular kernel approximation method, has been well investigated, and various landmark point selection methods have been proposed to improve its approximation quality. However, these improved Nyström methods cannot be applied directly in the online learning setting, because they need access to the entire dataset to learn the landmark points, whereas the online setting requires updating the model on the fly. To address this challenge, we propose Adaptive Nyström approximation for solving nonlinear online learning problems. The key idea is to adaptively modify the landmark points via online k-means and to adjust the model accordingly by solving a least-squares problem followed by a gradient descent step. We show that the resulting algorithm outperforms state-of-the-art online learning methods under the same budget.
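
The landmark-update idea in this abstract can be illustrated with a minimal sketch. It assumes the textbook sequential k-means rule (move the nearest centroid toward each new point with a 1/count step); the abstract does not give the paper's exact update, and the least-squares and gradient steps are omitted. The names `nearest` and `online_kmeans_step` are hypothetical.

```python
def nearest(landmarks, x):
    """Return the index of the landmark closest to x (squared Euclidean distance)."""
    return min(
        range(len(landmarks)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(landmarks[j], x)),
    )


def online_kmeans_step(landmarks, counts, x):
    """Assumed update rule: shift the nearest landmark toward x by a 1/count step,
    so each landmark stays the running mean of the points assigned to it."""
    j = nearest(landmarks, x)
    counts[j] += 1
    eta = 1.0 / counts[j]
    landmarks[j] = [c + eta * (xi - c) for c, xi in zip(landmarks[j], x)]
    return j


# Two landmarks, each initialised from one point (hence counts of 1).
landmarks = [[0.0, 0.0], [10.0, 10.0]]
counts = [1, 1]
for x in [[1.0, 1.0], [9.0, 11.0], [2.0, 0.0]]:
    online_kmeans_step(landmarks, counts, x)
```

Each point is seen once and then discarded, which is what makes the rule usable when the full dataset is not available up front.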

arXiv.org e-Print Archive

Area Attention

Author: Bengio Samy
Kaiser Lukasz
Li Yang
Si Si
Publication date: 07/05/2020

Existing attention mechanisms are trained to attend to individual items in a collection (the memory) with a predefined, fixed granularity, e.g., a word token or an image grid. We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences. Importantly, the shape and the size of an area are dynamically determined via learning, which enables a model to attend to information with varying granularity. Area attention can easily work with existing model architectures such as multi-head attention for simultaneously attending to multiple areas in the memory. We evaluate area attention on two tasks: neural machine translation (both character and token-level) and image captioning, and improve upon strong (state-of-the-art) baselines in all the cases. These improvements are obtainable with a basic form of area attention that is parameter free. Comment: @InProceedings{pmlr-v97-li19e, title = {Area Attention}, author = {Li, Yang and Kaiser, Lukasz and Bengio, Samy and Si, Si}, booktitle = {Proceedings of the 36th International Conference on Machine Learning}, pages = {3846--3855}, year = {2019}, volume = {97}, series = {Proceedings of Machine Learning Research}, publisher = {PMLR}}
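
As a rough illustration of attending to areas in a 1D memory, the sketch below enumerates contiguous spans and runs ordinary dot-product attention over them. It assumes that an area's key is the mean of its item keys and its value is the sum of its item values; the abstract only says the basic form is parameter free, so this pooling choice, the `max_width` limit, and the function names are assumptions for illustration.

```python
import math


def areas_1d(keys, values, max_width):
    """Enumerate contiguous spans of 1..max_width items.
    Assumed pooling: area key = mean of item keys, area value = sum of item values."""
    out = []
    n = len(keys)
    for width in range(1, max_width + 1):
        for start in range(n - width + 1):
            ks = keys[start:start + width]
            vs = values[start:start + width]
            key = [sum(col) / width for col in zip(*ks)]
            val = [sum(col) for col in zip(*vs)]
            out.append(((start, width), key, val))
    return out


def attend(query, areas):
    """Softmax dot-product attention over areas instead of single items."""
    scores = [sum(q * k for q, k in zip(query, key)) for _, key, _ in areas]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(areas[0][2])
    context = [
        sum(w * val[d] for w, (_, _, val) in zip(weights, areas))
        for d in range(dim)
    ]
    return weights, context


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
areas = areas_1d(keys, values, max_width=2)
weights, context = attend([1.0, 0.0], areas)
```

With three items and `max_width=2` there are five areas: three single items and two adjacent pairs, so the model can place weight on a pair as a unit.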

arXiv.org e-Print Archive

Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks

Author: Chen Patrick H.
Hsieh Cho-Jui
Kumar Sanjiv
Li Yang
Si Si
Publication date: 29/10/2018

Neural language models have been widely used in various NLP tasks, including machine translation, next-word prediction and conversational agents. However, it is challenging to deploy these models on mobile devices because of their slow prediction speed, where the bottleneck is computing the top candidates in the softmax layer. In this paper, we introduce a novel softmax layer approximation algorithm that exploits the clustering structure of context vectors. Our algorithm uses a light-weight screening model to predict a much smaller set of candidate words based on the given context, and then conducts an exact softmax only within that subset. Training such a procedure end-to-end is challenging, as traditional clustering methods are discrete and non-differentiable and thus cannot be used with back-propagation in the training process. Using the Gumbel softmax, we are able to train the screening model end-to-end on the training set to exploit the data distribution. The algorithm achieves an order of magnitude faster inference than the original softmax layer for predicting top-$k$ words in various tasks such as beam search in machine translation or next-word prediction. For example, for a machine translation task on a German-to-English dataset with a vocabulary of around 25K words, we achieve a 20.4x speedup with 98.9% precision@1 and 99.3% precision@5 relative to the original softmax layer prediction, while the state-of-the-art method [MSRprediction] achieves only a 6.7x speedup with 98.7% precision@1 and 98.1% precision@5 on the same task.
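
The two-stage inference described in this abstract (screen to a small candidate set, then run an exact softmax inside it) can be sketched as follows. This assumes the screening model is a nearest-centroid lookup by dot product and that each cluster has a fixed candidate list; the learned Gumbel-softmax training is not shown, and every name and number here is hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def screened_top_k(context, centroids, candidates, word_vectors, k):
    """Return the top-k (word, probability) pairs using a screened softmax."""
    # Screening step (assumed form): route the context to the best-matching cluster.
    cluster = max(range(len(centroids)), key=lambda j: dot(centroids[j], context))
    subset = candidates[cluster]
    # Exact softmax, but only over that cluster's candidate words.
    logits = [dot(word_vectors[w], context) for w in subset]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    ranked = sorted(zip(subset, (e / z for e in exps)), key=lambda p: -p[1])
    return ranked[:k]


# Toy vocabulary of four words split into two clusters.
word_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centroids = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[0, 1], [2, 3]]
top = screened_top_k([2.0, 0.0], centroids, candidates, word_vectors, k=2)
```

The probabilities are normalised over the candidate subset only, so they differ from full-vocabulary softmax values; the ranking of the top words is what the precision@k figures compare.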

arXiv.org e-Print Archive

Localization Trajectory and Chern-Simons axion coupling for Bilayer Quantum Anomalous Hall Systems

Author: Guan Ji-Huan
Li Shu-Shen
Wang Si-Si
Xia Yang
Yu Yan
Zhang Yan-Yang
Publication venue: 'American Physical Society (APS)'
Publication date: 12/03/2019

Quantum anomalous Hall (QAH) multilayers provide a platform of topological materials with high Chern numbers. We investigate the localization routes of bilayer QAH systems with Chern number C = 2 under strong disorder, by numerical simulations of their quantum transport properties and the Chern-Simons axion coupling. Compared to the single-layer counterpart with C = 2, the localization trajectories present much richer behaviors; for example, the existence of the stable intermediate state with C = 1 can be tuned by model parameters. This state was always unstable in the single-layer case. Furthermore, the two-parameter scaling trajectories also exhibit multiple patterns, some of which were not captured by the standard Pruisken picture. During the process towards localization, the Chern-Simons axion coupling shows a surprisingly pronounced peak, which is even higher and sharper in the large-size limit. Therefore, the disordered bilayer QAH system can be a good candidate for this nontrivial magnetoelectric coupling mediated by orbital motions. Comment: 11 pages, 11 figures

arXiv.org e-Print Archive

Cyclone intensity estimate with context-aware CycleGAN

Author: Cheng Mingfei
Li Si
Xu Yajing
Yang Haitao
Publication date: 10/05/2019

Deep learning approaches to cyclone intensity estimation have recently shown promising results. However, suffering from the extreme scarcity of cyclone data at specific intensities, most existing deep learning methods fail to achieve satisfactory performance on cyclone intensity estimation, especially on classes with few instances. To avoid the degradation of recognition performance caused by scarce samples, we propose a context-aware CycleGAN which learns the latent evolution features from adjacent cyclone intensities and synthesizes CNN features of classes lacking samples from unpaired source classes. Specifically, our approach synthesizes features conditioned on the learned evolution features, and no extra information is required. Experimental results under several evaluation methods show the effectiveness of our approach, which can even predict unseen classes. Comment: 5 pages

arXiv.org e-Print Archive

Cosmic Reionization Study : Principle Component Analysis After Planck

Author: Li Hong
Li Si-Yu
Li Yong-Ping
Liu Yang
Zhang Xinmin
Publication venue: 'IOP Publishing'
Publication date: 23/12/2015

The study of reionization history plays an important role in understanding the evolution of our universe. It is commonly believed that the intergalactic medium (IGM) in our universe is fully ionized today; however, the reionization process remains mysterious. A simple instantaneous reionization process is usually adopted in modern cosmology without direct observational evidence. However, the history of the ionization fraction, $x_e(z)$, will influence cosmic microwave background (CMB) observables and constraints on the optical depth $\tau$. With mock future data sets based on a featured reionization model, we find that the bias on $\tau$ introduced by the instantaneous model cannot be neglected. In this paper, we study the cosmic reionization history in a model-independent way, using the so-called principal component analysis (PCA) method, and reconstruct $x_e(z)$ at different redshifts $z$ with the data sets of Planck and WMAP 9-year temperature and polarization power spectra, combined with the baryon acoustic oscillation (BAO) data from galaxy surveys and the type Ia supernovae (SN) Union 2.1 sample, respectively. The results show that the reconstructed $x_e(z)$ is consistent with instantaneous behavior; however, there is a slight deviation from this behavior at some epochs. With the PCA method, after abandoning the noisy modes, we get stronger constraints, and the hints of a featured $x_e(z)$ evolution could become a little more obvious. Comment: 12 pages, 10 figures

arXiv.org e-Print Archive

Monogamy deficit for quantum correlation in multipartite quantum system

Author: Fan Heng
Li Bo
Liu Si-Yuan
Yang Wen-Li
Publication venue: 'American Physical Society (APS)'
Publication date: 28/05/2013

We introduce the concept of monogamy deficit for quantum correlation by combining two types of monogamy inequalities that depend on different measurement sides. For tripartite pure states, we demonstrate a relation which connects the two types of monogamy inequalities for quantum discord and provide the difference between them. By using this relation, we obtain a unified physical interpretation of these two monogamy deficits. In addition, we find the interesting fact that there is a general monogamy condition for several quantum correlations for tripartite pure states. We then provide a necessary and sufficient condition for one kind of monogamy inequality to hold for tripartite mixed states and generalize it to multipartite quantum states. Comment: 8 pages, 1 figure

arXiv.org e-Print Archive

Deformed Legendre Polynomial and Its Application

Author: Jing Si Cong
Li Hu
Yang Wei Min
Publication date: 02/12/2002

A new kind of deformed calculus was introduced recently in the study of the parabosonic coordinate representation. Based on this deformed calculus, a new deformation of Legendre polynomials is proposed in this paper, and some of its properties and applications are also discussed. Comment: 11 pages, LaTeX

arXiv.org e-Print Archive

Tibet's Window on Primordial Gravitational Waves

Author: Li Hong
Li Si-Yu
Li Yong-Ping
Liu Yang
Zhang Xinmin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/02/2018

As an essential part of China's Gravitational Waves Program, the Ali CMB Polarization Telescope (AliCPT) is a ground-based experiment aiming to detect Primordial Gravitational Waves (PGWs) by measuring the B-mode polarization of the Cosmic Microwave Background (CMB). First proposed in 2014 and currently in a fast construction phase, AliCPT is China's first CMB project and plans for commissioning in 2019. Led by the Institute of High Energy Physics (IHEP) under the Chinese Academy of Sciences (CAS), the project is a worldwide collaboration of more than fifteen universities and research institutes. The Ali CMB Project is briefly introduced.

arXiv.org e-Print Archive

Measurement of weak static magnetic fields with nitrogen-vacancy color center

Author: Ai Qing
Li Hong-Hui
Li Lu-Si
Yang Zhi-Sheng
Zhou Li-Li
Publication venue: Acta Physica Sinica, Chinese Physical Society and Institute of Physics, Chinese Academy of Sciences
Publication date: 18/08/2017

We propose a strategy to measure weak static magnetic fields with a nitrogen-vacancy color center in diamond. Inspired by avian magnetoreception models, we consider the feasibility of utilizing quantum coherence phenomena to measure weak static magnetic fields. Nitrogen-vacancy (NV) color centers are regarded as an ideal platform for studying quantum science because of their long coherence times, which reach the millisecond timescale. In high-purity diamond, the hyperfine interaction with 13C nuclear spins dominates the decoherence process. In this paper, we numerically simulate the decoherence process between 0 and +1 of an individual NV color center spin in 13C nuclear baths under external magnetic fields of various magnitudes. By applying a Hahn echo to the system, we obtain the coherence of the NV color center spin as a function of total evolution time and magnetic field. Furthermore, we obtain a high-accuracy relationship between the three decoherence-characteristic timescales, i.e. T_W, T_R and T_2, and the magnetic field B. We conclude that T_R has the highest sensitivity to the magnetic field among the three timescales. Thus, for a given NV color center, T_R can serve as the scale for the magnitude of the magnetic field, or rather, for its component along the NV electronic spin axis. When measuring an unknown magnetic field, we adjust the NV axis to three mutually orthogonal directions in turn. By this means, we obtain the three components of the magnetic field and thus the magnitude and direction of the actual magnetic field. The accuracy could reach 60 nT/Hz^{1/2}, and could be greatly improved by using an ensemble of NV color centers or diamond crystals purified with 12C atoms. Comment: 17 pages, 5 figures, 1 table
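
The final reconstruction step in this abstract, combining three orthogonal components into a magnitude and a direction, amounts to a vector norm and a normalisation. A minimal sketch, assuming the three component values have already been extracted from the T_R measurements (that extraction is not shown) and using a hypothetical function name and example values:

```python
import math


def field_from_components(bx, by, bz):
    """Combine three mutually orthogonal field components into a magnitude
    and a unit direction vector. Units are whatever the inputs use."""
    magnitude = math.sqrt(bx * bx + by * by + bz * bz)
    if magnitude == 0.0:
        raise ValueError("zero field has no defined direction")
    direction = (bx / magnitude, by / magnitude, bz / magnitude)
    return magnitude, direction


# Hypothetical component values, e.g. in microtesla.
magnitude, direction = field_from_components(3.0, 4.0, 12.0)
```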

arXiv.org e-Print Archive