717 research outputs found

    Approximate Quantile Computation over Sensor Networks

    Sensor networks have been deployed in various environments, from battlefield surveillance to weather monitoring. The amount of data generated by the sensors can be large. One way to analyze such a large data set is to capture the essential statistics of the data. Quantile computation in large-scale sensor networks thus becomes an important but challenging problem. The data may be widely distributed, e.g., across thousands of sensors. In addition, the memory and bandwidth available to sensors can be quite limited. Most previous quantile computation methods assume that the data is either stored or streamed at a centralized site, so they cannot be directly applied in the sensor environment. In this paper, we propose a novel algorithm to compute quantiles for sensor network data, which dynamically adapts to the memory limitations. Moreover, since sensors may update their values at any time, an incremental maintenance algorithm is developed to reduce the number of times that a global recomputation is needed upon updates. The performance and complexity of our algorithms are analyzed both theoretically and empirically on various large data sets, which demonstrates the high promise of our method.
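
To make the setting in the abstract above concrete, here is a minimal sketch of approximate quantile computation over distributed readings. It is not the paper's algorithm: it assumes a simple equi-depth summary in which each sensor compresses its readings into at most `k` weighted values, and a base station merges the summaries. The names `summarize`, `merged_quantile`, and `k` are hypothetical and chosen only for illustration.

```python
def summarize(values, k):
    """Compress one sensor's readings into at most k (value, weight) pairs.

    Assumption: an equi-depth summary, where each bucket is represented
    by its largest value and weighted by the number of readings it holds.
    """
    data = sorted(values)
    n = len(data)
    if n <= k:
        # Few enough readings to send them all, each with weight 1.
        return [(v, 1) for v in data]
    summary = []
    for i in range(k):
        lo = i * n // k
        hi = (i + 1) * n // k
        summary.append((data[hi - 1], hi - lo))
    return summary


def merged_quantile(summaries, q):
    """Estimate the q-quantile (0 < q <= 1) from several sensor summaries."""
    pairs = sorted(pair for summary in summaries for pair in summary)
    if not pairs:
        raise ValueError("no readings to summarize")
    total = sum(weight for _, weight in pairs)
    target = q * total
    running = 0
    for value, weight in pairs:
        running += weight
        if running >= target:
            return value
    return pairs[-1][0]


# Three hypothetical sensors, each limited to a 10-entry summary.
sensors = [list(range(0, 100)), list(range(100, 200)), list(range(200, 300))]
summaries = [summarize(readings, k=10) for readings in sensors]
approx_median = merged_quantile(summaries, 0.5)
```

In this sketch each sensor transmits at most `k` pairs regardless of how many readings it holds, so `k` trades bandwidth and memory against accuracy. Handling updates without a global recomputation, as the abstract describes, is outside what this sketch covers.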

    XL-NBT: A Cross-lingual Neural Belief Tracking Framework

    Task-oriented dialog systems are becoming pervasive, and many companies rely heavily on them to complement human agents for customer service in call centers. With globalization, the need for cross-lingual customer support is more urgent than ever. However, cross-lingual support poses great challenges: it requires a large amount of additional annotated data from native speakers. To bypass the expensive human annotation and take a first step towards the ultimate goal of building a universal dialog system, we set out to build a cross-lingual state tracking framework. Specifically, we assume that there exists a source language with dialog belief tracking annotations, while the target languages have no annotated dialog data of any form. We first pre-train a state tracker for the source language as a teacher, which is able to exploit easy-to-access parallel data. We then distill and transfer its knowledge to the student state tracker in the target languages. We specifically discuss two types of common parallel resources, a bilingual corpus and a bilingual dictionary, and design different transfer learning strategies accordingly. Experimentally, we successfully use an English state tracker as the teacher to transfer its knowledge to both Italian and German trackers and achieve promising results. Comment: 13 pages, 5 figures, 3 tables, accepted to the EMNLP 2018 conference
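
The teacher-student transfer described in the abstract above can be illustrated with a generic distillation loss. This is a sketch under stated assumptions, not the XL-NBT objective: it assumes the teacher and student each produce a vector of logits over the same set of slot values, and it uses the standard soft-target cross-entropy. The function names and the `temperature` parameter are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


def softmax(logits, temperature=1.0):
    """Probability distribution obtained from the logits."""
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy of the student's distribution against the teacher's.

    Assumption: both models score the same candidate values, so the two
    logit vectors are aligned position by position.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_log_probs = log_softmax(student_logits, temperature)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


# Hypothetical logits: a student that agrees with the teacher incurs a
# lower loss than one that ranks the candidate values in reverse.
teacher = [2.0, 0.5, -1.0]
agreeing = distillation_loss(teacher, [2.0, 0.5, -1.0])
disagreeing = distillation_loss(teacher, [-1.0, 0.5, 2.0])
```

In a cross-lingual setup of the kind the abstract outlines, the teacher logits would come from a source-language utterance and the student logits from its parallel target-language counterpart; how those pairs are obtained from a bilingual corpus or dictionary is not shown here.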

    Statistical Learning Methods for Diffusion Magnetic Resonance Imaging

    Diffusion Magnetic Resonance Imaging (dMRI) is a commonly used imaging technique that reveals white matter (WM) microstructure by probing the diffusion of water molecules. The diffusion of water molecules is constrained by biological boundaries, including nerves and tissues. Thus, quantifying the diffusion process is important for understanding the WM microstructure. However, the development of efficient analytical methods for reconstruction, lifespan structural connectome analysis, and surrogate variable analysis has fallen seriously behind the technological advances. This challenge motivates us to develop new statistical learning methods for dMRI. In the first project, we propose a two-stage sparse and adaptive smoothing model (TSASM) for two major image denoising tasks in neuroimaging data analysis: image reconstruction from a series of noisy images within each subject, and group analysis of images obtained from different subjects. Our TSASM consists of an initial smoothing stage that applies a penalized M-estimator and a refined smoothing stage that applies kernel-based smoothing methods. The key novelty of our TSASM is that it accounts for the sparse structure of imaging signals while preserving piecewise smooth regions with unknown edges. In the second project, we develop a scalable analytical method for mapping the lifespan human structural connectome. Specifically, we develop a novel lifespan population-based structural connectome (LPSC) framework that integrates fiber bundle and functional network information to hierarchically guide the registration. Our LPSC is applicable to several neuroimaging studies of neuropsychiatric disorders as well as normal brain development. An improved understanding of the human structural connectome has the potential to inspire new approaches to the prevention, diagnosis, and treatment of many illnesses.
    In the third project, we propose an eigen-shrinkage projection (ESP) method to perform surrogate variable analysis and solve the hidden confounder and harmonization problems in neuroimaging studies. Our ESP can eliminate the signals from the primary variable while preserving the eigenvalue gap between hidden confounders and noise, which enables the estimation of hidden confounders from the projected data. We then investigate the statistical properties of the estimated hidden confounders and uncover the natural connection with ridge regression. Numerical experiments are used to illustrate the finite-sample performance. Doctor of Philosophy
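
As a rough illustration of the kernel-based smoothing that the TSASM description above mentions for its refined stage, the following sketch applies a one-dimensional Gaussian kernel smoother (a Nadaraya-Watson estimator). It is an assumption-laden stand-in, not the thesis's method: the plain version shown here blurs edges rather than preserving them, and the names `kernel_smooth` and `bandwidth` are hypothetical.

```python
import math


def kernel_smooth(xs, ys, bandwidth):
    """Smooth the signal ys, observed at locations xs, with a Gaussian kernel.

    Each output value is a weighted average of all observations, with
    weights that decay with distance from the target location.
    """
    smoothed = []
    for x0 in xs:
        weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
        total = sum(weights)
        smoothed.append(sum(w * y for w, y in zip(weights, ys)) / total)
    return smoothed


# Hypothetical noisy signal that alternates between 0 and 1.
locations = list(range(10))
noisy = [float(i % 2) for i in locations]
denoised = kernel_smooth(locations, noisy, bandwidth=2.0)
```

A larger `bandwidth` averages over more neighbours and gives a smoother result at the cost of more blurring, which is the trade-off an adaptive, edge-aware method would aim to manage locally.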