
    Preserving Statistical Validity in Adaptive Data Analysis

    A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of sophisticated validation techniques, to deep statistical methods for controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of data analysis: the theory of statistical inference assumes a fixed collection of hypotheses to be tested, or learning algorithms to be applied, selected non-adaptively before the data are gathered, whereas in practice data is shared and reused with hypotheses and new analyses being generated on the basis of data exploration and the outcomes of previous analyses. In this work we initiate a principled study of how to guarantee the validity of statistical inference in adaptive data analysis. As an instance of this problem, we propose and investigate the question of estimating the expectations of $m$ adaptively chosen functions on an unknown distribution given $n$ random samples. We show that, surprisingly, there is a way to estimate an exponential in $n$ number of expectations accurately even if the functions are chosen adaptively. This gives an exponential improvement over standard empirical estimators that are limited to a linear number of estimates. Our result follows from a general technique that counter-intuitively involves actively perturbing and coordinating the estimates, using techniques developed for privacy preservation. We give additional applications of this technique to our question. Comment: Updated related work with recent development

    A Study of Extrinsic Motivation among Polytechnic Certificate and Diploma Graduate Students of the Department of Civil Engineering, KUiTTHO

    This study was conducted to investigate the influence of family encouragement, lecturers' teaching methods, peer influence, and infrastructure facilities on the extrinsic motivation of third- and fourth-year students who are polytechnic certificate and diploma graduates in the Department of Civil Engineering, Kolej Universiti Teknologi Tun Hussein Onn. The study sample comprised 87 polytechnic certificate graduates and 38 polytechnic diploma graduates. Data were collected through questionnaires and analyzed using SPSS (Statistical Package for the Social Sciences). The results are presented as tables and histograms. The analysis found that both groups agreed that the above factors affect their extrinsic motivation. In other words, these factors are important in shaping students to achieve academic excellence.

    Quasi-local evolution of cosmic gravitational clustering in the weakly non-linear regime

    We investigate the weakly non-linear evolution of cosmic gravitational clustering in phase space by looking at the Zel'dovich solution in the discrete wavelet transform (DWT) representation. We show that if the initial perturbations are Gaussian, the relation between the evolved DWT mode and the initial perturbations in the weakly non-linear regime is quasi-local. That is, the evolved density perturbations are mainly determined by the initial perturbations localized in the same spatial range. Furthermore, we show that the evolved mode is monotonically related to the initial perturbed mode. Thus large (small) perturbed modes statistically correspond to the large (small) initial perturbed modes. We test this prediction by using QSO Ly$\alpha$ absorption samples. The results show that the weakly non-linear features for both the transmitted flux and identified forest lines are quasi-localized. The locality and monotonic properties provide a solid basis for a DWT scale-by-scale Gaussianization reconstruction algorithm proposed by Feng & Fang (2000) for data in the weakly non-linear regime. With the Zel'dovich solution, we find also that the major non-Gaussianity caused by the weakly non-linear evolution is local scale-scale correlations. Therefore, to have a precise recovery of the initial Gaussian mass field, it is essential to remove the scale-scale correlations. Comment: 22 pages, 13 figures. Accepted for publication in the Astrophysical Journal