Preserving Statistical Validity in Adaptive Data Analysis
A great deal of effort has been devoted to reducing the risk of spurious
scientific discoveries, from the use of sophisticated validation techniques, to
deep statistical methods for controlling the false discovery rate in multiple
hypothesis testing. However, there is a fundamental disconnect between the
theoretical results and the practice of data analysis: the theory of
statistical inference assumes a fixed collection of hypotheses to be tested, or
learning algorithms to be applied, selected non-adaptively before the data are
gathered, whereas in practice data is shared and reused with hypotheses and new
analyses being generated on the basis of data exploration and the outcomes of
previous analyses.
In this work we initiate a principled study of how to guarantee the validity
of statistical inference in adaptive data analysis. As an instance of this
problem, we propose and investigate the question of estimating the expectations
of adaptively chosen functions on an unknown distribution given random
samples.
We show that, surprisingly, there is a way to accurately estimate a number of
expectations that is exponential in the sample size n, even if the functions are
chosen adaptively.
This gives an exponential improvement over standard empirical estimators that
are limited to a linear number of estimates. Our result follows from a general
technique that counter-intuitively involves actively perturbing and
coordinating the estimates, using techniques developed for privacy
preservation. We give additional applications of this technique to our
question.
Comment: Updated related work with recent development
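The idea of perturbing estimates can be illustrated with a minimal sketch. It assumes the Laplace mechanism from differential privacy applied to an empirical mean; this is a standard textbook device, not necessarily the mechanism the abstract's authors use. The function name `noisy_mean`, the parameter `epsilon`, and the uniform sample are all hypothetical choices made for illustration.

```python
import numpy as np

def noisy_mean(sample, query, epsilon, rng):
    """Empirical mean of a [0, 1]-valued query plus Laplace noise (illustrative)."""
    values = np.clip([query(x) for x in sample], 0.0, 1.0)
    n = len(values)
    # Changing one sample point moves the mean by at most 1/n, so the
    # usual Laplace calibration uses scale 1 / (n * epsilon).
    return float(np.mean(values) + rng.laplace(0.0, 1.0 / (n * epsilon)))

rng = np.random.default_rng(0)
sample = rng.random(10_000)  # hypothetical data: uniform draws on [0, 1]

# First query: the mean of the data itself.
first = noisy_mean(sample, lambda x: x, epsilon=1.0, rng=rng)

# Second query is chosen adaptively: it depends on the first (noisy) answer.
threshold = first
second = noisy_mean(sample, lambda x: float(x > threshold), epsilon=1.0, rng=rng)
```

The analyst here only ever sees noisy answers, so the choice of the second query depends on the sample only through a perturbed statistic. How much noise is needed, and how many adaptive queries can then be answered accurately, is the subject of the work itself and is not shown by this sketch.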
A study of extrinsic motivation among polytechnic certificate and diploma graduates in the Department of Civil Engineering, KUiTTHO
This study was conducted to investigate the influence of family encouragement, lecturers' teaching methods, peer influence, and infrastructure facilities on the extrinsic motivation of third-year and fourth-year students who are polytechnic certificate and diploma graduates in the Department of Civil Engineering, Kolej Universiti Teknologi Tun Hussein Onn. The study sample comprised 87 polytechnic certificate graduates and 38 polytechnic diploma graduates. Data were collected through questionnaires and analysed using the SPSS (Statistical Package for the Social Sciences) software. The findings were presented in the form of tables and histograms. The analysis found that both groups agreed that the factors above affect their extrinsic motivation. In other words, these factors are important in shaping students to achieve academic excellence.
Quasi-local evolution of cosmic gravitational clustering in the weakly non-linear regime
We investigate the weakly non-linear evolution of cosmic gravitational
clustering in phase space by looking at the Zel'dovich solution in the discrete
wavelet transform (DWT) representation. We show that if the initial
perturbations are Gaussian, the relation between the evolved DWT mode and the
initial perturbations in the weakly non-linear regime is quasi-local. That is,
the evolved density perturbations are mainly determined by the initial
perturbations localized in the same spatial range. Furthermore, we show that
the evolved mode is monotonically related to the initial perturbed mode. Thus
large (small) perturbed modes statistically correspond to the large (small)
initial perturbed modes. We test this prediction by using QSO Lyα
absorption samples. The results show that the weakly non-linear features for
both the transmitted flux and identified forest lines are quasi-localized. The
locality and monotonic properties provide a solid basis for a DWT
scale-by-scale Gaussianization reconstruction algorithm proposed by Feng & Fang
(Feng & Fang, 2000) for data in the weakly non-linear regime. With the
Zel'dovich solution, we find also that the major non-Gaussianity caused by the
weakly non-linear evolution is local scale-scale correlations. Therefore, to
have a precise recovery of the initial Gaussian mass field, it is essential to
remove the scale-scale correlations.
Comment: 22 pages, 13 figures. Accepted for publication in the Astrophysical
Journa
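As a rough illustration of why a DWT representation carries position information at every scale, the sketch below applies a plain Haar transform to a signal with a single localized spike. The Haar wavelet and the helper name `haar_details` are assumptions made for illustration; the abstract does not say which wavelet basis is used, and this toy example does not model the Zel'dovich evolution itself.

```python
import numpy as np

def haar_details(signal):
    """Return Haar detail coefficients per scale, finest scale first."""
    approx = np.asarray(signal, dtype=float)
    details = []
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        # Detail = scaled difference of neighbours; approx = scaled sum.
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return details

signal = np.zeros(16)
signal[5] = 1.0  # hypothetical perturbation localized at position 5

details = haar_details(signal)
# Indices of the nonzero detail coefficients at each scale.
nonzero = [np.flatnonzero(d).tolist() for d in details]
print(nonzero)
```

At each scale only the one coefficient whose support covers position 5 is nonzero, which is the sense in which a wavelet mode is tied to a spatial range.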