Subset Sampling and Its Extensions
This paper studies the \emph{subset sampling} problem. The input is a set
of records together with a function that assigns
each record a probability. A query returns a
random subset of the set, where each record is
sampled into the subset independently with its assigned probability. The goal is to
store the set in a data structure to answer queries efficiently. If the set
fits in memory, the problem is interesting when the set is
dynamic. We develop a dynamic data structure with
expected \emph{query} time,
space and amortized expected \emph{update}, \emph{insert} and
\emph{delete} time, where
. The query time and
space are optimal. If the set does not fit in memory, the problem is
difficult even if the set is static. Under this scenario, we present an
I/O-efficient algorithm that answers a \emph{query} in
amortized expected I/Os using space, where is the memory
size, is the block size and is the number of iterative
operations we need to perform on before going below . In
addition, when each record is associated with a real-valued key, we extend the
\emph{subset sampling} problem to the \emph{range subset sampling} problem, in
which we require that the keys of the sampled records fall within a specified
input range. For this extension, we provide a solution under the
dynamic setting, with expected
\emph{query} time, space and amortized
expected \emph{update}, \emph{insert} and \emph{delete} time. Comment: 17 page
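The problem statement in the abstract above can be illustrated with a naive baseline. This is a minimal sketch, not the data structure the paper develops: it scans every record on each query, so it takes time linear in the number of records. The function names `subset_sample` and `range_subset_sample`, and the `prob` and `key` callables, are hypothetical names introduced here for illustration.

```python
import random


def subset_sample(records, prob, rng=random):
    """Naive query: keep each record independently with probability prob(record).

    Assumption: prob(record) returns a value in [0, 1]. This linear scan is a
    baseline for illustration only, not the paper's dynamic data structure.
    """
    return [r for r in records if rng.random() < prob(r)]


def range_subset_sample(records, prob, key, lo, hi, rng=random):
    """Naive range variant: only records whose key lies in [lo, hi] are eligible."""
    return [r for r in records if lo <= key(r) <= hi and rng.random() < prob(r)]
```

A query on the same input returns a fresh random subset each time; passing a seeded `random.Random` instance as `rng` makes a run reproducible.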
A novel artificial bee colony based clustering algorithm for categorical data
Funding: This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. (21127010, 61202309, http://www.nsfc.gov.cn/), China Postdoctoral Science Foundation under Grant No. 2013M530956 (http://res.chinapostdoctor.org.cn), the UK Economic & Social Research Council (ESRC): award reference: ES/M001628/1 (http://www.esrc.ac.uk/), Science and Technology Development Plan of Jilin province under Grant No. 20140520068JH (http://www.jlkjt.gov.cn), Fundamental Research Funds for the Central Universities under No. 14QNJJ028 (http://www.nenu.edu.cn), the open project program of Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University under Grant No. 93K172014K07 (http://www.jlu.edu.cn). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Peer reviewed. Publisher PD
High throughput photonic time stretch optical coherence tomography with data compression
Photonic time stretch enables real-time, high-throughput optical coherence tomography (OCT), but the massive data volume it generates is a real challenge. In this paper, data compression in high-throughput optical time stretch OCT has been explored and experimentally demonstrated. This is made possible by exploiting the spectral sparsity of the encoded optical pulse spectrum using a compressive sensing (CS) approach. Both randomization and integration have been implemented in the optical domain, avoiding an electronic bottleneck. A data compression ratio of 66% has been achieved in high-throughput OCT measurements with a 1.51 MHz axial scan rate using a greatly reduced data sampling rate of 50 MS/s. The potential to improve the compression ratio has been exploited. In addition, using a dual pulse integration method, the capability of improving frequency measurement resolution in the proposed system has been demonstrated. A number of optimization algorithms for the reconstruction of the frequency-domain OCT signals have been compared in terms of reconstruction accuracy and efficiency. Our results show that the L1 Magic implementation of the primal-dual interior point method offers the best compromise between accuracy and reconstruction time of the time-stretch OCT signal tested.