Distributed Estimation and Inference with Statistical Guarantees
This paper studies hypothesis testing and parameter estimation in the context
of the divide and conquer algorithm. In a unified likelihood based framework,
we propose new test statistics and point estimators obtained by aggregating
various statistics from k subsamples of size n/k, where n is the sample
size. In both low dimensional and high dimensional settings, we address the
important question of how to choose k as n grows large, providing a
theoretical upper bound on k such that the information loss due to the divide
and conquer algorithm is negligible. In other words, the resulting estimators
have the same inferential efficiencies and estimation rates as a practically
infeasible oracle with access to the full sample. Thorough numerical results
are provided to back up the theory.
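The divide-and-conquer aggregation can be sketched with a toy example: averaging per-subsample OLS estimates. This is an illustrative simplification (the function name and the plain averaging rule are assumptions, not the paper's exact test statistics):

```python
import numpy as np

def divide_and_conquer_ols(X, y, k):
    """Estimate regression coefficients by averaging OLS fits
    computed on k disjoint subsamples of size roughly n/k."""
    n = X.shape[0]
    blocks = np.array_split(np.arange(n), k)
    estimates = [np.linalg.lstsq(X[b], y[b], rcond=None)[0] for b in blocks]
    return np.mean(estimates, axis=0)  # aggregated point estimator

rng = np.random.default_rng(0)
n, p = 10_000, 3
beta = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)
beta_hat = divide_and_conquer_ols(X, y, k=20)
```

With k = 20 subsamples of 500 observations each, the averaged estimator recovers beta nearly as well as a full-sample fit, which is the efficiency claim the paper formalizes.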
Molecular dynamics simulation of graphene sinking during chemical vapor deposition growth on semi-molten Cu substrate
Copper foil is the most promising catalyst for the synthesis of large-area, high-quality monolayer graphene. Experimentally, it has been found that the Cu substrate is semi-molten at graphene growth temperatures. In this study, based on a self-developed C-Cu empirical potential and density functional theory (DFT) methods, we performed systematic molecular dynamics (MD) simulations to explore the stability of graphene nanostructures, i.e., carbon nanoclusters and graphene nanoribbons, on semi-molten Cu substrates. Many atomic details observed in the classical MD simulations agree well with those seen in DFT-MD simulations, confirming the high accuracy of the C-Cu potential. Depending on the size of the graphene island, two different sinking modes are observed: (i) the graphene island sinks into the first layer of the metal substrate, and (ii) many metal atoms surround the graphene island. Further study reveals that the sinking of graphene leads to the unidirectional alignment and seamless stitching of graphene islands, which explains the growth of large single-crystal graphene on Cu foil. This study deepens our physical insight into the CVD growth of graphene on semi-molten Cu substrates, explains several experimental mysteries, and provides a theoretical reference for the controlled synthesis of large-area single-crystalline monolayer graphene.
Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis
In this paper, we propose binary sparse convolutional networks called BSC-Net
for efficient point cloud analysis. We empirically observe that sparse
convolution operation causes larger quantization errors than standard
convolution. However, conventional network quantization methods directly
binarize the weights and activations in sparse convolution, resulting in a
performance drop due to the significant quantization loss. In contrast, we
search for the optimal subset of convolution operations that activates the
sparse convolution at various locations to alleviate quantization error, and
the performance gap between real-valued and binary sparse convolutional
networks is closed without complexity overhead. Specifically, we first present
the shifted
sparse convolution that fuses the information in the receptive field for the
active sites that match the pre-defined positions. Then we employ
differentiable search strategies to discover the optimal positions for active
site matching in the shifted sparse convolution, so that the quantization
errors are significantly alleviated for efficient point cloud analysis. For a
fair evaluation of the proposed method, we empirically select the recent
advances that are beneficial for sparse convolutional network binarization to
construct a strong baseline. The experimental results on ScanNet and NYU Depth
v2 show that our BSC-Net achieves significant improvement over our strong
baseline and outperforms the state-of-the-art network binarization methods by
a remarkable margin without additional computation overhead for binarizing
sparse convolutional networks. Comment: Accepted to CVPR202
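The weight-binarization step underlying this line of work can be illustrated with a common XNOR-Net-style scheme: weights are mapped to {-alpha, +alpha}, where alpha is the mean absolute value. This is a generic sketch, not BSC-Net's shifted-convolution method; all names are hypothetical:

```python
import numpy as np

def binarize_weights(w):
    """Binarize a weight tensor to {-alpha, +alpha}, where alpha is the
    mean absolute value (a common XNOR-Net-style scaling factor)."""
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w), alpha

rng = np.random.default_rng(1)
w = rng.normal(size=(64, 3, 3))          # a toy convolution kernel
w_bin, alpha = binarize_weights(w)

# Relative quantization error that binarization-aware methods try to reduce.
quant_error = np.linalg.norm(w - w_bin) / np.linalg.norm(w)
```

The relative error of this naive scheme is substantial, which is exactly the quantization loss that position search in sparse convolution aims to alleviate.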
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression
In this paper, we propose an ultrafast automated model compression framework
called SeerNet for flexible network deployment. Conventional
non-differentiable methods discretely search the desirable compression policy
based on the accuracy from exhaustively trained lightweight models, and
existing differentiable methods optimize an extremely large supernet to obtain
the required compressed model for deployment. They both cause heavy
computational cost due to the complex compression policy search and evaluation
process. On the contrary, we obtain the optimal efficient networks by directly
optimizing the compression policy with an accurate performance predictor, where
the ultrafast automated model compression for various computational cost
constraints is achieved without complex compression policy search and
evaluation. Specifically, we first train the performance predictor based on the
accuracy from uncertain compression policies actively selected by efficient
evolutionary search, so that informative supervision is provided to learn the
accurate performance predictor with acceptable cost. Then we leverage the
gradient that maximizes the predicted performance under the barrier complexity
constraint for ultrafast acquisition of the desirable compression policy, where
adaptive update stepsizes with momentum are employed to enhance optimality of
the acquired pruning and quantization strategy. Compared with the
state-of-the-art automated model compression methods, experimental results on
image classification and object detection show that our method achieves
competitive accuracy-complexity trade-offs with significant reduction of the
search cost. Comment: Accepted to IJC
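The predictor-then-optimize idea can be sketched on a toy problem: fit a performance predictor from sampled (policy, accuracy) pairs, then run projected gradient ascent on the policy under a complexity budget. Everything here (the quadratic predictor form, bit-width ranges, the budget) is an illustrative assumption, not SeerNet's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: accuracy as a function of per-layer bit-widths.
def true_accuracy(policy):
    return 1.0 - np.sum((policy - 6.0) ** 2) / 100.0

# 1) Collect supervision from sampled compression policies.
policies = rng.uniform(2, 8, size=(200, 4))      # 4 layers, bit-widths in [2, 8]
accs = np.array([true_accuracy(p) for p in policies])

# 2) Fit a quadratic predictor acc ~ w . phi(policy) by least squares.
def phi(p):
    return np.concatenate([[1.0], p, p ** 2])

Phi = np.array([phi(p) for p in policies])
w, *_ = np.linalg.lstsq(Phi, accs, rcond=None)

def predicted_accuracy(p):
    return phi(p) @ w

# 3) Projected gradient ascent on the predictor under a total-bit budget.
def numeric_grad(p, eps=1e-4):
    g = np.zeros_like(p)
    for i in range(len(p)):
        d = np.zeros_like(p)
        d[i] = eps
        g[i] = (predicted_accuracy(p + d) - predicted_accuracy(p - d)) / (2 * eps)
    return g

policy = np.full(4, 4.0)
for _ in range(300):
    policy = policy + 1.0 * numeric_grad(policy)
    policy = np.clip(policy, 2, 8)       # stay within valid bit-widths
    if policy.sum() > 26:                # hypothetical total-bit budget
        policy *= 26 / policy.sum()
```

Optimizing against the cheap predictor instead of retraining compressed models for each candidate policy is what makes this style of search fast; the paper replaces the toy pieces with an actively trained predictor and a barrier-based complexity constraint.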
Joint Communication and Computation Design in Transmissive RMS Transceiver Enabled Multi-Tier Computing Networks
In this paper, a novel transmissive reconfigurable meta-surface (RMS)
transceiver enabled multi-tier computing network architecture is proposed for
improving computing capability, decreasing computing delay and reducing base
station (BS) deployment cost, in which transmissive RMS equipped with a feed
antenna can be regarded as a new type of multi-antenna system. We formulate a
total energy consumption minimization problem by a joint optimization of
subcarrier allocation, task input bits, time slot allocation, transmit power
allocation and RMS transmissive coefficient while taking into account the
constraints of communication resources and computing resources. This formulated
problem is non-convex due to the high coupling of the optimization
variables, and obtaining its optimal solution is NP-hard. To address this
challenge, the block coordinate descent (BCD) technique is employed to
decouple the optimization variables and solve the
problem. Specifically, the joint optimization problem of subcarrier allocation,
task input bits, time slot allocation, transmit power allocation and RMS
transmissive coefficient is divided into three subproblems to solve by applying
BCD. Then, the decoupled three subproblems are optimized alternately by using
successive convex approximation (SCA) and difference-convex (DC) programming
until the convergence is achieved. Numerical results verify that our proposed
algorithm is superior in reducing total energy consumption compared to other
benchmarks.
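Block coordinate descent itself can be illustrated on a toy two-block problem, alternately solving each subproblem in closed form while the other block is held fixed (a minimal sketch, not the paper's energy-minimization problem):

```python
# Minimal block coordinate descent sketch: minimize
#   f(x, y) = (x - y)^2 + a*x^2 + b*y^2
# by alternating exact minimization over each variable block.
def bcd(a=1.0, b=1.0, iters=50):
    x, y = 5.0, -3.0                 # arbitrary starting point
    for _ in range(iters):
        # Subproblem 1: minimize f over x with y fixed (closed form).
        x = y / (1.0 + a)
        # Subproblem 2: minimize f over y with x fixed (closed form).
        y = x / (1.0 + b)
    return x, y
```

Each alternating pass decreases the objective, and for this convex toy problem the iterates converge to the global minimum (0, 0); in the paper's non-convex setting, SCA and DC programming play the role of the per-block solvers and only convergence to a stationary point is obtained.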
Towards Accurate Data-free Quantization for Diffusion Models
In this paper, we propose an accurate data-free post-training quantization
framework for diffusion models (ADP-DM) for efficient image generation.
Conventional data-free quantization methods learn shared quantization functions
for tensor discretization regardless of the generation timesteps, while the
activation distribution differs significantly across various timesteps. The
calibration images are acquired at random timesteps, which fail to provide
sufficient information for learning generalizable quantization functions. Both
issues cause sizable quantization errors with obvious image generation
performance degradation. On the contrary, we design group-wise quantization
functions for activation discretization in different timesteps and sample the
optimal timestep for informative calibration image generation, so that our
quantized diffusion model can reduce the discretization errors with negligible
computational overhead. Specifically, we partition the timesteps according to
the importance weights of quantization functions in different groups, which are
optimized by differentiable search algorithms. We also select the optimal
timestep for calibration image generation by structural risk minimizing
principle in order to enhance the generalization ability in the deployment of
quantized diffusion model. Extensive experimental results show that our method
outperforms the state-of-the-art post-training quantization of diffusion
models by a sizable margin at similar computational cost.
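The benefit of group-wise quantization across timesteps can be illustrated with a toy experiment: activations whose spread grows with the timestep are quantized either with one shared scale or with per-group scales. The equal-split grouping and max-abs calibration below are placeholder assumptions standing in for the learned partition:

```python
import numpy as np

def quantize(x, scale, bits=8):
    """Uniform symmetric quantization followed by dequantization."""
    q = np.clip(np.round(x / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q * scale

def groupwise_error(acts_per_t, n_groups):
    """Mean squared dequantization error with one scale per timestep group."""
    groups = np.array_split(np.stack(acts_per_t), n_groups)
    total_err, total_n = 0.0, 0
    for g in groups:
        x = g.ravel()
        scale = np.max(np.abs(x)) / (2 ** 7 - 1)   # per-group max-abs calibration
        total_err += np.sum((x - quantize(x, scale)) ** 2)
        total_n += x.size
    return total_err / total_n

rng = np.random.default_rng(0)
# Activations whose spread grows with the timestep, mimicking the
# distribution shift across diffusion timesteps.
acts = [rng.normal(scale=0.1 + 0.1 * t, size=256) for t in range(20)]
shared = groupwise_error(acts, 1)    # one shared quantizer for all timesteps
grouped = groupwise_error(acts, 4)   # four timestep groups, one scale each
```

Because early timesteps get much smaller scales of their own, the grouped error is lower than the shared-scale error, which is the effect the learned timestep partition exploits.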