121 research outputs found
Accelerating Toeplitz Neural Network with Constant-time Inference Complexity
Toeplitz Neural Networks (TNNs) have exhibited outstanding performance in
various sequence modeling tasks. They outperform commonly used
Transformer-based models while benefiting from log-linear space-time
complexities. On the other hand, State Space Models (SSMs) achieve lower
performance than TNNs in language modeling but offer the advantage of constant
inference complexity. In this paper, we aim to combine the strengths of TNNs
and SSMs by converting TNNs to SSMs during inference, thereby enabling TNNs to
achieve the same constant inference complexities as SSMs. To accomplish this,
we formulate the conversion process as an optimization problem and provide a
closed-form solution. We demonstrate how to transform the target equation into
a Vandermonde linear system problem, which can be efficiently solved using the
Discrete Fourier Transform (DFT). Notably, our method requires no training and
maintains numerical stability. It can be also applied to any LongConv-based
model. To assess its effectiveness, we conduct extensive experiments on
language modeling tasks across various settings. Additionally, we compare our
method to other gradient-descent solutions, highlighting the superior numerical
stability of our approach. The source code is available at
https://github.com/OpenNLPLab/ETSC-Exact-Toeplitz-to-SSM-Conversion.Comment: Accepted to EMNLP 2023. Yiran Zhong is the corresponding author. The
source code is available at
https://github.com/OpenNLPLab/ETSC-Exact-Toeplitz-to-SSM-Conversio
Unsupervised Deep Epipolar Flow for Stationary or Dynamic Scenes
Unsupervised deep learning for optical flow computation has achieved
promising results. Most existing deep-net based methods rely on image
brightness consistency and local smoothness constraint to train the networks.
Their performance degrades at regions where repetitive textures or occlusions
occur. In this paper, we propose Deep Epipolar Flow, an unsupervised optical
flow method which incorporates global geometric constraints into network
learning. In particular, we investigate multiple ways of enforcing the epipolar
constraint in flow estimation. To alleviate a "chicken-and-egg" type of problem
encountered in dynamic scenes where multiple motions may be present, we propose
a low-rank constraint as well as a union-of-subspaces constraint for training.
Experimental results on various benchmarking datasets show that our method
achieves competitive performance compared with supervised methods and
outperforms state-of-the-art unsupervised deep-learning methods.Comment: CVPR 201
A Projection-Based Approach for Distributed Energy Resources Aggregation
Aggregating distributed energy resources (DERs) is of great significance to
improve the overall operational efficiency of smart grid. The aggregation model
needs to consider various factors such as network constraints, operational
constraints, and economic characteristics of the DERs. This paper constructs a
multi-slot DER aggregation model that considers the above factors using
feasible region projection approach, which achieved the protection of DERs data
information and the elimination of internal variables. A system economic
dispatch (ED) model is established for the operators to make full use of the
DER clusters. We calculate the feasible regions with temporal coupling by
extending the Progressive Vertex Enumeration (PVE) algorithm to high dimension
by the Quickhull algorithm. Finally, an IEEE 39-bus distribution network is
simulated with DERs to verify the effectiveness of the proposed model. Results
show that the two-step ED derives the same results as the centralized ED
Image-based Geolocalization by Ground-to-2.5D Map Matching
We study the image-based geolocalization problem, aiming to localize
ground-view query images on cartographic maps. Current methods often utilize
cross-view localization techniques to match ground-view query images with 2D
maps. However, the performance of these methods is unsatisfactory due to
significant cross-view appearance differences. In this paper, we lift
cross-view matching to a 2.5D space, where heights of structures (e.g., trees
and buildings) provide geometric information to guide the cross-view matching.
We propose a new approach to learning representative embeddings from
multi-modal data. Specifically, we establish a projection relationship between
2.5D space and 2D aerial-view space. The projection is further used to combine
multi-modal features from the 2.5D and 2D maps using an effective
pixel-to-point fusion method. By encoding crucial geometric cues, our method
learns discriminative location embeddings for matching panoramic images and
maps. Additionally, we construct the first large-scale ground-to-2.5D map
geolocalization dataset to validate our method and facilitate future research.
Both single-image based and route based localization experiments are conducted
to test our method. Extensive experiments demonstrate that the proposed method
achieves significantly higher localization accuracy and faster convergence than
previous 2D map-based approaches
- …