689 research outputs found
Fractional-Order Sliding Mode Synchronization for Fractional-Order Chaotic Systems
This paper gives sufficient conditions for checking the stability of fractional-order nonlinear systems. Based on these results, the synchronization of two fractional-order chaotic systems is investigated. A novel fractional-order sliding surface, composed of the synchronization error and its fractional-order integral, is introduced. The proposed fractional-order sliding mode controller guarantees the asymptotic stability of the synchronization error dynamical system. Finally, two numerical examples are given to show the feasibility of the proposed methods.
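As an illustration only, one generic surface consistent with that description (the synchronization error plus a weighted fractional-order integral of the error) can be written as follows. The gain \lambda, the order \alpha, and the Riemann-Liouville form of the integral are assumptions made here, not details taken from the abstract; the paper's actual surface may differ.

```latex
s(t) = e(t) + \lambda \, I^{\alpha} e(t), \qquad
I^{\alpha} e(t) = \frac{1}{\Gamma(\alpha)} \int_{0}^{t} (t-\tau)^{\alpha-1} e(\tau)\, d\tau,
\quad \alpha > 0,\ \lambda > 0
```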
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data
Large Vision-Language Models (VLMs) have demonstrated impressive performance
on complex tasks involving visual input with natural language instructions.
However, it remains unclear to what extent capabilities on natural images
transfer to Earth observation (EO) data, which are predominantly satellite and
aerial images less common in VLM training data. In this work, we propose a
comprehensive benchmark to gauge the progress of VLMs toward being useful tools
for EO data by assessing their abilities on scene understanding, localization
and counting, and change detection tasks. Motivated by real-world applications,
our benchmark includes scenarios like urban monitoring, disaster relief, land
use, and conservation. We discover that, although state-of-the-art VLMs like
GPT-4V possess extensive world knowledge that leads to strong performance on
open-ended tasks like location understanding and image captioning, their poor
spatial reasoning limits usefulness on object localization and counting tasks.
Our benchmark will be made publicly available at https://vleo.danielz.ch/ and
on Hugging Face at
https://huggingface.co/collections/mit-ei/vleo-benchmark-datasets-65b789b0466555489cce0d70
for easy model evaluation. Comment: 62 pages; work in progress
Structure of nanoscale-pitch helical phases: blue phase and twist-bend nematic phase resolved by resonant soft X-ray scattering
Periodic structures of phases with orientational order of molecules but
homogeneous electron density distribution (a short-pitch cholesteric, a blue phase,
and a twist-bend nematic phase) were probed by resonant soft X-ray scattering
(RSoXS) at the carbon K-edge. The theoretical model shows that in the case of a
simple heliconical nematic structure two resonant signals corresponding to the
full and half pitch band should be present, while only the full pitch band is
observed in experiment. This suggests that the twist-bend nematic phase has
a complex double-helix structure, built of two interlocked, shifted
helices. We confirm that the helical pitch in the twist-bend nematic phase is
in the 10 nm range for both chiral and achiral materials. We also show that
the symmetry of a blue phase can be determined unambiguously through resonant
enhancement of X-ray diffraction signals, by including polarization effects,
which are found to be an important indicator in phase structure determination.
Geometric Multi-Model Fitting by Deep Reinforcement Learning
This paper deals with geometric multi-model fitting from noisy,
unstructured point set data (e.g., laser-scanned point clouds). We formulate
the multi-model fitting problem as a sequential decision-making process. We then
use a deep reinforcement learning algorithm to learn the optimal decisions
towards the best fitting result. We compare our method
against the state of the art on simulated data. The results demonstrate that
our approach significantly reduces the number of fitting iterations.
Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning
Extracting meaningful entities belonging to predefined categories from
Visually-rich Form-like Documents (VFDs) is a challenging task. Visual and
layout features such as font, background, color, and bounding box location and
size provide important cues for identifying entities of the same type. However,
existing models commonly train a visual encoder with weak cross-modal
supervision signals, resulting in a limited capacity to capture these
non-textual features and suboptimal performance. In this paper, we propose a
novel Visually-Asymmetric coNsistenCy
Learning (VANCL) approach that addresses the above limitation
by enhancing the model's ability to capture fine-grained visual and layout
features through the incorporation of color priors. Experimental results on
benchmark datasets show that our approach substantially outperforms the strong
LayoutLM series baseline, demonstrating the effectiveness of our approach.
Additionally, we investigate the effects of different color schemes on our
approach, providing insights for optimizing model performance. We believe our
work will inspire future research on multimodal information extraction. Comment: 14 pages, 6 figures, Accepted by EMNLP202
BerDiff: Conditional Bernoulli Diffusion Model for Medical Image Segmentation
Medical image segmentation is a challenging task with inherent ambiguity and
high uncertainty, attributed to factors such as unclear tumor boundaries and
multiple plausible annotations. The accuracy and diversity of segmentation
masks are both crucial for providing valuable references to radiologists in
clinical practice. While existing diffusion models have shown strong capacities
in various visual generation tasks, it is still challenging to deal with
discrete masks in segmentation. To achieve accurate and diverse medical image
segmentation masks, we propose a novel conditional Bernoulli Diffusion model
for medical image segmentation (BerDiff). Instead of using Gaussian noise,
we first propose to use Bernoulli noise as the diffusion kernel to enhance
the capacity of the diffusion model for binary segmentation tasks, resulting in
more accurate segmentation masks. Second, by leveraging the stochastic nature
of the diffusion model, our BerDiff randomly samples the initial Bernoulli
noise and intermediate latent variables multiple times to produce a range of
diverse segmentation masks, which can highlight salient regions of interest
that can serve as valuable references for radiologists. In addition, our
BerDiff can efficiently sample sub-sequences from the overall trajectory of the
reverse diffusion, thereby speeding up the segmentation process. Extensive
experimental results on two medical image segmentation datasets with different
modalities demonstrate that our BerDiff outperforms other recently published
state-of-the-art methods. Our results suggest diffusion models could serve as a
strong backbone for medical image segmentation. Comment: 14 pages, 7 figures
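The idea of using Bernoulli rather than Gaussian noise on a binary mask can be illustrated with a minimal sketch of a forward noising step. The marginal used here, y_t ~ Bernoulli(alpha_bar_t * y_0 + (1 - alpha_bar_t) / 2), the linear beta schedule, and the function name `bernoulli_forward_sample` are assumptions chosen for illustration; the abstract does not state the exact kernel, and this is not the authors' implementation.

```python
import numpy as np

def bernoulli_forward_sample(mask, alpha_bar_t, rng):
    """Sample a noisy binary mask y_t from a clean binary mask y_0.

    Assumed marginal (illustrative, not taken from the abstract):
        y_t ~ Bernoulli(alpha_bar_t * y_0 + (1 - alpha_bar_t) / 2)
    Returns the sampled mask and the per-pixel probabilities used.
    """
    probs = alpha_bar_t * mask + (1.0 - alpha_bar_t) / 2.0
    sample = (rng.random(mask.shape) < probs).astype(np.uint8)
    return sample, probs

rng = np.random.default_rng(0)

# Toy 8x8 "segmentation mask" with a square foreground region.
y0 = np.zeros((8, 8), dtype=np.uint8)
y0[2:6, 2:6] = 1

# Hypothetical linear noise schedule with 50 steps.
betas = np.linspace(1e-4, 0.2, 50)
alpha_bars = np.cumprod(1.0 - betas)

# Early step: probabilities stay close to the clean mask.
y_early, p_early = bernoulli_forward_sample(y0, alpha_bars[0], rng)
# Late step: every pixel is close to a fair coin flip.
y_late, p_late = bernoulli_forward_sample(y0, alpha_bars[-1], rng)
```

Under this assumed kernel every intermediate state remains a binary mask, and drawing several samples with different random seeds gives different masks, which is the source of diversity the abstract refers to.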
A Spatio-temporal Decomposition Method for the Coordinated Economic Dispatch of Integrated Transmission and Distribution Grids
With numerous distributed energy resources (DERs) integrated into the
distribution networks (DNs), the coordinated economic dispatch (C-ED) is
essential for the integrated transmission and distribution grids. For large-scale
power grids, the centralized C-ED faces a high computational burden and
information privacy issues. To tackle these issues, this paper proposes a
spatio-temporal decomposition algorithm to solve the C-ED in a distributed and
parallel manner. In the temporal dimension, the multi-period economic dispatch
(ED) of the transmission grid (TG) is decomposed into several subproblems by
introducing auxiliary variables and overlapping time intervals to deal with the
temporal coupling constraints. In addition, an accelerated alternating direction
method of multipliers (A-ADMM) based temporal decomposition algorithm with a
warm-start strategy is developed to solve the ED subproblems of the TG in
parallel. In the spatial dimension, a multi-parametric programming projection
based spatial decomposition algorithm is developed to coordinate the ED
problems of TG and DNs in a distributed manner. To further improve the
convergence performance of the spatial decomposition algorithm, the aggregate
equivalence approach is used for determining the feasible range of boundary
variables of TG and DNs. Moreover, we prove that the proposed spatio-temporal
decomposition method can obtain the optimal solution for bilevel convex
optimization problems with continuously differentiable objectives and
constraints. Numerical tests are conducted on three systems with different
scales, demonstrating the high computational efficiency and scalability of the
proposed spatio-temporal decomposition method.
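To give a feel for how ADMM lets subproblems be solved in parallel and then coordinated, here is a minimal sketch of plain consensus ADMM on a toy problem: minimize the sum of 0.5 * (x - a_i)^2 over a shared scalar x. This is standard, non-accelerated ADMM, not the A-ADMM or the spatio-temporal method of the paper; the function name `consensus_admm`, the toy objective, and the penalty `rho` are assumptions made for illustration.

```python
import numpy as np

def consensus_admm(a, rho=1.0, iters=100):
    """Plain consensus ADMM for min_x sum_i 0.5 * (x - a_i)^2.

    Each local copy x_i is updated independently (parallelizable),
    then the shared variable z and scaled duals u_i coordinate them.
    """
    a = np.asarray(a, dtype=float)
    x = np.zeros_like(a)   # local copies, one per subproblem
    u = np.zeros_like(a)   # scaled dual variables
    z = 0.0                # shared consensus variable
    for _ in range(iters):
        # Local step: closed-form minimizer of
        # 0.5 * (x_i - a_i)^2 + (rho / 2) * (x_i - z + u_i)^2
        x = (a + rho * (z - u)) / (1.0 + rho)
        # Coordination step: average of local copies plus duals.
        z = float(np.mean(x + u))
        # Dual update: accumulate the consensus violation.
        u = u + x - z
    return z, x

# The minimizer of the toy problem is the mean of a.
z_star, x_star = consensus_admm([1.0, 2.0, 6.0])
```

In this toy case the consensus value approaches the mean of the inputs; the local updates do not depend on each other within an iteration, which is what makes a parallel implementation possible.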