Structure from Motion with Higher-level Environment Representations
Computer vision is an important area focused on understanding,
extracting and using information from vision-based sensors. It
has many applications, such as vision-based 3D reconstruction,
simultaneous localization and mapping (SLAM) and data-driven
understanding of the real world. Vision is a fundamental sensing
modality in many different fields of application.
While traditional structure from motion (SfM) mostly uses sparse
point-based features, this thesis explores the possibility of
using higher-order feature representations. It starts with a
joint work that uses straight lines as the feature representation
and performs bundle adjustment with a straight-line
parameterization. We then try an even higher-order
representation, using Bézier splines for the parameterization.
We start with a simple case in which all contours lie on a
plane, using Bézier splines to parameterize the curves in the
background and optimizing over both the camera positions and the
Bézier splines. As an application, we present a complete
end-to-end pipeline that produces meaningful dense 3D models from
natural data of a 3D object: the target object is placed on a
structured but unknown planar background that is modeled with
splines. The data is captured using only a hand-held monocular
camera.
However, this application is limited to planar scenarios, so we
then push the parameterization into full 3D. Building on this
idea, we introduce a more flexible higher-order extension of
points that provides a general model for structural edges in the
environment, whether straight or curved. Our model relies on
linked Bézier curves, whose geometric intuition proves highly
beneficial during parameter initialization and regularization.
We present the first fully automatic pipeline that is able to
generate spline-based representations without any human
supervision.
Besides a full graphical formulation of the problem, we introduce
both geometric and photometric cues as well as higher-level
concepts such as overall curve visibility and viewing-angle
restrictions to automatically manage the correspondences in the
graph. Results show that curve-based structure from motion with
splines is able to outperform state-of-the-art sparse
feature-based methods, as well as model curved edges in the
environment.
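As a rough illustration of what "linked Bézier curves" can mean in practice, the sketch below evaluates a chain of cubic Bézier segments with de Casteljau's algorithm, where consecutive segments share their junction control point (C0 continuity). The cubic degree, the shared-endpoint layout of 3n + 1 control points, and the names `de_casteljau`, `split_linked` and `eval_linked_bezier` are assumptions for illustration only, not the thesis's actual parameterization.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def lerp(p: Point, q: Point, t: float) -> Point:
    """Linear interpolation between two points of equal dimension."""
    return tuple((1.0 - t) * a + t * b for a, b in zip(p, q))


def de_casteljau(ctrl: Sequence[Point], t: float) -> Point:
    """Evaluate a Bezier curve of any degree at parameter t in [0, 1]."""
    pts = list(ctrl)
    while len(pts) > 1:
        # Each pass replaces n points by n - 1 interpolated points.
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]


def split_linked(ctrl: Sequence[Point]) -> List[List[Point]]:
    """Split a linked control polygon into cubic segments.

    Assumed (hypothetical) layout: 3*n + 1 control points, where segment k
    uses points 3k .. 3k+3, so neighbouring segments share one point.
    """
    if len(ctrl) < 4 or (len(ctrl) - 1) % 3 != 0:
        raise ValueError("expected 3*n + 1 control points")
    return [list(ctrl[i:i + 4]) for i in range(0, len(ctrl) - 1, 3)]


def eval_linked_bezier(ctrl: Sequence[Point], s: float) -> Point:
    """Evaluate the linked curve at a global parameter s in [0, n_segments]."""
    segments = split_linked(ctrl)
    if not 0.0 <= s <= len(segments):
        raise ValueError("s outside the parameter range")
    idx = min(int(s), len(segments) - 1)  # s == n_segments maps to the last segment
    return de_casteljau(segments[idx], s - idx)


# Two cubic segments in 3D sharing the junction point (3, 0, 0).
ctrl = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 2.0, 0.0), (3.0, 0.0, 0.0),
        (4.0, -2.0, 0.0), (5.0, -2.0, 1.0), (6.0, 0.0, 1.0)]
junction = eval_linked_bezier(ctrl, 1.0)
midpoint = eval_linked_bezier(ctrl, 0.5)
```

Sharing the junction point makes the curve continuous by construction, which is one reason a control-point parameterization is convenient for initialization and regularization; smoother (C1) linking would additionally constrain the neighbouring tangent points.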
Quantum interference and controllable magic cavity QED via giant atom in coupled resonator waveguide
We study the Markovian and non-Markovian dynamics of a giant atom system
which couples to a coupled resonator waveguide (CRW) via two distant sites.
Under certain conditions, we find that the giant atom population can exhibit an
oscillating behavior and the photon can be trapped in the giant atom regime.
These phenomena are induced by the interference effect among the bound states
both in and outside the continuum. As an application of the photon trapping, we
theoretically propose a magic cavity model in which the giant atom serves as either
a perfect or a leaky cavity, depending on the distance between the coupling
sites. Such control of the magic cavity, from perfect to leaky, cannot be
realized in traditional cavity or circuit QED setups. The predicted
effects can be probed in state-of-the-art waveguide QED experiments and provide
a striking example of how different kinds of bound states modify the
dynamics of open quantum systems in a structured environment.
Comment: 11 pages, 7 figures, comments are welcome
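To make the role of the coupling-site distance concrete, the block below writes down a commonly used tight-binding model for a two-level giant atom coupled to a CRW at sites 0 and N. The Hamiltonian form, the hopping sign convention and the symbols Omega, omega_c, J, g, N and L are assumptions for illustration (hbar = 1); the paper's own model may differ in its details.

```latex
% Assumed model: giant atom (|g>, |e>) coupled to CRW sites 0 and N
H = \Omega\,|e\rangle\langle e|
  + \sum_{j}\Big[\omega_c\,a_j^\dagger a_j - J\big(a_j^\dagger a_{j+1} + a_{j+1}^\dagger a_j\big)\Big]
  + g\Big[\big(a_0^\dagger + a_N^\dagger\big)\sigma_- + \mathrm{h.c.}\Big]

% Fourier transform a_j = L^{-1/2} \sum_k e^{ikj} a_k on a chain of L sites
\omega_k = \omega_c - 2J\cos k, \qquad
H_{\mathrm{int}} = \frac{g}{\sqrt{L}}\sum_k \Big[\big(1 + e^{-ikN}\big)\,a_k^\dagger\sigma_- + \mathrm{h.c.}\Big]

% Interference factor and golden-rule decay rate at the resonant wave number k_0, with \Omega = \omega_{k_0}
\big|1 + e^{ik_0N}\big|^2 = 4\cos^2\!\Big(\frac{k_0N}{2}\Big), \qquad
\Gamma \propto \frac{g^2\cos^2(k_0N/2)}{J\,|\sin k_0|}
```

In this simplified Markovian picture, the emission amplitudes from the two coupling points interfere, and the decay rate vanishes when k_0 N is an odd multiple of pi, so the distance N alone can switch the atom between a non-decaying and a leaky regime. This estimate is only qualitative: the bound states in and outside the continuum and the non-Markovian effects discussed in the abstract are not captured by the golden-rule rate.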
Nearest Neighbor Machine Translation is Meta-Optimizer on Output Projection Layer
Nearest Neighbor Machine Translation (kNN-MT) has achieved great success in
domain adaptation tasks by integrating pre-trained Neural Machine Translation
(NMT) models with domain-specific token-level retrieval. However, the reasons
underlying its success have not been thoroughly investigated. In this paper, we
comprehensively analyze kNN-MT through theoretical and empirical studies.
First, we provide new insights into the working mechanism of kNN-MT as an
efficient technique that implicitly executes gradient descent on the output
projection layer of NMT, indicating that it is a specific case of model
fine-tuning. Subsequently, we conduct multi-domain experiments and word-level
analysis to examine the differences in performance between kNN-MT and
entire-model fine-tuning. Our findings suggest that: (1) incorporating kNN-MT
with adapters yields translation performance comparable to fine-tuning on
in-domain test sets, while achieving better performance on out-of-domain test
sets; (2) fine-tuning significantly outperforms kNN-MT on the recall of
in-domain low-frequency words, but this gap could be bridged by optimizing the
context representations with additional adapter layers.
Comment: Accepted by EMNLP202
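For readers unfamiliar with nearest-neighbor MT, the sketch below shows the standard token-level interpolation from the original k-nearest-neighbor MT formulation: retrieved neighbors (target token, distance) are turned into a distribution via a softmax over negative distances with temperature T, which is then mixed with the NMT distribution using a weight lambda. The function names `knn_distribution` and `interpolate`, the toy datastore entries and the values of T and lambda are hypothetical; the code does not reproduce the paper's meta-optimization analysis.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def knn_distribution(neighbors: List[Tuple[str, float]],
                     temperature: float) -> Dict[str, float]:
    """Turn (target_token, distance) pairs into a distribution over tokens.

    Each neighbor gets weight exp(-d / T); weights of neighbors that share a
    target token are summed, then normalized to sum to 1.
    """
    if not neighbors:
        return {}
    d_min = min(d for _, d in neighbors)  # shift distances for numerical stability
    scores: Dict[str, float] = defaultdict(float)
    for token, dist in neighbors:
        scores[token] += math.exp(-(dist - d_min) / temperature)
    total = sum(scores.values())
    return {tok: s / total for tok, s in scores.items()}


def interpolate(p_nmt: Dict[str, float], p_knn: Dict[str, float],
                lam: float) -> Dict[str, float]:
    """p(y) = lam * p_kNN(y) + (1 - lam) * p_NMT(y) over the union of tokens."""
    vocab = set(p_nmt) | set(p_knn)
    return {tok: lam * p_knn.get(tok, 0.0) + (1.0 - lam) * p_nmt.get(tok, 0.0)
            for tok in vocab}


# Toy example: the general-domain model prefers "cell", but the in-domain
# datastore mostly retrieves "battery".
p_nmt = {"cell": 0.5, "battery": 0.3, "phone": 0.2}
neighbors = [("battery", 1.0), ("battery", 1.0), ("cell", 1.0)]
p_knn = knn_distribution(neighbors, temperature=10.0)
mixed = interpolate(p_nmt, p_knn, lam=0.5)  # battery ≈ 0.483, cell ≈ 0.417, phone = 0.1
```

Because the retrieved distribution is simply added on top of the model's softmax output, the final prediction shifts toward in-domain tokens without changing any NMT parameters, which is the behavior the paper reinterprets as an implicit update of the output projection layer.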
QuMoS: A Framework for Preserving Security of Quantum Machine Learning Model
Security has always been a critical issue in machine learning (ML)
applications. Due to the high cost of model training -- such as collecting
relevant samples, labeling data, and consuming computing power --
model stealing is one of the most fundamental and important threats. In
quantum computing, quantum machine learning (QML) model-stealing attacks also
exist and are even more severe, because traditional encryption methods, such
as homomorphic encryption, can hardly be applied directly to quantum
computation. Moreover, due to limited quantum computing resources, the
monetary cost of training a QML model can be even higher than that of a
classical one in the near term. As a result, a well-tuned QML model developed
by a third-party company may be delegated to a quantum cloud provider and
offered as a service to ordinary users. In this case, the QML model will
likely be leaked if the cloud provider is under attack. To address this
problem, we propose a novel framework, namely QuMoS, to preserve model
security. We propose to divide the complete QML model into multiple parts and
distribute them to multiple physically isolated quantum cloud providers for
execution. As such, even if an adversary at a single provider obtains a
partial model, it does not have sufficient information to retrieve the complete
model. Although this approach is promising, we observed that an arbitrary model
design under distributed settings cannot provide model security. We therefore
developed a reinforcement learning-based security engine that automatically
optimizes the model design under the distributed setting, so that a good
trade-off between model performance and security can be achieved. Experimental
results on four datasets show that the model design proposed by QuMoS achieves
competitive performance while providing higher security than the baselines.
DCP-Net: A Distributed Collaborative Perception Network for Remote Sensing Semantic Segmentation
Onboard intelligent processing is widely applied in emergency tasks in the
field of remote sensing. However, it is predominantly confined to a single
platform, which has a limited observation range and is susceptible to
interference, resulting in limited accuracy. Considering the current state of
multi-platform collaborative observation, this article presents a
distributed collaborative perception network called DCP-Net. First, the
proposed DCP-Net helps member platforms enhance perception performance by
integrating features from other platforms. Second, a self-mutual information
match module is proposed to identify collaboration opportunities and select
suitable partners, prioritizing critical collaborative features and reducing
redundant transmission cost. Third, a related feature fusion module is designed
to address the misalignment between local and collaborative features, improving
the quality of fused features for the downstream task. We conduct extensive
experiments and visualization analyses on three semantic segmentation
datasets, namely Potsdam, iSAID and DFC23. The results demonstrate that
DCP-Net comprehensively outperforms existing methods, improving mIoU by
2.61% to 16.89% at the highest collaboration efficiency and reaching
state-of-the-art performance.
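Since the DCP-Net results are reported as mIoU gains, the sketch below shows how mean Intersection-over-Union is commonly computed from a confusion matrix: for each class, IoU = TP / (TP + FP + FN), and mIoU is the average over classes. The toy confusion matrix and the choice to skip classes whose union is empty are assumptions; the paper's exact evaluation protocol (for example, ignored labels) may differ.

```python
from typing import List, Optional


def per_class_iou(conf: List[List[int]]) -> List[Optional[float]]:
    """conf[i][j] = number of pixels with ground truth i predicted as class j."""
    n = len(conf)
    ious: List[Optional[float]] = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(conf[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        union = tp + fp + fn
        # None marks a class absent from both ground truth and prediction.
        ious.append(tp / union if union > 0 else None)
    return ious


def mean_iou(conf: List[List[int]]) -> float:
    """Average IoU over classes that appear in ground truth or prediction."""
    valid = [iou for iou in per_class_iou(conf) if iou is not None]
    return sum(valid) / len(valid)


# Toy 3-class confusion matrix (rows: ground truth, columns: prediction).
conf = [
    [50, 5, 5],
    [10, 30, 0],
    [0, 10, 40],
]
score = mean_iou(conf)  # (5/7 + 6/11 + 8/11) / 3 = 51/77
```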
FabricFolding: Learning Efficient Fabric Folding without Expert Demonstrations
Autonomous fabric manipulation is a challenging task due to complex dynamics
and potential self-occlusion during fabric handling. An intuitive approach to
fabric folding first obtains a smooth, unfolded fabric configuration before
folding begins. However, combining quasi-static actions such as pick & place
with dynamic actions such as fling proves inadequate for unfolding
long-sleeved T-shirts whose sleeves are mostly tucked inside the garment. To
address this limitation, this paper introduces an improved quasi-static action
called pick & drag, specifically designed to handle this type of fabric
configuration.
Additionally, this paper designs an efficient dual-arm manipulation system
that combines quasi-static (pick & place and pick & drag) and dynamic fling
actions to flexibly manipulate fabrics into unfolded and smooth
configurations. Keypoints of the fabric are then detected, enabling
autonomous folding. To address the scarcity of publicly available keypoint
detection datasets for real fabric, we gathered images of various fabric
configurations and types in real scenes to create a comprehensive keypoint
dataset for fabric folding, which aims to improve the success rate of
keypoint detection. Finally, we evaluate our proposed system in real-world
settings, where it consistently and reliably unfolds and folds various types
of fabrics, including challenging cases such as long-sleeved T-shirts with
most of the sleeves tucked inside the garment.
Specifically, our method achieves a coverage rate of 0.822 and a success rate
of 0.88 for folding long-sleeved T-shirts.
GraphPrompt: Biomedical Entity Normalization Using Graph-based Prompt Templates
Biomedical entity normalization unifies the language across biomedical
experiments and studies, and further enables us to obtain a holistic view of
life sciences. Current approaches mainly study the normalization of more
standardized entities such as diseases and drugs, while disregarding the more
ambiguous but crucial entities such as pathways, functions and cell types,
hindering their real-world applications. To achieve biomedical entity
normalization on these under-explored entities, we first introduce an
expert-curated dataset OBO-syn encompassing 70 different types of entities and
2 million curated entity-synonym pairs. To utilize the unique graph structure
in this dataset, we propose GraphPrompt, a prompt-based learning approach that
creates prompt templates according to the graphs. GraphPrompt achieved 41.0%
and 29.9% improvements in the zero-shot and few-shot settings, respectively,
indicating the effectiveness of these graph-based prompt templates. We envision
that GraphPrompt and the OBO-syn dataset can be broadly applied to
graph-based NLP tasks and serve as the basis for analyzing diverse and
accumulating biomedical data.
Comment: 12 pages