LIGHT: Joint Individual Building Extraction and Height Estimation from Satellite Images through a Unified Multitask Learning Network
Building extraction and height estimation are two important basic tasks in
remote sensing image interpretation, which are widely used in urban planning,
real-world 3D construction, and other fields. Most existing research treats the
two tasks as independent, so height information cannot be fully exploited to
improve the accuracy of building extraction and vice
versa. In this work, we combine the individuaL buIlding extraction and heiGHt
estimation through a unified multiTask learning network (LIGHT) for the first
time, which simultaneously outputs a height map, bounding boxes, and a
segmentation mask map of buildings. Specifically, LIGHT consists of an instance
segmentation branch and a height estimation branch. In particular, to
effectively unify multi-scale features across branches and bridge the feature
gap between them, we propose a Gated Cross Task Interaction (GCTI) module that
efficiently exchanges features between the two branches. Experiments on
the DFC2023 dataset show that our LIGHT can achieve superior performance, and
our GCTI module with ResNet101 as the backbone can significantly improve the
performance of multitask learning by 2.8% AP50 and 6.5% delta1, respectively.
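To make the cross-branch gating idea concrete, the following is a minimal PyTorch sketch of a gated cross-task interaction layer in the spirit of GCTI; it is our illustration, not the authors' implementation, and the channel count, layer choices, and gating form are assumptions.

```python
# Hedged sketch: gated feature exchange between a segmentation branch and a
# height-estimation branch. Shapes and layers are assumptions, not LIGHT's code.
import torch
import torch.nn as nn

class GatedCrossTaskInteraction(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Gates decide how much of the other task's features to admit.
        self.gate_seg = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.gate_hgt = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.fuse_seg = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse_hgt = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, f_seg: torch.Tensor, f_hgt: torch.Tensor):
        joint = torch.cat([f_seg, f_hgt], dim=1)
        # Each branch keeps its own features and adds a gated share of the other's.
        f_seg_out = f_seg + self.gate_seg(joint) * self.fuse_hgt(f_hgt)
        f_hgt_out = f_hgt + self.gate_hgt(joint) * self.fuse_seg(f_seg)
        return f_seg_out, f_hgt_out

# Example: exchange information between two 256-channel feature maps.
gcti = GatedCrossTaskInteraction(256)
f_seg, f_hgt = torch.randn(1, 256, 64, 64), torch.randn(1, 256, 64, 64)
f_seg, f_hgt = gcti(f_seg, f_hgt)
```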
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
We consider the problem of eliciting compositional generalization
capabilities in large language models (LLMs) with a novel type of prompting
strategy. Compositional generalization empowers the LLMs to solve problems that
are harder than the ones they have seen (i.e., easy-to-hard generalization),
which is a critical reasoning capability of human-like intelligence. However,
even the current state-of-the-art LLMs still struggle with this form of
reasoning. To bridge this gap, we propose skills-in-context (SKiC) prompting,
which instructs LLMs how to compose basic skills to resolve more complex
problems. We find that it is crucial to demonstrate both the skills and the
compositional examples within the same prompting context. With as few as two
exemplars, our SKiC prompting initiates strong synergies between skills and
their composition capabilities. Notably, it empowers LLMs to solve unseen
problems that require innovative skill compositions, achieving near-perfect
generalization on a broad range of challenging compositionality tasks.
Intriguingly, SKiC prompting unlocks the latent potential of LLMs, enabling
them to leverage pre-existing internal skills acquired during earlier
pre-training stages, even when these skills are not explicitly presented in the
prompting context. This results in the capability of LLMs to solve unseen
complex problems by activating and composing internal competencies. With such
prominent features, SKiC prompting is able to achieve state-of-the-art
performance on challenging mathematical reasoning benchmarks (e.g., MATH).
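As an illustration of the prompting recipe described above (basic skills and compositional exemplars presented in the same context), here is a minimal sketch of how such a prompt could be assembled; the function, skill names, and exemplar wording are hypothetical, not taken from the paper.

```python
# Hedged sketch of a skills-in-context style prompt builder (illustrative only).
def build_skic_prompt(skills: dict[str, str],
                      exemplars: list[tuple[str, str]],
                      problem: str) -> str:
    parts = ["Basic skills:"]
    for name, description in skills.items():
        parts.append(f"- {name}: {description}")
    parts.append("\nExamples of composing the skills:")
    for question, worked_solution in exemplars:
        parts.append(f"Q: {question}\nA: {worked_solution}")
    parts.append("\nNow solve the new problem by composing the skills above."
                 f"\nQ: {problem}\nA:")
    return "\n".join(parts)

prompt = build_skic_prompt(
    skills={"add": "add two integers digit by digit",
            "carry": "propagate a carry to the next digit"},
    exemplars=[("What is 57 + 68?",
                "Use add on the units (7+8=15, write 5, carry 1), "
                "then carry into the tens (5+6+1=12). Answer: 125.")],
    problem="What is 489 + 764?",
)
print(prompt)
```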
PIVOINE: Instruction Tuning for Open-world Information Extraction
We consider the problem of Open-world Information Extraction (Open-world IE),
which extracts comprehensive entity profiles from unstructured texts. Different
from the conventional closed-world setting of Information Extraction (IE),
Open-world IE considers a more general situation where entities and relations
could be beyond a predefined ontology. More importantly, we seek to develop a
large language model (LLM) that is able to perform Open-world IE to extract
desirable entity profiles characterized by (possibly fine-grained) natural
language instructions. We achieve this by finetuning LLMs using instruction
tuning. In particular, we construct INSTRUCTOPENWIKI, a substantial instruction
tuning dataset for Open-world IE enriched with a comprehensive corpus,
extensive annotations, and diverse instructions. We finetune the pretrained
BLOOM models on INSTRUCTOPENWIKI and obtain PIVOINE, an LLM for Open-world IE
with strong instruction-following capabilities. Our experiments demonstrate
that PIVOINE significantly outperforms traditional closed-world methods and
other LLM baselines, displaying impressive generalization capabilities on both
unseen instructions and out-of-ontology cases. Consequently, PIVOINE emerges as
a promising solution for effectively tackling the open-world challenge in IE.
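To illustrate what an instruction-tuning record for Open-world IE might look like, here is a hedged sketch of one possible (prompt, completion) format; the field names and instruction wording are assumptions, not the actual INSTRUCTOPENWIKI schema.

```python
# Hedged sketch: one possible training record for instruction-tuned
# open-world IE. Field names and phrasing are assumptions.
import json

record = {
    "instruction": "Extract the profile of every person mentioned, including "
                   "their type, aliases, and description.",
    "input": "Marie Curie, born Maria Sklodowska, was a physicist and chemist "
             "who conducted pioneering research on radioactivity.",
    "output": [
        {
            "entity": "Marie Curie",
            "type": "person",  # may fall outside any fixed ontology
            "aliases": ["Maria Sklodowska"],
            "description": "physicist and chemist; pioneer of radioactivity research",
        }
    ],
}

# Serialized as a single (prompt -> completion) training example.
prompt = record["instruction"] + "\n\n" + record["input"]
completion = json.dumps(record["output"], ensure_ascii=False)
print(prompt, completion, sep="\n---\n")
```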
DCP-Net: A Distributed Collaborative Perception Network for Remote Sensing Semantic Segmentation
Onboard intelligent processing is widely applied in emergency tasks in the
field of remote sensing. However, it is predominantly confined to a single
platform, whose limited observation range and susceptibility to interference
result in limited accuracy. Given the current state of multi-platform
collaborative observation, this article presents a
distributed collaborative perception network called DCP-Net. Firstly, the
proposed DCP-Net helps members to enhance perception performance by integrating
features from other platforms. Secondly, a self-mutual information match module
is proposed to identify collaboration opportunities and select suitable
partners, prioritizing critical collaborative features and reducing redundant
transmission cost. Thirdly, a related feature fusion module is designed to
address the misalignment between local and collaborative features, improving
the quality of fused features for the downstream task. We conduct extensive
experiments and visualization analyses using three semantic segmentation
datasets, including Potsdam, iSAID and DFC23. The results demonstrate that
DCP-Net comprehensively outperforms existing methods, improving mIoU by
2.61%~16.89% at the highest collaboration efficiency and raising performance
to a state-of-the-art level.
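The sketch below illustrates the collaboration pattern the abstract describes: a platform scores candidate partners, requests features only from the most useful ones, and fuses them with its local features. The cosine-similarity scoring and averaging fusion are simplified stand-ins for the self-mutual information match and related feature fusion modules, not the paper's implementation.

```python
# Hedged sketch of a simplified collaborative perception step (illustrative only).
import torch
import torch.nn.functional as F

def select_partners(local_feat: torch.Tensor, remote_feats: list[torch.Tensor], k: int = 1):
    """Rank remote feature maps by cosine similarity to the local ones
    (a crude stand-in for the self-mutual information match module)."""
    local_vec = local_feat.flatten(1).mean(dim=0)
    scores = [F.cosine_similarity(local_vec, r.flatten(1).mean(dim=0), dim=0)
              for r in remote_feats]
    order = torch.argsort(torch.stack(scores), descending=True)
    return [remote_feats[int(i)] for i in order[:k]]

def fuse(local_feat: torch.Tensor, partner_feats: list[torch.Tensor]) -> torch.Tensor:
    """Average the selected partner features into the local map
    (standing in for the related feature fusion module)."""
    fused = local_feat.clone()
    for p in partner_feats:
        fused = fused + F.interpolate(p, size=local_feat.shape[-2:],
                                      mode="bilinear", align_corners=False)
    return fused / (1 + len(partner_feats))

local = torch.randn(1, 128, 32, 32)
remotes = [torch.randn(1, 128, 32, 32) for _ in range(3)]
fused = fuse(local, select_partners(local, remotes, k=2))
```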
Semantic Segmentation for Point Cloud Scenes via Dilated Graph Feature Aggregation and Pyramid Decoders
Semantic segmentation of point clouds generates comprehensive understanding
of scenes by densely predicting the category of each point. Because existing
networks rely on a single receptive field, it remains challenging to express
multi-receptive-field features, which leads to the misclassification of
instances with similar spatial structures. In
this paper, we propose a graph convolutional network DGFA-Net rooted in dilated
graph feature aggregation (DGFA), guided by multi-basis aggregation loss
(MALoss) calculated through Pyramid Decoders. To configure multi-receptive-field
features, DGFA, which takes the proposed dilated graph convolution (DGConv) as
its basic building block, is designed to aggregate multi-scale feature
representations by capturing dilated graphs with various receptive regions. To
diversify the receptive-field bases, we introduce Pyramid Decoders driven by
MALoss, which penalize the receptive-field information using point sets of
different resolutions as calculation bases. Combining these two aspects,
DGFA-Net significantly improves the
segmentation performance of instances with similar spatial structures.
Experiments on S3DIS, ShapeNetPart and Toronto-3D show that DGFA-Net
outperforms the baseline approach, achieving a new state-of-the-art
segmentation performance. Comment: accepted to AAAI Workshop 202
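As a generic illustration of the dilated-graph idea behind DGConv, the sketch below builds a dilated k-NN graph by keeping every d-th neighbor from a larger candidate set, which enlarges the receptive region without increasing k; this is a standard construction written for clarity, not the paper's DGConv code.

```python
# Hedged sketch: dilated k-NN graph construction for point clouds.
import torch

def dilated_knn(points: torch.Tensor, k: int, dilation: int) -> torch.Tensor:
    """points: (N, 3). Returns (N, k) indices of dilated neighbors."""
    dists = torch.cdist(points, points)                           # (N, N) pairwise distances
    # Take k * dilation nearest candidates, then keep every `dilation`-th one,
    # which spreads the k retained neighbors over a wider spatial region.
    candidates = dists.topk(k * dilation, largest=False).indices  # (N, k * dilation)
    return candidates[:, ::dilation]                              # (N, k)

pts = torch.rand(1024, 3)
neighbors_dense = dilated_knn(pts, k=16, dilation=1)  # ordinary k-NN graph
neighbors_wide = dilated_knn(pts, k=16, dilation=4)   # wider receptive region
```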
Joint Parsing and Generation for Abstractive Summarization
Sentences produced by abstractive summarization systems can be ungrammatical
and fail to preserve the original meanings, despite being locally fluent. In
this paper we propose to remedy this problem by jointly generating a sentence
and its syntactic dependency parse while performing abstraction. If generating
a word would introduce an erroneous relation into the summary, that behavior
should be discouraged. The proposed method thus holds promise for producing
grammatical sentences and encouraging the summary to stay true-to-original.
The contributions of this work are twofold. First, we present a novel neural
architecture for abstractive summarization that combines a sequential decoder
with a tree-based decoder in a synchronized manner to generate a summary
sentence and its syntactic parse. Second, we describe a novel human evaluation
protocol to assess whether, and to what extent, a summary remains true to
its original meanings. We evaluate our method on a number of summarization
datasets and demonstrate competitive results against strong baselines. Comment: AAAI 2020 (Main Technical Track)
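To make the joint decision concrete, here is a schematic (not the paper's neural model) of scoring a candidate word together with the dependency arc that would attach it, so a locally fluent word that carries an implausible relation is penalized; the probabilities and weighting below are invented for illustration.

```python
# Hedged sketch: combining token and dependency-arc scores at one decoding step.
import math

def joint_step_score(word_logprob: float, arc_logprob: float, alpha: float = 0.5) -> float:
    """Combine token likelihood and dependency-arc likelihood for one step."""
    return alpha * word_logprob + (1 - alpha) * arc_logprob

# Two candidate continuations for the partial summary "The court ruled ...":
cand_a = joint_step_score(word_logprob=math.log(0.30),   # "against": plausible arc to "ruled"
                          arc_logprob=math.log(0.40))
cand_b = joint_step_score(word_logprob=math.log(0.35),   # locally fluent word, implausible arc
                          arc_logprob=math.log(0.01))
print(cand_a > cand_b)  # True: the erroneous relation is discouraged
```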
Elevation Estimation-Driven Building 3D Reconstruction from Single-View Remote Sensing Imagery
Building 3D reconstruction from remote sensing images has a wide range of
applications in smart cities, photogrammetry and other fields. Methods for
automatic 3D urban building modeling typically take multi-view images as input
to recover point clouds and 3D models of buildings. However, acquiring such
multi-view imagery is time-intensive, which limits the applicability and
practicality of these methods. To
solve these issues, we focus on designing an efficient DSM estimation-driven
reconstruction framework (Building3D), which aims to reconstruct 3D building
models from the input single-view remote sensing image. First, we propose a
Semantic Flow Field-guided DSM Estimation (SFFDE) network, which utilizes the
proposed concept of elevation semantic flow to achieve the registration of
local and global features. Specifically, to make the network semantics globally
aware, we propose an Elevation Semantic Globalization (ESG) module that
realizes the semantic globalization of instances. Further, to bridge the
semantic gap between the global features and the original local features, we
propose a Local-to-Global Elevation Semantic Registration (L2G-ESR) module based on
elevation semantic flow. Our Building3D is rooted in the SFFDE network for
building elevation prediction, synchronized with a building extraction network
for building masks, and then sequentially performs point cloud reconstruction
and surface reconstruction (or CityGML model reconstruction). On this basis, our
Building3D can optionally generate CityGML models or surface mesh models of the
buildings. Extensive experiments on ISPRS Vaihingen and DFC2019 datasets on the
DSM estimation task show that our SFFDE significantly improves upon the state
of the art. Furthermore, our Building3D achieves impressive results in the 3D
point cloud and 3D model reconstruction processes.
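The sketch below outlines the single-view reconstruction pipeline the abstract describes (elevation/DSM prediction plus a building mask, lifted to a point cloud before surface or CityGML reconstruction); the two network calls are placeholders standing in for SFFDE and the building-extraction branch, not real APIs.

```python
# Hedged sketch of the overall single-view reconstruction pipeline (illustrative only).
import numpy as np

def reconstruct_buildings(image: np.ndarray, elevation_net, mask_net):
    dsm = elevation_net(image)          # (H, W) predicted elevation per pixel
    mask = mask_net(image) > 0.5        # (H, W) boolean building mask
    # Lift masked pixels to a 3D point cloud: (col, row, elevation).
    rows, cols = np.nonzero(mask)
    points = np.stack([cols, rows, dsm[rows, cols]], axis=1).astype(np.float32)
    return points                       # surface / CityGML reconstruction would follow

# Dummy networks standing in for the elevation and building-extraction branches.
H, W = 256, 256
points = reconstruct_buildings(
    np.zeros((H, W, 3), dtype=np.float32),
    elevation_net=lambda img: np.random.rand(H, W).astype(np.float32) * 30.0,
    mask_net=lambda img: np.random.rand(H, W).astype(np.float32),
)
print(points.shape)  # (N, 3)
```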