55 research outputs found
CSWA: Aggregation-Free Spatial-Temporal Community Sensing
In this paper, we present a novel community sensing paradigm -- Community
Sensing Without Aggregation (CSWA). CSWA is designed to obtain the environment
information (e.g., air pollution or temperature) in each subarea of the target
area, without aggregating sensor and location data collected by community
members. CSWA operates on top of a secured peer-to-peer network over the
community members and proposes a novel \emph{Decentralized Spatial-Temporal
Compressive Sensing} framework based on \emph{Parallelized Stochastic Gradient
Descent}. Through learning the \emph{low-rank structure} via distributed
optimization, CSWA approximates the value of the sensor data in each subarea
(both covered and uncovered) for each sensing cycle using the sensor data
locally stored in each member's mobile device. Simulation experiments based on
real-world datasets demonstrate that CSWA exhibits low approximation error
(i.e., less than C in the city-wide temperature sensing task and
units of PM2.5 index in urban air pollution sensing) and performs comparably
to, and sometimes better than, state-of-the-art algorithms based on data
aggregation and centralized computation.
Comment: This paper has been accepted by AAAI 2018. The first two authors contributed equally.
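The decentralized recovery step can be illustrated with a minimal sketch: a low-rank factorization M ≈ UVᵀ trained by SGD over only the locally observed entries, so uncovered subareas are filled in without ever aggregating raw readings. This is a toy simplification under assumed parameters (rank, learning rate, toy data), not the paper's full parallelized protocol.

```python
import numpy as np

def sgd_low_rank_complete(obs, shape, rank=1, lr=0.05, epochs=500, seed=0):
    """Approximate a partially observed sensing matrix (subareas x cycles)
    by a low-rank factorization M ~ U V^T, trained with SGD over only the
    locally observed entries -- no raw-data aggregation is needed."""
    rng = np.random.default_rng(seed)
    n, m = shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(epochs):
        for (i, j), x in obs.items():
            err = x - U[i] @ V[j]
            ui = U[i].copy()
            U[i] += lr * err * V[j]   # gradient step on the two factors
            V[j] += lr * err * ui
    return U @ V.T

# Toy rank-1 "temperature" field: 6 subareas x 8 sensing cycles.
truth = np.outer(np.linspace(2.0, 3.0, 6), np.linspace(1.0, 1.2, 8))
rng = np.random.default_rng(1)
covered = rng.random(truth.shape) < 0.7     # ~70% of subareas hold sensors
obs = {(i, j): truth[i, j]
       for i in range(6) for j in range(8) if covered[i, j]}
est = sgd_low_rank_complete(obs, truth.shape)  # fills uncovered subareas too
```

In the actual framework each member would hold only its own entries of `obs` and exchange factor updates over the peer-to-peer network rather than pooling observations in one dictionary.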
TiC: Exploring Vision Transformer in Convolution
While models derived from Vision Transformers (ViTs) have been phenomenally
surging, pre-trained models cannot seamlessly adapt to arbitrary-resolution
images without altering the architecture and configuration, such as by
resampling the positional encoding, limiting their flexibility for various
vision tasks. For instance, the Segment Anything Model (SAM) based on ViT-Huge
requires all input images to be resized to 1024x1024. To overcome this limitation, we
propose the Multi-Head Self-Attention Convolution (MSA-Conv) that incorporates
Self-Attention within generalized convolutions, including standard, dilated,
and depthwise ones. MSA-Conv enables transformers to handle images of varying
sizes without retraining or rescaling, and it further reduces computational
costs compared to global attention in ViT, which grows costly as image size
increases. We then present the Vision Transformer in Convolution (TiC) as a
proof of concept for image classification with MSA-Conv, where two
capacity-enhancing strategies, namely the Multi-Directional Cyclic Shifted
Mechanism and the Inter-Pooling Mechanism, are proposed to establish
long-distance connections between tokens and enlarge the effective receptive
field. Extensive experiments have been carried out to validate the overall
effectiveness of TiC. Additionally, ablation studies confirm the performance
improvements made by MSA-Conv and by each of the two capacity-enhancing
strategies. Note that our proposal aims at studying an alternative to the
global attention used in ViT; MSA-Conv meets this goal by making TiC
comparable to the state of the art on ImageNet-1K. Code will be released at
https://github.com/zs670980918/MSA-Conv
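The core idea of attention inside a convolution window can be sketched as follows: every token attends only to its k×k neighborhood, so the same weights apply at any resolution and cost grows linearly with image size. This is a single-head, zero-padded simplification with no positional terms, not the paper's full MSA-Conv operator.

```python
import numpy as np

def msa_conv(x, wq, wk, wv, k=3):
    """Toy attention-within-a-window: each position attends to its k x k
    neighborhood of keys/values instead of the whole image."""
    h, w, c = x.shape
    pad = k // 2
    q = x @ wq                                    # queries on the original grid
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    keys, vals = xp @ wk, xp @ wv
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            nk = keys[i:i + k, j:j + k].reshape(-1, c)   # k*k neighbors
            nv = vals[i:i + k, j:j + k].reshape(-1, c)
            logits = nk @ q[i, j] / np.sqrt(c)
            a = np.exp(logits - logits.max())            # stable softmax
            out[i, j] = (a / a.sum()) @ nv
    return out

# The same projection weights work for any input resolution --
# no positional-encoding resampling or architectural change is needed.
rng = np.random.default_rng(0)
wq, wk, wv = (rng.standard_normal((4, 4)) for _ in range(3))
y_small = msa_conv(rng.standard_normal((5, 7, 4)), wq, wk, wv)
y_large = msa_conv(rng.standard_normal((12, 9, 4)), wq, wk, wv)
```

The resolution flexibility claimed in the abstract shows up directly: `y_small` and `y_large` are produced by identical weights on differently sized inputs.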
Continual Driving Policy Optimization with Closed-Loop Individualized Curricula
The safety of autonomous vehicles (AV) has been a long-standing top concern,
stemming from the absence of rare and safety-critical scenarios in the
long-tail naturalistic driving distribution. To tackle this challenge, a surge
of research in scenario-based autonomous driving has emerged, with a focus on
generating high-risk driving scenarios and applying them to conduct
safety-critical testing of AV models. However, little work has explored the
reuse of these extensive scenarios to iteratively improve AV models.
Moreover, it remains challenging to filter through gigantic scenario
libraries collected from AV models with distinct behaviors in order to
extract information transferable to improving the current AV.
Therefore, we develop a continual driving policy optimization framework
featuring Closed-Loop Individualized Curricula (CLIC), which we factorize into
a set of standardized sub-modules for flexible implementation choices: AV
Evaluation, Scenario Selection, and AV Training. CLIC frames AV Evaluation as a
collision prediction task, where it estimates the chance of AV failures in
these scenarios at each iteration. Subsequently, by re-sampling from historical
scenarios based on these failure probabilities, CLIC tailors individualized
curricula for downstream training, aligning them with the evaluated capability
of AV. Accordingly, CLIC not only maximizes the utilization of the vast
pre-collected scenario library for closed-loop driving policy optimization but
also facilitates AV improvement by individualizing its training with more
challenging cases out of those poorly organized scenarios. Experimental results
clearly indicate that CLIC surpasses other curriculum-based training
strategies, showing substantial improvement in managing risky scenarios, while
still maintaining proficiency in handling simpler cases.
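The Scenario Selection step described above can be sketched as weighted re-sampling: scenarios are drawn from the historical library in proportion to the AV's predicted failure probability, so harder cases dominate the next training batch. Here `predict_failure` stands in for the learned collision-prediction model; the scenario fields are illustrative.

```python
import random

def build_curriculum(scenarios, predict_failure, batch_size, rng):
    """Re-sample historical scenarios by predicted failure probability,
    tailoring an individualized curriculum to the AV's current capability."""
    weights = [predict_failure(s) for s in scenarios]
    return rng.choices(scenarios, weights=weights, k=batch_size)

# Toy library: scenario risk grows with id; a real model would predict it
# from the AV's evaluated behavior at each iteration.
rng = random.Random(0)
library = [{"id": i, "risk": 0.01 + i / 10} for i in range(10)]
curriculum = build_curriculum(library, lambda s: s["risk"], 1000, rng)
```

In the closed loop, the collision predictor would be refit after each AV Training round, so the curriculum keeps tracking what the current policy still fails on.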
Dynamic Path Planning for Unmanned Aerial Vehicles under Deadline and Sector Capacity Constraints
The US National Airspace System is currently operating at a level close to its maximum potential. The limitation comes from the workload demand on air traffic controllers. Currently, air traffic flow management is based on the flight path requests of airline operators, whereas minimum separation assurance between flights is handled strategically by air traffic control personnel. In this paper, we propose a scalable framework that allows path planning for a large number of unmanned aerial vehicles (UAVs), taking into account deadline and weather constraints. Our proposed solution has polynomial-time computational complexity, which is also verified by measuring the runtime for typical workloads. We further demonstrate that the proposed framework is able to route 80% of the workloads without exceeding the sector capacity constraints, even under dynamic weather conditions. Due to its low computational complexity, our framework is suitable for a fleet of UAVs, where decentralizing the routing process limits the workload demand on air traffic personnel.
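One standard way to make deadline- and capacity-aware routing polynomial-time is search over a time-expanded graph: a BFS over (sector, time) states that skips sectors whose capacity at that time step is exhausted, run once per UAV. The sketch below illustrates that idea; the paper's exact algorithm may differ.

```python
from collections import deque

def route_uav(adj, capacity, src, dst, deadline):
    """Shortest-in-time route over a time-expanded sector graph, honoring
    a hard deadline and per-(sector, time) capacity limits."""
    start = (src, 0)
    prev = {start: None}
    queue = deque([start])
    while queue:
        sector, t = queue.popleft()
        if sector == dst:                       # reconstruct sector sequence
            path, node = [], (sector, t)
            while node is not None:
                path.append(node[0])
                node = prev[node]
            return path[::-1]
        if t == deadline:
            continue
        for nxt in adj[sector]:
            state = (nxt, t + 1)
            # missing entries mean "unconstrained" in this toy capacity map
            if state not in prev and capacity.get(state, 1) > 0:
                prev[state] = (sector, t)
                queue.append(state)
    return None                                 # no route meets the deadline

sectors = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(route_uav(sectors, {("B", 1): 0}, "A", "D", 3))   # ['A', 'C', 'D']
```

Routing each UAV and then decrementing the capacities along its accepted path keeps the whole fleet schedule polynomial in the number of UAVs, sectors, and time steps, consistent with the scalability claim above.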
Natural Language based Context Modeling and Reasoning with LLMs: A Tutorial
Large language models (LLMs) have been phenomenally surging since
2018, two decades after context-awareness was introduced into computing systems.
Through taking into account the situations of ubiquitous devices, users and the
societies, context-aware computing has enabled a wide spectrum of innovative
applications, such as assisted living, location-based social network services
and so on. To recognize contexts and make decisions for actions accordingly,
various artificial intelligence technologies, such as Ontology and OWL, have
been adopted as representations for context modeling and reasoning. Recently,
with the rise of LLMs and their improved natural language understanding and
reasoning capabilities, it has become feasible to model contexts using natural
language and perform context reasoning by interacting with LLMs such as ChatGPT
and GPT-4. In this tutorial, we demonstrate the use of texts, prompts, and
autonomous agents (AutoAgents) that enable LLMs to perform context modeling and
reasoning without requiring fine-tuning of the model. We organize and introduce
works in the related field, and name this computing paradigm as the LLM-driven
Context-aware Computing (LCaC). In the LCaC paradigm, users' requests, sensor
readings, and commands to actuators are represented as texts. Given the text
of a user's request and the sensor data, the AutoAgent models the context in a
prompt and sends it to the LLM for context reasoning. The LLM generates a plan
of actions and responds to the AutoAgent, which then follows the action plan
to foster context-awareness. To prove the concepts, we use two
showcases--(1) operating a mobile z-arm in an apartment for assisted living,
and (2) planning a trip and scheduling the itinerary in a context-aware and
personalized manner. Comment: Under review.
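The text-in, text-out loop between the AutoAgent and the LLM can be sketched as two small helpers: one renders the context (user request plus sensor readings) into a prompt, and one parses the LLM's reply back into actuator commands. Field names and the command format are illustrative, not the tutorial's exact protocol, and the LLM call itself is stubbed.

```python
def build_context_prompt(request, sensor_readings):
    """Render the context as plain text for the LLM, as in the LCaC
    paradigm: the user's request and sensor readings become one prompt."""
    lines = "\n".join(f"- {name}: {value}"
                      for name, value in sensor_readings.items())
    return (
        "You control the actuators of a smart apartment.\n"
        f"Current sensor readings:\n{lines}\n"
        f"User request: {request}\n"
        "Reply with one actuator command per line in the form 'device: action'."
    )

def parse_action_plan(llm_reply):
    """Turn the LLM's textual reply into (device, action) pairs that the
    AutoAgent can execute on real actuators."""
    plan = []
    for line in llm_reply.strip().splitlines():
        device, sep, action = line.partition(":")
        if sep:
            plan.append((device.strip(), action.strip()))
    return plan

prompt = build_context_prompt(
    "I feel cold.", {"living-room temperature": "17 C", "window": "open"}
)
# A real deployment would send `prompt` to an LLM; the reply is stubbed here.
plan = parse_action_plan("window: close\nheater: set 21 C")
```

No fine-tuning is involved anywhere in this loop: context modeling lives entirely in the prompt text, matching the paradigm described above.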
Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
3D Semantic Scene Completion (SSC) has emerged as a nascent and pivotal task
for autonomous driving, as it involves predicting per-voxel occupancy within a
3D scene from partial LiDAR or image inputs. Existing methods primarily focus
on voxel-wise feature aggregation, while neglecting the instance-centric
semantics and broader context. In this paper, we present a novel paradigm
termed Symphonies (Scene-from-Insts) for SSC, which completes the scene volume
from a sparse set of instance queries derived from the input with context
awareness. By incorporating the queries as the instance feature representations
within the scene, Symphonies dynamically encodes the instance-centric semantics
to interact with the image and volume features while avoiding the dense
voxel-wise modeling. Simultaneously, it orchestrates a more comprehensive
understanding of the scenario by capturing context throughout the entire scene,
contributing to alleviating the geometric ambiguity derived from occlusion and
perspective errors. Symphonies achieves a state-of-the-art result of 13.02 mIoU
on the challenging SemanticKITTI dataset, outperforming existing methods and
showcasing the promising advancements of the paradigm. The code is available at
\url{https://github.com/hustvl/Symphonies}. Comment: Technical report. Code and models at:
https://github.com/hustvl/Symphonies
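The instance-query idea can be sketched as plain cross-attention: a small set of query vectors attends to dense scene features and aggregates instance-centric semantics without dense voxel-wise modeling. This is a single-head sketch with no learned projections or iterative refinement, a deliberate simplification of the paper's attention blocks.

```python
import numpy as np

def instance_query_attention(queries, scene_feats):
    """Sparse instance queries cross-attend to dense scene features,
    returning one aggregated feature vector per query."""
    d = queries.shape[1]
    logits = queries @ scene_feats.T / np.sqrt(d)    # (Q, N) similarity
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)          # softmax over the scene
    return attn @ scene_feats                        # (Q, d) instance features

rng = np.random.default_rng(0)
queries = rng.standard_normal((8, 16))     # 8 instance queries
scene = rng.standard_normal((500, 16))     # 500 voxel/image feature vectors
inst = instance_query_attention(queries, scene)
```

Because the query set is small and fixed, cost scales with the number of queries times the scene size rather than quadratically in the voxel count, which is what lets the method avoid dense voxel-wise modeling.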
EdgeSense: Edge-Mediated Spatial-Temporal Crowdsensing
Edge computing has recently become increasingly popular, driven by the growth of data size and the need for sensing with less reliance on a central server. Based on the edge computing architecture, we propose a novel crowdsensing framework called Edge-Mediated Spatial-Temporal Crowdsensing (EdgeSense). EdgeSense aims to obtain environment information, such as air pollution, temperature, and traffic flow, in each subarea of the target area without aggregating sensor data together with its location information. Specifically, EdgeSense works on top of a secured peer-to-peer network consisting of the participants and proposes a novel Decentralized Spatial-Temporal Crowdsensing framework based on Parallelized Stochastic Gradient Descent. To approximate the sensing data in each subarea of the target area in each sensing cycle, EdgeSense uses the sensor data locally stored in participants' mobile devices to learn the low-rank structure and then recovers the sensing data from it. We evaluate EdgeSense on real-world datasets (temperature [1] and PM2.5 [2]), where our algorithm achieves low approximation error and competes with baseline algorithms designed using centralized computation and data aggregation.
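The parallelized flavor of the recovery step can be sketched as follows: each edge-mediated group of participants runs SGD on its own shard of local observations, and the low-rank factors are averaged between rounds. The synchronization scheme here (full-factor averaging every round) is an assumption for illustration, not necessarily the framework's exact update rule.

```python
import numpy as np

def parallel_sgd_recover(shards, shape, rank=1, lr=0.05, rounds=300, seed=0):
    """Recover a sensing matrix via parallelized SGD: each shard updates a
    local copy of the factors, then the copies are averaged."""
    rng = np.random.default_rng(seed)
    n, m = shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(rounds):
        results = []
        for shard in shards:                 # conceptually runs in parallel
            Us, Vs = U.copy(), V.copy()
            for (i, j), x in shard:
                err = x - Us[i] @ Vs[j]
                ui = Us[i].copy()
                Us[i] += lr * err * Vs[j]
                Vs[j] += lr * err * ui
            results.append((Us, Vs))
        U = sum(r[0] for r in results) / len(results)   # average the factors
        V = sum(r[1] for r in results) / len(results)
    return U @ V.T

# Toy rank-1 field (4 subareas x 3 cycles) split across two participant groups.
truth = np.outer([1.0, 1.2, 1.4, 1.6], [1.0, 1.1, 1.2])
entries = [((i, j), truth[i, j]) for i in range(4) for j in range(3)]
recovered = parallel_sgd_recover([entries[:6], entries[6:]], truth.shape)
```

Only the small factor matrices cross the network between rounds; each shard's raw (value, location) pairs never leave the participants' devices, which is the point of the aggregation-free design.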