Object-oriented Neural Programming (OONP) for Document Understanding
We propose Object-oriented Neural Programming (OONP), a framework for
semantically parsing documents in specific domains. Basically, OONP reads a
document and parses it into a predesigned object-oriented data structure
(referred to as ontology in this paper) that reflects the domain-specific
semantics of the document. An OONP parser models semantic parsing as a decision
process: a neural net-based Reader sequentially goes through the document, and
during the process it builds and updates an intermediate ontology to summarize
its partial understanding of the text it covers. OONP supports a rich family of
operations (both symbolic and differentiable) for composing the ontology, and a
big variety of forms (both symbolic and differentiable) for representing the
state and the document. An OONP parser can be trained with supervision of
different forms and strengths, including supervised learning (SL),
reinforcement learning (RL), and a hybrid of the two. Our experiments on both
synthetic and real-world document parsing tasks have shown that OONP can learn
to handle a fairly complicated ontology with training data of modest sizes.
Comment: accepted by ACL 201
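As a rough illustration of the decision process the abstract describes, the toy sketch below has a "Reader" scan tokens and emit symbolic operations that build an object-oriented ontology. All class and function names here are hypothetical, and the hand-written rule stands in for the learned neural policy.

```python
# Toy sketch of an OONP-style decision process (illustrative only; the
# real Reader is a neural network and the operations are learned).
class Ontology:
    """Predesigned object-oriented structure the parser fills in."""
    def __init__(self):
        self.objects = []  # each object is a dict of attributes

    def new_object(self, obj_type):
        self.objects.append({"type": obj_type})

    def set_attr(self, key, value):
        self.objects[-1][key] = value  # update the most recent object

def toy_reader(tokens):
    """Sequentially read tokens; emit symbolic ops updating the ontology."""
    ontology = Ontology()
    for token in tokens:
        if token.istitle():  # stand-in for a learned decision policy
            ontology.new_object("Person")
            ontology.set_attr("name", token)
    return ontology

parsed = toy_reader("the court heard Alice accuse Bob".split())
```

The point of the sketch is only the control flow: the intermediate ontology is updated incrementally as the reader moves through the text, which is what allows mixing symbolic operations with differentiable state.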
A Robust Integrated Multi-Strategy Bus Control System via Deep Reinforcement Learning
An efficient urban bus control system has the potential to significantly
reduce travel delays and streamline the allocation of transportation resources,
thereby offering enhanced and user-friendly transit services to passengers.
However, bus operation efficiency can be impacted by bus bunching. This problem
is notably exacerbated when the bus system operates along a signalized corridor
with unpredictable travel demand. To mitigate this challenge, we introduce a
multi-strategy fusion approach for the longitudinal control of connected and
automated buses. The approach is driven by a physics-informed deep
reinforcement learning (DRL) algorithm and takes into account a variety of
traffic conditions along urban signalized corridors. Taking advantage of
connected and autonomous vehicle (CAV) technology, the proposed approach can
leverage real-time information regarding bus operating conditions and road
traffic environment. By integrating the aforementioned information into the
DRL-based bus control framework, our designed physics-informed DRL state fusion
approach and reward function efficiently embed prior physics and leverage the
merits of equilibrium and consensus concepts from control theory. This
integration enables the framework to learn and adapt multiple control
strategies to effectively manage complex traffic conditions and fluctuating
passenger demands. Three control variables, i.e., dwell time at stops, speed
between stations, and signal priority, are formulated to minimize travel
duration and ensure bus stability with the aim of avoiding bus bunching. We
present simulation results to validate the effectiveness of the proposed
approach, underlining its superior performance when subjected to sensitivity
analysis, specifically considering factors such as traffic volume, desired
speed, and traffic signal conditions.
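One way to picture the equilibrium and consensus ideas the abstract mentions is a reward that pushes each bus's forward and backward headways toward equality while penalizing travel duration. The function below is a guess at the general shape of such a reward, not the paper's formulation; the weights and signature are assumptions.

```python
# Illustrative anti-bunching reward (a hypothetical shape for such a
# reward, not the paper's exact formulation).
def headway_reward(front_headway, rear_headway, travel_time,
                   w_stability=1.0, w_time=0.1):
    """Penalize deviation from the equal-headway equilibrium plus
    travel duration; higher is better."""
    # Consensus idea: each bus drives its two headways toward equality.
    imbalance = abs(front_headway - rear_headway)
    return -(w_stability * imbalance + w_time * travel_time)

# A perfectly spaced bus is penalized only for its travel time.
r_balanced = headway_reward(300.0, 300.0, travel_time=60.0)
# A bunched bus (60 s ahead, 540 s behind) scores far worse.
r_bunched = headway_reward(60.0, 540.0, travel_time=60.0)
```

Under this kind of reward, the DRL agent's three control variables (dwell time, inter-station speed, signal priority) all become levers for restoring headway balance.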
CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout
Recent advances have shown promise in merging neural radiance fields (NeRFs)
with pre-trained diffusion models for text-to-3D object generation. However,
one enduring challenge is their inadequate capability to accurately parse and
regenerate consistent multi-object environments. Specifically, these models
encounter difficulties in accurately representing quantity and style prompted
by multi-object texts, often resulting in a collapse of the rendering fidelity
that fails to match the semantic intricacies. Moreover, amalgamating these
elements into a coherent 3D scene is a substantial challenge, stemming from
the generic distributions inherent in diffusion models. To tackle the issue of
'guidance collapse' and enhance consistency, we propose a novel framework,
dubbed CompoNeRF, by integrating an editable 3D scene layout with object
specific and scene-wide guidance mechanisms. It initiates by interpreting a
complex text into an editable 3D layout populated with multiple NeRFs, each
paired with a corresponding subtext prompt for precise object depiction. Next,
a tailored composition module seamlessly blends these NeRFs, promoting
consistency, while the dual-level text guidance reduces ambiguity and boosts
accuracy. Noticeably, the unique modularity of CompoNeRF permits NeRF
decomposition. This enables flexible scene editing and recomposition into new
scenes based on the edited layout or text prompts. Utilizing the open source
Stable Diffusion model, CompoNeRF not only generates scenes with high fidelity
but also paves the way for innovative multi-object composition using editable
3D layouts. Remarkably, our framework achieves up to a 54% improvement in
performance, as measured by the multi-view CLIP score metric. Code is available
at https://github.com/hbai98/Componerf
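A minimal sketch of the kind of composition module the abstract describes might query each per-object NeRF at a point and blend their colors by relative density, so the densest object dominates the composite. The functions below are illustrative stand-ins; the actual CompoNeRF composition module is learned.

```python
import numpy as np

# Toy density-weighted composition of several object fields at a query
# point (hypothetical; CompoNeRF's composition module is learned).
def compose(point, nerfs):
    """Each `nerf` maps a 3D point to (density, rgb); blend colors by
    relative density so the densest object dominates."""
    densities, colors = [], []
    for nerf in nerfs:
        sigma, rgb = nerf(point)
        densities.append(sigma)
        colors.append(rgb)
    densities = np.array(densities)
    colors = np.array(colors)
    total = densities.sum()
    if total == 0:
        return 0.0, np.zeros(3)
    weights = densities / total
    return float(total), (weights[:, None] * colors).sum(axis=0)

# Two stand-in "NeRFs": a solid red object and an empty region.
red_ball = lambda p: (2.0, np.array([1.0, 0.0, 0.0]))
empty = lambda p: (0.0, np.array([0.0, 0.0, 1.0]))
sigma, rgb = compose(np.zeros(3), [red_ball, empty])
```

Because each object keeps its own field, any NeRF can be swapped out or repositioned via the layout before re-compositing, which is the modularity the abstract highlights.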
On the Temporal-spatial Analysis of Estimating Urban Traffic Patterns Via GPS Trace Data of Car-hailing Vehicles
Car-hailing services have become a prominent data source for urban traffic
studies. Extracting useful information from car-hailing trace data is essential
for effective traffic management, while discrepancies between car-hailing
vehicles and urban traffic should be considered. This paper proposes a generic
framework for estimating and analyzing urban traffic patterns using car-hailing
trace data. The framework consists of three layers: the data layer, the
interactive software layer, and the processing method layer. By pre-processing
car-hailing GPS trace data with operations such as data cutting, map matching,
and trace correction, the framework generates tensor matrices that estimate
traffic patterns for car-hailing vehicle flow and average road speed. An
analysis block based on these matrices examines the relationships and
differences between car-hailing vehicles and urban traffic patterns, which have
been overlooked in previous research. Experimental results demonstrate the
effectiveness of the proposed framework in examining temporal-spatial patterns
of car-hailing vehicles and urban traffic. For temporal analysis, urban road
traffic displays a bimodal characteristic while car-hailing flow exhibits a
'multi-peak' pattern, fluctuating significantly during holidays and thus
generating a hierarchical structure. For spatial analysis, the heat maps
generated from the matrices exhibit certain discrepancies, but the spatial
distribution of hotspots and vehicle aggregation areas remains similar.
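The tensor matrices the framework produces can be pictured as (road, hour) grids of flow counts and average speeds accumulated from map-matched GPS points. The sketch below assumes a simplified record format of (road id, hour, speed); the framework's actual data layout is not specified in the abstract.

```python
# Toy construction of (road, hour) flow and average-speed matrices from
# map-matched GPS records (record format is an assumption).
def traffic_matrices(points, n_roads, n_hours=24):
    flow = [[0] * n_hours for _ in range(n_roads)]
    speed_sum = [[0.0] * n_hours for _ in range(n_roads)]
    for road, hour, speed in points:
        flow[road][hour] += 1          # car-hailing vehicle flow
        speed_sum[road][hour] += speed
    avg_speed = [[speed_sum[r][h] / flow[r][h] if flow[r][h] else 0.0
                  for h in range(n_hours)] for r in range(n_roads)]
    return flow, avg_speed

pts = [(0, 8, 30.0), (0, 8, 50.0), (1, 17, 25.0)]
flow, avg = traffic_matrices(pts, n_roads=2)
```

Comparing such matrices against loop-detector or other urban traffic data is what exposes the temporal discrepancies (bimodal vs. multi-peak) the abstract reports.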
Reviving Static Charts into Live Charts
Data charts are prevalent across various fields due to their efficacy in
conveying complex data relationships. However, static charts may sometimes
struggle to engage readers and efficiently present intricate information,
potentially resulting in limited understanding. We introduce "Live Charts," a
new format of presentation that decomposes complex information within a chart
and explains the information pieces sequentially through rich animations and
accompanying audio narration. We propose an automated approach to revive static
charts into Live Charts. Our method integrates GNN-based techniques to analyze
the chart components and extract data from charts. Then we adopt large natural
language models to generate appropriate animated visuals along with a
voice-over to produce Live Charts from static ones. We conducted a thorough
evaluation of our approach, which involved the model performance, use cases, a
crowd-sourced user study, and expert interviews. The results demonstrate Live
Charts offer a multi-sensory experience where readers can follow the
information and understand the data insights better. We analyze the benefits
and drawbacks of Live Charts over static charts as a new information
consumption experience.
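The pipeline the abstract outlines, decomposing a chart into pieces and revealing them sequentially with narration, could be sketched as follows. The reveal order and narration strings here are invented for illustration; the real system derives them with a GNN-based extractor and a large language model.

```python
# Hypothetical sketch of the static-to-live pipeline: decompose a chart
# into components, then emit sequential animation steps with narration.
def decompose(chart):
    """Order components from context to detail for sequential reveal."""
    order = ["title", "axes", "marks", "annotations"]
    return [c for c in order if c in chart["components"]]

def to_live_chart(chart):
    steps = []
    for component in decompose(chart):
        steps.append({"animate": component,
                      "narration": f"Now showing the {component}."})
    return steps

steps = to_live_chart({"components": {"axes", "marks", "title"}})
```

Each step pairs an animation with a voice-over line, which is what gives Live Charts their multi-sensory, piece-by-piece presentation.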
HairBrush for Immersive Data-Driven Hair Modeling
While hair is an essential component of virtual humans, it is also one of the
most challenging digital assets to create. Existing automatic techniques lack
the generality and flexibility to create rich hair variations, while manual
authoring interfaces often require considerable artistic skills and efforts,
especially for intricate 3D hair structures that can be difficult to navigate.
We propose an interactive hair modeling system that can help create complex
hairstyles in minutes or hours that would otherwise take much longer with
existing tools. Modelers, including novice users, can focus on the overall
hairstyles and local hair deformations, as our system intelligently suggests
the desired hair parts. Our method combines the flexibility of manual
authoring and the convenience of data-driven automation. Since hair contains
intricate 3D structures such as buns, knots, and strands, they are inherently
challenging to create using traditional 2D interfaces. Our system provides a
new 3D hair authoring interface for immersive interaction in virtual reality
(VR). Users can draw high-level guide strips, from which our system predicts
the most plausible hairstyles via a deep neural network trained from a
professionally curated dataset. Each hairstyle in our dataset is composed of
multiple variations, serving as blend-shapes to fit the user drawings via
global blending and local deformation. The fitted hair models are visualized
as interactive suggestions that the user can select, modify, or ignore. We
conducted a user study to confirm that our system can significantly reduce
manual labor while improving the output quality for modeling a variety of
head and facial hairstyles that are challenging to create via existing
techniques.
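The global-blending step described in this abstract, fitting a user's guide strip with a weighted combination of dataset variations, can be pictured as a small least-squares problem. This is only a sketch of that idea; the actual system first predicts candidate hairstyles with a deep network, and the array shapes here are assumptions.

```python
import numpy as np

# Toy global-blending step: find blend-shape weights whose combination
# best matches the user's guide strip (illustrative, not the paper's
# actual fitting procedure).
def fit_blend_weights(variations, user_strip):
    """variations: (k, n) array of k hairstyle variations, each with n
    sample values; solve least-squares for w with variations.T @ w
    approximating user_strip."""
    w, *_ = np.linalg.lstsq(variations.T, user_strip, rcond=None)
    return w

v = np.array([[0.0, 0.0, 0.0],    # variation 0: the rest shape
              [1.0, 2.0, 3.0]])   # variation 1: a deformed shape
strip = np.array([0.5, 1.0, 1.5])  # user strip halfway toward variation 1
w = fit_blend_weights(v, strip)
```

Local deformation would then refine the blended result near the drawn strip; the least-squares weights only capture the global fit.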
KB4VA: A Knowledge Base of Visualization Designs for Visual Analytics
Visual analytics (VA) systems have been widely used to facilitate
decision-making and analytical reasoning in various application domains. VA
involves visual designs, interaction designs, and data mining, which is a
systematic and complex paradigm. In this work, we focus on the design of
effective visualizations for complex data and analytical tasks, which is a
critical step in designing a VA system. This step is challenging because it
requires extensive knowledge about domain problems and visualization to design
effective encodings. Existing visualization designs published in top venues are
valuable resources to inspire designs for problems with similar data structures
and tasks. However, those designs are hard to understand, parse, and retrieve
due to the lack of specifications. To address this problem, we build KB4VA, a
knowledge base of visualization designs in VA systems with comprehensive labels
about their analytical tasks and visual encodings. Our labeling scheme is
inspired by a workshop study with 12 VA researchers to learn user requirements
in understanding and retrieving professional visualization designs in VA
systems. The scheme extends Vega-Lite specifications for describing advanced and
composited visualization designs in a declarative manner, thus facilitating
human understanding and automatic indexing. To demonstrate the usefulness of
our knowledge base, we present a user study about design inspirations for VA
tasks. In summary, our work opens new perspectives for enhancing the
accessibility and reusability of professional visualization designs.
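A knowledge-base entry in the declarative style the abstract describes might pair analytical-task labels with a Vega-Lite-like multi-view encoding, making designs machine-indexable. The field names below are our guesses, not KB4VA's actual schema.

```python
# Hypothetical KB4VA-style entry: task labels plus a declarative,
# Vega-Lite-like description of a composited design (field names are
# assumptions, not the published schema).
entry = {
    "task": ["comparison", "trend analysis"],
    "views": [
        {"mark": "line",
         "encoding": {"x": {"field": "date", "type": "temporal"},
                      "y": {"field": "price", "type": "quantitative"}}},
        {"mark": "bar",
         "encoding": {"x": {"field": "date", "type": "temporal"},
                      "y": {"field": "volume", "type": "quantitative"}}},
    ],
    "composition": "vertical_concat",  # composited multi-view design
}

def retrieve(knowledge_base, task):
    """Index entries by analytical task for design retrieval."""
    return [e for e in knowledge_base if task in e["task"]]

matches = retrieve([entry], "trend analysis")
```

Because each entry is declarative, retrieval by task or by encoding structure reduces to filtering over labeled fields, which is what makes the designs easy to index and reuse.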
Intelligent Multi-Modal Sensing-Communication Integration: Synesthesia of Machines
In the era of sixth-generation (6G) wireless communications, integrated
sensing and communications (ISAC) is recognized as a promising solution to
upgrade the physical system by endowing wireless communications with sensing
capability. Existing ISAC is mainly oriented to static scenarios with
radio-frequency (RF) sensors being the primary participants, thus lacking a
comprehensive environment feature characterization and facing a severe
performance bottleneck in dynamic environments. To date, extensive surveys on
ISAC have been conducted but are limited to summarizing RF-based radar sensing.
Currently, some research efforts have been devoted to exploring multi-modal
sensing-communication integration but still lack a comprehensive review.
Therefore, we generalize the concept of ISAC inspired by human synesthesia to
establish a unified framework of intelligent multi-modal sensing-communication
integration and provide a comprehensive review under such a framework in this
paper. The so-termed Synesthesia of Machines (SoM) gives the clearest cognition
of such intelligent integration and details its paradigm for the first time. We
commence by justifying the necessity of the new paradigm. Subsequently, we
offer a definition of SoM and zoom into the detailed paradigm, which is
summarized as three operation modes. To facilitate SoM research, we overview
the prerequisite of SoM research, i.e., mixed multi-modal (MMM) datasets. Then,
we introduce the mapping relationships between multi-modal sensing and
communications. Afterward, we cover the technological review on
SoM-enhance-based and SoM-concert-based applications. To corroborate the
superiority of SoM, we also present simulation results related to dual-function
waveform and predictive beamforming design. Finally, we propose some potential
directions to inspire future research efforts.
Comment: This paper has been accepted by IEEE Communications Surveys &
Tutorials.