DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation
Text-driven image manipulation remains challenging in terms of training and inference
flexibility. Conditional generative models depend heavily on expensive annotated
training data, while recent frameworks that leverage pre-trained vision-language
models are limited by either per-text-prompt optimization or inference-time
hyper-parameter tuning. In this work, we propose a novel framework named
\textit{DeltaEdit} to address these problems. Our key idea is to identify a space,
namely the CLIP delta image-and-text space, in which the distribution of CLIP visual
feature differences between two images is well aligned with that of CLIP textual
embedding differences between source and target texts. Based on this CLIP delta
space, the DeltaEdit network is designed to map CLIP visual feature differences to
StyleGAN editing directions during the training phase. Then, in the inference phase,
DeltaEdit predicts the StyleGAN editing directions from the differences of the CLIP
textual features. In this way, DeltaEdit is trained in a text-free manner. Once
trained, it generalizes well to various text prompts for zero-shot inference
without bells and whistles. Code is available at
https://github.com/Yueming6568/DeltaEdit.
Comment: Accepted by CVPR 2023.
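To make the delta-space idea concrete, here is a minimal PyTorch sketch of training a mapper on CLIP image-feature differences and reusing the same mapper on CLIP text-feature differences at inference. This is not the authors' implementation (see the repository above for that); the `DeltaMapper` architecture, the 512-dimensional CLIP features, and the 18-layer StyleGAN W+ space are illustrative assumptions.

```python
# Minimal sketch of the delta-space idea; not the official DeltaEdit code.
# The mapper architecture and feature dimensions below are assumptions.
import torch
import torch.nn as nn

class DeltaMapper(nn.Module):
    """Maps a CLIP-space delta (512-d) to a StyleGAN W+ editing direction."""
    def __init__(self, clip_dim=512, w_dim=512, n_layers=18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(clip_dim, 1024), nn.ReLU(),
            nn.Linear(1024, w_dim * n_layers),
        )
        self.w_dim, self.n_layers = w_dim, n_layers

    def forward(self, delta_clip):
        out = self.net(delta_clip)
        return out.view(-1, self.n_layers, self.w_dim)  # per-layer W+ offsets

mapper = DeltaMapper()

# Training (text-free): deltas come from CLIP *image* features of two images.
img_feat_a = torch.randn(8, 512)  # placeholder for CLIP(image_a)
img_feat_b = torch.randn(8, 512)  # placeholder for CLIP(image_b)
delta_img = nn.functional.normalize(img_feat_b - img_feat_a, dim=-1)
edit_direction = mapper(delta_img)  # supervised against the pair's W+ codes

# Inference (zero-shot): the same mapper consumes CLIP *text* feature deltas.
txt_feat_src = torch.randn(1, 512)  # placeholder for CLIP("face")
txt_feat_tgt = torch.randn(1, 512)  # placeholder for CLIP("smiling face")
delta_txt = nn.functional.normalize(txt_feat_tgt - txt_feat_src, dim=-1)
edit_direction = mapper(delta_txt)  # works because the two delta spaces align
```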
Key Issues in Wireless Transmission for NTN-Assisted Internet of Things
Non-terrestrial networks (NTNs) have become appealing solutions for
seamless coverage in next-generation wireless transmission, where a large
number of diversely distributed Internet of Things (IoT) devices can be
efficiently served. The explosively growing number of IoT devices poses a new
challenge of massive connectivity. The long-distance wireless signal propagation
in NTNs leads to severe path loss and large latency, making the accurate
acquisition of channel state information (CSI) another challenge, especially
for fast-moving non-terrestrial base stations (NTBSs). Moreover, the scarcity
of on-board resources at NTBSs is a further challenge for resource allocation. To
this end, we investigate three key issues and comprehensively present the
existing schemes and emerging solutions for each. The first issue is to enable
massive connectivity by designing random access to establish the wireless link
and multiple access to transmit data streams. The second issue is to accurately
acquire CSI in various channel conditions through channel estimation and beam
training, where orthogonal time frequency space modulation and dynamic codebooks
are the focus. The third issue is to efficiently allocate wireless resources,
including power allocation, spectrum sharing, beam hopping, and beamforming. At
the end of this article, some future research topics are identified.
Comment: 7 pages, 6 figures
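To illustrate why long-distance propagation is a challenge, here is a back-of-the-envelope calculation using the standard free-space path loss formula, FSPL(dB) = 20 log10(4πdf/c). The 600 km LEO altitude and 2 GHz carrier are illustrative assumptions, not figures from the article.

```python
# Illustrative numbers only: the 600 km orbit and 2 GHz carrier are assumptions.
import math

c = 3e8    # speed of light, m/s
d = 600e3  # slant distance to the NTBS (nadir case), m
f = 2e9    # carrier frequency, Hz

# Free-space path loss: FSPL(dB) = 20*log10(4*pi*d*f / c)
fspl_db = 20 * math.log10(4 * math.pi * d * f / c)
one_way_latency_ms = d / c * 1e3

print(f"FSPL ~ {fspl_db:.1f} dB")                       # ~154 dB
print(f"one-way latency ~ {one_way_latency_ms:.1f} ms")  # ~2 ms
```

For comparison, a 1 km terrestrial link at the same frequency loses roughly 98 dB, so the NTN link budget must absorb tens of dB of extra loss plus millisecond-scale propagation delay.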
Dynamic V2X Autonomous Perception from Road-to-Vehicle Vision
Vehicle-to-everything (V2X) perception is an innovative technology that
enhances vehicle perception accuracy, thereby elevating the security and
reliability of autonomous systems. However, existing V2X perception methods
focus on static scenes, relying mainly on vehicle-based vision, which is
constrained by sensor capabilities and communication loads. To adapt V2X
perception models to dynamic scenes, we propose to build V2X perception from
road-to-vehicle vision and present the Adaptive Road-to-Vehicle Perception
(AR2VP) method. In AR2VP, we leverage roadside units to offer stable, wide-range
sensing capabilities and to serve as communication hubs. AR2VP is devised to
tackle both intra-scene and inter-scene changes. For the former, we construct a
dynamic perception representing module, which efficiently integrates vehicle
perceptions, enabling vehicles to capture a more comprehensive range of dynamic
factors within the scene. Moreover, we introduce a road-to-vehicle perception
compensating module, aimed at preserving maximal roadside unit perception
information in the presence of intra-scene changes. For inter-scene changes, we
implement an experience replay mechanism that leverages the roadside unit's
storage capacity to retain a subset of historical scene data, maintaining model
robustness in response to inter-scene shifts. We conduct perception experiments
on 3D object detection and segmentation, and the results show that AR2VP excels
in both performance-bandwidth trade-offs and adaptability within dynamic
environments.
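The experience replay mechanism described above can be sketched as a bounded buffer kept on the roadside unit, with historical samples mixed into each training batch. This is a minimal illustration under assumed parameters (buffer capacity, replay ratio), not AR2VP's actual implementation.

```python
# Minimal sketch of roadside-unit experience replay; capacity and
# replay ratio are assumptions, not values from the paper.
import random
from collections import deque

class SceneReplayBuffer:
    """Bounded store of historical scene samples held on the roadside unit."""
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)  # oldest samples evicted first

    def add(self, sample):
        self.buffer.append(sample)

    def sample(self, k):
        return random.sample(list(self.buffer), min(k, len(self.buffer)))

replay = SceneReplayBuffer(capacity=1000)

def training_batch(current_scene_samples, replay_ratio=0.25):
    """Mix current-scene data with replayed historical-scene data."""
    n_replay = int(len(current_scene_samples) * replay_ratio)
    batch = current_scene_samples + replay.sample(n_replay)
    for s in current_scene_samples:
        replay.add(s)  # retain a subset of the new scene for later
    random.shuffle(batch)
    return batch
```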
On explosive boiling of a multicomponent Leidenfrost drop
The gasification of multicomponent fuel drops is relevant in various
energy-related technologies. An interesting phenomenon associated with this
process is the self-induced explosion of the drop, producing a multitude of
smaller secondary droplets, which promotes overall fuel atomization and,
consequently, improves the combustion efficiency and reduces emissions of
liquid-fueled engines. Here, we study a unique explosive gasification process
of a tricomponent droplet consisting of water, ethanol, and oil ("ouzo"), by
high-speed monitoring of the entire gasification event taking place in the
well-controlled, levitated Leidenfrost state over a superheated plate. It is
observed that the preferential evaporation of the most volatile component,
ethanol, triggers nucleation of the oil microdroplets/nanodroplets in the
remaining drop, which, consequently, becomes an opaque oil-in-water
microemulsion. The tiny oil droplets subsequently coalesce into a large one,
which, in turn, wraps around the remnant water. Because of the encapsulating
oil layer, the droplet can no longer produce enough vapor for its levitation,
and, thus, falls and contacts the superheated surface. The direct thermal
contact leads to vapor bubble formation inside the drop and consequently drop
explosion in the final stage.
Comment: 8 pages, 5 figures
Privacy-Preserving Blockchain-Based Federated Learning for IoT Devices
Home appliance manufacturers strive to obtain feedback from users in order to
improve their products and services and to build smart home systems. To help
manufacturers develop a smart home system, we design a federated learning (FL)
system that leverages a reputation mechanism to assist home appliance
manufacturers in training a machine learning model based on customers' data.
Manufacturers can then predict customers' future requirements and consumption
behaviors. The workflow of the system comprises two stages. In the first stage,
customers train the initial model provided by the manufacturer using both their
mobile phones and a mobile edge computing (MEC) server. Customers collect data
from various home appliances using their phones, then download and train the
initial model with their local data. After deriving local models, customers
sign their models and send them to the blockchain. Because customers or
manufacturers may be malicious, we use the blockchain to replace the centralized
aggregator of the traditional FL system. Since records on the blockchain are
tamper-proof, the activities of malicious customers or manufacturers are
traceable. In the second stage, manufacturers select customers or organizations
as miners to calculate the averaged model from the models received from
customers. At the end of the crowdsourcing task, one of the miners, selected as
the temporary leader, uploads the model to the blockchain. To protect customers'
privacy and improve test accuracy, we enforce differential privacy on the
extracted features and propose a new normalization technique. We experimentally
demonstrate that our normalization technique outperforms batch normalization
when features are under differential privacy protection. In addition, to
attract more customers to participate in the crowdsourcing FL task, we design
an incentive mechanism to reward participants.
Comment: This paper appears in IEEE Internet of Things Journal (IoT-J).
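As a rough illustration of two steps in this pipeline, the sketch below adds differential-privacy noise to extracted features and performs a plain model average as the selected miner would. The Laplace mechanism, clipping bound, and epsilon value are assumptions; the paper's actual normalization technique is not reproduced here.

```python
# Illustrative sketch only: the Laplace mechanism, clipping bound, and
# epsilon are assumptions, not the paper's actual technique.
import numpy as np

def privatize_features(features, epsilon=1.0, clip=1.0):
    """Clip each feature vector to L1 norm <= clip, then add Laplace noise.
    With L1 clipping, the L1 sensitivity is at most 2*clip, which matches
    the Laplace mechanism's noise scale of sensitivity/epsilon."""
    norms = np.linalg.norm(features, ord=1, axis=1, keepdims=True)
    clipped = features * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    scale = 2.0 * clip / epsilon
    return clipped + np.random.laplace(0.0, scale, size=clipped.shape)

def average_models(local_weights):
    """Averaging step performed by the miner: element-wise mean per layer."""
    return [np.mean(np.stack(layer), axis=0) for layer in zip(*local_weights)]

# Usage: three customers upload local models, each with two weight tensors.
local_models = [[np.random.randn(4, 4), np.random.randn(4)] for _ in range(3)]
noisy_feats = privatize_features(np.random.randn(8, 16))
global_model = average_models(local_models)
```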
Translating Radiology Reports into Plain Language using ChatGPT and GPT-4 with Prompt Learning: Promising Results, Limitations, and Potential
The large language model ChatGPT has drawn extensive attention because of its
human-like expression and reasoning abilities. In this study, we investigate
the feasibility of using ChatGPT to translate radiology reports into plain
language for patients and healthcare providers, so that they are better
educated for improved healthcare. Radiology reports from 62 low-dose chest CT
lung cancer screening scans and 76 brain MRI metastases screening scans were
collected in the first half of February for this study. According to the
evaluation by radiologists, ChatGPT can successfully translate radiology
reports into plain language, with an average score of 4.27 on a five-point
scale and, on average, 0.08 instances of missing information and 0.07 instances
of misinformation. The suggestions provided by ChatGPT are generally relevant,
such as keeping follow-up appointments with doctors and closely monitoring any
symptoms, and for about 37% of the 138 cases in total, ChatGPT offers specific
suggestions based on the findings in the report. ChatGPT also shows some
randomness in its responses, occasionally over-simplifying or omitting
information, which can be mitigated by using a more detailed prompt.
Furthermore, the ChatGPT results are compared with those of the newly released
larger model GPT-4, showing that GPT-4 can significantly improve the quality
of the translated reports. Our results show that it is feasible to utilize
large language models in clinical education, and further efforts are needed to
address their limitations and maximize their potential.
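A "more detailed prompt" of the kind the study suggests might look like the following sketch using the OpenAI Python client. The prompt wording, model name, and temperature setting are assumptions; the study's actual prompts are not given in the abstract.

```python
# Illustrative only: the prompt text, model name, and temperature below
# are assumptions, not the study's actual configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DETAILED_PROMPT = (
    "Translate the following radiology report into plain language for a "
    "patient with no medical background. Keep every finding; do not omit, "
    "over-simplify, or add information. End with clear next-step suggestions."
)

def translate_report(report_text, model="gpt-4"):
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce the response randomness noted above
        messages=[
            {"role": "system", "content": DETAILED_PROMPT},
            {"role": "user", "content": report_text},
        ],
    )
    return response.choices[0].message.content
```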
Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach
Due to the scale and complexity of cloud systems, a system failure would
trigger an "alert storm", i.e., massive correlated alerts. Although these
alerts can be traced back to a few root causes, their overwhelming number makes
manual handling infeasible. Alert aggregation is thus critical to help
engineers concentrate on the root causes and facilitate failure resolution.
Existing methods typically aggregate alerts using semantic similarity-based
methods or statistical methods. However, semantic similarity-based methods
overlook the causal rationale of alerts, while statistical methods can hardly
handle infrequent alerts.
To tackle these limitations, we propose leveraging external knowledge,
i.e., the Standard Operating Procedures (SOPs) of alerts, as a supplement. We
propose COLA, a novel hybrid approach based on correlation mining and LLM
(Large Language Model) reasoning for online alert aggregation. The correlation
mining module effectively captures the temporal and spatial relations between
alerts, measuring their correlations in an efficient manner. Subsequently, only
uncertain pairs with low confidence are forwarded to the LLM reasoning module
for detailed analysis. This hybrid design harnesses both the statistical
evidence for frequent alerts and the reasoning capabilities of computationally
intensive LLMs, ensuring the overall efficiency of COLA in handling large
volumes of alerts in practical scenarios. We evaluate COLA on three datasets
collected from the production environment of a large-scale cloud platform. The
experimental results show that COLA achieves F1-scores from 0.901 to 0.930,
outperforming state-of-the-art methods while achieving comparable efficiency.
We also share our experience in deploying COLA in our real-world cloud system,
Cloud X.
Comment: Accepted by the Proceedings of the 46th International Conference on
Software Engineering: Software Engineering in Practice (ICSE SEIP 2024).
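The hybrid dispatch logic can be sketched as follows: a cheap correlation statistic decides the confident pairs, and only the uncertain middle band is escalated to the LLM. The statistic, thresholds, and `llm_same_root_cause` helper are illustrative assumptions, not COLA's actual implementation.

```python
# Minimal sketch of the hybrid correlation-then-LLM dispatch; the
# statistic, thresholds, and LLM helper are assumptions.
def correlation_confidence(alert_a, alert_b, history):
    """Fraction of past time windows in which the two alert types co-occurred.
    `history` is an iterable of sets of alert kinds, one set per window."""
    co = sum(1 for w in history if alert_a.kind in w and alert_b.kind in w)
    base = sum(1 for w in history if alert_a.kind in w)
    return co / base if base else 0.0

def aggregate_pair(alert_a, alert_b, history,
                   hi=0.9, lo=0.1, llm_same_root_cause=None):
    conf = correlation_confidence(alert_a, alert_b, history)
    if conf >= hi:
        return True   # statistically correlated: aggregate without the LLM
    if conf <= lo:
        return False  # statistically unrelated: keep separate
    # Uncertain band: fall back to (computationally expensive) LLM reasoning.
    return llm_same_root_cause(alert_a, alert_b)
```

Keeping the LLM on the rare uncertain band is what preserves throughput: frequent alert pairs are resolved by the cheap statistic alone.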