Sequence Generation via Subsequence Similarity: Theory and Application to UAV Identification
The ability to generate synthetic sequences is crucial for a wide range of
applications, and recent advances in deep learning architectures and generative
frameworks have greatly facilitated this process. Particularly, unconditional
one-shot generative models constitute an attractive line of research that
focuses on capturing the internal information of a single image, video, etc. to
generate samples with similar contents. Since many of those one-shot models are
shifting toward efficient non-deep and non-adversarial approaches, we examine
the versatility of a one-shot generative model for augmenting whole datasets.
In this work, we focus on how similarity at the subsequence level affects
similarity at the sequence level, and derive bounds on the optimal transport of
real and generated sequences based on that of corresponding subsequences. We
use a one-shot generative model to sample from the vicinity of individual
sequences and generate subsequence-similar ones and demonstrate the improvement
of this approach by applying it to the problem of Unmanned Aerial Vehicle (UAV)
identification using limited radio-frequency (RF) signals. In the context of
UAV identification, RF fingerprinting is an effective method for distinguishing
legitimate devices from malicious ones, but heterogeneous environments and
channel impairments can impose data scarcity and affect the performance of
classification models. By using subsequence similarity to augment sequences of
RF data with only a small fraction (5%–20%) of the training dataset, we achieve
significant improvements in performance metrics such as accuracy, precision,
recall, and F1 score.
Comment: 12 pages, 5 figures, 2 tables
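The augmentation idea above can be sketched in a few lines: sample from the vicinity of a single sequence by perturbing each of its subsequences only slightly, so every window of the generated sequence stays close to the corresponding real one. This is a hypothetical illustration with Gaussian window noise, not the paper's actual one-shot generative model.

```python
import numpy as np

def augment_by_subsequences(seq, win=64, noise_scale=0.05, rng=None):
    """Generate a sequence whose subsequences stay near the original's.

    Illustrative sketch: each non-overlapping window (subsequence) is
    perturbed with small Gaussian noise scaled to the window's spread,
    so subsequence-level similarity to the source sequence is preserved.
    """
    rng = np.random.default_rng(rng)
    out = seq.astype(float)
    for start in range(0, len(seq), win):
        window = out[start:start + win]          # view into out
        window += noise_scale * window.std() * rng.standard_normal(len(window))
    return out

# Augment a small RF-like signal; every sample stays near the original.
signal = np.sin(np.linspace(0, 8 * np.pi, 256))
aug = augment_by_subsequences(signal, win=64, noise_scale=0.05, rng=0)
print(aug.shape == signal.shape)
```

Repeating this sampling for each scarce training sequence yields an augmented dataset whose subsequences, and by the paper's bounds the whole sequences, remain close in optimal-transport distance to the real data.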
Leveraging Large Language Models to Build and Execute Computational Workflows
The recent development of large language models (LLMs) with multi-billion
parameters, coupled with the creation of user-friendly application programming
interfaces (APIs), has paved the way for automatically generating and executing
code in response to straightforward human queries. This paper explores how
these emerging capabilities can be harnessed to facilitate complex scientific
workflows, eliminating the need for traditional coding methods. We present
initial findings from our attempt to integrate Phyloflow with OpenAI's
function-calling API, and outline a strategy for developing a comprehensive
workflow management system based on these concepts.
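The integration pattern described above can be sketched as follows: the LLM is given a JSON tool schema of the kind used by OpenAI's function-calling API, and a local dispatcher routes the function call the model emits to a workflow engine. The step names and the `run_workflow_step` tool are illustrative assumptions, not Phyloflow's actual interface.

```python
import json

# Tool schema of the shape accepted by OpenAI's function-calling API.
RUN_STEP_TOOL = {
    "type": "function",
    "function": {
        "name": "run_workflow_step",
        "description": "Execute one named step of a computational workflow.",
        "parameters": {
            "type": "object",
            "properties": {
                "step": {"type": "string"},
                "inputs": {"type": "object"},
            },
            "required": ["step"],
        },
    },
}

def run_workflow_step(step, inputs=None):
    # Stand-in for a real workflow-engine invocation.
    return {"step": step, "status": "completed", "inputs": inputs or {}}

def dispatch(tool_name, tool_arguments_json):
    """Route a model-emitted function call to its local implementation."""
    args = json.loads(tool_arguments_json)
    if tool_name == "run_workflow_step":
        return run_workflow_step(**args)
    raise ValueError(f"unknown tool: {tool_name}")

# Simulate the model asking to run one workflow step.
result = dispatch("run_workflow_step", '{"step": "align_sequences"}')
print(result["status"])  # completed
```

In a live system, `dispatch` would receive the `tool_calls` entries returned by the chat-completions endpoint; the local stub here keeps the sketch runnable without an API key.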
FedCompass: Efficient Cross-Silo Federated Learning on Heterogeneous Client Devices using a Computing Power Aware Scheduler
Cross-silo federated learning offers a promising solution to collaboratively
train robust and generalized AI models without compromising the privacy of
local datasets, e.g., in healthcare, finance, and scientific projects that
lack a centralized data facility. Nonetheless, because of the disparity of
computing resources among different clients (i.e., device heterogeneity),
synchronous federated learning algorithms suffer from degraded efficiency when
waiting for straggler clients. Similarly, asynchronous federated learning
algorithms experience degradation in the convergence rate and final model
accuracy on non-identically and independently distributed (non-IID)
heterogeneous datasets due to stale local models and client drift. To address
these limitations in cross-silo federated learning with heterogeneous clients
and data, we propose FedCompass, an innovative semi-asynchronous federated
learning algorithm with a computing power aware scheduler on the server side,
which adaptively assigns varying amounts of training tasks to different clients
using the knowledge of the computing power of individual clients. FedCompass
ensures that multiple locally trained models from clients are received almost
simultaneously as a group for aggregation, effectively reducing the staleness
of local models. At the same time, the overall training process remains
asynchronous, eliminating prolonged waiting periods from straggler clients.
Using diverse non-IID heterogeneous distributed datasets, we demonstrate that
FedCompass achieves faster convergence and higher accuracy than other
asynchronous algorithms while remaining more efficient than synchronous
algorithms when performing federated learning on heterogeneous clients.
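The core scheduling idea can be sketched simply: assign each client a number of local training steps proportional to its measured speed, so all clients in a group finish at roughly the same wall-clock time and their updates arrive together. This is a minimal sketch under assumed client speeds; FedCompass's actual scheduler additionally manages client groups and arrival deadlines.

```python
def assign_local_steps(speeds, target_seconds):
    """Computing-power-aware assignment (illustrative sketch).

    speeds: mapping of client id -> measured local steps per second.
    Returns steps per client so each finishes in ~target_seconds,
    letting the server aggregate their models as one group.
    """
    return {cid: max(1, round(s * target_seconds)) for cid, s in speeds.items()}

# Hypothetical clients with a 5x spread in computing power.
speeds = {"silo_a": 50.0, "silo_b": 20.0, "silo_c": 10.0}
steps = assign_local_steps(speeds, target_seconds=10.0)
print(steps)  # {'silo_a': 500, 'silo_b': 200, 'silo_c': 100}
```

Because every client targets the same finishing time, the fastest client does more local work instead of idling, and no update is much staler than the others when aggregation happens.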
APPFLx: Providing Privacy-Preserving Cross-Silo Federated Learning as a Service
Cross-silo privacy-preserving federated learning (PPFL) is a powerful tool to
collaboratively train robust and generalized machine learning (ML) models
without sharing sensitive (e.g., healthcare or financial) local data. To ease
and accelerate the adoption of PPFL, we introduce APPFLx, a ready-to-use
platform that provides privacy-preserving cross-silo federated learning as a
service. APPFLx employs Globus authentication to allow users to easily and
securely invite trustworthy collaborators for PPFL, implements several
synchronous and asynchronous FL algorithms, streamlines the FL experiment
launch process, and enables tracking and visualizing the life cycle of FL
experiments, allowing domain experts and ML practitioners to easily orchestrate
and evaluate cross-silo FL under one platform. APPFLx is available online at
https://appflx.lin