378 research outputs found
A Computational Framework for Host-Pathogen Protein-Protein Interactions
Infectious diseases cause millions of illnesses and deaths every year and raise serious health concerns worldwide; monitoring and treating them remains a prevalent and intractable problem. Because host-pathogen interactions are considered the key molecular-level infection processes of infectious diseases, a large body of research has focused on them, towards understanding infection mechanisms and developing novel therapeutic solutions. For years, the continuous development of biological technologies has benefited wet lab-based experiments, from small-scale biochemical, biophysical and genetic experiments to large-scale methods such as yeast two-hybrid analysis and cryogenic electron microscopy. As a result of these decades of effort, biological data have accumulated explosively, including multi-omics data such as genomics and proteomics data.
Thus, Chapter 2 presents an initial review of omics data, demonstrating recent developments in 'omics' research with a particular focus on proteomics and genomics. With high-throughput technologies, the volume of 'omics' data, including genomics and proteomics data, has grown further still, and an upsurge of interest in bioinformatics data analytics comes as no surprise to researchers across a variety of disciplines. In particular, the astonishing rate at which genomics and proteomics data are generated has led researchers into the realm of 'Big Data' research. Chapter 2 therefore provides an update on the omics background and the state-of-the-art developments in the omics area, with a focus on genomics data, from the perspective of big data analytics.
Benchmarking Transferable Adversarial Attacks
The robustness of deep learning models against adversarial attacks remains a
pivotal concern. This study presents, for the first time, an exhaustive review
of the transferability aspect of adversarial attacks. It systematically
categorizes and critically evaluates various methodologies developed to augment
the transferability of adversarial attacks. This study encompasses a spectrum
of techniques, including Generative Structure, Semantic Similarity, Gradient
Editing, Target Modification, and Ensemble Approach. Concurrently, this paper
introduces a benchmark framework, TAA-Bench, integrating ten leading
methodologies for adversarial attack transferability, thereby providing a
standardized and systematic platform for comparative analysis across diverse
model architectures. Through comprehensive scrutiny, we delineate the efficacy
and constraints of each method, shedding light on their underlying operational
principles and practical utility. This review endeavors to be a quintessential
resource for both scholars and practitioners in the field, charting the complex
terrain of adversarial transferability and setting a foundation for future
explorations in this vital sector. The associated codebase is accessible at:
https://github.com/KxPlaug/TAA-Bench
Comment: Accepted by NDSS 2024 Workshop
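The transferability phenomenon the benchmark measures can be illustrated with a toy sketch: an FGSM-style perturbation is crafted against a surrogate model and then evaluated on a separate target model. Everything below (linear models, correlated synthetic weights, data) is an illustrative assumption, not TAA-Bench's actual deep-network setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(w, x):
    """Linear classifier: sign of the dot product w.x."""
    return np.sign(w @ x)

# Models trained on similar data tend to have correlated weights;
# we simulate this by perturbing a shared weight vector.
w_shared = rng.normal(size=10)
w_surrogate = w_shared + 0.1 * rng.normal(size=10)
w_target = w_shared + 0.1 * rng.normal(size=10)

x = rng.normal(size=10)
y = predict(w_surrogate, x)  # surrogate's prediction on the clean input

# FGSM on the surrogate: the margin y * (w.x) has gradient y * w w.r.t. x,
# so stepping against it means moving along -y * sign(w_surrogate).
eps = 0.5
x_adv = x - eps * y * np.sign(w_surrogate)

print("surrogate flipped:", predict(w_surrogate, x_adv) != y)
print("transfers to target:", predict(w_target, x_adv) != predict(w_target, x))
```

The attack succeeds on the target despite never querying its weights, which is exactly the correlated-decision-boundary effect that transferable-attack methods exploit and TAA-Bench quantifies at scale.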
Large Language Models Based Fuzzing Techniques: A Survey
In the modern era where software plays a pivotal role, software security and
vulnerability analysis have become essential to software development. Fuzzing,
an efficient software testing method, is widely used across many domains.
Moreover, the rapid development of Large Language Models (LLMs) has
facilitated their application to software testing, with remarkable
performance. Considering that existing fuzzing techniques are not fully
automated and that software vulnerabilities continue to evolve, there is a
growing trend towards employing fuzz tests generated by large language
models. This survey provides a systematic overview of approaches that fuse
LLMs and fuzzing for software testing. A statistical analysis and discussion
of the literature in three areas, namely LLMs, fuzzing, and LLM-generated
fuzzing, is conducted by summarising the state-of-the-art methods up to 2024.
Our survey also investigates the potential for widespread future deployment
of LLM-based fuzzing techniques.
Comment: 9 pages, submission under review
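The generate-mutate-observe loop that LLM-based fuzzers automate can be sketched minimally. The `llm_mutate` function below is a hypothetical stand-in for a call to a real language model (which would propose format-aware rewrites of a seed input); here it makes random character-level edits so the example stays self-contained, and the target parser's crash condition is planted for illustration.

```python
import random

def llm_mutate(seed: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM mutation: insert a structural char."""
    chars = list(seed)
    pos = rng.randrange(len(chars) + 1)
    chars.insert(pos, rng.choice('{}[]",:0123456789'))
    return ''.join(chars)

def target_parser(data: str) -> None:
    """Toy system under test with a planted bug: crashes on '{{'."""
    if '{{' in data:
        raise ValueError("parser crash")

def fuzz(seed: str, iterations: int = 1000) -> list[str]:
    rng = random.Random(42)
    crashes = []
    corpus = [seed]
    for _ in range(iterations):
        candidate = llm_mutate(rng.choice(corpus), rng)
        try:
            target_parser(candidate)
            corpus.append(candidate)   # surviving inputs become new seeds
        except ValueError:
            crashes.append(candidate)  # record crash-triggering inputs
    return crashes

crashes = fuzz('{"key": 1}')
print(f"found {len(crashes)} crashing inputs")
```

Real LLM-based fuzzers differ mainly in where the model sits in this loop: generating seeds, proposing mutations, or synthesising whole test drivers.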
APEX2S: A Two-Layer Machine Learning Model for Discovery of Host-Pathogen Protein-Protein Interactions on Cloud-based Multiomics Data
Faced with an avalanche of biological interaction data, computational biology now confronts greater challenges in big data analysis and calls for more studies that mine and integrate cloud-based multiomics data, especially data related to infectious diseases. Meanwhile, machine learning techniques have recently succeeded in a range of computational biology tasks. In this article, we focus on the study of host-pathogen protein-protein interactions, aiming to apply machine learning techniques to learn from the interaction data and make predictions. A comprehensive and practical workflow for harnessing different cloud-based multiomics data is discussed. In particular, a novel two-layer machine learning model, APEX2S, is proposed for discovering protein-protein interactions. The results show that our model better learns from, and predicts on, the accumulated host-pathogen protein-protein interaction data.
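The abstract does not specify APEX2S's two layers, so as a hedged sketch of the general idea, a generic two-layer (stacked) classifier gives the flavour: layer-1 base learners score a protein pair from different feature views, and a layer-2 meta-learner combines their outputs. All features, labels, and learners below are synthetic assumptions, not the paper's model or data.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 400, 8
X = rng.normal(size=(n, d))            # synthetic protein-pair descriptors
true_w = rng.normal(size=d)
y = (X @ true_w + 0.5 * rng.normal(size=n) > 0).astype(float)  # interaction labels

def fit_logistic(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression, used at both layers."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-(X @ w)))

# Layer 1: base learners on disjoint feature views (e.g. sequence-derived
# vs. expression-derived descriptors -- an assumption for illustration).
w_a = fit_logistic(X[:, :4], y)
w_b = fit_logistic(X[:, 4:], y)

# Layer 2: meta-learner stacked on the base learners' probabilities
# (a bias column lets it place the decision threshold).
meta_X = np.column_stack([predict_proba(w_a, X[:, :4]),
                          predict_proba(w_b, X[:, 4:]),
                          np.ones(n)])
w_meta = fit_logistic(meta_X, y)

pred = (predict_proba(w_meta, meta_X) > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())
```

A real stacking setup would feed the meta-learner out-of-fold layer-1 predictions to avoid leakage; training accuracy is reported here only to keep the sketch short.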
Honest Score Client Selection Scheme: Preventing Federated Learning Label Flipping Attacks in Non-IID Scenarios
Federated Learning (FL) is a promising technology that enables multiple
actors to build a joint model without sharing their raw data. The distributed
nature makes FL vulnerable to various poisoning attacks, including model
poisoning attacks and data poisoning attacks. Many Byzantine-resilient FL
methods have been introduced to mitigate model poisoning attacks, but their
effectiveness against data poisoning attacks remains unclear. In this paper,
we focus on the most representative data poisoning attack, the "label
flipping attack", and monitor its effectiveness against existing FL methods.
The results show that existing FL methods perform similarly in independent
and identically distributed (IID) settings but fail to maintain model
robustness in non-IID settings. To mitigate these weaknesses in non-IID
scenarios, we introduce the Honest Score Client Selection (HSCS) scheme and
the corresponding HSCSFL framework. In HSCSFL, the server collects a clean
dataset for evaluation. In each iteration, the server collects the gradients
from clients and then performs HSCS to select aggregation candidates. The
server first evaluates the performance of each class of the global model and
generates a corresponding risk vector indicating which classes could
potentially be under attack. Similarly, the server evaluates each client's
model and records its per-class performance as an accuracy vector. The dot
product of a client's accuracy vector and the global risk vector gives that
client's honest score; only the top p% of clients by honest score are
included in the following aggregation. Finally, the server aggregates the
gradients and uses the outcome to update the global model. Comprehensive
experimental results show that HSCSFL effectively enhances FL robustness and
defends against the "label flipping attack".
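The scoring step described above can be sketched directly. The definition of the risk vector here is an assumption (one minus per-class global accuracy), the paper's exact formulation may differ, and all accuracy values are illustrative.

```python
import numpy as np

def hscs_select(global_acc, client_accs, p=0.5):
    """Return indices of the top-p fraction of clients by honest score."""
    risk = 1.0 - np.asarray(global_acc)      # poorly-performing classes = at risk
    scores = np.asarray(client_accs) @ risk  # dot(accuracy vector, risk vector)
    k = max(1, int(np.ceil(p * len(client_accs))))
    return np.argsort(scores)[::-1][:k]      # keep the highest-scoring clients

# Example: class 2 of the global model underperforms (possibly under attack).
global_acc = [0.9, 0.88, 0.4]
client_accs = [
    [0.9, 0.9, 0.8],   # honest client: still accurate on the risky class
    [0.9, 0.9, 0.1],   # suspicious client: poor on the risky class
    [0.85, 0.9, 0.7],  # honest client
    [0.9, 0.85, 0.2],  # suspicious client
]
selected = hscs_select(global_acc, client_accs, p=0.5)
print("selected clients:", sorted(selected.tolist()))  # -> [0, 2]
```

Because risk concentrates on the attacked class, clients whose models remain accurate there dominate the score, so label-flipping clients fall out of the aggregation set.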
Taming Gradient Variance in Federated Learning with Networked Control Variates
Federated learning, a decentralized approach to machine learning, faces
significant challenges such as extensive communication overheads, slow
convergence, and unstable improvements. These challenges primarily stem from
the gradient variance due to heterogeneous client data distributions. To
address this, we introduce a novel Networked Control Variates (FedNCV)
framework for Federated Learning. We adopt the REINFORCE Leave-One-Out (RLOO)
as a fundamental control variate unit in the FedNCV framework, implemented at
both client and server levels. At the client level, the RLOO control variate is
employed to optimize local gradient updates, mitigating the variance introduced
by data samples. Once relayed to the server, the RLOO-based estimator further
provides an unbiased and low-variance aggregated gradient, leading to robust
global updates. This dual-side application is formalized as a linear
combination of composite control variates. We provide a mathematical expression
capturing this integration of double control variates within FedNCV and present
three theoretical results with corresponding proofs. This unique dual structure
equips FedNCV to address data heterogeneity and scalability issues, thus
potentially paving the way for large-scale applications. Moreover, we tested
FedNCV on six diverse datasets under a Dirichlet distribution with α = 0.1,
and benchmarked its performance against six SOTA methods, demonstrating its
superiority.
Comment: 14 pages
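The variance reduction that the RLOO control variate provides can be demonstrated on a toy REINFORCE problem: estimating the gradient of E[x²] for x ~ N(μ, 1), whose true value is 2μ. The federated client/server plumbing of FedNCV is omitted; this only illustrates the leave-one-out baseline mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n_samples, n_trials = 1.5, 8, 2000

plain, rloo = [], []
for _ in range(n_trials):
    x = mu + rng.normal(size=n_samples)
    f = x ** 2                  # per-sample "reward"
    score = x - mu              # d/dmu log N(x; mu, 1), the score function

    # Vanilla REINFORCE estimator of the gradient.
    plain.append(np.mean(f * score))

    # RLOO: subtract, for each sample, the mean reward of the *other* samples.
    # The baseline is independent of that sample, so the estimate stays unbiased.
    loo_baseline = (f.sum() - f) / (n_samples - 1)
    rloo.append(np.mean((f - loo_baseline) * score))

plain, rloo = np.array(plain), np.array(rloo)
print("true gradient:", 2 * mu)
print("plain REINFORCE: mean %.3f, var %.3f" % (plain.mean(), plain.var()))
print("RLOO:            mean %.3f, var %.3f" % (rloo.mean(), rloo.var()))
```

Both estimators center on the true gradient, but the RLOO estimate has markedly lower variance, which is the property FedNCV applies at both the client level (per-sample baselines) and the server level (per-client baselines).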