Drug-Target Interaction Networks Prediction Using Short-linear Motifs
Drug-target interaction (DTI) prediction is a fundamental step in drug discovery and genomic research and contributes to medical treatment. Various computational methods have been developed to find potential DTIs. Machine learning (ML) is now widely used to identify new DTIs from existing DTI networks. There are two main ML-based approaches to DTI network prediction: similarity-based methods and feature-based methods. In this thesis, we propose a feature-based approach that uses short-linear motifs (SLiMs) as protein descriptors and chemical substructure fingerprints as drug features. Another challenge in this field is the lack of negative data for the training set, because most of the data available in public databases are interaction samples. Many researchers treat unknown drug-target pairs as non-interacting, which is incorrect and may have serious consequences. To solve this problem, we introduce a strategy for selecting reliable negative samples according to the features of the positive data. We use the same benchmark datasets as previous research to allow a direct comparison. After trying three classifiers, k-nearest neighbours (k-NN), Random Forest (RF) and Support Vector Machine (SVM), we find that the results of k-NN are satisfactory but not as good as those of RF and SVM. Compared with existing approaches that use the same datasets to solve the same problem, our method performs best under most circumstances.
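The abstract does not say how reliable negatives are chosen beyond "according to the features of positive data". One plausible reading, offered only as an illustrative sketch and not as the thesis's actual procedure, is to keep the unlabeled drug-target pairs whose feature vectors lie farthest from every known positive. The function name `select_reliable_negatives`, the Euclidean distance, and the toy binary vectors below are all assumptions.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_reliable_negatives(positives, unlabeled, n_select):
    """Return indices of the n_select unlabeled vectors farthest from any positive.

    Hypothetical heuristic: a pair that resembles no known interaction
    is treated as a more trustworthy negative than a random unknown pair.
    """
    scored = []
    for idx, u in enumerate(unlabeled):
        # Distance to the closest known positive pair.
        nearest = min(euclidean(u, p) for p in positives)
        scored.append((nearest, idx))
    # Largest nearest-positive distance first; ties broken by index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scored[:n_select]]


# Toy binary vectors standing in for concatenated fingerprint + SLiM features.
positives = [[1, 1, 0, 0], [1, 0, 1, 0]]
unlabeled = [[1, 1, 0, 1], [0, 0, 0, 1], [0, 0, 1, 1], [1, 1, 1, 0]]
reliable = select_reliable_negatives(positives, unlabeled, 2)
print(reliable)
```

The selected indices could then be combined with the positives to form a balanced training set for k-NN, RF or SVM.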
Temporal Robustness against Data Poisoning
Data poisoning considers cases when an adversary manipulates the behavior of
machine learning algorithms through malicious training data. Existing threat
models of data poisoning center around a single metric, the number of poisoned
samples. In consequence, if attackers can poison more samples than expected
with affordable overhead, as in many practical scenarios, they may be able to
render existing defenses ineffective in a short time. To address this issue, we
leverage timestamps denoting the birth dates of data, which are often available
but neglected in the past. Benefiting from these timestamps, we propose a
temporal threat model of data poisoning with two novel metrics, earliness and
duration, which respectively measure how long an attack started in advance and
how long an attack lasted. Using these metrics, we define the notions of
temporal robustness against data poisoning, providing a meaningful sense of
protection even with unbounded amounts of poisoned samples. We present a
benchmark with an evaluation protocol simulating continuous data collection and
periodic deployments of updated models, thus enabling empirical evaluation of
temporal robustness. Lastly, we develop and also empirically verify a baseline
defense, namely temporal aggregation, offering provable temporal robustness and
highlighting the potential of our temporal threat model for data poisoning.
Comment: 13 pages, 7 figures
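The abstract names temporal aggregation but gives no details, so the following is only a rough sketch of the general idea under stated assumptions: samples are bucketed by timestamp into fixed-length periods, one base model is trained per period, and predictions are made by majority vote. An attack of short duration then touches only a few buckets and controls only a few votes. The toy 1-D nearest-class-mean learner, the function names, and the data are all hypothetical.

```python
from collections import Counter, defaultdict


def train_base(samples):
    """Toy base learner (assumption): 1-D nearest-class-mean classifier."""
    sums = defaultdict(float)
    counts = Counter()
    for x, y in samples:
        sums[y] += x
        counts[y] += 1
    means = {y: sums[y] / counts[y] for y in counts}
    # Predict the label whose class mean is closest to x.
    return lambda x: min(sorted(means), key=lambda y: abs(x - means[y]))


def temporal_aggregation(data, period_length):
    """Train one model per time period and return a majority-vote predictor.

    data: iterable of (timestamp, feature, label) triples.
    """
    buckets = defaultdict(list)
    for t, x, y in data:
        buckets[t // period_length].append((x, y))
    models = [train_base(buckets[k]) for k in sorted(buckets)]

    def predict(x):
        votes = Counter(m(x) for m in models)
        # Most votes wins; ties broken by the smaller label.
        return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))[0][0]

    return predict


data = [
    (0, 0.0, 0), (1, 10.0, 1), (5, 1.0, 0), (6, 9.0, 1),      # period 0, clean
    (10, 0.0, 1), (12, 10.0, 0), (15, 1.0, 1), (17, 9.0, 0),  # period 1, labels flipped
    (20, 0.0, 0), (22, 10.0, 1), (25, 1.0, 0), (28, 9.0, 1),  # period 2, clean
]
predict = temporal_aggregation(data, 10)
print(predict(9.0), predict(0.5))
```

In this toy run the poisoned period is outvoted by the two clean ones, however many flipped samples it contains.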