Machine learning on normalized protein sequences
BACKGROUND: Machine learning techniques have been widely applied to biological sequences, e.g. to predict drug resistance in HIV-1 from sequences of drug target proteins and to predict protein functional classes. As deletions and insertions are frequent in biological sequences, a major limitation of current methods is the inability to handle varying sequence lengths. FINDINGS: We propose to normalize sequences to uniform length. To this end, we tested one linear and four different non-linear interpolation methods for the normalization of sequence lengths of 19 classification datasets. Classification tasks included prediction of HIV-1 drug resistance from drug target sequences and sequence-based prediction of protein function. We applied random forests to the classification of sequences into "positive" and "negative" samples. Statistical tests showed that linear interpolation outperformed the non-linear interpolation methods in most of the analyzed datasets, while in a few cases non-linear methods had a small but significant advantage. Compared to other published methods, our prediction scheme leads to an improvement in prediction accuracy of up to 14%. CONCLUSIONS: We found that machine learning on sequences normalized by simple linear interpolation gave better or at least competitive results compared to state-of-the-art procedures and is thus a promising alternative to existing methods, especially for protein sequences of variable length.
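As an illustration of the normalization idea summarized above, the following is a minimal sketch of resampling a numerically encoded protein sequence to a fixed length by linear interpolation. The abstract does not say how residues are encoded, so the Kyte-Doolittle hydropathy scale used here is an assumption, and the function names `encode`, `normalize_length`, and `encode_and_normalize` are hypothetical. The resulting fixed-length vectors could then be passed to any classifier, such as the random forests mentioned in the abstract.

```python
# Assumption: residues are encoded with the Kyte-Doolittle hydropathy scale.
# The abstract does not name a descriptor; any numeric scale would work here.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map each amino acid letter to its (assumed) numeric descriptor."""
    return [KD[residue] for residue in seq.upper()]


def normalize_length(values, target_len):
    """Resample a numeric profile to target_len points by linear interpolation."""
    n = len(values)
    if n == 0 or target_len < 1:
        raise ValueError("need a non-empty profile and a positive target length")
    if n == 1 or target_len == 1:
        return [values[0]] * target_len
    out = []
    for i in range(target_len):
        # Position of the i-th output point on the original index axis.
        pos = i * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def encode_and_normalize(seq, target_len):
    """Encode a sequence and stretch or compress it to target_len features."""
    return normalize_length(encode(seq), target_len)


if __name__ == "__main__":
    # Two sequences of different length end up as vectors of equal length.
    for s in ("ACDEFG", "ACDEFGHIKL"):
        print(len(encode_and_normalize(s, 8)))
```

Sequences shorter than the target are stretched and longer ones compressed, with the first and last residues always mapped to the ends of the output vector.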
A Rough Set-Based Model of HIV-1 Reverse Transcriptase Resistome
Reverse transcriptase (RT) is a viral enzyme crucial for HIV-1 replication. Currently, 12 drugs are targeted against the RT. The low fidelity of RT-mediated transcription leads to the quick accumulation of drug-resistance mutations. The sequence-resistance relationship remains only partially understood. Using publicly available data collected from over 15 years of HIV proteome research, we have created a general and predictive rule-based model of HIV-1 resistance to eight RT inhibitors. Our rough set-based model considers changes in the physicochemical properties of a mutated sequence as compared to the wild-type strain. Thanks to the application of the Monte Carlo feature selection method, the model takes into account only the properties that significantly contribute to the resistance phenomenon. The obtained results show that drug resistance is determined in a more complex way than previously believed. We confirmed the importance of many resistance-associated sites, found some sites to be less relevant than formerly postulated and, more importantly, identified several previously neglected sites as potentially relevant. By mapping some of the newly discovered sites onto the 3D structure of the RT, we were able to suggest possible molecular mechanisms of drug resistance. Importantly, our model is able to generalize predictions to previously unseen cases. The study is an example of how computational biology methods can increase our understanding of the HIV-1 resistome.
IoT Platform for COVID-19 Prevention and Control: A Survey
As a result of the worldwide transmission of severe acute respiratory
syndrome coronavirus 2 (SARS-CoV-2), coronavirus disease 2019 (COVID-19) has
evolved into an unprecedented pandemic. With pharmaceutical treatments and
vaccines currently unavailable, this novel coronavirus has a great impact on
public health, human society, and the global economy, which is likely to last
for many years. One of the lessons learned from the COVID-19 pandemic is that
a long-term system of non-pharmaceutical interventions for preventing and
controlling new infectious diseases is desirable. An Internet of things (IoT)
platform is a preferred choice for achieving this goal, owing to its
ubiquitous sensing ability and seamless connectivity. IoT technology is
changing our lives through smart healthcare, smart homes, and smart cities,
which aim to build a more convenient and
intelligent community. This paper presents how the IoT could be incorporated
into the epidemic prevention and control system. Specifically, we demonstrate a
potential fog-cloud combined IoT platform that can be used in the systematic
and intelligent COVID-19 prevention and control, which involves five
interventions including COVID-19 Symptom Diagnosis, Quarantine Monitoring,
Contact Tracing & Social Distancing, COVID-19 Outbreak Forecasting, and
SARS-CoV-2 Mutation Tracking. We review the state-of-the-art literature on
these five interventions to present the capabilities of IoT in countering the
current COVID-19 pandemic or future infectious disease epidemics.
Comment: 12 pages; Submitted to IEEE Internet of Things Journal
Improved Bevirimat resistance prediction by combination of structural and sequence-based classifiers
BACKGROUND: Maturation inhibitors such as Bevirimat are a new class of antiretroviral drugs that hamper the cleavage of HIV-1 proteins into their functional active forms. They bind to these preproteins and inhibit their cleavage by the HIV-1 protease, resulting in non-functional virus particles. Nevertheless, there exist mutations in this region leading to resistance against Bevirimat. Highly specific and accurate tools to predict resistance to maturation inhibitors can help to identify patients who might benefit from these new drugs. RESULTS: We tested several methods to improve Bevirimat resistance prediction in HIV-1. Combining structural and sequence-based information in classifier ensembles led to accurate and reliable predictions. Moreover, we were able to computationally identify the most crucial regions for Bevirimat resistance, which are in line with experimental results from other studies. CONCLUSIONS: Our analysis demonstrated the use of machine learning techniques to predict HIV-1 resistance against maturation inhibitors such as Bevirimat. New maturation inhibitors are already under development and might enlarge the arsenal of antiretroviral drugs in the future. Thus, accurate prediction tools are very useful to enable personalized therapy.
Proteochemometric Modeling of the Susceptibility of Mutated Variants of the HIV-1 Virus to Reverse Transcriptase Inhibitors
BACKGROUND: Reverse transcriptase is a major drug target in highly active antiretroviral therapy (HAART) against HIV, which typically comprises two nucleoside/nucleotide analog reverse transcriptase (RT) inhibitors (NRTIs) in combination with a non-nucleoside RT inhibitor or a protease inhibitor. Unfortunately, HIV is capable of escaping the therapy by mutating into drug-resistant variants. Computational models that correlate HIV drug susceptibilities to the virus genotype and to drug molecular properties might facilitate selection of improved combination treatment regimens. METHODOLOGY/PRINCIPAL FINDINGS: We applied our previously developed proteochemometric modeling technology to analyze HIV mutant susceptibility to the eight clinically approved NRTIs. The data set covered 728 virus variants genotyped for 240 sequence residues of the DNA polymerase domain of the RT; 165 of these residues contained mutations. In total, the data set covered susceptibility data for 4,495 inhibitor-RT combinations. Inhibitors and RT sequences were represented numerically by 3D-structural and physicochemical property descriptors, respectively. The two sets of descriptors and their derived cross-terms were correlated to the susceptibility data by partial least-squares projections to latent structures. The model identified more than ten frequently occurring mutations, each conferring more than a two-fold loss of susceptibility to one or several NRTIs. The most deleterious mutations were K65R, Q151M, M184V/I, and T215Y/F, each of them decreasing susceptibility to most of the NRTIs. The predictive ability of the model was estimated by cross-validation and by external predictions for new HIV variants; both procedures showed very high correlation between the predicted and actual susceptibility values (Q2=0.89 and Q2ext=0.86). The model is available at www.hivdrc.org as a free web service for the prediction of the susceptibility to any of the clinically used NRTIs for any HIV-1 mutant variant. CONCLUSIONS/SIGNIFICANCE: Our results give directions on how to develop approaches for selecting genome-based optimum combination therapy for patients harboring mutated HIV variants.
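To make the terms "cross-terms" and "Q2" in the abstract above more concrete, here is a small sketch. The abstract does not state how the cross-terms are derived or exactly how Q2 is computed, so this assumes the common conventions: cross-terms as pairwise products of inhibitor and protein descriptors, and Q2 as one minus the ratio of the squared prediction error to the total sum of squares around the mean. The function names `cross_terms` and `q2` are hypothetical.

```python
def cross_terms(drug_desc, protein_desc):
    """Pairwise products of inhibitor and protein descriptors.

    Assumption: cross-terms are plain products of one descriptor from each
    block; the abstract only says they are "derived" from the two sets.
    """
    return [d * p for d in drug_desc for p in protein_desc]


def q2(observed, predicted):
    """Predictive squared correlation: 1 - PRESS / total sum of squares.

    'predicted' is expected to hold held-out predictions (from
    cross-validation or an external set), not fitted values.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    press = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    total = sum((y - mean_obs) ** 2 for y in observed)
    if total == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - press / total


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the study.
    features = [0.5, -1.0] + [2.0, 0.1, 3.0] + cross_terms([0.5, -1.0], [2.0, 0.1, 3.0])
    print(len(features))
    print(q2([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

A full feature vector for one inhibitor-RT pair would, under these assumptions, concatenate the two descriptor blocks with their cross-terms before regression.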
Artificial intelligence in the cyber domain: Offense and defense
Artificial intelligence techniques have grown rapidly in recent years, and their applications in practice can be seen in many fields, ranging from facial recognition to image analysis. In the cybersecurity domain, AI-based techniques can provide better cyber defense tools and help adversaries improve methods of attack. However, malicious actors are aware of the new prospects too and will probably attempt to use them for nefarious purposes. This survey paper aims at providing an overview of how artificial intelligence can be used in the context of cybersecurity in both offense and defense. Web of Science123art. no. 41
A Tent Lévy Flying Sparrow Search Algorithm for Feature Selection: A COVID-19 Case Study
The "Curse of Dimensionality" induced by the rapid development of information
science might have a negative impact when dealing with big datasets. In this
paper, we propose a variant of the sparrow search algorithm (SSA), called Tent
Lévy flying sparrow search algorithm (TFSSA), and use it to select the best
subset of features in the packing pattern for classification purposes. SSA is a
recently proposed algorithm that has not been systematically applied to feature
selection problems. After verification by the CEC2020 benchmark function, TFSSA
is used to select the best feature combination to maximize classification
accuracy and minimize the number of selected features. The proposed TFSSA is
compared with nine algorithms in the literature. Nine evaluation metrics are
used to properly evaluate and compare the performance of these algorithms on
twenty-one datasets from the UCI repository. Furthermore, the approach is
applied to the coronavirus disease (COVID-19) dataset, yielding the best
average classification accuracy (93.47%) and average number of selected
features (2.1). Experimental results confirm the advantages of
the proposed algorithm in improving classification accuracy and reducing the
number of selected features compared to other wrapper-based algorithms.
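The two objectives named in the abstract above (high accuracy, few features) are often combined into a single score in wrapper-based feature selection. The sketch below shows one such weighted fitness, together with a tent chaotic map of the kind commonly used to spread out an initial population. None of this is taken from the TFSSA paper itself: the weighting `alpha`, the skew parameter `a`, and the function names `fitness`, `tent_sequence`, and `to_mask` are all assumptions made for illustration.

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted wrapper objective (lower is better).

    Assumption: a convex combination of classification error and the
    fraction of features kept; the abstract does not give the formula.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return alpha * error_rate + (1.0 - alpha) * n_selected / n_total


def tent_sequence(x0, length, a=0.7):
    """Generate values in [0, 1] with a skewed tent map.

    x -> x / a            if x < a
    x -> (1 - x) / (1 - a) otherwise
    The skew parameter a is a hypothetical choice for this sketch.
    """
    if not 0.0 < a < 1.0 or not 0.0 <= x0 <= 1.0:
        raise ValueError("need 0 < a < 1 and 0 <= x0 <= 1")
    values = []
    x = x0
    for _ in range(length):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        values.append(x)
    return values


def to_mask(position, threshold=0.5):
    """Turn a continuous position vector into a binary feature mask."""
    return [1 if p > threshold else 0 for p in position]


if __name__ == "__main__":
    position = tent_sequence(0.3, 10)   # one candidate over 10 features
    mask = to_mask(position)
    # The error rate would come from training a classifier on the masked
    # features; 0.08 is a placeholder value, not a result from the paper.
    print(mask, fitness(0.08, sum(mask), len(mask)))
```

In an actual search loop, each candidate's error rate would be measured by cross-validating a classifier on the selected columns, and the optimizer would update positions to minimize this fitness.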