220 research outputs found

    The importance of physicochemical characteristics and nonlinear classifiers in determining HIV-1 protease specificity

    This paper reviews recent research relating to the application of bioinformatics approaches to determining HIV-1 protease specificity, outlines outstanding issues, and presents a new approach to addressing these issues. Leading machine learning theory for the problem currently suggests that the direct encoding of the physicochemical properties of the amino acid substrates is not required for optimal performance. A number of amino acid encoding approaches which incorporate potentially relevant physicochemical properties of the substrate are identified, and are evaluated using a nonlinear task decomposition based neuroevolution algorithm. The results are evaluated, and compared against a recent benchmark set on a nonlinear classifier using only amino acid sequence and identity information. Ensembles of these nonlinear classifiers using the physicochemical properties of the substrate are demonstrated to consistently outperform the recently published state-of-the-art linear support vector machine based approach in out-of-sample evaluations
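    The contrast the abstract draws between identity-only and physicochemical encodings of a substrate can be sketched as follows. This is a minimal illustration, not the encoding used in the paper: the eight-residue window, the example peptide and the choice of the Kyte-Doolittle hydropathy scale as the physicochemical property are all assumptions made here.

```python
# Illustrative sketch only: the octamer window, the example peptide and the
# use of the Kyte-Doolittle scale are assumptions, not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_identity(peptide):
    """Orthogonal (one-hot) encoding: 20 indicator values per residue."""
    vector = []
    for residue in peptide:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def encode_hydropathy(peptide):
    """Physicochemical encoding: one hydropathy value per residue."""
    return [KYTE_DOOLITTLE[residue] for residue in peptide]


if __name__ == "__main__":
    peptide = "SQNYPIVQ"  # example octamer, assumed for illustration
    print(len(encode_identity(peptide)))   # identity features: 8 x 20
    print(encode_hydropathy(peptide))      # one property value per position
```

    The identity encoding treats every pair of distinct residues as equally different, whereas the property encoding places chemically similar residues close together; which of the two a classifier benefits from is the question the abstract addresses.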

    Bio-AIMS collection of chemoinformatics web tools based on molecular graph information and artificial intelligence models

    [Abstract] Encoding molecular information into molecular descriptors is the first step of in silico chemoinformatics methods in drug design. Machine learning methods are a complex solution for finding prediction models for specific biological properties of molecules. These models connect molecular structure information, such as atom connectivity (molecular graphs) or the physical-chemical properties of an atom or group of atoms, to molecular activity (Quantitative Structure-Activity Relationship, QSAR). Due to the complexity of proteins, the prediction of their activity is a complicated task and the interpretation of the models is more difficult. The current review presents a series of 11 prediction models for proteins, implemented as free Web tools on an Artificial Intelligence Model Server in Biosciences, Bio-AIMS (http://bio-aims.udc.es/TargetPred.php). Six tools predict protein activity, two models evaluate drug-protein target interactions and the other three calculate protein-protein interactions. The input information is based on the protein 3D structure for nine models, the 1D peptide amino acid sequence for three tools and drug SMILES formulas for two servers. The molecular graph descriptor-based Machine Learning models could be useful tools for in silico screening of new peptides/proteins as future drug targets for specific treatments. Red Gallega de Investigación y Desarrollo de Medicamentos; R2014/025. Instituto de Salud Carlos III; PI13/0028

    The role of bovine adenovirus (BAdV)-3 protein pVIII in virus replication

    Bovine adenovirus (BAdV)-3 is a non-enveloped icosahedral DNA virus, which replicates in the nucleus of infected cells, and is being developed as a vector for vaccination for humans and animals. The genome of BAdV-3 is organized into early, intermediate and late genes and it has thirty-three predicted open reading frames (Reddy et al., 1998). The late region of BAdV-3 is divided into seven families (L1-L7) (Reddy et al., 1998). One of the proteins expressed from the L6 region is pVIII, a minor capsid protein connecting the core with the inner surface of the capsid. The objective of the current study was to characterize the pVIII protein of BAdV-3 and to examine its role in the life cycle of BAdV-3. Anti-pVIII serum detected a protein of 24 kDa at 12-48 hr post infection and an additional protein of 8 kDa at 24-48 hr post infection. While a 24 kDa protein is detected in empty capsids, only the C-terminal cleaved protein of 8 kDa is detected in the mature virion, suggesting that amino acids 147-216 of the conserved C-terminus of BAdV-3 pVIII are incorporated in mature virions. The pVIII protein predominantly localizes to the nucleus of BAdV-3 infected cells, utilizing the classical importin α/β-dependent nuclear import pathway. Analysis of mutant pVIII demonstrated that amino acids 57-72 of the conserved N-terminus bind to importin α-3 with high affinity and are required for nuclear localization. Detection of hexon associated with both the precursor (24 kDa) and cleaved (8 kDa) forms of pVIII suggests that the C-terminus of pVIII interacts with hexon. Based on a yeast two-hybrid screening assay, we identified the cellular protein DDX3 as an interacting protein partner of pVIII. Earlier, targeting of DDX3 by a few viral proteins defined its role in mRNA transport (Yedavalli et al., 2004) and induction of interferon production (Schroder et al., 2008; Wang et al., 2009). 
Here, we provide evidence regarding the involvement of DDX3 in cap-dependent cellular mRNA translation and show that targeting of DDX3 by the adenovirus pVIII protein abolishes the cap-dependent mRNA translation function of DDX3 in virus-infected cells. Adenovirus late protein pVIII interacts with DDX3 in transfected and bovine adenovirus (BAdV-3) infected cells. pVIII inhibited capped mRNA translation in vitro and in vivo by limiting the amount of DDX3 and eIF3. Diminished amounts of DDX3 and eIFs including eIF3, eIF4E and PABP were present in the cap-binding complex in BAdV-3 infected or pVIII transfected cells, with no trace of pVIII in the cap-binding complex. The total amount of eIFs appeared similar in uninfected and BAdV-3 infected cells. The co-immunoprecipitation experiments indicated the absence of direct interaction between pVIII and eIF3, eIF4E or PABP. These data indicate that interaction of pVIII with DDX3 depletes eIF3, eIF4E and PABP from the cap-binding complex. We conclude that DDX3 promotes cap-dependent cellular mRNA translation and that BAdV-3 pVIII inhibits translation of capped cellular mRNA by excluding the functional cap-binding complex from the capped cellular mRNA. BAdV-3 infection of DDX3-positive cells significantly inhibits cellular protein synthesis at late times post-infection. Interestingly, knockdown of DDX3 resulted in a significant reduction in virus yield and expression of BAdV-3 late proteins at late times post-infection. Our results suggest that the selective translation of BAdV-3 late mRNAs observed at late times post-infection of DDX3-positive cells is abrogated in DDX3 knockdown cells. Moreover, the reduction in the extent of protein synthesis is evidenced by fewer functional 80S ribosomes and polysomes in cells transfected with a pVIII-expressing plasmid. Alternatively, DDX3 and pVIII bind to the BAdV-3 tripartite leader (TPL), and the translation of mRNAs containing TPL at their 5' ends is enhanced in the presence of pVIII and DDX3 proteins. 
From this observation, we concluded that pVIII and DDX3 might promote the translation of late viral mRNAs by interacting with TPL.

    An improved bees algorithm local search mechanism for numerical dataset

    The Bees Algorithm (BA), a heuristic optimization procedure, is one of the fundamental search techniques and is based on the food-foraging activities of bees. The algorithm performs a kind of exploitative neighbourhood search combined with random explorative search. However, the main issue with BA is that it requires long computational time and numerous computational processes to obtain a good solution, especially for more complicated problems. The approach does not guarantee an optimum solution, mainly because of a lack of accuracy. To address this issue, the local search in the BA was previously investigated: Simple swap, 2-Opt and 3-Opt were proposed as the Massudi methods for Bees Algorithm Feature Selection (BAFS). This study presents an extension that uses 4-Opt as the search neighbourhood. The proposal was implemented, and its performance was comprehensively compared and analysed with respect to accuracy and time. Furthermore, the feature selection algorithm was implemented and tested using the most popular dataset from the UCI Machine Learning Repository. The experimental results confirmed that the proposed extension of the search neighbourhood, including the 4-Opt approach, provides better accuracy in a suitable time than the Massudi methods.
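    The combination of exploitative neighbourhood search and random explorative search described in this abstract can be sketched on a toy continuous problem. This is a minimal sketch under stated assumptions, not the BAFS method: the function and parameter names are hypothetical, the objective is a simple sphere function, and a random perturbation stands in for the swap, 2-Opt, 3-Opt and 4-Opt neighbourhood moves, which are not reproduced here.

```python
import random


def bees_algorithm(objective, bounds, n_scouts=10, n_best=3, n_recruits=5,
                   patch=0.5, iterations=50, seed=0):
    """Minimise `objective` over a box. All parameter names are hypothetical."""
    rng = random.Random(seed)

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def neighbour(point):
        # Perturb each coordinate within the patch, clipped to the bounds.
        return [min(hi, max(lo, x + rng.uniform(-patch, patch)))
                for x, (lo, hi) in zip(point, bounds)]

    sites = sorted((random_point() for _ in range(n_scouts)), key=objective)
    for _ in range(iterations):
        next_sites = []
        # Exploitation: recruit bees around each of the best sites and keep
        # the best candidate (the site itself is kept if nothing improves).
        for site in sites[:n_best]:
            candidates = [site] + [neighbour(site) for _ in range(n_recruits)]
            next_sites.append(min(candidates, key=objective))
        # Exploration: the remaining bees scout random new sites.
        next_sites += [random_point() for _ in range(n_scouts - n_best)]
        sites = sorted(next_sites, key=objective)
    return sites[0], objective(sites[0])


def sphere(point):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(x * x for x in point)


if __name__ == "__main__":
    best, value = bees_algorithm(sphere, [(-5.0, 5.0)] * 2)
    print(best, value)
```

    In a feature selection setting the sites would be feature subsets and the neighbourhood move would be a discrete operator, so the choice of that operator decides how much work each local search step costs.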

    Evolutionary Computation and QSAR Research

    [Abstract] The successful high-throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening relies greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with a predicted toxic effect and poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning or artificial intelligence. QSAR modeling relies on three main steps: codification of the molecular structure into molecular descriptors, selection of relevant variables in the context of the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. Thus, this review explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods to build QSAR models, the current evolutionary feature selection methods and applications in QSAR, and the future trend towards joint or multi-task feature selection methods. Instituto de Salud Carlos III, PIO52048. Instituto de Salud Carlos III, RD07/0067/0005. Ministerio de Industria, Comercio y Turismo; TSI-020110-2009-53. Galicia. Consellería de Economía e Industria; 10SIN105004P
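    The variable selection step named in this abstract can be illustrated with a small genetic algorithm over descriptor masks. This is a sketch under stated assumptions rather than any method from the review: the function names, the parameter values and the toy fitness function are hypothetical. In a real QSAR workflow the fitness would typically be the validated performance of a model trained on the selected descriptors.

```python
import random


def ga_select(fitness, n_features, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Evolve a 0/1 mask over descriptors. All names are hypothetical."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: carry the two best masks over
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_features)
            child = parent_a[:cut] + parent_b[cut:]
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


# Toy stand-in for model quality: descriptors 0, 3 and 7 are "relevant"
# (an assumption for illustration); each irrelevant one costs 0.5.
RELEVANT = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(1 for i, gene in enumerate(mask) if gene and i in RELEVANT)
    extras = sum(1 for i, gene in enumerate(mask) if gene and i not in RELEVANT)
    return hits - 0.5 * extras


if __name__ == "__main__":
    best = ga_select(toy_fitness, n_features=10)
    print(best, toy_fitness(best))
```

    Because the two best masks are copied unchanged into each new generation, the best fitness in the population never decreases from one generation to the next.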

    Machine learning approaches for computer aided drug discovery

    Pharmaceutical drug discovery is expensive, time-consuming and scientifically challenging. In order to increase the efficiency of the pre-clinical drug discovery pathway, computational drug discovery methods and, most recently, machine learning-based methods are increasingly used as powerful tools to aid early-stage drug discovery. In this thesis, I present three complementary computer-aided drug discovery methods, with a focus on aiding hit discovery and hit-to-lead optimization. In addition, this thesis particularly focuses on exploring different molecular representations used to featurise machine learning models, in order to explore how best to capture valuable information about proteins, ligands and 3D protein-ligand complexes to build more robust, more interpretable and more accurate machine learning models. First, I developed ligand-based models using a Gaussian Process (GP) as an easy-to-implement tool to guide exploration of chemical space for the optimization of protein-ligand binding affinity. I explored different topological fingerprint and autoencoder representations for Bayesian optimisation (BO) and showed that BO is a powerful tool to help medicinal chemists to prioritise which new compounds to make for single-target as well as multi-target optimisation. The algorithm achieved high enrichment of top compounds for both single-target and multi-objective optimisation when tested on a well-known benchmark dataset of the drug target matrix metalloproteinase-12 and a real, ongoing drug optimisation dataset targeting four bacterial metallo-β-lactamases. Next, I present the development of a knowledge-based approach to drug design, combining new protein-ligand interaction fingerprints with a fragment-based drug discovery approach to understand SARS-CoV-2 Mpro-substrate specificity and to design novel small molecule inhibitors in silico. 
In combination with a fragment-based drug discovery approach, I show how this knowledge-based interaction fingerprint-driven approach can reveal fruitful fragment-growth design strategies. Lastly, I expand on the knowledge-based contact fingerprints to create a ligand-shaped molecular graph representation (Protein Ligand Interaction Graphs, PLIGs) to develop novel graph-based deep learning protein-ligand binding affinity scoring functions. PLIGs encode all intermolecular interactions in a protein-ligand complex within the node features of the graph and are therefore simple and fully interpretable. I explore a variety of Graph Neural Network architectures in combination with PLIGs and found Graph Attention Networks to perform slightly better than other GNN architectures, performing amongst the best known protein-ligand binding affinity scoring functions

    Mechanisms Regulating HIV-1 Protease Activity

    The Human Immunodeficiency Virus Type 1 (HIV-1) Protease (PR) has no direct involvement in the early steps of HIV-1 replication. Nonetheless, it is the timely and ordered processing of the viral structural proteins by the HIV-1 PR during virion maturation that facilitates the successful completion of virus entry, reverse transcription, and integration. Though a considerable amount of research has been devoted to deciphering how the enzyme prepares a virus particle for infection, the mechanisms regulating its activities continue to remain incompletely defined. RNA serves as one putative regulatory factor, since efficient processing of the maturation intermediate p15NC requires RNA in vitro. Though previously believed relevant to only p15NC cleavage, I demonstrate that RNA enhances HIV-1 proteolysis reactions in a substrate-independent manner. The increased catalytic activity of the HIV-1 PR results from a direct interaction between RNA and the enzyme, with the magnitude of the effect dependent upon the size of the RNA molecule. Large (>400 base) RNAs accelerated proteolytic processing by over 100-fold under near-physiological conditions. This considerable change stemmed from both improved substrate recognition (Km) and turnover rate (kcat). Variability in amino acid sequence also guides HIV-1 PR activity. However, the absence of any overt patterns across HIV-1 cleavage sites has complicated the delineation of why these differences result in diverse processing efficiencies. To address this question, I generated the largest-to-date dataset of globular proteins cleaved by the HIV-1 PR in near-physiological conditions. From these data, I unravel a number of site-specific processing requirements, and identify potentially important relationships shared between multiple cleavage sites. These results additionally enabled the formation of a preliminary conceptual model for explaining processing site amino acid composition. Doctor of Philosophy

    Cat Swarm based Optimization of Gene Expression Data Classification

    Abstract: An Artificial Neural Network (ANN) is capable of providing solutions to various complex problems. The generalization ability of an ANN, due to its massively parallel processing capability, can be utilized to learn the patterns discovered in a data set, which can be represented as a set of rules. These rules can be used to solve a classification problem. The learning ability of the ANN is degraded by the high dimensionality of datasets. Hence, to minimize this risk, we use Principal Component Analysis (PCA) and Factor Analysis (FA), which provide a feature-reduced dataset to the Multi Layer Perceptron (MLP), the classifier used. In addition, since the weight matrices are randomly initialized, we use the Cat Swarm Optimization (CSO) method to update the values of the weight matrix. The experimental evaluation found that using CSO with the MLP classifier provides better classification accuracy than using the classifier alone.