8 research outputs found

    Deep-STP: a deep learning-based approach to predict snake toxin proteins by using word embeddings

    Snake venom contains many toxic proteins that can destroy the circulatory system or nervous system of prey. Studies have found that these snake venom proteins have the potential to treat cardiovascular and nervous system diseases. Therefore, the study of snake venom proteins is conducive to the development of related drugs. Research techniques based on traditional biochemistry can accurately identify these proteins, but the experiments are costly and time-consuming. Artificial intelligence provides a new computational strategy for large-scale screening of snake venom proteins. In this paper, we developed a sequence-based computational method to recognize snake toxin proteins. Specifically, we utilized three different feature descriptors, namely g-gap, natural vector and word2vec, to encode snake toxin protein sequences. Analysis of variance (ANOVA) and the gradient boosting decision tree (GBDT) algorithm, combined with incremental feature selection (IFS), were used to optimize the features, and the optimized features were then fed into the deep learning model for training. The results show that our model achieves an accuracy of 82.00% in 10-fold cross-validation. The model was further verified on independent data, where the accuracy reaches 81.14%, which demonstrates that our model has excellent prediction performance and robustness.
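The g-gap descriptor named in the abstract above can be illustrated with a minimal sketch. It assumes the common definition of g-gap dipeptide composition (the frequency of residue pairs separated by g positions); the abstract does not spell out the exact encoding, so the function name, the alphabet constant and the normalisation are illustrative assumptions rather than the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids (assumed alphabet; other symbols are skipped).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(sequence, gap=0):
    """Return a 400-dimensional vector of g-gap dipeptide frequencies.

    A pair is formed by residue i and residue i + gap + 1, so gap=0 gives
    the ordinary (adjacent) dipeptide composition.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    step = gap + 1
    total = 0
    for i in range(len(sequence) - step):
        pair = sequence[i] + sequence[i + step]
        if pair in counts:  # ignore pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(pairs)
    # Normalise counts so the vector sums to 1.
    return [counts[p] / total for p in pairs]


features = g_gap_dipeptide_composition("ACAC", gap=1)
```

With `gap=1` the sequence `ACAC` yields the pairs `AA` and `CC`, each contributing half of the total frequency. Vectors for several gap values could be concatenated before feature selection, although whether the paper does so is not stated in the abstract.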

    Predicting Gene Ontology Function of Human MicroRNAs by Integrating Multiple Networks

    MicroRNAs (miRNAs) have been demonstrated to play significant biological roles in many human biological processes. Inferring the functions of miRNAs is an important strategy for understanding disease pathogenesis at the molecular level. In this paper, we propose an integrated model, PmiRGO, to infer the gene ontology (GO) functions of miRNAs by integrating multiple data sources, including the expression profiles of miRNAs, miRNA-target interactions, and protein-protein interactions (PPI). PmiRGO starts by building a global network consisting of three networks. Then, it employs DeepWalk to learn latent representations as network features of the global heterogeneous network. Finally, the SVM-based models are applied to label the GO terms of miRNAs. The experimental results show that PmiRGO has a significantly better performance than existing state-of-the-art methods in terms of Fmax. A case study further demonstrates the feasibility of PmiRGO to annotate the potential functions of miRNAs

    Predicting Ion Channels Genes and Their Types With Machine Learning Techniques

    Motivation: The number of ion channels is increasing rapidly. As many of them are associated with diseases, they are the targets of more than 700 drugs. The discovery of new ion channels is facilitated by computational methods that predict ion channels and their types from protein sequences. Methods: We used the SVMProt and the k-skip-n-gram methods to extract the feature vectors of ion channels, and obtained 188- and 400-dimensional features, respectively. The 188- and 400-dimensional features were combined to obtain 588-dimensional features. We then employed the maximum-relevance-maximum-distance method to reduce the dimensions of the 588-dimensional features. Finally, the support vector machine and random forest methods were used to build the prediction models to evaluate the classification effect. Results: Different methods were employed to extract various feature vectors, and after effective dimensionality reduction, different classifiers were used to classify the ion channels. We extracted the ion channel data from the Universal Protein Resource (UniProt, http://www.uniprot.org/) and Ligand-Gated Ion Channel databases (http://www.ebi.ac.uk/compneur-srv/LGICdb/LGICdb.php), and then verified the performance of the classifiers after screening. The findings of this study could inform the research and development of drugs.

    Gradient Boosting Decision Tree-Based Method for Predicting Interactions Between Target Genes and Drugs

    Determining the target genes that interact with drugs—drug–target interactions—plays an important role in drug discovery. Identification of drug–target interactions through biological experiments is time consuming, laborious, and costly. Therefore, using computational approaches to predict candidate targets is a good way to reduce the cost of wet-lab experiments. However, the known interactions (positive samples) and the unknown interactions (negative samples) display a serious class imbalance, which has an adverse effect on the accuracy of the prediction results. To mitigate the impact of class imbalance and completely exploit the negative samples, we proposed a new method, named DTIGBDT, based on gradient boosting decision trees, for predicting candidate drug–target interactions. We constructed a drug–target heterogeneous network that contains the drug similarities based on the chemical structures of drugs, the target similarities based on target sequences, and the known drug–target interactions. The topological information of the network was captured by random walks to update the similarities between drugs or targets. The paths between drugs and targets could be divided into multiple categories, and the features of each category of paths were extracted. We constructed a prediction model based on gradient boosting decision trees. The model establishes multiple decision trees with the extracted features and obtains the interaction scores between drugs and targets. DTIGBDT is a method of ensemble learning, and it effectively reduces the impact of class imbalance. The experimental results indicate that DTIGBDT outperforms several state-of-the-art methods for drug–target interaction prediction. In addition, case studies on Quetiapine, Clozapine, Olanzapine, Aripiprazole, and Ziprasidone demonstrate the ability of DTIGBDT to discover potential drug–target interactions
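The abstract above says that random walks capture the network topology and update the similarities, but it does not give the update rule. As a hedged illustration only, the sketch below uses random walk with restart, a widely used variant, on a small similarity matrix. The function name, the restart probability and the toy matrix are assumptions and are not taken from DTIGBDT.

```python
def random_walk_with_restart(similarity, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk started at `seed`.

    `similarity` is a square list-of-lists of non-negative weights.
    """
    n = len(similarity)
    # Row-normalise the similarity matrix into transition probabilities.
    transition = []
    for row in similarity:
        row_sum = sum(row)
        transition.append([v / row_sum if row_sum else 0.0 for v in row])

    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        # One step: follow an edge with probability (1 - restart),
        # or jump back to the seed node with probability `restart`.
        nxt = [
            (1 - restart) * sum(p[i] * transition[i][j] for i in range(n))
            + restart * start[j]
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy chain network 0 - 1 - 2 (hypothetical data, for illustration only).
toy_similarity = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
scores = random_walk_with_restart(toy_similarity, seed=0)
```

Nodes closer to the seed receive higher scores, so each row of scores can be read as a topology-aware similarity profile for the seed node.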

    Identification of Phage Viral Proteins With Hybrid Sequence Features

    The uniqueness of bacteriophages plays an important role in bioinformatics research. In real applications, the function of the bacteriophage virion proteins is the main area of interest. Therefore, it is very important to classify bacteriophage virion proteins and non-phage virion proteins accurately. Extracting comprehensive and effective sequence features from proteins plays a vital role in protein classification. To represent protein information more fully, this paper combines the features extracted by a feature representation algorithm based on sequence information (CCPA) with those extracted by a feature representation algorithm based on sequence and structure information. After extracting features, the Max-Relevance-Max-Distance (MRMD) algorithm is used to select the optimal feature set with the strongest correlation between features and class labels and low redundancy between features. Given the randomness of the samples selected by the random forest classification algorithm and of the features used to split each node, a random forest method is employed to perform 10-fold cross-validation on the bacteriophage protein classification. The accuracy of this model is as high as 93.5% in the classification of phage proteins in this study. This study also found that, among the eight physicochemical properties considered, the charge property has the greatest impact on the classification of bacteriophage proteins. These results indicate that the model discussed in this paper is an important tool in bacteriophage protein research.

    DNA replication in growth conditions that mimic the natural habitat of Haloferax volcanii

    The initial aim of the project was to assess origin-independent replication in Haloferax volcanii (Hfx. volcanii). DNA replication is initiated at specific sites on the chromosome called origins. Origins are assumed to be an essential feature of all cells, because they serve as binding sites for proteins that recruit the DNA replication machinery. In work published by Hawkins et al. (2013), it was demonstrated that mutants of Hfx. volcanii lacking all replication origins are viable; in fact, they grow faster than the wild-type and have no obvious cellular defects. By contrast, deletion of origins from Eukaryotes and Bacteria leads to cell death or profound growth defects. The question addressed in this project was whether the accelerated growth of Hfx. volcanii cells in the absence of replication origins is due to an artefact created by rich laboratory media conditions. This may explain why replication origins have not been eliminated by natural selection, as in the natural habitat of Hfx. volcanii, the wild-type strain would have an evolutionary advantage. To test this, a growth competition assay was modified to use fluorescent proteins and flow cytometry. It was predicted that in low nutrient media, the growth advantage of origin-deleted mutants would be minimised or eliminated, as these phenotypes are not witnessed in a natural environment. However, due to the outbreak of the COVID-19 pandemic, the project was altered to examine which factors are required for an organism to replicate without origins. A bioinformatic approach was chosen, adapting previously created tools to better fit a large data set and to predict the ability of 85 species to survive without origins. 
The bioinformatic pipeline involved a principal component analysis, which took into account, for any given species, their respective nucleotide skew indices, spectral ratios, information gene linkage, co-orientation of core genes with DNA replication, and types of DNA polymerase genes located near origins. The results suggested several new candidate species for further experimentation and potential directions for improvement of the origin-independent replication prediction tool.
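One of the pipeline inputs listed above is a nucleotide skew index. The abstract does not say which skew measure was used, so the following is only a minimal sketch of the standard windowed GC skew, (G - C) / (G + C), and its cumulative sum. The function names and the window size are illustrative assumptions.

```python
from itertools import accumulate


def gc_skew(sequence, window=1000, step=None):
    """Return (G - C) / (G + C) for each window along the sequence.

    Windows without any G or C are assigned a skew of 0.0.
    """
    step = step or window
    seq = sequence.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew(skews):
    """Running sum of window skews; extrema often mark origin and terminus."""
    return list(accumulate(skews))


# Tiny hypothetical sequence, for illustration only.
window_skews = gc_skew("GGGGCCCC", window=4)
running = cumulative_skew(window_skews)
```

In genomes with a single dominant origin, the cumulative skew curve typically shows a clear minimum and maximum, whereas a flat or noisy curve is one possible sign of weak strand asymmetry.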

    Bacterial DNA replication initiation: structural and functional analysis of the master initiator DnaA

    PhD Thesis. The essential process of DNA replication begins with initiator proteins binding to origins of replication and triggering DNA synthesis. The highly conserved bacterial master initiator protein, DnaA, performs several key activities at the bacterial origin (oriC) to initiate replication. DnaA binds specifically to oriC and assembles into a filament that engages and stretches a single DNA strand to induce duplex unwinding. Subsequently, DnaA recruits a loading complex that deposits the replicative helicases around single DNA strands. In this thesis I have investigated the molecular mechanisms underpinning some of the essential activities of DnaA in the model organism Bacillus subtilis. Using a chimeric DnaA system I was able to identify several activities required for origin unwinding by DnaA bound to a specific DnaA-box located upstream of the site of unwinding. This result suggested that the protein binding here is directly involved in unwinding the DNA duplex, and the likely role of the upstream region is to increase the local DnaA concentration at the site of unwinding. To unwind oriC, DnaA engages and stretches a specific DNA strand with a recently identified repeating tri-nucleotide motif, termed the DnaA-trios, providing the specific sequence. Utilising an inducible heterologous replication initiation system I determined which DnaA residues from a region implicated in ssDNA binding were essential in vivo. Using recombinant DnaA protein variants, two isoleucine residues were determined to be required for forming filaments on ssDNA and unwinding the DNA duplex in vitro. Further work is required to determine if these residues are required for the specific interaction with DnaA-trios or more generally for DNA binding/unwinding. A range of essential residues required for the interaction between DnaA and the Firmicute-specific initiation accessory protein DnaD, the first step in helicase recruitment, were identified. 
The DnaA residues overlap with a binding site for the developmental regulator, SirA, a developmentally expressed inhibitor of DNA replication initiation. This suggested that SirA functions by blocking the interaction between DnaA and DnaD, preventing helicase loading. I found that SirA inhibits the interaction of DnaA with DnaD, providing a molecular mechanism for this SirA activity and revealing, for the first time, an endogenous system for regulating helicase recruitment in bacteria.