12,022 research outputs found

    Identification of functionally related enzymes by learning-to-rank methods

    Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually starts from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.

    Binding Ligand Prediction for Proteins Using Partial Matching of Local Surface Patches

    Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report a new approach to structure-based function prediction that captures local surface features of ligand binding pockets. The function of a protein, specifically its binding ligands, can be predicted by finding similar local surface regions in known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with 3D Zernike descriptors. Unlike existing methods, which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and between binding pockets for flexible ligand molecules. The proposed method improves prediction results over the global pocket shape-based method previously developed by our group.
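
    The patch-matching idea in this abstract can be illustrated with a minimal sketch. It assumes each surface patch is already encoded as a short descriptor vector (a stand-in for a 3D Zernike descriptor) and uses Euclidean distance as the edge weight. The function name `match_patches` and the toy vectors are hypothetical, and the brute-force search replaces whatever matching algorithm the authors actually use, so it is only practical for a handful of patches.

```python
from itertools import permutations
from math import dist


def match_patches(query, target):
    """Minimum-cost one-to-one matching of query patches to target patches.

    query, target: lists of descriptor vectors (stand-ins for 3D Zernike
    descriptors; the values used below are invented for illustration).
    Requires len(query) <= len(target), so the query pocket may match only
    part of the target pocket (partial matching).
    """
    if len(query) > len(target):
        raise ValueError("query must not have more patches than target")
    best_cost, best_pairs = float("inf"), []
    # Brute force: try every injective assignment of query -> target patches.
    for perm in permutations(range(len(target)), len(query)):
        cost = sum(dist(query[i], target[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_cost, best_pairs


# Toy descriptors: two query patches, three target patches.
query = [(0.0, 1.0), (1.0, 0.0)]
target = [(1.0, 0.1), (5.0, 5.0), (0.1, 1.0)]
cost, pairs = match_patches(query, target)
print(pairs, round(cost, 3))
```

    A lower total cost suggests more similar pockets. For realistic patch counts a polynomial-time assignment solver would be the natural replacement for the permutation loop.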

    Bridging the synaptic gap: neuroligins and neurexin I in Apis mellifera

    Vertebrate studies show neuroligins and neurexins are binding partners in a trans-synaptic cell adhesion complex, implicated in human autism and mental retardation disorders. Here we report a genetic analysis of homologous proteins in the honey bee. As in humans, the honeybee has five large (31-246 kb, up to 12 exons each) neuroligin genes, three of which are tightly clustered. RNA analysis of the neuroligin-3 gene reveals five alternatively spliced transcripts, generated through alternative use of exons encoding the cholinesterase-like domain. Whereas vertebrates have three neurexins, the bee has just one gene, named neurexin I (400 kb, 28 exons). However, alternative isoforms of bee neurexin I are generated by differential use of 12 splice sites, mostly located in regions encoding LNS subdomains. Some of the splice variants of bee neurexin I resemble the vertebrate alpha- and beta-neurexins, although in vertebrates these forms are generated by alternative promoters. Novel splicing variations in the 3' region generate transcripts encoding alternative trans-membrane and PDZ domains. Another 3' splicing variation predicts soluble neurexin I isoforms. Neurexin I and neuroligin expression was found in brain tissue, with expression present throughout development, and in most cases significantly up-regulated in adults. Transcripts of neurexin I and one neuroligin tested were abundant in mushroom bodies, a higher order processing centre in the bee brain. We show neuroligins and neurexins comprise a highly conserved molecular system with likely similar functional roles in insects as in vertebrates, and with scope in the honeybee to generate substantial functional diversity through alternative splicing. Our study provides important prerequisite data for using the bee as a model for vertebrate synaptic development. Australian National University PhD Scholarship Award to Sunita Biswas.

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between ‘drug like’ molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed, competing with the state-of-the-art over benchmark datasets. The models have the ability to predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together, and with existing tools, the contributions made will aid in the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster.

    Interaction site prediction by structural similarity to neighboring clusters in protein-protein interaction networks

    Background: Revealing the function of proteins using protein-protein interaction (PPI) networks has recently come to be regarded as one of the important issues in bioinformatics. With the development of experimental methods such as the yeast two-hybrid method, the amount of protein interaction data has increased enormously. Many databases that handle these data comprehensively have been constructed and applied to the analysis of PPI networks. However, little research has explored the prediction of interaction sites using PPI networks and 3D protein structures in a complementary manner. Results: We propose a method for predicting interaction sites in proteins of unknown function by using both PPI networks and protein structures. For a target protein of unknown function, several clusters are extracted from the neighboring proteins based on their structural similarity. Interaction sites are then predicted by extracting similar sites from the group formed by a protein cluster and the target protein. Moreover, the proposed method can improve prediction accuracy by introducing a repetitive prediction process. Conclusions: The proposed method was applied to a small-scale dataset, and its effectiveness was confirmed. The challenge will now be to apply the method to large-scale datasets.

    Efficient search and comparison algorithms for 3D protein binding site retrieval and structure alignment from large-scale databases

    Finding similar 3D structures is crucial for discovering potential structural, evolutionary, and functional relationships among proteins. As the number of known protein structures has dramatically increased, traditional methods can no longer provide the life science community with the adequate informatics capability needed to conduct large-scale and complex analyses. A suite of high-throughput and accurate protein structure search and comparison methods is essential. To meet the needs of the community, we developed several bioinformatics methods for protein binding site comparison and global structure alignment. First, we developed an efficient protein binding site search that is based on extracting geometric features both locally and globally. The main idea of this work was to capture spatial relationships among landmarks of binding site surfaces and build a vocabulary of visual words to represent the characteristics of the surfaces. A vector model was then used to speed up the search for similar surfaces that share similar visual words with the query interface. Second, we developed an approach for accurate protein binding site comparison. Our algorithm provides an accurate binding site alignment by applying a two-level heuristic process which progressively refines alignment results from the coarse surface point level to the accurate residue atom level. This setting allowed us to explore different combinations of pairs of corresponding residues, thus improving the alignment quality of the binding site surfaces. Finally, we introduced a parallel algorithm for global protein structure alignment. Specifically, to speed up the time-consuming alignment of protein 3D structures, we designed a parallel protein structure alignment framework to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, the framework is capable of parallelizing traditional structure alignment algorithms.
Our findings can be applied in various research areas, such as prediction of protein inte

    A novel graph-based method for targeted ligand-protein fitting

    A thesis submitted to the Faculty of Creative Arts, Technologies & Science, University of Bedfordshire, in partial fulfilment of the requirements for the degree of Master of Philosophy. The determination of protein binding sites and ligand-protein fitting are key to understanding the functionality of proteins, from revealing which ligand classes can bind to finding the optimal ligand for a given protein, as in protein/drug interactions. There is a need for novel generic computational approaches for the representation of protein-ligand interactions and the subsequent prediction of hitherto unknown interactions in proteins where the ligand binding sites are experimentally uncharacterised. The TMSite algorithms read in existing PDB structural data, isolate binding site regions and identify conserved features in functionally related proteins (proteins that bind the same ligand). The Boundary Cubes method for surface representation was applied to the modified PDB file, allowing the creation of graphs for proteins and ligands that could be compared and caused no loss of geometric data. A method for describing binding site features of individual ligands that are conserved in terms of spatial relationships allowed identification of 3D motifs, named fingerprints, which could be searched for in other protein structures. This method, combined with a modification of the pocket algorithm, allows reduced search areas for graph matching. The methods allow isolation of the binding site from a complexed protein PDB file, identification of conserved features among the binding sites of individual ligand types, and search for these features in sequence data. In terms of spatial conservation, they create a fingerprint of the binding site that can be sought in other proteins of known structure, identifying putative binding sites.
    The approach offers a novel and generic method for the identification of putative ligand binding sites for proteins for which there is no prior detailed structural characterisation of protein/ligand interactions. It is unique in being able to convert PDB data into graphs, ready for comparison and thus fitting of ligand to protein with consideration of chemical charge and, in the future, other chemical properties.

    IDSS: deformation invariant signatures for molecular shape comparison

    Background: Many molecules of interest are flexible and undergo significant shape deformation as part of their function, but most existing methods of molecular shape comparison (MSC) treat them as rigid bodies, which may lead to an incorrect measure of the shape similarity of flexible molecules. Results: To address the issue we introduce a new shape descriptor, called Inner Distance Shape Signature (IDSS), for describing the 3D shapes of flexible molecules. The inner distance is defined as the length of the shortest path between landmark points within the molecular shape, and it reflects the molecular structure and deformation well without explicit decomposition. Our IDSS is stored as a histogram, which is a probability distribution of inner distances between all sample point pairs on the molecular surface. We show that IDSS is insensitive to shape deformation of flexible molecules and more effective at capturing molecular structures than traditional shape descriptors. Our approach reduces the 3D shape comparison problem of flexible molecules to the comparison of IDSS histograms. Conclusion: The proposed algorithm is robust and does not require any prior knowledge of the flexible regions. We demonstrate the effectiveness of IDSS within a molecular search engine application for a benchmark containing abundant conformational changes of molecules. Several thousand such comparisons can be carried out per second. The presented IDSS method can be considered an alternative and complementary tool to the existing methods for rigid MSC. The binary executable program for the Windows platform and the database are available from https://engineering.purdue.edu/PRECISE/IDSS.
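
    The reduction of shape comparison to histogram comparison can be sketched roughly as follows. This is not the authors' implementation: it approximates the inner distance by shortest paths over a neighbourhood graph that links sample points closer than an assumed `radius`, and the names `inner_distances`, `signature` and `l1_distance`, the bin count and the toy 2D point sets are all hypothetical.

```python
from math import dist, inf


def inner_distances(points, radius):
    """All-pairs shortest-path lengths over a neighbourhood graph.

    Points closer than `radius` are linked directly; longer distances must
    follow chains of links, which approximates paths inside the shape.
    """
    n = len(points)
    d = [[inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
        for j in range(i + 1, n):
            e = dist(points[i], points[j])
            if e <= radius:
                d[i][j] = d[j][i] = e
    for k in range(n):  # Floyd-Warshall relaxation
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def signature(points, radius, bins, max_dist):
    """Normalised histogram of inner distances over all connected pairs."""
    d = inner_distances(points, radius)
    hist, pairs = [0] * bins, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if d[i][j] < inf:
                hist[min(int(d[i][j] / max_dist * bins), bins - 1)] += 1
                pairs += 1
    return [h / pairs for h in hist] if pairs else [0.0] * bins


def l1_distance(a, b):
    """L1 distance between two histograms of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


# A straight chain and the same chain bent by 90 degrees (toy 2D data).
straight = [(float(i), 0.0) for i in range(5)]
bent = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
sig_straight = signature(straight, radius=1.1, bins=4, max_dist=4.0)
sig_bent = signature(bent, radius=1.1, bins=4, max_dist=4.0)
print(l1_distance(sig_straight, sig_bent))
```

    In this toy case the two signatures coincide even though the Euclidean end-to-end distances differ, which is the kind of deformation insensitivity the abstract describes.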

    Prediction of functionally important residues in globular proteins from unusual central distances of amino acids

    Background: Well-performing automated protein function recognition approaches usually comprise several complementary techniques. Besides constructing a better consensus, their predictive power can be improved by either adding or refining independent modules that explore orthogonal features of proteins. In this work, we demonstrated how the exploration of global atomic distributions can be used to indicate functionally important residues. Results: Using a set of carefully selected globular proteins, we parametrized continuous probability density functions describing preferred central distances of individual protein atoms. Relative preferred burials were estimated using mixture models of radial density functions dependent on the amino acid composition of the protein under consideration. The unexpectedness of extraordinary locations of atoms was evaluated in an information-theoretic manner and used directly for the identification of key amino acids. In the validation study, we tested the capabilities of a tool built upon our approach, called SurpResi, by searching for binding sites interacting with ligands. The tool indicated multiple candidate sites, achieving success rates comparable to several geometric methods. We also showed that unexpectedness is a property of regions involved in protein-protein interactions, and thus can be used for the ranking of protein docking predictions. The computational approach implemented in this work is freely available via a Web interface at http://www.bioinformatics.org/surpresi. Conclusions: Probabilistic analysis of atomic central distances in globular proteins is capable of capturing distinct orientational preferences of amino acids resulting from the different sizes, charges and hydrophobic characters of their side chains. When idealized spatial preferences can be inferred from the amino acid composition of a protein alone, residues located in hydrophobically unfavorable environments can be easily detected. Such residues often turn out to be directly involved in binding ligands or interfacing with other proteins.
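
    One way to read "unexpectedness" in an information-theoretic sense is as surprisal, the negative log of the density a model assigns to an observed central distance. The sketch below is a simplification under that assumption: it uses a single Gaussian per residue class instead of the mixture models described in the abstract, and the function `surprisal_scores`, the class labels and the `preferred` parameters are invented for illustration.

```python
from math import dist, log, pi, sqrt


def surprisal_scores(coords, kinds, preferred):
    """Surprisal (in nats) of each residue's distance from the centroid.

    coords: list of (x, y, z) positions, one per residue.
    kinds: residue class label for each position.
    preferred: dict mapping class -> (mean, sd) of its preferred central
        distance; the values used below are hypothetical, not fitted.
    """
    n = len(coords)
    centre = tuple(sum(c[k] for c in coords) / n for k in range(3))
    scores = []
    for c, kind in zip(coords, kinds):
        mean, sd = preferred[kind]
        z = (dist(c, centre) - mean) / sd
        # -log of the Gaussian density, computed directly to avoid underflow.
        scores.append(0.5 * z * z + log(sd * sqrt(2 * pi)))
    return scores


# Toy protein: four buried hydrophobic residues, one exposed polar residue
# and one hydrophobic residue sitting unusually far from the centre.
coords = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 6), (0, 0, -6)]
kinds = ["hydrophobic"] * 4 + ["polar", "hydrophobic"]
preferred = {"hydrophobic": (1.0, 1.0), "polar": (6.0, 1.0)}
scores = surprisal_scores(coords, kinds, preferred)
most_surprising = scores.index(max(scores))
print(most_surprising)
```

    Residues with the highest scores sit where the model least expects them, which is the kind of candidate the abstract links to ligand binding and protein interfaces.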

    Hermetia illucens (L.) (Diptera: Stratiomyidae) Odorant Binding Proteins and Their Interactions with Selected Volatile Organic Compounds: An In Silico Approach

    The black soldier fly (BSF), Hermetia illucens (Diptera: Stratiomyidae), has attracted considerable global interest due to its outstanding capacity for bioconverting organic waste into insect biomass, which can be used for livestock, poultry, and aquaculture feed. Mass production of this insect in colonies requires the development of methods for concentrating oviposition in specific collection devices, while the mass production of larvae and the disposal of waste may require substrates that are more palatable and more attractive to the insects. In insects, chemoreception plays an essential role throughout the life cycle, responding to an array of chemical, biological and environmental signals to locate and select food, mates and oviposition sites, and to avoid predators. To interpret these signals, insects use an arsenal of molecular components, including small proteins called odorant binding proteins (OBPs). Next generation sequencing was used to identify genes involved in chemoreception during the larval and adult stages of BSF, with particular attention to OBPs. The analysis of the de novo adult and larval transcriptomes led to the identification of 27 and 31 OBPs for adults and larvae, respectively. Among these OBPs, 15 were common to the larval and adult transcriptomes, and the tertiary structures of 8 selected OBPs were modelled. In silico docking of ligands confirms the potential interaction with volatile organic compounds (VOCs) of interest. Starting from information about the growth performance of H. illucens on different organic substrates from the agri-food sector, the present work demonstrates a possible correlation between a pool of selected VOCs, emitted by those substrates that are attractive to H. illucens females searching for oviposition sites, and phagostimulants for larvae. The binding affinities between OBPs and selected ligands calculated by in silico modelling may indicate a correlation among OBPs, VOCs and behavioural preferences that will be the basis for further analysis.