
    A novel graph-based method for targeted ligand-protein fitting

    A thesis submitted to the Faculty of Creative Arts, Technologies & Science, University of Bedfordshire, in partial fulfilment of the requirements for the degree of Master of Philosophy. The determination of protein binding sites and ligand-protein fitting are key to understanding the functionality of proteins, from revealing which ligand classes can bind to identifying the optimal ligand for a given protein, as in protein/drug interactions. There is a need for novel generic computational approaches for the representation of protein-ligand interactions and the subsequent prediction of hitherto unknown interactions in proteins whose ligand binding sites are experimentally uncharacterised. The TMSite algorithms read in existing PDB structural data, isolate binding site regions and identify conserved features in functionally related proteins (proteins that bind the same ligand). The Boundary Cubes method for surface representation was applied to the modified PDB file, allowing the creation of graphs for proteins and ligands that could be compared without loss of geometric data. A method is included for describing the conserved binding site features of individual ligands in terms of spatial relationships, which allowed identification of 3D motifs, named fingerprints, that could be searched for in other protein structures. This method, combined with a modification of the pocket algorithm, allows reduced search areas for graph matching. The methods allow isolation of the binding site from a complexed protein PDB file, identification of conserved features among the binding sites of individual ligand types, and search for these features in sequence data. In terms of spatial conservation, they create a fingerprint of the binding site that can be sought in other proteins of known structure, identifying putative binding sites.
The approach offers a novel and generic method for the identification of putative ligand binding sites for proteins for which there is no prior detailed structural characterisation of protein/ligand interactions. It is unique in being able to convert PDB data into graphs, ready for comparison and thus fitting of ligand to protein with consideration of chemical charge and, in the future, other chemical properties.

    Data Enrichment for Data Mining Applied to Bioinformatics and Cheminformatics Domains

    Increasingly complex problems are being addressed in the life sciences. Acquiring all the data that may be related to the problem in question is paramount. Equally important is knowing how the data are related to each other and to the problem itself. On the other hand, there are large amounts of data and information available on the Web. Researchers are already using Data Mining and Machine Learning as valuable tools in their research, although the usual procedure is to look for information based on inductive models. So far, despite the great successes already achieved using Data Mining and Machine Learning, it is not easy to integrate this vast amount of available information into the inductive process with propositional algorithms. Our main motivation is to address the problem of integrating domain information into the inductive process of propositional Data Mining and Machine Learning techniques by enriching the training data to be used in inductive logic programming (ILP) systems. Propositional Machine Learning algorithms are very dependent on data attributes. It is still hard to identify which attributes are most suitable for a particular research task. It is also hard to extract relevant information from the enormous quantity of data available. We will concentrate the available data and derive features that ILP algorithms can use to induce descriptions that solve the problems. We are creating a web platform to obtain relevant information for Bioinformatics (particularly Genomics) and Cheminformatics problems. It fetches the data from public repositories of genomic, protein and chemical data. After the data enrichment, Prolog systems use inductive logic programming to induce rules and solve specific Bioinformatics and Cheminformatics case studies.
To assess the impact of the data enrichment with ILP, we compare the results with those obtained by solving the same cases using propositional algorithms.

    Tarantulas and social spiders : a tale of sex and silk

    Studies of spider silks indicate that they may outperform virtually all synthetic fibres in terms of strength, elasticity and toughness. To date, most silks studied come from only a select few species and likely underrepresent the immense diversity of the clades. Here, protein and mRNA sequence analyses were used to study silk from two types of spider. The first approach used ESI tandem mass spectrometry to sequence peptide fragments of a silk from a tarantula (Mygalomorphae, Theraphosidae), a hitherto neglected family. The results confirm that the common silk types found in araneomorph spiders, Spidroin 1 and Spidroin 2, are also found in mygalomorphs. A putative N-terminal domain that bears a striking similarity to the N-terminus of araneomorph pyriform silk was isolated. If correctly identified, this would be the first ever recorded N-terminal domain for a mygalomorph. The second approach taken was to construct a cDNA library from theraphosid silk glands and adjacent tissue. Sequencing identified a significant number of uniquely truncated rRNAs. These may be the result of specific 'fragile sites' within these transcripts, which would explain the discrete classes of length polymorphisms found. The cDNA library sequences also provided evidence consistent with RNA editing and furthermore identified the presence of both transcribed nuclear pseudogenes and transposable elements. These may reflect past evolutionary horizontal gene transfer events within the spider genome. Similar analysis of next-generation sequencing data from the transcriptomes of three Stegodyphus spp. (Araneomorphae) reveals a range of apparent silk types with similarity to major ampullate, minor ampullate and pyriform silks. These were identified by searching for comparative sequence homologies using Microsoft Office Word. No flagelliform silk or recognisable sticky silks were identified, which is consistent with the biology of Stegodyphus species.
In addition to studies of silk, previous common conceptions of dimensional morphologies were examined to see if they could adequately sex theraphosid spiders, including the species that was the subject of the silk study already described. An independent samples t-test was conducted to compare morphologies of particular leg hairs, and statistical analysis demonstrated that there were significant differences between males and females (t(70) = 9.445, p < .001). This technique may be important in future evolutionary and ecological studies of theraphosids.

    Exploring Written Artefacts

    This collection, presented to Michael Friedrich in honour of his academic career at the Centre for the Study of Manuscript Cultures, traces key concepts that scholars associated with the Centre have developed and refined for the systematic study of manuscript cultures. At the same time, the contributions showcase the possibilities of expanding the traditional subject of ‘manuscripts’ to the larger perspective of ‘written artefacts’.

    Integrated multiple sequence alignment

    Sammeth M. Integrated multiple sequence alignment. Bielefeld (Germany): Bielefeld University; 2005. The thesis presents enhancements for automated and manual multiple sequence alignment: existing alignment algorithms are made more easily accessible and new algorithms are designed for difficult cases. Firstly, we introduce the QAlign framework, a graphical user interface for multiple sequence alignment. It comprises several state-of-the-art algorithms and exposes their parameters through convenient dialogs. An alignment viewer with guided editing functionality can also highlight or print regions of the alignment. Phylogenetic features are also provided, e.g., distance-based tree reconstruction methods, corrections for multiple substitutions and a tree viewer. The modular concept and the platform-independent implementation guarantee easy extensibility. Further, we develop a constrained version of the divide-and-conquer alignment such that it can be restricted by anchors found earlier with local alignments. It can be shown that this method shares attributes of both local and global aligners, in the quality of results as well as in the computation time. We further modify the local alignment step to work on bipartite (or even multipartite) sets for sequences where repeats overshadow valuable sequence information. In the end, a technique is established that can accurately align sequences containing eventually repeated motifs. Finally, another algorithm is presented that allows tandem repeat sequences to be compared by aligning them with respect to their possible repeat histories. We describe an evolutionary model including tandem duplications and excisions, and give an exact algorithm to compare two sequences under this model.
