
229 research outputs found

GRIFFIN: a system for predicting GPCR–G-protein coupling selectivity using a support vector machine and a hidden Markov model

Author: Hirokawa Takatsugu
Mukai Hidehito
Muramatsu Takahiko
Suwa Makiko
Yabuki Yukimitsu
Publication venue: Oxford University Press
Publication date: 27/06/2005

We describe a novel system, GRIFFIN (G-protein and Receptor Interaction Feature Finding INstrument), that predicts G-protein coupled receptor (GPCR) and G-protein coupling selectivity with high sensitivity and specificity, based on a support vector machine (SVM) and a hidden Markov model (HMM). Based on our assumption that whole structural segments of ligands, GPCRs and G-proteins are essential to determine GPCR and G-protein coupling, various quantitative features were selected for ligands, GPCRs and G-protein complex structures, and the parameters most effective in selecting G-protein type were used as feature vectors in the SVM. The main part of GRIFFIN is a hierarchical SVM classifier using these feature vectors, which is useful for Class A GPCRs, the major family. For the opsin and olfactory subfamilies of Class A and for the other minor families (Classes B, C, frizzled and smoothened), the binding G-protein is predicted with high accuracy using the HMM. When this system is applied to known GPCR sequences, each binding G-protein is predicted with high sensitivity and specificity (>85% on average). GRIFFIN is freely available and allows users to easily run this reliable prediction of G-proteins.

Crossref

PubMed Central

Modeling CNS receptor binding profiles of small molecules

Author: Ferreira Vânia Alexandra Conceição
Publication date: 01/01/2015

Master's thesis, Bioinformatics and Computational Biology (Computational Biology), Universidade de Lisboa, Faculdade de Ciências, 2015. Identifying new active compounds that can be used to treat disease is the main concern of the pharmaceutical industry, which focuses on finding compounds with highly specific action in order to avoid side effects. This process is not always easy, however, because many molecules have been shown to target more than one receptor. These promiscuous molecules, by binding to different receptors, can give rise to unexpected effects. This problem is known as polypharmacology, and many studies have addressed it. In the first part of this work, we tried to relate the binding profiles of molecules across different receptors to the similarity between the protein sequences of those receptors. No constant pattern was found: in most cases, molecules show different binding profiles even for very similar receptors. This result showed that polypharmacology is indeed a complex problem and that different types of information are needed to predict binding profiles and avoid unwanted side effects. Predicting all the effects of a molecule requires prior knowledge of its interactions with receptors, including the types of bonds involved and their strengths. One way to obtain this knowledge is through laboratory experiments, but these are very expensive and time-consuming.
A more accessible way to address this question has been to build computational models that predict possible interactions between molecules and receptors, in order to identify target molecules for experimental assays and thereby increase the probability of success. Many of these computational models are based on machine learning methods, which are very common in computer science. These methods learn from the known characteristics of entities to build a model capable of classifying new entities. Their success has been demonstrated in several bioinformatics contexts, and they are a promising option for predicting interactions between molecules and receptors. This work set out to use a machine learning approach to develop a model for predicting molecule-receptor interactions, based on the structural similarities between molecules and their known activity levels for serotonin and dopamine receptors. These two receptor families are of interest because they belong to the G protein-coupled receptor superfamily, one of the most important in the Central Nervous System. In addition, serotonin and dopamine receptors are known to be involved in neurological disorders such as Parkinson's disease and Attention Deficit Hyperactivity Disorder. Hence the need to identify, for these receptors, candidate molecules that can serve as starting points for developing new drugs to treat some of these neurological disorders. The machine learning technique chosen was a Naive Bayes classifier, a supervised learning method based on Bayes' Theorem that assumes independence between the features describing an entity.
Structural similarity between molecules was computed with NAMS (Non-contiguous Atom Matching Structural Similarity), a method that identifies the optimal alignment between the atoms of two molecules, taking into account not only their topological profiles but also the atoms themselves and the characteristics of the bonds between them. For this work, data were collected from ChEMBL on molecules with known binding to serotonin and dopamine receptors. The bioactivity values of each molecule for each receptor were also collected in the form of Ki values, the inhibition constants that quantify the strength of interaction between the molecules and the receptors under study. Three models for predicting molecule-receptor interactions were built. They used, respectively, structural similarities between molecules together with their bioactivity levels, the binding profiles of molecules across different receptors, and a combination of all of this information. The first prediction model was built using only the structural similarities between molecules and their activity levels. For each receptor, kernel molecules were identified, that is, highly active molecules structurally distinct from the rest, against which test molecules are compared. The binding probabilities for each receptor are then calculated from the structural similarity of a test molecule to each kernel molecule. Although this model showed promising results during validation, a high false-negative rate showed that it is a conservative model that should be applied when more precise results are wanted.
The second model was built to test whether information on how a molecule binds to other receptors is relevant for predicting its interaction with new receptors. Only molecules shared between receptors, and their bioactivity levels, were considered. With this information, two databases were built containing the probabilities used when calculating the interaction probabilities between test molecules and receptors. During validation, this model performed better than the first. However, this was attributed to an over-representation of active molecules in the collected data. Nevertheless, so as not to discard the information from other receptors, the two models were integrated to build the third model. The third model, which integrates structural similarities between molecules, their bioactivity levels and information on other receptors, gave the best results and reached the highest accuracy. It was also the model with the best balance between the proportions of false positives and false negatives. Consequently, this model proved to be the best option for identifying potential interactions between a set of molecules and serotonin and dopamine receptors. In an attempt to improve the performance of the proposed models, we tried to identify, for each receptor, a more precise probability threshold above which a molecule should be classified as active. However, although it increased the specificity and precision of the proposed models, this adjustment did not lead to better performance. Taken together, the results showed that the Naive Bayes classifier can be used to build models for predicting interactions between molecules and receptors.
The NAMS tool also performed well in the structural comparison of molecules, as was evident from the results obtained during model validation. In addition, using structural similarity between molecules together with their bioactivity levels proved to be a promising approach for identifying candidate molecules for experimental validation. Overall, we found that integrating different types of information remains the best alternative for predicting binding profiles between molecules and receptors. We also confirmed, once again, that machine learning methods are an efficient and inexpensive way to select new candidate compounds for in vitro validation.
The pharmaceutical industry has focused on finding highly selective single-target drugs. However, various studies have shown that this is not always possible, since many molecules can bind to more than one receptor. These molecules are described as promiscuous compounds, and their polypharmacological behavior has been the subject of many studies. In the first part of our work, we investigated the relationship between the binding profiles of molecules and the sequence similarity of their target receptors. We found different patterns but no evident relationship, since many molecules present different binding patterns for different receptors, even when the receptors are very closely related. These results show the level of complexity inherent to pharmacology and the importance of finding new methods to predict the binding profiles of molecules. When binding to different receptors, a drug can lead to unpredictable side effects, which is a limitation in disease treatment. To avoid side effects, it is important to know the binding profiles of molecules. For this purpose, different approaches have been developed to predict interactions between molecules and receptors.
Many of these approaches rely on machine learning techniques to predict drug-target interactions. These techniques have been widely used in computer science and have already shown their value in bioinformatics. In this work, we used a machine learning method to predict interactions between molecules and serotonin and dopamine receptors, two of the most important receptor families in the Central Nervous System. To construct our model, we used the Naïve Bayes classifier, a supervised learning method that applies Bayes' Theorem under the assumption of conditional independence between features. We developed three models, based respectively on co-activity data between receptors, on molecular similarity, and on a combination of the two. Although all three models showed promising results, the model integrating all the data performed best. Our results demonstrate that Naïve Bayes is an efficient method for predicting drug-target interactions. Moreover, structural similarity between compounds, together with their bioactivity levels, was shown to be a promising basis for identifying candidate molecules for further in vitro validation.
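The Naïve Bayes approach described in this abstract can be illustrated with a minimal sketch. It is not the thesis's model: the three binary features (similarity to two hypothetical kernel molecules and activity at a related receptor), the toy data and the function names are all assumptions made for illustration. The sketch only shows how Bayes' Theorem combines per-feature likelihoods under the conditional independence assumption.

```python
import math

def train_naive_bayes(samples, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary feature vectors.

    Returns {class: (prior, [P(feature_j = 1 | class), ...])}.
    alpha is the Laplace smoothing constant (an assumed default).
    """
    n_features = len(samples[0])
    model = {}
    for c in sorted(set(labels)):
        rows = [x for x, y in zip(samples, labels) if y == c]
        prior = len(rows) / len(samples)
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[c] = (prior, probs)
    return model

def predict_proba(model, x):
    """Return the posterior probability of each class for feature vector x."""
    log_scores = {}
    for c, (prior, probs) in model.items():
        score = math.log(prior)
        # Conditional independence: per-feature log-likelihoods simply add up.
        for xj, pj in zip(x, probs):
            score += math.log(pj if xj else 1.0 - pj)
        log_scores[c] = score
    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}

# Hypothetical toy data. Columns: similar to kernel molecule 1,
# similar to kernel molecule 2, active at a related receptor.
X = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
y = ["active", "active", "active", "inactive", "inactive", "inactive"]

model = train_naive_bayes(X, y)
posterior = predict_proba(model, [1, 1, 1])
print(posterior)
```

Working in log space avoids numerical underflow when many features are multiplied together, which matters once the feature vector grows beyond a toy example.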

Universidade de Lisboa: Repositório.UL

A database for G proteins and their interaction with GPCRs

Author: Bagos Pantelis G
Elefsinioti Antigoni L
Hamodrakas Stavros J
Spyropoulos Ioannis C
Publication venue: BioMed Central
Publication date: 01/12/2004

BACKGROUND: G protein-coupled receptors (GPCRs) transduce signals from extracellular space into the cell, through their interaction with G proteins, which act as switches forming hetero-trimers composed of different subunits (α,β,γ). The α subunit of the G protein is responsible for the recognition of a given GPCR. Whereas specialised resources for GPCRs, and other groups of receptors, are already available, there is currently no publicly available database focusing on G proteins and containing information about their coupling specificity with their respective receptors. DESCRIPTION: gpDB is a publicly accessible G proteins/GPCRs relational database. Including species homologs, the database contains detailed information for 418 G protein monomers (272 Gα, 87 Gβ and 59 Gγ) and 2782 GPCR sequences belonging to families with known coupling to G proteins. The GPCRs and the G proteins are classified according to a hierarchy of different classes, families and sub-families, based on extensive literature searches. The main innovation, besides the classification of both G proteins and GPCRs, is the relational model of the database, which describes the known coupling specificity of the GPCRs to their respective α subunit of G proteins, a unique feature not available in any other database. There is full sequence information with cross-references to publicly available databases and references to the literature concerning the coupling specificity and the dimerization of GPCRs, and the user may submit advanced queries for text search. Furthermore, we provide a pattern search tool, an interface for running BLAST against the database and interconnectivity with PRED-TMR, PRED-GPCR and TMRPres2D. CONCLUSIONS: The database will be very useful, for both experimentalists and bioinformaticians, for the study of G protein/GPCR interactions and for future development of predictive algorithms. It is available for academics, via a web browser at the URL

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Building an automated platform for the classification of peptides/proteins using machine learning

Author: Sequeira Ana Marta Fernandes Tavares
Publication date: 18/11/2019

Master's dissertation in Bioinformatics. One of the challenging problems in bioinformatics is to computationally characterize the sequences, structures and functions of proteins. Sequence-derived structural and physicochemical properties of proteins have been used to develop machine learning models for protein-related problems. However, tools and platforms to calculate features and perform machine learning (ML) with proteins are scarce and limited in effectiveness, user-friendliness and capacity. Here, a generic, modular, automated platform is proposed for classifying proteins based on their physicochemical properties using different ML algorithms. The tool, developed as a Python package, facilitates the major tasks of ML and includes modules to read and alter sequences, calculate protein features, preprocess datasets, execute feature reduction and selection, perform clustering, train and optimize ML models and make predictions. As it is modular, the user retains the power to alter the code to fit specific needs. The platform was tested on predicting membrane-active anticancer and antimicrobial peptides and further used to explore viral fusion peptides. Membrane-interacting peptides play a crucial role in several biological processes. Fusion peptides are a subclass found in enveloped viruses that are particularly relevant for membrane fusion. Determining which properties characterize fusion peptides and distinguish them from other proteins is a very relevant scientific question with important technological implications. Using three different datasets composed of well-annotated sequences, different feature extraction techniques and feature selection methods (resulting in a total of over 20 datasets), seven ML models were trained and tested, using cross-validation for error estimation and grid search for model selection. The different models, feature sets and feature selection techniques were compared.
The best models obtained for distinct metrics were then used to predict the location of a known fusion peptide in a protein sequence from the Dengue virus. Feature importances were also analysed. The models obtained will be useful in future research, and also provide biological insight into the distinctive physicochemical characteristics of fusion peptides. This work presents a freely available tool to perform ML-based protein classification and the first global analysis and prediction of viral fusion peptides using ML, reinforcing the usability and importance of ML in protein classification problems.
One of the most challenging problems in bioinformatics is the characterization of protein sequences, structures and functions. Physicochemical and structural properties derived from the protein sequence have been used to develop machine learning (ML) models. However, tools to calculate these features are scarce and limited in efficiency, ease of use and adaptability to different problems. Here, a generic, modular and automated platform is described for classifying proteins based on their physicochemical properties using different ML algorithms. The tool facilitates the main ML tasks and includes modules to read and alter sequences, calculate protein features, preprocess data, perform feature reduction and selection, run clustering, build ML models and make predictions. Because it is built modularly, the user retains the power to alter the code to fit specific needs. The platform was tested on anticancer and antimicrobial peptides and was further used to explore viral fusion peptides.
Fusion peptides are a class of membrane-interacting peptides found in enveloped viruses that are particularly relevant for the fusion of the viral membrane with the host membrane. Determining which properties characterize them is a very relevant scientific question with important technological implications. Using three different datasets composed of well-annotated sequences, four different feature extraction techniques and five different feature selection methods (24 datasets tested in total), seven ML models were trained and tested with 10-fold cross-validation and a grid search approach. The best models, with MCC scores between 0.7 and 0.8 and accuracy between 0.85 and 0.9, were used to predict the location of a known fusion peptide in a sequence of the Dengue virus fusion protein. The models obtained for predicting the location of the fusion peptide are useful for future research and also provide biological insight into the distinctive physicochemical characteristics of fusion peptides. This work presents a freely available tool for ML-based protein classification and the first global analysis of viral fusion peptides using ML-based methods, reinforcing the usability and importance of ML in protein classification problems.
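The MCC scores reported in this abstract refer to the Matthews correlation coefficient, which summarises a binary confusion matrix in a single value between -1 and 1. The sketch below computes it from raw counts alongside plain accuracy. The function names and the confusion-matrix counts are invented for illustration and are not results from the dissertation.

```python
import math

def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom

def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical counts for a balanced test set of 100 sequences.
tp, fp, tn, fn = 45, 5, 40, 10
mcc = matthews_corrcoef(tp, fp, tn, fn)
acc = accuracy(tp, fp, tn, fn)
print(f"MCC = {mcc:.3f}, accuracy = {acc:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes differ in size.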

Universidade do Minho: RepositoriUM

Profiling patterns of interhelical associations in membrane proteins.

Author: Gorka Lasso Cabrera
Publication date: 01/01/2007

A novel set of methods has been developed to characterize polytopic membrane proteins at the topological, organellar and functional level, in order to reduce the existing functional gap in the membrane proteome. Firstly, a novel clustering tool named PROCLASS was implemented to facilitate the manual curation of large sets of proteins in readiness for feature extraction. TMLOOP and TMLOOP writer were implemented to refine current topological models by predicting membrane dipping loops. TMLOOP applies weighted predictive rules in a collective motif method to overcome the inherent limitations of single motif methods. The approach achieved 92.4% sensitivity and 100% specificity, and 1,392 topological models described in the Swiss-Prot database were refined. The subcellular location (TMLOCATE) and molecular function (TMFUN) prediction methods rely on the TMDEPTH feature extraction method along with data mining techniques. TMDEPTH uses refined topological models and amino acid sequences to calculate pairs of residues located at a similar depth in the membrane. Evaluation of TMLOCATE showed a normalized accuracy of 75% in discriminating between proteins belonging to the main organelles. At a sequence similarity threshold of 40%, TMFUN predicted the main functional classes with a sensitivity of 64.1-71.4%, and 70% of the olfactory GPCRs were correctly predicted. At a sequence similarity threshold of 90%, the main functional classes were predicted with a sensitivity of 75.6-92.8%, and class A GPCRs were sub-classified with a sensitivity of 84.5-92.9%. These results reflect a direct association between the spatial arrangement of residues in the transmembrane regions and the capacity of polytopic membrane proteins to carry out their functions.
The developed methods have for the first time categorically shown that the transmembrane regions hold essential information associated with a wide range of functional properties such as filtering and gating processes, subcellular location and molecular function.

Cronfa at Swansea University

Protein classification from primary structures in the context of database biocuration

Author: Terpugova Ilmira
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2017

In collaboration with the Universitat de Barcelona (UB) and the Universitat Rovira i Virgili (URV). The problem of automatic protein classification using only primary structures plays an important role in modern bioinformatics research, especially for proteins whose 3-D structures are not yet known. One such type of protein, at the center of this thesis, is class C of the G-Protein Coupled Receptor super-family. This class is of great interest in pharmacoproteomics, from the point of view of drug design, because of its involvement in signaling pathways in cells of the central nervous system. The automatic classification of protein sequences may improve the understanding of their function and provide a basis for predicting their 3-D structure, which is information of interest for drug research. This thesis compares classification results for different versions of the same database, including the most recent ones. This exploration of how classification evolves provides relevant information about its capabilities and limitations. Furthermore, given that several data transformations are investigated, it also provides strong evidence concerning the robustness of these transformations. The other important contribution of the thesis is an investigation into approaches for semi-automated database curation, in which changes between database versions are evaluated automatically with advanced machine learning techniques. The thesis shows consistent improvements in data quality across three versions of the database, for different classification techniques and different primary structure transformations. It also validates the recently introduced continuous distributed representation for protein sequences, originally developed for natural language processing. This new representation is shown to be adequate and robust for the task of primary structure classification.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Decoding Neural Signals with Computational Models: A Systematic Review of Invasive BMI

Author: Abdi Mohammad Foad
Ahmadyani Hamed
Bokani Ayub
Firuzi Rezwan
Hassan Jahan
Naderi Dana
Publication date: 07/11/2022

There are significant milestones in modern human civilization at which mankind stepped into a different level of life, with a new spectrum of possibilities and comfort. From fire-lighting technology and wheeled wagons to writing, electricity and the Internet, each one changed our lives dramatically. In this paper, we take a deep look into the invasive Brain Machine Interface (BMI), an ambitious and cutting-edge technology which has the potential to be another important milestone in human civilization. Not only is invasive BMI technology beneficial for patients with severe medical conditions, it can also significantly impact different technologies and almost every aspect of human life. We review the biological and engineering concepts that underpin the implementation of BMI applications. Various essential techniques are necessary for making invasive BMI applications a reality. We review these by providing an analysis of (i) possible applications of invasive BMI technology, (ii) the methods and devices for detecting and decoding brain signals, and (iii) possible options for stimulating signals into the human brain. Finally, we discuss the challenges and opportunities of invasive BMI for further development in the area.
Comment: 51 pages, 14 figures, review article

arXiv.org e-Print Archive

IST Austria Thesis

Author: Morri Maurizio
Publication venue: IST Austria
Publication date: 01/01/2016

IST Austria: PubRep (Institute of Science and Technology)

Likelihood of protein structure determination

Author: Kumar Sampathrajan Suresh
Publication date: 01/01/2010

Structural Genomics (SG) involves the high-throughput determination of three-dimensional structures of macromolecules by experimental methods such as X-ray crystallography and NMR spectroscopy. One of the aims of SG is to reduce the time and cost of determining three-dimensional protein structures for which a homologous structure has not yet been solved. Several factors, such as conformational disorder, improper selection of domain boundaries and solubility, can hamper the production of protein constructs for structural biology. Reliable computational predictors of protein crystallization propensity, based on amino acid sequences, are consequently required. Prediction of protein conformational disorder is important since disorder can cause difficulty in crystallization.
In this work, a new procedure is presented that allows one to predict disordered residues with high accuracy on the basis of amino acid sequences, by using a consensus method based on various prediction tools. The performance of this new procedure is significantly better than that of each individual predictor previously reported. Furthermore, it is important to be able to predict the quaternary status requirements of a protein on the basis of its sequence. A protein chain can be a monomeric protein or it can form, together with other chains, oligomeric assemblies, which can be either homo-oligomers or hetero-oligomers. In the latter case, determining the three-dimensional structure of a single protomer must be avoided, since it will not be functional and will also be extremely difficult to express in a soluble form. It is thus desirable to have a computational tool that allows one to predict whether a potential gene product is part of a permanent and obligate hetero-oligomeric assembly. A new method is presented for discriminating hetero-oligomers from monomeric and homo-oligomeric proteins, and also monomers from homo-oligomers, with high accuracy on the basis of amino acid sequences. Metal ion requirements are also important in designing structural biology experiments. Metalloproteins constitute about one-third of the proteome. Prediction of metalloproteins helps crystallographers to select the proper growth medium for over-expression studies and also to increase the probability of obtaining a properly folded and functional molecule. Here it is shown that the uptake of metal ions by proteins can be predicted with high accuracy on the basis of the amino acid composition and by using machine learning methods. The results described in the present thesis provide a basis for the careful design of protein constructs. These computational screening methods help to avoid the selection of 'impossible' targets, a must in structural biology and proteomics.
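The abstract describes a consensus method that combines several disorder predictors but does not state the combination rule. As a rough sketch of the general idea, the example below uses a per-residue majority vote, which is an assumption and not necessarily the rule used in the thesis. The function name, the 'D'/'O' encoding and the three predictor outputs are hypothetical.

```python
def consensus_disorder(predictions, min_votes=None):
    """Combine per-residue disorder calls from several predictors.

    predictions: equal-length strings, one per predictor, where
    'D' marks a disordered residue and 'O' an ordered one.
    min_votes: votes needed to call a residue disordered
    (defaults to a strict majority of the predictors).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same sequence")
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1

    calls = []
    # zip(*predictions) yields the calls of every predictor for one residue.
    for column in zip(*predictions):
        votes = sum(1 for call in column if call == "D")
        calls.append("D" if votes >= min_votes else "O")
    return "".join(calls)

# Hypothetical outputs of three predictors for a five-residue sequence.
outputs = ["DDOOO", "DOOOD", "DDDOO"]
consensus = consensus_disorder(outputs)
print(consensus)
```

Raising `min_votes` makes the consensus more conservative (fewer residues called disordered), while lowering it favours sensitivity.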

OTHES

Applying machine learning to predict adipose browning capacity and mitochondria-endoplasmic reticulum crosstalk

Author: Jiang Li
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 10/12/2020

Digitale Hochschulschriften der LMU