    A Semantic Grid Service for Experimentation with an Agent-Based Model of Land-Use Change

    Agent-based models, perhaps more than other models, feature large numbers of parameters and potentially generate vast quantities of results data. This paper shows through the FEARLUS-G project (an ESRC e-Social Science Initiative Pilot Demonstrator Project) how deploying an agent-based model on the Semantic Grid facilitates international collaboration on investigations using such a model, and contributes to establishing rigorous working practices with agent-based models as part of good science in social simulation. The experimental workflow is described explicitly using an ontology, and a Semantic Grid service with a web interface implements the workflow. Users are able to compare their parameter settings and results, and relate their work with the model to wider scientific debate. Keywords: Agent-Based Social Simulation, Experiments, Ontologies, Replication, Semantic Grid.
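
    The point that agent-based models carry many parameters, and that users compare parameter settings across runs, can be illustrated with a minimal Python sketch. It is not the FEARLUS-G service or its ontology, and the parameter names are hypothetical. It enumerates a full-factorial parameter sweep, showing how the number of runs multiplies with each added parameter, and reports which settings differ between two runs.

```python
from itertools import product

def full_factorial(space):
    """Return one settings dict per combination of parameter values.

    `space` maps a parameter name to the list of values to try; the number
    of runs is the product of the list lengths.
    """
    names = sorted(space)  # fixed order so the run list is reproducible
    return [dict(zip(names, values))
            for values in product(*(space[name] for name in names))]

def diff_settings(a, b):
    """Return {name: (value_in_a, value_in_b)} for parameters that differ."""
    return {name: (a.get(name), b.get(name))
            for name in sorted(set(a) | set(b))
            if a.get(name) != b.get(name)}

if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    space = {
        "aspiration_threshold": [0.1, 0.5, 0.9],
        "imitation_probability": [0.0, 0.25, 0.5, 0.75],
        "grid_size": [10, 20],
    }
    runs = full_factorial(space)
    print(len(runs))  # 3 * 4 * 2 = 24 runs
    print(diff_settings(runs[0], runs[-1]))
```

    Sorting the parameter names keeps run indices stable between sessions, which matters if runs are to be compared or replicated later.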

    Building an automated platform for the classification of peptides/proteins using machine learning

    Master's dissertation in Bioinformatics. One of the challenging problems in bioinformatics is to computationally characterize protein sequences, structures and functions. Sequence-derived structural and physicochemical properties of proteins have been used to develop machine learning models for protein-related problems. However, tools and platforms for calculating features and performing machine learning (ML) with proteins are scarce and have limitations in terms of effectiveness, user-friendliness and capacity. Here, a generic, modular, automated platform is proposed for classifying proteins by their physicochemical properties using different ML algorithms. The tool, developed as a Python package, facilitates the major ML tasks and includes modules to read and alter sequences, calculate protein features, preprocess datasets, perform feature reduction and selection, run clustering, train and optimize ML models and make predictions. Because it is modular, users retain the ability to alter the code to fit specific needs. The platform was tested on predicting membrane-active anticancer and antimicrobial peptides and was further used to explore viral fusion peptides. Membrane-interacting peptides play a crucial role in several biological processes. Fusion peptides are a subclass found in enveloped viruses that is particularly relevant for membrane fusion. Determining which properties characterize fusion peptides and distinguish them from other proteins is a highly relevant scientific question with important technological implications. Using three datasets of well-annotated sequences, different feature extraction techniques and feature selection methods (over 20 datasets in total), seven ML models were trained and tested, using cross-validation for error estimation and grid search for model selection. The different models, feature sets and feature selection techniques were compared.
The best models for each metric were then used to predict the location of a known fusion peptide in a Dengue virus protein sequence. Feature importances were also analysed. The resulting models should be useful in future research and provide biological insight into the distinctive physicochemical characteristics of fusion peptides. This work presents a freely available tool for ML-based protein classification and the first global analysis and prediction of viral fusion peptides using ML, reinforcing the usability and importance of ML in protein classification problems.

    One of the most challenging problems in bioinformatics is the characterization of protein sequences, structures and functions. Physicochemical and structural properties derived from the protein sequence have been used to develop machine learning (ML) models. However, tools to calculate these features are scarce and have limitations in terms of efficiency, ease of use and adaptability to different problems. Here, a generic, modular and automated platform for classifying proteins based on their physicochemical properties, using different ML algorithms, is described. The tool facilitates the main ML tasks and includes modules to read and alter sequences, calculate protein features, preprocess data, perform feature reduction and selection, run clustering, build ML models and make predictions. Because it is built in a modular way, users retain the ability to alter the code to meet their specific needs. The platform was tested with anticancer and antimicrobial peptides and was further used to explore viral fusion peptides. Fusion peptides are a class of membrane-interacting peptides found in enveloped viruses that are particularly relevant for the fusion of the viral membrane with the host membrane. Determining which properties characterize them is a highly relevant scientific question with important technological implications. Using three datasets of well-annotated sequences, four feature extraction techniques and five feature selection methods (24 datasets tested in total), seven ML models were trained and tested using 10-fold cross-validation and a grid search approach. The best models, with MCC scores between 0.7 and 0.8 and precision between 0.85 and 0.9, were used to predict the location of a known fusion peptide in the sequence of the Dengue virus fusion protein. The models obtained for predicting fusion peptide location are useful for future research and also provide biological insight into the distinctive physicochemical characteristics of these peptides. This work presents a freely available tool for ML-based protein classification and the first global analysis of viral fusion peptides using ML-based methods, reinforcing the usability and importance of ML in protein classification problems.
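
    The feature-calculation and location-prediction steps described above can be sketched in Python under stated assumptions. This is not the thesis's package or its API: amino acid composition stands in for its larger feature sets, the mean Kyte-Doolittle hydropathy of each window is a hypothetical stand-in for a trained classifier's score, and `mcc` implements the standard Matthews correlation coefficient used to report model quality.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here only as a stand-in window score.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids in `seq`."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]

def mean_hydropathy(seq):
    """Average Kyte-Doolittle value over the residues of `seq`."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

def scan(seq, width, score=mean_hydropathy):
    """Score every window of length `width`; return (start, score) pairs."""
    return [(i, score(seq[i:i + width])) for i in range(len(seq) - width + 1)]

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when a marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

if __name__ == "__main__":
    seq = "KKKKLLLLKKKK"  # toy sequence, not a real protein
    best_start, best_score = max(scan(seq, 4), key=lambda pair: pair[1])
    print(best_start, round(best_score, 2))  # 4 3.8
    print(mcc(tp=40, tn=45, fp=5, fn=10))
```

    Passing the window scorer as a parameter keeps the scan independent of the model: a trained classifier's probability for the window could replace `mean_hydropathy` without changing `scan`.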

    The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases

    One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in the biofuel, medicine and food industries because they act on a broad range of substrates, cleaving ester bonds and synthesizing high-added-value molecules through esterification and transesterification reactions. During the past two decades, extensive studies have been carried out on the production and partial characterization of fungal FAEs, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted to sequence similarity and to substrate specificity on just four model substrates, and considered only a handful of fungal FAEs. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to any poorly characterized enzyme family. A total of 365 FAE-related sequences of fungal, bacterial and plant origin were collected and clustered into distinct groups using Self-Organizing Maps followed by k-means clustering, based on amino acid composition and physicochemical composition descriptors derived from each amino acid sequence. A Support Vector Machine model was subsequently built to classify new FAEs into the pre-assigned clusters. The model correctly recognized 98.2% of the training sequences and all sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of this work is the development of pharmacophore models for those new FAE families for which sufficient information on known substrates existed.
Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a given family opens a window of opportunities for tailored applications of FAEs.
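
    The clustering step can be approximated, for illustration only, by plain k-means on amino acid composition vectors. This Python sketch omits the Self-Organizing Map stage and the physicochemical descriptors the study also used, seeds the centroids deterministically with the first k sequences (real analyses typically use random or k-means++ initialization), and the toy sequences are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Amino acid composition: fraction of each standard residue."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; centroids seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

if __name__ == "__main__":
    seqs = ["AAAAG", "KKKKR", "AAAGG", "KKKRR"]  # hypothetical toy sequences
    labels, _ = kmeans([composition(s) for s in seqs], k=2)
    print(labels)  # [0, 1, 0, 1]
```

    Once clusters are fixed, a supervised classifier (an SVM in the study) can be trained on the same descriptors with the cluster labels as targets, so that new sequences are assigned without re-clustering.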

    Predicting locations of cryptic pockets from single protein structures using the PocketMiner graph neural network

    Cryptic pockets expand the scope of drug discovery by enabling the targeting of proteins currently considered undruggable because they lack pockets in their ground-state structures. However, identifying cryptic pockets is labor-intensive and slow. The ability to predict accurately and rapidly whether and where cryptic pockets are likely to form from a structure would greatly accelerate the search for druggable pockets. Here, we present PocketMiner, a graph neural network trained to predict where pockets are likely to open in molecular dynamics simulations. Applying PocketMiner to single structures from a newly curated dataset of 39 experimentally confirmed cryptic pockets demonstrates that it accurately identifies cryptic pockets (ROC-AUC: 0.87) >1,000-fold faster than existing methods. We apply PocketMiner across the human proteome and show that predicted pockets open in simulations, suggesting that over half of proteins thought to lack pockets based on available structures likely contain cryptic pockets, vastly expanding the potentially druggable proteome.
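
    ROC-AUC, the metric quoted for PocketMiner, equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. A minimal Python sketch computes it directly from that pairwise definition; it is not PocketMiner's evaluation code, and the scores and labels are made up.

```python
def roc_auc(scores, labels):
    """ROC-AUC via its pairwise (Mann-Whitney) definition.

    labels: 1 for positives, 0 for negatives. Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Made-up scores and labels, for illustration only.
    scores = [0.9, 0.8, 0.3, 0.1]
    labels = [1, 0, 1, 0]
    print(roc_auc(scores, labels))  # 0.75
```

    The double loop is quadratic in the number of examples; for large datasets a rank-based formula over the sorted scores gives the same value more efficiently.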

    Structure determination of membrane proteins by electron crystallography

    A fundamental principle of life is the separation of environments into different compartments. Prokaryotes shield their interior from the environment with a plasma membrane and, in some cases, also with a cell wall. Eukaryotes refine this compartmentalization by building different organelles for different parts of cell metabolism. Nevertheless, these compartments depend on each other and are interconnected by membrane proteins that transport specific nutrients, hormones, ions, water and waste products across the membrane and facilitate signal transmission between compartments. Understanding the structure and function of membrane proteins can therefore provide enormous insight into the regulation of different metabolic pathways. The electron microscope (EM) has proven to be a valuable tool for studying membrane proteins, offering the unique opportunity to image them within a lipid bilayer, as close to natural conditions as possible. Processing images acquired with an electron microscope is a challenging task for both the scientist and the processing hardware. Newly developed and optimized algorithms are needed to improve image processing to a level at which atomic resolution can be achieved routinely. Membrane proteins pose a difficult challenge for structural biologists. Crystallizing membrane proteins into well-ordered two-dimensional (2D) or three-dimensional (3D) crystals is one of the most important prerequisites for structural analysis at the atomic level, yet membrane proteins are notoriously difficult to crystallize. One exception may be bacteriorhodopsin, which already forms near-perfect crystals in its native membrane. This may explain why the first 2D electron crystallographic structure, determined at 7 Å resolution by Henderson and Unwin[20][43] in 1975, was that of bacteriorhodopsin.
In 1990 the structure of bacteriorhodopsin (Br) was determined to atomic resolution by Henderson et al.[19], the first atomic structure of a membrane protein. The structure determination of Br was also the starting point for the mrc program suite, which is currently widely used in the small 2D electron crystallography community. Using the mrc software, Kühlbrandt et al.[26] solved the structure of the light-harvesting chlorophyll a/b-protein complex in 1994, recording the images with the spot-scan technique developed by Downing in 1991[9]. The first aquaporin water channel structure to be determined was that of aquaporin 1, resolved by Walz et al. in 1997[45] at 6 Å resolution and subsequently solved to atomic resolution by Murata et al. in 2000[29]. Recently, several more aquaporin structures have been determined by 2D electron crystallography: aquaporin-0 (AQP0) by Gonen et al. in 2004[14] at 3 Å and in 2005[13] at 1.9 Å, and aquaporin-4 (AQP4) by Hiroaki et al. in 2006[22]. Interestingly, AQP4 shows exactly the same monomer arrangement as SoPIP2;1. Recent publications show a trend away from recording images alone toward recording diffraction data in combination with images, or even exclusively, and then using methods developed for X-ray crystallography to obtain the phase information. Because the software available for processing 2D electron diffraction patterns is less mature than that for processing images, and because diffraction patterns are increasingly used, it makes sense to focus on implementing new and improved programs for 2D electron diffraction processing. In this work I present the advances I achieved in the structure determination of aquaporin 2, as well as my contributions to other projects, in particular the structural investigations of SoPIP2;1 and KdgM.
I will also explain the modified sample preparation methods that made data recording at high tilt angles more reliable and improved the resolution of the measured data. A second, equally important and detailed part of my thesis is the work invested in improving and extending the image processing software to the point where users who are not proficient in several programming languages can use it and produce good results. To this end I improved its functionality and performance in several places, with a strong emphasis on user-friendliness and ease of maintenance.
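
    The move toward diffraction data raises the phase problem: a diffraction pattern records only the intensities |F|² of the structure factors, while their phases, which encode where density sits in the unit cell, are lost and must be supplied by images or by methods borrowed from X-ray crystallography. A minimal Python sketch with a hypothetical 4×4 unit cell illustrates this with a discrete 2D Fourier transform: translating the motif leaves every intensity unchanged but alters the phases. It is a toy illustration, not part of the mrc suite or the thesis software.

```python
import cmath

def dft2(img):
    """Naive 2D discrete Fourier transform of a list-of-lists image."""
    M, N = len(img), len(img[0])
    return [[sum(img[x][y] * cmath.exp(-2j * cmath.pi * (u * x / M + v * y / N))
                 for x in range(M) for y in range(N))
             for v in range(N)]
            for u in range(M)]

def intensities(F):
    """What a diffraction pattern records: |F|^2 for each reflection."""
    return [[abs(f) ** 2 for f in row] for row in F]

def phases(F):
    """What a diffraction pattern loses: the phase angle of each reflection."""
    return [[cmath.phase(f) for f in row] for row in F]

def translate(img, dx, dy):
    """Cyclically shift the unit-cell contents by (dx, dy)."""
    M, N = len(img), len(img[0])
    return [[img[(x - dx) % M][(y - dy) % N] for y in range(N)] for x in range(M)]

if __name__ == "__main__":
    cell = [[1, 2, 0, 0],
            [0, 3, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]  # hypothetical density in a 4x4 unit cell
    F, G = dft2(cell), dft2(translate(cell, 1, 0))
    same = all(abs(a - b) < 1e-9
               for ra, rb in zip(intensities(F), intensities(G))
               for a, b in zip(ra, rb))
    print("intensities identical:", same)
    print("phase at (1, 0):", round(phases(F)[1][0], 3),
          "vs", round(phases(G)[1][0], 3))
```

    Because two different arrangements can give identical intensities, intensities alone cannot fix the structure; the missing phases are what image data or phasing methods have to provide.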

    DisPredict: A Predictor of Disordered Protein Using Optimized RBF Kernel

    Intrinsically disordered proteins or regions perform important biological functions through their dynamic conformations during binding. Accurate identification of these disordered regions therefore has significant implications for proper functional annotation, induced-fold prediction and drug design against critical diseases. We introduce DisPredict, a disorder predictor that employs a single support vector machine with an RBF kernel and novel features for reliable characterization of protein structure. DisPredict performs effectively: in addition to 10-fold cross-validation, it was trained and tested with independent test datasets, and the results were consistent, with both training and test errors minimal. The use of multiple data sources makes the predictor generic. The datasets used to develop the model include disordered regions of various lengths, categorized as short and long and having different compositions, and different types of disorder, ranging from fully to partially disordered regions, as well as completely ordered regions. In comparisons with other state-of-the-art approaches and in case studies, DisPredict proved to be a useful tool with competitive performance. DisPredict is available at https://github.com/tamjidul/DisPredict_v1.0
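
    Two ingredients named in the DisPredict abstract, the RBF kernel and 10-fold cross-validation, can be sketched in plain Python. The kernel is K(x, y) = exp(−γ‖x − y‖²). The abstract does not say how DisPredict optimizes the kernel; a common approach is to tune γ and the SVM cost C by cross-validated grid search, and this sketch shows only the building blocks for that: the kernel, its Gram matrix over feature vectors, and a deterministic 10-fold split. Shuffling before splitting, usual in practice, is omitted here for reproducibility, and the feature vectors are hypothetical.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def gram_matrix(X, gamma):
    """Pairwise kernel values for all feature vectors in X."""
    return [[rbf_kernel(a, b, gamma) for b in X] for a in X]

def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous (train, test) folds.

    Fold sizes differ by at most one; every index is tested exactly once.
    """
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds

if __name__ == "__main__":
    X = [[0.0, 1.0], [0.5, 0.5], [3.0, 3.0]]  # toy feature vectors
    for gamma in (0.1, 1.0, 10.0):  # candidate values a grid search might try
        print(gamma, [round(v, 3) for v in gram_matrix(X, gamma)[0]])
    print([len(test) for _, test in k_fold_indices(25)])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

    Larger γ makes the kernel more local (similarity decays faster with distance), which is why γ is usually tuned rather than fixed.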