    A novel approach to forecast urban surface-level ozone considering heterogeneous locations and limited information

    Surface ozone (O3) is considered a hazard to human health, and it also affects vegetation, crops and ecosystems. Accurate O3 forecasts for specific times and locations can help protect citizens from unhealthy exposure when high levels are expected. Forecasting models usually use numerous O3 precursors as predictors, which makes their reproducibility dependent on whether data providers supply such information. This study introduces a methodology for forecasting hourly O3 concentrations 24 h ahead, based on bagging and ensemble learning and using just two predictors built from lagged O3 concentrations. The methodology was applied to ten-year time series (2006–2015) from three major urban areas of Andalusia (Spain). Its forecasting performance was compared with that of an algorithm specifically designed to forecast time series exhibiting temporal patterns. The proposed methodology outperforms this reference algorithm and yields results comparable to others in the literature. Its use is encouraged for its forecasting performance and wide applicability, and also as a benchmark methodology.
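To make the bagging idea concrete, here is a minimal sketch of a 24 h-ahead forecaster with two lagged-O3 predictors. The abstract does not say which lags or base learners the study used, so the lags (24 h and 48 h), the least-squares base model and the names `make_lagged`, `fit_ols2`, `fit_bagging` and `predict` are all hypothetical; the point is only the bagging step itself: fit one model per bootstrap resample and average their forecasts.

```python
import math
import random
from statistics import fmean


def make_lagged(series, lags=(24, 48)):
    """Pair each target hour t with O3 at t - lags[0] and t - lags[1].

    Hypothetical predictor choice: with both lags >= 24 h, every input is
    already observed 24 h before the hour being forecast.
    """
    X, y = [], []
    for t in range(max(lags), len(series)):
        X.append((series[t - lags[0]], series[t - lags[1]]))
        y.append(series[t])
    return X, y


def fit_ols2(X, y):
    """Closed-form least squares with an intercept and exactly two predictors."""
    m1 = fmean(row[0] for row in X)
    m2 = fmean(row[1] for row in X)
    my = fmean(y)
    s11 = s22 = s12 = s1y = s2y = 0.0
    for (a, b), target in zip(X, y):
        a, b, target = a - m1, b - m2, target - my
        s11 += a * a
        s22 += b * b
        s12 += a * b
        s1y += a * target
        s2y += b * target
    det = s11 * s22 - s12 * s12  # non-zero unless the two predictors are collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def fit_bagging(X, y, n_models=25, seed=0):
    """Bagging: fit one base model on each bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_ols2([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict(models, x):
    """Ensemble forecast: average of the base-model forecasts."""
    return fmean(c0 + c1 * x[0] + c2 * x[1] for c0, c1, c2 in models)


# Usage on synthetic data (a daily cycle plus noise; not the study's data).
rng = random.Random(42)
series = [40 + 20 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 2) for t in range(24 * 30)]
X, y = make_lagged(series)
models = fit_bagging(X[:-24], y[:-24])  # hold out the final day
forecast = [predict(models, x) for x in X[-24:]]
mae = fmean(abs(f - obs) for f, obs in zip(forecast, y[-24:]))
```

Bagging mainly reduces the variance of unstable learners such as decision trees; a linear base model keeps the sketch short, and any regressor could be swapped into `fit_bagging`.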

    IoT-based platform for automated IEQ spatio-temporal analysis in buildings using machine learning techniques

    Providing accurate information about indoor environmental quality (IEQ) conditions inside building spaces is essential to assess the comfort levels of their occupants. These values may vary within the same space, especially in large zones, so many sensors are needed to produce a fine-grained representation of the space conditions, which increases hardware installation and maintenance costs. However, sound interpolation techniques may produce accurate values from fewer input points, reducing the number of sensors needed. This work presents a platform that automates accurate IEQ representation based on a few sensor devices placed across a large building space. A case study is presented in a research centre in Spain, using 8 wall-mounted devices and an additional moving device to train a machine learning model. The system yields accurate estimates at positions and times never seen by the trained model, with relative errors between 4% and 10% for the analysed variables.

    Funding: Universidade de Vigo/CISUG (open-access publication); Ministerio de Ciencia, Innovación y Universidades | Ref. RTI2018-096296-B-C2; Ministerio de Ciencia, Innovación y Universidades | Ref. FPU17/01834; Ministerio de Ciencia, Innovación y Universidades | Ref. FPU19/01187; Universidad de Vigo | Ref. 00VI 131H 641.0
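The platform above trains a machine learning model, but the underlying idea of estimating a value between sensors can be illustrated with a classic interpolation baseline. The sketch below swaps in inverse distance weighting (IDW), which is not claimed to be the study's method, and assumes the reported relative error is |estimate − reference| / |reference|. Sensor positions and readings are hypothetical.

```python
import math


def idw(sensors, point, power=2.0):
    """Inverse-distance-weighted estimate at `point` from (x, y, value) readings."""
    num = den = 0.0
    for x, y, value in sensors:
        d = math.dist((x, y), point)
        if d == 0.0:
            return value  # the query point coincides with a sensor
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def relative_error(estimate, reference):
    """|estimate - reference| / |reference|; e.g. 0.04 means 4%."""
    return abs(estimate - reference) / abs(reference)


# Hypothetical wall-sensor layout (metres) and temperature readings (°C).
sensors = [(0, 0, 21.0), (10, 0, 22.0), (0, 10, 23.0), (10, 10, 24.0)]
centre = idw(sensors, (5, 5))       # equidistant from all four sensors
err = relative_error(centre, 22.8)  # against a hypothetical reference reading
```

IDW uses only geometry, so it cannot learn time-dependent patterns; that limitation is one reason a trained model, such as the one in the study, can outperform it.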

    Air pollution relevance analysis in the bay of Algeciras (Spain)

    The aim of this work is to carry out an in-depth analysis of air pollution in the two main cities of the Bay of Algeciras (Spain). A large database of air pollutant concentrations and weather measurements was collected by a monitoring network installed throughout the region over the period 2010-2015. The pollutants considered are nitrogen dioxide (NO2), sulphur dioxide (SO2) and particulate matter (PM10). The analysis was carried out at two monitoring stations (Algeciras and La Linea). The highest average concentrations were obtained in Algeciras for NO2 (28.850 μg/m3) and SO2 (11.966 μg/m3), and in La Linea for PM10 (30.745 μg/m3). The analysis shows patterns that coincide with human activity. One of the goals of this work is to develop a virtual sensor that makes the monitoring network more robust and can be used, for instance, when data are missing. By means of trend analysis, groups of equivalent stations were determined, implying that the values of one station could be used in place of those of an equivalent station in case of failure (e.g., SO2 weekly trends in Algeciras and Los Barrios show equivalence). In addition, relative risks were calculated, showing that relative humidity, wind speed and wind direction increase the risk of higher pollutant concentrations. The results also showed that wind speed and wind direction are the most important variables in the distribution of particles. The results obtained may support decision-making by administrations and citizens.
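A relative risk compares how often an outcome occurs under one condition versus its absence. The sketch below assumes the standard definition RR = P(high concentration | condition) / P(high concentration | no condition); the abstract does not give the thresholds used, so the calm-wind cut-off (2 m/s), the PM10 cut-off (50 μg/m3) and the data are hypothetical.

```python
import math


def relative_risk(exposed, outcome):
    """RR = P(outcome | exposed) / P(outcome | not exposed) from paired booleans."""
    pairs = list(zip(exposed, outcome))
    n_exp = sum(1 for e, _ in pairs if e)
    n_unexp = len(pairs) - n_exp
    if n_exp == 0 or n_unexp == 0:
        raise ValueError("need both exposed and unexposed observations")
    risk_exp = sum(1 for e, o in pairs if e and o) / n_exp
    risk_unexp = sum(1 for e, o in pairs if not e and o) / n_unexp
    if risk_unexp == 0:
        return math.inf  # outcome never observed without the exposure
    return risk_exp / risk_unexp


# Hypothetical hourly data: wind speed (m/s) and PM10 (μg/m3).
wind = [1.0, 1.5, 0.8, 3.2, 4.0, 5.1, 2.5, 0.5, 6.0, 3.8]
pm10 = [62, 55, 48, 30, 28, 22, 58, 70, 19, 35]
calm = [w < 2.0 for w in wind]  # "exposure": calm conditions (hypothetical threshold)
high = [p > 50 for p in pm10]   # "outcome": high PM10 (hypothetical threshold)
rr = relative_risk(calm, high)  # 0.75 / (1/6), i.e. about 4.5
```

An RR above 1 means the outcome is more frequent under the condition; in practice it is usually reported with a confidence interval, which this sketch omits.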

    Artificial Neural Networks, Sequence-to-Sequence LSTMs, and Exogenous Variables as Analytical Tools for NO2 (Air Pollution) Forecasting: A Case Study in the Bay of Algeciras (Spain)

    This study aims to produce accurate predictions of NO2 concentrations at a specific station of a monitoring network located in the Bay of Algeciras (Spain). Artificial neural networks (ANNs) and sequence-to-sequence long short-term memory networks (LSTMs) were used to create the forecasting models. Additionally, a new prediction method was proposed that combines LSTMs in a rolling window scheme with a cross-validation procedure for time series (LSTM-CVT). Two strategies were followed for the input variables: using only NO2 from the station, or using NO2 and other pollutant data from any station of the network plus meteorological variables. The ANN and LSTM-CVT exogenous models used lagged datasets with different window sizes. Several feature ranking methods were used to select the top lagged variables for inclusion in the final exogenous datasets. Prediction horizons of t + 1, t + 4 and t + 8 were used. Including exogenous variables improved model performance, especially for t + 4 (from ρ ≈ 0.68 to ρ ≈ 0.74) and t + 8 (from ρ ≈ 0.59 to ρ ≈ 0.66). The proposed LSTM-CVT method delivered promising results, as the best-performing model for each prediction horizon used this new methodology. Additionally, for each parameter combination, it obtained lower error values than the ANNs in 85% of cases.
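The abstract does not specify the exact rolling-window and cross-validation scheme of LSTM-CVT, so the following is only a generic sketch of the two data-handling steps it relies on: building a lagged dataset for a given horizon (t + 1, t + 4 or t + 8) and generating rolling-origin splits in which each test block always follows its training window. The window size, split sizes and names are hypothetical, and model fitting is omitted.

```python
def lagged_dataset(series, window, horizon):
    """Inputs: the `window` most recent values up to time t; target: the value at t + horizon."""
    X, y = [], []
    for t in range(window - 1, len(series) - horizon):
        X.append(series[t - window + 1 : t + 1])
        y.append(series[t + horizon])
    return X, y


def rolling_splits(n_samples, train_size, test_size, step=None):
    """Yield (train, test) index ranges; each test block immediately follows its training window."""
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_samples:
        split = start + train_size
        yield range(start, split), range(split, split + test_size)
        start += step


# Datasets for the three horizons used in the study, on a placeholder series.
hourly_no2 = [float(v) for v in range(100)]
datasets = {h: lagged_dataset(hourly_no2, window=24, horizon=h) for h in (1, 4, 8)}
X, y = datasets[4]
splits = list(rolling_splits(len(y), train_size=48, test_size=8))
```

Unlike shuffled k-fold, these splits never train on observations that come after the test block, which avoids leaking future information into a forecasting model.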

    Molecular simulations on proteins of biomedical interest : A. Ligand-protein hydration B. Cytochrome P450 2D6 and 2C9 C. Myelin associated glycoprotein (MAG)

    TOPIC 1: Water molecules mediating polar interactions in ligand–protein complexes contribute to both binding affinity and specificity. To account for such water molecules in computer-aided drug discovery, we performed an extensive search of the Cambridge Structural Database (CSD) to identify the geometrical criteria that define interactions of water molecules with ligand and protein. In addition, ab initio calculations were used to derive the propensity of ligand hydration. Based on this information, we developed an algorithm (AcquaAlta) that reproduces water molecules bridging polar interactions between ligand and protein moieties. This approach was validated on 20 crystal structures and yielded a 76% match between experimental and calculated water positions. The solvation algorithm was then applied to the docking of oligopeptides to the periplasmic oligopeptide-binding protein A (OppA), supported by a pharmacophore-based alignment tool. TOPIC 2: Drug metabolism, toxicity, and interaction profile are major issues in drug discovery and lead optimization. Cytochromes P450 (CYPs) 2D6 and 2C9 are enzymes involved in the oxidative metabolism of the majority of marketed drugs. By identifying the binding mode using pharmacophore pre-alignment and automated flexible docking, and quantifying the binding affinity by multi-dimensional QSAR, we validated model families of 56 compounds (46 training, 10 test) for CYP2D6 and 85 compounds (68 training, 17 test) for CYP2C9. The correlation with the experimental data (cross-validated r2 = 0.811 for CYP2D6 and 0.687 for CYP2C9) suggests that our approach is suited to predicting the binding affinity of compounds towards CYP2D6 and CYP2C9. The models were challenged by Y-scrambling and by testing an external dataset of binding compounds (15 for CYP2D6 and 40 for CYP2C9) and non-binding compounds (64 for CYP2D6 and 56 for CYP2C9).
TOPIC 3: After injury, neurites in the adult mammalian central nervous system are prevented from regenerating by inhibitory proteins such as myelin-associated glycoprotein (MAG). Blocking MAG with potent glycomimetic antagonists could be a fruitful approach to enhancing axon regeneration. Libraries of MAG antagonists were derived from the (general) sialic acid moiety and synthesized. The binding data were rationalized by docking studies, molecular dynamics simulations and free energy perturbations on a homology model of MAG. The pharmacokinetic profile (i.e., stability in cerebrospinal fluid, logD, and blood-brain barrier permeation) of these compounds was thoroughly investigated to evaluate the drug-likeness of the identified antagonists.
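The cross-validated r2 (often written q2) reported for the CYP2D6 and CYP2C9 models is commonly defined as 1 − PRESS / SS_total, where PRESS sums the squared errors of predictions made with each compound left out. The sketch below illustrates that definition and a Y-scrambling check with a hypothetical one-descriptor linear model; it is not the multi-dimensional QSAR used in the thesis, and the data are synthetic.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares line y = b0 + b1 * x."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def q2_loo(xs, ys):
    """Cross-validated r2 (q2) = 1 - PRESS / SS_total with leave-one-out predictions."""
    press = 0.0
    for i in range(len(xs)):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (b0 + b1 * xs[i])) ** 2
    my = fmean(ys)
    return 1.0 - press / sum((y - my) ** 2 for y in ys)


def y_scrambling(xs, ys, rounds=20, seed=0):
    """q2 after randomly permuting the activities; a sound model should score far lower."""
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        scores.append(q2_loo(xs, shuffled))
    return scores


# Hypothetical descriptor values and binding affinities (not data from the thesis).
rng = random.Random(1)
xs = [i / 2 for i in range(20)]
ys = [1.5 * x + 2.0 + rng.gauss(0, 0.5) for x in xs]
q2 = q2_loo(xs, ys)
scrambled = y_scrambling(xs, ys)
```

If the scrambled q2 values approach the real one, the original correlation is likely due to chance rather than a genuine structure-activity relationship.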

    Analysis of the effect of maritime traffic on the estimation of air quality in a port-city

    Predicting air quality is an important task, as air quality is known to have a significant impact on health.
The Bay of Algeciras (Spain) is a highly industrialised area with one of the largest super-ports in Europe. During the period 2010-2019, different data were recorded at the monitoring stations of the Bay, forming a database of 131 variables (air pollutants, meteorological information, and vessel data). A three-stage analysis was carried out in this Thesis: the first stage was an air quality diagnosis, the second an estimation-prediction of air quality, and the third air quality forecasting. In the first stage, the first step was an in-depth analysis of the extent of pollution in the two main cities of the Bay of Algeciras (Algeciras and La Línea) over the years 2010 to 2015. Surrounded by several industries and roads, this area is highly polluted. In addition, the port of Algeciras is one of the most important ports in Europe, and La Línea is affected by Gibraltar airport, which contributes to increased air pollution in the Strait of Gibraltar. A complete statistical analysis was carried out to gain knowledge about relevant aspects of air pollution in this particular scenario. On the one hand, the highest average concentrations were obtained in Algeciras for NO2 (28.850 μg/m3) and in La Línea for SO2 (11.966 μg/m3) and PM10 (30.745 μg/m3). The analysis showed patterns that coincide with human activity. On the other hand, relative risks were calculated, showing that relative humidity, wind speed and wind direction increase the risk of higher pollutant concentrations. The results also showed that wind speed and wind direction are the most important variables in the distribution of particles. In the second stage, the objective was to obtain reliable predictions of the concentrations of pollutants related to maritime traffic (SO2, PM10, NO2, NOX and NO) during the years 2017 to 2019.
Three scenarios were analysed: Alcornocales Park and the cities of La Línea and Algeciras. These scenarios allowed the results to be compared. The objective was to predict future air quality levels of the principal maritime traffic-related pollutants in the Bay of Algeciras as a function of the remaining pollutants, the meteorological variables and a vessel database. A randomised resampling procedure using 5-fold cross-validation with 20 repetitions in each experiment was designed to ensure the generalisation capability of the tested models. Pollutant predictions were computed with different classification models (including trees, ensembles and SVMs, among others) and with artificial neural networks (ANNs) using different numbers of hidden layers and units. The proposed random resampling procedure permits adequate and robust multiple comparisons of the tested models and allowed us to select a group of best prediction models. The models were compared using several classification quality indices, such as sensitivity, specificity, accuracy and precision. The distance (d1) to the perfect classifier (1, 1, 1, 1) was also used as a discriminant feature, which allowed us to select the best models. Furthermore, a relevance analysis was performed to determine which variables are most relevant to each pollutant and to design models with fewer inputs for a more optimal monitoring network. These models seem to be the best in a number of scenarios. The sensitivities obtained with the best models were 0.98 for SO2, 0.97 for PM10, 0.82 for NO2 and NOX, and 0.83 for NO. The results demonstrate the potential of the models to forecast air pollution in a port city or in a complex scenario. In the third stage, the data available for 2017 to 2019 at the Algeciras station were predicted using Long Short-Term Memory (LSTM) models.
Four different approaches were developed to forecast SO2 and NO2 1 h and 4 h ahead in Algeciras. The first uses the remaining 130 exogenous variables. The second uses only the time series data, without exogenous variables. The third uses an autoregressive arrangement of the time series as input, and the fourth is similar but uses the time series together with wind and ship data. The results showed that SO2 is better predicted with autoregressive information, whereas NO2 is better predicted with autoregressive time series of ship and wind data, indicating that NO2 is closely related to combustion engines and is better predicted when this information is included. One of the goals of this Thesis is to develop a virtual sensor that makes the monitoring network more robust and can be used, for instance, when data are missing. It can also serve as a decision support system for authorities, and could be used by citizens to avoid exposure to pollutants and by companies to make air quality decisions.

    233 pages
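The second-stage model comparison combines repeated 5-fold cross-validation (20 repetitions) with four classification indices and a distance to the perfect classifier (1, 1, 1, 1). The sketch below shows those pieces under stated assumptions: the abstract calls the distance d1 without defining the norm, so a Euclidean distance is assumed here, and the function names and confusion-matrix counts are hypothetical.

```python
import math
import random


def classification_indices(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and precision from a binary confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    return sensitivity, specificity, accuracy, precision


def distance_to_perfect(indices):
    """Distance from (sens, spec, acc, prec) to the perfect classifier (1, 1, 1, 1).

    Assumption: Euclidean norm; the source names the measure d1 without defining it.
    """
    return math.dist(indices, (1.0,) * len(indices))


def repeated_kfold(n_samples, k=5, repeats=20, seed=0):
    """Yield (train, test) index lists: `repeats` independent shuffles, each split into k folds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[f::k] for f in range(k)]
        for f in range(k):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, folds[f]


# Hypothetical confusion matrix for one model, and the 5 x 20 = 100 resampling splits.
indices = classification_indices(tp=90, fp=10, tn=85, fn=15)
d1 = distance_to_perfect(indices)
splits = list(repeated_kfold(50))
```

Averaging the indices over the 100 splits, and then ranking models by their distance to (1, 1, 1, 1), gives one number per model that penalises weakness in any of the four indices.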

    Molecular modelling of thymidylate synthase and rational design of its inhibitors as novel anticancer drugs

    In search of novel anticancer drugs, putative inhibitors of the enzyme thymidylate synthase were investigated. The dissertation presents several steps of computer-aided drug design. Two target sites are described: the active site of the enzyme, for competitive inhibitors, and an allosteric pocket at the dimer interface. Potential hits were selected by computational high-throughput screening (molecular docking calculations) of available drug and prodrug databases. The selected compounds were then modified and scored further to identify potential leads. Molecular dynamics simulations were performed for selected putative inhibitors of thymidylate synthase, both competitive and allosteric, to assess their dynamical behaviour, binding properties and ligand arrangement, and to select lead compounds for further in vitro tests. Moreover, a library of peptoids is described, created with the aim of designing novel compounds with the desired peptide-like properties. Furthermore, quantum mechanics calculations were conducted to aid the synthesis and investigation of novel enzyme inhibitors, including boron-containing compounds.

    Development of Computational Methods to Predict Protein Pocket Druggability and Profile Ligands using Structural Data

    This thesis presents the development of computational methods and tools that take three-dimensional structural data of protein-ligand complexes as input. The tools are useful for mining, profiling and predicting data from protein-ligand complexes, in order to improve the modeling and understanding of protein-ligand recognition. The thesis is divided into five sub-projects. In addition, unpublished results on positioning water molecules in binding pockets are presented. I developed a statistical model, PockDrug, which combines three properties (hydrophobicity, geometry and aromaticity) to predict the druggability of protein pockets, with results that do not depend on the pocket estimation method. Its performance on pockets estimated from apo or holo proteins is better than previously reported in the literature (Publication I). PockDrug is available through a web server, PockDrug-Server (http://pockdrug.rpbs.univ-paris-diderot.fr), which also includes many tools for protein pocket analysis and characterization (Publication II). I developed a customizable computational workflow based on the superimposition of homologous proteins to mine structural replacements of functional groups in the Protein Data Bank (PDB). Applying it to phosphate groups, we identified a surprisingly high number of non-polar phosphate replacements as well as some mechanisms allowing positively charged replacements. In addition, we observed that ligands adopted a U-shaped conformation in nucleotide-binding pockets across phylogenetically unrelated proteins (Publication III). I investigated the prevalence of salt bridges in protein-ligand complexes in the PDB for five basic functional groups. The prevalence ranges from around 70% for guanidinium to 16% for tertiary ammonium cations; in the latter case, the low prevalence appears to be connected to a smaller volume available for interacting groups.
In the absence of strong carboxylate-mediated salt bridges, the environment around the basic functional groups studied appeared enriched in functional groups with acidic properties, such as hydroxyl and phenol groups, or in water molecules (Publication IV). I developed a tool for analysing binding poses obtained by docking. The tool compares a set of docked ligands to a reference bound ligand (which may be a different molecule) and produces a graphical output that plots the shape overlap and a Jaccard score based on the comparison of molecular interaction fingerprints. The tool was applied to analyse the docking poses of active ligands at the orexin-1 and orexin-2 receptors identified by a combined virtual and experimental screen (Publication V). The literature review focuses on protein-ligand recognition, presenting different concepts and current challenges in drug discovery.
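The pose-analysis tool (Publication V) scores each docked pose partly by a Jaccard score between molecular interaction fingerprints. Here is a minimal sketch of that scoring step alone, assuming fingerprints are represented as sets of "on" features; the shape-overlap component is omitted, and the (residue, interaction type) encoding and residue labels are hypothetical.

```python
def jaccard(fp_a, fp_b):
    """Jaccard score |A ∩ B| / |A ∪ B| of two interaction fingerprints given as sets of 'on' features."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # both fingerprints empty: an arbitrary convention in this sketch
    return len(a & b) / len(union)


# Hypothetical fingerprints: (residue, interaction type) pairs.
reference = {("res12", "salt bridge"), ("res45", "pi-stacking"), ("res78", "H-bond")}
poses = {
    "pose_1": {("res12", "salt bridge"), ("res45", "pi-stacking")},
    "pose_2": {("res45", "pi-stacking"), ("res90", "hydrophobic")},
}
ranking = sorted(poses, key=lambda name: jaccard(poses[name], reference), reverse=True)
```

Here pose_1 shares two of the three reference interactions (score 2/3) and pose_2 only one out of four distinct interactions (score 1/4), so pose_1 ranks first. Because the score compares interactions rather than atoms, it also works when the reference ligand is a different molecule.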