Statistical methods for analysis and correction of high-throughput screening data
During high-throughput screening (HTS), the first step in drug discovery, the activity level of thousands of chemical compounds is measured in order to identify potential drug candidates (i.e., hits) among them. A large number of environmental and procedural factors can adversely affect the screening process by introducing systematic errors into the measurements. Systematic errors can significantly alter the outcome of hit selection, producing large numbers of false positives and false negatives. HTS data correction methods have been developed to adjust the screening data and compensate for the negative effect of systematic errors on them (Heyse 2002, Brideau et al. 2003, Heuer et al. 2005, Kevorkov and Makarenkov 2005, Makarenkov et al. 2006, Malo et al. 2006, Makarenkov et al. 2007). In this thesis, we first evaluate the applicability of several statistical methods for detecting the presence of systematic errors in experimental HTS data, including the χ² goodness-of-fit test, the t-test, and the Kolmogorov-Smirnov test preceded by a Fourier transform. We show that detecting systematic errors in raw HTS data is feasible, and that it is also possible to determine the exact location (rows, columns, and plate) of the systematic errors in the assay. We recommend using a specialized version of the t-test to detect systematic error before hit selection, in order to determine whether error correction is needed. Typically, systematic errors affect only a few rows or columns, on some but not all plates of the assay.
All existing error correction methods were designed to modify all the data on the plate to which they are applied and, in some cases, all the data in the assay. Thus, when applied, existing methods modify not only the experimental measurements biased by systematic error but also many correct data points. In this context, we propose two new, effective systematic error correction methods designed to modify only selected rows and columns of a given plate, i.e., those where the presence of systematic error has been confirmed. After correction, the corrected measurements remain comparable with the unmodified values of the given plate and of the whole assay. Both new methods rely on the results of an error detection test to determine which rows and columns of each plate of the assay should be corrected. A general procedure for correcting high-throughput screening data is also suggested. Current hit selection methods in high-throughput screening generally do not allow the reliability of the results to be assessed. In this thesis, we describe a methodology for estimating the probability that each chemical compound is a hit when the assay contains more than one replicate. Using the new methodology, we define a new probability-based hit selection procedure that provides a confidence level for each hit. In addition, new measures are proposed for estimating the rates of change of false positives and false negatives as a function of the number of assay replicates. Furthermore, we study the possibility of defining accurate statistical models for the computational prediction of HTS measurements.
Note that experimental screening is very expensive. Virtual (in silico) screening could lead to a substantial reduction in costs. We focused on finding relationships between experimental HTS measurements and a set of chemical descriptors characterizing the compounds considered. We performed Polynomial Redundancy Analysis to demonstrate the existence of these relationships. At the same time, we applied two machine learning methods, neural networks and decision trees, to test their ability to predict experimental screening results.
AUTHOR KEYWORDS: high-throughput screening (HTS), statistical modeling, predictive modeling, systematic error, error correction methods, machine learning methods
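The idea of testing individual plate rows for systematic error can be illustrated with a minimal Python sketch. It is not the specialized t-test recommended in the thesis: it uses a plain Welch t statistic comparing each row with the rest of the plate, and the function names, the toy plate values, and the cutoff of 4.0 are all assumptions chosen for illustration.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    return (mean_a - mean_b) / math.sqrt(
        var_a / len(sample_a) + var_b / len(sample_b)
    )


def flag_rows(plate, threshold=4.0):
    """Return indices of rows whose mean differs strongly from the other rows.

    The threshold is an illustrative cutoff on |t|, not a calibrated
    significance level.
    """
    flagged = []
    for i, row in enumerate(plate):
        # Pool every measurement that is not in row i.
        rest = [v for j, other in enumerate(plate) if j != i for v in other]
        if abs(welch_t(row, rest)) > threshold:
            flagged.append(i)
    return flagged


# Toy 4 x 6 plate (made-up values): row 2 carries an additive bias of about +5.
plate = [
    [10.1, 9.8, 10.3, 9.9, 10.0, 10.2],
    [9.7, 10.4, 10.0, 10.1, 9.9, 10.2],
    [15.2, 14.9, 15.1, 15.3, 14.8, 15.0],
    [10.0, 9.9, 10.2, 9.8, 10.1, 10.3],
]
flagged_rows = flag_rows(plate)
print(flagged_rows)
```

The same function applied to the transposed plate would test columns. Note that a strongly biased row also inflates the pooled "rest" sample used for the unbiased rows, which is one reason a simple version like this needs a conservative cutoff.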
Resolving transition metal chemical space: feature selection for machine learning and structure-property relationships
Machine learning (ML) of quantum mechanical properties shows promise for
accelerating chemical discovery. For transition metal chemistry where accurate
calculations are computationally costly and available training data sets are
small, the molecular representation becomes a critical ingredient in ML model
predictive accuracy. We introduce a series of revised autocorrelation functions
(RACs) that encode relationships between the heuristic atomic properties (e.g.,
size, connectivity, and electronegativity) on a molecular graph. We alter the
starting point, scope, and nature of the quantities evaluated in standard ACs
to make these RACs amenable to inorganic chemistry. On an organic molecule set,
we first demonstrate superior standard AC performance to other
presently-available topological descriptors for ML model training, with mean
unsigned errors (MUEs) for atomization energies on set-aside test molecules as
low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs
on set-aside test molecules in spin-state splitting in comparison to 15-20x
higher errors from feature sets that encode whole-molecule structural
information. Systematic feature selection methods including univariate
filtering, recursive feature elimination, and direct optimization (e.g., random
forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5x
smaller than RAC-155 produce sub- to 1-kcal/mol spin-splitting MUEs, with good
transferability to metal-ligand bond length prediction (0.004-5 Å MUE) and
redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature
selection results across property sets reveals the relative importance of
local, electronic descriptors (e.g., electronegativity, atomic number) in
spin-splitting and distal, steric effects in redox potential and bond lengths.
Comment: 43 double-spaced pages, 11 figures, 4 tables
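For readers unfamiliar with autocorrelation descriptors, the standard product autocorrelation at depth d sums P_i * P_j over all ordered atom pairs (i, j) separated by exactly d bonds on the molecular graph. The Python sketch below implements only this standard form, not the revised RACs introduced in the abstract above; the function names and the formaldehyde example (with atomic number as the property) are assumptions chosen for illustration.

```python
from collections import deque


def graph_distances(n_atoms, bonds):
    """All-pairs bond-path distances on a molecular graph via breadth-first search."""
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    dist = [[None] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def autocorrelation(props, bonds, max_depth):
    """Standard product autocorrelation for depths 0..max_depth.

    Sums props[i] * props[j] over ordered pairs (i, j), so each
    unordered pair of distinct atoms contributes twice.
    """
    dist = graph_distances(len(props), bonds)
    ac = [0] * (max_depth + 1)
    for i, p_i in enumerate(props):
        for j, p_j in enumerate(props):
            d = dist[i][j]
            if d is not None and d <= max_depth:
                ac[d] += p_i * p_j
    return ac


# Formaldehyde (H2C=O): atom 0 = C, 1 = O, 2 and 3 = H; property = atomic number.
atomic_numbers = [6, 8, 1, 1]
bonds = [(0, 1), (0, 2), (0, 3)]
ac = autocorrelation(atomic_numbers, bonds, max_depth=2)
print(ac)  # [102, 120, 34]
```

Swapping the property list for electronegativity, size, or connectivity values gives the other heuristic descriptors mentioned in the abstract, one vector of depth-indexed sums per property.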
Discovery of high-entropy ceramics via machine learning
Abstract: Although high-entropy materials are attracting considerable interest due to a combination of useful properties and promising applications, predicting their formation remains a hindrance for rational discovery of new systems. Experimental approaches are based on physical intuition and/or expensive trial and error strategies. Most computational methods rely on the availability of sufficient experimental data and computational power. Machine learning (ML) applied to materials science can accelerate development and reduce costs. In this study, we propose an ML method, leveraging thermodynamic and compositional attributes of a given material for predicting the synthesizability (i.e., entropy-forming ability) of disordered metal carbides. The relative importance of the thermodynamic and compositional features for the predictions is then explored. The approach's suitability is demonstrated by comparing values calculated with density functional theory to ML predictions. Finally, the model is employed to predict the entropy-forming ability of 70 new compositions; several predictions are validated by additional density functional theory calculations and experimental synthesis, corroborating the effectiveness in exploring vast compositional spaces in a high-throughput manner. Importantly, seven compositions are selected specifically because they contain all three of the Group VI elements (Cr, Mo, and W), which do not form room temperature-stable rock-salt monocarbides. Incorporating the Group VI elements into the rock-salt structure provides further opportunity for tuning the electronic structure and potentially material performance.
Computational characterization and prediction of metal-organic framework properties
In this introductory review, we give an overview of the computational
chemistry methods commonly used in the field of metal-organic frameworks
(MOFs), to describe or predict the structures themselves and characterize their
various properties, either at the quantum chemical level or through classical
molecular simulation. We discuss the methods for the prediction of crystal
structures, geometrical properties and large-scale screening of hypothetical
MOFs, as well as their thermal and mechanical properties. A separate section
deals with the simulation of adsorption of fluids and fluid mixtures in MOFs.
Robust, reproducible, industrialized, standard membrane feeding assay for assessing the transmission blocking activity of vaccines and drugs against Plasmodium falciparum.
Background: A vaccine that interrupts malaria transmission (VIMT) would be a valuable tool for malaria control and elimination. One VIMT approach is to identify sexual erythrocytic and mosquito stage antigens of the malaria parasite that induce immune responses targeted at disrupting parasite development in the mosquito. The standard Plasmodium falciparum membrane-feeding assay (SMFA) is used to assess transmission-blocking activity (TBA) of antibodies against candidate immunogens and of drugs targeting the mosquito stages. To develop its P. falciparum sporozoite (SPZ) products, Sanaria has industrialized the production of P. falciparum-infected Anopheles stephensi mosquitoes, incorporating quantitative analyses of oocyst and P. falciparum SPZ infections as part of the manufacturing process. Methods: These capabilities were exploited to develop a robust, reliable, consistent SMFA that was used to assess 188 serum samples from animals immunized with the candidate vaccine immunogen, Pfs25, targeting P. falciparum mosquito stages. Seventy-four independent SMFAs were performed. Infection intensity (number of oocysts/mosquito) and infection prevalence (percentage of mosquitoes infected with oocysts) were compared between mosquitoes fed cultured gametocytes plus normal human O(+) serum (negative control), anti-Pfs25 polyclonal antisera (MRA39 or MRA38, at a final dilution in the blood meal of 1:54 as positive control), and test sera from animals immunized with Pfs25 (at a final dilution in the blood meal of 1:9). Results: SMFA negative controls consistently yielded high infection intensity (mean = 46.1 oocysts/midgut, range of positives 3.7-135.6) and infection prevalence (mean = 94.2%, range 71.4-100.0). In positive controls, infection intensity was reduced by 81.6% (anti-Pfs25 MRA39) and 97.0% (anti-Pfs25 MRA38), and infection prevalence was reduced by 12.9% and 63.5%, respectively. A range of TBAs was detected among the 188 test samples assayed in duplicate. Consistent administration of infectious gametocytes to mosquitoes within and between assays was achieved, and the TBA of anti-Pfs25 control antibodies was highly reproducible. Conclusions: These results demonstrate a robust capacity to perform the SMFA in a medium-to-high throughput format, suitable for assessing large numbers of experimental samples of candidate antibodies or drugs.
Predicting yellow rust in wheat breeding trials by proximal phenotyping and machine learning
Background: High-throughput plant phenotyping (HTPP) methods have the potential to speed up the crop breeding process through the development of cost-effective, rapid and scalable phenotyping methods amenable to automation. Crop disease resistance breeding stands to benefit from successful implementation of HTPP methods, as bypassing the bottleneck posed by traditional visual phenotyping of disease enables the screening of larger and more diverse populations for novel sources of resistance. The aim of this study was to use HTPP data obtained through proximal phenotyping to predict yellow rust scores in a large winter wheat field trial. Results: The results show that 40-42 spectral vegetation indices (SVIs) derived from spectroradiometer data are sufficient to predict yellow rust scores using Random Forest (RF) modelling. The SVIs were selected through RF-based recursive feature elimination (RFE), and the predicted scores in the resulting models had a prediction accuracy of r(s) = 0.50-0.61 when measuring the correlation between predicted and observed scores. Some of the most important spectral features for prediction were the Plant Senescence Reflectance Index (PSRI), Photochemical Reflectance Index (PRI), Red-Green Pigment Index (RGI), and Greenness Index (GI). Conclusions: The proposed HTPP method of combining SVI data from spectral sensors in RF models has the potential to be deployed in wheat breeding trials to score yellow rust.
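The accuracy figure r(s) in the abstract above is read here as Spearman's rank correlation between predicted and observed scores, which is an interpretation rather than something the abstract states outright. A minimal Python sketch of that statistic follows; the function names and the five score values are made up for illustration and are not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical observed disease scores and model predictions.
observed = [1, 2, 3, 4, 5]
predicted = [1.2, 1.9, 3.5, 3.1, 4.8]
rho = spearman(observed, predicted)
print(round(rho, 3))  # 0.9
```

Because only the ordering matters, a model can reach a high rank correlation while still being miscalibrated in absolute score, which suits a breeding context where the goal is to rank lines by resistance.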
An automated high-throughput system for phenotypic screening of chemical libraries on C. elegans and parasitic nematodes
Parasitic nematodes infect hundreds of millions of people and farmed livestock. Further, plant parasitic nematodes result in major crop damage. The pipeline of therapeutic compounds is limited and parasite resistance to the existing anthelmintic compounds is a global threat. We have developed an INVertebrate Automated Phenotyping Platform (INVAPP) for high-throughput, plate-based chemical screening, and an algorithm (Paragon) which allows screening for compounds that have an effect on motility and development of parasitic worms. We have validated its utility by determining the efficacy of a panel of known anthelmintics against model and parasitic nematodes: Caenorhabditis elegans, Haemonchus contortus, Teladorsagia circumcincta, and Trichuris muris. We then applied the system to screen the Pathogen Box chemical library in a blinded fashion and identified compounds already known to have anthelmintic or anti-parasitic activity, including tolfenpyrad, auranofin, and mebendazole; and 14 compounds previously undescribed as anthelmintics, including benzoxaborole and isoxazole chemotypes. This platform offers an effective, high-throughput system for the discovery of novel anthelmintics.