317 research outputs found

    CAESAR models for developmental toxicity

    Get PDF
    [Abstract] Background: The new REACH legislation requires the assessment of a large number of chemicals on the European market for several endpoints. Developmental toxicity is one of the most difficult endpoints to assess, on account of the complexity, length and cost of the experiments. Following the encouragement of QSAR (in silico) methods in REACH itself, the CAESAR project has developed several models. Results: Two QSAR models for developmental toxicity have been developed using different statistical/mathematical methods, and both performed well. The first performs classification based on a random forest algorithm, while the second is based on an adaptive fuzzy partition algorithm. The first model has been implemented in the CAESAR on-line application, a Java-based tool that allows anyone to use the models freely. Conclusions: The CAESAR QSAR models have been developed with the aim of minimizing false negatives, in order to make them more usable for REACH. The CAESAR on-line application ensures that both industry and regulators can easily access and use the developmental toxicity model (as well as the models for the other four endpoints).
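    The abstract names a random forest as the first model's algorithm but gives no code. Below is a minimal sketch of that general approach in Python with scikit-learn, using synthetic descriptors and a lowered decision threshold to curb false negatives; the data, threshold, and hyperparameters are illustrative assumptions, not the CAESAR implementation.

```python
# Minimal sketch (not the CAESAR implementation): a random-forest
# classifier for a binary developmental-toxicity endpoint.
# Descriptors and labels below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                  # placeholder molecular descriptors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # placeholder toxicity labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_tr, y_tr)

# To minimize false negatives (the stated CAESAR design goal), lower the
# decision threshold on the predicted toxicity probability; 0.3 is an assumption.
proba = model.predict_proba(X_te)[:, 1]
y_pred = (proba >= 0.3).astype(int)
print(confusion_matrix(y_te, y_pred))           # false negatives: row 1, col 0
```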

    PREDICTIVE DIAGNOSIS THROUGH DATA MINING FOR CARDIOVASCULAR DISEASES

    Get PDF
    [Abstract] Cardiovascular diseases (CVDs) are a leading cause of mortality worldwide, and early detection and accurate diagnosis are critical for effective treatment and prevention. Data mining techniques have emerged as powerful tools for analyzing large datasets to extract meaningful patterns and make predictions. This research paper aims to explore the application of data mining in predictive diagnosis for cardiovascular diseases. The study will start by collecting a comprehensive dataset comprising patient information, including demographics, medical history, lifestyle factors, and diagnostic test results. Various data mining techniques, such as classification, clustering, and association rule mining, will be applied to uncover hidden patterns and relationships within the data. Feature selection methods will be employed to identify the most relevant attributes for accurate prediction. The research will investigate different predictive models, including decision trees, support vector machines, and neural networks, to develop a reliable diagnostic system. Model performance will be evaluated using metrics such as accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC-ROC). Additionally, the study will employ cross-validation techniques to ensure the generalizability and robustness of the developed models. The research will explore the integration of advanced techniques, such as deep learning and ensemble methods, to enhance the predictive accuracy of the diagnosis. The use of explainable AI techniques will also be considered to provide interpretable insights into the predictive models' decision-making process. The findings of this research will contribute to the advancement of predictive diagnosis for cardiovascular diseases by leveraging data mining techniques. The developed diagnostic models will assist healthcare professionals in making accurate and timely predictions, leading to improved patient outcomes, personalized treatment plans, and effective preventive measures.
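    As a concrete illustration of the evaluation protocol the abstract describes (cross-validation scored by accuracy, sensitivity, specificity, and AUC-ROC), here is a hedged scikit-learn sketch on synthetic stand-in data; the classifier choice and dataset are assumptions, not the study's.

```python
# Illustrative sketch (not from the paper): cross-validated evaluation of a
# classifier on tabular patient-like data, reporting the metrics the abstract names.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import make_scorer, recall_score

# Placeholder stand-in for demographics / history / lifestyle / test features.
X, y = make_classification(n_samples=600, n_features=15, random_state=0)

scoring = {
    "accuracy": "accuracy",
    "sensitivity": "recall",                                 # recall of positives
    "specificity": make_scorer(recall_score, pos_label=0),   # recall of negatives
    "auc_roc": "roc_auc",
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(DecisionTreeClassifier(random_state=0), X, y,
                        cv=cv, scoring=scoring)
for name in scoring:
    print(name, scores[f"test_{name}"].mean().round(3))
```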

    Application of Multivariate Adaptive Regression Splines (MARSplines) for Predicting Hansen Solubility Parameters Based on 1D and 2D Molecular Descriptors Computed from SMILES String

    Full text link
    A new method of Hansen solubility parameters (HSPs) prediction was developed by combining the multivariate adaptive regression splines (MARSplines) methodology with a simple multivariable regression involving 1D and 2D PaDEL molecular descriptors. In order to adapt the MARSplines approach to QSPR/QSAR problems, several optimization procedures were proposed and tested. The effectiveness of the obtained models was checked via standard QSPR/QSAR internal validation procedures provided by the QSARINS software and by predicting the solubility classification of polymers and drug-like solid solutes in collections of solvents. By utilizing information derived only from SMILES strings, the obtained models allow for computing all three Hansen solubility parameters: dispersion, polarization, and hydrogen bonding. Although several descriptors are required for proper parameter estimation, the proposed procedure is simple and straightforward and does not require molecular geometry optimization. The obtained HSP values are highly correlated with experimental data, and their application to solubility problems leads to essentially the same quality as the original parameters. Based on the provided models, it is possible to characterize any solvent or liquid solute for which HSP data are unavailable.
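    For readers who want to try a MARS model themselves, the sketch below uses the open-source py-earth package as a stand-in for the MARSplines methodology, with synthetic data in place of PaDEL descriptors and a single HSP component; the package choice and all data are assumptions, not the authors' pipeline.

```python
# Sketch with the open-source MARS implementation py-earth
# (pip install sklearn-contrib-py-earth); data are synthetic placeholders.
import numpy as np
from pyearth import Earth

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 10))    # placeholder 1D/2D descriptors
# Placeholder target standing in for one HSP component (e.g., dispersion).
y = 2.0 * np.maximum(0, X[:, 0] - 0.2) + X[:, 1] ** 2 + rng.normal(0, 0.05, 300)

model = Earth(max_degree=2)               # allow hinge-function interactions
model.fit(X, y)
print(model.summary())                    # basis functions kept by the forward/backward pass
print(model.predict(X[:5]))
```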

    Exploring Patterns of Epigenetic Information With Data Mining Techniques

    Get PDF
    [Abstract] Data mining, a part of the Knowledge Discovery in Databases (KDD) process, is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of epigenetic data have evolved towards genome-wide and high-throughput approaches, thus generating great amounts of data for which data mining is essential. Part of these data may contain patterns of epigenetic information which are mitotically and/or meiotically heritable, determining gene expression and cellular differentiation as well as cellular fate. Epigenetic lesions and genetic mutations are acquired by individuals during their life and accumulate with ageing. Both defects, either together or individually, can result in losing control over cell growth and, thus, cause cancer development. Data mining techniques could then be used to extract such patterns. This work reviews some of the most important applications of data mining to epigenetics.
    Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo; 209RT-0366
    Galicia. Consellería de Economía e Industria; 10SIN105004PR
    Instituto de Salud Carlos III; RD07/0067/000
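    By way of example (not drawn from the review itself), one of the simplest pattern-extraction steps it covers might look as follows on synthetic DNA-methylation data; the data, the two simulated groups, and the cluster count are assumptions.

```python
# Illustrative sketch: unsupervised pattern extraction from synthetic
# DNA-methylation beta values, a common data-mining step in epigenetics.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Placeholder beta values in [0, 1]: 100 samples x 50 CpG sites, with two
# simulated methylation patterns (e.g., tumor-like vs. normal-like).
group_a = rng.beta(2, 8, size=(50, 50))   # mostly unmethylated sites
group_b = rng.beta(8, 2, size=(50, 50))   # mostly methylated sites
X = np.vstack([group_a, group_b])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))                # cluster sizes should recover the two patterns
```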

    Application of MIA-QSAR in Designing New Protein P38 MAP Kinase Compounds Using a Genetic Algorithm

    Get PDF
    A multivariate image analysis quantitative structure-activity relationship (MIA-QSAR) study aims to obtain information from a descriptor set consisting of the image pixels of two-dimensional molecular structures. In this QSAR study of protein P38 mitogen-activated protein (MAP) kinase compounds, the application of a genetic algorithm for pixel selection and image processing is investigated. There is a quantitative relationship between the structure and the pIC50 based on the information obtained. (The pIC50 is the negative logarithm of the half-maximal inhibitory concentration (IC50), so pIC50 = −log IC50.) Protein P38 MAP kinase inhibitors are used in the treatment of malignant tumors, and a model to predict the pIC50 of these compounds was developed in this study. To accomplish this, the molecules were first drawn and fixed in the same coordinates in ChemSketch. Then, the images were processed in MATLAB. Partial least squares (PLS), orthogonal signal correction partial least squares (OSC-PLS), and genetic algorithm partial least squares (GA-PLS) methods were used to generate quantitative models and predict the pIC50. The GA-PLS model has the highest predictive power as judged by statistical parameters such as the root mean square error of prediction (RMSEP) and the relative standard error of prediction (RSEP). Finally, molecular docking of the molecules predicted in the quantitative structure-activity relationship (QSAR) study was performed against an appropriate receptor, with acceptable results. These results support the prediction of compounds with improved properties.
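    The paper's GA-PLS pipeline was built in MATLAB; the sketch below is a hedged Python analogue of genetic-algorithm feature (pixel) selection wrapped around a PLS model, with a synthetic pixel matrix and pIC50 vector and a deliberately simple generational GA. Population size, rates, and data are all assumptions.

```python
# Sketch in the spirit of GA-PLS: a boolean mask over "pixel" columns is
# evolved to maximize the cross-validated R^2 of a PLS regression.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 60))                                # placeholder image-pixel descriptors
y = X[:, :5] @ rng.normal(size=5) + rng.normal(0, 0.1, 80)   # placeholder pIC50 values

def fitness(mask):
    if mask.sum() < 3:                       # PLS needs enough selected columns
        return -np.inf
    model = PLSRegression(n_components=2)
    return cross_val_score(model, X[:, mask], y, cv=5, scoring="r2").mean()

pop = rng.random((20, X.shape[1])) < 0.2     # 20 random feature masks
for _ in range(15):                          # a few GA generations
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]  # truncation selection
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, X.shape[1])
        child = np.concatenate([a[:cut], b[cut:]])    # one-point crossover
        child ^= rng.random(X.shape[1]) < 0.02        # bit-flip mutation
        children.append(child)
    pop = np.array(children)

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected pixels:", np.flatnonzero(best))
```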

    Granular Support Vector Machines Based on Granular Computing, Soft Computing and Statistical Learning

    Get PDF
    With the emergence of biomedical informatics, Web intelligence, and e-business, new challenges are arising for knowledge discovery and data mining modeling problems. In this dissertation work, a framework named Granular Support Vector Machines (GSVM) is proposed to systematically and formally combine statistical learning theory, granular computing theory and soft computing theory to address challenging predictive data modeling problems effectively and/or efficiently, with a specific focus on binary classification problems. In general, GSVM works in three steps. Step 1 is granulation: building a sequence of information granules from the original dataset or from the original feature space. Step 2 is modeling: training Support Vector Machines (SVMs) in some of these information granules where necessary. Finally, step 3 is aggregation: consolidating the information in these granules at a suitable abstraction level. A good granulation method that finds suitable granules is crucial for modeling a good GSVM. Under this framework, many different granulation algorithms, including the GSVM-CMW (cumulative margin width) algorithm, the GSVM-AR (association rule mining) algorithm, a family of GSVM-RFE (recursive feature elimination) algorithms, the GSVM-DC (data cleaning) algorithm and the GSVM-RU (repetitive undersampling) algorithm, are designed for binary classification problems with different characteristics. Empirical studies in the biomedical domain and many other application domains demonstrate that the framework is promising. As a preliminary step, this dissertation work will be extended in the future to build a Granular Computing based Predictive Data Modeling framework (GrC-PDM) with which hybrid adaptive intelligent data mining systems for high-quality prediction can be created.
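    A minimal sketch of the three GSVM steps follows, assuming k-means granulation, an RBF SVM per mixed-class granule, and nearest-granule routing for aggregation; the dissertation's granulation algorithms (GSVM-CMW, GSVM-AR, GSVM-RFE, GSVM-DC, GSVM-RU) are considerably more elaborate than this illustration.

```python
# Minimal sketch of the granulate / model / aggregate pattern.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Step 1: granulation - partition the input space into information granules.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Step 2: modeling - fit an SVM inside each granule that contains both classes.
models = {}
for g in range(4):
    idx = kmeans.labels_ == g
    if len(np.unique(y[idx])) == 2:          # an SVM needs both classes present
        models[g] = SVC(kernel="rbf").fit(X[idx], y[idx])

# Step 3: aggregation - route each query to its granule's local model,
# falling back to the majority class of single-class granules.
def predict(x):
    g = int(kmeans.predict(x.reshape(1, -1))[0])
    if g in models:
        return int(models[g].predict(x.reshape(1, -1))[0])
    idx = kmeans.labels_ == g
    return int(np.bincount(y[idx]).argmax())

print(predict(X[0]), y[0])
```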

    Evolutionary Computation and QSAR Research

    Get PDF
    [Abstract] The successful high-throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening rely greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with predicted toxic effects and poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning and artificial intelligence. QSAR modeling relies on three main steps: codification of the molecular structure into molecular descriptors, selection of the variables relevant to the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. Thus, this review explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, selection methods for high-dimensional data in QSAR, methods to build QSAR models, current evolutionary feature selection methods and their applications in QSAR, and future trends in joint or multi-task feature selection methods.
    Instituto de Salud Carlos III; PIO52048
    Instituto de Salud Carlos III; RD07/0067/0005
    Ministerio de Industria, Comercio y Turismo; TSI-020110-2009-53
    Galicia. Consellería de Economía e Industria; 10SIN105004P
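    As an illustration of the evolutionary feature selection this review surveys, the sketch below uses the DEAP evolutionary-computation library to evolve a binary descriptor mask whose fitness is a cross-validated R² score; the synthetic data, the k-NN fitness model, and the GA settings are all assumptions, not prescriptions from the review.

```python
# Sketch of GA-based descriptor selection for QSAR with DEAP (pip install deap).
import random
import numpy as np
from deap import base, creator, tools, algorithms
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 30))                      # placeholder molecular descriptors
y = X[:, 0] - X[:, 3] + rng.normal(0, 0.1, 120)     # placeholder activity values

def evaluate(ind):
    mask = np.array(ind, dtype=bool)
    if mask.sum() == 0:
        return (-1.0,)
    score = cross_val_score(KNeighborsRegressor(), X[:, mask], y,
                            cv=5, scoring="r2").mean()
    return (score,)                                  # DEAP fitnesses are tuples

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)
toolbox = base.Toolbox()
toolbox.register("bit", random.randint, 0, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.bit, n=X.shape[1])
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxTwoPoint)           # two-point crossover
toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)
toolbox.register("select", tools.selTournament, tournsize=3)

pop = toolbox.population(n=30)
pop, _ = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2,
                             ngen=15, verbose=False)
best = tools.selBest(pop, 1)[0]
print("selected descriptors:", np.flatnonzero(best))
```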