98 research outputs found

    Data mining of magnetocardiograms for prediction of ischemic heart disease

    Ischemic heart disease (IHD) is a major cause of death. Early and accurate detection of IHD, along with rapid diagnosis, is important for reducing the mortality rate. The magnetocardiogram (MCG) is a tool for detecting electrophysiological activity of the myocardium. MCG is a fully non-contact method, which avoids the problems of skin-electrode contact in the electrocardiogram (ECG) method. However, the interpretation of MCG recordings is time-consuming and requires analysis by an expert. Therefore, we propose the use of machine learning for identification of IHD patients. The back-propagation neural network (BPNN), the Bayesian neural network (BNN), the probabilistic neural network (PNN) and the support vector machine (SVM) were applied to develop classification models for identifying IHD patients. MCG data were acquired from 125 cases by sequential measurement, above the torso, of the magnetic field emitted by the myocardium, using the J-T interval. The training and validation data (74 cases) were used with 10-fold cross-validation to optimize the support vector machine and neural network parameters. Predictive performance was assessed on the testing data (51 cases) using the following metrics: accuracy, sensitivity, specificity and area under the receiver operating characteristic (ROC) curve. The results demonstrated that BPNN and BNN shared the highest accuracy, 78.43 %. Furthermore, the decision threshold and the area under the ROC curve were -0.2774 and 0.9059, respectively, for BPNN and 0.0470 and 0.8495, respectively, for BNN. This indicated that BPNN was the best classification model, whereas BNN achieved the highest sensitivity (96.65 %) and SVM employing the radial basis function kernel displayed the highest specificity (86.36 %).
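
    The metrics named in this abstract (accuracy, sensitivity, specificity and area under the ROC curve) can be illustrated with a minimal Python sketch. The labels, scores and threshold below are made-up illustration values, not data from the study, and the function names are hypothetical.

```python
def threshold_scores(scores, threshold):
    """Turn continuous classifier outputs into 0/1 labels at a decision threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 3 positive and 3 negative cases with arbitrary scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.4, -0.1, 0.2, -0.5, -0.8]
y_pred = threshold_scores(scores, threshold=0.0)
print(classification_metrics(y_true, y_pred), roc_auc(y_true, scores))
```

    Accuracy, sensitivity and specificity depend on the chosen decision threshold, whereas the area under the ROC curve summarizes ranking quality across all thresholds, which is why an abstract may report both a threshold and an AUC per model.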

    Prediction of aromatase inhibitory activity using the efficient linear method (ELM)

    Aromatase inhibition is an effective treatment strategy for breast cancer. Several in silico methods have been developed for the prediction of aromatase inhibitors (AIs) using artificial neural networks (ANN) or support vector machines (SVM). Nevertheless, there are ample opportunities for further improvement by developing a simple and interpretable quantitative structure-activity relationship (QSAR) method. Herein, an efficient linear method (ELM) is proposed for constructing a highly predictive QSAR model containing a spontaneous feature importance estimator. Briefly, ELM is a linear-based model with optimal parameters derived from a genetic algorithm. Results showed that the simple ELM method displayed robust performance, with 10-fold cross-validation MCC values of 0.64 and 0.56 for steroidal and non-steroidal AIs, respectively. Comparative analyses with other machine learning methods (i.e. ANN, SVM and decision tree) were also performed. A thorough analysis of informative molecular descriptors for both steroidal and non-steroidal AIs provided insights into the mechanism of action of the compounds. Our findings suggest that the shape and polarizability of compounds may govern the inhibitory activity of both steroidal and non-steroidal types, whereas the terminal primary C(sp3) functional group and electronegativity may be required for non-steroidal AIs. The R code of the ELM method is available at http://dx.doi.org/10.6084/m9.figshare.1274030

    PyBact

    PyBact is software written in Python for bacterial identification. The code simulates the predefined behavior of bacterial species by generating a simulated data set based on the frequency table of biochemical tests from a diagnostic microbiology textbook. The generated data were used to construct predictive models by machine learning approaches, and results indicated that the classifiers could predict the respective bacterial class with accuracy in excess of 99 %.
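
    The idea of generating a simulated data set from a frequency table of biochemical tests can be sketched as follows. This is not PyBact's actual code: the species names, test names and frequencies are hypothetical placeholders, and each test result is assumed to be an independent positive/negative draw.

```python
import random

# Hypothetical frequency table: probability that a species is positive for each test.
FREQUENCY_TABLE = {
    "species_A": {"test_1": 0.99, "test_2": 0.01, "test_3": 0.50},
    "species_B": {"test_1": 0.05, "test_2": 0.95, "test_3": 0.90},
}


def simulate_profiles(table, n_per_species, seed=0):
    """Draw n_per_species binary test profiles per species from the frequency table."""
    rng = random.Random(seed)  # fixed seed so the simulated data set is reproducible
    rows = []
    for species, freqs in table.items():
        for _ in range(n_per_species):
            # Each test is positive (1) with the tabulated probability, else negative (0).
            profile = {test: int(rng.random() < p) for test, p in freqs.items()}
            rows.append((profile, species))
    return rows


rows = simulate_profiles(FREQUENCY_TABLE, n_per_species=1000)
print(len(rows), rows[0])
```

    The resulting (profile, species) pairs have the shape of a labeled training set that a classifier could be fitted to.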

    Data mining for the identification of metabolic syndrome status

    Metabolic syndrome (MS) is a condition associated with metabolic abnormalities that are characterized by central obesity (e.g. waist circumference or body mass index), hypertension (e.g. systolic or diastolic blood pressure), hyperglycemia (e.g. fasting plasma glucose) and dyslipidemia (e.g. triglyceride and high-density lipoprotein cholesterol). It is also associated with the development of type 2 diabetes mellitus (DM) and cardiovascular disease (CVD). Therefore, the rapid identification of MS is required to prevent the occurrence of such diseases. Herein, we review the utilization of data mining approaches for MS identification. Furthermore, the concept of the quantitative population-health relationship (QPHR) is presented, which can be defined as the elucidation/understanding of the relationship that exists between health parameters and health status. QPHR modeling uses data mining techniques such as artificial neural network (ANN), support vector machine (SVM), principal component analysis (PCA), decision tree (DT), random forest (RF) and association analysis (AA) to construct predictive models for MS characterization. The DT method has been found to outperform other data mining techniques in the identification of MS status. Moreover, the AA technique has proved useful in the discovery of in-depth as well as frequently occurring health parameters that can be used for revealing the rules of MS development. This review presents the potential benefits of applying data mining as a rapid identification tool for classifying MS.

    Classification of P-glycoprotein-interacting compounds using machine learning methods

    P-glycoprotein (Pgp) is a drug transporter that plays important roles in multidrug resistance and drug pharmacokinetics. The inhibition of Pgp has become a notable strategy for combating multidrug-resistant cancers and improving therapeutic outcomes. However, the polyspecific nature of Pgp, together with inconsistent results in experimental assays, renders the determination of endpoints for Pgp-interacting compounds a great challenge. In this study, the classification of a large set of 2,477 Pgp-interacting compounds (i.e., 1,341 inhibitors, 913 non-inhibitors, 197 substrates and 26 non-substrates) was performed using several machine learning methods (i.e., decision tree induction, artificial neural network modelling and support vector machine) as a function of their physicochemical properties. The models provided good predictive performance, producing MCC values in the range of 0.739-1 for internal cross-validation and 0.665-1 for external validation. The study provided simple and interpretable models for important properties that influence the activity of Pgp-interacting compounds, which are potentially beneficial for the screening and rational design of clinically important Pgp inhibitors.
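
    The MCC (Matthews correlation coefficient) values reported here are computed from a binary confusion matrix. A minimal sketch of the standard formula is given below; the function name `mcc` and the example counts are illustrative only, and returning 0 for an undefined denominator is a common convention assumed here, not something stated in the abstract.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


print(mcc(50, 50, 0, 0))    # perfect prediction
print(mcc(25, 25, 25, 25))  # no better than chance
print(mcc(0, 0, 50, 50))    # every prediction wrong
```

    MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it informative for imbalanced classes such as 197 substrates versus 26 non-substrates.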

    Probing the origins of aromatase inhibitory activity of disubstituted coumarins via QSAR and molecular docking

    This study investigated the quantitative structure-activity relationship (QSAR) of imidazole derivatives of 4,7-disubstituted coumarins as inhibitors of aromatase, a potential therapeutic protein target for the treatment of breast cancer. Herein, a series of 3,7- and 4,7-disubstituted coumarin derivatives (1-34) with R1 and R2 substituents bearing aromatase inhibitory activity was modeled as a function of molecular and quantum chemical descriptors derived from low-energy conformers geometrically optimized at the B3LYP/6-31G(d) level of theory. Insights into the origins of aromatase inhibitory activity were afforded by the computed set of 7 descriptors comprising F10[N-O], Inflammat-50, Psychotic-80, H-047, BELe1, B10[C-O] and MAXDP. These significant descriptors were used for QSAR model construction, and results indicated that model 4 afforded the best statistical performance. Good predictive performance was achieved, as verified from the internal (comprising the training and the leave-one-out cross-validation (LOO-CV) sets) and external sets, affording the following statistical parameters: R2Tr = 0.9576 and RMSETr = 0.0958 for the training set; Q2CV = 0.9239 and RMSECV = 0.1304 for the LOO-CV set; and Q2Ext = 0.7268 and RMSEExt = 0.2927 for the external set. Significant descriptors showed correlation with functional substituents, particularly R1, in governing high potency as aromatase inhibitors. Molecular docking calculations suggest that key residues interacting with the coumarins were predominantly lipophilic or non-polar, while a few were polar and positively charged. The findings presented herein can be used to rationally guide the design of new aromatase inhibitors.
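
    The training-set and leave-one-out statistics quoted in this abstract (R2, Q2 and RMSE) can be illustrated with a short sketch. For brevity it fits a straight line to a single made-up descriptor, whereas the study used a multi-descriptor model. Computing Q2 against the mean of the full data set is one common definition and is an assumption here.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2_rmse(y, y_hat):
    """Coefficient of determination and root mean squared error."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


def loo_cv_predictions(x, y):
    """Predict each point from a model fitted to all the other points."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


# Made-up descriptor values and activities, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(x, y)
r2_tr, rmse_tr = r2_rmse(y, [slope * xi + intercept for xi in x])
q2_cv, rmse_cv = r2_rmse(y, loo_cv_predictions(x, y))
print(r2_tr, rmse_tr, q2_cv, rmse_cv)
```

    For least-squares models the leave-one-out Q2 cannot exceed the training R2, which matches the ordering of the values reported in the abstract (0.9576 versus 0.9239).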

    Quantitative population-health relationship (QPHR) for assessing metabolic syndrome

    Background: Metabolic syndrome (MS) is a condition that predisposes individuals to the development of cardiovascular diseases and type 2 diabetes mellitus. Methods: A cross-sectional investigation of 15,365 participants residing in metropolitan Bangkok who had received an annual health checkup in 2007 was used in this study. Individuals were classified as MS or non-MS according to the International Diabetes Federation criteria using a BMI cutoff of ≥ 25 kg/m2 plus two or more MS components. This study explores the utility of the quantitative population-health relationship (QPHR) for predicting MS status as well as for discovering variables that frequently occur together. The former was achieved by decision tree (DT) analysis, artificial neural network (ANN), support vector machine (SVM) and principal component analysis (PCA), while the latter was obtained by association analysis (AA). Results: DT outperformed both ANN and SVM in MS classification, with an accuracy of 99 % compared to 98 % and 91 % for ANN and SVM, respectively. Furthermore, PCA was able to effectively separate individuals into MS and non-MS groups, as observed from the scores plot. Moreover, AA was employed to analyze individuals with MS in order to elucidate pertinent rules from MS components that frequently occur together, which included TG+BP, BP+FPG and TG+FPG, where TG, BP and FPG correspond to triglyceride, blood pressure and fasting plasma glucose, respectively. Conclusion: QPHR was demonstrated to be useful in predicting the MS status of individuals from an urban Thai population. Rules obtained from AA provided general guidelines (i.e. co-occurrences of TG, BP and FPG) that may be used in the prevention of MS in at-risk individuals.
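
    The co-occurrence rules mentioned in this abstract (TG+BP, BP+FPG and TG+FPG) are the kind of output obtained by counting how often pairs of components appear together. The sketch below computes pair support on a tiny made-up set of records. It is a simplified stand-in for association analysis, not the study's actual procedure or data.

```python
from collections import Counter
from itertools import combinations


def pair_support(records):
    """Fraction of records in which each pair of components occurs together."""
    counts = Counter()
    for components in records:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(components), 2):
            counts[pair] += 1
    return {pair: c / len(records) for pair, c in counts.items()}


# Hypothetical individuals, each listed with their abnormal MS components.
records = [
    {"TG", "BP"},
    {"BP", "FPG"},
    {"TG", "BP", "FPG"},
    {"TG", "FPG"},
]
support = pair_support(records)
for pair, value in sorted(support.items()):
    print(pair, value)
```

    Pairs whose support exceeds a chosen minimum would be reported as frequently co-occurring components.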

    A practical overview of quantitative structure-activity relationship

    Quantitative structure-activity relationship (QSAR) modeling pertains to the construction of predictive models of biological activities as a function of structural and molecular information of a compound library. The concept of QSAR has typically been used for drug discovery and development and has gained wide applicability for correlating molecular information with not only biological activities but also with other physicochemical properties, which has therefore been termed quantitative structure-property relationship (QSPR). Typical molecular parameters that are used to account for electronic properties, hydrophobicity, steric effects, and topology can be determined empirically through experimentation or theoretically via computational chemistry. A given compilation of data sets is then subjected to data preprocessing and data modeling through the use of statistical and/or machine learning techniques. This review aims to cover the essential concepts and techniques that are relevant for performing QSAR/QSPR studies through the use of selected examples from our previous work

    Synthesis and computational investigation of molecularly imprinted nanospheres for selective recognition of alpha-tocopherol succinate

    Molecularly imprinted polymers (MIPs) are macromolecular matrices that can mimic the functional properties of antibodies, receptors and enzymes while possessing higher durability. As such, these polymers are interesting materials for applications in biomimetic sensors, drug synthesis, drug delivery and separation. In this study, we prepared MIPs and molecularly imprinted nanospheres (MINs) as receptors with specific recognition properties toward tocopherol succinate (TPS) in comparison to tocopherol (TP) and tocopherol nicotinate (TPN). MIPs were synthesized using methacrylic acid (MAA) as the functional monomer, ethylene glycol dimethacrylate (EGDMA) as the crosslinking agent and dichloromethane or acetonitrile as the porogenic solvent under thermally induced polymerization conditions. Results indicated that the imprinted polymers TPS-MIP, TP-MIP and TPN-MIP all bound specifically to their template molecules at levels 2-fold greater than the non-imprinted polymers. The calculated binding capacity of all MIPs was approximately 2 mg per gram of polymer when using the optimal rebinding solvent EtOH:H2O (3:2, v/v). Furthermore, MINs toward TPS and TP were prepared by precipitation polymerization, which yielded particles 200-400 nm in size. The binding capacities of MINs to their templates were greater than those of the non-imprinted nanospheres when using the optimal rebinding solvent EtOH:H2O (4:1, v/v). Computer simulation was performed to provide mechanistic insights into the binding modalities of template-monomer complexes. In conclusion, we successfully prepared MIPs and MINs that bind specifically to TP and TPS. Such MIPs and MINs have great potential for industrial and medical applications, particularly for the selective separation of TP and TPS.

    QSAR-driven rational design of novel DNA methyltransferase 1 inhibitors

    DNA methylation, an epigenetic modification, is mediated by DNA methyltransferases (DNMTs), a family of enzymes. Inhibition of these enzymes is considered a promising strategy for the treatment of several diseases. In this study, quantitative structure-activity relationship (QSAR) modeling was employed to understand the structure-activity relationship (SAR) of currently available non-nucleoside DNMT1 inhibitors (i.e., indole and oxazoline/1,2-oxazole scaffolds). Two QSAR models were successfully constructed using multiple linear regression (MLR) and provided good predictive performance (R2Tr = 0.850-0.988 and R2CV = 0.672-0.869). The bond information content index (BIC1) and electronegativity (R6e+) are the most influential descriptors governing the activity of the compounds. The constructed QSAR models were further applied to guide the rational design of novel inhibitors. A novel set of 153 structurally modified compounds was designed in silico according to the important descriptors deduced from the QSAR findings, and their DNMT1 inhibitory activities were predicted. The results demonstrated that 86 newly designed inhibitors were predicted to elicit enhanced DNMT1 inhibitory activity compared to their parent compounds. Finally, a set of promising compounds was highlighted as potent DNMT1 inhibitors for further development. The key SAR findings may also be beneficial for structural optimization to improve the properties of the known inhibitors.