10,264 research outputs found

    Development of models for predicting Torsade de Pointes cardiac arrhythmias using perceptron neural networks

    Full text link
    Blockage of some ion channels and in particular, the hERG cardiac potassium channel delays cardiac repolarization and can induce arrhythmia. In some cases it leads to a potentially life-threatening arrhythmia known as Torsade de Pointes (TdP). Therefore recognizing drugs with TdP risk is essential. Candidate drugs that are determined not to cause cardiac ion channel blockage are more likely to pass successfully through clinical phases II and III trials (and preclinical work) and not be withdrawn even later from the marketplace due to cardiotoxic effects. The objective of the present study is to develop an SAR model that can be used as an early screen for torsadogenic (causing TdP arrhythmias) potential in drug candidates. The method is performed using descriptors comprised of atomic NMR chemical shifts and corresponding interatomic distances which are combined into a 3D abstract space matrix. The method is called 3D-SDAR (3 dimensional spectral data-activity relationship) and can be interrogated to identify molecular features responsible for the activity, which can in turn yield simplified hERG toxicophores. A dataset of 55 hERG potassium channel inhibitors collected from Kramer et al. consisting of 32 drugs with TdP risk and 23 with no TdP risk was used for training the 3D-SDAR model.An ANN model with multilayer perceptron was used to define collinearities among the independent 3D-SDAR features. A composite model from 200 random iterations with 25% of the molecules in each case yielded the following figures of merit: training, 99.2 %; internal test sets, 66.7%; external (blind validation) test set, 68.4%. In the external test set, 70.3% of positive TdP drugs were correctly predicted. Moreover, toxicophores were generated from TdP drugs. A 3D-SDAR was successfully used to build a predictive model for drug-induced torsadogenic and non-torsadogenic drugs.Comment: Accepted for publication in BMC Bioinformatics (Springer) July 201

    Fault diagnosis and comparing risk for the steel coil manufacturing process using statistical models for binary data

    Full text link
    [EN] Advanced statistical models can help industry to design more economical and rational investment plans. Fault detection and diagnosis is an important problem in continuous hot dip galvanizing. Increasingly stringent quality requirements in the automotive industry also require ongoing efforts in process control to make processes more robust. Robust methods for estimating the quality of galvanized steel coils are an important tool for the comprehensive monitoring of the performance of the manufacturing process. This study applies different statistical regression models: generalized linear models, generalized additive models and classification trees to estimate the quality of galvanized steel coils on the basis of short time histories. The data, consisting of 48 galvanized steel coils, was divided into sets of conforming and nonconforming coils. Five variables were selected for monitoring the process: steel strip velocity and four bath temperatures. The present paper reports a comparative evaluation of statistical models for binary data using Receiver Operating Characteristic (ROC) curves. A ROC curve is a graph or a technique for visualizing, organizing and selecting classifiers based on their performance. The purpose of this paper is to examine their use in research to obtain the best model to predict defective steel coil probability. In relation to the work of other authors who only propose goodness of fit statistics, we should highlight one distinctive feature of the methodology presented here, which is the possibility of comparing the different models with ROC graphs which are based on model classification performance. Finally, the results are validated by bootstrap procedures.The authors are indebted to the anonymous referees whose suggestions improved the original manuscript. This work was supported by a grant from PAID-06-08 (Programa de Apoyo a la Investigacion y Desarrollo) of the Universitat Politecnica de Valencia.DebĂłn Aucejo, AM.; GarcĂ­a-DĂ­az, JC. (2012). Fault diagnosis and comparing risk for the steel coil manufacturing process using statistical models for binary data. Reliability Engineering and System Safety. 100:102-114. https://doi.org/10.1016/j.ress.2011.12.022S10211410

    The Search for Extraterrestrial Intelligence (SETI)

    Get PDF
    A bibliography of reports concerning the Search for Extraterrestrial Intelligence is presented. Cosmic evolution, space communication, and technological advances are discussed along with search strategies and search systems

    Identification and Classification of Player Types in Massive Multiplayer Online Games using Avatar Behavior

    Get PDF
    The purpose of our research is to develop an improved methodology for classifying players (identifying deviant players such as terrorists) through multivariate analysis of data from avatar characteristics and behaviors in massive multiplayer online games (MMOGs). To build our classification models, we developed three significant enhancements to the standard Generalized Regression Neural Networks (GRNN) modeling method. The first enhancement is a feature selection technique based on GRNNs, allowing us to tailor our feature set to be best modeled by GRNNs. The second enhancement is a hybrid GRNN which allows each feature to be modeled by a GRNN tailored to its data type. The third enhancement is a spread estimation technique for large data sets that is faster than exhaustive searches, yet more accurate than a standard heuristic. We applied our new techniques to a set of data from the MMOG, Everquest II, to identify deviant players (\u27gold farmers\u27). The identification of gold farmers is similar to labeling terrorists in that the ratio of gold farmer to standard player is extremely small, and the in-game behaviors for a gold farmer have detectable differences from a standard player. Our results were promising given the difficulty of the classification process, primarily the extremely unbalanced data set with a small number of observations from the class of interest. As a screening tool our method identifies a significantly reduced set of avatars and associated players with a much improved probability of containing a number of players displaying deviant behaviors. With further efforts at improving computing efficiencies to allow inclusion of additional features and observations with our framework, we expect even better results

    Optimized Machine Learning Performance with Feature Selection for Breast Cancer Disease Classification

    Get PDF
    The prevalence of breast cancer is relatively high among adults worldwide. Particularly in Indonesia, according to the latest data from the World Health Organization (WHO), breast cancer accounts for 1.41% of all deaths and continues to increase. In order to address this growing issue, a proactive approach becomes essential. Therefore, the objective of this study is to classify the diagnosis of breast cancer into two categories: Benign and Malignant. Moreover, this classification pattern can serve as a benchmark for early detection and is expected to reduce mortality and cancer rates in breast cancer cases. The dataset used in this study is obtained from Kaggle and consists of 569 rows with 32 attributes. Various machine learning algorithms, such as Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), and NaĂŻve Bayes (NB), are employed for the classification analysis in this disease. . This study uses Principal Component Analysis (PCA) for optimized feature selection techniques with dimension reduction are employed on the dataset prior to modeling the data. Our highest accuracy model is the Support Vector Machine (SVM) with an RBF kernel, utilizing c-value selection. Additionally, the Logistic Regression (LR) model achieves an accuracy of 97.3%. However, it is worth noting that the precision and recall of the SVM model are both 100%. Moreover, the Receiver Operating Characteristic (ROC) curve indicates that the SVM graph surpasses the LR graph, which can be attributed to the results obtained from the confusion matrix calculation, where the False Positive Rate is found to be 0. Consequently, the overall performance evaluation of the SVM model with an RBF kernel, along with the utilization of the c-value selection approach, is significantly superior. This is primarily due to the fact that the SVM model does not make any incorrect predictions by classifying something as positive when it is actually negative

    Prediction of Concurrent Hypertensive Disorders in Pregnancy and Gestational Diabetes Mellitus Using Machine Learning Techniques

    Get PDF
    Gestational diabetes mellitus and hypertensive disorders in pregnancy are serious maternal health conditions with immediate and lifelong mother-child health consequences. These obstetric pathologies have been widely investigated, but mostly in silos, while studies focusing on their simultaneous occurrence rarely exist. This is especially the case in the machine learning domain. This retrospective study sought to investigate, construct, evaluate, compare, and isolate a supervised machine learning predictive model for the binary classification of co-occurring gestational diabetes mellitus and hypertensive disorders in pregnancy in a cohort of otherwise healthy pregnant women. To accomplish the stated aims, this study analyzed an extract (n=4624, n_features=38) of a labelled maternal perinatal dataset (n=9967, n_fields=79) collected by the PeriData.Net® database from a participating community hospital in Southeast Wisconsin between 2013 and 2018. The datasets were named, “WiseSample” and “WiseSubset” respectively in this study. Thirty-three models were constructed with the six supervised machine learning algorithms explored on the extracted dataset: logistic regression, random forest, decision tree, support vector machine, StackingClassifier, and KerasClassifier, which is a deep learning classification algorithm; all were evaluated using the StratifiedKfold cross-validation (k=10) method. The Synthetic Minority Oversampling Technique was applied to the training data to resolve the class imbalance that was noted in the sub-sample at the preprocessing phase. A wide range of evidence-based feature selection techniques were used to identify the best predictors of the comorbidity under investigation. Multiple model performance evaluation metrics that were employed to quantitatively evaluate and compare model performance quality include accuracy, F1, precision, recall, and the area under the receiver operating characteristic curve. Support Vector Machine objectively emerged as the most generalizable model for identifying the gravidae in WiseSubset who may develop concurrent gestational diabetes mellitus and hypertensive disorders in pregnancy, scoring 100.00% (mean) in recall. The model consisted of 9 predictors extracted by the recursive feature elimination with cross-validation with random forest. Finding from this study show that appropriate machine learning methods can reliably predict comorbid gestational diabetes and hypertensive disorders in pregnancy, using readily available routine prenatal attributes. Six of the nine most predictive factors of the comorbidity were also in the top 6 selections of at least one other feature selection method examined. The six predictors are healthy weight prepregnancy BMI, mother’s educational status, husband’s educational status, husband’s occupation in one year before the current pregnancy, mother’s blood group, and mother’s age range between 34 and 44 years. Insight from this analysis would support clinical decision making of obstetric experts when they are caring for 1.) nulliparous women, since they would have no obstetric history that could prompt their care providers for feto-maternal medical surveillance; and 2.) the experienced mothers with no obstetric history suggestive of any of the disease(s) under this study. Hence, among other benefits, the artificial-intelligence-backed tool designed in this research would likely improve maternal and child care quality outcomes

    Classification of Missing Youths Cases using Support Vector Machines

    Get PDF
    The purpose of this study is to provide the Saskatoon Police Service (SPS) with a set of predictive models for intervention and risk reduction applied to the missing youths (MY) database in a graphical user interface (GUI)
    • …
    corecore