10,264 research outputs found
Development of models for predicting Torsade de Pointes cardiac arrhythmias using perceptron neural networks
Blockage of some ion channels and in particular, the hERG cardiac potassium
channel delays cardiac repolarization and can induce arrhythmia. In some cases
it leads to a potentially life-threatening arrhythmia known as Torsade de
Pointes (TdP). Therefore recognizing drugs with TdP risk is essential.
Candidate drugs that are determined not to cause cardiac ion channel blockage
are more likely to pass successfully through clinical phases II and III trials
(and preclinical work) and not be withdrawn even later from the marketplace due
to cardiotoxic effects. The objective of the present study is to develop an SAR
model that can be used as an early screen for torsadogenic (causing TdP
arrhythmias) potential in drug candidates. The method is performed using
descriptors comprised of atomic NMR chemical shifts and corresponding
interatomic distances which are combined into a 3D abstract space matrix. The
method is called 3D-SDAR (3 dimensional spectral data-activity relationship)
and can be interrogated to identify molecular features responsible for the
activity, which can in turn yield simplified hERG toxicophores. A dataset of 55
hERG potassium channel inhibitors collected from Kramer et al. consisting of 32
drugs with TdP risk and 23 with no TdP risk was used for training the 3D-SDAR
model. An ANN model with multilayer perceptron was used to define collinearities
among the independent 3D-SDAR features. A composite model from 200 random
iterations with 25% of the molecules in each case yielded the following figures
of merit: training, 99.2%; internal test sets, 66.7%; external (blind
validation) test set, 68.4%. In the external test set, 70.3% of positive TdP
drugs were correctly predicted. Moreover, toxicophores were generated from TdP
drugs. The 3D-SDAR approach was successfully used to build a predictive model that
distinguishes torsadogenic from non-torsadogenic drugs. Comment: Accepted for publication in BMC Bioinformatics (Springer) July 201
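The repeated random-split procedure in the abstract above can be sketched as follows. This is a minimal illustration only: it assumes that "25% of the molecules" means a 25% hold-out per iteration, it swaps in a nearest-centroid classifier for the perceptron network, and the names `composite_holdout`, `fit_centroids`, and the toy data are hypothetical.

```python
import random

def fit_centroids(X, y):
    # Stand-in learner (NOT the paper's perceptron network): one mean vector per class.
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_centroid(centroids, x):
    # Assign the class whose centroid is closest in squared Euclidean distance.
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], x))

def composite_holdout(X, y, n_iter=200, holdout_frac=0.25, seed=0):
    # Assumption: each iteration holds out holdout_frac of the samples at random.
    rng = random.Random(seed)
    n = len(X)
    n_hold = max(1, round(holdout_frac * n))
    votes = [[] for _ in range(n)]  # held-out predictions collected per sample
    for _ in range(n_iter):
        held = set(rng.sample(range(n), n_hold))
        train = [i for i in range(n) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in held:
            votes[i].append(predict_centroid(model, X[i]))
    # Composite prediction: majority vote over the iterations in which a sample was held out.
    composite = [max(set(v), key=v.count) if v else None for v in votes]
    scored = [(c, t) for c, t in zip(composite, y) if c is not None]
    accuracy = sum(c == t for c, t in scored) / len(scored)
    return composite, accuracy

# Toy, well-separated data (illustrative only, not the 55-compound dataset).
toy_X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0],
         [5.0, 5.1], [5.2, 5.0], [5.1, 5.2], [5.0, 5.0]]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]
toy_composite, toy_accuracy = composite_holdout(toy_X, toy_y)
```

Scoring only held-out predictions is what separates an internal test figure from the training figure; a large gap between the two, as reported above, usually points to overfitting.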
Fault diagnosis and comparing risk for the steel coil manufacturing process using statistical models for binary data
Advanced statistical models can help industry to design more economical and rational investment
plans. Fault detection and diagnosis is an important problem in continuous hot dip galvanizing.
Increasingly stringent quality requirements in the automotive industry also require ongoing efforts
in process control to make processes more robust. Robust methods for estimating the quality of
galvanized steel coils are an important tool for the comprehensive monitoring of the performance of the
manufacturing process. This study applies different statistical regression models: generalized linear
models, generalized additive models and classification trees to estimate the quality of galvanized steel
coils on the basis of short time histories. The data, consisting of 48 galvanized steel coils, was divided
into sets of conforming and nonconforming coils. Five variables were selected for monitoring the
process: steel strip velocity and four bath temperatures.
The present paper reports a comparative evaluation of statistical models for binary data using
Receiver Operating Characteristic (ROC) curves. A ROC curve is a graph or a technique for visualizing,
organizing and selecting classifiers based on their performance. The purpose of this paper is to examine
their use in selecting the model that best predicts the probability of a defective steel coil. In relation to
the work of other authors who only propose goodness of fit statistics, we should highlight one distinctive
feature of the methodology presented here, which is the possibility of comparing the different models
with ROC graphs which are based on model classification performance. Finally, the results are validated
by bootstrap procedures.
The authors are indebted to the anonymous referees whose suggestions improved the original manuscript. This work was supported by a grant from PAID-06-08 (Programa de Apoyo a la Investigacion y Desarrollo) of the Universitat Politecnica de Valencia.
Debón Aucejo, AM.; García-Díaz, JC. (2012). Fault diagnosis and comparing risk for the steel coil manufacturing process using statistical models for binary data. Reliability Engineering and System Safety. 100:102-114. https://doi.org/10.1016/j.ress.2011.12.022
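As a rough illustration of the ROC-based comparison and bootstrap validation described in the abstract above, the sketch below computes an ROC curve, its area (AUC), and a percentile bootstrap interval for the AUC. The function names and the four-sample toy data are hypothetical, and the percentile bootstrap is one common choice, not necessarily the procedure used in the paper.

```python
import random

def roc_points(y_true, scores):
    # Sweep the decision threshold from high to low; return (FPR, TPR) pairs.
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    # Trapezoidal area under the ROC curve.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def bootstrap_auc_interval(y_true, scores, n_boot=1000, seed=0):
    # Percentile bootstrap: resample cases with replacement and recompute the AUC.
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to define an ROC curve
            aucs.append(auc(roc_points(ys, [scores[i] for i in idx])))
    aucs.sort()
    return aucs[int(0.025 * n_boot)], aucs[int(0.975 * n_boot) - 1]

# Toy scores (illustrative only): 1 = nonconforming coil, 0 = conforming coil.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(roc_points(toy_labels, toy_scores))
toy_low, toy_high = bootstrap_auc_interval(toy_labels, toy_scores, n_boot=200)
```

Running the same functions on the scores of each candidate model (for example a generalized linear model and a classification tree) gives AUCs and intervals that can be compared directly.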
The Search for Extraterrestrial Intelligence (SETI)
A bibliography of reports concerning the Search for Extraterrestrial Intelligence is presented. Cosmic evolution, space communication, and technological advances are discussed along with search strategies and search systems
Identification and Classification of Player Types in Massive Multiplayer Online Games using Avatar Behavior
The purpose of our research is to develop an improved methodology for classifying players (identifying deviant players such as terrorists) through multivariate analysis of data from avatar characteristics and behaviors in massive multiplayer online games (MMOGs). To build our classification models, we developed three significant enhancements to the standard Generalized Regression Neural Networks (GRNN) modeling method. The first enhancement is a feature selection technique based on GRNNs, allowing us to tailor our feature set to be best modeled by GRNNs. The second enhancement is a hybrid GRNN which allows each feature to be modeled by a GRNN tailored to its data type. The third enhancement is a spread estimation technique for large data sets that is faster than exhaustive searches, yet more accurate than a standard heuristic. We applied our new techniques to a set of data from the MMOG, Everquest II, to identify deviant players ('gold farmers'). The identification of gold farmers is similar to labeling terrorists in that the ratio of gold farmer to standard player is extremely small, and the in-game behaviors for a gold farmer have detectable differences from a standard player. Our results were promising given the difficulty of the classification process, primarily the extremely unbalanced data set with a small number of observations from the class of interest. As a screening tool our method identifies a significantly reduced set of avatars and associated players with a much improved probability of containing a number of players displaying deviant behaviors. With further efforts at improving computing efficiencies to allow inclusion of additional features and observations with our framework, we expect even better results.
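For readers unfamiliar with the baseline method, a standard GRNN prediction is a Gaussian-kernel weighted average of the training targets, controlled by a single spread parameter. The sketch below shows that baseline together with a plain leave-one-out search over candidate spreads. It does not reproduce the three enhancements described above; the function names and toy data are hypothetical.

```python
import math

def grnn_predict(train_X, train_y, x, spread):
    # Standard GRNN output: kernel-weighted mean of the training targets.
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(xi, x)) / (2 * spread ** 2))
        for xi in train_X
    ]
    total = sum(weights)
    if total == 0.0:
        # All kernels underflowed; fall back to the unweighted mean.
        return sum(train_y) / len(train_y)
    return sum(w * t for w, t in zip(weights, train_y)) / total

def leave_one_out_error(train_X, train_y, spread):
    # Mean squared error when each sample is predicted from all the others.
    errors = []
    for i in range(len(train_X)):
        rest_X = train_X[:i] + train_X[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        errors.append((grnn_predict(rest_X, rest_y, train_X[i], spread) - train_y[i]) ** 2)
    return sum(errors) / len(errors)

def pick_spread(train_X, train_y, candidates):
    # Exhaustive search over a candidate grid (the slow baseline, not the paper's estimator).
    return min(candidates, key=lambda s: leave_one_out_error(train_X, train_y, s))

# Toy one-feature data (illustrative only): 1 = deviant player, 0 = standard player.
toy_X = [[0.0], [1.0], [2.0], [3.0]]
toy_y = [0, 0, 1, 1]
toy_mid = grnn_predict(toy_X, toy_y, [1.5], spread=0.5)
toy_best = pick_spread(toy_X, toy_y, [0.1, 0.5, 1.0, 2.0])
```

The cost of the exhaustive search grows with both the number of candidates and the square of the sample count, which is why a faster spread estimate matters for large data sets.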
Optimized Machine Learning Performance with Feature Selection for Breast Cancer Disease Classification
The prevalence of breast cancer is relatively high among adults worldwide. Particularly in Indonesia, according to the latest data from the World Health Organization (WHO), breast cancer accounts for 1.41% of all deaths and continues to increase. In order to address this growing issue, a proactive approach becomes essential. Therefore, the objective of this study is to classify the diagnosis of breast cancer into two categories: Benign and Malignant. Moreover, this classification pattern can serve as a benchmark for early detection and is expected to reduce mortality and cancer rates in breast cancer cases. The dataset used in this study is obtained from Kaggle and consists of 569 rows with 32 attributes. Various machine learning algorithms, such as Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Naïve Bayes (NB), are employed for the classification analysis of this disease. Principal Component Analysis (PCA) is applied to the dataset for feature selection and dimension reduction prior to modeling. Our highest accuracy model is the Support Vector Machine (SVM) with an RBF kernel, utilizing c-value selection. Additionally, the Logistic Regression (LR) model achieves an accuracy of 97.3%. However, it is worth noting that the precision and recall of the SVM model are both 100%. Moreover, the Receiver Operating Characteristic (ROC) curve indicates that the SVM graph surpasses the LR graph, which can be attributed to the results obtained from the confusion matrix calculation, where the False Positive Rate is found to be 0. Consequently, the overall performance evaluation of the SVM model with an RBF kernel, along with the utilization of the c-value selection approach, is significantly superior. This is primarily due to the fact that the SVM model does not make any incorrect predictions by classifying something as positive when it is actually negative.
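The metrics discussed in the abstract above all derive from the four confusion-matrix counts. The sketch below makes the definitions explicit; the function name and the toy labels are hypothetical and are not results from the study.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    # Count the four confusion-matrix cells for a binary problem.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)

    def ratio(num, den):
        # Return 0.0 when a metric is undefined (empty denominator).
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),            # share of predicted positives that are correct
        "recall": ratio(tp, tp + fn),               # true positive rate
        "false_positive_rate": ratio(fp, fp + tn),  # x-axis of the ROC curve
    }

# Toy labels (illustrative only): 1 = malignant, 0 = benign.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0]
toy_metrics = confusion_metrics(toy_true, toy_pred)
```

In this toy case the false positive rate is 0, so precision is 1.0, yet one missed positive keeps recall at 0.75. A zero false positive rate therefore guarantees perfect precision but says nothing about recall on its own.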
Prediction of Concurrent Hypertensive Disorders in Pregnancy and Gestational Diabetes Mellitus Using Machine Learning Techniques
Gestational diabetes mellitus and hypertensive disorders in pregnancy are serious maternal health conditions with immediate and lifelong mother-child health consequences. These obstetric pathologies have been widely investigated, but mostly in silos, while studies focusing on their simultaneous occurrence rarely exist. This is especially the case in the machine learning domain. This retrospective study sought to investigate, construct, evaluate, compare, and isolate a supervised machine learning predictive model for the binary classification of co-occurring gestational diabetes mellitus and hypertensive disorders in pregnancy in a cohort of otherwise healthy pregnant women. To accomplish the stated aims, this study analyzed an extract (n=4624, n_features=38) of a labelled maternal perinatal dataset (n=9967, n_fields=79) collected by the PeriData.Net® database from a participating community hospital in Southeast Wisconsin between 2013 and 2018. The datasets were named, “WiseSample” and “WiseSubset” respectively in this study. Thirty-three models were constructed with the six supervised machine learning algorithms explored on the extracted dataset: logistic regression, random forest, decision tree, support vector machine, StackingClassifier, and KerasClassifier, which is a deep learning classification algorithm; all were evaluated using the StratifiedKfold cross-validation (k=10) method. The Synthetic Minority Oversampling Technique was applied to the training data to resolve the class imbalance that was noted in the sub-sample at the preprocessing phase. A wide range of evidence-based feature selection techniques were used to identify the best predictors of the comorbidity under investigation. Multiple model performance evaluation metrics that were employed to quantitatively evaluate and compare model performance quality include accuracy, F1, precision, recall, and the area under the receiver operating characteristic curve. 
Support Vector Machine objectively emerged as the most generalizable model for identifying the gravidae in WiseSubset who may develop concurrent gestational diabetes mellitus and hypertensive disorders in pregnancy, scoring 100.00% (mean) in recall. The model consisted of 9 predictors extracted by the recursive feature elimination with cross-validation with random forest. Findings from this study show that appropriate machine learning methods can reliably predict comorbid gestational diabetes and hypertensive disorders in pregnancy, using readily available routine prenatal attributes. Six of the nine most predictive factors of the comorbidity were also in the top 6 selections of at least one other feature selection method examined. The six predictors are healthy weight prepregnancy BMI, mother’s educational status, husband’s educational status, husband’s occupation in one year before the current pregnancy, mother’s blood group, and mother’s age range between 34 and 44 years. Insight from this analysis would support clinical decision making of obstetric experts when they are caring for 1.) nulliparous women, since they would have no obstetric history that could prompt their care providers for feto-maternal medical surveillance; and 2.) the experienced mothers with no obstetric history suggestive of any of the disease(s) under this study. Hence, among other benefits, the artificial-intelligence-backed tool designed in this research would likely improve maternal and child care quality outcomes.
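The abstract above applies the Synthetic Minority Oversampling Technique to the training data only, inside stratified k-fold cross-validation. A minimal sketch of that arrangement follows, assuming a simplified SMOTE (interpolation toward one of a few nearest minority neighbours) and a simple round-robin stratification. All names and the toy data are hypothetical, and the study's own pipeline may differ in detail.

```python
import random

def stratified_folds(y, k, seed=0):
    # Deal the indices of each class round-robin so every fold keeps the class ratio.
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def smote(minority, n_new, k_neighbors=3, seed=0):
    # Simplified SMOTE: interpolate between a minority sample and a near minority neighbour.
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k_neighbors]
        other = rng.choice(neighbors)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

def oversampled_splits(X, y, k=10, minority_label=1, seed=0):
    # Yield (train_X, train_y, test_idx); only the training part is oversampled.
    folds = stratified_folds(y, k, seed)
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        minority = [x for x, t in zip(train_X, train_y) if t == minority_label]
        deficit = (len(train_y) - len(minority)) - len(minority)
        new = smote(minority, max(0, deficit), seed=seed + f) if len(minority) >= 2 else []
        yield train_X + new, train_y + [minority_label] * len(new), test_idx

# Toy imbalanced data (illustrative only): 20 majority and 10 minority samples.
toy_X = [[float(i), float(i % 3)] for i in range(30)]
toy_y = [0] * 20 + [1] * 10
toy_splits = list(oversampled_splits(toy_X, toy_y, k=5))
```

Keeping synthetic samples out of the test folds matters: if oversampling ran before the split, near-copies of test cases could leak into training and inflate recall.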
Classification of Missing Youths Cases using Support Vector Machines
The purpose of this study is to provide the Saskatoon Police Service (SPS) with
a set of predictive models for intervention and risk reduction applied to the missing youths (MY) database
in a graphical user interface (GUI)