Machine Learning the Dimension of a Polytope
We use machine learning to predict the dimension of a lattice polytope
directly from its Ehrhart series. This is highly effective, achieving almost
100% accuracy. We also use machine learning to recover the volume of a lattice
polytope from its Ehrhart series, and to recover the dimension, volume, and
quasi-period of a rational polytope from its Ehrhart series. In each case we
achieve very high accuracy, and we propose mathematical explanations for why
this should be so. Comment: 13 pages, 7 figures
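The effectiveness reported above rests on a classical fact: the Ehrhart counting function L_P(t) = #(tP ∩ Z^d) of a lattice polytope P is a polynomial of degree dim(P) whose leading coefficient is the Euclidean volume of P, so both quantities are in principle recoverable from the series. A minimal sketch, using the unit cube (whose counts have the known closed form (t+1)^d) as stand-in data rather than a real polytope library:

```python
import numpy as np

# For the unit cube [0,1]^d, the Ehrhart counting function is
# L_P(t) = #(tP ∩ Z^d) = (t+1)**d, which we use as exactly computable data.
def lattice_count_unit_cube(t, d):
    return (t + 1) ** d

d_true = 3
ts = np.arange(1, 10)
counts = np.array([lattice_count_unit_cube(t, d_true) for t in ts], dtype=float)

# Recover the dimension as the smallest degree whose polynomial fit is exact;
# the leading coefficient of that fit is then the volume.
for deg in range(1, 8):
    coeffs, residuals, *_ = np.polyfit(ts, counts, deg, full=True)
    if residuals.size == 0 or residuals[0] < 1e-8:
        break

dimension = deg
volume = coeffs[0]
print(dimension, round(volume, 6))
```

Real lattice-point counts would come from a polytope package; the closed form here is only a convenient test case.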
Machine learning based prediction models in male reproductive health: Development of a proof-of-concept model for Klinefelter Syndrome in azoospermic patients
Background: Due to its highly variable clinical phenotype, Klinefelter Syndrome is underdiagnosed. Objective: To assess supervised machine learning based prediction models for the identification of Klinefelter Syndrome among azoospermic patients, and to compare them with expert clinical evaluation. Materials and methods: Retrospective patient data (karyotype, age, height, weight, testis volume, follicle-stimulating hormone, luteinizing hormone, testosterone, estradiol, prolactin, semen pH and semen volume) collected between January 2005 and June 2019 were retrieved from the patient data bank of a University Centre. Models were trained, validated and benchmarked using different supervised machine learning algorithms, then tested on an independent, prospectively acquired set of patient data (July 2019 to July 2020). In addition, the models were benchmarked against physicians. Results: Based on average performance, support vector machines and CatBoost were particularly well-suited models, with 100% sensitivity and >93% specificity on the test dataset. Compared to a group of 18 expert clinicians, the machine learning models had significantly better median sensitivity (100% vs. 87.5%, p = 0.0455) and comparable specificity (90% vs. 89.9%, p = 0.4795), thereby possibly improving the diagnosis rate. A Klinefelter Syndrome Score Calculator based on the prediction models is available on . Discussion: Differentiating Klinefelter Syndrome patients from azoospermic patients with a normal karyotype (46,XY) is a problem that can be solved with supervised machine learning techniques, improving patient care. Conclusions: Machine learning could improve the diagnostic rate of Klinefelter Syndrome among azoospermic patients, particularly for less-experienced physicians.
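As a hedged sketch of the approach described (not the paper's actual pipeline or data), the following trains an RBF support vector machine on synthetic stand-ins for two of the listed features (FSH and testis volume, with effect directions chosen for illustration) and reports sensitivity and specificity in the same style as the benchmarking above:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
n = 400
# Hypothetical stand-ins for two of the paper's features.
ks = rng.integers(0, 2, n)                    # 1 = Klinefelter Syndrome
fsh = rng.normal(8 + 20 * ks, 4, n)           # FSH typically elevated in KS
testis_vol = rng.normal(18 - 12 * ks, 3, n)   # testis volume typically reduced
X = np.column_stack([fsh, testis_vol])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, ks, test_size=0.25, random_state=0, stratify=ks)

# Scale inside the pipeline so test data never leaks into the scaler fit.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
model.fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
print(f"sensitivity={tp / (tp + fn):.2f} specificity={tn / (tn + fp):.2f}")
```

With such cleanly separated synthetic distributions the scores are near perfect; real clinical data would overlap far more.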
Exploratory analysis using machine learning to predict for chest wall pain in patients with stage I non-small-cell lung cancer treated with stereotactic body radiation therapy.
Background and purpose: Chest wall toxicity is observed after stereotactic body radiation therapy (SBRT) for peripherally located lung tumors. We utilize machine learning algorithms to identify toxicity predictors and develop dose-volume constraints. Materials and methods: Twenty-five patient, tumor, and dosimetric features were recorded for 197 consecutive patients with Stage I NSCLC treated with SBRT, 11 of whom (5.6%) developed CTCAEv4 grade ≥2 chest wall pain. Decision tree modeling was used to determine chest wall syndrome (CWS) thresholds for individual features. Significant features were determined using independent multivariate methods: out-of-bag estimation using random forests (RF) and bootstrapping (100 iterations) using decision trees. Results: Univariate analysis identified rib dose to 1 cc < 4000 cGy (P = 0.01), chest wall dose to 30 cc < 1900 cGy (P = 0.035), rib Dmax < 5100 cGy (P = 0.05) and lung dose to 1000 cc < 70 cGy (P = 0.039) as statistically significant thresholds for avoiding CWS. Subsequent multivariate analysis confirmed the importance of rib dose to 1 cc, chest wall dose to 30 cc, and rib Dmax. Using learning-curve experiments, the dataset proved to be self-consistent and provides a realistic model for CWS analysis. Conclusions: Using machine learning algorithms in this first-of-its-kind study, we identify robust features and cutoffs predictive of the rare clinical event of CWS. Additional data from planned multicenter studies will help increase the accuracy of the multivariate analysis.
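The threshold-finding step can be illustrated with a depth-1 decision tree (a stump), which recovers a single dose cutoff from labeled outcomes. The dose values and event rates below are invented for illustration and are not the study's data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 200
# Hypothetical rib dose to 1 cc, in cGy.
rib_dose = rng.uniform(2000, 6000, n)
# Toxicity made more likely above ~4000 cGy (illustrative event rates).
p = np.where(rib_dose > 4000, 0.25, 0.02)
cws = rng.random(n) < p

# A depth-1 tree picks the single split that best separates the outcomes,
# which is exactly a dose-volume cutoff of the kind reported above.
stump = DecisionTreeClassifier(max_depth=1).fit(rib_dose.reshape(-1, 1), cws)
threshold = stump.tree_.threshold[0]
print(f"learned cutoff ~ {threshold:.0f} cGy")
```

With rare events, the learned cutoff is noisy; the study's use of bootstrapping and out-of-bag estimation is what makes such thresholds robust.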
STROKE DISEASE PREDICTION USING SUPPORT VECTOR MACHINE (SVM)
According to data from the Indonesian Ministry of Health, the number of stroke cases increased by 3.9% between 2013 and 2018. Nationally, stroke cases occur most often in the 55-64 age group and least often in the 15-24 age group. A stroke (cerebrovascular accident) is a condition in which blood flow to the brain is suddenly disrupted or reduced. It can be caused by a blockage or rupture of a blood vessel, so that cells in the affected brain area do not receive a supply of blood carrying nutrients and oxygen. Early detection is needed to reduce the number of potential deaths from stroke. Stroke prediction remains a challenge in medicine, in part because of the volume, heterogeneity, and complexity of medical data. Machine learning techniques are data analysis models that can be used to predict stroke. Various machine learning models have been proposed by previous researchers, one of them being the Support Vector Machine. This study reapplies the SVM algorithm and obtains better performance than previous studies, with an accuracy of 100% and a ROC-AUC of 100%. Further examination of these results is needed to understand how they reached 100%.
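Since the abstract itself flags the 100% accuracy and ROC-AUC for further examination, a useful first check is to score the SVM with cross-validation and keep all preprocessing inside the model pipeline; fitting a scaler or feature selector on the full dataset before splitting is a common source of inflated, even perfect, scores. A sketch on synthetic data (not the stroke dataset):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a stroke dataset; real medical data is far noisier.
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.8, 0.2], random_state=0)

# Scaling lives inside the pipeline, so each CV fold scales only on its
# training portion and the held-out fold stays genuinely unseen.
model = make_pipeline(StandardScaler(), SVC(probability=True))
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated ROC-AUC: {auc.mean():.3f} +/- {auc.std():.3f}")
```

A genuinely perfect cross-validated AUC on heterogeneous clinical data would be surprising and is usually worth auditing for leakage or duplicated records.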
Predicting the Probability of Cargo Theft for Individual Cases in Railway Transport
In heavy industry, the value of cargo transported by rail is very high. Because of this high value, poor security, and the volume of rail transport, theft cases are frequent. The main problem in securing rail transport is predicting the locations with a high probability of risk. The aim of the presented research was therefore to predict, for individual areas, where the probability of rail cargo theft is highest, so that theft can be prevented by better securing the railway lines. To solve this problem, the authors' model was developed. The model uses information about past transport cases to train Artificial Neural Networks (ANN) and other Machine Learning (ML) methods. The ANN predicted the probability correctly for 94.7% of the theft cases, and the Machine Learning model identified 100% of the cases. This method can be used to develop a support system for securing rail infrastructure.
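A minimal sketch of the ANN component, assuming hypothetical per-segment features (cargo value, past incident count, security level) rather than the authors' data, ranks segments by predicted theft probability so security effort can be prioritised:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
n = 600
# Hypothetical per-segment features (not the paper's variables).
cargo_value = rng.uniform(0, 1, n)
past_incidents = rng.poisson(2, n)
security = rng.uniform(0, 1, n)
# Synthetic ground truth: risk rises with value and history, falls with security.
risk = 0.6 * cargo_value + 0.3 * (past_incidents / 5) - 0.5 * security
theft = (risk + rng.normal(0, 0.1, n)) > 0.4

X = np.column_stack([cargo_value, past_incidents, security])
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0))
model.fit(X, theft)

# Rank segments by predicted theft probability to prioritise patrols.
proba = model.predict_proba(X)[:, 1]
print("highest-risk segment index:", int(np.argmax(proba)))
```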
Hunting for open clusters in Gaia DR2: the Galactic anticentre
The Gaia Data Release 2 (DR2) provided an unprecedented volume of precise astrometric and excellent photometric data. In terms of mining the Gaia catalogue, machine learning methods have been shown to be a powerful tool, for instance in the search for unknown stellar structures. In particular, supervised and unsupervised learning methods combined significantly improve the detection rate of open clusters. We systematically scan Gaia DR2 in a region covering the Galactic anticentre and the Perseus arm, with the goal of finding any open clusters that may exist in this region, fine-tuning a previously proposed methodology successfully applied to TGAS data and adapting it to regions of different density. Our methodology uses an unsupervised, density-based clustering algorithm, DBSCAN, that identifies overdensities in the five-dimensional astrometric parameter space that may correspond to physical clusters. The overdensities are separated into physical clusters (open clusters) or random statistical clusters using an artificial neural network that recognises the isochrone pattern open clusters show in a colour-magnitude diagram. The method is able to recover more than 75% of the open clusters confirmed in the search area. Moreover, we detected 53 open clusters unknown prior to Gaia DR2, which represents an increase of more than 22% with respect to the already catalogued clusters in this region. We find that the census of nearby open clusters is not complete. Different machine learning methodologies for a blind search of open clusters are complementary to each other; no single method is able to detect 100% of the existing groups. Our methodology has proven to be a reliable tool for the automatic detection of open clusters, and is designed to be applied to the full Gaia DR2 catalogue. Comment: 8 pages, accepted by Astronomy and Astrophysics (A&A) on 14 May 2019. Tables 1 and 2 available at the CDS.
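The first stage of the methodology can be sketched as a DBSCAN run over a synthetic five-dimensional astrometric space, with a compact overdensity (a mock open cluster) planted among uniform field stars; the eps and min_samples values here are illustrative, not the paper's tuned settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)
# Synthetic 5-D astrometric space: (l, b, parallax, pmra, pmdec), rescaled.
field = rng.uniform(-5, 5, size=(1000, 5))               # field stars
cluster = rng.normal(loc=1.0, scale=0.05, size=(60, 5))  # compact overdensity
X = np.vstack([field, cluster])

# DBSCAN labels dense groups 0, 1, ... and sparse points -1 (noise).
labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X)
n_found = len(set(labels)) - (1 if -1 in labels else 0)
print("overdensities found:", n_found)
```

In the paper, each such overdensity would then be passed to a neural network that inspects its colour-magnitude diagram for an isochrone pattern; that second stage is omitted here.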
Performance Evaluation of Apache Spark MLlib Algorithms on an Intrusion Detection Dataset
The growing use of the Internet and web services, the advent of the fifth generation of cellular network technology (5G), and ever-growing Internet of Things (IoT) data traffic will all increase global internet usage. To ensure the security of future networks, machine learning-based intrusion detection and prevention systems (IDPS) must be implemented to detect new attacks, and big data parallel processing tools can be used to handle the huge collections of training data these systems require. In this paper, Apache Spark, a fast, general-purpose cluster computing platform, is used for processing and training a large volume of network traffic feature data. The most important features of the CSE-CIC-IDS2018 dataset are used to construct machine learning models, and the most popular machine learning approaches, namely Logistic Regression, Support Vector Machine (SVM), three different Decision Tree classifiers, and the Naive Bayes algorithm, are used to train the models using up to eight worker nodes. Our Spark cluster contains seven machines acting as worker nodes and one machine configured as both a master and a worker. We use the CSE-CIC-IDS2018 dataset to evaluate the overall performance of these algorithms on Botnet attacks, and distributed hyperparameter tuning is used to find the best parameters for a single decision tree. We achieved up to 100% accuracy in our experiments using the features selected by the learning method. Comment: Journal of Computing and Security (Isfahan University, Iran), Vol. 9, No. 1, 202
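As a single-machine analogue of the MLlib comparison (scikit-learn stands in for Spark MLlib here; the real pipeline would use pyspark.ml across the cluster), the same four families of classifiers can be benchmarked on a synthetic imbalanced dataset standing in for the CSE-CIC-IDS2018 features:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the selected CSE-CIC-IDS2018 features:
# imbalanced classes mimic rare attack traffic among benign flows.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "linear SVM": LinearSVC(max_iter=5000),
    "decision tree": DecisionTreeClassifier(max_depth=5),
    "naive Bayes": GaussianNB(),
}
# Cross-validated accuracy for each classifier family.
scores = {name: cross_val_score(m, X, y, cv=3).mean()
          for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

With a 90/10 class split, plain accuracy flatters weak models; per-class metrics would be the better yardstick for real intrusion detection.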
Machine-learning-based radiomics identifies atrial fibrillation on the epicardial fat in contrast-enhanced and non-enhanced chest CT
Objective: The purpose is to establish and validate a machine-learning-derived radiomics approach to determine the existence of atrial fibrillation (AF) by analyzing epicardial adipose tissue (EAT) in CT images. Methods: Patients with AF based on electrocardiographic tracing who underwent contrast-enhanced (n = 200) or non-enhanced (n = 300) chest CT scans were analyzed retrospectively. After EAT segmentation and radiomics feature extraction, the segmented EAT yielded 1691 radiomics features. The features most contributive to AF were selected by the Boruta algorithm and a machine-learning-based random forest algorithm, and combined to construct a radiomics signature (EAT-score). Multivariate logistic regression was used to build clinical factor and nested models. Results: In the test cohort of contrast-enhanced scanning (n = 60/200), the AUC of EAT-score for identifying patients with AF was 0.92 (95% CI: 0.84–1.00), higher than the 0.71 (0.58–0.85) of the clinical factor model (total cholesterol and body mass index) (DeLong's p = 0.01) and higher than the 0.73 (0.61–0.86) of the EAT volume model (p = 0.01). In the test cohort of non-enhanced scanning (n = 100/300), the AUC of EAT-score was 0.85 (0.77–0.92), higher than that of the CT attenuation model (p < 0.05). Conclusion: The EAT-score generated by machine-learning-based radiomics achieved high performance in identifying patients with AF. Advances in knowledge: A radiomics analysis based on machine learning allows for the identification of AF on the EAT in contrast-enhanced and non-enhanced chest CT.
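The feature-selection-then-signature pattern can be sketched with a random forest: rank synthetic radiomics-style features by importance (a simpler stand-in for the Boruta step used in the paper), keep the top few, and score the resulting signature by AUC. The feature count is scaled down from 1691 for speed, and all data is synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in: many radiomics-style features, few truly informative.
X, y = make_classification(n_samples=300, n_features=200, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Rank features by impurity importance and keep the top 10 as the "signature".
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
top = np.argsort(rf.feature_importances_)[::-1][:10]

# Refit on the selected features and evaluate on the held-out cohort.
rf_sig = RandomForestClassifier(n_estimators=300, random_state=0)
rf_sig.fit(X_tr[:, top], y_tr)
auc = roc_auc_score(y_te, rf_sig.predict_proba(X_te[:, top])[:, 1])
print(f"signature AUC on held-out data: {auc:.2f}")
```

Note that selecting features on the training split only, as here, is essential; ranking them on the full dataset would leak test information into the signature.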
The CAMELS project: Cosmology and Astrophysics with MachinE Learning Simulations
We present the Cosmology and Astrophysics with MachinE Learning Simulations
--CAMELS-- project. CAMELS is a suite of 4,233 cosmological simulations of
volume each: 2,184 state-of-the-art
(magneto-)hydrodynamic simulations run with the AREPO and GIZMO codes,
employing the same baryonic subgrid physics as the IllustrisTNG and SIMBA
simulations, and 2,049 N-body simulations. The goal of the CAMELS project is to
provide theory predictions for different observables as a function of cosmology
and astrophysics, and it is the largest suite of cosmological
(magneto-)hydrodynamic simulations designed to train machine learning
algorithms. CAMELS contains thousands of different cosmological and
astrophysical models by way of varying , , and four
parameters controlling stellar and AGN feedback, following the evolution of
more than 100 billion particles and fluid elements over a combined volume of
. We describe the simulations in detail and
characterize the large range of conditions represented in terms of the matter
power spectrum, cosmic star formation rate density, galaxy stellar mass
function, halo baryon fractions, and several galaxy scaling relations. We show
that the IllustrisTNG and SIMBA suites produce roughly similar distributions of
galaxy properties over the full parameter space but significantly different
halo baryon fractions and baryonic effects on the matter power spectrum. This
emphasizes the need for marginalizing over baryonic effects to extract the
maximum amount of information from cosmological surveys. We illustrate the
unique potential of CAMELS using several machine learning applications,
including non-linear interpolation, parameter estimation, symbolic regression,
data generation with Generative Adversarial Networks (GANs), dimensionality
reduction, and anomaly detection. Comment: 33 pages, 18 figures, CAMELS webpage at
https://www.camel-simulations.or