
    Machine Learning the Dimension of a Polytope

    We use machine learning to predict the dimension of a lattice polytope directly from its Ehrhart series. This is highly effective, achieving almost 100% accuracy. We also use machine learning to recover the volume of a lattice polytope from its Ehrhart series, and to recover the dimension, volume, and quasi-period of a rational polytope from its Ehrhart series. In each case we achieve very high accuracy, and we propose mathematical explanations for why this should be so.
    Comment: 13 pages, 7 figures
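    A minimal sketch of the kind of experiment the abstract describes: a standard classifier trained to read off the dimension from the first terms of an Ehrhart series. The generator below only reproduces the leading-order growth $L(t) \sim \mathrm{vol}\cdot t^d/d!$ and is a hypothetical stand-in for genuine lattice-point counts, not the authors' data or method.

```python
# Hedged sketch: classify polytope dimension from Ehrhart-series terms.
# The data generator is a synthetic placeholder, not real Ehrhart data.
import math
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def ehrhart_like_counts(dim, volume, n_terms=20):
    """Leading-order Ehrhart growth: L(t) ~ volume * t^dim / dim!."""
    t = np.arange(1, n_terms + 1)
    return volume * t**dim / math.factorial(dim)

# Features are log-counts of the first n_terms dilations; label is dimension.
X, y = [], []
for _ in range(2000):
    dim = int(rng.integers(2, 9))
    vol = int(rng.integers(1, 50))
    X.append(np.log1p(ehrhart_like_counts(dim, vol)))
    y.append(dim)
X, y = np.array(X), np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"dimension accuracy: {clf.score(X_te, y_te):.3f}")
```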

    Machine learning based prediction models in male reproductive health: Development of a proof-of-concept model for Klinefelter Syndrome in azoospermic patients

    Background: Due to its highly variable clinical phenotype, Klinefelter Syndrome is underdiagnosed. Objective: Assessment of supervised machine learning based prediction models for identification of Klinefelter Syndrome among azoospermic patients, and comparison to expert clinical evaluation. Materials and methods: Retrospective patient data (karyotype, age, height, weight, testis volume, follicle-stimulating hormone, luteinizing hormone, testosterone, estradiol, prolactin, semen pH and semen volume) collected between January 2005 and June 2019 were retrieved from a patient data bank of a University Centre. Models were trained, validated and benchmarked based on different supervised machine learning algorithms, then tested on an independent, prospectively acquired set of patient data (between July 2019 and July 2020). In addition, the models were benchmarked against physicians. Results: Based on average performance, support vector machines and CatBoost were particularly well-suited models, with 100% sensitivity and >93% specificity on the test dataset. Compared to a group of 18 expert clinicians, the machine learning models had significantly better median sensitivity (100% vs. 87.5%, p = 0.0455) and fared comparably with regard to specificity (90% vs. 89.9%, p = 0.4795), thereby potentially improving the diagnosis rate. A Klinefelter Syndrome Score Calculator based on the prediction models is available on . Discussion: Differentiating Klinefelter Syndrome patients from azoospermic patients with normal karyotype (46,XY) is a problem that can be solved with supervised machine learning techniques, improving patient care. Conclusions: Machine learning could improve the diagnostic rate of Klinefelter Syndrome among azoospermic patients, especially for less-experienced physicians.
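    As a rough illustration of the benchmark reported above, here is a hedged sketch of an SVM screening model evaluated by sensitivity and specificity. The feature list follows the abstract, but the data and the toy label correlation (high gonadotropins, low testis volume) are invented placeholders rather than the study's cohort or pipeline.

```python
# Hedged sketch: SVM separating Klinefelter (47,XXY) from 46,XY azoospermic
# patients on the clinical features named in the abstract. Data is synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

FEATURES = ["age", "height", "weight", "testis_volume", "FSH", "LH",
            "testosterone", "estradiol", "prolactin", "semen_pH",
            "semen_volume"]

rng = np.random.default_rng(1)
X = rng.normal(size=(500, len(FEATURES)))
# Toy label: elevated FSH/LH and reduced testis volume suggest Klinefelter.
y = (X[:, 4] + X[:, 5] - X[:, 3] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)
model = make_pipeline(StandardScaler(),
                      SVC(kernel="rbf", class_weight="balanced"))
model.fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
print(f"sensitivity: {tp / (tp + fn):.2f}, specificity: {tn / (tn + fp):.2f}")
```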

    STROKE PREDICTION USING A SUPPORT VECTOR MACHINE (SVM)

    According to data from the Indonesian Ministry of Health, the number of stroke cases increased by 3.9% between 2013 and 2018. Nationally, stroke cases occur most often in the 55-64 age group and least often in the 15-24 age group. Stroke (cerebrovascular accident) is a condition in which blood flow to the brain is suddenly disrupted or reduced. It can be caused by blockage or rupture of a blood vessel, leaving cells in the affected brain area without a supply of nutrient- and oxygen-rich blood. Early detection is needed to reduce the number of potential deaths from stroke. Stroke prediction remains a challenge in medicine, partly because medical data are high in volume, heterogeneity, and complexity. Machine learning is a data analysis approach that can be used to predict stroke, and various machine learning models have been proposed by previous researchers, among them the Support Vector Machine. This study re-applies the SVM algorithm and obtains better performance than previous work, achieving an accuracy of 100% and a ROC-AUC of 100%. Further examination is needed of how the results reach 100%.
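    A hedged sketch of the evaluation pipeline the study describes: an SVM with feature scaling, scored under stratified cross-validation on the two reported metrics (accuracy and ROC-AUC). The synthetic features and label stand in for the real stroke dataset, whose source the abstract does not name.

```python
# Hedged sketch: SVM stroke prediction with the two metrics the study reports.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)
# Placeholder features (e.g. age, hypertension, glucose, BMI) and label.
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + X[:, 2] + rng.normal(scale=0.8, size=1000) > 0).astype(int)

model = make_pipeline(StandardScaler(), SVC(probability=True))
scores = cross_validate(model, X, y, cv=5, scoring=["accuracy", "roc_auc"])
print("accuracy:", scores["test_accuracy"].mean().round(3))
print("ROC-AUC :", scores["test_roc_auc"].mean().round(3))
# A perfect 100% on both metrics would warrant checking for data leakage or
# class imbalance, as the abstract itself cautions.
```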

    Predicting the Probability of Cargo Theft for Individual Cases in Railway Transport

    In heavy industry, the value of cargo transported by rail is very high. Due to this high value, combined with the poor security and large volume of rail transport, theft cases are frequent. The main problem in securing rail transport is predicting the locations with a high probability of risk. The aim of the presented research was therefore to predict which areas have the highest probability of rail cargo theft, since preventing theft requires better securing of the railway lines. To solve that problem the authors' model was developed. The model uses information about past transport cases for the learning process of Artificial Neural Networks (ANN) and Machine Learning (ML). The ANN correctly predicted 94.7% of the theft cases, and the Machine Learning approach identified 100% of them. This method can be used to develop a support system for securing the rail infrastructure.
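    The abstract does not specify the network architecture, so the following is only a plausible sketch: a small feed-forward classifier whose predicted probability is used to rank transport cases by theft risk. All feature names and the synthetic data are hypothetical.

```python
# Hedged sketch: feed-forward ANN scoring per-case theft risk.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
# Hypothetical features: cargo value, route segment, hour of day, dwell time.
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=1000) > 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)
ann = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000,
                                  random_state=2))
ann.fit(X_tr, y_tr)
# Per-case theft probability, as in the paper's risk-ranking use case.
risk = ann.predict_proba(X_te)[:, 1]
print("mean predicted theft risk:", risk.mean().round(3))
```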

    Hunting for open clusters in Gaia DR2: the Galactic anticentre

    The Gaia Data Release 2 (DR2) provided an unprecedented volume of precise astrometric and excellent photometric data. For mining the Gaia catalogue, machine learning methods have proven to be a powerful tool, for instance in the search for unknown stellar structures. In particular, combining supervised and unsupervised learning methods significantly improves the detection rate of open clusters. We systematically scan Gaia DR2 in a region covering the Galactic anticentre and the Perseus arm ($120 \leq l \leq 205$ and $-10 \leq b \leq 10$), with the goal of finding any open clusters that may exist in this region, fine-tuning a previously proposed methodology successfully applied to TGAS data and adapting it to regions of different density. Our methodology uses an unsupervised, density-based clustering algorithm, DBSCAN, which identifies overdensities in the five-dimensional astrometric parameter space $(l, b, \varpi, \mu_{\alpha^*}, \mu_{\delta})$ that may correspond to physical clusters. The overdensities are separated into physical clusters (open clusters) or random statistical clusters using an artificial neural network that recognises the isochrone pattern open clusters show in a colour-magnitude diagram. The method is able to recover more than 75% of the open clusters confirmed in the search area. Moreover, we detected 53 open clusters unknown prior to Gaia DR2, which represents an increase of more than 22% with respect to the already catalogued clusters in this region. We find that the census of nearby open clusters is not complete. Different machine learning methodologies for a blind search of open clusters are complementary to each other; no single method is able to detect 100% of the existing groups. Our methodology has proven to be a reliable tool for the automatic detection of open clusters, designed to be applied to the full Gaia DR2 catalogue.
    Comment: 8 pages, accepted by Astronomy and Astrophysics (A&A) on 14 May 2019. Tables 1 and 2 available at the CDS.
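    A hedged sketch of the unsupervised step: DBSCAN applied to standardized five-dimensional astrometric data, with a synthetic overdensity standing in for a real open cluster. The eps and min_samples values are illustrative, not the paper's tuned parameters.

```python
# Hedged sketch: DBSCAN over (l, b, parallax, pmra, pmdec), standardized
# first since the five astrometric parameters have very different scales.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Stand-in for a Gaia DR2 field: mostly field stars plus one overdensity.
field = rng.normal(size=(5000, 5))
cluster = rng.normal(loc=2.0, scale=0.05, size=(80, 5))
X = StandardScaler().fit_transform(np.vstack([field, cluster]))

labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"overdensities found: {n_clusters}")
# Each overdensity would then be passed to an ANN that checks for an
# isochrone pattern in the colour-magnitude diagram.
```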

    Performance Evaluation of Apache Spark MLlib Algorithms on an Intrusion Detection Dataset

    The growing use of the Internet and web services, the advent of fifth-generation cellular network technology (5G), and ever-growing Internet of Things (IoT) data traffic will all increase global Internet usage. To ensure the security of future networks, machine learning-based intrusion detection and prevention systems (IDPS) must be implemented to detect new attacks, and big data parallel processing tools can be used to handle the huge collections of training data in these systems. In this paper, Apache Spark, a fast, general-purpose cluster computing platform, is used for processing and training a large volume of network traffic feature data. The most important features of the CSE-CIC-IDS2018 dataset are used for constructing machine learning models, and then the most popular machine learning approaches, namely Logistic Regression, Support Vector Machine (SVM), three different Decision Tree classifiers, and Naive Bayes, are used to train models using up to eight worker nodes. Our Spark cluster contains seven machines acting as worker nodes and one machine configured as both a master and a worker. We use the CSE-CIC-IDS2018 dataset to evaluate the overall performance of these algorithms on Botnet attacks, and distributed hyperparameter tuning is used to find the best single decision tree parameters. We achieved up to 100% accuracy in our experiments using the features selected by the learning method.
    Comment: Journal of Computing and Security (Isfahan University, Iran), Vol. 9, No. 1, 202
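    A hedged sketch of the Spark MLlib workflow: features assembled into a vector column and a decision tree grid-searched via CrossValidator, with tuning parallelized across worker nodes as in the paper's eight-node setup. The toy in-memory DataFrame is a stand-in for the preprocessed CSE-CIC-IDS2018 features.

```python
# Hedged sketch: distributed decision-tree tuning with Spark MLlib.
import random
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.appName("ids-dt-sketch").getOrCreate()

# Toy stand-in for selected CSE-CIC-IDS2018 flow features (label 1 = Botnet).
random.seed(0)
rows = [(random.random(), random.random(), random.random(),
         float(random.random() > 0.5)) for _ in range(500)]
df = spark.createDataFrame(rows, ["f1", "f2", "f3", "label"])

assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
dt = DecisionTreeClassifier(labelCol="label", featuresCol="features")
pipeline = Pipeline(stages=[assembler, dt])

# Distributed hyperparameter tuning, spread over the cluster's worker nodes.
grid = (ParamGridBuilder()
        .addGrid(dt.maxDepth, [5, 10, 15])
        .addGrid(dt.maxBins, [16, 32])
        .build())
cv = CrossValidator(estimator=pipeline, estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(labelCol="label"),
                    numFolds=3, parallelism=8)
model = cv.fit(df)
print("best cross-validated AUC:", max(model.avgMetrics))
```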

    Machine-learning-based radiomics identifies atrial fibrillation on the epicardial fat in contrast-enhanced and non-enhanced chest CT

    Objective: The purpose is to establish and validate a machine-learning-derived radiomics approach to determine the existence of atrial fibrillation (AF) by analyzing epicardial adipose tissue (EAT) in CT images. Methods: Patients with AF based on electrocardiographic tracing who underwent contrast-enhanced (n = 200) or non-enhanced (n = 300) chest CT scans were analyzed retrospectively. After EAT segmentation and radiomics feature extraction, the segmented EAT yielded 1691 radiomics features. The features most contributive to AF were selected by the Boruta algorithm and a machine-learning-based random forest algorithm, and combined to construct a radiomics signature (EAT-score). Multivariate logistic regression was used to build clinical factor and nested models. Results: In the test cohort of contrast-enhanced scanning (n = 60/200), the AUC of EAT-score for identifying patients with AF was 0.92 (95% CI: 0.84–1.00), higher than the 0.71 (0.58–0.85) of the clinical factor model (total cholesterol and body mass index) (DeLong’s p = 0.01) and higher than the 0.73 (0.61–0.86) of the EAT volume model (p = 0.01). In the test cohort of non-enhanced scanning (n = 100/300), the AUC of EAT-score was 0.85 (0.77–0.92), higher than that of the CT attenuation model (p < 0.05). Conclusion: EAT-score generated by machine-learning-based radiomics achieved high performance in identifying patients with AF. Advances in knowledge: A radiomics analysis based on machine learning allows for the identification of AF on the EAT in contrast-enhanced and non-enhanced chest CT.
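    A hedged sketch of the selection-plus-signature step, using the third-party BorutaPy package around a random forest. The synthetic 1691-column matrix mirrors the reported feature count, and the out-of-fold probability is only a stand-in for the paper's EAT-score construction.

```python
# Hedged sketch: Boruta feature selection + random-forest "EAT-score".
import numpy as np
from boruta import BorutaPy  # third-party package: pip install boruta
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 1691))   # 1691 radiomics features, as extracted
# Placeholder AF label driven by a handful of "informative" features.
y = (X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=300) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=4)
selector = BorutaPy(rf, n_estimators="auto", random_state=4, max_iter=20)
selector.fit(X, y)
X_sel = X[:, selector.support_]    # keep only confirmed features

# Out-of-fold probability as a toy radiomics signature.
eat_score = cross_val_predict(rf, X_sel, y, cv=5, method="predict_proba")[:, 1]
print("features kept:", int(selector.support_.sum()))
print("signature AUC:", round(roc_auc_score(y, eat_score), 3))
```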

    The CAMELS project: Cosmology and Astrophysics with MachinE Learning Simulations

    We present the Cosmology and Astrophysics with MachinE Learning Simulations --CAMELS-- project. CAMELS is a suite of 4,233 cosmological simulations of $(25~h^{-1}{\rm Mpc})^3$ volume each: 2,184 state-of-the-art (magneto-)hydrodynamic simulations run with the AREPO and GIZMO codes, employing the same baryonic subgrid physics as the IllustrisTNG and SIMBA simulations, and 2,049 N-body simulations. The goal of the CAMELS project is to provide theory predictions for different observables as a function of cosmology and astrophysics, and it is the largest suite of cosmological (magneto-)hydrodynamic simulations designed to train machine learning algorithms. CAMELS contains thousands of different cosmological and astrophysical models by way of varying $\Omega_m$, $\sigma_8$, and four parameters controlling stellar and AGN feedback, following the evolution of more than 100 billion particles and fluid elements over a combined volume of $(400~h^{-1}{\rm Mpc})^3$. We describe the simulations in detail and characterize the large range of conditions represented in terms of the matter power spectrum, cosmic star formation rate density, galaxy stellar mass function, halo baryon fractions, and several galaxy scaling relations. We show that the IllustrisTNG and SIMBA suites produce roughly similar distributions of galaxy properties over the full parameter space but significantly different halo baryon fractions and baryonic effects on the matter power spectrum. This emphasizes the need for marginalizing over baryonic effects to extract the maximum amount of information from cosmological surveys. We illustrate the unique potential of CAMELS using several machine learning applications, including non-linear interpolation, parameter estimation, symbolic regression, data generation with Generative Adversarial Networks (GANs), dimensionality reduction, and anomaly detection.
    Comment: 33 pages, 18 figures, CAMELS webpage at https://www.camel-simulations.or
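    As a toy version of one listed application (parameter estimation), the sketch below regresses $\Omega_m$ and $\sigma_8$ from binned matter power spectra. The analytic toy spectrum is an invented stand-in for CAMELS outputs, and the network is deliberately small.

```python
# Hedged sketch: infer (Omega_m, sigma_8) from binned power spectra.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
k = np.logspace(-1.5, 1, 40)             # wavenumber bins (toy)

def toy_pk(omega_m, sigma_8):
    # Toy amplitude/shape dependence standing in for a real power spectrum.
    return sigma_8**2 * k**(-1.5) * (1 + omega_m * k)

params = rng.uniform([0.1, 0.6], [0.5, 1.0], size=(1000, 2))
X = np.log(np.array([toy_pk(om, s8) for om, s8 in params]))
X += rng.normal(scale=0.01, size=X.shape)  # measurement noise

X_tr, X_te, y_tr, y_te = train_test_split(X, params, random_state=5)
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=5)
net.fit(X_tr, y_tr)
print("R^2 on held-out spectra:", round(net.score(X_te, y_te), 3))
```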
