Machine Learning the Dimension of a Polytope
We use machine learning to predict the dimension of a lattice polytope
directly from its Ehrhart series. This is highly effective, achieving almost
100% accuracy. We also use machine learning to recover the volume of a lattice
polytope from its Ehrhart series, and to recover the dimension, volume, and
quasi-period of a rational polytope from its Ehrhart series. In each case we
achieve very high accuracy, and we propose mathematical explanations for why
this should be so. Comment: 13 pages, 7 figures
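The effectiveness reported above rests on a classical fact: the Ehrhart counting function L_P(t) = #(tP ∩ Z^d) of a lattice polytope P is a polynomial of degree dim(P) whose leading coefficient is the Euclidean volume of P, so both quantities are in principle recoverable from the series. A minimal sketch, using the unit cube (whose counts have the known closed form (t+1)^d) as stand-in data rather than a real polytope library:

```python
import numpy as np

# For the unit cube [0,1]^d, the Ehrhart counting function is
# L_P(t) = #(tP ∩ Z^d) = (t+1)**d, which we use as exactly computable data.
def lattice_count_unit_cube(t, d):
    return (t + 1) ** d

d_true = 3
ts = np.arange(1, 10)
counts = np.array([lattice_count_unit_cube(t, d_true) for t in ts], dtype=float)

# Recover the dimension as the smallest degree whose polynomial fit is exact;
# the leading coefficient of that fit is then the volume.
for deg in range(1, 8):
    coeffs, residuals, *_ = np.polyfit(ts, counts, deg, full=True)
    if residuals.size == 0 or residuals[0] < 1e-8:
        break

dimension = deg
volume = coeffs[0]
print(dimension, round(volume, 6))
```

Real lattice-point counts would come from a polytope package; the closed form here is only a convenient test case.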
Machine learning based prediction models in male reproductive health: Development of a proof-of-concept model for Klinefelter Syndrome in azoospermic patients
Background: Due to its highly variable clinical phenotype, Klinefelter Syndrome is underdiagnosed. Objective: To assess supervised machine learning based prediction models for the identification of Klinefelter Syndrome among azoospermic patients, and to compare them with expert clinical evaluation. Materials and methods: Retrospective patient data (karyotype, age, height, weight, testis volume, follicle-stimulating hormone, luteinizing hormone, testosterone, estradiol, prolactin, semen pH and semen volume) collected between January 2005 and June 2019 were retrieved from the patient data bank of a University Centre. Models were trained, validated and benchmarked using different supervised machine learning algorithms, then tested on an independent, prospectively acquired set of patient data (July 2019 to July 2020). In addition, the models were benchmarked against physicians. Results: Based on average performance, support vector machines and CatBoost were particularly well-suited models, with 100% sensitivity and >93% specificity on the test dataset. Compared to a group of 18 expert clinicians, the machine learning models had significantly better median sensitivity (100% vs. 87.5%, p = 0.0455) and comparable specificity (90% vs. 89.9%, p = 0.4795), thereby possibly improving the diagnosis rate. A Klinefelter Syndrome Score Calculator based on the prediction models is available on . Discussion: Differentiating Klinefelter Syndrome patients from azoospermic patients with a normal karyotype (46,XY) is a problem that can be solved with supervised machine learning techniques, improving patient care. Conclusions: Machine learning could improve the diagnostic rate of Klinefelter Syndrome among azoospermic patients, particularly for less-experienced physicians.
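As a hedged sketch of the approach described (not the paper's actual pipeline or data), the following trains an RBF support vector machine on synthetic stand-ins for two of the listed features (FSH and testis volume, with effect directions chosen for illustration) and reports sensitivity and specificity in the same style as the benchmarking above:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
n = 400
# Hypothetical stand-ins for two of the paper's features.
ks = rng.integers(0, 2, n)                    # 1 = Klinefelter Syndrome
fsh = rng.normal(8 + 20 * ks, 4, n)           # FSH typically elevated in KS
testis_vol = rng.normal(18 - 12 * ks, 3, n)   # testis volume typically reduced
X = np.column_stack([fsh, testis_vol])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, ks, test_size=0.25, random_state=0, stratify=ks)

# Scale inside the pipeline so test data never leaks into the scaler fit.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
model.fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
print(f"sensitivity={tp / (tp + fn):.2f} specificity={tn / (tn + fp):.2f}")
```

With such cleanly separated synthetic distributions the scores are near perfect; real clinical data would overlap far more.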
Exploratory analysis using machine learning to predict for chest wall pain in patients with stage I non-small-cell lung cancer treated with stereotactic body radiation therapy.
Background and purpose: Chest wall toxicity is observed after stereotactic body radiation therapy (SBRT) for peripherally located lung tumors. We utilize machine learning algorithms to identify toxicity predictors and develop dose-volume constraints. Materials and methods: Twenty-five patient, tumor, and dosimetric features were recorded for 197 consecutive patients with Stage I NSCLC treated with SBRT, 11 of whom (5.6%) developed CTCAEv4 grade ≥2 chest wall pain. Decision tree modeling was used to determine chest wall syndrome (CWS) thresholds for individual features. Significant features were determined using independent multivariate methods: out-of-bag estimation using random forests (RF) and bootstrapping (100 iterations) using decision trees. Results: Univariate analysis identified rib dose to 1 cc < 4000 cGy (P = 0.01), chest wall dose to 30 cc < 1900 cGy (P = 0.035), rib Dmax < 5100 cGy (P = 0.05) and lung dose to 1000 cc < 70 cGy (P = 0.039) as statistically significant thresholds for avoiding CWS. Subsequent multivariate analysis confirmed the importance of rib dose to 1 cc, chest wall dose to 30 cc, and rib Dmax. Using learning-curve experiments, the dataset proved to be self-consistent and provides a realistic model for CWS analysis. Conclusions: Using machine learning algorithms in this first-of-its-kind study, we identify robust features and cutoffs predictive of the rare clinical event of CWS. Additional data from planned multicenter studies will help increase the accuracy of the multivariate analysis.
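The threshold-finding step can be illustrated with a depth-1 decision tree (a stump), which recovers a single dose cutoff from labeled outcomes. The dose values and event rates below are invented for illustration and are not the study's data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 200
# Hypothetical rib dose to 1 cc, in cGy.
rib_dose = rng.uniform(2000, 6000, n)
# Toxicity made more likely above ~4000 cGy (illustrative event rates).
p = np.where(rib_dose > 4000, 0.25, 0.02)
cws = rng.random(n) < p

# A depth-1 tree picks the single split that best separates the outcomes,
# which is exactly a dose-volume cutoff of the kind reported above.
stump = DecisionTreeClassifier(max_depth=1).fit(rib_dose.reshape(-1, 1), cws)
threshold = stump.tree_.threshold[0]
print(f"learned cutoff ~ {threshold:.0f} cGy")
```

With rare events, the learned cutoff is noisy; the study's use of bootstrapping and out-of-bag estimation is what makes such thresholds robust.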
STROKE DISEASE PREDICTION USING SUPPORT VECTOR MACHINE (SVM)
According to data from the Indonesian Ministry of Health, the number of stroke cases increased by 3.9% between 2013 and 2018. Nationally, stroke cases occur most often in the 55-64 age group and least often in the 15-24 age group. A stroke (cerebrovascular accident) is a condition in which blood flow to the brain is suddenly disrupted or reduced. It can be caused by a blockage or rupture of a blood vessel, so that cells in the affected brain area do not receive a supply of blood carrying nutrients and oxygen. Early detection is needed to reduce the number of potential deaths from stroke. Stroke prediction remains a challenge in medicine, in part because of the volume, heterogeneity, and complexity of medical data. Machine learning techniques are data analysis models that can be used to predict stroke. Various machine learning models have been proposed by previous researchers, one of them being the Support Vector Machine. This study reapplies the SVM algorithm and obtains better performance than previous studies, with an accuracy of 100% and a ROC-AUC of 100%. Further examination of these results is needed to understand how they reached 100%.
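Since the abstract itself flags the 100% accuracy and ROC-AUC for further examination, a useful first check is to score the SVM with cross-validation and keep all preprocessing inside the model pipeline; fitting a scaler or feature selector on the full dataset before splitting is a common source of inflated, even perfect, scores. A sketch on synthetic data (not the stroke dataset):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a stroke dataset; real medical data is far noisier.
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.8, 0.2], random_state=0)

# Scaling lives inside the pipeline, so each CV fold scales only on its
# training portion and the held-out fold stays genuinely unseen.
model = make_pipeline(StandardScaler(), SVC(probability=True))
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated ROC-AUC: {auc.mean():.3f} +/- {auc.std():.3f}")
```

A genuinely perfect cross-validated AUC on heterogeneous clinical data would be surprising and is usually worth auditing for leakage or duplicated records.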
Predicting the Probability of Cargo Theft for Individual Cases in Railway Transport
In heavy industry, the value of cargo transported by rail is very high. Because of this high value, poor security, and the volume of rail transport, theft cases are frequent. The main problem in securing rail transport is predicting the locations with a high probability of risk. The aim of the presented research was therefore to predict, for individual areas, where the probability of rail cargo theft is highest, so that theft can be prevented by better securing the railway lines. To solve this problem, the authors' model was developed. The model uses information about past transport cases to train Artificial Neural Networks (ANN) and other Machine Learning (ML) methods. The ANN predicted the probability correctly for 94.7% of the theft cases, and the Machine Learning model identified 100% of the cases. This method can be used to develop a support system for securing rail infrastructure.
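A minimal sketch of the ANN component, assuming hypothetical per-segment features (cargo value, past incident count, security level) rather than the authors' data, ranks segments by predicted theft probability so security effort can be prioritised:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
n = 600
# Hypothetical per-segment features (not the paper's variables).
cargo_value = rng.uniform(0, 1, n)
past_incidents = rng.poisson(2, n)
security = rng.uniform(0, 1, n)
# Synthetic ground truth: risk rises with value and history, falls with security.
risk = 0.6 * cargo_value + 0.3 * (past_incidents / 5) - 0.5 * security
theft = (risk + rng.normal(0, 0.1, n)) > 0.4

X = np.column_stack([cargo_value, past_incidents, security])
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0))
model.fit(X, theft)

# Rank segments by predicted theft probability to prioritise patrols.
proba = model.predict_proba(X)[:, 1]
print("highest-risk segment index:", int(np.argmax(proba)))
```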
Hunting for open clusters in Gaia DR2: the Galactic anticentre
The Gaia Data Release 2 (DR2) provided an unprecedented volume of precise astrometric and excellent photometric data. In terms of mining the Gaia catalogue, machine learning methods have been shown to be a powerful tool, for instance in the search for unknown stellar structures. In particular, supervised and unsupervised learning methods combined significantly improve the detection rate of open clusters. We systematically scan Gaia DR2 in a region covering the Galactic anticentre and the Perseus arm, with the goal of finding any open clusters that may exist in this region, fine-tuning a previously proposed methodology successfully applied to TGAS data and adapting it to regions of different density. Our methodology uses an unsupervised, density-based clustering algorithm, DBSCAN, that identifies overdensities in the five-dimensional astrometric parameter space that may correspond to physical clusters. The overdensities are separated into physical clusters (open clusters) or random statistical clusters using an artificial neural network that recognises the isochrone pattern open clusters show in a colour-magnitude diagram. The method is able to recover more than 75% of the open clusters confirmed in the search area. Moreover, we detected 53 open clusters unknown prior to Gaia DR2, which represents an increase of more than 22% with respect to the already catalogued clusters in this region. We find that the census of nearby open clusters is not complete. Different machine learning methodologies for a blind search of open clusters are complementary to each other; no single method is able to detect 100% of the existing groups. Our methodology has proven to be a reliable tool for the automatic detection of open clusters, and is designed to be applied to the full Gaia DR2 catalogue. Comment: 8 pages, accepted by Astronomy and Astrophysics (A&A) on 14 May 2019. Tables 1 and 2 available at the CDS.
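The first stage of the methodology can be sketched as a DBSCAN run over a synthetic five-dimensional astrometric space, with a compact overdensity (a mock open cluster) planted among uniform field stars; the eps and min_samples values here are illustrative, not the paper's tuned settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)
# Synthetic 5-D astrometric space: (l, b, parallax, pmra, pmdec), rescaled.
field = rng.uniform(-5, 5, size=(1000, 5))               # field stars
cluster = rng.normal(loc=1.0, scale=0.05, size=(60, 5))  # compact overdensity
X = np.vstack([field, cluster])

# DBSCAN labels dense groups 0, 1, ... and sparse points -1 (noise).
labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X)
n_found = len(set(labels)) - (1 if -1 in labels else 0)
print("overdensities found:", n_found)
```

In the paper, each such overdensity would then be passed to a neural network that inspects its colour-magnitude diagram for an isochrone pattern; that second stage is omitted here.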
Performance Evaluation of Apache Spark MLlib Algorithms on an Intrusion Detection Dataset
The growing use of the Internet and web services, the advent of the fifth generation of cellular network technology (5G), and ever-growing Internet of Things (IoT) data traffic will all increase global internet usage. To ensure the security of future networks, machine learning-based intrusion detection and prevention systems (IDPS) must be implemented to detect new attacks, and big data parallel processing tools can be used to handle the huge collections of training data these systems require. In this paper, Apache Spark, a fast, general-purpose cluster computing platform, is used for processing and training a large volume of network traffic feature data. The most important features of the CSE-CIC-IDS2018 dataset are used to construct machine learning models, and the most popular machine learning approaches, namely Logistic Regression, Support Vector Machine (SVM), three different Decision Tree classifiers, and the Naive Bayes algorithm, are used to train the models using up to eight worker nodes. Our Spark cluster contains seven machines acting as worker nodes and one machine configured as both a master and a worker. We use the CSE-CIC-IDS2018 dataset to evaluate the overall performance of these algorithms on Botnet attacks, and distributed hyperparameter tuning is used to find the best parameters for a single decision tree. We achieved up to 100% accuracy in our experiments using the features selected by the learning method. Comment: Journal of Computing and Security (Isfahan University, Iran), Vol. 9, No. 1, 202
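As a single-machine analogue of the MLlib comparison (scikit-learn stands in for Spark MLlib here; the real pipeline would use pyspark.ml across the cluster), the same four families of classifiers can be benchmarked on a synthetic imbalanced dataset standing in for the CSE-CIC-IDS2018 features:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the selected CSE-CIC-IDS2018 features:
# imbalanced classes mimic rare attack traffic among benign flows.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "linear SVM": LinearSVC(max_iter=5000),
    "decision tree": DecisionTreeClassifier(max_depth=5),
    "naive Bayes": GaussianNB(),
}
# Cross-validated accuracy for each classifier family.
scores = {name: cross_val_score(m, X, y, cv=3).mean()
          for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

With a 90/10 class split, plain accuracy flatters weak models; per-class metrics would be the better yardstick for real intrusion detection.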
Machine-learning-based radiomics identifies atrial fibrillation on the epicardial fat in contrast-enhanced and non-enhanced chest CT
Objective: The purpose is to establish and validate a machine-learning-derived radiomics approach to determine the existence of atrial fibrillation (AF) by analyzing epicardial adipose tissue (EAT) in CT images. Methods: Patients with AF based on electrocardiographic tracing who underwent contrast-enhanced (n = 200) or non-enhanced (n = 300) chest CT scans were analyzed retrospectively. After EAT segmentation and radiomics feature extraction, the segmented EAT yielded 1691 radiomics features. The features most contributive to AF were selected by the Boruta algorithm and a machine-learning-based random forest algorithm, and combined to construct a radiomics signature (EAT-score). Multivariate logistic regression was used to build clinical factor and nested models. Results: In the test cohort of contrast-enhanced scanning (n = 60/200), the AUC of EAT-score for identifying patients with AF was 0.92 (95% CI: 0.84–1.00), higher than the 0.71 (0.58–0.85) of the clinical factor model (total cholesterol and body mass index) (DeLong's p = 0.01) and higher than the 0.73 (0.61–0.86) of the EAT volume model (p = 0.01). In the test cohort of non-enhanced scanning (n = 100/300), the AUC of EAT-score was 0.85 (0.77–0.92), higher than that of the CT attenuation model (p < 0.05). Conclusion: The EAT-score generated by machine-learning-based radiomics achieved high performance in identifying patients with AF. Advances in knowledge: A radiomics analysis based on machine learning allows for the identification of AF on the EAT in contrast-enhanced and non-enhanced chest CT.
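The feature-selection-then-signature pattern can be sketched with a random forest: rank synthetic radiomics-style features by importance (a simpler stand-in for the Boruta step used in the paper), keep the top few, and score the resulting signature by AUC. The feature count is scaled down from 1691 for speed, and all data is synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in: many radiomics-style features, few truly informative.
X, y = make_classification(n_samples=300, n_features=200, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Rank features by impurity importance and keep the top 10 as the "signature".
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
top = np.argsort(rf.feature_importances_)[::-1][:10]

# Refit on the selected features and evaluate on the held-out cohort.
rf_sig = RandomForestClassifier(n_estimators=300, random_state=0)
rf_sig.fit(X_tr[:, top], y_tr)
auc = roc_auc_score(y_te, rf_sig.predict_proba(X_te[:, top])[:, 1])
print(f"signature AUC on held-out data: {auc:.2f}")
```

Note that selecting features on the training split only, as here, is essential; ranking them on the full dataset would leak test information into the signature.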
The CAMELS project: Cosmology and Astrophysics with MachinE Learning Simulations
We present the Cosmology and Astrophysics with MachinE Learning Simulations
--CAMELS-- project. CAMELS is a suite of 4,233 cosmological simulations of
volume each: 2,184 state-of-the-art
(magneto-)hydrodynamic simulations run with the AREPO and GIZMO codes,
employing the same baryonic subgrid physics as the IllustrisTNG and SIMBA
simulations, and 2,049 N-body simulations. The goal of the CAMELS project is to
provide theory predictions for different observables as a function of cosmology
and astrophysics, and it is the largest suite of cosmological
(magneto-)hydrodynamic simulations designed to train machine learning
algorithms. CAMELS contains thousands of different cosmological and
astrophysical models by way of varying , , and four
parameters controlling stellar and AGN feedback, following the evolution of
more than 100 billion particles and fluid elements over a combined volume of
. We describe the simulations in detail and
characterize the large range of conditions represented in terms of the matter
power spectrum, cosmic star formation rate density, galaxy stellar mass
function, halo baryon fractions, and several galaxy scaling relations. We show
that the IllustrisTNG and SIMBA suites produce roughly similar distributions of
galaxy properties over the full parameter space but significantly different
halo baryon fractions and baryonic effects on the matter power spectrum. This
emphasizes the need for marginalizing over baryonic effects to extract the
maximum amount of information from cosmological surveys. We illustrate the
unique potential of CAMELS using several machine learning applications,
including non-linear interpolation, parameter estimation, symbolic regression,
data generation with Generative Adversarial Networks (GANs), dimensionality
reduction, and anomaly detection. Comment: 33 pages, 18 figures, CAMELS webpage at
https://www.camel-simulations.or