A critical look at studies applying over-sampling on the TPEHGDB dataset
Preterm birth is the leading cause of death among young children and has a large prevalence globally. Machine learning models based on features extracted from clinical sources, such as electronic patient files, yield promising results. In this study, we review studies that constructed predictive models based on a publicly available dataset, the Term-Preterm EHG Database (TPEHGDB), which contains electrohysterogram signals alongside clinical data. These studies often report near-perfect prediction results obtained by applying over-sampling as a means of data augmentation. We reconstruct these results to show that they can only be achieved when data augmentation is applied to the entire dataset prior to partitioning it into training and test sets. This means that (i) artificial samples highly correlated with data points from the test set are added to the training set, and (ii) artificial samples highly correlated with points from the training set are added to the test set. Many previously reported results therefore carry little meaning regarding the actual effectiveness of the model at making predictions on unseen data in a real-world setting. After highlighting the danger of applying over-sampling strategies before data partitioning, we present a realistic baseline for the TPEHGDB dataset and show how predictive performance and clinical use can be improved by incorporating features from electrohysterogram sensors and by applying over-sampling on the training set.
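The correct ordering described in this abstract (partition first, then augment only the training set) can be sketched as follows; the data and the simple duplication-based over-sampler are illustrative stand-ins, not the paper's actual pipeline:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def oversample_minority(X, y, rng):
    """Randomly duplicate minority-class samples until classes are balanced."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_needed = counts.max() - counts.min()
    idx = np.flatnonzero(y == minority)
    extra = rng.choice(idx, size=n_needed, replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # synthetic features
y = np.array([0] * 90 + [1] * 10)           # imbalanced labels

# Correct order: partition FIRST, then over-sample only the training set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_tr, y_tr = oversample_minority(X_tr, y_tr, rng)
# The test set keeps its natural class distribution and shares no
# (near-)duplicates with the augmented training set.
```

Reversing these two steps, as the reviewed studies did, would place copies of test points into the training set and inflate the reported scores.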
Persistence Bag-of-Words for Topological Data Analysis
Persistent homology (PH) is a rigorous mathematical theory that provides a
robust descriptor of data in the form of persistence diagrams (PDs). PDs
exhibit, however, complex structure and are difficult to integrate in today's
machine learning workflows. This paper introduces persistence bag-of-words: a
novel and stable vectorized representation of PDs that enables the seamless
integration with machine learning. Comprehensive experiments show that the new
representation achieves state-of-the-art performance and beyond in much less
time than alternative approaches.

Comment: Accepted for the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). arXiv admin note: substantial text overlap with arXiv:1802.0485
Mining Unclassified Traffic Using Automatic Clustering Techniques
In this paper we present a fully unsupervised algorithm to identify classes of traffic inside an aggregate. The algorithm leverages the K-means clustering algorithm, augmented with a mechanism to automatically determine the number of traffic clusters. The signatures used for clustering are statistical representations of the application-layer protocols. The proposed technique is extensively tested on UDP traffic traces collected from operational networks. Performance tests show that it can cluster the traffic into a few tens of pure clusters, achieving an accuracy above 95%. Results are promising and suggest that the proposed approach might effectively be used for automatic traffic monitoring, e.g., to identify the birth of new applications and protocols, or the presence of anomalous or unexpected traffic.
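One simple way to augment K-means with automatic selection of the cluster count, in the spirit of this abstract, is to sweep k and keep the clustering with the best silhouette score; this is an illustrative sketch, not the paper's specific mechanism:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def auto_kmeans(X, k_max=10, random_state=0):
    """Fit K-means for k = 2..k_max and keep the clustering with the
    highest silhouette score (one way to choose k automatically)."""
    best_k, best_score, best_labels = None, -1.0, None
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels

# Three well-separated synthetic "traffic signature" clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 4))
               for c in (0.0, 5.0, 10.0)])
k, labels = auto_kmeans(X)
```

On real traffic signatures the score landscape is noisier, so the choice of validity index (silhouette, gap statistic, etc.) matters more than it does on toy data.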
Multiomics modeling of the immunome, transcriptome, microbiome, proteome and metabolome adaptations during human pregnancy.
Motivation: Multiple biological clocks govern a healthy pregnancy. These biological mechanisms produce immunologic, metabolomic, proteomic, genomic and microbiomic adaptations during the course of pregnancy. Modeling the chronology of these adaptations during full-term pregnancy provides the frameworks for future studies examining deviations implicated in pregnancy-related pathologies including preterm birth and preeclampsia.

Results: We performed a multiomics analysis of 51 samples from 17 pregnant women, delivering at term. The datasets included measurements from the immunome, transcriptome, microbiome, proteome and metabolome of samples obtained simultaneously from the same patients. Multivariate predictive modeling using the Elastic Net (EN) algorithm was used to measure the ability of each dataset to predict gestational age. Using stacked generalization, these datasets were combined into a single model. This model not only significantly increased predictive power by combining all datasets, but also revealed novel interactions between different biological modalities. Future work includes expansion of the cohort to preterm-enriched populations and in vivo analysis of immune-modulating interventions based on the mechanisms identified.

Availability and implementation: Datasets and scripts for reproduction of results are available through: https://nalab.stanford.edu/multiomics-pregnancy/.

Supplementary information: Supplementary data are available at Bioinformatics online.
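The stacked-generalization scheme described above (a per-modality Elastic Net whose out-of-fold predictions feed a meta-model) can be sketched as follows; the synthetic "omics" matrices and all parameter values are illustrative assumptions, not the study's data or settings:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.model_selection import cross_val_predict

# Hypothetical stand-ins for two omics datasets measured on the same subjects.
rng = np.random.default_rng(0)
n = 60
gest_age = rng.uniform(8, 40, size=n)  # gestational age in weeks
omics_a = np.outer(gest_age, rng.normal(size=20)) + rng.normal(size=(n, 20))
omics_b = np.outer(gest_age, rng.normal(size=30)) + rng.normal(size=(n, 30)) * 5

# Level 0: one Elastic Net per modality; out-of-fold predictions only,
# so the meta-model never sees level-0 training-set fits.
preds = np.column_stack([
    cross_val_predict(ElasticNet(alpha=0.5), X, gest_age, cv=5)
    for X in (omics_a, omics_b)
])

# Level 1: a simple meta-model combines the per-modality predictions.
meta = LinearRegression().fit(preds, gest_age)
```

Using out-of-fold rather than in-sample level-0 predictions is what keeps the stacked model's apparent gain honest.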
Application of Machine Learning to Mortality Modeling and Forecasting
Estimation of future mortality rates still plays a central role among life insurers in
pricing their products and managing longevity risk. In the literature on mortality modeling, a large
number of stochastic models have been proposed, most of them forecasting future mortality
rates by extrapolating one or more latent factors. The abundance of proposed models shows that
forecasting future mortality from historical trends is non-trivial. Following the idea proposed in
Deprez et al. (2017), we use machine learning algorithms, capable of capturing patterns that are not
easily identifiable, to calibrate a parameter (the machine learning estimator), improving the goodness
of fit of standard stochastic mortality models. The machine learning estimator is then forecasted
according to the Lee-Carter framework, improving the forecasting quality of the standard
stochastic models. Out-of-sample forecasts are provided to verify the model accuracy.
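The Lee-Carter framework that the estimator is forecasted under can be sketched on synthetic data as follows (SVD fit of log m[x,t] = a[x] + b[x]·k[t], then a random-walk-with-drift forecast of k); the rates and dimensions are made up for illustration:

```python
import numpy as np

# Synthetic log death rates following log m[x,t] = a[x] + b[x]*k[t] + noise.
rng = np.random.default_rng(0)
ages, years = 10, 30
a = np.linspace(-6, -2, ages)        # age pattern of log-mortality
b = np.linspace(0.05, 0.15, ages)    # age-specific sensitivity to the trend
k = -0.5 * np.arange(years)          # declining period index
log_m = a[:, None] + np.outer(b, k) + rng.normal(scale=0.01, size=(ages, years))

# Estimate a_x as the row mean, then b_x and k_t from the first SVD component.
a_hat = log_m.mean(axis=1)
U, s, Vt = np.linalg.svd(log_m - a_hat[:, None], full_matrices=False)
b_hat = U[:, 0] / U[:, 0].sum()        # usual normalisation: sum(b) = 1
k_hat = s[0] * Vt[0] * U[:, 0].sum()   # scaled so b_hat @ k_hat matches the fit

# Forecast k_t as a random walk with drift, the standard Lee-Carter step.
drift = (k_hat[-1] - k_hat[0]) / (years - 1)
k_future = k_hat[-1] + drift * np.arange(1, 6)
```

In the paper's setting it is the machine-learning estimator, rather than the raw k_t, that is projected forward through this kind of framework.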