Class imbalance impact on the prediction of complications during home hospitalization: a comparative study.

Author: Calvo González Mireia
Cano Isaac
Henández Carmen
Jané Campos Raimon
Miralles Felip
Ribas Vicent
Roca Josep
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Home hospitalization (HH) is presented as a healthcare alternative capable of providing high standards of care when patients no longer need hospital facilities. Although HH seems to lower healthcare costs by shortening hospital stays and to improve patients' quality of life, the lack of continuous observation at home may lead to complications in some patients. Since blood tests have been proven to provide relevant prognostic information in many diseases, this paper analyzes the impact of different sampling methods on the prediction of HH outcomes. After a first exploratory analysis, some variables extracted from routine blood tests performed at the moment of HH admission, such as hemoglobin, lymphocytes or creatinine, were found to show statistically significant differences between patients undergoing successful and unsuccessful HH stays. Predictive models were then built with these data in order to identify unsuccessful cases that eventually need hospital facilities. However, since hospital admissions during HH programs are rare, identifying them through conventional machine-learning approaches is challenging. Thus, several sampling strategies designed to address class imbalance are reviewed and compared. Among the analyzed approaches, over-sampling strategies, such as ROSE (Random Over-Sampling Examples) and conventional random over-sampling, showed the best performance. Nevertheless, further improvements should be proposed in the future so as to better identify those patients not benefiting from HH.

Peer Reviewed. Postprint (author's final draft).
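As an illustration of the conventional random over-sampling mentioned in this abstract, the following is a minimal sketch, not the authors' implementation. The function name `random_oversample`, the toy hemoglobin-like values, and the 8-to-2 class split are all hypothetical. ROSE itself generates new synthetic examples rather than duplicating rows, and is not shown here.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Pool of original rows belonging to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement to fill the gap to the majority size
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (made up): 8 successful stays (label 0), 2 unsuccessful (label 1)
rows = [[13.5 + 0.1 * i] for i in range(8)] + [[10.2], [9.8]]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In practice, over-sampling of this kind is applied to the training split only, so that duplicated minority rows do not leak into the evaluation data.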

Using machine learning techniques to develop forecasting algorithms for postoperative complications: Protocol for a retrospective study

Author: Avidan Michael Simon
Ben Abdallah Arbi
Budelier Thaddeus
Chen Yixin
Fritz Bradley A
Gregory Stephen
Helsten Daniel L
Kronzer Alex
McKinnon Sherry Lynn
Murray-Torres Teresa M
Sharma Anshuman
Wildes Troy S
Publication venue: Digital Commons@Becker
Publication date: 01/01/2018

Interactive exploration of population scale pharmacoepidemiology datasets

Author: Abadi M.
Furu K.
Salathé M.
Ventola C. L.
Wishart D. S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/05/2020

Population-scale drug prescription data linked with adverse drug reaction (ADR) data support the fitting of models large enough to detect drug use and ADR patterns that traditional methods cannot detect on smaller datasets. However, detecting ADR patterns in large datasets requires tools for scalable data processing, machine learning for data analysis, and interactive visualization. To our knowledge, no existing pharmacoepidemiology tool supports all three requirements. We have therefore created a tool for interactive exploration of patterns in prescription datasets with millions of samples. We use Spark to preprocess the data for machine learning and for analyses using SQL queries. We have implemented models in Keras and the scikit-learn framework. The model results are visualized and interpreted using live Python coding in Jupyter. We apply our tool to explore a dataset of 384 million prescriptions from the Norwegian Prescription Database, combined with 62 million prescriptions for elderly patients who were hospitalized. We preprocess the data in two minutes, train models in seconds, and plot the results in milliseconds. Our results show the power of combining computational power, short computation times, and ease of use for the analysis of population-scale pharmacoepidemiology datasets. The code is open source and available at: https://github.com/uit-hdl/norpd_prescription_analyse

arXiv.org e-Print Archive

Establishment of a integrative multi-omics expression database CKDdb in the context of chronic kidney disease (CKD)

Author: Fernandes Marco
Husi Holger
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Complex human traits such as chronic kidney disease (CKD) are a major health and financial burden in modern societies. Currently, CKD onset and progression are still not fully understood at the molecular level. Meanwhile, the prolific use of high-throughput omic technologies in disease biomarker discovery studies has yielded a vast amount of disjointed data that cannot be easily collated. Therefore, we aimed to develop a molecule-centric database featuring CKD-related experiments from the available literature. We established the Chronic Kidney Disease database CKDdb, an integrated and clustered information resource that covers multi-omic studies (microRNAs, genomics, peptidomics, proteomics and metabolomics) of CKD and related disorders, by performing literature data mining and manual curation. The CKDdb database contains differential expression data from 49395 molecule entries (redundant), of which 16885 are unique molecules (non-redundant), from 377 manually curated studies of 230 publications. This database was intentionally built to allow disease pathway analysis through a systems approach in order to yield biological meaning by integrating all existing information, and it therefore has the potential to provide an in-depth understanding of the key molecular events that modulate CKD pathogenesis.

Enlighten

maigesPack: A Computational Environment for Microarray Data Analysis

Author: Esteves Gustavo H.
Hirata Jr Roberto
Publication date: 11/11/2015

Microarray technology is still an important way to assess gene expression in molecular biology, mainly because it measures expression profiles for thousands of genes simultaneously, which makes this technology a good option for some studies focused on systems biology. One of its main problems is the complexity of the experimental procedure, which presents several sources of variability that hinder statistical modeling. So far, there is no standard protocol for the generation and evaluation of microarray data. To ease the analysis process, this paper presents an R package, named maigesPack, that helps with data organization. In addition, it makes the data analysis process more robust, reliable and reproducible. maigesPack also aggregates several data analysis procedures reported in the literature, for instance: cluster analysis, differential expression, supervised classifiers, relevance networks and functional classification of gene groups or gene networks.

arXiv.org e-Print Archive

CiteSeerX