Reliability-based cleaning of noisy training labels with inductive conformal prediction in multi-modal biomedical data mining

Author: Gevaert Olivier
Lu Guangming
Xu Qinmei
Zhan Xianghao
Zheng Yuanning
Publication date: 13/09/2023

Accurately labeling biomedical data presents a challenge. Traditional semi-supervised learning methods often under-utilize available unlabeled data. To address this, we propose a novel reliability-based training data cleaning method employing inductive conformal prediction (ICP). This method capitalizes on a small set of accurately labeled training data and leverages ICP-calculated reliability metrics to rectify mislabeled data and outliers within vast quantities of noisy training data. The efficacy of the method is validated across three classification tasks within distinct modalities: filtering drug-induced-liver-injury (DILI) literature by title and abstract, predicting ICU admission of COVID-19 patients from CT radiomics and electronic health records, and subtyping breast cancer using RNA-sequencing data. Varying levels of noise were introduced to the training labels through label permutation. Results show significant enhancements in classification performance: accuracy enhancement in 86 out of 96 DILI experiments (up to 11.4%), AUROC and AUPRC enhancements in all 48 COVID-19 experiments (up to 23.8% and 69.8%), and accuracy and macro-average F1 score improvements in 47 out of 48 RNA-sequencing experiments (up to 74.6% and 89.0%). Our method offers the potential to substantially boost classification performance in multi-modal biomedical machine learning tasks. Importantly, it accomplishes this without requiring an excessive volume of meticulously curated training data.
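As a rough illustration of the ICP step mentioned in this abstract, the sketch below computes conformal p-values from a small, accurately labeled calibration set. The abstract does not specify the paper's reliability metrics or nonconformity measure; the measure used here (one minus the predicted probability of the candidate label) and the function name `icp_p_values` are assumptions made for illustration only.

```python
import numpy as np

def icp_p_values(cal_probs, cal_labels, test_probs):
    """Return an (n_test, n_classes) array of conformal p-values.

    cal_probs  : predicted class probabilities for the calibration set
    cal_labels : trusted integer labels of the calibration set
    test_probs : predicted class probabilities for the noisy samples
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    test_probs = np.asarray(test_probs, dtype=float)

    # Assumed nonconformity measure: 1 - probability of the (true) label.
    cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    # Nonconformity of each noisy sample under every candidate label.
    test_scores = 1.0 - test_probs

    # p-value: share of calibration scores at least as large as the test
    # score, with the usual +1 correction in numerator and denominator.
    n_ge = (cal_scores[None, None, :] >= test_scores[:, :, None]).sum(axis=2)
    return (n_ge + 1) / (len(cal_scores) + 1)

p = icp_p_values(
    cal_probs=[[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
    cal_labels=[0, 1, 0],
    test_probs=[[0.7, 0.3]],
)
print(p)
```

A low p-value for a sample's given label, together with a high p-value for another label, is the kind of signal a cleaning rule could use to flag or relabel that sample; any such threshold would be a design choice not stated in the abstract.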

arXiv.org e-Print Archive

Inferring transportation modes from GPS trajectories using a convolutional neural network

Author: Dabiri Sina
Heaslip Kevin
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

Identifying the distribution of users' transportation modes is an essential part of travel demand analysis and transportation planning. With the advent of ubiquitous GPS-enabled devices (e.g., a smartphone), a cost-effective approach for inferring commuters' mobility mode(s) is to leverage their GPS trajectories. A majority of studies have proposed mode inference models based on hand-crafted features and traditional machine learning algorithms. However, manual features have major drawbacks, including vulnerability to traffic and environmental conditions and the human bias inherent in feature design. One way to overcome these issues is by utilizing Convolutional Neural Network (CNN) schemes that are capable of automatically deriving high-level features from the raw input. Accordingly, in this paper, we take advantage of CNN architectures to predict travel modes based on only raw GPS trajectories, where the modes are labeled as walk, bike, bus, driving, and train. Our key contribution is designing the layout of the CNN's input layer in such a way that it is not only compatible with CNN schemes but also represents fundamental motion characteristics of a moving object, including speed, acceleration, jerk, and bearing rate. Furthermore, we improve the quality of GPS logs through several data preprocessing steps. Using the clean input layer, a variety of CNN configurations are evaluated to identify the best CNN architecture. The highest accuracy of 84.8% was achieved with an ensemble of the best CNN configuration. In this research, we contrast our methodology with traditional machine learning algorithms as well as the seminal and most related studies to demonstrate the superiority of our framework.
Comment: 12 pages, 3 figures, 7 tables, Transportation Research Part C: Emerging Technologie
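The four motion characteristics named in this abstract can be derived from consecutive GPS fixes. The following is a minimal sketch under stated assumptions: the point layout `(lat, lon, t_seconds)`, the haversine distance, the finite-difference definitions, and the helper names are all illustrative and are not taken from the paper.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing in degrees, in [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0

def motion_features(points):
    """points: list of (lat, lon, t_seconds) with strictly increasing t."""
    segs = list(zip(points[:-1], points[1:]))
    dts = [b[2] - a[2] for a, b in segs]
    speed = [haversine_m(a[0], a[1], b[0], b[1]) / dt for (a, b), dt in zip(segs, dts)]
    bearing = [bearing_deg(a[0], a[1], b[0], b[1]) for a, b in segs]
    # Finite differences of speed and acceleration over the segment duration.
    accel = [(speed[i + 1] - speed[i]) / dts[i] for i in range(len(speed) - 1)]
    jerk = [(accel[i + 1] - accel[i]) / dts[i] for i in range(len(accel) - 1)]
    # Absolute change in heading between segments, wrapped to [0, 180].
    bearing_rate = [abs((bearing[i + 1] - bearing[i] + 180.0) % 360.0 - 180.0)
                    for i in range(len(bearing) - 1)]
    return {"speed": speed, "acceleration": accel, "jerk": jerk, "bearing_rate": bearing_rate}

track = [(0.0, 0.0, 0.0), (0.0, 0.001, 10.0), (0.0, 0.002, 20.0), (0.0, 0.003, 30.0)]
print(motion_features(track))
```

Stacking these per-segment series as channels of a fixed-length window is one plausible way to form a CNN input, though the abstract does not describe the exact layout.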

arXiv.org e-Print Archive

Monash University, Institute of Transport Studies: World Transit Research (WTR)

Machine Learning Decision Tree Models for Differentiation of Posterior Fossa Tumors Using Diffusion Histogram Analysis and Structural MRI Findings.

Author: Aboian Mariam
Cha Soonmee
Payabvash Seyedmehdi
Tihan Tarik
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

We applied machine learning algorithms for differentiation of posterior fossa tumors using apparent diffusion coefficient (ADC) histogram analysis and structural MRI findings. A total of 256 patients with intra-axial posterior fossa tumors were identified, of whom 248 were included in machine learning analysis, with at least 6 representative subjects per tumor pathology. The ADC histograms of solid components of tumors, structural MRI findings, and patients' age were used to construct decision models using Classification and Regression Tree analysis. We also compared different machine learning classification algorithms (i.e., naïve Bayes, random forest, neural networks, support vector machine with linear and polynomial kernel) for dichotomized differentiation of the 5 most common tumors in our cohort: metastasis (n = 65), hemangioblastoma (n = 44), pilocytic astrocytoma (n = 43), ependymoma (n = 27), and medulloblastoma (n = 26). The decision tree model could differentiate seven tumor histopathologies with terminal nodes yielding up to 90% accurate classification rates. In receiver operating characteristics (ROC) analysis, the decision tree model achieved greater area under the curve (AUC) than the official clinical report for differentiation of pilocytic astrocytoma (p = 0.020) and atypical teratoid/rhabdoid tumor (ATRT) (p = 0.001) from other types of neoplasms. However, neuroradiologists' interpretations had greater accuracy in differentiating metastases (p = 0.001). Among different machine learning algorithms, random forest models yielded the highest accuracy in dichotomized classification of the 5 most common tumor types; in multiclass differentiation of all tumor types, random forest yielded an averaged AUC of 0.961 in training datasets and 0.873 in validation samples. Our study demonstrates the potential application of machine learning algorithms and decision trees for accurate differentiation of brain tumors based on pretreatment MRI.
Using easy-to-apply and understandable imaging metrics, the proposed decision tree model can help radiologists differentiate posterior fossa tumors, especially tumors with similar qualitative imaging characteristics. In particular, our decision tree model differentiated pilocytic astrocytomas from ATRT more accurately than neuroradiologists did in clinical reads.

eScholarship - University of California

Investigation of Machine Learning Approaches for Traumatic Brain Injury Classification via EEG Assessment in Mice.

Author: Cao Hung
Dutt Nikil
Jafarlou Salar
Lim Miranda M
Rahmani Amir M
Shin Ikhwan
Vishwanath Manoj
Publication venue: eScholarship, University of California
Publication date: 01/04/2020

Due to the difficulties and complications in the quantitative assessment of traumatic brain injury (TBI) and its increasing relevance in today's world, robust detection of TBI has become more significant than ever. In this work, we investigate several machine learning approaches to assess their performance in classifying electroencephalogram (EEG) data of TBI in a mouse model. Algorithms such as decision trees (DT), random forest (RF), neural network (NN), support vector machine (SVM), K-nearest neighbors (KNN) and convolutional neural network (CNN) were compared on their ability to distinguish mild TBI (mTBI) data from control-group data in wake stages for different epoch lengths. Average power in different frequency sub-bands and the alpha:theta power ratio in EEG were used as input features for the machine learning approaches. Results in this mouse model were promising, suggesting similar approaches may be applicable to detect TBI in humans in practical scenarios.
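The input features described in this abstract (average sub-band power and the alpha:theta ratio) can be sketched with a plain FFT periodogram. The band edges (theta 4 to 8 Hz, alpha 8 to 12 Hz), the sampling rate, the epoch length, and the function name `band_power` are assumptions for illustration; the abstract does not give the values used in the study.

```python
import numpy as np

def band_power(signal, fs, low, high):
    """Average periodogram power of `signal` in the band [low, high) Hz."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    mask = (freqs >= low) & (freqs < high)
    return psd[mask].mean()

fs = 256                       # assumed sampling rate in Hz
t = np.arange(2 * fs) / fs     # one assumed 2-second epoch
# Synthetic epoch: a strong 10 Hz (alpha) and a weaker 6 Hz (theta) component.
epoch = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 6 * t)

theta = band_power(epoch, fs, 4.0, 8.0)
alpha = band_power(epoch, fs, 8.0, 12.0)
print("alpha:theta power ratio =", alpha / theta)
```

In practice a smoothed estimate such as Welch's method is often preferred over a raw periodogram for noisy EEG; the raw version is used here only to keep the sketch dependency-light.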

eScholarship - University of California