Reliability-based cleaning of noisy training labels with inductive conformal prediction in multi-modal biomedical data mining
Accurately labeling biomedical data presents a challenge. Traditional
semi-supervised learning methods often under-utilize available unlabeled data.
To address this, we propose a novel reliability-based training data cleaning
method employing inductive conformal prediction (ICP). This method capitalizes
on a small set of accurately labeled training data and leverages ICP-calculated
reliability metrics to rectify mislabeled data and outliers within vast
quantities of noisy training data. The efficacy of the method is validated
across three classification tasks within distinct modalities: filtering
drug-induced-liver-injury (DILI) literature with title and abstract, predicting
ICU admission of COVID-19 patients through CT radiomics and electronic health
records, and subtyping breast cancer using RNA-sequencing data. Varying levels
of noise were introduced into the training labels through label permutation.
Results show significant enhancements in classification performance: accuracy
enhancement in 86 out of 96 DILI experiments (up to 11.4%), AUROC and AUPRC
enhancements in all 48 COVID-19 experiments (up to 23.8% and 69.8%), and
accuracy and macro-average F1 score improvements in 47 out of 48 RNA-sequencing
experiments (up to 74.6% and 89.0%). Our method offers the potential to
substantially boost classification performance in multi-modal biomedical
machine learning tasks. Importantly, it accomplishes this without necessitating
an excessive volume of meticulously curated training data.
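
The idea of using ICP reliability scores to clean noisy labels can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nonconformity score (one minus the predicted class probability), the class-conditional calibration sets, the threshold `eps`, and the keep/correct/outlier rule in the hypothetical `clean_label` helper are all choices made here for clarity.

```python
from typing import Dict, Optional, Sequence


def icp_p_value(cal_scores: Sequence[float], score: float) -> float:
    """Conformal p-value: share of calibration scores at least as nonconforming."""
    at_least = sum(1 for s in cal_scores if s >= score)
    return (at_least + 1) / (len(cal_scores) + 1)


def clean_label(
    probs: Dict[str, float],
    noisy_label: str,
    cal_scores: Dict[str, Sequence[float]],
    eps: float = 0.2,
) -> Optional[str]:
    """Keep, correct, or reject one noisy label (assumed rule, not the paper's).

    probs       -- class probabilities from a model trained on the clean subset
    cal_scores  -- per-class nonconformity scores from clean calibration data
    Returns the label to use, or None when the sample looks like an outlier.
    """
    # Assumed nonconformity score: 1 - predicted probability of the class.
    p_values = {
        label: icp_p_value(cal_scores[label], 1.0 - p)
        for label, p in probs.items()
    }
    plausible = [label for label, pv in p_values.items() if pv > eps]
    if not plausible:
        return None  # no class conforms: treat as an outlier
    if noisy_label in plausible:
        return noisy_label  # the given label is credible: keep it
    if len(plausible) == 1:
        return plausible[0]  # exactly one other class conforms: relabel
    return noisy_label  # ambiguous: leave the label unchanged


# Toy calibration scores, made up for illustration only.
calibration = {"pos": [0.1, 0.2, 0.3, 0.4], "neg": [0.1, 0.15, 0.25, 0.35]}
print(clean_label({"pos": 0.95, "neg": 0.05}, "neg", calibration))
```

In this toy call the sample is labeled "neg", but only "pos" has a p-value above `eps`, so the sketch relabels it. A stricter or looser `eps` trades off how many labels are changed against how many are rejected.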
Inferring transportation modes from GPS trajectories using a convolutional neural network
Identifying the distribution of users' transportation modes is an essential
part of travel demand analysis and transportation planning. With the advent of
ubiquitous GPS-enabled devices (e.g., a smartphone), a cost-effective approach
for inferring commuters' mobility mode(s) is to leverage their GPS
trajectories. A majority of studies have proposed mode inference models based
on hand-crafted features and traditional machine learning algorithms. However,
hand-crafted features have major drawbacks, including vulnerability to
traffic and environmental conditions and the human bias involved in
designing effective features. One way to overcome these issues is to use
Convolutional Neural Network (CNN) schemes, which can automatically
derive high-level features from the raw input. Accordingly, in this paper, we
use CNN architectures to predict travel modes from raw GPS trajectories
alone, where the modes are labeled as walk, bike, bus, driving,
and train. Our key contribution is designing the layout of the CNN's input
layer so that it is not only compatible with CNN schemes but also
represents fundamental motion characteristics of a moving object, including
speed, acceleration, jerk, and bearing rate. Furthermore, we improve the
quality of GPS logs through several data preprocessing steps. Using the clean
input layer, a variety of CNN configurations are evaluated to find the best
CNN architecture. The highest accuracy, 84.8%, was achieved by an
ensemble of the best CNN configuration. In this research, we contrast our
methodology with traditional machine learning algorithms as well as the seminal
and most related studies to demonstrate the superiority of our framework.
Comment: 12 pages, 3 figures, 7 tables, Transportation Research Part C:
Emerging Technologies
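
The four motion characteristics named in the abstract (speed, acceleration, jerk, and bearing rate) can be derived from consecutive GPS fixes roughly as shown below. This is a sketch under assumptions: the `(latitude, longitude, timestamp)` point layout, the haversine distance, the wrap-around handling of bearing differences, and the function names are illustrative choices and are not taken from the paper. Timestamps are assumed to be strictly increasing.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (latitude_deg, longitude_deg, time_s)
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(p: Point, q: Point) -> float:
    """Great-circle distance in metres between two GPS fixes."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(p: Point, q: Point) -> float:
    """Initial bearing from p to q in degrees, in the range [0, 360)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    y = math.sin(lon2 - lon1) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    return math.degrees(math.atan2(y, x)) % 360.0


def motion_channels(track: Sequence[Point]) -> List[List[float]]:
    """Return [speed, acceleration, jerk, bearing_rate], trimmed to equal length."""
    segments = list(zip(track, track[1:]))
    dts = [q[2] - p[2] for p, q in segments]
    speed = [haversine_m(p, q) / dt for (p, q), dt in zip(segments, dts)]
    accel = [(v2 - v1) / dt for v1, v2, dt in zip(speed, speed[1:], dts[1:])]
    jerk = [(a2 - a1) / dt for a1, a2, dt in zip(accel, accel[1:], dts[2:])]
    bearings = [bearing_deg(p, q) for p, q in segments]
    # Assumed definition: absolute change in bearing, wrapped to [0, 180].
    bearing_rate = []
    for b1, b2 in zip(bearings, bearings[1:]):
        d = abs(b2 - b1) % 360.0
        bearing_rate.append(min(d, 360.0 - d))
    n = len(jerk)  # jerk is the shortest series
    return [speed[:n], accel[:n], jerk[:n], bearing_rate[:n]]
```

Stacking the four equal-length series gives a multi-channel array that a one-dimensional CNN could consume, in the spirit of the input layout the abstract describes.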
Machine Learning Decision Tree Models for Differentiation of Posterior Fossa Tumors Using Diffusion Histogram Analysis and Structural MRI Findings.
We applied machine learning algorithms to differentiate posterior fossa tumors using apparent diffusion coefficient (ADC) histogram analysis and structural MRI findings. A total of 256 patients with intra-axial posterior fossa tumors were identified, of whom 248 were included in the machine learning analysis, with at least 6 representative subjects per tumor pathology. The ADC histograms of the solid components of tumors, structural MRI findings, and patients' age were used to construct decision models using Classification and Regression Tree analysis. We also compared different machine learning classification algorithms (i.e., naïve Bayes, random forest, neural networks, and support vector machines with linear and polynomial kernels) for dichotomized differentiation of the 5 most common tumors in our cohort: metastasis (n = 65), hemangioblastoma (n = 44), pilocytic astrocytoma (n = 43), ependymoma (n = 27), and medulloblastoma (n = 26). The decision tree model could differentiate seven tumor histopathologies, with terminal nodes yielding up to 90% accurate classification rates. In receiver operating characteristic (ROC) analysis, the decision tree model achieved a greater area under the curve (AUC) than the official clinical report for differentiating pilocytic astrocytoma (p = 0.020) and atypical teratoid/rhabdoid tumor (ATRT) (p = 0.001) from other types of neoplasms. However, neuroradiologists' interpretations had greater accuracy in differentiating metastases (p = 0.001). Among the machine learning algorithms, random forest models yielded the highest accuracy in dichotomized classification of the 5 most common tumor types; in multiclass differentiation of all tumor types, random forest yielded an averaged AUC of 0.961 in training datasets and 0.873 in validation samples. Our study demonstrates the potential application of machine learning algorithms and decision trees for accurate differentiation of brain tumors based on pretreatment MRI.
Using easy-to-apply and understandable imaging metrics, the proposed decision tree model can help radiologists differentiate posterior fossa tumors, especially tumors with similar qualitative imaging characteristics. In particular, our decision tree model differentiated pilocytic astrocytomas from ATRT more accurately than neuroradiologists did in clinical reads.
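
For readers unfamiliar with how a classification tree chooses its cut-offs, the sketch below finds the single best threshold on one numeric feature by minimizing weighted Gini impurity. The abstract does not state which splitting criterion was used; Gini is assumed here because it is the usual choice for classification in CART. The function names and the toy values are illustrative and are not data from the study.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a set of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(
    values: Sequence[float], labels: Sequence[str]
) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted impurity) of the best binary split.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_impurity = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity


# Made-up feature values and class names, for illustration only.
print(best_split([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))
```

A full tree repeats this search over every feature at every node and recurses on the two resulting subsets until a stopping rule is met.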
Investigation of Machine Learning Approaches for Traumatic Brain Injury Classification via EEG Assessment in Mice.
Because quantitative assessment of traumatic brain injury (TBI) is difficult and complicated, and TBI is increasingly relevant in today's world, robust detection of TBI has become more important than ever. In this work, we investigate several machine learning approaches and assess their performance in classifying electroencephalogram (EEG) data from a mouse model of TBI. Algorithms such as decision trees (DT), random forest (RF), neural network (NN), support vector machine (SVM), K-nearest neighbors (KNN), and convolutional neural network (CNN) were evaluated on their ability to distinguish mild TBI (mTBI) data from control-group data in wake stages for different epoch lengths. Average power in different frequency sub-bands and the alpha:theta power ratio in EEG were used as input features for the machine learning approaches. Results in this mouse model were promising, suggesting that similar approaches may be applicable to detecting TBI in humans in practical scenarios.
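
The input features described above (average power per frequency sub-band and the alpha:theta power ratio) could be computed for one EEG epoch along the following lines. This is a sketch under assumptions: the band edges (theta 4 to 8 Hz, alpha 8 to 12 Hz), the plain periodogram estimate, and the function names are illustrative and are not specified in the abstract. A direct DFT is used to keep the example dependency-free; it is only practical for short epochs.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def power_spectrum(signal: Sequence[float], fs: float) -> List[Tuple[float, float]]:
    """Return (frequency_hz, power) pairs from a direct DFT of one epoch."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append((k * fs / n, abs(coeff) ** 2 / n))
    return spectrum


def band_power(spectrum: Sequence[Tuple[float, float]], low: float, high: float) -> float:
    """Average power over frequency bins with low <= f < high."""
    values = [p for f, p in spectrum if low <= f < high]
    return sum(values) / len(values) if values else 0.0


def alpha_theta_ratio(
    signal: Sequence[float],
    fs: float,
    theta: Tuple[float, float] = (4.0, 8.0),   # assumed band edges
    alpha: Tuple[float, float] = (8.0, 12.0),  # assumed band edges
) -> float:
    """Ratio of average alpha-band power to average theta-band power."""
    spectrum = power_spectrum(signal, fs)
    theta_power = band_power(spectrum, *theta)
    if theta_power == 0.0:
        return math.inf
    return band_power(spectrum, *alpha) / theta_power
```

The per-band averages and the ratio from each epoch would then form one feature row for classifiers such as those listed in the abstract.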