6,332 research outputs found
Automatic Emphysema Detection using Weakly Labeled HRCT Lung Images
A method for automatically quantifying emphysema regions using
High-Resolution Computed Tomography (HRCT) scans of patients with chronic
obstructive pulmonary disease (COPD) that does not require manually annotated
scans for training is presented. HRCT scans of controls and of COPD patients
with diverse disease severity are acquired at two different centers. Textural
features from co-occurrence matrices and Gaussian filter banks are used to
characterize the lung parenchyma in the scans. Two robust versions of multiple
instance learning (MIL) classifiers, miSVM and MILES, are investigated. The
classifiers are trained with the weak labels extracted from the forced
expiratory volume in one second (FEV1) and diffusing capacity of the lungs
for carbon monoxide (DLCO). At test time, the classifiers output a patient
label indicating overall COPD diagnosis and local labels indicating the
presence of emphysema. The classifier performance is compared with manual
annotations by two radiologists, a classical density based method, and
pulmonary function tests (PFTs). The miSVM classifier performed better than
MILES on both patient and emphysema classification. The classifier has a
stronger correlation with PFT than the density based method, the percentage of
emphysema in the intersection of annotations from both radiologists, and the
percentage of emphysema annotated by one of the radiologists. The correlation
between the classifier and the PFT is only outperformed by the second
radiologist. The method is therefore promising for facilitating assessment of
emphysema and reducing inter-observer variability.
Comment: Accepted at PLoS ON
Multimodal Machine Learning for Automated ICD Coding
This study presents a multimodal machine learning model to predict ICD-10
diagnostic codes. We developed separate machine learning models that can handle
data from different modalities, including unstructured text, semi-structured
text and structured tabular data. We further employed an ensemble method to
integrate all modality-specific models to generate ICD-10 codes. Key evidence
was also extracted to make our prediction more convincing and explainable. We
used the Medical Information Mart for Intensive Care III (MIMIC-III) dataset
to validate our approach. For ICD code prediction, our best-performing model
(micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms other
baseline models including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and
Text-CNN model (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability,
our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text
data and 0.3105 on tabular data, where well-trained physicians achieve 0.2780
and 0.5002 respectively.
Comment: Machine Learning for Healthcare 201
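The two metrics reported in the abstract above, micro-F1 and the Jaccard Similarity Coefficient, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the representation of each prediction as a set of code strings, and the example codes are all assumptions made for illustration.

```python
from typing import List, Set


def micro_f1(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    over all records before computing a single F1 score."""
    tp = fp = fn = 0
    for true_codes, pred_codes in zip(true_sets, pred_sets):
        tp += len(true_codes & pred_codes)   # codes predicted and correct
        fp += len(pred_codes - true_codes)   # codes predicted but wrong
        fn += len(true_codes - pred_codes)   # codes missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical example: two records with illustrative code labels.
true_codes = [{"I10", "E11"}, {"J44"}]
pred_codes = [{"I10"}, {"J44", "I50"}]
score = micro_f1(true_codes, pred_codes)

# Hypothetical evidence sets, e.g. tokens highlighted by a model vs. a physician.
overlap = jaccard({"a", "b", "c"}, {"b", "c", "d"})
```

Micro-averaging weights every code assignment equally, so frequent codes dominate the score; that is one reason it is commonly reported alongside an AUC for imbalanced multi-label tasks.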
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and makes it possible to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas is described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight on how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking and promising avenues for research.
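The bag formulation described in the abstract above can be sketched in a few lines. The example assumes the standard MIL assumption (a bag is positive if at least one of its instances is positive), which is only one of the settings such a survey covers. The instance scores, the 0.5 threshold, and all names are hypothetical.

```python
from typing import Dict, List


def predict_bag(instance_scores: List[float], threshold: float = 0.5) -> int:
    """Standard MIL assumption: a bag is positive (1) if at least one
    instance score reaches the threshold, otherwise negative (0)."""
    return int(any(score >= threshold for score in instance_scores))


def predict_bags(bags: Dict[str, List[float]], threshold: float = 0.5) -> Dict[str, int]:
    """Apply the bag-level rule to every bag in a dataset."""
    return {bag_id: predict_bag(scores, threshold) for bag_id, scores in bags.items()}


# Hypothetical instance scores, e.g. from a per-instance classifier.
bags = {
    "bag_a": [0.1, 0.2, 0.9],  # one confident instance makes the bag positive
    "bag_b": [0.1, 0.3, 0.4],  # no instance reaches the threshold
}
labels = predict_bags(bags)
```

Only the bag labels are needed for training in this setting; the instance-level scores are a by-product that can localize which instances drive a positive bag.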
PadChest: A large chest x-ray image dataset with multi-label annotated reports
We present a labeled large-scale, high resolution chest x-ray dataset for the
automated exploration of medical images along with their associated reports.
This dataset includes more than 160,000 images obtained from 67,000 patients
that were interpreted and reported by radiologists at San Juan
Hospital (Spain) from 2009 to 2017, covering six different position views and
additional information on image acquisition and patient demography. The reports
were labeled with 174 different radiographic findings, 19 differential
diagnoses and 104 anatomic locations organized as a hierarchical taxonomy and
mapped onto standard Unified Medical Language System (UMLS) terminology. Of
these reports, 27% were manually annotated by trained physicians and the
remaining set was labeled using a supervised method based on a recurrent neural
network with attention mechanisms. The labels generated were then validated in
an independent test set achieving a 0.93 Micro-F1 score. To the best of our
knowledge, this is one of the largest public chest x-ray databases suitable for
training supervised models concerning radiographs, and the first to contain
radiographic reports in Spanish. The PadChest dataset can be downloaded from
http://bimcv.cipf.es/bimcv-projects/padchest/
Clinical Assistant Diagnosis for Electronic Medical Record Based on Convolutional Neural Network
Automatically extracting useful information from electronic medical records
along with conducting disease diagnoses is a promising task for both clinical
decision support (CDS) and natural language processing (NLP). Most of the existing
systems are based on artificially constructed knowledge bases, and then
auxiliary diagnosis is done by rule matching. In this study, we present a
clinical intelligent decision approach based on Convolutional Neural
Networks (CNN), which can automatically extract high-level semantic information
of electronic medical records and then perform automatic diagnosis without
artificial construction of rules or knowledge bases. We use 18,590 collected
real-world clinical electronic medical records to train and test
the proposed model. Experimental results show that the proposed model can
achieve 98.67% accuracy and 96.02% recall, which strongly supports that using
a convolutional neural network to automatically learn high-level semantic
features of electronic medical records and then conduct assisted diagnosis is
feasible and effective.
Comment: 9 pages, 4 figures, Accepted by Scientific Report