57,007 research outputs found
Multimodal Machine Learning for Automated ICD Coding
This study presents a multimodal machine learning model to predict ICD-10
diagnostic codes. We developed separate machine learning models that can handle
data from different modalities, including unstructured text, semi-structured
text and structured tabular data. We further employed an ensemble method to
integrate all modality-specific models to generate ICD-10 codes. Key evidence
was also extracted to make our prediction more convincing and explainable. We
used the Medical Information Mart for Intensive Care III (MIMIC -III) dataset
to validate our approach. For ICD code prediction, our best-performing model
(micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms other
baseline models including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and
Text-CNN model (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability,
our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text
data and 0.3105 on tabular data, where well-trained physicians achieve 0.2780
and 0.5002 respectively.Comment: Machine Learning for Healthcare 201
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
Rationale in Development Chat Messages: An Exploratory Study
Chat messages of development teams play an increasingly significant role in
software development, having replaced emails in some cases. Chat messages
contain information about discussed issues, considered alternatives and
argumentation leading to the decisions made during software development. These
elements, defined as rationale, are invaluable during software evolution for
documenting and reusing development knowledge. Rationale is also essential for
coping with changes and for effective maintenance of the software system.
However, exploiting the rationale hidden in the chat messages is challenging
due to the high volume of unstructured messages covering a wide range of
topics. This work presents the results of an exploratory study examining the
frequency of rationale in chat messages, the completeness of the available
rationale and the potential of automatic techniques for rationale extraction.
For this purpose, we apply content analysis and machine learning techniques on
more than 8,700 chat messages from three software development projects. Our
results show that chat messages are a rich source of rationale and that machine
learning is a promising technique for detecting rationale and identifying
different rationale elements.Comment: 11 pages, 6 figures. The 14th International Conference on Mining
Software Repositories (MSR'17
Analyzing collaborative learning processes automatically
In this article we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis both from a broad perspective and more specifically in terms of a publicly available tool set called TagHelper tools. Analyzing the variety of pedagogically valuable facets of learners’ interactions is a time consuming and effortful process. Improving automated analyses of such highly valued processes of collaborative learning by adapting and applying recent text classification technologies would make it a less arduous task to obtain insights from corpus data. This endeavor also holds the potential for enabling substantially improved on-line instruction both by providing teachers and facilitators with reports about the groups they are moderating and by triggering context sensitive collaborative learning support on an as-needed basis. In this article, we report on an interdisciplinary research project, which has been investigating the effectiveness of applying text classification technology to a large CSCL corpus that has been analyzed by human coders using a theory-based multidimensional coding scheme. We report promising results and include an in-depth discussion of important issues such as reliability, validity, and efficiency that should be considered when deciding on the appropriateness of adopting a new technology such as TagHelper tools. One major technical contribution of this work is a demonstration that an important piece of the work towards making text classification technology effective for this purpose is designing and building linguistic pattern detectors, otherwise known as features, that can be extracted reliably from texts and that have high predictive power for the categories of discourse actions that the CSCL community is interested in
Mapping Subsets of Scholarly Information
We illustrate the use of machine learning techniques to analyze, structure,
maintain, and evolve a large online corpus of academic literature. An emerging
field of research can be identified as part of an existing corpus, permitting
the implementation of a more coherent community structure for its
practitioners.Comment: 10 pages, 4 figures, presented at Arthur M. Sackler Colloquium on
"Mapping Knowledge Domains", 9--11 May 2003, Beckman Center, Irvine, CA,
proceedings to appear in PNA
- …