3,363 research outputs found
Recommended from our members
Learning Semantic Annotations for Tabular Data
The usefulness of tabular data such as web tables critically depends on understanding their semantics. This study focuses on column type prediction for tables without any meta data. Unlike traditional lexical matching-based methods, we propose a deep prediction model that can fully exploit a table's contextual semantics, including table locality features learned by a Hybrid Neural Network (HNN), and inter-column semantics features learned by a knowledge base (KB) lookup and query answering algorithm.It exhibits good performance not only on individual table sets, but also when transferring from one table set to another
Recommended from our members
SemTab 2019: Resources to Benchmark Tabular Data to Knowledge Graph Matching Systems
Tabular data to Knowledge Graph matching is the process of assigning semantic tags from knowledge graphs (e.g., Wikidata or DBpedia) to the elements of a table. This task is a challenging problem for various reasons, including the lack of metadata (e.g., table and column names), the noisiness, heterogeneity, incompleteness and ambiguity in the data. The results of this task provide significant insights about potentially highly valuable tabular data, as recent works have shown, enabling a new family of data analytics and data science applications. Despite significant amount of work on various flavors of this problem, there is a lack of a common framework to conduct a systematic evaluation of state-of-the-art systems. The creation of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) aims at filling this gap. In this paper, we report about the datasets, infrastructure and lessons learned from the first edition of the SemTab challenge
Multimodal Machine Learning for Automated ICD Coding
This study presents a multimodal machine learning model to predict ICD-10
diagnostic codes. We developed separate machine learning models that can handle
data from different modalities, including unstructured text, semi-structured
text and structured tabular data. We further employed an ensemble method to
integrate all modality-specific models to generate ICD-10 codes. Key evidence
was also extracted to make our prediction more convincing and explainable. We
used the Medical Information Mart for Intensive Care III (MIMIC -III) dataset
to validate our approach. For ICD code prediction, our best-performing model
(micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms other
baseline models including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and
Text-CNN model (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability,
our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text
data and 0.3105 on tabular data, where well-trained physicians achieve 0.2780
and 0.5002 respectively.Comment: Machine Learning for Healthcare 201
- …