Network On Network for Tabular Data Classification in Real-world Applications
Tabular data is the most common data format adopted by our customers, ranging
from retail and finance to e-commerce, and tabular data classification plays an
essential role in their businesses. In this paper, we present Network On
Network (NON), a practical tabular data classification model based on deep
neural network to provide accurate predictions. Various deep methods have been
proposed and promising progress has been made. However, most of them use
operations like neural network and factorization machines to fuse the
embeddings of different features directly, and linearly combine the outputs of
those operations to get the final prediction. As a result, the intra-field
information and the non-linear interactions between those operations (e.g.
neural network and factorization machines) are ignored. Intra-field information
is the information that features inside each field belong to the same field.
NON is proposed to take full advantage of intra-field information and
non-linear interactions. It consists of three components: a field-wise network at
the bottom to capture the intra-field information, an across-field network in the
middle to choose suitable operations in a data-driven manner, and an operation fusion
network on the top to deeply fuse the outputs of the chosen operations. Extensive
experiments on six real-world datasets demonstrate NON can outperform the
state-of-the-art models significantly. Furthermore, both qualitative and
quantitative studies of the features in the embedding space show that NON can capture
intra-field information effectively.
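The three components described in this abstract can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the layer sizes, the fixed set of operations (linear, one-layer DNN, FM-style interaction), and the names `init_params` and `non_forward` are hypothetical, and the data-driven selection of operations is not modeled.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_fields, dim, hidden, rng):
    """Random weights for the sketch (all shapes are illustrative assumptions)."""
    flat = n_fields * dim
    return {
        "W_field": rng.normal(0, 0.1, (n_fields, dim, dim)),
        "W_lin": rng.normal(0, 0.1, (flat, hidden)),
        "W_dnn": rng.normal(0, 0.1, (flat, hidden)),
        "W_fuse": rng.normal(0, 0.1, (2 * hidden + dim, hidden)),
        "w_out": rng.normal(0, 0.1, hidden),
    }

def non_forward(emb, params):
    """emb: (batch, n_fields, dim) array holding one embedding per field."""
    # 1) Field-wise network: every field has its own weight matrix, so
    #    features of the same field share a transformation.
    fieldwise = relu(np.einsum("bfd,fde->bfe", emb, params["W_field"]))
    flat = fieldwise.reshape(emb.shape[0], -1)

    # 2) Across-field network: several operations over all fields.
    linear_out = flat @ params["W_lin"]        # linear operation
    dnn_out = relu(flat @ params["W_dnn"])     # one-layer DNN
    # FM-style second-order term: 0.5 * ((sum v)^2 - sum v^2)
    summed = fieldwise.sum(axis=1)
    fm_out = 0.5 * (summed ** 2 - (fieldwise ** 2).sum(axis=1))

    # 3) Operation fusion network: fuse the operation outputs with a
    #    non-linear layer instead of adding them linearly.
    fused = np.concatenate([linear_out, dnn_out, fm_out], axis=1)
    hidden = relu(fused @ params["W_fuse"])
    return sigmoid(hidden @ params["w_out"])   # (batch,) probabilities

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 3, 8))               # 4 rows, 3 fields, dim 8
params = init_params(n_fields=3, dim=8, hidden=16, rng=rng)
probs = non_forward(emb, params)
```

The point of the sketch is the ordering: per-field transformation first, then parallel operations, then a non-linear fusion of their outputs.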
TreeMAN: Tree-enhanced Multimodal Attention Network for ICD Coding
ICD coding is designed to assign the disease codes to electronic health
records (EHRs) upon discharge, which is crucial for billing and clinical
statistics. In an attempt to improve the effectiveness and efficiency of manual
coding, many methods have been proposed to automatically predict ICD codes from
clinical notes. However, most previous works ignore the decisive information
contained in structured medical data in EHRs, which is hard to capture from
the noisy clinical notes. In this paper, we propose a Tree-enhanced Multimodal
Attention Network (TreeMAN) to fuse tabular features and textual features into
multimodal representations by enhancing the text representations with
tree-based features via the attention mechanism. Tree-based features are
constructed according to decision trees learned from structured multimodal
medical data, which capture the decisive information about ICD coding. We can
apply the same multi-label classifier from previous text models to the
multimodal representations to predict ICD codes. Experiments on two MIMIC
datasets show that our method outperforms prior state-of-the-art ICD coding
approaches. The code is available at https://github.com/liu-zichen/TreeMAN
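As a rough illustration of enhancing text representations with tree-based features via attention, the following sketch uses plain scaled dot-product attention in NumPy. It is an assumption-laden example, not the code from the linked repository: the function `tree_enhanced_text`, the residual addition, and the array shapes are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def tree_enhanced_text(text_repr, tree_feats):
    """text_repr: (n_tokens, d) text representations.
    tree_feats: (n_tree, d) embeddings of tree-based features."""
    d = text_repr.shape[-1]
    # Each token scores every tree-based feature (scaled dot product).
    scores = text_repr @ tree_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)     # rows sum to 1
    context = weights @ tree_feats         # (n_tokens, d)
    # Residual addition is an assumed way to combine the two modalities.
    return text_repr + context, weights

rng = np.random.default_rng(0)
text_repr = rng.normal(size=(5, 16))       # 5 tokens
tree_feats = rng.normal(size=(3, 16))      # 3 tree-based features
fused, weights = tree_enhanced_text(text_repr, tree_feats)
```

The fused array keeps the shape of the text representations, which is why a multi-label classifier built for text-only models could, in principle, be applied to it unchanged.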
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care
The COVID-19 pandemic has placed a heavy burden on healthcare systems
worldwide and caused huge social disruption and economic loss. Many deep
learning models have been proposed to conduct clinical predictive tasks such as
mortality prediction for COVID-19 patients in intensive care units using
Electronic Health Record (EHR) data. Despite their initial success in certain
clinical applications, there is currently a lack of benchmarking results to
achieve a fair comparison so that we can select the optimal model for clinical
use. Furthermore, there is a discrepancy between the formulation of traditional
prediction tasks and real-world clinical practice in intensive care. To fill
these gaps, we propose two clinical prediction tasks, Outcome-specific
length-of-stay prediction and Early mortality prediction for COVID-19 patients
in intensive care units. The two tasks are adapted from the naive
length-of-stay and mortality prediction tasks to accommodate the clinical
practice for COVID-19 patients. We propose fair, detailed, open-source
data-preprocessing pipelines and evaluate 17 state-of-the-art predictive models
on two tasks, including 5 machine learning models, 6 basic deep learning models
and 6 deep learning predictive models specifically designed for EHR data. We
provide benchmarking results using data from two real-world COVID-19 EHR
datasets. One dataset is publicly available without needing any inquiry, and
the other can be accessed on request. We provide fair, reproducible
benchmarking results for two tasks. We deploy all experiment results and models
on an online platform. We also allow clinicians and researchers to upload their
data to the platform and get quick prediction results using our trained models.
We hope our efforts can further facilitate deep learning and machine learning
research for COVID-19 predictive modeling. Comment: Junyi Gao, Yinghao Zhu and Wenqing Wang contributed equally
A Simple and Interpretable Predictive Model for Healthcare
Deep Learning based models are currently dominating most state-of-the-art
solutions for disease prediction. Existing works employ RNNs along with
multiple levels of attention mechanisms to provide interpretability. These deep
learning models, with trainable parameters running into millions, require huge
amounts of compute and data to train and deploy. These requirements are
sometimes so large that they render such models infeasible to use. We
address these challenges by developing a simpler yet interpretable non-deep
learning based model for application to EHR data. We model and showcase our
work's results on the task of predicting first occurrence of a diagnosis, often
overlooked in existing works. We push the capabilities of a tree based model
and come up with a strong baseline for more sophisticated models. Its
performance shows an improvement over deep learning based solutions (both with
and without the first-occurrence constraint) while maintaining
interpretability. Comment: 7 pages, 10 figures
Diagnosis and Prognosis of Occupational disorders based on Machine Learning Techniques applied to Occupational Profiles
Work-related disorders have a global influence on people’s well-being and quality of life
and are a financial burden for organizations because they reduce productivity, increase
absenteeism, and promote early retirement. Work-related musculoskeletal disorders, in
particular, represent a significant fraction of the total in all occupational contexts. In
automotive and industrial settings where workers are exposed to risk factors for
work-related musculoskeletal disorders, occupational physicians are responsible for monitoring
workers’ health protection profiles. Occupational technicians consult the Occupational
Health Protection Profiles database to understand which exposure to occupational
work-related musculoskeletal disorder risk factors should be ensured for a given worker.
Occupational Health Protection Profiles databases describe what the occupational physician
states and which exposure the physician considers necessary to ensure the worker’s health
protection in terms of their functional work ability. The application of Human-Centered
explainable artificial intelligence can support decision making, going from a worker’s
Functional Work Ability to explanations by integrating explainability into medical
(restriction) and supporting two decision contexts: prognosis and diagnosis of individual,
work-related and organizational risk conditions. Although previous machine learning
approaches provided good predictions, their application in an actual occupational setting
is limited because their predictions are difficult to interpret and hence not actionable.
In this thesis, injured body parts in which the ability changed in a worker’s functional
work ability status are targeted. On the one hand, artificial intelligence algorithms can
help technical teams, occupational physicians, and ergonomists determine a worker’s
workplace risk via the diagnosis and prognosis of body part(s) injuries; on the other hand,
these approaches can help prevent work-related musculoskeletal disorders by identifying
which processes are lacking in working condition improvement and which workplaces
have a better match between the remaining functional work abilities. The data came from
Occupational Health Protection Profiles based on Functional Work Ability textual
reports in the Portuguese language from an automotive industry factory (Auto Europa):
a sample of 2025 files for the prognosis part (from 2019 to 2020) and a sample of 7857
files for the diagnosis part. Machine learning-based Natural Language Processing methods
were implemented to extract standardized information. The prognosis and diagnosis of
Occupational Health Protection Profiles factors were developed in a reliable, trustworthy
Human-Centered explainable artificial intelligence system (entitled Industrial microErgo
application). The most suitable regression models to predict the next medical appointment
for the injured body regions were the models based on CatBoost regression, with an
R square and an RMSLE of 0.84 and 1.23 weeks, respectively.
In parallel, for the prediction of the next injured body parts, CatBoost was the best
regression model for most body parts based on these two errors. This information can help
technical industrial teams understand potential risk factors for Occupational Health
Protection Profiles and identify warning signs of the early stages of musculoskeletal disorders.
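For readers unfamiliar with the two metrics this abstract reports (R square and RMSLE), a minimal NumPy sketch of their standard definitions follows. The sample values are invented purely for illustration and are not data from the thesis; the function names are hypothetical.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + x)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

# Hypothetical "weeks until next appointment" values, for illustration only.
weeks_true = [1.0, 2.0, 4.0, 8.0]
weeks_pred = [1.5, 2.0, 3.0, 9.0]
print(r_squared(weeks_true, weeks_pred), rmsle(weeks_true, weeks_pred))
```

RMSLE penalizes relative rather than absolute errors, which is a common reason to prefer it when the target (here, a time interval) spans a wide range.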