Implementation of the COVID-19 vulnerability index across an international network of health care data sets: Collaborative external validation study
Background: SARS-CoV-2 is straining health care systems globally. The burden on hospitals during the pandemic could be reduced by implementing prediction models that can discriminate patients who require hospitalization from those who do not. The COVID-19 vulnerability (C-19) index, a model that predicts which patients will be admitted to hospital for treatment of pneumonia or pneumonia proxies, has been developed and proposed as a valuable tool for decision-making during the pandemic. However, the model is at high risk of bias according to the "prediction model risk of bias assessment" criteria, and it has not been externally validated. Objective: The aim of this study was to externally validate the C-19 index across a range of health care settings to determine how well it broadly predicts hospitalization due to pneumonia in COVID-19 cases. Methods: We followed the Observational Health Data Sciences and Informatics (OHDSI) framework for external validation to assess the reliability of the C-19 index. We evaluated the model on two different target populations, 41,381 patients who presented with SARS-CoV-2 at an outpatient or emergency department visit and 9,429,285 patients who presented with influenza or related symptoms during an outpatient or emergency department visit, to predict their risk of hospitalization with pneumonia during the following 0-30 days. In total, we validated the model across a network of 14 databases spanning the United States, Europe, Australia, and Asia. Results: The internal validation performance of the C-19 index had a C statistic of 0.73, and the calibration was not reported by the authors. When we externally validated it by transporting it to SARS-CoV-2 data, the model obtained C statistics of 0.36, 0.53 (0.473-0.584) and 0.56 (0.488-0.636) on Spanish, US, and South Korean data sets, respectively. The calibration was poor, with the model underestimating risk. When validated on 12 data sets containing influenza patients across the OHDSI network, the C statistics ranged between 0.40 and 0.68. Conclusions: Our results show that the discriminative performance of the C-19 index model is low for influenza cohorts and even worse among patients with COVID-19 in the United States, Spain, and South Korea. These results suggest that C-19 should not be used to aid decision-making during the COVID-19 pandemic. Our findings highlight the importance of performing external validation across a range of settings, especially when a prediction model is being extrapolated to a different population. In the field of prediction, extensive validation is required to create appropriate trust in a model.
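The C statistics quoted in this abstract can be read as the probability that a randomly chosen patient who was hospitalized received a higher predicted risk than a randomly chosen patient who was not, so 0.5 corresponds to chance and values below 0.5 to worse than chance. The sketch below illustrates that pairwise definition only; it is not the OHDSI validation code, and the function name `c_statistic` and the toy data are hypothetical.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Pairwise concordance: share of (event, non-event) pairs in which the
    event case has the higher predicted risk. Ties count as half a pair."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for event_risk, non_event_risk in product(events, non_events):
        if event_risk > non_event_risk:
            concordant += 1.0
        elif event_risk == non_event_risk:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data: predicted risks and observed outcomes (1 = hospitalized).
toy_risks = [0.10, 0.40, 0.35, 0.80]
toy_outcomes = [0, 0, 1, 1]
print(c_statistic(toy_risks, toy_outcomes))  # 0.75
```

The quadratic loop over pairs is chosen for readability; for cohorts of the size described here, a rank-based computation would be the practical choice.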
A method for the cohort selection of cardiovascular disease records from an electronic health record system
Information collected from paper or electronic medical records, when used for purposes not directly related to patient care, is called secondary use of data. Adopting an electronic health record (EHR) system can facilitate data collection for secondary use in research by taking advantage of improvements in the structuring and retrieval of patient information, capabilities not available in traditional paper records. Observational studies based on the secondary use of data have the potential to provide evidence for health policy making. However, research with these data faces problems characteristic of this data source. Over time, systems and their methods of storing data become obsolete or are restructured; there are privacy issues in sharing individuals' data; and there are issues related to using the data in a context different from their original purpose. A systematic approach is needed to work around these problems, in which the data are processed before they are shared. The objective of this thesis is to propose a method for extracting patient cohorts for observational studies in four stages: (1) mapping: reorganizing data from an existing logical schema into a common external schema to which the method is applied; (2) cleaning: preparing the data, profiling the database, and calculating quality indicators; (3) cohort selection: applying the study parameters to select longitudinal patient data to form the cohort; (4) transformation: deriving study variables that are not present in the original data and transforming the longitudinal data into anonymized data ready for statistical analysis and sharing. Mapping is specific to each EHR system and is not the subject of this work, but it was performed in order to apply the method. The cleaning, cohort selection, and transformation stages are common to any EHR system. Using an external schema allows parameters that make it easy to extract different cohorts for different studies without changing the algorithms, and ensures that extraction is performed without loss of information by an idempotent process. The generation of indicators and the statistical analysis are part of the process and make it possible to describe the profile and quality of the database and the results of the study. The algorithms and the data are made available in a versioned repository and can be used at any time to reproduce the results, allowing verification, changes, and error correction. The method was applied to the EHR system used at the Instituto do Coração - HC FMUSP, considering a database of 1,116,848 patients registered from 1999 to 2013, which resulted in 312,469 patient records after cleaning. To analyze cardiovascular disease in relation to the use of statins in the secondary prevention of subsequent events, a cohort of 27,915 patients was formed according to the following criteria: period from 2003 to 2013, male and female patients, aged 18 years or older, with a diagnosis of CVD (ICD-10 codes I20 to I25, I64 to I70, and G45), and with a record of at least two outpatient visits. Around 80% of patients had a record of statins, of whom 30% had a record of statins for more than 5 years; 42% had no record of a subsequent event and 9.7% had a record of two or more events. Mean survival time estimated by the Kaplan-Meier method was 115 months (95% CI 114-116), and patients without a record of statins showed a higher probability of death by the log-rank test (p < 0.001). The conclusion is that the adoption of systematised methods for cohort extraction of patients from EHRs can be a viable approach for conducting epidemiological studies.
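The cohort criteria stated in this abstract (2003 to 2013, adults, ICD-10 codes I20-I25, I64-I70 and G45, at least two outpatient visits) can be expressed as a parameterized filter, which is the idea behind the cohort selection stage. The following is a minimal sketch under stated assumptions, not the thesis's algorithm: the `PatientRecord` fields, the function names, and the choice to evaluate age and study period at the diagnosis date are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PatientRecord:
    # Hypothetical flat record; the real external schema is not described here.
    patient_id: str
    birth_date: date
    diagnosis_date: date
    icd10_code: str          # e.g. "I21.0"
    outpatient_visits: int


def is_cvd_code(code):
    """True when the 3-character ICD-10 category is in I20-I25, I64-I70 or G45."""
    category = code.strip().upper()[:3]
    if category == "G45":
        return True
    if len(category) == 3 and category[0] == "I" and category[1:].isdigit():
        number = int(category[1:])
        return 20 <= number <= 25 or 64 <= number <= 70
    return False


def age_on(birth, when):
    """Completed years of age on a given date."""
    years = when.year - birth.year
    if (when.month, when.day) < (birth.month, birth.day):
        years -= 1
    return years


def select_cohort(records, start_year=2003, end_year=2013, min_age=18, min_visits=2):
    """Apply the study parameters; assumes they are checked at diagnosis date."""
    return [
        r for r in records
        if start_year <= r.diagnosis_date.year <= end_year
        and age_on(r.birth_date, r.diagnosis_date) >= min_age
        and is_cvd_code(r.icd10_code)
        and r.outpatient_visits >= min_visits
    ]
```

Keeping the study parameters as function arguments mirrors the abstract's point that different cohorts can be extracted for different studies without changing the algorithm itself.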
NEOTROPICAL ALIEN MAMMALS: a data set of occurrence and abundance of alien mammals in the Neotropics
Biological invasion is one of the main threats to native biodiversity. For a species to become invasive, it must be voluntarily or involuntarily introduced by humans into a nonnative habitat. Mammals were among the first taxa to be introduced worldwide for game, meat, and labor, yet the number of species introduced in the Neotropics remains unknown. In this data set, we make available occurrence and abundance data on mammal species that (1) transposed a geographical barrier and (2) were voluntarily or involuntarily introduced by humans into the Neotropics. Our data set is composed of 73,738 historical and current georeferenced records on alien mammal species, of which around 96% correspond to occurrence data on 77 species belonging to eight orders and 26 families. Data cover 26 continental countries in the Neotropics, ranging from Mexico and its frontier regions (southern Florida and coastal-central Florida in the southeast United States) to Argentina, Paraguay, Chile, and Uruguay, and the 13 countries of the Caribbean islands. Our data set also includes neotropical species (e.g., Callithrix sp., Myocastor coypus, Nasua nasua) considered alien in particular areas of the Neotropics. The most numerous species in terms of records are from Bos sp. (n = 37,782), Sus scrofa (n = 6,730), and Canis familiaris (n = 10,084); 17 species were represented by only one record (e.g., Syncerus caffer, Cervus timorensis, Cervus unicolor, Canis latrans). Primates have the highest number of species in the data set (n = 20 species), partly because of uncertainties regarding taxonomic identification of the genus Callithrix, which includes the species Callithrix aurita, Callithrix flaviceps, Callithrix geoffroyi, Callithrix jacchus, Callithrix kuhlii, Callithrix penicillata, and their hybrids. This unique data set will be a valuable source of information on invasion risk assessments, biodiversity redistribution and conservation-related research. There are no copyright restrictions. Please cite this data paper when using the data in publications. We also request that researchers and teachers inform us of how they are using the data.