
2,249 research outputs found

A comparative study of machine learning methods for verbal autopsy text classification

Author: Atwell ES
Danso SO
Johnson O
Publication venue: IJCSI Press
Publication date: 01/01/2013

A Verbal Autopsy is the record of an interview about the circumstances of an uncertified death. In developing countries, if a death occurs away from health facilities, a field-worker interviews a relative of the deceased about the circumstances of the death; this Verbal Autopsy can be reviewed offsite. We report on a comparative study of the processes involved in Text Classification applied to classifying Cause of Death: feature value representation; machine learning classification algorithms; and feature reduction strategies, in order to identify suitable approaches for the classification of Verbal Autopsy text. We demonstrate that normalised term frequency and the standard TF-IDF achieve comparable performance across a number of classifiers. The results also show that Support Vector Machine is superior to the other classification algorithms employed in this research. Finally, we demonstrate the effectiveness of employing a 'locally-semisupervised' feature reduction strategy in order to increase performance accuracy.
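The two feature weightings compared in this abstract can be illustrated with a minimal sketch. It assumes that "normalised term frequency" means counts divided by document length and that "standard TF-IDF" means raw count times log(N / document frequency); the abstract does not give the exact formulas, and the function names and toy narratives are hypothetical.

```python
import math
from collections import Counter


def normalised_tf(tokens):
    """Term counts divided by document length (assumed normalisation)."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero for an empty document
    return {term: n / total for term, n in counts.items()}


def tf_idf(docs):
    """Raw term count times log(N / document frequency), one dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: n * math.log(n_docs / df[term]) for term, n in Counter(doc).items()}
        for doc in docs
    ]


# Toy narratives, invented for illustration only.
docs = [
    "fever cough fever".split(),
    "cough chest pain".split(),
    "fever vomiting".split(),
]

tf_weights = normalised_tf(docs[0])
tfidf_weights = tf_idf(docs)
```

Terms that occur in every document receive a TF-IDF weight of zero under this formula, whereas normalised term frequency keeps them; that difference is the main contrast between the two schemes.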

arXiv.org e-Print Archive

CiteSeerX

White Rose Research Online

Linguistic and statistically derived features for cause of death prediction from verbal autopsy text

Author: A. Moschitti
A.M. Cohen
C.J.L. Murray
E. Loper
G. King
K. Kahn
M. Gamon
P. Byass
P.D. Turney
S. Matsumoto
S. Pakhomov
T. Dunning
W.N. Francis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Automatic Text Classification (ATC) is an emerging technology with economic importance given the unprecedented growth of text data. This paper reports on work in progress to develop methods for predicting Cause of Death from Verbal Autopsy (VA) documents recommended for use in low-income countries by the World Health Organisation. VA documents contain both coded data and open narrative. The task is formulated as a Text Classification problem and explores various combinations of linguistic and statistical approaches to determine how these may improve on the standard bag-of-words approach, using a dataset of over 6400 VA documents that were manually annotated with cause of death. We demonstrate that a significant improvement in prediction accuracy can be obtained through a novel combination of statistical and linguistic features derived from the VA text. The paper explores the methods by which ATC may lead to improved accuracy in Cause of Death prediction.

Crossref

White Rose Research Online

Cause of death estimation from verbal autopsies: is the open response redundant or synergistic?

Author: Casillas A.
Cejudo A.
Cobos Muñoz D.
Oronoz M.
Pérez A.
Publication date: 01/01/2023

Civil registration and vital statistics systems capture birth and death events to compile vital statistics and to provide legal rights to citizens. Vital statistics are a key factor in promoting public health policies and the health of the population. Medical certification of cause of death is the preferred source of cause of death information. However, two thirds of all deaths worldwide are not captured in routine mortality information systems and their cause of death is unknown. Verbal autopsy is an interim solution for estimating the cause of death distribution at the population level in the absence of medical certification. A Verbal Autopsy (VA) consists of an interview with the relative or the caregiver of the deceased. The VA includes both Closed Questions (CQs) with structured answer options, and an Open Response (OR) consisting of a free narrative of the events expressed in natural language and without any pre-determined structure. Several automated systems analyze the CQs to obtain cause-specific mortality fractions, but with limited performance. We hypothesize that the incorporation of the text provided by the OR might convey relevant information to discern the cause of death (CoD). The experimental layout compares existing Computer Coding Verbal Autopsy methods such as Tariff 2.0 with other approaches well suited to the processing of structured inputs, as is the case of the CQs. Next, alternative approaches based on language models are employed to analyze the OR. Finally, we propose a new method with a bi-modal input that combines the CQs and the OR. Empirical results corroborated that the CoD prediction capability of the Tariff 2.0 algorithm is outperformed by our method, which takes into account the valuable information conveyed by the OR. As an added value, with this work we made the software available to enable the reproducibility of the results attained, with a version implemented in R to make the comparison with Tariff 2.0 evident.

edoc

Cause of Death estimation from Verbal Autopsies: Is the Open Response redundant or synergistic?

Author: Casillas Rubio Arantza
Cejudo Taramona Ander
Cobos Daniel
Oronoz Anchordoqui Maite
Pérez Ramírez Alicia
Publication venue: Elsevier
Publication date: 01/09/2023

Archivo Digital para la Docencia y la Investigación

Text Analytics to Predict Time and Cause of Death from Verbal Autopsies

Author: Danso Samuel Odei
Publication venue: University of Leeds
Publication date: 01/09/2015

This thesis describes the first Text Analytics approach to predicting Causes of Death (CoD) from Verbal Autopsies (VA). VA is an alternative technique recommended by the World Health Organisation for ascertaining CoD in low and middle-income countries (LMIC). CoD information is vitally important in the provision of healthcare. CoD information from VA can be obtained via two main approaches: manual, also referred to as physician review, and automatic. The automatic approach is an active research area due to its efficiency and cost effectiveness over the manual approach. VA contains both closed responses and open narrative text. However, the open narrative text has been ignored by the state-of-the-art automatic approaches and this remains a challenge and an important research issue. We hypothesise that it is feasible to predict CoD from the narratives of VA. We further contend that an automatic approach that could utilise the information contained in both narrative and closed response text of VA could lead to an improved prediction accuracy of CoD. This research has been formulated as a Text Classification problem, which employs Corpus and Computational Linguistics, Natural Language Processing and Machine Learning techniques to automatically classify VA documents according to CoD. Firstly, the research uses a VA corpus built from a sample collection of over 11,400 VA documents collected during a 10-year period in Ghana, West Africa. About 80 per cent of these documents have been annotated with CoD by medical experts. Secondly, we design experiments to identify Machine Learning techniques (algorithm, feature representation scheme, and feature reduction strategy) suitable for classifying VA open narratives (VAModel1). Thirdly, we propose novel methods of extracting features to build a model that predicts CoD from VA narratives using the annotated VA corpus as training and testing set.
Furthermore, we develop two additional models: one based only on closed responses (VAModel2), and a hybrid model based on both closed responses and open narrative (VAModel3). Our VAModel1 performs reasonably better than our baseline model, suggesting the feasibility of predicting the CoD from the VA open narratives. Overall, VAModel3 was observed to perform better than VAModel1 but not significantly better than VAModel2. Also, in terms of reliability, VAModel1 obtained a moderate agreement (kappa score = 0.4) when compared with the gold standard, medical experts (average annotation agreement between medical experts, kappa score = 0.64). Furthermore, an acceptable agreement was obtained for VAModel2 (kappa score = 0.71) and VAModel3 (kappa score = 0.75), suggesting the reliability of these two models is better than that of medical experts. Also, a detailed analysis suggested that combining information from narratives and closed responses leads to an increase in performance for some CoD categories, whereas information obtained from the closed responses part is enough for other CoD categories. Our research provides an alternative automatic approach to predicting CoD from VA, which is essential for LMIC. Therefore, further research into various aspects of the modelling process could improve the current performance of automatically predicting CoD from VAs.
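The kappa scores quoted in this abstract measure agreement corrected for chance. As a minimal sketch, assuming the thesis uses Cohen's kappa for two raters (the abstract does not say which variant), the statistic is (observed agreement - expected agreement) / (1 - expected agreement). The function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy cause-of-death labels, invented for illustration only.
model = ["malaria", "malaria", "tb", "tb"]
physician = ["malaria", "tb", "tb", "tb"]
kappa = cohen_kappa(model, physician)  # 0.75 observed, 0.5 expected -> 0.5
```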

White Rose E-theses Online

Using Verbal Autopsy to Measure Causes of Death: the Comparative Performance of Existing Methods.

Author: A Khosravi
Aarti Kumar
Abdullah H Baqui
Abraham D Flaxman
AD Flaxman
AD Lopez
AD Ngo
Alan D Lopez
Alireza Vahdatpour
Andrea Stewart
Arup Dutta
B Hernández
B Kodio
Bernardo Hernández
Bruce Neal
C Engmann
C Rao
CD Mathers
Charles Atkinson
Christopher JL Murray
CJ Murray
D Campos
D Chandramohan
David Phillips
Devarsetty Praveen
DG Bassani
Diozele Sanvictores
Dolores Ramírez-Villalobos
Emily Dantzer
F Baiden
G Yang
Gary L Darmstadt
Hazel Remolador
Henry D Kalter
HR Chowdhury
Ian Riley
IM Gaskin
Instituto Nacional de Estatística
J Pattaraarchachai
KJ Foreman
Lalit Dandona
LT Ruzicka
Marilla Lucero
MC Asuzu
Michael K Freeman
Minerva Romero
Mwanaidi Said
N Dhingra
P Byass
P Jha
P Mahapatra
Peter Serina
PW Setel
R Development Core Team
R Joshi
R Kumar
R Lozano
Rafael Lozano
RM Bell
Robert Black
Rohina Joshi
Said Mohammed Ali
Sara Gómez
Saurabh Mehta
SK Morris
SL James
Spencer L James
Summer Lockett Ohno
Sunil Sazawal
TN Krishnamurti
United Nations
Usha Dhingra
V Gajalakshmi
Veronica Tallo
Vinita Das
Vishwajeet Kumar
W Polprasert
Wafaie Fawzi
Zul Premji
Publication venue: Biomed Central
Publication date: 01/01/2014

Monitoring progress with disease and injury reduction in many populations will require widespread use of verbal autopsy (VA). Multiple methods have been developed for assigning cause of death from a VA but their application is restricted by uncertainty about their reliability. We investigated the validity of five automated VA methods for assigning cause of death: InterVA-4, Random Forest (RF), Simplified Symptom Pattern (SSP), Tariff method (Tariff), and King-Lu (KL), in addition to physician review of VA forms (PCVA), based on 12,535 cases from diverse populations for which the true cause of death had been reliably established. For adults, children, neonates and stillbirths, performance was assessed separately for individuals using sensitivity, specificity, Kappa, and chance-corrected concordance (CCC) and for populations using cause-specific mortality fraction (CSMF) accuracy, with and without additional diagnostic information from prior contact with health services. A total of 500 train-test splits were used to ensure that results are robust to variation in the underlying cause of death distribution. Three automated diagnostic methods, Tariff, SSP, and RF, but not InterVA-4, performed better than physician review in all age groups, study sites, and for the majority of causes of death studied. For adults, CSMF accuracy ranged from 0.764 to 0.770, compared with 0.680 for PCVA and 0.625 for InterVA; CCC varied from 49.2% to 54.1%, compared with 42.2% for PCVA, and 23.8% for InterVA. For children, CSMF accuracy was 0.783 for Tariff, 0.678 for PCVA, and 0.520 for InterVA; CCC was 52.5% for Tariff, 44.5% for PCVA, and 30.3% for InterVA. For neonates, CSMF accuracy was 0.817 for Tariff, 0.719 for PCVA, and 0.629 for InterVA; CCC varied from 47.3% to 50.3% for the three automated methods, 29.3% for PCVA, and 19.4% for InterVA. The method with the highest sensitivity for a specific cause varied by cause.
Physician review of verbal autopsy questionnaires is less accurate than automated methods in determining both individual and population causes of death. Overall, Tariff performs as well as or better than other methods and should be widely applied in routine mortality surveillance systems with poor cause of death certification practices.
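The population-level metric reported in this abstract, CSMF accuracy, can be sketched as follows. The sketch assumes the commonly cited definition, 1 - sum over causes of |true CSMF - predicted CSMF| divided by 2 * (1 - minimum true CSMF); the abstract itself does not spell out the formula. Taking the minimum over causes that appear in the true labels is a simplifying assumption, and the function names and toy data are hypothetical.

```python
from collections import Counter


def csmf(causes):
    """Cause-specific mortality fractions: share of deaths assigned to each cause."""
    n = len(causes)
    return {cause: k / n for cause, k in Counter(causes).items()}


def csmf_accuracy(true_causes, predicted_causes):
    """1 minus total absolute CSMF error, scaled by the worst possible error."""
    true_frac = csmf(true_causes)
    pred_frac = csmf(predicted_causes)
    all_causes = set(true_frac) | set(pred_frac)
    abs_error = sum(
        abs(true_frac.get(c, 0.0) - pred_frac.get(c, 0.0)) for c in all_causes
    )
    # Assumption: the minimum is taken over causes present in the true labels.
    worst_error = 2 * (1 - min(true_frac.values()))
    if worst_error == 0:
        raise ValueError("undefined when every true death has the same cause")
    return 1 - abs_error / worst_error


# Toy data, invented for illustration only.
truth = ["malaria"] * 5 + ["tb"] * 3 + ["injury"] * 2
predicted = ["malaria"] * 6 + ["tb"] * 2 + ["injury"] * 2
accuracy = csmf_accuracy(truth, predicted)  # 1 - 0.2 / 1.6 = 0.875
```

Because the metric compares cause distributions rather than individual assignments, a method can score well here even when many individual deaths are misclassified, which is why the study reports individual-level measures such as CCC separately.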

Crossref

Harvard University - DASH

Springer - Publisher Connector

University of Melbourne Institutional Repository

Automated verbal autopsy classification: using one-against-all ensemble method and Naïve Bayes classifier [version 2; referees: 2 approved]

Author: Ayse Bener
Patrycja Kolpak
Prabhat Jha
Syed Shariyar Murtaza
Publication venue: 'F1000 Research Ltd'
Publication date: 01/01/2019

Verbal autopsy (VA) deals with post-mortem surveys about deaths, mostly in low- and middle-income countries, where the majority of deaths occur at home rather than in a hospital, for retrospective assignment of causes of death (COD) and subsequently evidence-based health system strengthening. Automated algorithms for VA COD assignment have been developed and their performance has been assessed against physician and clinical diagnoses. Since the performance of automated classification methods remains low, we aimed to enhance the Naïve Bayes Classifier (NBC) algorithm to produce better ranked COD classifications on 26,766 deaths from four globally diverse VA datasets compared to some of the leading VA classification methods, namely Tariff, InterVA-4, InSilicoVA and NBC. We used a different strategy, training multiple NBC algorithms using the one-against-all approach (OAA-NBC). To compare performance, we computed the cumulative cause-specific mortality fraction (CSMF) accuracies for population-level agreement from rank one to five COD classifications. To assess individual-level COD assignments, cumulative partially-chance corrected concordance (PCCC) and sensitivity were measured for up to five ranked classifications. Overall results show that OAA-NBC consistently assigns CODs that are the most similar to physician and clinical COD assignments compared to some of the leading algorithms, based on the cumulative CSMF accuracy, PCCC and sensitivity scores. The results demonstrate that our approach improves the performance of classification (sensitivity) by between 6% and 8% compared with other VA algorithms. Population-level agreements for OAA-NBC and NBC were found to be similar to or higher than those of the other algorithms used in the experiments.
Although OAA-NBC still requires improvement for individual-level COD assignment, the one-against-all approach improved its ability to assign CODs that more closely resemble physician or clinical COD classifications compared to some of the other leading VA classifiers.

Directory of Open Access Journals

Recommended from our members

Speech and language markers of neurodegeneration: a call for global equity

Author: Blasi Damián E
de Leon Jessica
García Adolfo M
Gorno-Tempini Maria Luisa
Tee Boon Lead
Publication venue: eScholarship, University of California
Publication date: 01/12/2023

In the field of neurodegeneration, speech and language assessments are useful for diagnosing aphasic syndromes and for characterizing other disorders. As a complement to classic tests, scalable and low-cost digital tools can capture relevant anomalies automatically, potentially supporting the quest for globally equitable markers of brain health. However, this promise remains unfulfilled due to limited linguistic diversity in scientific works and clinical instruments. Here we argue for cross-linguistic research as a core strategy to counter this problem. First, we survey the contributions of linguistic assessments in the study of primary progressive aphasia and the three most prevalent neurodegenerative disorders worldwide: Alzheimer's disease, Parkinson's disease, and behavioural variant frontotemporal dementia. Second, we address two forms of linguistic unfairness in the literature: the neglect of most of the world's 7000 languages and the preponderance of English-speaking cohorts. Third, we review studies showing that linguistic dysfunctions in a given disorder may vary depending on the patient's language and that English speakers offer a suboptimal benchmark for other language groups. Finally, we highlight different approaches, tools and initiatives for cross-linguistic research, identifying core challenges for their deployment. Overall, we seek to inspire timely actions to counter a looming source of inequity in behavioural neurology.

eScholarship - University of California

Information extraction from verbal autopsies (original title: Extracción de información de las autopsias verbales)

Author: Cejudo Taramona Ander
Publication date: 01/07/2022

Civil registration and vital statistics systems register births and deaths and compile statistics. These statistics are a key factor in promoting public health policies and in recording the longevity and health of the population. Death certificates issued in health institutions are the main source for collecting the cause of death (CoD). Nevertheless, such counts are not straightforward; indeed, it is estimated that 65% of deaths in the world remain uncounted [D’Ambruoso, 2013]. For places where there is no access to health facilities and, hence, to death certificates, the World Health Organization (WHO) designed the Verbal Autopsy as an instrument to collect evidence for CoD statistics. A Verbal Autopsy (VA) consists of an interview with the relative or the caregiver of the deceased. The VA comprises both an open response (OR) and closed questions (CQs). On the one hand, the OR consists of a free narrative of the events expressed in natural language and without any pre-determined structure. On the other hand, the CQs are a set of a few hundred controlled questions, each with a small number of permitted answers (e.g. yes/no). InterVA is a suite of computer models and it is included in the WHO 2016 instrument, which gathers several algorithms chosen by the WHO for the analysis of verbal autopsies. InterVA estimates the CoD based solely on the CQs, while the OR is disregarded. We hypothesize that the incorporation of the text provided by the OR might convey relevant information to discern the CoD and, accordingly, that InterVA could benefit from Natural Language Processing approaches. Empirical results corroborated that the CoD prediction capability of the InterVA algorithm is outperformed when the valuable information conveyed by the OR is taken into account. The experimental layout compares InterVA with other approaches well suited to the processing of structured inputs, as is the case of the CQs. Next, alternative approaches based on language models are employed to analyze the OR.
Finally, the best approaches for each facet (CQs and OR) were combined, leading to a multi-modal approach.

Archivo Digital para la Docencia y la Investigación