64 research outputs found

    Making effective use of healthcare data using data-to-text technology

    Full text link
    Healthcare organizations are in a continuous effort to improve health outcomes, reduce costs and enhance patient experience of care. Data is essential to measure and help achieving these improvements in healthcare delivery. Consequently, a data influx from various clinical, financial and operational sources is now overtaking healthcare organizations and their patients. The effective use of this data, however, is a major challenge. Clearly, text is an important medium to make data accessible. Financial reports are produced to assess healthcare organizations on some key performance indicators to steer their healthcare delivery. Similarly, at a clinical level, data on patient status is conveyed by means of textual descriptions to facilitate patient review, shift handover and care transitions. Likewise, patients are informed about data on their health status and treatments via text, in the form of reports or via ehealth platforms by their doctors. Unfortunately, such text is the outcome of a highly labour-intensive process if it is done by healthcare professionals. It is also prone to incompleteness, subjectivity and hard to scale up to different domains, wider audiences and varying communication purposes. Data-to-text is a recent breakthrough technology in artificial intelligence which automatically generates natural language in the form of text or speech from data. This chapter provides a survey of data-to-text technology, with a focus on how it can be deployed in a healthcare setting. It will (1) give an up-to-date synthesis of data-to-text approaches, (2) give a categorized overview of use cases in healthcare, (3) seek to make a strong case for evaluating and implementing data-to-text in a healthcare setting, and (4) highlight recent research challenges.Comment: 27 pages, 2 figures, book chapte

    Predicting sample size required for classification performance

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Supervised learning methods need annotated data in order to generate efficient models. Annotated data, however, is a relatively scarce resource and can be expensive to obtain. For both passive and active learning methods, there is a need to estimate the size of the annotated sample required to reach a performance target.</p> <p>Methods</p> <p>We designed and implemented a method that fits an inverse power law model to points of a given learning curve created using a small annotated training set. Fitting is carried out using nonlinear weighted least squares optimization. The fitted model is then used to predict the classifier's performance and confidence interval for larger sample sizes. For evaluation, the nonlinear weighted curve fitting method was applied to a set of learning curves generated using clinical text and waveform classification tasks with active and passive sampling methods, and predictions were validated using standard goodness of fit measures. As control we used an un-weighted fitting method.</p> <p>Results</p> <p>A total of 568 models were fitted and the model predictions were compared with the observed performances. Depending on the data set and sampling method, it took between 80 to 560 annotated samples to achieve mean average and root mean squared error below 0.01. Results also show that our weighted fitting method outperformed the baseline un-weighted method (p < 0.05).</p> <p>Conclusions</p> <p>This paper describes a simple and effective sample size prediction algorithm that conducts weighted fitting of learning curves. The algorithm outperformed an un-weighted algorithm described in previous literature. It can help researchers determine annotation sample size for supervised machine learning.</p

    CREGEX: A Biomedical Text Classifier Based on Automatically Generated Regular Expressions

    No full text
    © 2013 IEEE. High accuracy text classifiers are used nowadays in organizing large amounts of biomedical information and supporting clinical decision-making processes. In medical informatics, regular expression-based classifiers have emerged as an alternative to traditional, discriminative classification algorithms due to their ability to model sequential patterns. This article presents CREGEX (Classifier Regular Expression), a biomedical text classifier based on an automatically generated regular-expressions-based feature space. We conceived an algorithm for automatically constructing an informative and discriminative regular-expressions-based feature space, suitable for binary and multiclass discrimination problems. Regular expressions are automatically generated from training texts using a coarse-to-fine text aligning method, which trades off the lexical variants of words, in terms of gender and grammatical number, and the generation of a feature space containing a large number of noisy features. CREGEX carries out feature selection by filtering keywords and also computes a confidence metric to classify test texts. Three de-identified datasets in Spanish, with information on smoking habits, obesity, and obesity types, were used here to assess the performance of CREGEX. For comparison, Support Vector Machine (SVM) and Naïve Bayes (NB) supervised classifiers were also trained with consecutive sequences of tokens (n-grams) as features. Results show that, in all the datasets used for evaluation, CREGEX not only outperformed both the SVM and NB classifiers in terms of accuracy and F-measure (p-value\u3c0.05) but also used a fewer amount of training examples to achieve the same performance. Such a superior performance is attributed to the regular expressions\u27 ability to represent complex text patterns

    Enhancing Clinical Data Analysis by Explaining Interaction Effects between Covariates in Deep Neural Network Models

    No full text
    Deep neural network (DNN) is a powerful technology that is being utilized by a growing number and range of research projects, including disease risk prediction models. One of the key strengths of DNN is its ability to model non-linear relationships, which include covariate interactions. We developed a novel method called interaction scores for measuring the covariate interactions captured by DNN models. As the method is model-agnostic, it can also be applied to other types of machine learning models. It is designed to be a generalization of the coefficient of the interaction term in a logistic regression; hence, its values are easily interpretable. The interaction score can be calculated at both an individual level and population level. The individual-level score provides an individualized explanation for covariate interactions. We applied this method to two simulated datasets and a real-world clinical dataset on Alzheimer&rsquo;s disease and related dementia (ADRD). We also applied two existing interaction measurement methods to those datasets for comparison. The results on the simulated datasets showed that the interaction score method can explain the underlying interaction effects, there are strong correlations between the population-level interaction scores and the ground truth values, and the individual-level interaction scores vary when the interaction was designed to be non-uniform. Another validation of our new method is that the interactions discovered from the ADRD data included both known and novel relationships
    • …
    corecore