41 research outputs found

    Towards Measuring the Food Quality of Grocery Purchases: An Estimation Model of the Healthy Eating Index-2010 Using only Food Item Counts

    Measuring the quality of food consumed by individuals or groups in the U.S. is essential to informed public health surveillance efforts and sound nutrition policymaking. For example, the Healthy Eating Index-2010 (HEI) is an ideal metric for assessing the food quality of households, but the traditional methods of collecting the data required to calculate the HEI are expensive and burdensome. We evaluated an alternative source: rather than measuring the quality of the foods consumers eat, we estimate the quality of the foods consumers buy. To accomplish this, we need a way of estimating the HEI based solely on counts of food items. We developed an estimation model of the HEI using an augmented set of the What We Eat In America (WWEIA) food categories, then mapped ∼92,000 grocery food items to it. The model uses an inverse cumulative distribution function (CDF) sampling technique. Here we describe the model and report reliability metrics based on NHANES data from 2003-2010.
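    As a concrete illustration of the inverse-CDF sampling idea behind the estimation model, the sketch below draws per-item amounts for one food category from an empirical reference distribution and sums them over the item count. The reference values, category, and units are hypothetical stand-ins for illustration, not the paper's WWEIA mapping or NHANES data.

```python
import numpy as np

def inverse_cdf_sample(empirical_values, n_samples, rng=None):
    """Inverse-CDF (inverse transform) sampling from an empirical distribution.

    `empirical_values` would come from a reference survey (e.g., per-item
    amounts for one food category); here they are made up for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    sorted_vals = np.sort(np.asarray(empirical_values, dtype=float))
    n = len(sorted_vals)
    # Empirical CDF: the k-th order statistic corresponds to probability (k+1)/n.
    probs = np.arange(1, n + 1) / n
    u = rng.uniform(0.0, 1.0, size=n_samples)      # uniform draws in [0, 1)
    idx = np.clip(np.searchsorted(probs, u), 0, n - 1)
    return sorted_vals[idx]

# Hypothetical use: estimate a category's total contribution from an item count.
category_reference = [0.2, 0.5, 0.7, 1.1, 1.3]   # made-up cup-equivalents per item
item_count = 12                                   # items purchased in this category
estimated_total = inverse_cdf_sample(category_reference, item_count).sum()
print(round(estimated_total, 2))
```

    Repeating the draw many times would yield a distribution over the category's contribution rather than a single point estimate.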

    Single and multiple time-point prediction models in kidney transplant outcomes

    This study predicted graft and recipient survival in kidney transplantation from the USRDS dataset using regression models and artificial neural networks (ANNs). We examined single time-point models (logistic regression and single-output ANNs) versus multiple time-point models (Cox models and multiple-output ANNs). These models generally achieved good prediction discrimination (AUC up to 0.82) and model calibration. This study found that: (1) single time-point and multiple time-point models can achieve comparable AUC, except for multiple-output ANNs, which may perform poorly when a large proportion of observations are censored; (2) logistic regression can achieve performance comparable to ANNs if there are no strong interactions or non-linear relationships among the predictors and the outcomes; (3) time-varying effects must be modeled explicitly in Cox models when predictors have significantly different effects on short-term versus long-term survival; and (4) an appropriate baseline survivor function should be specified for Cox models to achieve good model calibration, especially when clinical decision support is designed to provide exact predicted survival rates.
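    To show the two modeling families side by side, the minimal sketch below fits a Cox model (multiple time-point) and a logistic regression (single time-point) on synthetic data and compares their discrimination at one horizon. The covariates, the 5-year horizon, and the way censoring is handled are simplifying assumptions for illustration; they are not the study's USRDS variables or modeling pipeline.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-ins for transplant predictors (not USRDS variables).
df = pd.DataFrame({
    "donor_age": rng.normal(45, 12, n),
    "cold_ischemia_h": rng.normal(18, 6, n),
})
risk = 0.03 * (df["donor_age"] - 45) + 0.05 * (df["cold_ischemia_h"] - 18)
time = rng.exponential(scale=8 * np.exp(-risk))   # years to graft failure
censor = rng.uniform(0, 10, n)                    # administrative censoring
df["duration"] = np.minimum(time, censor)
df["event"] = (time <= censor).astype(int)

covariates = ["donor_age", "cold_ischemia_h"]

# Multiple time-point model: Cox PH, which yields a full survivor function.
cox = CoxPHFitter().fit(df, duration_col="duration", event_col="event")
surv_5y = cox.predict_survival_function(df[covariates], times=[5.0]).loc[5.0]

# Single time-point model: logistic regression for 5-year graft survival,
# restricted to subjects whose 5-year status is observed (a simplification).
known = (df["duration"] >= 5.0) | (df["event"] == 1)
y = (df["duration"] >= 5.0).astype(int)[known]
X = df.loc[known, covariates]
logit = LogisticRegression().fit(X, y)

print("Cox 5-year AUC:  ", round(roc_auc_score(y, surv_5y[known]), 3))
print("Logit 5-year AUC:", round(roc_auc_score(y, logit.predict_proba(X)[:, 1]), 3))
```

    The Cox fit produces survival estimates at any follow-up time from a single model, while the logistic model must be refit for each horizon and discards records censored before that horizon; that difference, together with the choice of baseline survivor function, is the trade-off the abstract describes.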

    Dynamic summarization of bibliographic-based data

    Background: Traditional information retrieval techniques typically return excessive output when directed at large bibliographic databases. Natural language processing applications strive to extract salient content from this excess. Semantic MEDLINE, a National Library of Medicine (NLM) natural language processing application, highlights relevant information in PubMed data. However, Semantic MEDLINE implements manually coded schemas that accommodate few information needs; currently there are only five such schemas, while many more would be needed to realistically serve all potential users. The aim of this project was to develop and evaluate a statistical algorithm that automatically identifies relevant bibliographic data; the new algorithm could be incorporated into a dynamic schema that accommodates various information needs in Semantic MEDLINE and eliminates the need for multiple schemas. Methods: We developed a flexible algorithm named Combo that combines three statistical metrics, the Kullback-Leibler Divergence (KLD), Riloff's RlogF metric (RlogF), and a new metric called PredScal, to automatically identify salient data in bibliographic text. We downloaded citations from a PubMed search query addressing the genetic etiology of bladder cancer. The citations were processed with SemRep, an NLM rule-based application that produces semantic predications. SemRep output was processed by Combo, by the standard Semantic MEDLINE genetics schema, and independently by the individual KLD and RlogF metrics. We evaluated each summarization method against an existing reference standard within the task-based context of genetic database curation. Results: Combo asserted 74 genetic entities implicated in bladder cancer development, whereas the traditional schema asserted 10; the KLD and RlogF metrics individually asserted 77 and 69 genetic entities, respectively. Combo achieved 61% recall and 81% precision, with an F-score of 0.69. The traditional schema achieved 23% recall and 100% precision, with an F-score of 0.37. The KLD metric achieved 61% recall and 70% precision, with an F-score of 0.65. The RlogF metric achieved 61% recall and 72% precision, with an F-score of 0.66. Conclusions: Semantic MEDLINE summarization using the new Combo algorithm outperformed a conventional summarization schema in a genetic database curation task. It could potentially streamline information acquisition for other needs without requiring multiple hand-built saliency schemas.
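    To make the metric combination more concrete, here is a small sketch of how KLD and RlogF salience scores can be computed from topic versus background frequencies. The gene names and counts are invented, the product used to combine the two scores is only a placeholder for Combo's actual combination, and PredScal (defined in the paper) is not reproduced here.

```python
import math
from collections import Counter

def kld_contribution(p, q):
    """Pointwise Kullback-Leibler contribution p * log2(p / q) for one term."""
    return p * math.log2(p / q) if p > 0 and q > 0 else 0.0

def rlogf(rel_freq, total_freq):
    """Riloff's RlogF: relevance rate weighted by the log of relevant frequency."""
    return (rel_freq / total_freq) * math.log2(rel_freq) if rel_freq > 0 else 0.0

# Hypothetical counts of a predication argument (e.g., a gene name) in
# topic-specific output versus a background corpus.
topic_counts = Counter({"TP53": 40, "FGFR3": 35, "ACTB": 5})
background_counts = Counter({"TP53": 200, "FGFR3": 60, "ACTB": 900})
topic_total = sum(topic_counts.values())
background_total = sum(background_counts.values())

for term, rel_freq in topic_counts.items():
    p = rel_freq / topic_total
    q = background_counts[term] / background_total
    # Combo also folds in a third metric (PredScal); a simple product of the
    # two known metrics stands in for the combination here.
    score = kld_contribution(p, q) * rlogf(rel_freq, rel_freq + background_counts[term])
    print(f"{term}: {score:.3f}")
```

    In the paper, Combo scores SemRep predication output rather than raw term counts; the toy counts above only illustrate the scoring arithmetic, in which topic-enriched terms score high and ubiquitous background terms score near or below zero.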

    The role of networks to overcome large-scale challenges in tomography: the Non-Clinical Tomography Users Research Network

    Our ability to visualize and quantify the internal structures of objects via computed tomography (CT) has fundamentally transformed science. As tomographic tools have become more broadly accessible, researchers across diverse disciplines have embraced the ability to investigate the 3D structure-function relationships of an enormous array of items. Whether studying organismal biology, animal models for human health, iterative manufacturing techniques, experimental medical devices, engineering structures, geological and planetary samples, prehistoric artifacts, or fossilized organisms, computed tomography has led to extensive methodological and basic science advances and is now a core element in science, technology, engineering, and mathematics (STEM) research and outreach toolkits. Tomorrow's scientific progress is built upon today's innovations; in our data-rich world, this requires access not only to publications but also to supporting data. Reliance on proprietary technologies, combined with the varied objectives of diverse research groups, has resulted in a fragmented tomography-imaging landscape, one that is functional at the individual lab level yet lacks the standardization needed to support efficient and equitable exchange and reuse of data. Developing standards and pipelines for the creation of new and future data, which can also be applied to existing datasets, is a challenge that becomes increasingly difficult as the amount and diversity of legacy data grows. Global networks of CT users have proved an effective approach to addressing this kind of multifaceted challenge across a range of fields. Here we describe ongoing efforts to address barriers to the adoption of the recently proposed FAIR (Findability, Accessibility, Interoperability, Reuse) and open science principles by assembling interested parties from research and education communities, industry, publishers, and data repositories to approach these issues jointly in a focused, efficient, and practical way. By outlining the benefits of networks generally, and drawing on examples from efforts by the Non-Clinical Tomography Users Research Network (NoCTURN) specifically, we illustrate how standardization of data and metadata for reuse can foster interdisciplinary collaborations and create new opportunities for future-looking, large-scale data initiatives.
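    As a purely illustrative sketch of what standardized, reuse-ready metadata for a tomography dataset could look like, the record below lists the kinds of fields that support findability, accessibility, interoperability, and reuse. The field names and values are hypothetical assumptions, not a NoCTURN specification or an existing community standard.

```python
import json

# Hypothetical minimal metadata record for a CT dataset; the fields are
# illustrative assumptions, not a NoCTURN or FAIR specification.
scan_record = {
    "identifier": "doi:10.xxxx/example",        # persistent identifier (Findable)
    "title": "MicroCT scan of a fossil cranium",
    "creator": "Example Lab",
    "license": "CC-BY-4.0",                     # explicit reuse terms (Reusable)
    "acquisition": {
        "instrument": "laboratory microCT",
        "voltage_kV": 120,
        "voxel_size_um": 25.0,
    },
    "file_format": "TIFF stack",                # open format (Interoperable)
    "access_url": "https://repository.example/datasets/1234",  # Accessible
}

print(json.dumps(scan_record, indent=2))
```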