3 research outputs found

    Vaccine semantics: Automatic methods for recognizing, representing, and reasoning about vaccine-related information

    Post-marketing management and decision-making about vaccines build on the early detection of safety concerns and changes in public sentiment, accurate access to established evidence, and the ability to promptly quantify effects and verify hypotheses about vaccine benefits and risks. A variety of resources provide relevant information, but they use different representations, which makes rapid evidence generation and extraction challenging. This thesis presents automatic methods for interpreting heterogeneously represented vaccine information. Part I evaluates the use of social media messages for monitoring vaccine adverse events and public sentiment, using automatic methods for information recognition. Parts II and III develop and evaluate automatic methods and res…

    Extraction of chemical-induced diseases using prior knowledge and textual information

    We describe our approach to the chemical-disease relation (CDR) task in the BioCreative V challenge. The CDR task consists of two subtasks: automatic disease named entity recognition and normalization (DNER), and extraction of chemical-induced diseases (CIDs) from Medline abstracts. For the DNER subtask, we used our concept recognition tool Peregrine in combination with several optimization steps. For the CID subtask, our system, which we named RELigator, was trained on a rich feature set comprising features derived from a graph database containing prior knowledge about chemicals and diseases, as well as linguistic and statistical features derived from the abstracts in the CDR training corpus. We describe the systems that were developed and present evaluation results for both subtasks on the CDR test set. For DNER, our Peregrine system reached an F-score of 0.757. For CID, the system achieved an F-score of 0.526, which ranked second among 18 participating teams. Several post-challenge modifications of the systems substantially improved the F-scores (0.828 for DNER and 0.602 for CID).
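    The F-scores above combine precision and recall. The abstract does not spell out the formula or the matching criteria used in the CDR evaluation, so the following is only a minimal Python sketch of the conventional F-beta score (the balanced F1 when beta = 1), computed from true-positive, false-positive, and false-negative counts. The function name f_score and the example counts are hypothetical, not taken from the study.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true-positive, false-positive and false-negative counts.

    beta = 1 gives the balanced F1 score (harmonic mean of precision and recall).
    This is a generic sketch, not the official BioCreative CDR evaluation script.
    """
    if tp == 0:
        # No correct predictions: precision and recall are 0 (or undefined), so F is 0.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted items that are correct
    recall = tp / (tp + fn)     # share of gold-standard items that were found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, not from the CDR test set.
    print(round(f_score(50, 30, 20), 4))
```

    With 50 true positives, 30 false positives, and 20 false negatives (made-up numbers), precision is 0.625, recall is about 0.714, and F1 is 2/3, which matches the equivalent form 2·TP / (2·TP + FP + FN).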

    Quantifying outcome misclassification in multi-database studies: The case study of pertussis in the ADVANCE project

    Background: The Accelerated Development of VAccine beNefit-risk Collaboration in Europe (ADVANCE) is a public-private collaboration that aims to develop and test a system for rapid benefit-risk (B/R) monitoring of vaccines using European healthcare databases. Event misclassification can result in biased estimates. Using different algorithms for identifying cases of Bordetella pertussis (BorPer) infection as a test case, we aimed to describe a strategy to quantify event misclassification when manual chart review is not feasible.
    Methods: Four participating databases retrieved data from the primary care (PC) setting: BIFAP (Spain), THIN and RCGP RSC (UK), and PEDIANET (Italy); SIDIAP (Spain) retrieved data from both PC and hospital settings. BorPer algorithms were defined by healthcare setting, data domain (diagnoses, drugs, or laboratory tests), and concept sets (specific or unspecified pertussis). Algorithm- and database-specific BorPer incidence rates (IRs) were estimated in children aged 0–14 years enrolled in 2012 and 2014 and followed up until the end of each calendar year, and were compared with IRs of confirmed pertussis from the ECDC surveillance system (TESSy). Novel formulas, based on a small set of assumptions, were used to approximate validity indices; they were applied to estimate approximately the positive predictive value (PPV) and sensitivity in SIDIAP.
    Results: The numbers of cases and the estimated BorPer IRs per 100,000 person-years in PC, using data representing 3,173,268 person-years, were 0 (IR = 0.0), 21 (IR = 4.3), 21 (IR = 5.1), 79 (IR = 5.7), and 2 (IR = 2.3) in BIFAP, SIDIAP, THIN, RCGP RSC, and PEDIANET, respectively. The IRs for combined specific/unspecified pertussis were higher than those from TESSy, suggesting that some false positives had been included. In SIDIAP, the estimated IR was 45.0 when discharge diagnoses were included. The sensitivity and PPV of combined PC specific and unspecified diagnoses for BorPer cases in SIDIAP were approximately 85% and 72%, respectively.
    Conclusion: Retrieving BorPer cases using only specific concepts has low sensitivity in PC databases, while including cases retrieved by unspecified concepts introduces false positives, which were approximately estimated at 28% in one database. The share of cases that cannot be retrieved from a PC database because they are seen only in hospital was approximately estimated at 15% in one database. This study demonstrated that quantifying the impact of different event-finding algorithms across databases and benchmarking against disease surveillance data can provide approximate estimates of algorithm validity.
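    The rates and validity figures in this abstract rest on a few standard epidemiological relations: an incidence rate is the number of cases divided by person-years of follow-up (scaled here to 100,000 person-years), the share of false positives among retrieved cases is 1 − PPV (so a PPV of 72% corresponds to roughly 28% false positives), and the share of true cases an algorithm misses is 1 − sensitivity (so 85% sensitivity corresponds to roughly 15% missed). The study's own "novel formulas" are not given in the abstract, so the Python sketch below shows only these textbook relations plus a common PPV/sensitivity correction of an observed count. All function names are hypothetical, and the pooled rate it prints is a crude illustration of the arithmetic, not a figure reported by the study.

```python
"""Sketch of standard incidence-rate and validity-index arithmetic.

Assumptions: function names are hypothetical; corrected_count applies a common
PPV/sensitivity adjustment, not the ADVANCE study's own formulas.
"""

PER_PERSON_YEARS = 100_000


def incidence_rate(cases: int, person_years: float, per: float = PER_PERSON_YEARS) -> float:
    """Crude incidence rate: cases per `per` person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return cases / person_years * per


def false_positive_share(ppv: float) -> float:
    """Fraction of algorithm-retrieved cases that are not true cases."""
    return 1.0 - ppv


def missed_share(sensitivity: float) -> float:
    """Fraction of true cases that the algorithm fails to retrieve."""
    return 1.0 - sensitivity


def corrected_count(observed: float, ppv: float, sensitivity: float) -> float:
    """Estimate the number of true cases behind an observed algorithm count.

    observed * ppv estimates the true positives among retrieved cases;
    dividing by sensitivity scales that up to all true cases, including
    those the algorithm missed.
    """
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    return observed * ppv / sensitivity


if __name__ == "__main__":
    # Primary care case counts per database, as reported in the abstract.
    pc_cases = {"BIFAP": 0, "SIDIAP": 21, "THIN": 21, "RCGP RSC": 79, "PEDIANET": 2}
    total_person_years = 3_173_268  # total PC person-years reported in the abstract

    # Crude pooled rate, for illustration only (ignores between-database differences).
    pooled = incidence_rate(sum(pc_cases.values()), total_person_years)
    print(f"pooled crude IR: {pooled:.1f} per 100,000 person-years")

    # Approximate SIDIAP validity of combined PC specific/unspecified diagnoses.
    print(f"false-positive share: {false_positive_share(0.72):.0%}")
    print(f"missed share: {missed_share(0.85):.0%}")

    # Hypothetical: 100 retrieved cases with PPV 0.72 and sensitivity 0.85.
    print(f"estimated true cases: {corrected_count(100, 0.72, 0.85):.1f}")
```

    The correction multiplies by PPV to drop the estimated false positives and then divides by sensitivity to add back the estimated missed cases; it assumes both indices apply uniformly to the population being counted, which is a simplification.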