68 research outputs found

    CHEMDNER: The drugs and chemical names extraction challenge

    Natural language processing (NLP) and text mining technologies for the chemical domain (ChemNLP or chemical text mining) are key to improving the access and integration of information from unstructured data such as patents or the scientific literature. Therefore, the BioCreative organizers posed the CHEMDNER (chemical compound and drug name recognition) community challenge, which promoted the development of novel, competitive and accessible chemical text mining systems. This task allowed a comparative assessment of the performance of various methodologies using a carefully prepared collection of text manually labeled by specially trained chemists as Gold Standard data. We evaluated two important aspects: one covered the indexing of documents with chemicals (chemical document indexing - CDI task), and the other was concerned with finding the exact mentions of chemicals in text (chemical entity mention recognition - CEM task). In total, 27 teams (23 academic and 4 commercial, a total of 87 researchers) returned results for the CHEMDNER tasks: 26 teams for the CEM task and 23 for the CDI task. Top scoring teams obtained an F-score of 87.39% for the CEM task and 88.20% for the CDI task, a very promising result when compared to the agreement between human annotators (91%). The strategies used to detect chemicals included machine learning methods (e.g. conditional random fields) using a variety of features, chemistry and drug lexica, and domain-specific rules. We expect that the tools and resources resulting from this effort will have an impact on future developments of chemical text mining applications and will form the basis to find related chemical information for the detected entities, such as toxicological or pharmacogenomic properties.
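
    The F-scores quoted in this abstract can be illustrated with a minimal sketch. It assumes the balanced F-score (harmonic mean of precision and recall) and exact matching of mention spans; the abstract does not spell out either detail, and the mention triples below are invented for illustration only.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall and balanced F-score over two sets of mentions."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical mentions as (document_id, start_offset, end_offset) triples.
gold_mentions = {("doc1", 0, 7), ("doc1", 20, 29), ("doc2", 5, 12), ("doc2", 30, 41)}
# The last system mention has a wrong end offset, so it does not count as a match.
system_mentions = {("doc1", 0, 7), ("doc1", 20, 29), ("doc2", 5, 11)}

p, r, f = precision_recall_f1(gold_mentions, system_mentions)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

    Under exact matching, a mention with a slightly wrong boundary counts both as a false positive and as a missed gold mention, which is one reason mention-level scores are hard to push toward the inter-annotator agreement.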

    The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text

    BACKGROUND: Determining the usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation or measuring time savings by using automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. For this purpose, the BCIII-ACT corpus was provided, which includes a training, development and test set of over 12,000 PPI relevant and non-relevant PubMed abstracts labeled manually by domain experts, also recording the human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full text articles and interaction detection method ontology concepts that had been applied to detect the PPIs reported in them. RESULTS: A total of 11 teams participated in at least one of the two PPI tasks (10 in ACT and 8 in the IMT) and a total of 62 persons were involved either as participants or in preparing data sets/evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. From the 52 runs submitted for the ACT, the highest Matthews correlation coefficient (MCC) score measured was 0.55 at an accuracy of 89%, and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods; some of them also used lexical resources like MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, and some integrated NER approaches. For the IMT, a total of 42 runs were evaluated by comparing systems against manually generated annotations done by curators from the BioGRID and MINT databases. The highest AUC iP/R achieved by any run was 53%, and the best MCC score was 0.55. In the case of competitive systems with an acceptable recall (above 35%), the macro-averaged precision ranged between 50% and 80%, with a maximum F-score of 55%. CONCLUSIONS: The results of the ACT task of BioCreative III indicate that classification of large unbalanced article collections reflecting the real class imbalance is still challenging. Nevertheless, text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than 1/4 of the time when compared to unranked results. Detecting associations between full text articles and interaction detection method PSI-MI terms (IMT) is more difficult than might be anticipated. This is due to the variability of method term mentions, errors resulting from pre-processing of articles provided as PDF files, and the heterogeneity and different granularity of method term concepts encountered in the ontology. However, combining the sophisticated techniques developed by the participants with supporting evidence strings derived from the articles for human interpretation could result in practical modules for biological annotation workflows.
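
    The contrast between accuracy and MCC on unbalanced collections, which this abstract reports, can be shown with a small sketch. The MCC formula is the standard one; the confusion counts are invented for illustration and are not taken from the BioCreative III evaluation.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


def accuracy(tp, fp, tn, fn):
    """Fraction of all items that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical unbalanced collection: 100 relevant and 900 non-relevant abstracts.
tp, fp, tn, fn = 60, 40, 860, 40

acc = accuracy(tp, fp, tn, fn)
mcc = matthews_corrcoef(tp, fp, tn, fn)
print(f"accuracy={acc:.2f} MCC={mcc:.2f}")
```

    With most items in the negative class, accuracy stays high even though 40% of the relevant abstracts are missed, whereas MCC reflects the weaker performance on the minority class.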

    Increasing test specificity without impairing sensitivity: lessons learned from SARS-CoV-2 serology

    Background: Serological tests are widely used in various medical disciplines for diagnostic and monitoring purposes. Unfortunately, the sensitivity and specificity of test systems are often poor, leaving room for false-positive and false-negative results. Moreover, conventional methods that increase specificity tend to decrease sensitivity, and vice versa. Using SARS-CoV-2 serology as an example, we propose here a novel testing strategy: the 'sensitivity improved two-test' or 'SIT²' algorithm. Methods: SIT² involves confirmatory retesting of samples with results falling in a predefined retesting zone of an initial screening test, with adjusted cut-offs to increase sensitivity. We verified and compared the performance of SIT² to single tests and orthogonal testing (OTA) in an Austrian cohort (1117 negative, 64 post-COVID-positive samples) and validated the algorithm in an independent British cohort (976 negatives and 536 positives). Results: The specificity of SIT² was superior to single tests and non-inferior to OTA. The sensitivity was maintained or even improved using SIT² when compared with single tests or OTA. SIT² allowed correct identification of infected individuals even when a live virus neutralisation assay could not detect antibodies. Compared with single testing or OTA, SIT² significantly reduced total test errors to 0.46% (0.24-0.65) or 1.60% (0.94-2.38) at 5% or 20% seroprevalence, respectively. Conclusion: For SARS-CoV-2 serology, SIT² proved to be the best diagnostic choice at both 5% and 20% seroprevalence in all tested scenarios. It is an easy-to-apply algorithm and can potentially be helpful for the serology of other infectious diseases.
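
    The retesting idea can be sketched as follows. This is one possible reading of the abstract, not the authors' implementation: the function names, the shape of the retesting zone (a lower adjusted cut-off and an upper bound), and all numeric values are assumptions. The total error rate is computed here as the prevalence-weighted sum of false negatives and false positives, which is also an assumption about how the reported figure is defined.

```python
def sit2_classify(screen_value, confirm_value, screen_cutoff, zone_upper, confirm_cutoff):
    """Two-test decision that retests only results inside the retesting zone.

    Assumed zone: screen_cutoff <= screen_value < zone_upper.
    """
    if screen_value < screen_cutoff:
        return "negative"  # below the (lowered) screening cut-off
    if screen_value >= zone_upper:
        return "positive"  # clearly positive, no retest needed
    # Inside the retesting zone: the confirmatory test decides.
    return "positive" if confirm_value >= confirm_cutoff else "negative"


def total_error_rate(sensitivity, specificity, prevalence):
    """Expected fraction of wrong results (false negatives plus false positives)."""
    false_negatives = prevalence * (1 - sensitivity)
    false_positives = (1 - prevalence) * (1 - specificity)
    return false_negatives + false_positives


# Hypothetical cut-offs and readings in arbitrary units.
print(sit2_classify(2.0, 1.6, screen_cutoff=1.0, zone_upper=3.0, confirm_cutoff=1.0))
# Hypothetical test characteristics at 5% seroprevalence.
print(f"{total_error_rate(0.95, 0.99, 0.05):.4f}")
```

    At low seroprevalence the false-positive term dominates the total error, which is why a gain in specificity that does not cost sensitivity matters most in that setting.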

    Measurements of fiducial and differential cross sections for Higgs boson production in the diphoton decay channel at √s = 8 TeV with ATLAS

    Measurements of fiducial and differential cross sections are presented for Higgs boson production in proton-proton collisions at a centre-of-mass energy of √s = 8 TeV. The analysis is performed in the H → γγ decay channel using 20.3 fb⁻¹ of data recorded by the ATLAS experiment at the CERN Large Hadron Collider. The signal is extracted using a fit to the diphoton invariant mass spectrum assuming that the width of the resonance is much smaller than the experimental resolution. The signal yields are corrected for the effects of detector inefficiency and resolution. The pp → H → γγ fiducial cross section is measured to be 43.2 ± 9.4 (stat.) +3.2/−2.9 (syst.) ± 1.2 (lumi) fb for a Higgs boson of mass 125.4 GeV decaying to two isolated photons that have transverse momentum greater than 35% and 25% of the diphoton invariant mass and each with absolute pseudorapidity less than 2.37. Four additional fiducial cross sections and two cross-section limits are presented in phase space regions that test the theoretical modelling of different Higgs boson production mechanisms, or are sensitive to physics beyond the Standard Model. Differential cross sections are also presented, as a function of variables related to the diphoton kinematics and the jet activity produced in the Higgs boson events. The observed spectra are statistically limited but broadly in line with the theoretical expectations.
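
    The kinematic part of the fiducial definition quoted in this abstract can be sketched as below. The sketch assumes that the 35% and 25% thresholds apply to the leading and subleading photon respectively, treats photons as massless, and omits the isolation requirement; the function names and the example photons are illustrative only.

```python
import math


def diphoton_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless particles from (pT, eta, phi), in the units of pT."""
    return math.sqrt(2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)))


def passes_fiducial(photon_a, photon_b):
    """Check the pT/m and |eta| requirements for two photons given as (pT, eta, phi)."""
    lead, sub = sorted((photon_a, photon_b), key=lambda p: p[0], reverse=True)
    mass = diphoton_mass(*lead, *sub)
    return (
        lead[0] > 0.35 * mass
        and sub[0] > 0.25 * mass
        and abs(lead[1]) < 2.37
        and abs(sub[1]) < 2.37
    )


# Hypothetical back-to-back central photons with pT in GeV.
central = ((62.7, 0.0, 0.0), (62.7, 0.0, math.pi))
print(diphoton_mass(*central[0], *central[1]), passes_fiducial(*central))
```

    Scaling the transverse-momentum thresholds with the diphoton mass, rather than fixing them in GeV, keeps the acceptance from sculpting the mass spectrum that is later fitted.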

    Evidence for the Higgs-boson Yukawa coupling to tau leptons with the ATLAS detector

    Results of a search for H → ττ decays are presented, based on the full set of proton-proton collision data recorded by the ATLAS experiment at the LHC during 2011 and 2012. The data correspond to integrated luminosities of 4.5 fb⁻¹ and 20.3 fb⁻¹ at centre-of-mass energies of √s = 7 TeV and √s = 8 TeV respectively. All combinations of leptonic (τ → ℓνν̄ with ℓ = e, μ) and hadronic (τ → hadrons ν) tau decays are considered. An excess of events over the expected background from other Standard Model processes is found with an observed (expected) significance of 4.5 (3.4) standard deviations. This excess provides evidence for the direct coupling of the recently discovered Higgs boson to fermions. The measured signal strength, normalised to the Standard Model expectation, of μ = 1.43 +0.43/−0.37 is consistent with the predicted Yukawa coupling strength in the Standard Model.

    Searches for the Zγ decay mode of the Higgs boson and for new high-mass resonances in pp collisions at √s=13 TeV with the ATLAS detector

    This article presents searches for the Zγ decay of the Higgs boson and for narrow high-mass resonances decaying to Zγ, exploiting Z boson decays to pairs of electrons or muons. The data analysis uses 36.1 fb⁻¹ of pp collisions at √s = 13 TeV recorded by the ATLAS detector at the CERN Large Hadron Collider. The data are found to be consistent with the expected Standard Model background. The observed (expected, assuming Standard Model pp → H → Zγ production and decay) upper limit on the production cross section times the branching ratio for pp → H → Zγ is 6.6 (5.2) times the Standard Model prediction at the 95% confidence level for a Higgs boson mass of 125.09 GeV. In addition, upper limits are set on the production cross section times the branching ratio as a function of the mass of a narrow resonance between 250 GeV and 2.4 TeV, assuming spin-0 resonances produced via gluon-gluon fusion, and spin-2 resonances produced via gluon-gluon or quark-antiquark initial states. For high-mass spin-0 resonances, the observed (expected) limits vary between 88 fb (61 fb) and 2.8 fb (2.7 fb) for the mass range from 250 GeV to 2.4 TeV at the 95% confidence level.

    Search for a high-mass Higgs boson decaying to a W boson pair in pp collisions at √s = 8 TeV with the ATLAS detector

    A search for a high-mass Higgs boson H is performed in the H → WW → ℓνℓν and H → WW → ℓνqq decay channels using pp collision data corresponding to an integrated luminosity of 20.3 fb⁻¹ collected at √s = 8 TeV by the ATLAS detector at the Large Hadron Collider. No evidence of a high-mass Higgs boson is found. Limits on σH × BR(H → WW) as a function of the Higgs boson mass mH are determined in three different scenarios: one in which the heavy Higgs boson has a narrow width compared to the experimental resolution, one for a width increasing with the boson mass and modeled by the complex-pole scheme following the same behavior as in the Standard Model, and one for intermediate widths. The upper range of the search is mH = 1500 GeV for the narrow-width scenario and mH = 1000 GeV for the other two scenarios. The lower edge of the search range is 200–300 GeV and depends on the analysis channel and search scenario. For each signal interpretation, individual and combined limits from the two WW decay channels are presented. At mH = 1500 GeV, the highest-mass point tested, σH × BR(H → WW) for a narrow-width Higgs boson is constrained to be less than 22 fb and 6.6 fb at 95% CL for the gluon fusion and vector-boson fusion production modes, respectively.

    Fiducial and differential cross sections of Higgs boson production measured in the four-lepton decay channel in pp collisions at √s = 8 TeV with the ATLAS detector

    Measurements of fiducial and differential cross sections of Higgs boson production in the H → ZZ* → 4ℓ decay channel are presented. The cross sections are determined within a fiducial phase space and corrected for detection efficiency and resolution effects. They are based on 20.3 fb⁻¹ of pp collision data, produced at √s = 8 TeV centre-of-mass energy at the LHC and recorded by the ATLAS detector. The differential measurements are performed in bins of transverse momentum and rapidity of the four-lepton system, the invariant mass of the subleading lepton pair and the decay angle of the leading lepton pair with respect to the beam line in the four-lepton rest frame, as well as the number of jets and the transverse momentum of the leading jet. The measured cross sections are compared to selected theoretical calculations of the Standard Model expectations. No significant deviation from any of the tested predictions is found.

    Search for the Standard Model Higgs boson decay to μ+μ− with the ATLAS detector

    A search is reported for Higgs boson decay to μ+μ− using data with an integrated luminosity of 24.8 fb⁻¹ collected with the ATLAS detector in pp collisions at √s = 7 and 8 TeV at the CERN Large Hadron Collider. The observed dimuon invariant mass distribution is consistent with the Standard Model background-only hypothesis in the 120–150 GeV search range. For a Higgs boson with a mass of 125.5 GeV, the observed (expected) upper limit at the 95% confidence level is 7.0 (7.2) times the Standard Model expectation. This corresponds to an upper limit on the branching ratio BR(H → μ+μ−) of 1.5 × 10⁻³.

    Robust estimation of bacterial cell count from optical density

    Optical density (OD) is widely used to estimate the density of cells in liquid culture, but it cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses instrument effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.