213 research outputs found

    Finding related sentence pairs in MEDLINE

    We explore the feasibility of automatically identifying sentences in different MEDLINE abstracts that are related in meaning. We compared traditional vector space models with machine learning methods for detecting relatedness, and found that machine learning was superior. The Huber method, a variant of Support Vector Machines that minimizes the modified Huber loss function, achieves 73% precision when the score cutoff is set high enough to identify about one related sentence per abstract on average. We illustrate how an abstract viewed in PubMed might be modified to present the related sentences found in other abstracts by this automatic procedure.
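
The loss function and score cutoff mentioned above can be illustrated with a minimal sketch. It uses the standard definition of the modified Huber loss on the margin y * score; the helper `precision_at_cutoff` and its inputs are hypothetical names chosen for illustration and are not taken from the paper.

```python
def modified_huber_loss(y, score):
    """Modified Huber loss for a label y in {-1, +1} and a real-valued score."""
    margin = y * score
    if margin >= 1.0:
        return 0.0                    # correct with enough margin: no penalty
    if margin >= -1.0:
        return (1.0 - margin) ** 2    # quadratic zone near the decision boundary
    return -4.0 * margin              # linear zone for badly misclassified points


def precision_at_cutoff(scores, labels, cutoff):
    """Fraction of pairs scoring at or above the cutoff that are truly related.

    labels holds 1 for a related pair and 0 otherwise (hypothetical encoding).
    Returns None when no pair reaches the cutoff.
    """
    selected = [label for s, label in zip(scores, labels) if s >= cutoff]
    if not selected:
        return None
    return sum(selected) / len(selected)
```

Raising the cutoff selects fewer candidate pairs, which is the trade-off the abstract describes when it fixes the cutoff at roughly one related sentence per abstract.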

    How to Get the Most out of Your Curation Effort

    Large-scale annotation efforts typically involve several experts who may disagree with each other. We propose an approach for modeling disagreements among experts that assigns each annotation a confidence value (i.e., the posterior probability that it is correct). Our approach computes a certainty level for individual annotations, given annotator-specific parameters estimated from data. We developed two probabilistic models for performing this analysis, compared these models using computer simulation, and tested each model's actual performance on a large data set generated by human annotators specifically for this study. We show that even in the worst-case scenario, when all annotators disagree, our approach allows us to significantly increase the probability of choosing the correct annotation. Along with this publication we make publicly available a corpus of 10,000 sentences annotated according to several cardinal dimensions that we have introduced in earlier work. The 10,000 sentences were all 3-fold annotated by a group of eight experts, while a 1,000-sentence subset was further 5-fold annotated by five new experts. While the presented data represent a specialized curation task, our modeling approach is general; most data annotation studies could benefit from our methodology.
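
The idea of a posterior probability of correctness can be sketched under strong simplifying assumptions: a binary label, annotators who err independently given the true label, and one known accuracy per annotator. This is not either of the two models developed in the paper; the function name and parameters are hypothetical.

```python
def posterior_positive(votes, accuracies, prior=0.5):
    """Posterior probability that the true label is 1.

    votes      -- list of 0/1 labels, one per annotator
    accuracies -- probability that each annotator reports the true label
    prior      -- prior probability that the true label is 1
    Assumes annotators are conditionally independent given the true label.
    """
    like_pos = prior          # joint probability of the votes and label 1
    like_neg = 1.0 - prior    # joint probability of the votes and label 0
    for vote, acc in zip(votes, accuracies):
        like_pos *= acc if vote == 1 else (1.0 - acc)
        like_neg *= (1.0 - acc) if vote == 1 else acc
    return like_pos / (like_pos + like_neg)
```

With two annotators who disagree, for example `posterior_positive([1, 0], [0.9, 0.6])`, the posterior leans toward the vote of the more reliable annotator, which is the intuition behind improving on a coin flip when annotators disagree.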

    PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval

    Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks. We consider the application of these algorithms to related document networks comprised of automatically-generated content-similarity links. Specifically, this work tackles the problem of document retrieval in the biomedical domain, in the context of the PubMed search engine. A series of reranking experiments demonstrates that incorporating evidence extracted from link structure yields significant improvements in terms of standard ranked retrieval metrics. These results extend the applicability of link analysis algorithms to different environments.
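
As an illustration of running PageRank over a related-document network, here is a minimal power-iteration sketch. The graph representation (a dict mapping each document ID to the documents it links to), the damping factor of 0.85, and the iteration count are assumptions made for this example; the paper's actual network construction and reranking formula are not reproduced here.

```python
def pagerank(neighbors, damping=0.85, iterations=50):
    """PageRank by power iteration.

    neighbors maps each node to the list of nodes it links to; every linked
    node must also appear as a key. Returns a dict of scores that sum to 1.
    """
    nodes = list(neighbors)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node starts each round with the uniform teleportation share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = neighbors[v]
            if out:
                share = damping * rank[v] / len(out)
                for u in out:
                    new_rank[u] += share
            else:
                # A node without outgoing links spreads its score evenly.
                for u in nodes:
                    new_rank[u] += damping * rank[v] / n
        rank = new_rank
    return rank
```

In a reranking setting, the resulting scores would be combined with the original retrieval scores in some way; how to combine them is a design choice this sketch leaves open.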

    Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

    Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored OSCAR, an established named entity recogniser in the chemistry domain, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-and-drop mechanism of the graphical user interface of U-Compare. They also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to slightly better named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR.

    Biochemical and developmental characterization of carbonic anhydrase II from chicken erythrocytes

    Background: Carbonic anhydrase (CA) of the chicken has long attracted attention because of its important role in eggshell formation. The developmental profile of CA-II isozyme levels in chicken erythrocytes has not been determined or reported. Furthermore, the relationship between erythrocyte CA-II and egg production has not been discussed. In the present study, we isolated CA-II from chicken erythrocytes and determined age-related changes in erythrocyte CA-II levels.
    Methods: Chicken CA-II was purified by a combination of column chromatography steps. The levels of CA-II in chicken hemolysate were determined by ELISA in blood samples from 279 female chickens, ages 1 to 93 weeks, 69 male chickens, ages 3 to 59 weeks and 52 weeks female Araucana-chickens.
    Results: The mean concentration of CA-II in hemolysate from 1-week-old females was 50.8 ± 11.9 mg/g of Hb. CA-II levels were highest in 25-week-old (188.1 ± 82.6 mg/g of Hb), 31-week-old (193.6 ± 69.7 mg/g of Hb) and 49-week-old (203.8 ± 123.5 mg/g of Hb) female chickens. The levels of CA-II in female WL-chickens decreased significantly at week 63 (139.0 ± 19.3 mg/g of Hb) and did not change from week 63 until week 93. The mean level of CA-II in hemolysate of 3-week-old male WL-chickens was 78.3 ± 20.7 mg/g of Hb. The levels of CA-II in male WL-chickens did not change between week 3 and week 59. The mean level of CA-II in 53-week-old female Araucana-chickens was 23.4 ± 1.78 mg/g of Hb, about 11% of the level in 49-week-old female WL-chickens. Simple linear regression analysis showed significant associations between the level of CA-II and egg laying rate in WL-chickens from 16 to 63 weeks of age (p < 0.01).
    Conclusions: Developmental changes and sex differences in CA-II concentration in WL-chicken erythrocytes were observed. The concentration of CA-II in the erythrocytes of WL-chickens was much higher than that in Araucana-chickens (p < 0.01).

    Incidence, patterns and severity of reported unintentional injuries in Pakistan for persons five years and older: results of the National Health Survey of Pakistan 1990–94

    Background: National level estimates of injuries are not readily available for developing countries. This study estimated the annual incidence, patterns and severity of unintentional injuries among persons over five years of age in Pakistan.
    Methods: The National Health Survey of Pakistan (NHSP 1990–94) is a nationally representative household survey. Through a two-stage stratified design, 18,315 persons over 5 years of age were interviewed to estimate the overall annual incidence, patterns and severity of unintentional injuries for males and females in urban and rural areas over the preceding year. Weighted estimates were computed, adjusting for the complex survey design, using the surveyfreq and surveylogistic options of SAS 9.1 software.
    Results: The overall annual incidence of all unintentional injuries was 45.9 (CI: 39.3–52.5) per 1000 per year; 59.2 (CI: 49.2–69.2) and 33.2 (CI: 27.0–39.4) per 1000 per year among males and females over five years of age, respectively. An estimated 6.16 million unintentional injuries occur in Pakistan annually among persons over five years of age. Urban and rural injuries were 55.9 (95% CI: 48.1–63.7) and 41.2 (95% CI: 32.2–50.0) per 1000 per year, respectively. The annual incidence of injuries due to falls was 22.2 (95% CI: 18.0–26.4), poisoning 3.3 (95% CI: 0.5–6.1) and burns 1.5 (95% CI: 0.9–2.1) per 1000 per year. The majority of injuries occurred at home, 19.2 (95% CI: 16.0–22.4), or on the roads, 17.0 (95% CI: 13.8–20.2). Road traffic/street, school and urban injuries were more likely to result in handicap.
    Conclusion: There is a high burden of unintentional injuries among persons over five years of age in Pakistan. These results are useful for planning further studies and prioritizing injury prevention programs nationally and in other developing countries with a similar situation.

    Investigating heterogeneous protein annotations toward cross-corpora utilization

    Background: The number of corpora, collections of structured texts, has been increasing as a result of the growing interest in applying natural language processing methods to biological texts. Many named entity recognition (NER) systems have been developed based on these corpora. However, in the biomedical community, there is as yet no general consensus regarding named entity annotation; thus, the resources are largely incompatible, and it is difficult to compare the performance of systems developed on resources that were divergently annotated. On the other hand, from a practical application perspective, it is desirable to utilize as many existing annotated resources as possible, because annotation is costly. Thus, it becomes a task of interest to integrate the heterogeneous annotations in these resources.
    Results: We explore the potential sources of incompatibility among gene and protein annotations that were made for three common corpora: GENIA, GENETAG and AIMed. To show the inconsistency in the corpora annotations, we first tackle the incompatibility problem caused by corpus integration, and we quantitatively measure the effect of this incompatibility on protein mention recognition. We find that the F-score declines sharply when training with integrated data instead of pure data; in some cases, the performance drops by nearly 12%. This degradation may be caused by the newly added heterogeneous annotations, and cannot be fixed without an understanding of the heterogeneities that exist among the corpora. Motivated by the result of this preliminary experiment, we further qualitatively analyze a number of possible sources for these differences, and investigate the factors that would explain the inconsistencies, by performing a series of well-designed experiments. Our analyses indicate that incompatibilities in the gene/protein annotations exist mainly in the following four areas: the boundary annotation conventions, the scope of the entities of interest, the distribution of annotated entities, and the ratio of overlap between annotated entities. We further suggest that almost all of the incompatibilities can be prevented by properly considering these four aspects.
    Conclusion: Our analysis covers the key similarities and dissimilarities that exist among the diverse gene/protein corpora. This paper serves to improve our understanding of the differences in the three studied corpora, which can then lead to a better understanding of the performance of protein recognizers that are based on the corpora.

    Measurement of CP-violation asymmetries in $D^0 \to K_S \pi^+ \pi^-$

    We report a measurement of time-integrated CP-violation asymmetries in the resonant substructure of the three-body decay $D^0 \to K_S \pi^+ \pi^-$ using CDF II data corresponding to 6.0 fb$^{-1}$ of integrated luminosity from Tevatron $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV. The charm mesons used in this analysis come from $D^{*+}(2010) \to D^0 \pi^+$ and $D^{*-}(2010) \to \bar{D}^0 \pi^-$, where the production flavor of the charm meson is determined by the charge of the accompanying pion. We apply a Dalitz-amplitude analysis for the description of the dynamic decay structure and use two complementary approaches, namely a full Dalitz-plot fit employing the isobar model for the contributing resonances and a model-independent bin-by-bin comparison of the $D^0$ and $\bar{D}^0$ Dalitz plots. We find no CP-violation effects and measure an asymmetry of $A_{CP} = (-0.05 \pm 0.57\,\text{(stat)} \pm 0.54\,\text{(syst)})\%$ for the overall integrated CP-violation asymmetry, consistent with the standard model prediction. Comment: 15 pages.

    Studying the Underlying Event in Drell-Yan and High Transverse Momentum Jet Production at the Tevatron

    We study the underlying event in proton-antiproton collisions by examining the behavior of charged particles (transverse momentum $p_T > 0.5$ GeV/$c$, pseudorapidity $|\eta| < 1$) produced in association with large transverse momentum jets (~2.2 fb$^{-1}$) or with Drell-Yan lepton-pairs (~2.7 fb$^{-1}$) in the Z-boson mass region ($70 < M(\text{pair}) < 110$ GeV/$c^2$) as measured by CDF at 1.96 TeV center-of-mass energy. We use the direction of the lepton-pair (in Drell-Yan production) or the leading jet (in high-$p_T$ jet production) in each event to define three regions of $\eta$-$\phi$ space: toward, away, and transverse, where $\phi$ is the azimuthal scattering angle. For Drell-Yan production (excluding the leptons) both the toward and transverse regions are very sensitive to the underlying event. In high-$p_T$ jet production the transverse region is very sensitive to the underlying event and is separated into a MAX and MIN transverse region, which helps separate the hard component (initial and final-state radiation) from the beam-beam remnant and multiple parton interaction components of the scattering. The data are corrected to the particle level to remove detector effects and are then compared with several QCD Monte-Carlo models. The goal of this analysis is to provide data that can be used to test and improve the QCD Monte-Carlo models of the underlying event that are used to simulate hadron-hadron collisions. Comment: Submitted to Phys. Rev.

    Measurement of the $W^+W^-$ Production Cross Section and Search for Anomalous $WW\gamma$ and $WWZ$ Couplings in $p\bar{p}$ Collisions at $\sqrt{s} = 1.96$ TeV

    This Letter describes the current most precise measurement of the $W$ boson pair production cross section and most sensitive test of anomalous $WW\gamma$ and $WWZ$ couplings in $p\bar{p}$ collisions at a center-of-mass energy of 1.96 TeV. The $WW$ candidates are reconstructed from decays containing two charged leptons and two neutrinos, where the charged leptons are either electrons or muons. Using data collected by the CDF II detector from 3.6 fb$^{-1}$ of integrated luminosity, a total of 654 candidate events are observed with an expected background contribution of $320 \pm 47$ events. The measured total cross section is $\sigma(p\bar{p} \to W^+W^- + X) = 12.1 \pm 0.9\,\textrm{(stat)}\,^{+1.6}_{-1.4}\,\textrm{(syst)}$ pb, which is in good agreement with the standard model prediction. The same data sample is used to place constraints on anomalous $WW\gamma$ and $WWZ$ couplings. Comment: submitted to Phys. Rev. Lett.