23 research outputs found

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed.
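
The two term-weighting baselines named in this abstract (TF-IDF and Okapi BM25) can be illustrated with a minimal sketch. This is not the consortium's implementation: the whitespace tokenizer, the smoothed IDF, the defaults `k1=1.5` and `b=0.75`, and all function names are assumptions chosen for illustration, and PubMed Related Articles is not covered.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real systems
    # would also handle punctuation, stop words and stemming).
    return text.lower().split()


def tfidf_cosine_scores(seed, candidates):
    """Cosine similarity between TF-IDF vectors of the seed and each candidate."""
    docs = [tokenize(seed)] + [tokenize(c) for c in candidates]
    n = len(docs)
    df = Counter(term for d in docs for term in set(d))
    # Smoothed inverse document frequency so that no weight is zero or negative.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf[t] for t in tf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    seed_vec = vec(docs[0])
    return [cosine(seed_vec, vec(d)) for d in docs[1:]]


def bm25_scores(seed, candidates, k1=1.5, b=0.75):
    """Okapi BM25 score of each candidate, using the seed's terms as the query."""
    docs = [tokenize(c) for c in candidates]
    if not docs:
        return []
    query = set(tokenize(seed))
    n = len(docs)
    avgdl = (sum(len(d) for d in docs) / n) or 1.0
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if t not in tf:
                continue
            # The "+ 1" inside the log keeps the IDF positive for common terms.
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Hypothetical toy inputs, not taken from the benchmark data.
    seed = "retroviral vector safety testing"
    candidates = [
        "safety testing of retroviral vector products",
        "nitrogen fixation in the global ocean",
    ]
    print(tfidf_cosine_scores(seed, candidates))
    print(bm25_scores(seed, candidates))
```

Comparing the top-ranked candidates from each scorer on a larger corpus is one way to explore the abstract's observation that the baselines tend to return distinct sets of articles.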

    Screening Clinical Cell Products for Replication Competent Retrovirus: The National Gene Vector Biorepository Experience

    Replication-competent retrovirus (RCR) is a safety concern for individuals treated with retroviral gene therapy. RCR detection assays are used to detect RCR in manufactured vector, transduced cell products infused into research subjects, and in the research subjects after treatment. In this study, we reviewed 286 control (n = 4) and transduced cell products (n = 282) screened for RCR in the National Gene Vector Biorepository. The transduced cell samples were submitted from 14 clinical trials. All vector products were previously shown to be negative for RCR prior to use in cell transduction. After transduction, all 282 transduced cell products were negative for RCR. In addition, 241 of the clinical trial participants were also screened for RCR by analyzing peripheral blood at least 1 month after infusion, all of whom were also negative for evidence of RCR infection. The majority of vector products used in the clinical trials were generated in the PG13 packaging cell line. The findings suggest that screening of the retroviral vector product generated in the PG13 cell line may be sufficient and that further screening of transduced cells does not provide added value. Keywords: retrovirus, safety testing, replicating virus, lentivirus

    Global oceanic diazotroph database version 2 and elevated estimate of N2 fixation in the global ocean

    Marine diazotrophs convert dinitrogen (N2) gas into bioavailable nitrogen (N), supporting life in the global ocean. In 2012, the first version of the global oceanic diazotroph database (version 1) was published. Here, we present an updated version of the database (version 2), significantly increasing the number of in situ diazotrophic measurements from 13 565 to 55 286. Data points for N2 fixation rates, diazotrophic cell abundance, and nifH gene copy abundance have increased by 184 %, 86 %, and 809 %, respectively. Version 2 includes two new data sheets for the nifH gene copy abundance of non-cyanobacterial diazotrophs and cell-specific N2 fixation rates. The measurements of N2 fixation rates approximately follow a log-normal distribution in both version 1 and version 2. However, version 2 considerably extends both the left and right tails of the distribution. Consequently, when estimating global oceanic N2 fixation rates using the geometric means of different ocean basins, version 1 and version 2 yield similar rates (43–57 versus 45–63 Tg N yr−1; ranges based on one geometric standard error). In contrast, when using arithmetic means, version 2 suggests a significantly higher rate of 223±30 Tg N yr−1 (mean ± standard error; same hereafter) compared to version 1 (74±7 Tg N yr−1). Specifically, substantial rate increases are estimated for the South Pacific Ocean (88±23 versus 20±2 Tg N yr−1), primarily driven by measurements in the southwestern subtropics, and for the North Atlantic Ocean (40±9 versus 10±2 Tg N yr−1). Moreover, version 2 estimates the N2 fixation rate in the Indian Ocean to be 35±14 Tg N yr−1, which could not be estimated using version 1 due to limited data availability. Furthermore, a comparison of N2 fixation rates obtained through different measurement methods in the same months, locations, and depths reveals that the conventional 15N2 bubble method yields lower rates in 69 % of cases compared to the new 15N2 dissolution method. This updated version of the database can facilitate future studies in marine ecology and biogeochemistry. The database is stored at the Figshare repository (https://doi.org/10.6084/m9.figshare.21677687; Shao et al., 2022).

    Global oceanic diazotroph database version 2 and elevated estimate of global oceanic N2 fixation

    Marine diazotrophs convert dinitrogen (N2) gas into bioavailable nitrogen (N), supporting life in the global ocean. In 2012, the first version of the global oceanic diazotroph database (version 1) was published. Here, we present an updated version of the database (version 2), significantly increasing the number of in situ diazotrophic measurements from 13 565 to 55 286. Data points for N2 fixation rates, diazotrophic cell abundance, and nifH gene copy abundance have increased by 184 %, 86 %, and 809 %, respectively. Version 2 includes two new data sheets for the nifH gene copy abundance of non-cyanobacterial diazotrophs and cell-specific N2 fixation rates. The measurements of N2 fixation rates approximately follow a log-normal distribution in both version 1 and version 2. However, version 2 considerably extends both the left and right tails of the distribution. Consequently, when estimating global oceanic N2 fixation rates using the geometric means of different ocean basins, version 1 and version 2 yield similar rates (43–57 versus 45–63 Tg N yr−1; ranges based on one geometric standard error). In contrast, when using arithmetic means, version 2 suggests a significantly higher rate of 223±30 Tg N yr−1 (mean ± standard error; same hereafter) compared to version 1 (74±7 Tg N yr−1). Specifically, substantial rate increases are estimated for the South Pacific Ocean (88±23 versus 20±2 Tg N yr−1), primarily driven by measurements in the southwestern subtropics, and for the North Atlantic Ocean (40±9 versus 10±2 Tg N yr−1). Moreover, version 2 estimates the N2 fixation rate in the Indian Ocean to be 35±14 Tg N yr−1, which could not be estimated using version 1 due to limited data availability. Furthermore, a comparison of N2 fixation rates obtained through different measurement methods in the same months, locations, and depths reveals that the conventional 15N2 bubble method yields lower rates in 69 % of cases compared to the new 15N2 dissolution method. This updated version of the database can facilitate future studies in marine ecology and biogeochemistry. The database is stored at the Figshare repository (https://doi.org/10.6084/m9.figshare.21677687; Shao et al., 2022).
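
The contrast between geometric and arithmetic means described in this abstract can be sketched with a small worked example. For a log-normal variable with log-scale mean μ and standard deviation σ, the geometric mean is exp(μ) while the arithmetic mean is exp(μ + σ²/2), so widening both tails symmetrically in log space leaves the geometric mean unchanged and raises the arithmetic mean. The values below are synthetic and hypothetical; they are not drawn from the database, and the code is not the authors' estimation procedure.

```python
import math
from statistics import fmean, geometric_mean


def summarize(rates):
    """Return (arithmetic mean, geometric mean) of a list of positive rates."""
    return fmean(rates), geometric_mean(rates)


# Hypothetical log-scale centre; exp(mu) = 10 in arbitrary rate units.
mu = math.log(10.0)

# "Narrow" sample: log-values placed symmetrically around mu.
narrow = [math.exp(x) for x in (mu - 1, mu, mu + 1)]

# "Wide" sample: same centre, but with both tails extended symmetrically.
wide = [math.exp(x) for x in (mu - 3, mu - 1, mu, mu + 1, mu + 3)]

narrow_arith, narrow_geo = summarize(narrow)
wide_arith, wide_geo = summarize(wide)

if __name__ == "__main__":
    print(f"narrow: arithmetic={narrow_arith:.2f}, geometric={narrow_geo:.2f}")
    print(f"wide:   arithmetic={wide_arith:.2f}, geometric={wide_geo:.2f}")
```

In this toy setup the two geometric means coincide while the arithmetic mean of the wide sample is several times larger, because the right tail dominates the arithmetic mean.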

    Analysis of heritability and shared heritability based on genome-wide association studies for 13 cancer types

    Background: Studies of related individuals have consistently demonstrated notable familial aggregation of cancer. We aim to estimate the heritability and genetic correlation attributable to the additive effects of common single-nucleotide polymorphisms (SNPs) for cancer at 13 anatomical sites. Methods: Between 2007 and 2014, the US National Cancer Institute generated data from genome-wide association studies (GWAS) for 49 492 cancer case patients and 34 131 control patients. We apply novel mixed model methodology (GCTA) to these GWAS data to estimate the heritability of individual cancers, as well as the proportion of heritability attributable to cigarette smoking in smoking-related cancers, and the genetic correlation between pairs of cancers. Results: GWAS heritability was statistically significant at nearly all sites, with the estimates of array-based heritability, hl², on the liability threshold (LT) scale ranging from 0.05 to 0.38. Estimating the combined heritability of multiple smoking characteristics, we calculate that at least 24% (95% confidence interval [CI] = 14% to 37%) and 7% (95% CI = 4% to 11%) of the heritability for lung and bladder cancer, respectively, can be attributed to genetic determinants of smoking. Most pairs of cancers studied did not show evidence of strong genetic correlation. We found only four pairs of cancers with marginally statistically significant correlations, specifically kidney and testes (ρ = 0.73, SE = 0.28), diffuse large B-cell lymphoma (DLBCL) and pediatric osteosarcoma (ρ = 0.53, SE = 0.21), DLBCL and chronic lymphocytic leukemia (CLL) (ρ = 0.51, SE = 0.18), and bladder and lung (ρ = 0.35, SE = 0.14). Correlation analysis also indicates that the genetic architecture of lung cancer differs between a smoking population of European ancestry and a nonsmoking Asian population, allowing for the possibility that the genetic etiology for the same disease can vary by population and environmental exposures. Conclusion: Our results provide important insights into the genetic architecture of cancers and suggest new avenues for investigation. 11 page(s)
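
One way to read the reported genetic correlations and standard errors is to convert each pair into an approximate z-statistic. The sketch below assumes a Wald-type test (z = ρ / SE) with a normal approximation, which the abstract does not state, so the resulting p-values are illustrative and are not the authors' reported values. The dictionary keys and function name are hypothetical; the ρ and SE figures are copied from the abstract.

```python
import math

# (rho, SE) pairs as reported in the abstract.
REPORTED = {
    "kidney / testes": (0.73, 0.28),
    "DLBCL / pediatric osteosarcoma": (0.53, 0.21),
    "DLBCL / CLL": (0.51, 0.18),
    "bladder / lung": (0.35, 0.14),
}


def wald_z_and_p(estimate, std_err):
    """Return (z, two-sided p) under an assumed normal approximation."""
    z = estimate / std_err
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    for pair, (rho, se) in REPORTED.items():
        z, p = wald_z_and_p(rho, se)
        print(f"{pair}: z = {z:.2f}, two-sided p = {p:.4f}")
```

Under this assumption each estimate lies roughly 2.5 to 2.8 standard errors from zero, which fits the abstract's description of the correlations as only marginally significant.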