14 research outputs found
Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
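For readers unfamiliar with the Okapi BM25 baseline named in the abstract above, the following is a minimal sketch of how candidate articles might be ranked against a seed article. It is not the consortium's implementation: the function name `bm25_rank`, the whitespace tokenizer, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def bm25_rank(seed, candidates, k1=1.5, b=0.75):
    """Return candidate indices ordered from highest to lowest BM25 score.

    The seed text (e.g. title plus abstract) acts as the query.
    k1 and b are commonly used defaults, not values taken from the paper.
    """
    docs = [tokenize(c) for c in candidates]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of candidates containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    query_terms = set(tokenize(seed))
    scored = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scored.append((score, i))

    # Stable sort: ties keep their original candidate order.
    return [i for _, i in sorted(scored, key=lambda pair: -pair[0])]


if __name__ == "__main__":
    seed = "pancreatic cancer survival genetics"
    candidates = [
        "pancreatic cancer survival in patients",
        "deep learning for image segmentation",
        "genetics of pancreatic cancer",
    ]
    print(bm25_rank(seed, candidates))
```

A hybrid recommender of the kind the abstract suggests could, under the same assumptions, merge the top-ranked candidates from several such scorers.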
Association of genetic polymorphisms with survival of pancreatic ductal adenocarcinoma patients
Germline genetic variability might contribute, at least partially, to the survival of pancreatic ductal adenocarcinoma (PDAC) patients. Two recently performed genome-wide association studies (GWAS) on PDAC overall survival (OS) suggested (p<10^-5) an association between 30 genomic regions and PDAC OS. To highlight the true associations within these regions, we analysed 44 single-nucleotide polymorphisms (SNPs) in the 30 candidate regions in 1722 PDAC patients within the PANcreatic Disease ReseArch (PANDoRA) consortium. We observed statistically significant associations for five of the selected regions. One association in the CTNNA2 gene on chromosome 2p12 (rs1567532, HR=1.75, 95% CI 1.19-2.58, p=0.005 for homozygotes for the minor allele) and one in the last intron of the RUNX2 gene on chromosome 6p21 (rs12209785, HR=0.88, 95% CI 0.80-0.98, p=0.014 for heterozygotes) are of particular relevance. These loci do not coincide with those that showed the strongest associations in the previous GWASs. In silico analysis strongly suggested a possible mechanistic link between these two SNPs and pancreatic cancer survival. Functional studies are warranted to confirm the link between these genes (or other genes mapping in those regions) and PDAC prognosis in order to understand whether these variants may have the potential to impact treatment decisions and design of clinical trials.
Genome-wide meta-analysis identifies five new susceptibility loci for pancreatic cancer
In 2020, 146,063 deaths due to pancreatic cancer are estimated to occur in Europe and the United States combined. To identify common susceptibility alleles, we performed the largest pancreatic cancer GWAS to date, including 9040 patients and 12,496 controls of European ancestry from the Pancreatic Cancer Cohort Consortium (PanScan) and the Pancreatic Cancer Case-Control Consortium (PanC4). Here, we find significant evidence of a novel association at rs78417682 (7p12/TNS3, P = 4.35 × 10^-8). Replication of 10 promising signals in up to 2737 patients and 4752 controls from the PANcreatic Disease ReseArch (PANDoRA) consortium yields new genome-wide significant loci: rs13303010 at 1p36.33 (NOC2L, P = 8.36 × 10^-14), rs2941471 at 8q21.11 (HNF4G, P = 6.60 × 10^-10), rs4795218 at 17q12 (HNF1B, P = 1.32 × 10^-8), and rs1517037 at 18q21.32 (GRP, P = 3.28 × 10^-8). rs78417682 is not statistically significantly associated with pancreatic cancer in PANDoRA. Expression quantitative trait locus analysis in three independent pancreatic data sets provides molecular support of NOC2L as a pancreatic cancer susceptibility gene.