31 research outputs found

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
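
    As a rough illustration of one of the baseline methods named in this abstract, the sketch below scores candidate documents against a seed text with a common textbook form of Okapi BM25. It is not the consortium's implementation: the whitespace tokenizer, the function names `tokenize` and `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, and the toy documents are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real systems
    # typically add stemming and stop-word removal.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query text."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    query_terms = set(tokenize(query))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (adds 1 inside the logarithm).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Hypothetical toy titles, not taken from the benchmark data.
    candidates = [
        "lung cancer risk variants",
        "toxoplasma infection in pregnancy",
        "lung adenocarcinoma genome study",
    ]
    for title, s in zip(candidates, bm25_scores("lung adenocarcinoma", candidates)):
        print(f"{s:.3f}  {title}")
```

    Ranking candidates by these scores and comparing the top results against expert relevance annotations is the general shape of the evaluation the abstract describes.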

    Genome-wide association study of lung adenocarcinoma in East Asia and comparison with a European population.

    Lung adenocarcinoma is the most common type of lung cancer. Known risk variants explain only a small fraction of lung adenocarcinoma heritability. Here, we conducted a two-stage genome-wide association study of lung adenocarcinoma of East Asian ancestry (21,658 cases and 150,676 controls; 54.5% never-smokers) and identified 12 novel susceptibility variants, bringing the total number to 28 at 25 independent loci. Transcriptome-wide association analyses together with colocalization studies using a Taiwanese lung expression quantitative trait loci dataset (n = 115) identified novel candidate genes, including FADS1 at 11q12 and ELF5 at 11p13. In a multi-ancestry meta-analysis of East Asian and European studies, four loci were identified at 2p11, 4q32, 16q23, and 18q12. At the same time, most of our findings in East Asian populations showed no evidence of association in European populations. In our studies drawn from East Asian populations, a polygenic risk score based on the 25 loci had a stronger association in never-smokers vs. individuals with a history of smoking (P_interaction = 0.0058). These findings provide new insights into the etiology of lung adenocarcinoma in individuals from East Asian populations, which could be important in developing translational applications.

    Knowledge and preventive behaviour among pregnant women with latent toxoplasmosis in Malaysia

    Latent toxoplasmosis could induce various hormonal and behavioural perturbations in infected hosts. We aimed to study the latent seroprevalence of Toxoplasma gondii (T. gondii) and the relationship between infection, knowledge and behaviour among 400 pregnant mothers. Plasma samples were tested for the presence of T. gondii IgG antibodies, while a structured questionnaire was used to record respondents’ socio-demographic characteristics, general information and knowledge on plausible risk factors, symptoms, timing of infection, and preventive knowledge and behaviour regarding toxoplasmosis. The seroprevalence of latent toxoplasmosis among respondents was 31.8%. This study indicated that 69.5% of them had poor knowledge of toxoplasmosis but most of them (99.8%) practised preventive behaviours. Multiple logistic regression analysis showed that pregnant women with low education levels (aOR: 1.91, 95% CI 1.18, 3.10; p = 0.008) and those with a past medical history (aOR: 2.32, 95% CI 1.32, 4.06; p = 0.003) were each about twice as likely to have anti-T. gondii IgG seropositivity. In addition, women who were unsure about transmission of the disease via blood transfusion were about four times more likely (aOR: 3.93, 95% CI 1.54, 10.01; p = 0.004) to be seropositive for chronic toxoplasmosis. Women who were unsure about the necessity of avoiding stray cats had an aOR of 0.42 (95% CI 0.24, 0.71; p = 0.001) for chronic toxoplasmosis seropositivity. Translating the knowledge on toxoplasmosis into the practice of preventive behaviour via a health education programme is crucial in reducing the risk of disease transmission, especially among pregnant women.
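
    For readers less familiar with how adjusted odds ratios, confidence intervals and p-values relate, the sketch below back-calculates an approximate log-odds coefficient, standard error and two-sided Wald p-value from a reported aOR and 95% CI. It assumes a normal approximation on the log-odds scale, which may not be exactly how the study's software computed its p-values; the function name `wald_summary` is hypothetical.

```python
import math


def wald_summary(aor, ci_low, ci_high, z_crit=1.96):
    """Approximate (beta, standard error, two-sided p) from an odds ratio and its 95% CI.

    Assumption: the interval is symmetric on the log scale,
    i.e. CI = exp(beta +/- z_crit * se).
    """
    beta = math.log(aor)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p


if __name__ == "__main__":
    # Values for low education level as reported in the abstract.
    beta, se, p = wald_summary(1.91, 1.18, 3.10)
    print(f"beta={beta:.3f}, se={se:.3f}, p={p:.4f}")
```

    Under this approximation the values reported for low education level give a p-value close to the stated 0.008, and an aOR below 1 (such as the 0.42 reported for the stray-cat item) corresponds to a negative coefficient.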