Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performance. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for downloading annotation data and for the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of powerful new techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed. © The Author(s) 2019. Published by Oxford University Press.
Evaluating HIV/STD interventions in developing countries: do current indicators do justice to advances in intervention approaches?
HIV continues to spread unabated in many developing countries. Here we consider the interventions that are currently in place and critically discuss the methods that are being used to evaluate them as reported in the published literature. In recent years there has been a move away from highly individual-oriented interventions towards more participatory approaches that emphasise techniques such as community-led peer education and group discussions. However, this move towards more community-oriented intervention techniques has not been matched by the development of evaluation methods that can capture and explain the community and social changes which are often necessary preconditions for health-enhancing behaviour change. Evaluation research continues to rely on quantitative methodologies that fail to elucidate the complex changes that the newer interventions seek to promote within target communities. In addition, these methods of evaluation tend to rely on highly individualistic and quantitative biomedical indicators such as HIV/STD rates, or on knowledge, attitude, perception and behaviour (KAPB) survey questionnaires. We argue that such approaches are inadequate for the task of tracking and measuring important determinants of programme success such as psycho-social changes, features of the community-intervention interface and the degree of trust and identification with which members of target communities regard particular interventions. Rigorously conducted qualitative process evaluations taking account of the above factors could make a key contribution to the development of more successful HIV-prevention interventions.
Worldwide outdoor round robin study of organic photovoltaic devices and modules
Accurate characterization and reporting of organic photovoltaic (OPV) device performance remains one of the important challenges in the field. The large spread among the efficiencies of devices with the same structure reported by different groups is caused to a significant extent by the different procedures and equipment used during testing. The presented article addresses this issue by offering a new method of device testing that combines a "suitcase sample" approach with outdoor testing, which limits the diversity of the equipment, and a strict measurement protocol. A round robin outdoor characterization of roll-to-roll coated OPV cells and modules conducted among 46 laboratories worldwide is presented, where the samples and the testing equipment were integrated in a compact suitcase that served both as a sample transportation tool and as a holder and test equipment during testing. In addition, internet-based coordination via plasticphotovoltaics.org allowed fast and efficient communication among participants and provided a controlled reporting format for the results that eased the analysis of the data. The reported deviations among the laboratories were limited to 5% when compared to the Si reference device integrated in the suitcase and were up to 8% when calculated using the local irradiance data. Therefore, this method offers a fast, cheap and efficient tool for sample sharing and testing that allows outdoor measurements of OPV devices to be conducted in a reproducible manner.
Stroke genetics informs drug discovery and risk prediction across ancestries
Previous genome-wide association studies (GWASs) of stroke, the second leading cause of death worldwide, were conducted predominantly in populations of European ancestry(1,2). Here, in cross-ancestry GWAS meta-analyses of 110,182 patients who have had a stroke (five ancestries, 33% non-European) and 1,503,898 control individuals, we identify association signals for stroke and its subtypes at 89 (61 new) independent loci: 60 in primary inverse-variance-weighted analyses and 29 in secondary meta-regression and multitrait analyses. On the basis of internal cross-ancestry validation and an independent follow-up in 89,084 additional cases of stroke (30% non-European) and 1,013,843 control individuals, 87% of the primary stroke risk loci and 60% of the secondary stroke risk loci were replicated (P < 0.05). Effect sizes were highly correlated across ancestries. Cross-ancestry fine-mapping, in silico mutagenesis analysis(3), and transcriptome-wide and proteome-wide association analyses revealed putative causal genes (such as SH3PXD2A and FURIN) and variants (such as at GRK5 and NOS3). Using a three-pronged approach(4), we provide genetic evidence for putative drug effects, highlighting F11, KLKB1, PROC, GP1BA, LAMC2 and VCAM1 as possible targets, with drugs already under investigation for stroke for F11 and PROC. A polygenic score integrating cross-ancestry and ancestry-specific stroke GWASs with vascular-risk factor GWASs (integrative polygenic scores) strongly predicted ischaemic stroke in populations of European, East Asian and African ancestry(5). Stroke genetic risk scores were predictive of ischaemic stroke independent of clinical risk factors in 52,600 clinical-trial participants with cardiometabolic disease. Our results provide insights to inform biology, reveal potential drug targets and derive genetic risk prediction tools across ancestries.