Measuring industry-science links through inventor-author relations: A profiling method
In this pilot study we examine the performance of text-based profiling in recovering a set of validated inventor-author links. In a first step we match patents and publications solely based on their similarity in content. Next, we compare inventor and author names on the highest ranked matches for the occurrence of name matches. Finally, we compare these candidate matches with the names listed in a validated set of inventor-author names. Our text-based profile methodology performs significantly better than a random matching of patents and publications, suggesting that text-based profiling is a valuable complementary tool to the name searches used in previous studies. Keywords: innovation; industry-science links; text-based profiling.
TXTGate: profiling gene groups with text-based information
We implemented a framework called TXTGate that combines literature indices of selected public biological resources in a flexible text-mining system designed towards the analysis of groups of genes. By means of tailored vocabularies, term- as well as gene-centric views are offered on selected textual fields and MEDLINE abstracts used in LocusLink and the Saccharomyces Genome Database. Subclustering and links to external resources allow for in-depth analysis of the resulting term profiles.
Stroke genetics informs drug discovery and risk prediction across ancestries
Previous genome-wide association studies (GWASs) of stroke, the second leading cause of death worldwide, were conducted predominantly in populations of European ancestry [1,2]. Here, in cross-ancestry GWAS meta-analyses of 110,182 patients who have had a stroke (five ancestries, 33% non-European) and 1,503,898 control individuals, we identify association signals for stroke and its subtypes at 89 (61 new) independent loci: 60 in primary inverse-variance-weighted analyses and 29 in secondary meta-regression and multitrait analyses. On the basis of internal cross-ancestry validation and an independent follow-up in 89,084 additional cases of stroke (30% non-European) and 1,013,843 control individuals, 87% of the primary stroke risk loci and 60% of the secondary stroke risk loci were replicated (P < 0.05). Effect sizes were highly correlated across ancestries. Cross-ancestry fine-mapping, in silico mutagenesis analysis [3], and transcriptome-wide and proteome-wide association analyses revealed putative causal genes (such as SH3PXD2A and FURIN) and variants (such as at GRK5 and NOS3). Using a three-pronged approach [4], we provide genetic evidence for putative drug effects, highlighting F11, KLKB1, PROC, GP1BA, LAMC2 and VCAM1 as possible targets, with drugs already under investigation for stroke for F11 and PROC. A polygenic score integrating cross-ancestry and ancestry-specific stroke GWASs with vascular-risk factor GWASs (integrative polygenic scores) strongly predicted ischaemic stroke in populations of European, East Asian and African ancestry [5]. Stroke genetic risk scores were predictive of ischaemic stroke independent of clinical risk factors in 52,600 clinical-trial participants with cardiometabolic disease. Our results provide insights to inform biology, reveal potential drug targets and derive genetic risk prediction tools across ancestries.
Combining full-text analysis and bibliometric indicators. A pilot study
In the present study full-text analysis and traditional bibliometric methods are combined to improve the efficiency of the individual methods in the mapping of science. The methodology is applied to map research papers from a special issue of Scientometrics. The outcomes substantiate that such hybrid methodology can be applied to both research evaluation and information retrieval. The subject classification given by the guest-editors of the special issue is used for validation purposes. Because of the limited number of papers underlying the study the paper is considered a pilot study that will be extended in a later study on the basis of a larger corpus.
On the potential of domain literature for clustering and Bayesian network learning
Thanks to its increasing availability, electronic literature can now be a major source of information when developing complex statistical models where data is scarce or contains much noise. This raises the question of how to integrate information from domain literature with statistical data. Because quantifying similarities or dependencies between variables is a basic building block in knowledge discovery, we consider here the following question: which vector representations of text and which statistical scores of similarity or dependency best support the use of literature in statistical models? For the text source, we assume that annotations for the domain variables are available as short free-text descriptions and, optionally, that a large literature repository is available from which the annotations can be further expanded. For evaluation, we contrast the variable similarities or dependencies obtained from text using different annotation sources and vector representations with those obtained from measurement data or expert assessments. Specifically, we consider two learning problems: clustering and Bayesian network learning. Firstly, we report performance (against an expert reference) for clustering yeast genes from textual annotations. Secondly, we assess the agreement between text-based and data-based scores of variable dependencies when learning Bayesian network substructures for the task of modeling the joint distribution of clinical measurements of ovarian tumors.
Combining full-text analysis and bibliometric indicators. A pilot study.
In the present study full-text analysis and traditional bibliometric methods are combined to improve the efficiency of the individual methods in the mapping of science. The methodology is applied to map research papers from a special issue of Scientometrics. The outcomes substantiate that such hybrid methodology can be applied to both research evaluation and information retrieval. The subject classification given by the guest-editors of the special issue is used for validation purposes. Because of the limited number of papers underlying the study the paper is considered a pilot study that will be extended in a later study on the basis of a larger corpus. Keywords: Classification; Combined cocitation; Database; Efficiency; Evaluation; Field; Growth; Impact; Indicators; Information; Methods; Research; Science; Studies; Tool; Validation; Websites; Word analysis.
Measuring industry-science links through inventor-author relations: A profiling methodology
Combining full text and bibliometric information in mapping scientific disciplines
In the present study results of an earlier pilot study by Glenisson, Glanzel and Persson are extended on the basis of larger sets of papers. Full text analysis and traditional bibliometric methods are serially combined to improve the efficiency of the two individual methods. The text mining methodology already introduced in the pilot study is applied to the complete publication year 2003 of the journal Scientometrics. Altogether 85 documents that can be considered research articles or notes have been selected for this exercise. The outcomes confirm the main results of the pilot study, namely, that such hybrid methodology can be applied to both research evaluation and information retrieval. Nevertheless, Scientometrics documents published in 2003 cover a much broader and more heterogeneous spectrum of bibliometrics and related research than those analysed in the pilot study. A modified subject classification based on the scheme used in an earlier study by Schoepflin and Glanzel has been applied for validation purposes. (c) 2005 Elsevier Ltd. All rights reserved.