Search CORE

9 research outputs found

Characterizing Interdisciplinarity of Researchers and Research Topics Using Web Search Engines

Author: AA Hagberg
AL Barabási
AL Porter
CS Wagner
D Sullivan
DJ de Solla Price
F Janssens
F Åström
H Kautz
Hiroki Sayama
I Rafols
J Akaishi
J Mori
Jin Akaishi
JP Eaton
K Börner
L Leydesdorff
MEJ Newman
NE Friedkin
P Levy
P Mika
R Klavans
Renaud Lambiotte
RR Braam
S Wasserman
SD Dionne
SH Lee
TW Malone
X Liu
Y Asada
Y Matsuo
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

Researchers' networks have been actively modeled and analyzed. Earlier literature mostly focused on citation or co-authorship networks reconstructed from annotated scientific publication databases, which have several limitations. Recently, general-purpose web search engines have also been used to collect information about social networks. Here we used web search engines to reconstruct a network representing the relatedness of researchers to their peers as well as to various research topics. Relatedness between researchers and research topics was characterized by the visibility boost, that is, the increase in a researcher's visibility gained by focusing on a particular topic. Researchers who received high visibility boosts from the same research topic tended to be close to each other in the network. We calculated correlations between visibility boosts by research topics and researchers' interdisciplinarity at the individual level (the diversity of topics related to the researcher) and at the social level (the researcher's centrality in the researchers' network). We found that visibility boosts by certain research topics were positively correlated with researchers' individual-level interdisciplinarity, despite being negatively correlated with researchers' general popularity. Visibility boosts by network-related topics were also positively correlated with researchers' social-level interdisciplinarity. Research topics' correlations with researchers' individual- and social-level interdisciplinarity were found to be nearly independent of each other. These findings suggest that a researcher's "interdisciplinarity" should be understood as a multidimensional concept that should be evaluated using multiple means of assessment. Comment: 20 pages, 7 figures. Accepted for publication in PLoS One.
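The correlation step described above can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on two assumptions. First, the visibility boost for each researcher is taken as already computed, because the abstract does not give its formula. Second, individual-level interdisciplinarity is measured with Shannon entropy over the researcher's topic-relatedness weights. Both `shannon_diversity` and the toy data are hypothetical.

```python
import math

def shannon_diversity(weights):
    """Shannon entropy (natural log) of one researcher's topic-relatedness weights.
    Assumption: a stand-in measure of individual-level interdisciplinarity."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy data: each researcher's relatedness to three topics,
# and the (precomputed) visibility boost each researcher gets from one topic.
topic_weights = {
    "r1": [1.0, 0.0, 0.0],
    "r2": [0.5, 0.5, 0.0],
    "r3": [0.4, 0.3, 0.3],
}
boost_by_topic = {"r1": 0.1, "r2": 0.4, "r3": 0.6}

researchers = sorted(topic_weights)
diversity = [shannon_diversity(topic_weights[name]) for name in researchers]
boosts = [boost_by_topic[name] for name in researchers]
corr = pearson(boosts, diversity)
print(f"correlation(boost, diversity) = {corr:.3f}")
```

The social-level measure (network centrality) is not modeled here. To add it, replace `diversity` with a centrality score per researcher and reuse `pearson` unchanged.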

arXiv.org e-Print Archive

The Open Repository @Binghamton (The ORB)

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Evaluating human versus machine learning performance in classifying research abstracts

Author: CAI Xin Qing
GOH Yeow Chong
KHOR Khiam Aik
KO Giovanni
THESEIRA Walter
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

We study whether humans or machine learning (ML) classification models are better at classifying scientific research abstracts according to a fixed set of discipline groups. We recruit both undergraduate and postgraduate assistants for this task in separate stages. We then compare their performance against the support vector machine (SVM) ML algorithm at classifying European Research Council Starting Grant project abstracts to their actual evaluation panels, which are organised by discipline group. On average, ML is more accurate than human classifiers across a variety of training and test datasets and across evaluation panels. ML classifiers trained on different training sets are also more reliable than human classifiers: different ML classifiers are more consistent than different human classifiers in assigning the same classification to any given abstract. The top five percent of human classifiers can outperform ML in limited cases. However, selecting and training such classifiers is likely to be costly and difficult compared with training ML models. Our results suggest that ML models are a cost-effective and highly accurate method for addressing problems in comparative bibliometric analysis, such as harmonising the discipline classifications of research from different funding agencies or countries. The study was partially funded by the Singapore National Research Foundation (NRF), Grant No. NRF2014-NRF-SRIE001-02.
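The two comparisons in this abstract are accuracy against the actual panels and reliability across classifiers. Both can be made concrete with a short sketch. The abstract does not name the reliability statistic, so this sketch assumes mean pairwise agreement: the share of abstracts that two classifiers assign to the same panel, averaged over all classifier pairs. The panel labels and classifier outputs are hypothetical toy data.

```python
from itertools import combinations

def accuracy(predicted, actual):
    """Fraction of abstracts assigned to their actual evaluation panel."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def mean_pairwise_agreement(classifications):
    """Average over all classifier pairs of the fraction of abstracts both
    classifiers put in the same panel (an assumed reliability proxy)."""
    pairs = list(combinations(classifications, 2))
    total = 0.0
    for a, b in pairs:
        total += sum(x == y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)

# Hypothetical toy data: actual panels for four abstracts, and the labels
# produced by two ML classifiers and two human classifiers.
actual = ["PE", "LS", "SH", "PE"]
ml = [["PE", "LS", "SH", "PE"], ["PE", "LS", "SH", "LS"]]
human = [["PE", "SH", "SH", "PE"], ["LS", "LS", "PE", "PE"]]

print("ML accuracy:", [accuracy(c, actual) for c in ml])
print("Human accuracy:", [accuracy(c, actual) for c in human])
print("ML agreement:", mean_pairwise_agreement(ml))
print("Human agreement:", mean_pairwise_agreement(human))
```

A chance-corrected statistic such as Fleiss' kappa could replace raw agreement. Raw agreement keeps the sketch short.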

Institutional Knowledge at Singapore Management University

DR-NTU (Digital Repository of NTU)

Measuring Author Research Relatedness: A Comparison of Word-based, Topic-based and Author Cocitation Approaches

Author: Lu Kun
Wolfram Dietmar
Publication venue: UWM Digital Commons
Publication date: 05/09/2012

Relationships between authors based on characteristics of published literature have been studied for decades. Author cocitation analysis using mapping techniques has been most frequently used to study how closely two authors are thought to be in intellectual space based on how members of the research community co-cite their works. Other approaches exist to study author relatedness based more directly on the text of their published works. In this study we present static and dynamic word-based approaches using vector space modeling, as well as a topic-based approach based on Latent Dirichlet Allocation for mapping author research relatedness. Vector space modeling is used to define an author space consisting of works by a given author. Outcomes for the two word-based approaches and a topic-based approach for 50 prolific authors in library and information science are compared with more traditional author cocitation analysis using multidimensional scaling and hierarchical cluster analysis. The two word-based approaches produced similar outcomes except where two authors were frequent co-authors for the majority of their articles. The topic-based approach produced the most distinctive map
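The word-based idea, where each author is represented by a vector built from the text of their works and authors are compared by vector similarity, can be sketched in a few lines. The sketch assumes raw term frequencies and cosine similarity, with no weighting, stop-word removal or temporal (dynamic) component, so it is simpler than the approaches the study compares. The author texts are hypothetical.

```python
import math
import re
from collections import Counter

def author_vector(works):
    """Term-frequency vector pooled over all works by one author."""
    counts = Counter()
    for text in works:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical authors, each represented by the titles of their works.
a = author_vector(["Citation analysis of journals", "Author cocitation maps"])
b = author_vector(["Cocitation analysis and citation maps"])
c = author_vector(["Information retrieval evaluation"])

print(round(cosine(a, b), 3), cosine(a, c))
```

Computing `cosine` for every pair of authors yields a similarity matrix. That matrix is the kind of input that multidimensional scaling or hierarchical clustering would take to produce a map.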

Crossref

University of Wisconsin-Milwaukee

CITREC: An Evaluation Framework for Citation-Based Similarity Measures based on TREC Genomics and PubMed Central

Author: Gipp Bela
Lipinsk Mario
Meuschke Norman
Publication venue: iSchools
Publication date: 01/01/2015

Citation-based similarity measures such as Bibliographic Coupling and Co-Citation are an integral component of many information retrieval systems. However, comparing the strengths and weaknesses of these measures is challenging due to the lack of suitable test collections. This paper presents CITREC, an open evaluation framework for citation-based and text-based similarity measures. CITREC prepares the data from the PubMed Central Open Access Subset and the TREC Genomics collection for citation-based analysis and provides the tools necessary for evaluating similarity measures. To account for different evaluation purposes, CITREC implements 35 citation-based and text-based similarity measures and features two gold standards. The first gold standard uses the Medical Subject Headings (MeSH) thesaurus to gauge similarity. The second uses the expert relevance feedback that is part of the TREC Genomics collection. CITREC additionally offers a system for creating user-defined gold standards, so that the evaluation framework can be adapted to individual information needs and evaluation purposes.
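The two classic measures named above have simple definitions. Bibliographic coupling counts the references two documents share. Co-citation counts the documents that cite both. The sketch below computes the unnormalized counts from a hypothetical citation mapping. It is not CITREC's implementation, and the normalized variants among CITREC's 35 measures are not shown.

```python
def bibliographic_coupling(refs, a, b):
    """Number of references shared by documents a and b."""
    return len(refs.get(a, set()) & refs.get(b, set()))

def co_citation(refs, a, b):
    """Number of documents whose reference lists contain both a and b."""
    return sum(1 for cited in refs.values() if a in cited and b in cited)

# Hypothetical citation data: document -> set of documents it cites.
refs = {
    "p1": {"x", "y", "z"},
    "p2": {"y", "z"},
    "p3": {"x", "y"},
    "p4": {"p1", "p2"},
    "p5": {"p1", "p2", "x"},
}

print(bibliographic_coupling(refs, "p1", "p2"))  # 2 (shared: y, z)
print(co_citation(refs, "p1", "p2"))             # 2 (cited together by p4, p5)
```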

KOPS - The Institutional Repository of the University of Konstanz

Illinois Digital Environment for Access to Learning and Scholarship Repository

Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database

Author: Ayad
Batagelj
Bickel
Boyack
Braam
Brin
Efron
Fred
Glenisson
Glänzel
Gospodnetic
He
Hubert
Jain
Janssens
Leydesdorff
Liu
Marshakova
Mirkin
Modha
Moya-Anegón
Newman
Rousseeuw
Small
Strehl
Topchy
Wang
Zitt
Publication venue: Wiley
Publication date: 01/01/2010

Crossref

An academic perspective on the entrepreneurship policy agenda: themes, geographies and evolution

Author: Arenal Alberto
Armuña Cristina
Feijoo Claudio
Moreno Ana
Ramos Sergio
Publication date: 05/06/2019

Text mining is increasingly used for the automatic analysis of different document corpora, either on its own or as a complement to other bibliometric techniques. Academic research into entrepreneurship policy is a particularly interesting case. The topic is increasingly relevant, and knowledge about how themes in this field have evolved is still rather limited. This paper therefore analyses the key topics, trends and shifts that have shaped the entrepreneurship policy research agenda to date. It combines text mining techniques, cluster analysis and complementary bibliographic data to examine the evolution of a corpus of 1,048 academic papers focused on entrepreneurship-related policies and published during 1990-2016 in ten of the most relevant entrepreneurship journals. The results show that inclusion-, employment- and regulation-related papers have largely dominated research in the field. The field has evolved from an initial classical focus on the relationship between entrepreneurship and employment to a wider, multidisciplinary perspective. This perspective takes in the relevance of management and geographies, as well as narrower topics such as agglomeration economics or internationalization in place of earlier generic sectoral approaches. Overall, the text mining analysis reveals that entrepreneurship policy research has gained increasing attention and has become both more open and more sophisticated. It is more open through growing cooperation among researchers from different affiliations. It is more sophisticated through concepts and themes that have moved the research agenda closer to the priorities of policy implementation.
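One simple way to see the kind of thematic shift described above is to track the share of documents per period that mention a theme term. The sketch below does this with whitespace tokenization and fixed five-year periods. The period width, the term-matching rule and the toy corpus are all assumptions. The study itself relies on text mining and cluster analysis, which this sketch does not reproduce.

```python
from collections import defaultdict

def term_share_by_period(docs, term, start=1990, width=5):
    """For each period of `width` years starting at `start`, return the share
    of documents containing `term` as a whitespace-delimited token
    (case-insensitive). `docs` is a list of (year, text) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, text in docs:
        period = start + ((year - start) // width) * width
        totals[period] += 1
        if term.lower() in text.lower().split():
            hits[period] += 1
    return {p: hits[p] / totals[p] for p in sorted(totals)}

# Hypothetical toy corpus of (publication year, title) pairs.
docs = [
    (1992, "Entrepreneurship and employment"),
    (1994, "Small business employment policy"),
    (2011, "Regional agglomeration and entrepreneurship policy"),
    (2014, "Internationalization of new ventures"),
]

print(term_share_by_period(docs, "employment"))
print(term_share_by_period(docs, "agglomeration"))
```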

Munich RePEc Personal Archive

Rare Feature Selection in High Dimensions

Author: Bien Jacob
Yan Xiaohan
Publication date: 08/07/2020

It is common in modern prediction problems for many predictor variables to be counts of rarely occurring events. This leads to design matrices in which many columns are highly sparse. The challenge posed by such "rare features" has received little attention despite its prevalence in diverse areas, ranging from natural language processing (e.g., rare words) to biology (e.g., rare species). We show, both theoretically and empirically, that not explicitly accounting for the rareness of features can greatly reduce the effectiveness of an analysis. We then propose a framework for flexibly aggregating rare features into denser features that are better predictors of the response. Our strategy leverages side information in the form of a tree that encodes feature similarity. We apply our method to data from TripAdvisor, predicting the numerical rating of a hotel from the text of the associated review. Our method achieves high accuracy by making effective use of rare words. By contrast, the lasso cannot identify highly predictive words if they are too rare. A companion R package, called rare, implements our new estimator using the alternating direction method of multipliers. Comment: 42 pages, 10 figures.
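The core idea of using a similarity tree to merge rare features into denser ones can be sketched as follows. Each feature is a leaf. Aggregating at an internal node replaces that node's leaf columns with their row-wise sum. This is only an illustration: the tree, the words, the counts and the choice of which nodes to aggregate are all hypothetical and fixed by hand. The paper instead learns the aggregation from data with a penalized estimator fitted by ADMM, which is not reproduced here.

```python
def leaves_under(tree, node):
    """All leaf features below `node` in a tree given as {parent: [children]}."""
    children = tree.get(node)
    if not children:
        return [node]
    out = []
    for child in children:
        out.extend(leaves_under(tree, child))
    return out

def aggregate(X, feature_index, tree, nodes):
    """Build one column per chosen node by summing that node's leaf columns.
    X is a list of rows of counts; feature_index maps leaf name -> column."""
    groups = [[feature_index[leaf] for leaf in leaves_under(tree, n)] for n in nodes]
    return [[sum(row[c] for c in group) for group in groups] for row in X]

def column_density(X):
    """Fraction of non-zero entries in each column."""
    n = len(X)
    return [sum(1 for row in X if row[j] != 0) / n for j in range(len(X[0]))]

# Hypothetical tree over four rare words and a toy count matrix (one row per review).
tree = {"root": ["positive", "negative"],
        "positive": ["superb", "splendid"],
        "negative": ["dirty", "filthy"]}
feature_index = {"superb": 0, "splendid": 1, "dirty": 2, "filthy": 3}
X = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

X_agg = aggregate(X, feature_index, tree, ["positive", "negative"])
print(column_density(X), column_density(X_agg))
```

In this toy example, each original column is non-zero in only one of four rows. Each aggregated column is non-zero in two, which illustrates the "denser features" the abstract describes.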

arXiv.org e-Print Archive