Assessing Educational Research -- An Information Service for Monitoring a Heterogeneous Research Field
The paper presents a web prototype that visualises different characteristics
of research projects in the heterogeneous domain of educational research. The
concept of the application derives from the project "Monitoring Educational
Research" (MoBi) that aims at identifying and implementing indicators that
adequately describe structural properties and dynamics of the research field.
The prototype enables users to visualise data regarding different indicators,
e.g. "research activity", "funding", "qualification project", "disciplinary
area". Since the application is based on Semantic MediaWiki technology, it
furthermore provides an easily accessible opportunity to collaboratively work
on a database of research projects. Users can jointly and in a semantically
controlled way enter metadata on research projects which are the basis for the
computation and visualisation of indicators.
Comment: 8 pages, 10 figures, Libraries in the digital age (LIDA) 2014
conferenc
Modeling and Analysis of Scholar Mobility on Scientific Landscape
The scientific literature to date can be thought of as a partially revealed
landscape, where scholars continue to unveil hidden knowledge by exploring
novel research topics. How do scholars explore the scientific landscape, i.e.,
choose research topics to work on? We propose an agent-based model of topic
mobility behavior where scholars migrate across research topics on the space of
science following different strategies, seeking different utilities. We use
this model to study whether strategies widely used in the current scientific
community can provide a balance between individual scientific success and the
efficiency and diversity of the whole academic society. Through extensive
simulations, we provide insights into the roles of different strategies, such
as choosing topics according to research potential or popularity. Our model
provides a conceptual framework and a computational approach to analyze
scholars' behavior and its impact on scientific production. We also discuss how
such an agent-based modeling approach can be integrated with big real-world
scholarly data.
Comment: To appear in BigScholar, WWW 201
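
The agent-based idea in this abstract can be illustrated with a toy simulation. This is only a minimal sketch under assumptions that are not taken from the paper: topics are unordered, each topic has a hypothetical remaining `potential`, scholars sample three candidate topics per step, and 10% of a visited topic's remaining potential is harvested per step. The names `simulate`, `potential`, and the two strategy labels are illustrative.

```python
import random
from collections import Counter


def simulate(num_topics=20, num_scholars=50, steps=30,
             strategy="popularity", seed=0):
    """Toy topic-mobility model (assumed setup, not the paper's model).

    Returns (total_output, diversity), where diversity is the number of
    distinct topics occupied after the last step.
    """
    rng = random.Random(seed)
    # Hypothetical remaining research potential of each topic.
    potential = [rng.uniform(0.0, 1.0) for _ in range(num_topics)]
    # Every scholar starts on a random topic.
    position = [rng.randrange(num_topics) for _ in range(num_scholars)]
    total_output = 0.0

    for _ in range(steps):
        # Popularity is measured once, at the start of the step.
        popularity = Counter(position)
        for s in range(num_scholars):
            # A scholar compares a few sampled topics with the current one.
            candidates = rng.sample(range(num_topics), 3) + [position[s]]
            if strategy == "potential":
                position[s] = max(candidates, key=lambda t: potential[t])
            elif strategy == "popularity":
                position[s] = max(candidates, key=lambda t: popularity[t])
            else:
                raise ValueError(f"unknown strategy: {strategy}")
        # Each occupied topic yields 10% of its remaining potential.
        for topic in set(position):
            harvested = 0.1 * potential[topic]
            potential[topic] -= harvested
            total_output += harvested

    return total_output, len(set(position))


if __name__ == "__main__":
    for name in ("potential", "popularity"):
        output, diversity = simulate(strategy=name)
        print(name, round(output, 3), diversity)
```

Comparing the two returned values across strategies gives a rough feel for the trade-off between collective output and topic diversity that the abstract refers to; the numbers themselves carry no empirical meaning.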
Science Models as Value-Added Services for Scholarly Information Systems
The paper introduces scholarly Information Retrieval (IR) as a further
dimension that should be considered in the science modeling debate. The IR use
case is seen as a validation model of the adequacy of science models in
representing and predicting structure and dynamics in science. Particular
conceptualizations of scholarly activity and structures in science are used as
value-added search services to improve retrieval quality: a co-word model
depicting the cognitive structure of a field (used for query expansion), the
Bradford law of information concentration, and a model of co-authorship
networks (both used for re-ranking search results). An evaluation of the
retrieval quality when science-model-driven services are used showed that
the proposed models do benefit retrieval quality.
From an IR perspective, the models studied are therefore verified as expressive
conceptualizations of central phenomena in science. Thus, it could be shown
that the IR perspective can significantly contribute to a better understanding
of scholarly structures and activities.
Comment: 26 pages, to appear in Scientometric
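
Re-ranking by the Bradford law of information concentration is often called Bradfordizing: documents from the journals that contribute most to a result set are moved to the top. The sketch below assumes a result list of `(doc_id, journal)` pairs in original rank order; the function name `bradfordize` and the sample data are illustrative, not taken from the paper.

```python
from collections import Counter


def bradfordize(results):
    """Re-rank results so documents from the most productive journals come first.

    `results` is a list of (doc_id, journal) pairs in original rank order.
    """
    # Count how many hits each journal contributes to this result set.
    journal_counts = Counter(journal for _, journal in results)
    # sorted() is stable, so the original order is kept within a journal
    # and between journals that have the same count.
    return sorted(results, key=lambda hit: -journal_counts[hit[1]])


if __name__ == "__main__":
    hits = [("d1", "A"), ("d2", "B"), ("d3", "B"),
            ("d4", "C"), ("d5", "B"), ("d6", "A")]
    for doc_id, journal in bradfordize(hits):
        print(doc_id, journal)
```

In this example journal B contributes three hits, so its documents lead the list, followed by journal A and then journal C.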
The structural role of the core literature in history
The intellectual landscapes of the humanities are mostly uncharted territory.
Little is known about the ways published research of humanist scholars defines
areas of intellectual activity. An open question relates to the structural role
of core literature: highly cited sources, naturally playing a disproportionate
role in the definition of intellectual landscapes. We introduce four indicators
in order to map the structural role played by core sources in connecting
different areas of the intellectual landscape of citing publications (i.e.
communities in the bibliographic coupling network). All indicators factor out
the influence of degree distributions by internalizing a null configuration
model. By considering several datasets focused on history, we show that two
distinct structural actions are performed by the core literature: a global one,
by connecting otherwise separated communities in the landscape, or a local one,
by raising connectivity within communities. In our study, the global action is
mainly performed by small sets of scholarly monographs, reference works and
primary sources, while the rest of the core, and especially most journal
articles, acts mostly locally.
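
The bibliographic coupling network mentioned in this abstract links two citing publications whenever they share cited references. The following minimal sketch builds the weighted edges only; it does not implement the four indicators or the null configuration model from the paper. The function name `bibliographic_coupling` and the sample reference lists are illustrative.

```python
from itertools import combinations


def bibliographic_coupling(references):
    """Return {(pub_a, pub_b): number_of_shared_references}.

    `references` maps each citing publication to the sources it cites.
    Pairs with no shared reference are left out.
    """
    edges = {}
    for pub_a, pub_b in combinations(sorted(references), 2):
        shared = set(references[pub_a]) & set(references[pub_b])
        if shared:
            edges[(pub_a, pub_b)] = len(shared)
    return edges


if __name__ == "__main__":
    cited = {
        "p1": ["r1", "r2", "r3"],
        "p2": ["r2", "r3"],
        "p3": ["r4"],
    }
    print(bibliographic_coupling(cited))
```

Communities detected on such a network are the "areas of the intellectual landscape" the abstract refers to; a highly cited source shared by publications in different communities would show up as coupling across them.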
Bibliometric-enhanced Retrieval Models for Big Scholarly Information Systems
Bibliometric techniques are not yet widely used to enhance retrieval
processes in digital libraries, although they offer value-added effects for
users. In this paper we explore how statistical modelling of scholarship,
such as Bradfordizing or network analysis of coauthorship networks, can improve
retrieval services for specific communities, as well as for large, cross-domain
collections. This paper aims to raise awareness of the missing link
between information retrieval (IR) and bibliometrics / scientometrics and to
create a common ground for the incorporation of bibliometric-enhanced services
into retrieval at the digital library interface.
Comment: 4 pages, IEEE BigData 2013, Workshop on Scholarly Big Data:
Challenges and Idea
How to Create an Innovation Accelerator
Too many policy failures are fundamentally failures of knowledge. This has
become particularly apparent during the recent financial and economic crisis,
which is questioning the validity of mainstream scholarly paradigms. We propose
to pursue a multi-disciplinary approach and to establish new institutional
settings which remove or reduce obstacles impeding efficient knowledge
creation. We provide suggestions on (i) how to modernize and improve the
academic publication system, and (ii) how to support scientific coordination,
communication, and co-creation in large-scale multi-disciplinary projects. Both
constitute important elements of what we envision to be a novel ICT
infrastructure called "Innovation Accelerator" or "Knowledge Accelerator".
Comment: 32 pages, Visioneer White Paper, see http://www.visioneer.ethz.c
Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches
We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches comprised five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models, BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts.
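
The Jensen-Shannon divergence used for the coherence measure compares two word distributions. The sketch below is a minimal illustration with assumptions of its own: whitespace tokenisation, base-2 logarithms (so values fall between 0 and 1), and no smoothing. It does not reproduce the study's exact coherence computation, and the function names are illustrative.

```python
import math
from collections import Counter


def word_distribution(text):
    """Relative word frequencies of a text (lower-cased, split on whitespace)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two word distributions."""
    vocabulary = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    mixture = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocabulary}

    def kl_to_mixture(dist):
        # Kullback-Leibler divergence KL(dist || M); zero-probability
        # words contribute nothing to the sum.
        return sum(prob * math.log2(prob / mixture[w])
                   for w, prob in dist.items() if prob > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


if __name__ == "__main__":
    doc = word_distribution("protein binding in yeast cells")
    cluster = word_distribution("protein folding and binding in cells")
    print(jensen_shannon_divergence(doc, cluster))
```

A lower divergence between a document and the pooled text of its cluster indicates that the document's vocabulary is closer to the cluster's, which is the intuition behind a divergence-based coherence score.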