The Extraction of Community Structures from Publication Networks to Support Ethnographic Observations of Field Differences in Scientific Communication
The scientific community of researchers in a research specialty is an
important unit of analysis for understanding the field-specific shaping of
scientific communication practices. These scientific communities are, however,
a challenging unit of analysis to capture and compare because they overlap,
have fuzzy boundaries, and evolve over time. We describe a network analytic
approach that reveals the complexities of these communities through examination
of their publication networks in combination with insights from ethnographic
field studies. We suggest that the structures revealed indicate overlapping
sub-communities within a research specialty, and we provide evidence that they
differ in disciplinary orientation and research practices. By mapping the
community structures of scientific fields we aim to increase confidence about
the domain of validity of ethnographic observations as well as of collaborative
patterns extracted from publication networks, thereby enabling the systematic
study of field differences. The network analytic methods presented include
methods to optimize the delineation of a bibliographic data set in order to
adequately represent a research specialty, and methods to extract community
structures from this data. We demonstrate the application of these methods in a
case study of two research specialties in the physical and chemical sciences.
Comment: Accepted for publication in JASIS
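The abstract does not spell out its extraction algorithm. As a deliberately crude stand-in, and not the paper's method, the Python sketch below builds a weighted co-authorship network from publication records and treats the connected components of the strong ties as communities. The function names and the min_weight threshold are hypothetical. Unlike the structures the paper describes, these groups cannot overlap.

```python
from collections import Counter, defaultdict
from itertools import combinations


def coauthorship_edges(papers):
    """Count how often each pair of authors co-publishes (undirected, weighted)."""
    weights = Counter()
    for authors in papers:
        # sorted() gives each unordered pair one canonical (a, b) key
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights


def communities(papers, min_weight=2):
    """Crude communities: connected components after dropping weak ties."""
    adj = defaultdict(set)
    for (a, b), w in coauthorship_edges(papers).items():
        if w >= min_weight:  # hypothetical threshold for a "strong" tie
            adj[a].add(b)
            adj[b].add(a)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        # iterative depth-first search collects one component
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adj[node] - seen)
        groups.append(group)
    return groups
```

For the records [["A", "B", "C"], ["A", "B"], ["B", "C"], ["D", "E"], ["D", "E", "F"], ["C", "D"]], the default threshold yields the groups {A, B, C} and {D, E}. Author F, whose ties all have weight 1, drops out, which illustrates how sensitive such boundaries are to the delineation choices the abstract mentions.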
Cross-concordances: terminology mapping and its effectiveness for information retrieval
The German Federal Ministry for Education and Research funded a major
terminology mapping initiative, which concluded in 2007. The task of
this terminology mapping initiative was to organize, create and manage
'cross-concordances' between controlled vocabularies (thesauri, classification
systems, subject heading lists) centred around the social sciences but quickly
extending to other subject areas. In total, 64 crosswalks with more than 500,000
relations were established. In the final phase of the project, a major
evaluation effort to test and measure the effectiveness of the vocabulary
mappings in an information system environment was conducted. The paper reports
on the cross-concordance work and evaluation results.
Comment: 19 pages, 4 figures, 11 tables, IFLA conference 200
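In retrieval, a cross-concordance is typically used to carry a query expressed in one controlled vocabulary into another. The Python sketch below illustrates that idea with invented terms. The thesaurus terms, the Relation type, the relation labels, and translate_query are all hypothetical and are not the project's data or software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source_term: str
    target_term: str
    kind: str  # hypothetical labels: "equivalent", "broader", "narrower", "related"


# Hypothetical crosswalk between two invented thesauri
CROSSWALK = [
    Relation("unemployment", "joblessness", "equivalent"),
    Relation("unemployment", "labour market", "broader"),
    Relation("youth unemployment", "joblessness", "broader"),
    Relation("unemployment", "poverty", "related"),
]


def translate_query(terms, crosswalk, kinds=("equivalent",)):
    """Map source-vocabulary query terms to target terms, keeping first-seen order."""
    out = []
    for term in terms:
        for rel in crosswalk:
            if rel.source_term == term and rel.kind in kinds and rel.target_term not in out:
                out.append(rel.target_term)
    return out
```

With the default setting, translate_query(["unemployment"], CROSSWALK) returns ["joblessness"]. Restricting translation to equivalence relations keeps the rewritten query close to the original, while admitting broader or related terms trades precision for recall.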
Query-Driven Sampling for Collective Entity Resolution
Probabilistic databases play a preeminent role in the processing and
management of uncertain data. Recently, many database research efforts have
integrated probabilistic models into databases to support tasks such as
information extraction and labeling. Many of these efforts are based on
batch-oriented inference, which inhibits a real-time workflow. One important task is
entity resolution (ER). ER is the process of determining which records (mentions) in
a database that correspond to the same real-world entity. Traditional pairwise
ER methods can lead to inconsistencies and low accuracy due to localized
decisions. Leading ER systems solve this problem by collectively resolving all
records using a probabilistic graphical model and Markov chain Monte Carlo
(MCMC) inference. However, for large datasets this is an extremely expensive
process. One key observation is that such an exhaustive ER process incurs a huge
up-front cost, which is wasteful in practice because most users are interested
in only a small subset of entities. In this paper, we advocate pay-as-you-go
entity resolution by developing a number of query-driven collective ER
techniques. We introduce two classes of SQL queries that involve ER operators:
selection-driven ER and join-driven ER. We implement novel variations of
the MCMC Metropolis-Hastings algorithm to generate biased samples and
selectivity-based scheduling algorithms to support the two classes of ER
queries. Finally, we show that query-driven ER algorithms can converge and
return results within minutes over a database populated with the extraction
from a newswire dataset containing 71 million mentions.
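To make the sampling idea concrete, here is a heavily simplified Python sketch, not the paper's algorithm. It makes several assumptions. Each mention carries an entity label, and hypothetical pairwise affinity scores define the target distribution, proportional to exp(total affinity within entities). Query mentions are picked for a move more often. Because that pick does not depend on the current state, and the new label is drawn uniformly, the proposal stays symmetric and the acceptance probability reduces to min(1, exp(delta)). The names mh_resolve, affinity, and query_weight are illustrative, and selectivity-based scheduling is not shown.

```python
import math
import random


def mh_resolve(n, affinity, query, steps=5000, query_weight=0.9, seed=0):
    """Sample an entity label per mention with Metropolis-Hastings.

    affinity[(i, j)] with i < j scores how likely mentions i and j co-refer
    (positive = likely the same entity); unlisted pairs score 0.
    query: indices of mentions the user's query touches; a query mention is
    chosen for a move with probability query_weight when both pools exist.
    """
    rng = random.Random(seed)
    labels = list(range(n))  # start with every mention as its own entity
    others = [i for i in range(n) if i not in query]
    query = list(query)

    def aff(i, j):
        return affinity.get((min(i, j), max(i, j)), 0.0)

    def gain(i, label):
        # score contributed by placing mention i in entity `label`
        return sum(aff(i, j) for j in range(n) if j != i and labels[j] == label)

    for _ in range(steps):
        # state-independent choice of which mention to move, biased to the query
        use_query = query and (not others or rng.random() < query_weight)
        i = rng.choice(query if use_query else others)
        new = rng.randrange(n)  # uniform over n labels: symmetric proposal
        delta = gain(i, new) - gain(i, labels[i])
        if delta >= 0 or rng.random() < math.exp(delta):
            labels[i] = new  # accept with probability min(1, exp(delta))
    return labels
```

With strong attraction between mentions 0 and 1 and strong repulsion from mention 2, mh_resolve(3, {(0, 1): 50.0, (0, 2): -50.0, (1, 2): -50.0}, query=[0, 1]) ends with 0 and 1 sharing a label and 2 on its own. Each step rescans every mention, so this toy is nowhere near the 71-million-mention scale reported above.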
Methods of Hierarchical Clustering
We survey agglomerative hierarchical clustering algorithms and discuss
efficient implementations that are available in R and other software
environments. We look at hierarchical self-organizing maps and mixture models.
We review grid-based clustering, focusing on hierarchical density-based
approaches. Finally, we describe a recently developed, very efficient
(linear-time) hierarchical clustering algorithm, which can also be viewed as a
hierarchical grid-based algorithm.
Comment: 21 pages, 2 figures, 1 table, 69 references
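The agglomerative scheme surveyed above repeatedly merges the two closest clusters until one remains. A naive Python sketch of that loop follows. The names agglomerate, dist, and linkage are illustrative. The loop costs roughly cubic time, far from the efficient implementations the paper discusses.

```python
def agglomerate(points, dist, linkage=min):
    """Naive agglomerative clustering; returns the merge history.

    Each merge is (cluster_a, cluster_b, distance), where clusters are
    frozensets of point indices. linkage=min gives single linkage and
    linkage=max gives complete linkage.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # find the closest pair of clusters under the chosen linkage
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(dist(points[i], points[j])
                            for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((a, b, d))
        # replace the two merged clusters with their union
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(a | b)
    return merges
```

On the one-dimensional points [0.0, 1.0, 5.0, 6.0, 20.0] with absolute difference as the distance, single linkage merges at heights 1.0, 1.0, 4.0, and 14.0, while complete linkage (linkage=max) merges at 1.0, 1.0, 6.0, and 20.0. The merge history is what a dendrogram plots.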
EUCAT: A Pan-European Index of Union Catalogs
The Andrew W. Mellon Foundation and the National Library of Estonia organized a Conference on Union Catalogs which took place in Tallinn, in the National Library of Estonia on October 17–19, 2002. The Conference presented and discussed analytical papers dealing with various aspects of designing and implementing union catalogs and shared cataloging systems as revealed through the experiences of Eastern European, Baltic and South African research libraries. Here you can find the texts of the conference papers and the list of contributors and participants.
The man/machine interface in information retrieval: Providing access to the casual user
This study is concerned with the difficulties encountered by casual users wishing to employ Information Storage and Retrieval Systems. A casual user is defined as a professional who has neither the time nor the desire to study in depth the numerous and varied retrieval systems. His needs for on-line search are only occasional and are not limited to any particular system. The paper takes a close look at the state of the art of research on aiding casual users of Information Storage and Retrieval Systems. Current experiments such as LEXIS, CONIT, IIDA, CITE, and CCL are presented and discussed. Comments and proposals are offered, specifically in the areas of training, learning and cost as experienced by the casual user. An extensive bibliography of recent works on the subject follows the text.
Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review
Since the Simple Knowledge Organization System (SKOS) specification and its
SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a
significant number of conventional knowledge organization systems (KOS)
(including thesauri, classification schemes, name authorities, and lists of
codes and terms, produced before the arrival of the ontology-wave) have made
their journeys to join the Semantic Web mainstream. This paper uses "LOD KOS"
as an umbrella term to refer to all of the value vocabularies and lightweight
ontologies within the Semantic Web framework. The paper provides an overview of
what the LOD KOS movement has brought to various communities and users. These
are not limited to the communities of value vocabulary constructors and
providers, or to the catalogers and indexers who have a long history of applying
the vocabularies to their products. The LOD dataset producers and LOD service
providers, the information architects and interface designers, and researchers
in sciences and humanities, are also direct beneficiaries of LOD KOS. The paper
examines a set of collected cases (experimental or in real applications)
and aims to identify how LOD KOS are used in order to share practices and
ideas among communities and users. Through the viewpoints of a number of
different user groups, the functions of LOD KOS are examined from multiple
dimensions. This paper focuses on the LOD dataset producers, vocabulary
producers, and researchers (as end-users of KOS).
Comment: 31 pages, 12 figures, accepted paper in International Journal on
Digital Libraries