TNO/UT at TREC-9: How different are Web documents?
Although at first sight the web track might seem a copy of the ad hoc track, we discovered that some small adjustments had to be made to our systems to run the web evaluation. As we expected, the basic IR model based on language models worked effectively on this data. Blind feedback methods, however, seem less effective on web data. We also experimented with rescoring the documents using several algorithms that exploit link information. These methods yielded no positive results.
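The abstract does not spell out the language model. A common form is query likelihood with linear interpolation (Jelinek-Mercer) smoothing between a document model and a collection model, so the sketch below assumes that form. The function names and the default weight lam = 0.15 are hypothetical, and the code is not the TNO/UT system itself.

```python
import math
from collections import Counter


def lm_score(query, doc, collection, lam=0.15):
    """Log query likelihood of doc under a Jelinek-Mercer smoothed unigram model.

    query and doc are lists of terms; collection is a list of such documents.
    lam (hypothetical default, assumed < 1) weights the document model
    against the collection model.
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(t for d in collection for t in d)
    coll_len = sum(coll_tf.values())
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc) if doc else 0.0
        p_coll = coll_tf[term] / coll_len if coll_len else 0.0
        p = lam * p_doc + (1 - lam) * p_coll
        # A term absent from the whole collection has p == 0 for every
        # document; skipping it leaves the ranking unchanged.
        if p > 0.0:
            score += math.log(p)
    return score


def rank(query, collection, lam=0.15):
    """Return document indices sorted by descending query likelihood."""
    return sorted(range(len(collection)),
                  key=lambda i: lm_score(query, collection[i], collection, lam),
                  reverse=True)
```

For example, rank(["web", "link"], docs) orders a toy collection so that documents containing the query terms come first. Blind feedback and link-based rescoring would be layered on top of such a baseline score.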
Measuring Author Research Relatedness: A Comparison of Word-based, Topic-based and Author Cocitation Approaches
Relationships between authors based on characteristics of published literature have been studied for decades. Author cocitation analysis using mapping techniques has been the most frequently used approach to study how closely two authors are thought to be in intellectual space, based on how members of the research community co-cite their works. Other approaches exist to study author relatedness based more directly on the text of their published works. In this study we present static and dynamic word-based approaches using vector space modeling, as well as a topic-based approach based on Latent Dirichlet Allocation, for mapping author research relatedness. Vector space modeling is used to define an author space consisting of works by a given author. Outcomes for the two word-based approaches and the topic-based approach for 50 prolific authors in library and information science are compared with more traditional author cocitation analysis using multidimensional scaling and hierarchical cluster analysis. The two word-based approaches produced similar outcomes except where two authors were frequent co-authors for the majority of their articles. The topic-based approach produced the most distinctive map.
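The two kinds of relatedness compared in this abstract can be illustrated with a minimal sketch. The word-based side assumes a "static" author profile made by summing raw term counts over all of an author's works and compares profiles with cosine similarity. The cocitation side counts how many citing documents cite both authors. The helper names and the whitespace tokenization are hypothetical simplifications, not the study's exact procedure.

```python
import math
from collections import Counter
from itertools import combinations


def author_vector(works):
    """Static author profile (assumption): summed term counts over all works."""
    vec = Counter()
    for text in works:
        vec.update(text.lower().split())
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cocitation_counts(citing_docs):
    """citing_docs: iterable of sets of cited authors, one set per citing document.

    Returns a Counter mapping each sorted author pair to the number of
    documents that cite both authors.
    """
    counts = Counter()
    for cited in citing_docs:
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts
```

The resulting similarity or count matrices are what techniques such as multidimensional scaling or hierarchical clustering would then take as input.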
A unified approach to mapping and clustering of bibliometric networks
In the analysis of bibliometric networks, researchers often use mapping and clustering techniques in a combined fashion. Typically, however, mapping and clustering techniques that are used together rely on very different ideas and assumptions. We propose a unified approach to mapping and clustering of bibliometric networks. We show that the VOS mapping technique and a weighted and parameterized variant of modularity-based clustering can both be derived from the same underlying principle. We illustrate our proposed approach by producing a combined mapping and clustering of the most frequently cited publications that appeared in the field of information science in the period 1999-2008.
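One way to see the shared principle is to write both tasks over the same normalized similarities. The sketch below reflects our understanding, and the exact formulation in the paper may differ. Similarities use the association strength s_ij = 2m c_ij / (c_i c_j), where c_i is node i's total link weight and m is half the sum of all c_i. Mapping minimizes V = sum_{i<j} s_ij d_ij^2 - sum_{i<j} d_ij over coordinates. Clustering maximizes sum_{i<j} [i, j in same cluster] (s_ij - gamma), with gamma a resolution parameter. All function names are hypothetical.

```python
import math
from itertools import combinations


def association_strength(c):
    """Normalize a symmetric link-weight matrix c (list of lists) into s_ij."""
    n = len(c)
    k = [sum(c[i][j] for j in range(n) if j != i) for i in range(n)]
    m = sum(k) / 2
    return [[0.0 if i == j or k[i] == 0 or k[j] == 0
             else 2 * m * c[i][j] / (k[i] * k[j])
             for j in range(n)] for i in range(n)]


def mapping_objective(s, coords):
    """V = sum_{i<j} s_ij d_ij^2 - sum_{i<j} d_ij, with Euclidean d_ij (minimize)."""
    v = 0.0
    for i, j in combinations(range(len(s)), 2):
        d = math.dist(coords[i], coords[j])
        v += s[i][j] * d * d - d
    return v


def clustering_quality(s, labels, gamma=1.0):
    """Parameterized modularity-style quality: sum over same-cluster pairs of s_ij - gamma (maximize)."""
    return sum(s[i][j] - gamma
               for i, j in combinations(range(len(s)), 2)
               if labels[i] == labels[j])
```

If d_ij is restricted to 0 within a cluster and 1/gamma between clusters, minimizing V is equivalent to maximizing the clustering quality above, which is the sense in which both can come from one principle.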
A comparison of two techniques for bibliometric mapping: Multidimensional scaling and VOS
VOS is a new mapping technique that can serve as an alternative to the well-known technique of multidimensional scaling. We present an extensive comparison between the use of multidimensional scaling and the use of VOS for constructing bibliometric maps. In our theoretical analysis, we show the mathematical relation between the two techniques. In our experimental analysis, we use the techniques for constructing maps of authors, journals, and keywords. Two commonly used approaches to bibliometric mapping, both based on multidimensional scaling, turn out to produce maps that suffer from artifacts. Maps constructed using VOS turn out not to have this problem. We conclude that, in general, maps constructed using VOS provide a more satisfactory representation of a data set than maps constructed using well-known multidimensional scaling approaches.
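The contrast between the two techniques can be sketched by the objective each one evaluates for a candidate map. This reflects our reading and is not the paper's exact formulation. VOS works on similarities directly and, as we understand it, minimizes sum_{i<j} s_ij ||x_i - x_j||^2 subject to the mean pairwise distance being 1. Metric MDS instead needs dissimilarities delta_ij and minimizes a stress such as sum_{i<j} (||x_i - x_j|| - delta_ij)^2. The function names are hypothetical, and neither function performs the optimization itself.

```python
import math
from itertools import combinations


def vos_objective(s, coords):
    """VOS-style objective: sum_{i<j} s_ij d_ij^2, after rescaling the
    configuration so the mean pairwise distance is 1 (the constraint)."""
    pairs = list(combinations(range(len(coords)), 2))
    mean_d = sum(math.dist(coords[i], coords[j]) for i, j in pairs) / len(pairs)
    return sum(s[i][j] * (math.dist(coords[i], coords[j]) / mean_d) ** 2
               for i, j in pairs)


def raw_stress(delta, coords):
    """Metric MDS raw stress: sum_{i<j} (d_ij - delta_ij)^2."""
    return sum((math.dist(coords[i], coords[j]) - delta[i][j]) ** 2
               for i, j in combinations(range(len(coords)), 2))
```

Because of the normalization, the VOS value does not change when a map is uniformly scaled. Raw stress, by contrast, depends on how similarities were first converted into dissimilarities, which is one place where the MDS-based approaches and VOS differ in practice.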
Novel interface for an Online Public Access Catalogue: a citation network approach
The conventional subject search strategy of querying with words and phrases has created many difficulties for users of Online Public Access Catalogue (OPAC) systems because of problems in matching their terms to the system vocabulary. An alternative is to search by browsing through related records. In the proposed novel interface for the OPAC, a citation network approach is employed for subject access by browsing. [Continues.]
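As a rough illustration of browsing by citation links, the sketch below indexes a hypothetical catalogue's citation data and, for a selected record, returns the records it cites, the records citing it, and bibliographically coupled records that share a reference with it. The data layout and function names are assumptions. The abstract does not say which relations the proposed interface actually exposes.

```python
from collections import defaultdict


def build_index(citations):
    """citations: dict mapping record_id -> list of record_ids it cites
    (hypothetical catalogue data). Returns a reverse 'cited by' index."""
    cited_by = defaultdict(set)
    for rec, refs in citations.items():
        for r in refs:
            cited_by[r].add(rec)
    return cited_by


def related_records(rec, citations, cited_by):
    """Records a user could browse to from rec via citation links."""
    refs = set(citations.get(rec, ()))
    citing = set(cited_by.get(rec, ()))
    # Bibliographic coupling: other records that cite at least one of rec's references.
    coupled = {other for r in refs for other in cited_by.get(r, ()) if other != rec}
    return {"cites": refs, "cited_by": citing, "coupled": coupled}
```

A browsing interface could present these three sets as navigable lists, letting a user reach topically related records without guessing the system vocabulary.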
Improved lexical similarities for hybrid clustering through the use of noun phrases extraction
Clustering of hybrid document networks combining citation-based links with lexical similarities has long suffered from the different properties of these underlying networks. In this paper we evaluate different processing options for noun phrases extracted from abstracts using natural language processing, in order to improve the measurement of the lexical component. Term shingles of different lengths are created from each of the extracted noun phrases. We discuss twenty different extraction-shingling scenarios and compare their results. Some scenarios show no improvement over the single-term lexical approach used so far. But when all single-term shingles are removed from the dataset, the lexical network has properties comparable with those of a network based on bibliographic coupling. Next, hybrid networks are built from a weighted combination of the two types of similarities, using seven different weights. We demonstrate that removing all single-term shingles provides the best results in terms of computational feasibility, comparability with bibliographic coupling, and performance in a community detection application.
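A minimal sketch of the shingling and combination steps follows. Shingles are taken here as contiguous word n-grams of each noun phrase. Setting min_len=2 drops single-term shingles, the option the abstract reports as best. The hybrid score is a plain linear mix of a lexical cosine and a bibliographic coupling similarity with weight w. That mixing rule, along with all function names and the tokenization, is our assumption, and the paper's exact weighting scheme may differ.

```python
import math
from collections import Counter


def shingles(noun_phrase, min_len=2):
    """Contiguous word n-grams of a noun phrase, from min_len up to the full phrase."""
    tokens = noun_phrase.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(min_len, len(tokens) + 1)
            for i in range(len(tokens) - n + 1)]


def shingle_profile(noun_phrases, min_len=2):
    """Bag of shingles for one document's extracted noun phrases."""
    profile = Counter()
    for phrase in noun_phrases:
        profile.update(shingles(phrase, min_len))
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def hybrid_similarity(lexical, coupling, w):
    """Weighted combination (assumed linear) of lexical and coupling similarity."""
    return w * lexical + (1 - w) * coupling
```

Running the combination for several values of w would mirror the abstract's comparison across seven weights.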
Informetrics through Advanced Data Management. Complex Object Restructuring, Data Aggregation and Transitive Computation
This article considers how informetric calculations can be specified easily and declaratively through advanced data management techniques. In particular, it considers bibliographic data and its modeling as complex objects (non-first normal form relations), as well as terminological and citation networks involving transitive relationships. A very high-level declarative query interface, based on this data model, is introduced. The article demonstrates that such a data model and query interface enable end-users to perform basic informetric ad hoc calculations, such as bibliographic coupling, author cocitation analysis, generalized impact factors, international visibility and international impact, productivity calculations in a given area, etc., easily and often with much less effort than in contemporary online retrieval systems. Several fruitful generalizations of typical informetric measurements are also proposed. These are based on replacing traditional foci of analysis, e.g., journals, with other object types, such as authors, organizations, countries or classes of a classification scheme. It is shown that the proposed data model and query interface make it trivial to switch focus between various object types for informetric calculations. Moreover, it is demonstrated that all informetric data can easily be broken down by criteria that foster advanced analysis, e.g., by years or content-bearing attributes. Such modeling allows flexible data aggregation along many dimensions and the use of transitive relationships. These salient features stem from the query interface’s general data restructuring and aggregation capabilities combined with its transitive processing capabilities. The features are illustrated by means of sample queries and results.
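The article expresses these calculations as declarative queries over complex objects. The plain-Python sketch below only illustrates what three of them compute, using sets instead of the article's query interface. It covers bibliographic coupling (shared references), cocitation (documents citing both items), a transitive closure over citation chains, and a roll-up that switches the focus of a count from documents to another object type. All names and the data layout are hypothetical.

```python
from collections import Counter


def coupling_strength(refs, a, b):
    """Bibliographic coupling: number of references shared by documents a and b."""
    return len(refs[a] & refs[b])


def cocitation_strength(refs, a, b):
    """Cocitation: number of documents whose reference sets contain both a and b."""
    return sum(1 for cited in refs.values() if a in cited and b in cited)


def transitive_closure(refs, start):
    """All documents reachable from start by following chains of citations."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for r in refs.get(node, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def aggregate(values_by_doc, key_of_doc):
    """Switch focus: roll document-level values up to another object type,
    e.g. journal, author or country, given a doc -> key mapping."""
    totals = Counter()
    for doc, value in values_by_doc.items():
        totals[key_of_doc[doc]] += value
    return totals
```

Changing only the key_of_doc mapping, for example from documents to authors or to countries, re-targets the same count, which loosely parallels the focus switching the abstract describes.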
- …