144 research outputs found

    TNO/UT at TREC-9: How different are Web documents?

    Although at first sight the web track might seem a copy of the ad hoc track, we discovered that some small adjustments had to be made to our systems to run the web evaluation. As we expected, the basic language-model-based IR model worked effectively on this data. Blind feedback methods, however, seem less effective on web data. We also experimented with rescoring the documents based on several algorithms that exploit link information. These methods yielded no positive result.

    Measuring Author Research Relatedness: A Comparison of Word-based, Topic-based and Author Cocitation Approaches

    Relationships between authors based on characteristics of published literature have been studied for decades. Author cocitation analysis using mapping techniques has been most frequently used to study how closely two authors are thought to be in intellectual space based on how members of the research community co-cite their works. Other approaches exist to study author relatedness based more directly on the text of their published works. In this study we present static and dynamic word-based approaches using vector space modeling, as well as a topic-based approach based on Latent Dirichlet Allocation for mapping author research relatedness. Vector space modeling is used to define an author space consisting of works by a given author. Outcomes for the two word-based approaches and a topic-based approach for 50 prolific authors in library and information science are compared with more traditional author cocitation analysis using multidimensional scaling and hierarchical cluster analysis. The two word-based approaches produced similar outcomes except where two authors were frequent co-authors for the majority of their articles. The topic-based approach produced the most distinctive map.

    A unified approach to mapping and clustering of bibliometric networks

    In the analysis of bibliometric networks, researchers often use mapping and clustering techniques in a combined fashion. Typically, however, mapping and clustering techniques that are used together rely on very different ideas and assumptions. We propose a unified approach to mapping and clustering of bibliometric networks. We show that the VOS mapping technique and a weighted and parameterized variant of modularity-based clustering can both be derived from the same underlying principle. We illustrate our proposed approach by producing a combined mapping and clustering of the most frequently cited publications that appeared in the field of information science in the period 1999-2008.

    A comparison of two techniques for bibliometric mapping: Multidimensional scaling and VOS

    VOS is a new mapping technique that can serve as an alternative to the well-known technique of multidimensional scaling. We present an extensive comparison between the use of multidimensional scaling and the use of VOS for constructing bibliometric maps. In our theoretical analysis, we show the mathematical relation between the two techniques. In our experimental analysis, we use the techniques for constructing maps of authors, journals, and keywords. Two commonly used approaches to bibliometric mapping, both based on multidimensional scaling, turn out to produce maps that suffer from artifacts. Maps constructed using VOS turn out not to have this problem. We conclude that, in general, maps constructed using VOS provide a more satisfactory representation of a data set than maps constructed using well-known multidimensional scaling approaches.

    Novel interface for an Online Public Access Catalogue: a citation network approach

    The conventional subject search strategy of querying with words and phrases has been creating a lot of difficulties for the users of Online Public Access Catalogue (OPAC) systems because of the matching problems with the system vocabulary. An alternative is to use search by browsing through related records. In the proposed novel interface for the OPAC, a citation network approach is employed for subject access by browsing. [Continues.

    Improved lexical similarities for hybrid clustering through the use of noun phrases extraction

    Clustering of hybrid document networks that combine citation-based links with lexical similarities has long suffered from the different properties of these underlying networks. In this paper we evaluate different processing options for noun phrases extracted from abstracts using natural language processing to improve the measurement of the lexical component. Term shingles of different lengths are created from each of the extracted noun phrases. We discuss twenty different extraction-shingling scenarios and compare their results. Some scenarios show no improvement over the previously used single-term lexical approach. But when all single-term shingles are removed from the dataset, the lexical network has properties that are comparable with those of a network based on bibliographic coupling. Next, hybrid networks are built from a weighted combination of the two types of similarities, using seven different weights. We demonstrate that removing all single-term shingles provides the best results in terms of computational feasibility, comparability with bibliographic coupling, and a community detection application.
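
    As an illustration of the shingling step mentioned in this abstract, the sketch below builds term shingles from one noun phrase and then drops the single-term ones. It assumes that a "shingle" means a contiguous run of words within the phrase; the abstract does not define the term, and the function names `shingles` and `drop_single_terms` are hypothetical, not taken from the paper.

```python
def shingles(phrase, max_len=None):
    """Return all contiguous word n-grams of a noun phrase.

    Assumption: a shingle is a contiguous run of words; the paper's
    exact definition is not given in the abstract.
    """
    words = phrase.lower().split()
    longest = len(words) if max_len is None else min(max_len, len(words))
    return [
        " ".join(words[i:i + n])
        for n in range(1, longest + 1)       # shingle length
        for i in range(len(words) - n + 1)   # start position
    ]


def drop_single_terms(shingle_list):
    """Keep only shingles made of two or more words."""
    return [s for s in shingle_list if " " in s]


all_shingles = shingles("hybrid document network")
multi_term_only = drop_single_terms(all_shingles)
```

    Here `all_shingles` holds six shingles (three single terms, two pairs, and the full phrase), and `multi_term_only` keeps the three multi-word ones.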

    Informetrics through Advanced Data Management. Complex Object Restructuring, Data Aggregation and Transitive Computation

    This article considers how informetric calculations can easily and declaratively be specified through advanced data management techniques. In particular, bibliographic data and its modeling as complex objects (non-first normal form relations) as well as terminological and citation networks involving transitive relationships are considered. A very high-level declarative query interface, based on this data model, is introduced. The article demonstrates that such data modeling and query interface enable end-users to perform basic informetric ad hoc calculations, such as bibliographic coupling, author cocitation analysis, generalized impact factors, international visibility and international impact, productivity calculations in a given area, etc., easily and often with much less effort than in the contemporary online retrieval systems. Several fruitful generalizations of typical informetric measurements are also proposed. These are based on substituting traditional foci of analysis, e.g., journals, by other object types, such as authors, organizations, countries or classes of a classification scheme. It is shown that the proposed data modeling and query interface make it trivial to switch focus between various object types for informetric calculations. Moreover, it is demonstrated that all informetric data can easily be broken down by criteria that foster advanced analysis, e.g., by years or content-bearing attributes. Such modeling allows flexible data aggregation along many dimensions and the utilization of transitive relationships. These salient features emanate from the query interface’s general data restructuring and aggregation capabilities combined with transitive processing capabilities. The features are illustrated by means of sample queries and results.
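
    For readers unfamiliar with one of the calculations named in this abstract, bibliographic coupling counts the references that two documents share. The sketch below is plain Python, not the article's declarative query interface; the function name `bibliographic_coupling` and the sample identifiers are hypothetical.

```python
from itertools import combinations


def bibliographic_coupling(references):
    """Count shared references for every pair of documents.

    `references` maps a document id to the set of ids it cites.
    Pairs with no shared reference are omitted from the result.
    """
    strength = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared:
            strength[(a, b)] = shared
    return strength


# Made-up sample data, for illustration only.
sample_refs = {
    "d1": {"r1", "r2", "r3"},
    "d2": {"r2", "r3"},
    "d3": {"r4"},
}
coupling = bibliographic_coupling(sample_refs)
```

    Switching the focus of analysis, as the abstract describes, would amount to keying the same mapping by author, journal, or country instead of by document.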