
    Estimating Position Bias without Intrusive Interventions

    Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal. While it was recently shown how counterfactual learning-to-rank (LTR) approaches \cite{Joachims/etal/17a} can provably overcome presentation bias when observation propensities are known, it remains to show how to effectively estimate these propensities. In this paper, we propose the first method for producing consistent propensity estimates without manual relevance judgments, disruptive interventions, or restrictive relevance modeling assumptions. First, we show how to harvest a specific type of intervention data from historic feedback logs of multiple different ranking functions, and show that this data is sufficient for consistent propensity estimation in the position-based model. Second, we propose a new extremum estimator that makes effective use of this data. In an empirical evaluation, we find that the new estimator provides superior propensity estimates in two real-world systems: Arxiv Full-text Search and Google Drive Search. Beyond these two systems, simulation studies show that the method is robust to a wide range of settings.
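    The core idea behind harvesting interventions can be illustrated with a small simulation. In the position-based model, the click probability of a document at rank k factors into an examination propensity p_k times the document's relevance. If the same query-document pairs are logged at rank k by one ranker and at rank k' by another, the unknown relevance cancels in the ratio of summed click-through rates, leaving p_k / p_k'. The sketch below is a simplified ratio estimate under that assumption, not the extremum estimator proposed in the paper; the propensity values and all function names are hypothetical.

```python
import random

# Hypothetical examination propensities p_k for ranks 1..4 (assumed for the simulation).
TRUE_PROPENSITY = {1: 1.0, 2: 0.6, 3: 0.4, 4: 0.3}


def simulate_clicks(relevance, rank, impressions, rng):
    """Position-based model: a click happens iff the result is examined
    (probability p_rank) and found relevant (probability `relevance`)."""
    p_click = TRUE_PROPENSITY[rank] * relevance
    return sum(1 for _ in range(impressions) if rng.random() < p_click)


def estimate_ratio(log, k, k_prime):
    """Estimate p_k / p_k' from a click log (hypothetical helper).

    `log` maps (pair_id, rank) -> (clicks, impressions). Only query-document
    pairs logged at BOTH ranks are used; for those pairs the unknown relevance
    terms cancel in the ratio of summed click-through rates."""
    at_k = {pid for (pid, r) in log if r == k}
    at_kp = {pid for (pid, r) in log if r == k_prime}
    shared = at_k & at_kp
    if not shared:
        raise ValueError("no pairs observed at both ranks")
    ctr_k = sum(log[(pid, k)][0] / log[(pid, k)][1] for pid in shared)
    ctr_kp = sum(log[(pid, k_prime)][0] / log[(pid, k_prime)][1] for pid in shared)
    return ctr_k / ctr_kp


def build_log(n_pairs, impressions, rank_a, rank_b, seed=0):
    """Simulate two hypothetical rankers that place the same pairs at different ranks."""
    rng = random.Random(seed)
    log = {}
    for pid in range(n_pairs):
        relevance = rng.uniform(0.2, 1.0)  # never shown to the estimator
        for rank in (rank_a, rank_b):
            clicks = simulate_clicks(relevance, rank, impressions, rng)
            log[(pid, rank)] = (clicks, impressions)
    return log


if __name__ == "__main__":
    log = build_log(n_pairs=200, impressions=2000, rank_a=1, rank_b=3)
    # Should land close to the true ratio 1.0 / 0.4 = 2.5.
    print(round(estimate_ratio(log, 1, 3), 2))
```

    Only the ratio between two ranks is identified this way; fixing p_1 = 1 turns the ratios into absolute propensities. Real logs are noisier and pairs rarely swap between exactly two ranks, which is part of why the paper uses a dedicated estimator rather than a single ratio.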

    Access and usability issues of scholarly electronic publications

    This chapter looks at the various access and usability issues related to scholarly information resources. It first looks at the various channels through which a user can get access to scholarly electronic publications. It then discusses the issues and studies surrounding usability, and identifies some important parameters for measuring the usability of information access systems. Finally, the chapter looks at the major problems users face in getting access to scholarly information through today's hybrid libraries, and mentions some possible measures to resolve these problems.

    Methodologies for the Automatic Location of Academic and Educational Texts on the Internet

    Traditionally, online databases of web resources have been compiled by a human editor, or through the submissions of authors or interested parties. Considerable resources are needed to maintain a constant level of input and relevance in the face of the increasing quantity and quality of material, and much of the content in such databases is ephemeral. These pressures cause many databases to stagnate after an initial period of enthusiastic data entry. The solution to this problem would seem to be the automatic harvesting of resources; however, this process requires the automatic classification of resources as ‘appropriate’ to a given database, a problem that can only be solved by complex text content analysis. This paper outlines the component methodologies necessary to construct such an automated harvesting system, including a number of novel approaches. In particular, this paper looks at the specific problems of automatically identifying academic research work and Higher Education pedagogic materials. Where appropriate, experimental data is presented from searches in the field of Geography as well as the Earth and Environmental Sciences. In addition, appropriate software is reviewed where it exists, and future directions are outlined.

    Open Access Metadata for Journals in Directory of Open Access Journals: Who, How, and What Scheme?

    Open access (OA) is a form of publication that allows some level of free access to scholarly publications. The Directory of Open Access Journals (DOAJ) is a repository to which OA journals may apply and upload content to increase discoverability. OA also refers to metadata that is freely available for harvesting. Making metadata open access requires standard schemes and protocols to facilitate interoperability. For open access journals, such as those listed in the DOAJ, providing open access metadata in a form that promotes interoperability is essential for the discoverability of their content. This paper investigates what standards exist or are emerging, who within journals is creating the metadata for DOAJ journals, and how those journals and DOAJ share the metadata for articles. Moreover, since creating metadata requires the specialized knowledge of both librarians and programmers, it is imperative that journals wanting to publish OA metadata formulate plans to coordinate these experts and ensure that their efforts are compatible with current standards and protocols.

    Building a domain-specific document collection for evaluating metadata effects on information retrieval

    This paper describes the development of a structured document collection containing user-generated text and numerical metadata for exploring the exploitation of metadata in information retrieval (IR). The collection consists of more than 61,000 documents extracted from YouTube video pages on basketball in general and the NBA (National Basketball Association) in particular, together with a set of 40 topics and their relevance judgements. In addition, a collection of nearly 250,000 user profiles related to the NBA collection is available. Several baseline IR experiments report the effect of using video-associated metadata on retrieval effectiveness. Surprisingly, the results show that searching the video titles only performs significantly better than also searching additional metadata text fields of the videos, such as the tags or the description.
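    The title-only versus all-fields comparison amounts to choosing which metadata fields are indexed before ranking. A minimal sketch, assuming a plain BM25 scorer over the concatenated fields (the abstract does not say which retrieval model the baselines used); the two toy documents and the field names `title`, `tags` and `description` are hypothetical and are constructed only to show how extra fields can change a ranking, not to reproduce the reported results.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(docs, fields, query, k1=1.2, b=0.75):
    """Score each doc with BM25 over the concatenation of the chosen fields."""
    texts = [tokenize(" ".join(doc.get(f, "") for f in fields)) for doc in docs]
    n = len(texts)
    avgdl = (sum(len(t) for t in texts) / n) or 1.0  # guard against all-empty fields
    df = Counter(term for t in texts for term in set(t))
    scores = []
    for t in texts:
        tf = Counter(t)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(t) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(docs, fields, query):
    """Return document indices sorted by descending BM25 score."""
    scores = bm25_scores(docs, fields, query)
    return sorted(range(len(docs)), key=lambda i: -scores[i])


# Hypothetical toy documents: the second one has off-topic content but
# query terms stuffed into its tags and description.
TOY_DOCS = [
    {"title": "lebron james dunk highlights", "tags": "nba basketball",
     "description": "best plays of the season"},
    {"title": "cooking pasta at home", "tags": "nba lebron dunk",
     "description": "lebron james dunk mention in passing"},
]

if __name__ == "__main__":
    print(rank(TOY_DOCS, ["title"], "lebron dunk"))
    print(rank(TOY_DOCS, ["title", "tags", "description"], "lebron dunk"))
```

    In this constructed example, title-only search ranks the on-topic video first, while adding the noisier tag and description fields lets the off-topic video overtake it. This is one hypothetical mechanism, not the explanation given in the paper.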