40,197 research outputs found

    Toward a document evaluation methodology: What does research tell us about the validity and reliability of evaluation methods?

    Although the usefulness of evaluating documents has become generally accepted among communication professionals, the supporting research that puts evaluation practices empirically to the test is only beginning to emerge. This article presents an overview of the available research on troubleshooting evaluation methods. Four lines of research are distinguished concerning the validity of evaluation methods, sample composition, sample size, and the implementation of evaluation results during revisio

    Does document relevance affect the searcher's perception of time?

    Time plays an essential role in multiple areas of Information Retrieval (IR) studies such as search evaluation, user behavior analysis, temporal search result ranking and query understanding. In search evaluation studies in particular, time is usually adopted as a measure to quantify users' efforts in search processes. Psychological studies have reported that the time perception of human beings can be affected by many stimuli, such as attention and motivation, which are closely related to many cognitive factors in search. Considering the fact that users' search experiences are affected by their subjective feelings of time, rather than the objective time measured by timing devices, it is necessary to look into the different factors that have impacts on search users' perception of time. In this work, we make a first step towards revealing the time perception mechanism of search users with the following contributions: (1) We establish an experimental research framework to measure the subjective perception of time while reading documents in a search scenario, which originates from but is also different from traditional time perception measurements in psychological studies. (2) With the framework, we show that while users are reading result documents, document relevance has a small yet visible effect on search users' perception of time. By further examining the impact of other factors, we demonstrate that the effect on relevant documents can also be influenced by individuals and tasks. (3) We conduct a preliminary experiment in which the difference between perceived time and dwell time is taken into consideration in a search evaluation task. We found that the revised framework achieved a better correlation with users' satisfaction feedback. This work may help us better understand the time perception mechanism of search users and provide insights into how to better incorporate the time factor in search evaluation studies

    REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering

    Considering the limited internal parametric knowledge, retrieval-augmented generation (RAG) has been widely used to extend the knowledge scope of large language models (LLMs). Despite the extensive efforts on RAG research, in existing methods, LLMs cannot precisely assess the relevance of retrieved documents, thus likely leading to misleading or even incorrect utilization of external knowledge (i.e., retrieved documents). To address this issue, in this paper, we propose REAR, a RElevance-Aware Retrieval-augmented approach for open-domain question answering (QA). As the key motivation, we aim to enhance the self-awareness of source relevance for LLMs, so as to adaptively utilize external knowledge in RAG systems. Specifically, we develop a new architecture for LLM-based RAG systems by incorporating a specially designed rank head that precisely assesses the relevance of retrieved documents. Furthermore, we propose an improved training method based on bi-granularity relevance fusion and noise-resistant training. By combining the improvements in both architecture and training, our proposed REAR can better utilize external knowledge by effectively perceiving the relevance of retrieved documents. Experiments on four open-domain QA tasks show that REAR significantly outperforms a number of previous competitive RAG approaches. Our code and data can be accessed at https://github.com/RUCAIBox/REAR


    Purpose - A critical role of accounting and financial reporting is to provide useful information that helps entitled users make economic decisions. The quality of financial information is repeatedly said to be a function of both the quality of accounting standards and their regulatory enforcement, so it is vital that standard-setting bodies have the independence and enforcement power needed to guarantee that accountants apply the issued standards when preparing and releasing accounting information. Enforcement mechanisms differ significantly across countries and are non-existent in some. Using Abdolmohammadi's (2002) classification of the enforcement powers of standards (Reward, Legitimate, Referent, Expert, and Coercive), this study seeks to determine which powers are dominant from the perspective of respondents and given the current condition of the accounting profession; it also evaluates the past performance of the Iranian accounting regulator. Design/methodology/approach - To test the two main hypotheses of the study, a questionnaire was used with questions about the current means of enforcing accounting standards in Iran. A total of 281 questionnaires were distributed among members of the accounting-related financial community, including accountants, auditors, bank specialists, and accounting students. After the questionnaire's validity and reliability were confirmed, the collected data were tested with the Kruskal-Wallis, Friedman, and t-test statistical methods. Findings - Respondents believe that, among the enforcement powers, coercive power is the most apparent, and that the main motivation for providing formal accounting reports in accordance with GAAP is managers' concern that the Tehran Stock Exchange will block trading in their companies' stock; respondents also accept the professional abilities of the standard setters. Respondents further believe that Iran's Audit Organization has behaved unfairly in the standard-setting process and has not paid as much attention to regulating the accounting of the governmental and not-for-profit sectors as to the accounting of large private corporations. Research limitation/implication - A key technical result is that the five original enforcement powers do not have equal weight and influence in the current accounting environment of Iran; to enhance disclosure quality and reduce information asymmetry, more work must be done to highlight the powers that have a positive and professional perspective. Practical implications - The paper will be of interest to standard-setting bodies regulating the release of accounting information in order to achieve a high level of market efficiency, and to academics investigating the reliability and value of current standard-setting conditions. Originality/value - The paper reports an original application of the sources of accounting standard enforcement as a determinant of the level of dominance of financial wisdom in the financial community of Iran. Keywords: accounting standard, enforcement powers, information asymmetry, financial wisdom, fairness in standard setting.

    Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models

    Listwise rerankers based on large language models (LLMs) are the zero-shot state-of-the-art. However, current works in this direction all depend on the GPT models, making them a single point of failure in scientific reproducibility. Moreover, this raises the concern that the current research findings only hold for GPT models but not LLMs in general. In this work, we lift this pre-condition and build for the first time effective listwise rerankers without any form of dependency on GPT. Our passage retrieval experiments show that our best listwise reranker surpasses the listwise rerankers based on GPT-3.5 by 13% and achieves 97% effectiveness of the ones built on GPT-4. Our results also show that the existing training datasets, which were expressly constructed for pointwise ranking, are insufficient for building such listwise rerankers. Instead, high-quality listwise ranking data is required and crucial, calling for further work on building human-annotated listwise data resources

    Weighting Passages Enhances Accuracy

    We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of the document. We study a multiplicity of partitioning schemes of document content into passages and compute the collection-dependent weights associated with them on the basis of the distribution of occurrences of salient terms in documents. Moreover, we tune BM25P hyperparameters and investigate their impact on ad hoc document retrieval through fully reproducible experiments conducted using four publicly available datasets. Our findings demonstrate that our BM25P weighting model markedly and consistently outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1, and up to 21% in MRR
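
    The passage-weighting idea in the abstract above can be illustrated with a minimal sketch. It assumes that the "linear combination of term statistics" is a weighted sum of per-passage term frequencies fed into the standard BM25 formula, and that documents are split into equal-length passages; the abstract states neither detail. The function names, the weights, and the `k1` and `b` defaults are hypothetical choices for illustration, not the authors' implementation.

```python
import math

def split_passages(tokens, n_passages):
    """Split a token list into n_passages contiguous, near-equal chunks."""
    size = math.ceil(len(tokens) / n_passages) if tokens else 0
    return [tokens[i * size:(i + 1) * size] for i in range(n_passages)]

def weighted_tf(tokens, term, weights):
    """Weighted sum of the term's frequency in each passage (assumed form)."""
    passages = split_passages(tokens, len(weights))
    return sum(w * p.count(term) for w, p in zip(weights, passages))

def bm25p_score(query, doc, corpus, weights, k1=1.2, b=0.75):
    """BM25 score in which the raw term frequency is replaced by a
    passage-weighted term frequency. With all weights equal to 1 this
    reduces to plain BM25."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        tf = weighted_tf(doc, term, weights)
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf * (k1 + 1) / (tf + norm)
    return score

# Toy corpus: the query term sits at the start of doc 0 and in the middle of doc 1.
corpus = [
    ["ranking", "a", "b", "c", "d", "e"],
    ["a", "b", "ranking", "c", "d", "e"],
    ["x", "y", "z", "u", "v", "w"],
]
query = ["ranking"]
edge_heavy = [2.0, 1.0, 2.0]  # hypothetical weights favouring first and last passages
print(bm25p_score(query, corpus[0], corpus, edge_heavy))
print(bm25p_score(query, corpus[1], corpus, edge_heavy))
```

    With the edge-heavy weights, the document that mentions the term in its first passage scores higher than the one that mentions it in the middle; with uniform weights the two scores are identical. In the paper the weights are collection-dependent and derived from the distribution of salient terms, which this sketch does not attempt to reproduce.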