106,363 research outputs found

    On the evaluation of aggregated web search

    Get PDF
    Aggregating search results from a variety of heterogeneous sources or so-called verticals such as news, image and video into a single interface is a popular paradigm in web search. This search paradigm is commonly referred to as aggregated search. The heterogeneity of the information, the richer user interaction, and the more complex presentation strategy, make the evaluation of the aggregated search paradigm quite challenging. The Cranfield paradigm, use of test collections and evaluation measures to assess the effectiveness of information retrieval (IR) systems, is the de-facto standard evaluation strategy in the IR research community and it has its origins in work dating to the early 1960s. This thesis focuses on applying this evaluation paradigm to the context of aggregated web search, contributing to the long-term goal of a complete, reproducible and reliable evaluation methodology for aggregated search in the research community. The Cranfield paradigm for aggregated search consists of building a test collection and developing a set of evaluation metrics. In the context of aggregated search, a test collection should contain results from a set of verticals, some information needs relating to this task and a set of relevance assessments. The metrics proposed should utilize the information in the test collection in order to measure the performance of any aggregated search pages. The more complex user behavior of aggregated search should be reflected in the test collection through assessments and modeled in the metrics. Therefore, firstly, we aim to better understand the factors involved in determining relevance for aggregated search and subsequently build a reliable and reusable test collection for this task. By conducting several user studies to assess vertical relevance and creating a test collection by reusing existing test collections, we create a testbed with both the vertical-level (user orientation) and document-level relevance assessments. In addition, we analyze the relationship between both types of assessments and find that they are correlated in terms of measuring the system performance for the user. Secondly, by utilizing the created test collection, we aim to investigate how to model the aggregated search user in a principled way in order to propose reliable, intuitive and trustworthy evaluation metrics to measure the user experience. We start our investigations by studying solely evaluating one key component of aggregated search: vertical selection, i.e. selecting the relevant verticals. Then we propose a general utility-effort framework to evaluate the ultimate aggregated search pages. We demonstrate the fidelity (predictive power) of the proposed metrics by correlating them to the user preferences of aggregated search pages. Furthermore, we meta-evaluate the reliability and intuitiveness of a variety of metrics and show that our proposed aggregated search metrics are the most reliable and intuitive metrics, compared to adapted diversity-based and traditional IR metrics. To summarize, in this thesis, we mainly demonstrate the feasibility to apply the Cranfield Paradigm for aggregated search for reproducible, cheap, reliable and trustworthy evaluation

    Evaluation of e-learning web sites using fuzzy axiomatic design based approach

    Get PDF
    High quality web site has been generally recognized as a critical enabler to conduct online business. Numerous studies exist in the literature to measure the business performance in relation to web site quality. In this paper, an axiomatic design based approach for fuzzy group decision making is adopted to evaluate the quality of e-learning web sites. Another multi-criteria decision making technique, namely fuzzy TOPSIS, is applied in order to validate the outcome. The methodology proposed in this paper has the advantage of incorporating requirements and enabling reductions in the problem size, as compared to fuzzy TOPSIS. A case study focusing on Turkish e-learning websites is presented, and based on the empirical findings, managerial implications and recommendations for future research are offered

    Long-Term Load Forecasting Considering Volatility Using Multiplicative Error Model

    Full text link
    Long-term load forecasting plays a vital role for utilities and planners in terms of grid development and expansion planning. An overestimate of long-term electricity load will result in substantial wasted investment in the construction of excess power facilities, while an underestimate of future load will result in insufficient generation and unmet demand. This paper presents first-of-its-kind approach to use multiplicative error model (MEM) in forecasting load for long-term horizon. MEM originates from the structure of autoregressive conditional heteroscedasticity (ARCH) model where conditional variance is dynamically parameterized and it multiplicatively interacts with an innovation term of time-series. Historical load data, accessed from a U.S. regional transmission operator, and recession data for years 1993-2016 is used in this study. The superiority of considering volatility is proven by out-of-sample forecast results as well as directional accuracy during the great economic recession of 2008. To incorporate future volatility, backtesting of MEM model is performed. Two performance indicators used to assess the proposed model are mean absolute percentage error (for both in-sample model fit and out-of-sample forecasts) and directional accuracy.Comment: 19 pages, 11 figures, 3 table

    Beyond the Power Law: Uncovering Stylized Facts in Interbank Networks

    Full text link
    We use daily data on bilateral interbank exposures and monthly bank balance sheets to study network characteristics of the Russian interbank market over Aug 1998 - Oct 2004. Specifically, we examine the distributions of (un)directed (un)weighted degree, nodal attributes (bank assets, capital and capital-to-assets ratio) and edge weights (loan size and counterparty exposure). We search for the theoretical distribution that fits the data best and report the "best" fit parameters. We observe that all studied distributions are heavy tailed. The fat tail typically contains 20% of the data and can be mostly described well by a truncated power law. Also the power law, stretched exponential and log-normal provide reasonably good fits to the tails of the data. In most cases, however, separating the bulk and tail parts of the data is hard, so we proceed to study the full range of the events. We find that the stretched exponential and the log-normal distributions fit the full range of the data best. These conclusions are robust to 1) whether we aggregate the data over a week, month, quarter or year; 2) whether we look at the "growth" versus "maturity" phases of interbank market development; and 3) with minor exceptions, whether we look at the "normal" versus "crisis" operation periods. In line with prior research, we find that the network topology changes greatly as the interbank market moves from a "normal" to a "crisis" operation period.Comment: 17 pages, 9 figure

    Network Model Selection for Task-Focused Attributed Network Inference

    Full text link
    Networks are models representing relationships between entities. Often these relationships are explicitly given, or we must learn a representation which generalizes and predicts observed behavior in underlying individual data (e.g. attributes or labels). Whether given or inferred, choosing the best representation affects subsequent tasks and questions on the network. This work focuses on model selection to evaluate network representations from data, focusing on fundamental predictive tasks on networks. We present a modular methodology using general, interpretable network models, task neighborhood functions found across domains, and several criteria for robust model selection. We demonstrate our methodology on three online user activity datasets and show that network model selection for the appropriate network task vs. an alternate task increases performance by an order of magnitude in our experiments
    • 

    corecore