On the evaluation of aggregated web search
Aggregating search results from a variety of heterogeneous sources, or so-called verticals, such as news, image and video into a single interface is a popular paradigm in web search. This search paradigm is commonly referred to as aggregated search. The heterogeneity of the information, the richer user interaction and the more complex presentation strategy make the evaluation of the aggregated search paradigm quite challenging. The Cranfield paradigm, the use of test collections and evaluation measures to assess the effectiveness of information retrieval (IR) systems, is the de facto standard evaluation strategy in the IR research community, and it has its origins in work dating to the early 1960s. This thesis focuses on applying this evaluation paradigm to the context of aggregated web search, contributing to the long-term goal of a complete, reproducible and reliable evaluation methodology for aggregated search in the research community.
The Cranfield paradigm for aggregated search consists of building a test collection and developing a set of evaluation metrics. In the context of aggregated search, a test collection should contain results from a set of verticals, some information needs relating to this task and a set of relevance assessments. The metrics proposed should utilize the information in the test collection in order to measure the performance of any aggregated search page. The more complex user behavior of aggregated search should be reflected in the test collection through assessments and modeled in the metrics.
Therefore, firstly, we aim to better understand the factors involved in determining relevance for aggregated search and subsequently build a reliable and reusable test collection for this task. By conducting several user studies to assess vertical relevance and creating a test collection by reusing existing test collections, we create a testbed with both the vertical-level (user orientation) and document-level relevance assessments. In addition, we analyze the relationship between both types of assessments and find that they are correlated in terms of measuring the system performance for the user.
Secondly, by utilizing the created test collection, we aim to investigate how to model the aggregated search user in a principled way in order to propose reliable, intuitive and trustworthy evaluation metrics to measure the user experience. We start our investigations by studying the evaluation of a single key component of aggregated search: vertical selection, i.e. selecting the relevant verticals. Then we propose a general utility-effort framework to evaluate the final aggregated search pages. We demonstrate the fidelity (predictive power) of the proposed metrics by correlating them to the user preferences of aggregated search pages. Furthermore, we meta-evaluate the reliability and intuitiveness of a variety of metrics and show that our proposed aggregated search metrics are the most reliable and intuitive, compared to adapted diversity-based and traditional IR metrics.
To summarize, in this thesis, we mainly demonstrate the feasibility of applying the Cranfield paradigm to aggregated search for reproducible, cheap, reliable and trustworthy evaluation.
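One simple way to picture the fidelity check described above is to count how often a metric ranks a pair of result pages in the same order as users did. The sketch below is only an illustration under stated assumptions: the function name `preference_agreement`, the page identifiers, and the rule of skipping tied scores are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple


def preference_agreement(
    scores: Dict[str, float],
    preferences: List[Tuple[str, str]],
) -> float:
    """Fraction of user-judged page pairs on which the metric agrees.

    scores      -- hypothetical metric score for each page id
    preferences -- (preferred_page, other_page) pairs from user judgments
    Pairs where the metric assigns equal scores are skipped (an assumption).
    """
    judged = 0
    agreed = 0
    for preferred, other in preferences:
        if scores[preferred] == scores[other]:
            continue  # the metric expresses no preference for this pair
        judged += 1
        if scores[preferred] > scores[other]:
            agreed += 1
    if judged == 0:
        raise ValueError("no pairs with distinct metric scores")
    return agreed / judged


if __name__ == "__main__":
    page_scores = {"A": 0.8, "B": 0.5, "C": 0.5}
    user_prefs = [("A", "B"), ("C", "A"), ("B", "C")]
    print(preference_agreement(page_scores, user_prefs))
```

A higher agreement rate suggests the metric tracks user preferences more closely; a rank correlation coefficient would be an alternative summary.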
Evaluation of e-learning web sites using fuzzy axiomatic design based approach
A high-quality web site has been generally recognized as a critical enabler of online business. Numerous studies exist in the literature to measure business performance in relation to web site quality. In this paper, an axiomatic design based approach for fuzzy group decision making is adopted to evaluate the quality of e-learning web sites. Another multi-criteria decision making technique, namely fuzzy TOPSIS, is applied in order to validate the outcome. The methodology proposed in this paper has the advantage of incorporating requirements and enabling reductions in the problem size, as compared to fuzzy TOPSIS. A case study focusing on Turkish e-learning web sites is presented, and based on the empirical findings, managerial implications and recommendations for future research are offered.
Long-Term Load Forecasting Considering Volatility Using Multiplicative Error Model
Long-term load forecasting plays a vital role for utilities and planners in
terms of grid development and expansion planning. An overestimate of long-term
electricity load will result in substantial wasted investment in the
construction of excess power facilities, while an underestimate of future load
will result in insufficient generation and unmet demand. This paper presents a
first-of-its-kind approach that uses a multiplicative error model (MEM) to
forecast load over a long-term horizon. MEM originates from the structure of the
autoregressive conditional heteroscedasticity (ARCH) model where conditional
variance is dynamically parameterized and it multiplicatively interacts with an
innovation term of the time series. Historical load data, accessed from a U.S.
regional transmission operator, and recession data for the years 1993-2016 are
used in this study. The superiority of considering volatility is proven by
out-of-sample forecast results as well as directional accuracy during the great
economic recession of 2008. To incorporate future volatility, backtesting of
the MEM is performed. Two performance indicators used to assess the proposed
model are mean absolute percentage error (for both in-sample model fit and
out-of-sample forecasts) and directional accuracy.
Comment: 19 pages, 11 figures, 3 tables
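The two performance indicators named in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function names are hypothetical, and the directional-accuracy definition used here (the forecast's move from the previous actual value has the same sign as the actual move) is one common convention that the abstract does not spell out.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def directional_accuracy(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Share of steps where forecast and actual move in the same direction.

    Assumption: both moves are measured relative to the previous actual value.
    """
    if len(actual) != len(forecast) or len(actual) < 2:
        raise ValueError("need at least two aligned observations")
    hits = 0
    for t in range(1, len(actual)):
        actual_move = actual[t] - actual[t - 1]
        forecast_move = forecast[t] - actual[t - 1]
        if actual_move * forecast_move > 0:
            hits += 1
    return hits / (len(actual) - 1)


if __name__ == "__main__":
    load = [100.0, 110.0, 105.0, 120.0]      # made-up illustrative values
    predicted = [100.0, 108.0, 112.0, 118.0]  # made-up illustrative values
    print(round(mape(load, predicted), 3))
    print(directional_accuracy(load, predicted))
```

MAPE measures how large the errors are, while directional accuracy measures whether the forecast captures turning points such as a recession-driven decline, which is why the two are reported together.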
Beyond the Power Law: Uncovering Stylized Facts in Interbank Networks
We use daily data on bilateral interbank exposures and monthly bank balance
sheets to study network characteristics of the Russian interbank market over
Aug 1998 - Oct 2004. Specifically, we examine the distributions of (un)directed
(un)weighted degree, nodal attributes (bank assets, capital and
capital-to-assets ratio) and edge weights (loan size and counterparty
exposure). We search for the theoretical distribution that fits the data best
and report the "best" fit parameters. We observe that all studied distributions
are heavy tailed. The fat tail typically contains 20% of the data and is
mostly well described by a truncated power law. The power law, stretched
exponential and log-normal also provide reasonably good fits to the tails of
the data. In most cases, however, separating the bulk and tail parts of the data is
hard, so we proceed to study the full range of the events. We find that the
stretched exponential and the log-normal distributions fit the full range of
the data best. These conclusions are robust to 1) whether we aggregate the data
over a week, month, quarter or year; 2) whether we look at the "growth" versus
"maturity" phases of interbank market development; and 3) with minor
exceptions, whether we look at the "normal" versus "crisis" operation periods.
In line with prior research, we find that the network topology changes greatly
as the interbank market moves from a "normal" to a "crisis" operation period.
Comment: 17 pages, 9 figures
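To make the tail-fitting idea concrete, the sketch below estimates the exponent of a plain (untruncated) continuous power law on the largest 20% of a sample, using the standard maximum-likelihood estimator alpha = 1 + n / sum(ln(x_i / x_min)). It is an illustration under assumptions: the abstract does not say which estimator the authors used, the function name `fit_power_law_tail` is hypothetical, and the truncated power law, stretched exponential and log-normal fits are not covered here.

```python
import math
from typing import Iterable, Tuple


def fit_power_law_tail(values: Iterable[float], tail_fraction: float = 0.2) -> Tuple[float, float]:
    """Return (x_min, alpha) for a continuous power law fitted to the upper tail.

    The tail is taken to be the largest `tail_fraction` of the positive
    observations (20% by default, mirroring the share quoted in the abstract),
    and x_min is the smallest observation inside that tail.
    """
    data = sorted(v for v in values if v > 0)
    n_tail = max(2, int(round(len(data) * tail_fraction)))
    if len(data) < n_tail:
        raise ValueError("not enough positive observations")
    tail = data[-n_tail:]
    x_min = tail[0]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("tail values are all identical")
    alpha = 1.0 + len(tail) / log_sum
    return x_min, alpha


if __name__ == "__main__":
    # Deterministic Pareto quantiles with exponent 2.5 (synthetic data).
    n, true_alpha = 5000, 2.5
    sample = [(1.0 - (i + 0.5) / n) ** (-1.0 / (true_alpha - 1.0)) for i in range(n)]
    x_min, alpha = fit_power_law_tail(sample)
    print(x_min, alpha)
```

Comparing candidate distributions, as the abstract describes, would additionally require fitting each alternative and a goodness-of-fit or likelihood comparison.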
An evaluation methodology for ergonomic design of electronic consumer products based on fuzzy axiomatic design
This article is posted with permission of OCP Science imprint. Copyright © 2008 Old City Publishing Group. The development life cycle of software and electronic products has been shortened by the growth of rapid prototyping techniques. The evaluation of electronic consumer products should consider hardware and software as well as the ergonomic usability, emotional appeal and aesthetic integrity of the design. This research follows a systematic approach to develop an evaluation methodology for the ergonomic design of electronic mobile products. The proposed methodology is based on fuzzy multi-attribute decision making and fuzzy axiomatic design, realized in three steps: determination of ergonomic attributes for electronic consumer products, determination of a representative set of alternatives, and selection of the best alternative in terms of ergonomic design by utilizing fuzzy axiomatic design. A case study is also provided to support the proposed methodology.
Network Model Selection for Task-Focused Attributed Network Inference
Networks are models representing relationships between entities. Often these
relationships are explicitly given, or we must learn a representation which
generalizes and predicts observed behavior in underlying individual data (e.g.
attributes or labels). Whether given or inferred, choosing the best
representation affects subsequent tasks and questions on the network. This work
focuses on model selection to evaluate network representations from data,
focusing on fundamental predictive tasks on networks. We present a modular
methodology using general, interpretable network models, task neighborhood
functions found across domains, and several criteria for robust model
selection. We demonstrate our methodology on three online user activity
datasets and show that network model selection for the appropriate network task
vs. an alternate task increases performance by an order of magnitude in our
experiments.