Investigating Bell Inequalities for Multidimensional Relevance Judgments in Information Retrieval
Relevance judgment in Information Retrieval is influenced by multiple factors. These include not only the topicality of the documents but also other user-oriented factors such as trust and user interest. Recent work has identified these factors and classified them into seven dimensions of relevance. In previous work, these relevance dimensions were quantified, and a user's cognitive state with respect to a document was represented as a state vector in a Hilbert space, with each relevance dimension representing a basis. It was observed that, for some documents, relevance dimensions are incompatible when a judgment is made. Since incompatibility is a fundamental feature of Quantum Theory, this motivated us to test the quantum nature of relevance judgments using Bell-type inequalities. However, none of the Bell-type inequalities we tested showed any violation. We discuss our methodology for constructing incompatible bases for documents from real-world query log data, the experiments testing Bell inequalities on this dataset, and possible reasons for the lack of violation.
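The abstract does not say which Bell-type inequalities were tested. As an illustration only, the sketch below checks the standard CHSH form, S = E(a,b) + E(a,b') + E(a',b) − E(a',b'), where any local hidden-variable model satisfies |S| ≤ 2 and quantum mechanics allows up to 2√2. The setting labels (`a`, `a2`, `b`, `b2`) and the encoding of a judgment as a +1/−1 outcome are hypothetical assumptions, not the authors' construction.

```python
# Illustrative CHSH check. The setting labels and the +1/-1 encoding of
# judgments are hypothetical, not taken from the study described above.

def correlation(pairs):
    """Estimate E(x, y) as the mean product of paired +1/-1 outcomes."""
    if not pairs:
        raise ValueError("need at least one trial per setting pair")
    for x, y in pairs:
        if x not in (1, -1) or y not in (1, -1):
            raise ValueError("outcomes must be +1 or -1")
    return sum(x * y for x, y in pairs) / len(pairs)


def chsh_value(trials):
    """Compute S = E(a,b) + E(a,b2) + E(a2,b) - E(a2,b2).

    `trials` maps each pair of measurement settings to its list of outcome pairs.
    """
    return (correlation(trials[("a", "b")])
            + correlation(trials[("a", "b2")])
            + correlation(trials[("a2", "b")])
            - correlation(trials[("a2", "b2")]))


LOCAL_BOUND = 2.0               # max |S| for any local hidden-variable model
TSIRELSON_BOUND = 2 * 2 ** 0.5  # max |S| that quantum mechanics allows


def violates_chsh(s):
    """True if S exceeds the classical (local) bound."""
    return abs(s) > LOCAL_BOUND


if __name__ == "__main__":
    # Hypothetical data in which both judgments always agree.
    agree = [(1, 1), (-1, -1)]
    trials = {
        ("a", "b"): agree,
        ("a", "b2"): agree,
        ("a2", "b"): agree,
        ("a2", "b2"): agree,
    }
    s = chsh_value(trials)
    print(s, violates_chsh(s))
```

With perfectly correlated outcomes S reaches the classical bound of 2 but does not exceed it, which is the kind of "no violation" result the abstract reports.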
On crowdsourcing relevance magnitudes for information retrieval evaluation
Magnitude estimation is a psychophysical scaling technique for the measurement of sensation, where observers assign numbers to stimuli in response to their perceived intensity. We investigate the use of magnitude estimation for judging the relevance of documents for information retrieval evaluation, carrying out a large-scale user study across 18 TREC topics and collecting over 50,000 magnitude estimation judgments using crowdsourcing. Our analysis shows that magnitude estimation judgments can be reliably collected using crowdsourcing, are competitive in terms of assessor cost, and are, on average, rank-aligned with ordinal judgments made by expert relevance assessors. We explore the application of magnitude estimation for IR evaluation, calibrating two gain-based effectiveness metrics, nDCG and ERR, directly from user-reported perceptions of relevance. A comparison of TREC system effectiveness rankings based on binary, ordinal, and magnitude estimation relevance shows substantial variation; in particular, the top systems ranked using magnitude estimation and ordinal judgments differ substantially. Analysis of the magnitude estimation scores shows that this effect is due in part to varying perceptions of relevance: different users have different perceptions of the impact of relative differences in document relevance. These results have direct implications for IR evaluation, suggesting that current assumptions about a single view of relevance being sufficient to represent a population of users are unlikely to hold.
Maddalena, Eddy; Mizzaro, Stefano; Scholer, Falk; Turpin, Andrew
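The abstract mentions calibrating nDCG directly from magnitude estimates but does not give the procedure. A minimal sketch, assuming each assessor's raw magnitudes are first divided by their geometric mean (a common normalization in magnitude estimation work, not confirmed here as the paper's exact method) and the normalized scores are then used as nDCG gains with the usual log2(i + 1) rank discount. The document ids and scores are hypothetical.

```python
import math
from statistics import geometric_mean


def normalize_magnitudes(scores):
    """Rescale one assessor's magnitude estimates by their geometric mean.

    Assumption: all scores are positive. This removes the assessor's arbitrary
    choice of numeric range, so scores from different people become comparable.
    """
    gm = geometric_mean(scores.values())
    return {doc: s / gm for doc, s in scores.items()}


def dcg(gains):
    """Discounted cumulative gain: the gain at rank i (1-based) is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg_at_k(ranking, gain_of, k):
    """nDCG@k for a ranked list of doc ids, using `gain_of` as each doc's gain.

    Unjudged documents get gain 0; the ideal ranking sorts all judged gains descending.
    """
    gains = [gain_of.get(doc, 0.0) for doc in ranking[:k]]
    ideal = sorted(gain_of.values(), reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical raw magnitude estimates from one crowd worker.
    raw = {"d1": 10, "d2": 40, "d3": 160}
    gains = normalize_magnitudes(raw)
    print(ndcg_at_k(["d3", "d2", "d1"], gains, k=3))  # ideal order
    print(ndcg_at_k(["d1", "d2", "d3"], gains, k=3))  # reversed order
```

Because gains come straight from the magnitudes, a document judged four times as relevant contributes four times the gain, rather than the fixed gain steps used with ordinal labels.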
Crowdsourcing for Engineering Design: Objective Evaluations and Subjective Preferences
Crowdsourcing enables designers to reach large numbers of people who may not previously have been considered when designing a new product, and to listen to their input by aggregating their preferences and evaluations over potential designs, with the aim of improving "good" design decisions and catching "bad" ones during the early-stage design process. This approach puts human designers (be they industrial designers, engineers, marketers, or executives) at the forefront, with computational crowdsourcing systems on the back end to aggregate subjective preferences (e.g., which next-generation Brand A design best competes stylistically with next-generation Brand B designs?) or objective evaluations (e.g., which military vehicle design has the best situational awareness?). These crowdsourcing aggregation systems are built using probabilistic approaches that account for the irrationality of human behavior (i.e., violations of reflexivity, symmetry, and transitivity), approximated by modern machine learning algorithms and optimization techniques as necessitated by the scale of the data (millions of data points, hundreds of thousands of dimensions).
This dissertation presents research findings suggesting that current off-the-shelf crowdsourcing aggregation algorithms are unsuitable for real engineering design tasks because expertise is sparse in the crowd, together with methods that mitigate this limitation by incorporating appropriate information for expertise prediction. Next, we introduce and interpret a number of new probabilistic models for crowdsourced design that provide large-scale preference prediction and full design space generation, building on statistical and machine learning techniques such as sampling methods, variational inference, and deep representation learning. Finally, we show how these models and algorithms can advance crowdsourcing systems by hiding the underlying mathematics, which is appropriate yet unwieldy, behind easier-to-use visual interfaces that are practical for engineering design companies and governmental agencies engaged in complex engineering systems design.
PhD, Design Science, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133438/1/aburnap_1.pd
Study of Relevance and Effort across Devices
Relevance judgements are essential for designing information retrieval systems. Traditionally, judgements have been gathered via desktop interfaces. However, with the rise in popularity of smaller devices for information access, it has become imperative to investigate whether desktop-based judgements differ from judgements gathered on mobile devices. Recently, user effort and document usefulness have also emerged as important dimensions for optimizing and evaluating information retrieval systems. Since existing work is limited to desktops, it remains to be seen how these judgements are affected by the user's search device.
In this paper, we address these shortcomings by collecting and analyzing relevance, usefulness and effort judgements on mobiles and desktops. Analysis of these judgements indicates a high agreement rate between desktop and mobile judges for relevance, followed by usefulness and findability. We also found that, compared to mobile judges, desktop judges are likely to spend more time and examine documents in greater depth on non-relevant, not-useful or difficult documents. Based on our findings, we suggest that relevance judgements be gathered via desktops and that effort judgements be collected on each device independently.
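The abstract reports an "agreement rate" between desktop and mobile judges without naming the statistic. As a hedged illustration, the sketch below computes raw percent agreement alongside Cohen's kappa, which corrects for agreement expected by chance; whether the study used kappa is an assumption, and the label values are hypothetical.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two judges gave the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    p_o = percent_agreement(labels_a, labels_b)
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both judges independently pick the same label.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both judges used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical relevance labels (R = relevant, N = non-relevant) for four documents.
    desktop = ["R", "R", "N", "N"]
    mobile = ["R", "N", "N", "N"]
    print(percent_agreement(desktop, mobile), cohens_kappa(desktop, mobile))
```

Here the judges agree on three of four documents, but because both lean towards "N", part of that agreement is expected by chance, so kappa is lower than the raw rate.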
Mining Crowdsourced First Impressions in Online Social Video
While multimedia and social computing research has used crowdsourcing techniques to annotate objects, actions, and scenes in social video sites like YouTube, little work has addressed the crowdsourcing of personal and social traits in online social video or in social media content in general. In this paper, we address the problems of (1) crowdsourcing the annotation of first impressions of video bloggers' (vloggers') personal and social traits in conversational YouTube videos, and (2) mining these impressions with the goal of modeling the interplay of different vlogger facets. First, we design a human annotation task to crowdsource impressions of vloggers that extends a tradition of studies of personality impressions with the addition of attractiveness and mood impressions. Second, we propose a probabilistic framework using topic models to discover prototypical impressions that are data-driven and that combine multiple facets of vloggers. Finally, we address the task of automatically predicting topic impressions using nonverbal and verbal content extracted from videos and comments. Our study of 442 YouTube vlogs and 2,210 annotations collected on Mechanical Turk supports recent literature showing that interpersonal human impressions can be crowdsourced with quality comparable to that reported in social psychology research, and provides insights into the interplay among human first impressions. We also show that topic models are useful for discovering meaningful prototypical impressions that can be validated by humans, and that different topics can be predicted using different sources of information from vloggers' nonverbal and verbal content, as well as comments from the audience.
Understanding relationship quality in hospitality services: A study based on text analytics and partial least squares
Purpose – The purpose of this paper is to analyze the occurrence of terms in order to identify the relevant topics, and then to investigate which areas of hospitality services (based on those topics) are most strongly associated with relationship quality. This research represents an opportunity to fill a gap in the current literature and to clarify the understanding of guests' affective states by evaluating all aspects of their relationship with a hotel.
Design/methodology/approach – This research focuses on naturally expressed opinions to which machine-learning algorithms can be applied: text summarization, sentiment analysis and latent Dirichlet allocation (LDA). Our data set contains 47,172 reviews of 33 hotels located in Las Vegas and registered with Yelp. Component-based structural equation modeling (partial least squares, PLS) is applied with a dual purpose: exploratory and predictive.
Findings – To maintain a truly loyal relationship and to achieve competitive success, hospitality managers must take into account both tangible and intangible features when allocating their marketing efforts to satisfaction-, trust- and commitment-based cues. In addition, applying the PLS predict algorithm demonstrates the out-of-sample predictive performance of our model, which supports its ability to predict new and accurate values for individual cases when further samples are added.
Originality/value – LDA and PLS produce relevant, informative summaries of corpora, and confirm and refine the results of previous literature on relationship quality. Our results are more reliable and accurate than prior statistical results based on limited sample data and on numerical satisfaction ratings alone, providing insights, not indicated in guests' ratings, into how hotels can improve their services.
Qualitative meta-analysis of propensity to trust measurement
In a rapidly changing and dynamic world, individuals' propensity to trust is likely to become an increasingly important facet for understanding human behaviour, yet its measurement has remained largely unexplored. We undertake the first systematic qualitative survey of propensity to trust scales, using qualitative meta-analysis methodology to review the literature (1966–2018) and identifying 26 measures and their applications in 179 studies. Using content analysis, we organise these scales into six thematic areas and discuss the emerging implications. We find that while most of these scales reflect propensity to trust in terms of a positive belief in human nature, other themes include general trust, role expectations, institutional trust, cautiousness and other personality attributes. We reveal significant methodological concerns regarding several scales and argue for a more considered selection of scales for use in research. We examine the case for multidimensionality in measures of propensity to trust used within organisational research. Rather than treating a lack of generalisability of findings in existing organisational studies as purely a problem of measurement design, we instead outline an agenda for further conceptual and empirical study.
A Survey of Quantum Theory Inspired Approaches to Information Retrieval
Since 2004, researchers have been using the mathematical framework of Quantum Theory (QT) in Information Retrieval (IR). QT offers a generalized probability and logic framework. Such a framework has been shown to be capable of unifying the representation, ranking and user cognitive aspects of IR, and to be helpful in developing more dynamic, adaptive and context-aware IR systems. Although quantum-inspired IR is still a growing area, a wide array of work on different aspects of IR has been done and has produced promising results. This paper presents a survey of the research done in this area, aiming to show the landscape of the field and to draw a roadmap of future directions.
- …