Toward Reasoning Methods for Automatic Mechanical Repair
A knowledge representation scheme, QUORUM (Qualitative reasoning Of Repair and Understanding of Mechanisms), has been constructed to apply qualitative techniques to the mechanical domain, an area that has been neglected in the qualitative reasoning field. In addition, QUORUM aims to provide a foundation for building a repair expert system.
The difficulty in constructing such a representation lies in identifying a suitable ontology with which we can express the behavior of mechanical devices and, more importantly, the faulty behaviors of a device and their causes. Unlike most other approaches, our ontology employs the notions of force and energy transfer and of motion propagation. We discuss how the overall behavior of a device can be derived from knowledge of its structure and topology, and how faulty behaviors can be predicted from information about perturbations of the device's original conditions. We construct the predicates and functions needed to express the physical properties of a wide variety of basic and complex mechanisms, as well as the connection relationships among their parts. Examples analyzed with QUORUM include a pair of gears, a spring-driven ratchet mechanism, and a pendulum clock. An algorithm for the propagation of force, motion, and causality is proposed and examined.
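The abstract does not include the algorithm itself, but the idea of deriving overall behavior by propagating motion through a device's connection topology can be sketched. Below is a minimal, illustrative Python sketch (not QUORUM's actual representation; the graph encoding and all names are assumptions) that propagates rotary motion through a gear train:

```python
# Minimal sketch (not QUORUM itself): propagating rotary motion through a
# gear train represented as a connection graph. Meshed gears reverse the
# direction of rotation and scale angular speed by the tooth ratio; gears
# on a shared shaft rotate together. All names here are illustrative.
from collections import deque

def propagate_motion(parts, connections, driver, direction, speed):
    """parts: {name: tooth_count}; connections: [(a, b, kind)] with
    kind in {"mesh", "shaft"}; returns {part: (direction, speed)}."""
    neighbors = {}
    for a, b, kind in connections:
        neighbors.setdefault(a, []).append((b, kind))
        neighbors.setdefault(b, []).append((a, kind))
    state = {driver: (direction, speed)}
    queue = deque([driver])
    while queue:
        cur = queue.popleft()
        d, s = state[cur]
        for nxt, kind in neighbors.get(cur, []):
            if nxt in state:
                continue  # already reached; a full system would check consistency
            if kind == "mesh":  # external gear mesh: reversed, scaled by tooth ratio
                state[nxt] = (-d, s * parts[cur] / parts[nxt])
            else:               # same shaft: identical motion
                state[nxt] = (d, s)
            queue.append(nxt)
    return state

# A pair of gears: a 20-tooth driver at +100 rpm meshing with a 40-tooth gear.
print(propagate_motion({"g1": 20, "g2": 40}, [("g1", "g2", "mesh")],
                       driver="g1", direction=+1, speed=100.0))
# g2 turns the opposite way at 50 rpm
```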
The Evaluation of a Hybrid Critiquing System with Preference-based Recommendations Organization
The critiquing-based recommender system mainly aims to guide users to make an accurate and confident decision while requiring a low level of effort from them. We have previously found that a hybrid critiquing system combining the strengths of both system-proposed critiques and a user self-motivated critiquing facility can substantially improve users' subjective perceptions, such as their decision confidence and trusting intentions. In this paper, we continue to investigate how to further reduce users' objective decision effort (e.g., time consumption) in such a system by increasing the prediction accuracy of the system-proposed critiques. By means of a real-user evaluation, we show that a new hybrid critiquing system design that integrates the preference-based recommendations organization technique for critique suggestion can effectively increase the application frequency of the proposed critiques and significantly reduce users' task time and interaction effort.
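For illustration, here is a hedged sketch of the general idea behind ranking system-proposed critiques by a preference model; the paper's actual preference-based organization technique differs in detail, and all feature names, weights, and data below are hypothetical:

```python
# Illustrative sketch only: ranking candidate critiques by how well the
# products that satisfy them match the user's (weighted additive) preference
# model. This mirrors the spirit of preference-based critique suggestion;
# the exact algorithm in the paper differs in detail.
def utility(item, weights):
    # Weighted additive utility over normalized feature values in [0, 1].
    return sum(w * item[f] for f, w in weights.items())

def rank_critiques(items, current, weights, candidate_critiques):
    """candidate_critiques: {label: predicate(item)}; returns labels sorted by
    the average utility of the remaining items each critique retains."""
    scored = []
    for label, pred in candidate_critiques.items():
        satisfying = [it for it in items if pred(it) and it is not current]
        if satisfying:
            avg = sum(utility(it, weights) for it in satisfying) / len(satisfying)
            scored.append((avg, label))
    return [label for avg, label in sorted(scored, reverse=True)]

laptops = [
    {"cheapness": 0.9, "lightness": 0.3},
    {"cheapness": 0.4, "lightness": 0.8},
    {"cheapness": 0.7, "lightness": 0.6},
]
current = laptops[2]
weights = {"cheapness": 0.4, "lightness": 0.6}
critiques = {
    "lighter": lambda it: it["lightness"] > current["lightness"],
    "cheaper": lambda it: it["cheapness"] > current["cheapness"],
}
print(rank_critiques(laptops, current, weights, critiques))
# ['lighter', 'cheaper'] under these weights
```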
Interaction design guidelines on critiquing-based recommender systems
A critiquing-based recommender system acts like an artificial salesperson. It engages users in a conversational dialog in which they can give feedback, in the form of critiques, on the sample items shown to them. This feedback, in turn, enables the system to refine its understanding of the user's preferences and its prediction of what the user truly wants, so that it can recommend products that may better stimulate the user's interest in the next interaction cycle. In this paper, we report an extensive investigation comparing various approaches to devising critiquing opportunities in these recommender systems. More specifically, we investigated two design elements that are essential to a critiquing-based recommender system: critiquing coverage (one vs. multiple items returned for critiquing in each recommendation cycle) and critiquing aid (system-suggested critiques, i.e., a set of critique suggestions for users to select, vs. a user-initiated critiquing facility that lets users create critiques on their own). Through a series of three user trials, we measured how real users reacted to systems with varied setups of these two elements. In particular, we found that letting users critique one of multiple items (as opposed to just one) significantly increases decision accuracy (particularly in the first recommendation cycle) and saves objective effort (in the later critiquing cycles). As for critiquing aids, a hybrid design with both system-suggested critiques and user-initiated critiquing support performed best at inspiring users' decision confidence and increasing their intention to return, compared with either approach alone. The results of our studies thus shed light on design guidelines for finding the sweet spot between user initiative and system support in the development of an effective and user-centric critiquing-based recommender system.
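To make the critiquing cycle itself concrete, the following toy sketch (not taken from the systems studied; the catalog, feature names, and ranking rule are assumptions) shows how a single user-initiated unit critique can filter and re-rank candidates for the next cycle:

```python
# A toy critiquing cycle (illustrative only): apply a user-initiated unit
# critique to the current reference item, keep the catalog items that
# satisfy it, and show the top-k survivors for the next cycle.
def apply_critique(catalog, reference, feature, direction, k=3):
    """direction: +1 for 'more of this feature', -1 for 'less'."""
    survivors = [it for it in catalog
                 if direction * (it[feature] - reference[feature]) > 0]
    # Rank survivors by how far they move in the requested direction.
    survivors.sort(key=lambda it: direction * it[feature], reverse=True)
    return survivors[:k]

cameras = [{"id": i, "zoom": z} for i, z in enumerate([3, 5, 8, 10, 12])]
print(apply_critique(cameras, reference=cameras[1], feature="zoom", direction=+1))
# the three cameras with more zoom than the reference's 5x
```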
An evaluation of semantic fisheye views for opportunistic search in an annotated image collection
Visual interfaces are potentially powerful tools for users to explore a representation of a collection and opportunistically discover information that will guide them toward relevant documents. Semantic fisheye views (SFEVs) are focus + context visualization techniques that manage visual complexity by selectively emphasizing and increasing the detail of information related to the user's focus, while deemphasizing or filtering less important information. In this paper we describe a prototype for visualizing an annotated image collection and an experiment comparing the effectiveness of two distinctly different SFEVs for a complex opportunistic search task. The first SFEV calculates relevance based on keyword-content similarity; the second, based on conceptual relationships between images derived using WordNet. The results of the experiment suggest that semantically guided search is significantly more effective than similarity-guided search for discovering and using domain knowledge in a collection.
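As a rough illustration of the two relevance measures being compared, the sketch below contrasts keyword overlap with conceptual closeness computed over a tiny hand-coded hypernym map standing in for WordNet; the prototype's actual WordNet-based computation is more elaborate, and all terms and scores here are illustrative:

```python
# Sketch of the two relevance measures, with a toy hand-coded hypernym map
# standing in for WordNet (the real prototype derives conceptual distance
# from WordNet relations). All data is illustrative.
HYPERNYMS = {"poodle": "dog", "beagle": "dog", "dog": "animal",
             "cat": "animal", "animal": "entity"}

def ancestors(term):
    chain = [term]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain

def keyword_similarity(focus_terms, image_terms):
    # Similarity-guided SFEV: plain keyword overlap (Jaccard).
    a, b = set(focus_terms), set(image_terms)
    return len(a & b) / len(a | b) if a | b else 0.0

def concept_similarity(focus_terms, image_terms):
    # Semantically guided SFEV: closeness in the concept hierarchy, scored
    # by the number of steps to the terms' nearest common ancestor.
    best = 0.0
    for f in focus_terms:
        for t in image_terms:
            fa, ta = ancestors(f), ancestors(t)
            shared = set(fa) & set(ta)
            if shared:
                steps = min(fa.index(s) + ta.index(s) for s in shared)
                best = max(best, 1.0 / (1.0 + steps))
    return best

focus, image = ["poodle"], ["beagle"]
print(keyword_similarity(focus, image))  # 0.0  -- no shared keywords
print(concept_similarity(focus, image))  # 0.33 -- related through "dog"
```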
Is ChatGPT More Empathetic than Humans?
This paper investigates the empathetic responding capabilities of ChatGPT, particularly its latest iteration, GPT-4, in comparison to human-generated responses across a wide range of emotional scenarios, both positive and negative. We employ a rigorous evaluation methodology, involving a between-groups study with 600 participants, to evaluate the level of empathy in responses generated by humans and by ChatGPT. ChatGPT is prompted in two distinct ways: a standard approach and one explicitly detailing the cognitive, affective, and compassionate components of empathy. Our findings indicate that the average empathy rating of responses generated by ChatGPT exceeds that of responses crafted by humans by approximately 10%. Additionally, instructing ChatGPT to incorporate a clear understanding of empathy in its responses makes them align approximately 5 times more closely with the expectations of individuals possessing a high degree of empathy, compared to human responses. The proposed evaluation framework is scalable and adaptable for assessing the empathetic capabilities of newer and updated versions of large language models, eliminating the need to replicate the current study's results in future research.
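As an illustration of the two prompting conditions, the sketch below uses the OpenAI Python client with assumed prompt wording; the paper's exact prompts, scenarios, and model settings are not reproduced here:

```python
# Illustrative sketch of the two prompting conditions described above, using
# the OpenAI Python client. All prompt wording and the scenario below are
# assumptions for demonstration, not the study's materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STANDARD = "Respond to the following message from a friend."
EMPATHY_AWARE = (
    "Respond to the following message from a friend. Show cognitive empathy "
    "(acknowledge their perspective), affective empathy (reflect their "
    "feelings), and compassionate empathy (offer supportive action)."
)

def respond(system_prompt, scenario, model="gpt-4"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": scenario}],
    )
    return resp.choices[0].message.content

scenario = "I just found out I didn't get the job I interviewed for."
print(respond(STANDARD, scenario))
print(respond(EMPATHY_AWARE, scenario))
```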
Approximating Human Evaluation of Social Chatbots with Prompting
Now that powerful conversational models have become available to a wide audience, users have started actively engaging in social interactions with this technology. Such unprecedented interaction experiences may pose considerable social and psychological risks to users unless the technology is properly controlled. This creates an urgent need for scalable and robust evaluation metrics for conversational chatbots. Existing automatic evaluation metrics usually focus on objective quality measures and disregard subjective perceptions of social dimensions. Moreover, most of these approaches operate on pre-produced dialogs from available benchmark corpora, which implies human involvement in preparing the material for evaluation and thus impedes the scalability of the metrics. To address this limitation, we propose to make use of the emerging large language models (LLMs) from the GPT family and describe a new framework for conducting dialog system evaluation with prompting. With this framework, we are able to fully automate the evaluation pipeline and reach impressive correlation with human judgment (up to Pearson r = 0.95 at the system level). The underlying idea is to collect synthetic chat logs of the evaluated bots with an LLM in the other-play setting, where the LLM is carefully conditioned to follow a specific scenario. We further explore different prompting approaches to produce evaluation scores with the same LLM. The best-performing prompts, containing few-shot demonstrations and instructions, show outstanding performance on the tested dataset and demonstrate the ability to generalize to other dialog corpora.
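The following minimal sketch illustrates the general shape of such a pipeline, prompting an LLM with few-shot demonstrations to score dialogs and correlating the scores with human judgments; the prompts, dialogs, and human ratings below are placeholders, not the paper's:

```python
# Minimal sketch of prompted dialog evaluation (illustrative; the paper's
# prompts, rating scales, and models differ). An LLM rates each chat log,
# and the machine scores are correlated with human judgments per system.
from openai import OpenAI
from scipy.stats import pearsonr

client = OpenAI()

FEW_SHOT = """Rate how engaging the chatbot is on a 1-5 scale. Reply with a number only.

Dialog:
User: hi
Bot: hello.
Rating: 2

Dialog:
User: I love hiking
Bot: Me too! What's the best trail you've done lately?
Rating: 5
"""

def llm_score(dialog, model="gpt-4"):
    prompt = FEW_SHOT + f"\nDialog:\n{dialog}\nRating:"
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}])
    return float(resp.choices[0].message.content.strip())

# System-level evaluation: average per-dialog scores for each bot and
# correlate with mean human ratings of the same bots (placeholder values).
bots = {"bot_a": ["User: hi\nBot: hey! how's your day going?"],
        "bot_b": ["User: hi\nBot: hello."],
        "bot_c": ["User: hi\nBot: greetings, human. state your query."]}
human_means = {"bot_a": 4.2, "bot_b": 2.1, "bot_c": 3.0}
machine = [sum(map(llm_score, dialogs)) / len(dialogs) for dialogs in bots.values()]
human = [human_means[b] for b in bots]
r, _ = pearsonr(machine, human)
print(f"system-level Pearson r = {r:.2f}")
```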
Agile preference models based on soft constraints
An accurate model of the user's preferences is a crucial element of most decision support systems. It is often assumed that users have a well-defined and stable set of preferences that can be elicited through a set of questions. However, recent research has shown that people very often construct their preferences on the fly, depending on the available decision options. Thus, their answers to a series of questions posed before seeing the decision options are likely to be inconsistent and to lead to erroneous models. To accurately capture preference expressions as people make them, the preference model must be agile: it should allow decision making with an incomplete preference model, and it should let users add, retract, or revise individual preferences easily. We show how constraint satisfaction, and in particular soft constraints, provides the right formalism for this, and give examples of its implementation in a travel planning tool.
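As a small illustration of what an agile, soft-constraint preference model can look like, here is a hedged Python sketch; the paper works in a formal soft-CSP framework, and the class design and flight data below are assumptions for demonstration:

```python
# Sketch of an agile preference model as weighted soft constraints: each
# preference is a penalty function over options, preferences can be added
# or retracted at any time, and options can be ranked even when the model
# is incomplete. (Illustrative; the paper develops this within soft CSPs.)
class PreferenceModel:
    def __init__(self):
        self.constraints = {}  # name -> (weight, penalty_fn)

    def add(self, name, weight, penalty_fn):
        self.constraints[name] = (weight, penalty_fn)

    def retract(self, name):
        self.constraints.pop(name, None)

    def cost(self, option):
        # Lower total weighted penalty = more preferred.
        return sum(w * fn(option) for w, fn in self.constraints.values())

    def rank(self, options):
        return sorted(options, key=self.cost)

flights = [{"id": "A", "price": 300, "stops": 0},
           {"id": "B", "price": 180, "stops": 2},
           {"id": "C", "price": 220, "stops": 1}]

model = PreferenceModel()
model.add("cheap", 1.0, lambda f: f["price"] / 100)
print([f["id"] for f in model.rank(flights)])  # ['B', 'C', 'A']

# The user sees the options and adds a new preference on the fly...
model.add("direct", 2.0, lambda f: f["stops"])
print([f["id"] for f in model.rank(flights)])  # ['A', 'C', 'B']
# ...and can just as easily retract it.
model.retract("direct")
```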
Evaluating product search and recommender systems for E-commerce environments
Online systems that help users select the most preferred item from a large electronic catalog are known as product search and recommender systems. Evaluating the various proposed technologies is essential for further development in this area. This paper describes the design and implementation of two user studies in which a particular product search tool, known as example critiquing, was evaluated against a chosen baseline model. The results confirm that example critiquing significantly reduces users' task time and error rate while increasing decision accuracy. Additionally, the results of the second user study show that a particular implementation of example critiquing also made users more confident about their choices. The main contribution is that, through these two user studies, an evaluation framework of three criteria was identified that can be used for evaluating general product search and recommender systems in E-commerce environments. The two experiments and their procedures also shed light on some of the most important issues to consider when evaluating such tools, such as the preparation of evaluation materials, user task design, the context of evaluation, the criteria and measures, and the methodology of result analysis.
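For illustration, the sketch below computes plausible versions of the three criteria from hypothetical session logs; the studies' operational definitions are paraphrased (decision accuracy is commonly measured by whether users stand by their choice after reviewing the full catalog), and all data is made up:

```python
# Sketch of the three evaluation criteria (illustrative measures only; the
# studies define them operationally).
def evaluate(sessions):
    """sessions: list of dicts with keys 'initial_choice', 'final_choice',
    'seconds'. Returns (decision_accuracy, error_rate, mean_task_time)."""
    n = len(sessions)
    kept = sum(s["initial_choice"] == s["final_choice"] for s in sessions)
    accuracy = kept / n        # fraction who stood by their choice
    error_rate = 1 - accuracy  # fraction who found a better item later
    mean_time = sum(s["seconds"] for s in sessions) / n
    return accuracy, error_rate, mean_time

logs = [{"initial_choice": "x1", "final_choice": "x1", "seconds": 210},
        {"initial_choice": "x2", "final_choice": "x5", "seconds": 340},
        {"initial_choice": "x3", "final_choice": "x3", "seconds": 180}]
print(evaluate(logs))  # roughly (0.67, 0.33, 243.3)
```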