Towards Task Understanding in Visual Settings
We consider the problem of understanding real world tasks depicted in visual
images. While most existing image captioning methods excel at producing natural
language descriptions of visual scenes involving human tasks, there is often
the need for an understanding of the exact task being undertaken rather than a
literal description of the scene. We leverage insights from real world task
understanding systems, and propose a framework composed of convolutional neural
networks and an external hierarchical task ontology to produce task
descriptions from input images. Detailed experiments highlight the efficacy of
the extracted descriptions, which could potentially find their way into many
applications, including image alt text generation. Comment: Accepted as Student Abstract at 33rd AAAI Conference on Artificial
Intelligence, 201
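The abstract does not say how classifier outputs are matched against the ontology. The following is a minimal sketch, assuming the CNN emits confidence scores for visual labels and the ontology is a child-to-parent mapping. `ONTOLOGY`, `infer_task`, and `threshold` are hypothetical names, and this is not the authors' method. Scores are credited to every ancestor task, and the most specific task with enough support is returned.

```python
from collections import defaultdict

# Hypothetical toy ontology mapping each node to its parent; None marks a top-level task.
ONTOLOGY = {
    "knife": "chopping vegetables",
    "cutting board": "chopping vegetables",
    "chopping vegetables": "cooking",
    "frying pan": "frying food",
    "frying food": "cooking",
    "cooking": None,
}


def ancestors(node, ontology):
    """Yield the ancestors of `node`, nearest parent first."""
    parent = ontology.get(node)
    while parent is not None:
        yield parent
        parent = ontology.get(parent)


def infer_task(label_scores, ontology=ONTOLOGY, threshold=0.5):
    """Return the most specific task whose accumulated label score reaches `threshold`.

    `label_scores` maps visual labels (e.g., CNN detections) to confidence scores.
    """
    support = defaultdict(float)
    for label, score in label_scores.items():
        # Credit every task above the detected label in the hierarchy.
        for task in ancestors(label, ontology):
            support[task] += score

    def depth(task):
        return sum(1 for _ in ancestors(task, ontology))

    eligible = [task for task, total in support.items() if total >= threshold]
    if not eligible:
        return None
    # Prefer deeper (more specific) tasks; break ties by higher support.
    return max(eligible, key=lambda task: (depth(task), support[task]))
```

For example, strong "knife" and "cutting board" detections yield "chopping vegetables" rather than the more general "cooking". Weak, scattered detections fall back to the top-level task.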
PRESCHOOL TEACHERS’ VIEWS TOWARD HOME VISITS IN THE ACTIVITY OF FAMILY PARTICIPATION WITHIN THE PRESCHOOL CURRICULUM
The aim of this study is to determine preschool teachers’ views on home visits. Data were collected with a semi-structured interview form developed by the researcher, which consists of two sections: demographic information and preschool teachers’ views on home visits. Thirty preschool teachers participated in the study. The results suggest that the teachers considered home visits necessary for children with problems, for obtaining information about the child’s home environment, for getting to know the child, and for learning about the family’s relations.
INFORMATION TECHNOLOGY USAGE OF ACCOUNTANTS
The purpose of this article is to investigate the reasons behind accountants’ use of information technology (IT). Based on the Theory of Reasoned Action developed by Ajzen and Fishbein, the study investigates the relationships among attitude, subjective norms, intention, and behavior. Specifically, it examines the effect of attitude and subjective norms towards IT usage behavior on the intention towards IT usage, and the effect of that intention on actual IT usage. Data were obtained from 456 accountants via a questionnaire. Regression analysis shows that the intention towards IT usage has a statistically significant impact on IT usage behavior: when the intention is positive, the behavior is also positive. Attitude and subjective norms towards IT usage behavior also have a statistically significant impact on the intention towards IT usage. When an individual’s attitude and subjective norms towards IT usage are positive, the intention towards IT usage is also positive.
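In the Theory of Reasoned Action, behavioral intention is modeled as a weighted combination of attitude and subjective norms, and intention in turn predicts behavior. The abstract does not give the model specification. The following is a sketch of the two regressions it describes, assuming ordinary linear models. The symbols (ATT, SN, INT, USE, and the coefficients) are illustrative and not the authors' notation.

```latex
$$
\begin{aligned}
\mathit{INT} &= \beta_0 + \beta_1\,\mathit{ATT} + \beta_2\,\mathit{SN} + \varepsilon_1,\\
\mathit{USE} &= \gamma_0 + \gamma_1\,\mathit{INT} + \varepsilon_2 .
\end{aligned}
$$
```

Under this reading, the reported findings correspond to statistically significant positive estimates of $\beta_1$, $\beta_2$, and $\gamma_1$.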
Feedback driven adaptive combinatorial testing
The configuration spaces of modern software systems are too large to test exhaustively. Combinatorial interaction testing (CIT) approaches, such as covering arrays, systematically sample the configuration space and test only the selected configurations. The basic justification for CIT approaches is that they can cost-effectively exercise all system behaviors caused by the settings of t or fewer options. We conjecture, however, that in practice many such behaviors are not actually tested because of masking effects: failures that perturb execution so as to prevent some behaviors from being exercised. In this work we present a feedback-driven, adaptive, combinatorial testing approach aimed at detecting and working around masking effects. At each iteration we detect potential masking effects, heuristically isolate their likely causes, and then generate new covering arrays that allow previously masked combinations to be tested in the subsequent iteration. We empirically assess the effectiveness of the proposed approach on two large, widely used open-source software systems. Our results suggest that masking effects do exist and that our approach provides a promising and efficient way to work around them.
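The following is a simplified sketch of the detection step for t = 2 (pairwise coverage). It treats every option-value combination that was never exercised by a passing configuration as potentially masked. This is an assumption made for illustration. The paper additionally isolates likely failure causes heuristically, which this sketch omits. All function names are hypothetical.

```python
from itertools import combinations, product


def t_way_combinations(config, t=2):
    """All t-way (option, value) combinations exercised by one configuration."""
    items = sorted(config.items())
    return {tuple(combo) for combo in combinations(items, t)}


def required_combinations(domains, t=2):
    """Every t-way combination that a t-way covering array must include."""
    required = set()
    for options in combinations(sorted(domains), t):
        for values in product(*(domains[o] for o in options)):
            required.add(tuple(zip(options, values)))
    return required


def masked_combinations(results, domains, t=2):
    """Combinations not exercised by any passing run, hence possibly masked.

    `results` is a list of (config_dict, passed_bool) pairs from one iteration.
    """
    seen_in_pass = set()
    for config, passed in results:
        if passed:
            seen_in_pass |= t_way_combinations(config, t)
    return required_combinations(domains, t) - seen_in_pass
```

In the next iteration, a new covering array would be generated that targets the returned combinations while avoiding the settings suspected of causing the failure.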
Auditing Search Engines for Differential Satisfaction Across Demographics
Many online services, such as search engines, social media platforms, and
digital marketplaces, are advertised as being available to any user, regardless
of their age, gender, or other demographic factors. However, there are growing
concerns that these services may systematically underserve some groups of
users. In this paper, we present a framework for internally auditing such
services for differences in user satisfaction across demographic groups, using
search engines as a case study. We first explain the pitfalls of naïvely
comparing the behavioral metrics that are commonly used to evaluate search
engines. We then propose three methods for measuring latent differences in user
satisfaction from observed differences in evaluation metrics. To develop these
methods, we drew on ideas from the causal inference literature and the
multilevel modeling literature. Our framework is broadly applicable to other
online services, and provides general insight into interpreting their
evaluation metrics. Comment: 8 pages. Accepted at WWW 201
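One pitfall of naïve comparison is that groups may issue different mixes of queries, so a raw gap in a behavioral metric can reflect query mix rather than satisfaction. The following is a minimal sketch of a simple stratified comparison, not necessarily any of the paper's three methods. It assumes records of (group, stratum, metric), where the stratum is a hypothetical query type. Within-stratum gaps are averaged and weighted by stratum size.

```python
from collections import defaultdict


def mean(xs):
    return sum(xs) / len(xs)


def naive_gap(records, group_a, group_b):
    """Difference in mean metric between two groups, ignoring query mix."""
    a = [m for g, _, m in records if g == group_a]
    b = [m for g, _, m in records if g == group_b]
    return mean(a) - mean(b)


def stratified_gap(records, group_a, group_b):
    """Average within-stratum gap, weighted by each stratum's total size.

    `records` are (group, stratum, metric) triples; strata lacking either group are skipped.
    """
    by_stratum = defaultdict(lambda: defaultdict(list))
    for g, s, m in records:
        by_stratum[s][g].append(m)
    total, weight = 0.0, 0
    for groups in by_stratum.values():
        if group_a in groups and group_b in groups:
            n = len(groups[group_a]) + len(groups[group_b])
            total += n * (mean(groups[group_a]) - mean(groups[group_b]))
            weight += n
    return total / weight if weight else float("nan")
```

In a toy case where one group issues mostly hard queries, `naive_gap` can show a large difference even though `stratified_gap` is zero within every query type.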
Generating Query Suggestions to Support Task-Based Search
We address the problem of generating query suggestions to support users in
completing their underlying tasks (which motivated them to search in the first
place). Given an initial query, these query suggestions should cover the
possible subtasks the user might be looking for. We propose a
probabilistic modeling framework that obtains keyphrases from multiple sources
and generates query suggestions from these keyphrases. Using the test suites of
the TREC Tasks track, we evaluate and analyze each component of our model. Comment: Proceedings of the 40th International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR '17), 201
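The abstract does not specify the model. The following is a minimal sketch of one common way to combine keyphrases from multiple sources probabilistically: a linear mixture $P(k \mid q) = \sum_s P(s \mid q)\,P(k \mid q, s)$ with fixed source weights. The source names, weights, and the "append keyphrase to query" generation step are hypothetical simplifications, not the authors' framework.

```python
from collections import defaultdict


def rank_keyphrases(source_scores, source_weights):
    """Rank keyphrases by the mixture P(k|q) = sum_s P(s|q) * P(k|q, s).

    source_scores: {source: {keyphrase: P(k|q, s)}}; source_weights: {source: P(s|q)}.
    """
    scores = defaultdict(float)
    for source, dist in source_scores.items():
        w = source_weights.get(source, 0.0)
        for phrase, p in dist.items():
            scores[phrase] += w * p
    # Highest probability first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def suggest(query, source_scores, source_weights, k=3):
    """Turn the top-k keyphrases into suggestions by appending them to the query."""
    ranked = rank_keyphrases(source_scores, source_weights)
    return [f"{query} {phrase}" for phrase, _ in ranked[:k]]
```

A keyphrase supported by several sources (e.g., both a query log and web snippets) accumulates probability mass and rises in the ranking, which favors subtasks that are broadly attested.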
MultiWOZ 2.4: A Multi-Domain Task-Oriented Dialogue Dataset with Essential Annotation Corrections to Improve State Tracking Evaluation
The MultiWOZ 2.0 dataset has greatly stimulated research on task-oriented
dialogue systems. However, its state annotations contain substantial noise,
which hinders a proper evaluation of model performance. To address this issue,
massive efforts were devoted to correcting the annotations. Three improved
versions (i.e., MultiWOZ 2.1-2.3) were subsequently released. Nonetheless, there
are still many incorrect and inconsistent annotations. This work
introduces MultiWOZ 2.4, which refines the annotations in the validation set
and test set of MultiWOZ 2.1. The annotations in the training set remain
unchanged (same as MultiWOZ 2.1) to elicit robust and noise-resilient model
training. We benchmark eight state-of-the-art dialogue state tracking models on
MultiWOZ 2.4. All of them perform much better on it than on MultiWOZ
2.1.
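Dialogue state tracking models are usually compared by joint goal accuracy: a turn counts as correct only if the predicted dialogue state matches the gold state exactly. The following is a minimal sketch, assuming each state is a dict from "domain-slot" names to values that has already been normalized (e.g., "none" slots removed). The function name is illustrative.

```python
def joint_goal_accuracy(predictions, references):
    """Fraction of turns whose predicted state matches the gold state exactly.

    Each state is a dict mapping "domain-slot" names to values; a turn is correct
    only if every slot-value pair matches, with no missing or extra slots.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(pred == gold for pred, gold in zip(predictions, references))
    return correct / len(references)
```

Because a single wrong slot invalidates the whole turn, annotation noise in the gold states directly lowers this metric. This is why cleaner validation and test annotations can raise reported scores.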
ASSIST: Towards Label Noise-Robust Dialogue State Tracking
The MultiWOZ 2.0 dataset has greatly boosted research on dialogue state
tracking (DST). However, substantial noise has been discovered in its state
annotations. Such noise brings about huge challenges for training DST models
robustly. Although several refined versions, including MultiWOZ 2.1-2.4, have
been published recently, there are still many noisy labels, especially in
the training set. Besides, it is costly to rectify all the problematic
annotations. In this paper, instead of improving the annotation quality
further, we propose a general framework, named ASSIST (lAbel noiSe-robuSt
dIalogue State Tracking), to train DST models robustly from noisy labels.
ASSIST first generates pseudo labels for each sample in the training set by
using an auxiliary model trained on a small clean dataset, then combines the
generated pseudo labels with the vanilla noisy labels to train the primary
model. We show the validity of ASSIST theoretically. Experimental results also
demonstrate that ASSIST improves the joint goal accuracy of DST by up to
on MultiWOZ 2.0 and on MultiWOZ 2.4, compared to using only
the vanilla noisy labels.
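The following sketches how a pseudo label and a noisy label for one slot might be combined into a single training target. It assumes a convex mixture of the two one-hot labels with a weight `alpha` and a cross-entropy loss against that soft target. Treat it as an illustration under that assumption rather than ASSIST's exact formulation. The function names are hypothetical.

```python
import math


def combine_labels(pseudo, noisy, num_values, alpha=0.5):
    """Soft target mixing a one-hot pseudo label and a one-hot noisy label.

    pseudo, noisy: integer indices of slot values; alpha weights the pseudo label.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    target = [0.0] * num_values
    target[pseudo] += alpha
    target[noisy] += 1.0 - alpha
    return target


def soft_cross_entropy(probs, target):
    """Cross-entropy between a soft target and the model's predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)
```

When the auxiliary model agrees with the annotation, the target collapses to the usual one-hot label. When they disagree, the loss splits its weight between the two candidate values instead of trusting the noisy label outright.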
Evaluating the Cranfield Paradigm for Conversational Search Systems
Due to the sequential and interactive nature of conversations, the
application of traditional Information Retrieval (IR) methods like
the Cranfield paradigm requires stronger assumptions. When building a test collection for ad hoc search, it is fair to assume that the
relevance judgments provided by an annotator correlate well with
the relevance judgments perceived by an actual user of the search
engine. However, when building a test collection for conversational
search, we do not know if it is fair to assume that the relevance judgments provided by an annotator correlate well with the relevance
judgments perceived by an actual user of the conversational search
system. In this paper, we perform a crowdsourcing study to evaluate
the applicability of the Cranfield paradigm to conversational search
systems. Our main aim is to understand the level of agreement, in
terms of user satisfaction, between the users performing a search
task in a conversational search system (i.e., directly assessing the
system) and the users observing the search task being performed
(i.e., indirectly assessing the system). The results of this study are
important because they underpin and guide 1) the development of
more realistic user models and simulators, and 2) the design of more
reliable and robust evaluation measures for conversational search
systems. Our results show that there is a fair agreement between
direct and indirect assessments in terms of user satisfaction and
that these two kinds of assessments share similar conversational
patterns. Indeed, by collecting relevance assessments for each system utterance, we tested several conversational patterns that show
a promising ability to predict user satisfaction.
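The abstract reports "fair agreement" without naming the statistic. On the commonly used Landis and Koch scale, "fair" corresponds to a Cohen's kappa between 0.21 and 0.40. The following is a minimal sketch of that computation, assuming each conversation receives one satisfaction label from a direct assessor and one from an indirect observer. The label values and function names are illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch verbal scale."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

Kappa is preferred over raw percentage agreement here because satisfaction labels are often skewed toward one class, which inflates chance agreement.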