1,434 research outputs found
Activity Recognition Using Gazed Text and Viewpoint Information for User Support Systems
The development of information technology has added many conveniences to our lives. On the other hand, however, we have to deal with various kinds of information, which can be a difficult task for elderly people or those who are not familiar with information devices. A technology that recognizes each person’s activity and provides appropriate support based on that activity could be useful for such people. In this paper, we propose a novel fine-grained activity recognition method for user support systems that focuses on identifying the text at which a user is gazing, based on the idea that the content of the text is related to the activity of the user. It is necessary to keep in mind that the meaning of the text depends on its location. To tackle this problem, we propose the simultaneous use of a wearable device and a fixed camera. To obtain the global location of the text, we perform image matching using the local features of the images obtained by these two devices. Then, we generate a feature vector based on this information and the content of the text. To show the effectiveness of the proposed approach, we performed activity recognition experiments with six subjects in a laboratory environment.
What is the influence of genre during the perception of structured text for retrieval and search?
This thesis presents an investigation into the high value of structured text (or form) in the context of genre within Information Retrieval. In particular, how are these structured texts perceived and why are they not more heavily used within Information Retrieval & Search communities? The main motivation is to show which features of genre people can exploit within Information Search & Retrieval, in particular in categorisation and search tasks. To do this, it was vital to record and analyse how and why this was done during typical tasks. The literature review highlighted two previous studies (Toms & Campbell 1999a; Watt 2009) which have reported pilot studies consisting of genre categorisation and information searching. Both studies and other findings within the literature review inspired the work contained within this thesis. Genre is notoriously hard to define, but a very useful framework of Purpose and Form, developed by Yates & Orlikowski (1992), was utilised to design two user studies for the research reported within the thesis. The two studies consisted of, first, a categorisation task (e-mails), and second, a set of six simulated situations in Wikipedia, both of which collected quantitative data from eye tracking experiments as well as qualitative user data. The results of both studies showed the extent to which the participants utilised the form features of the stimuli presented, in particular, how these were used, which ocular behaviours (skimming or scanning) and actual features were used, and which were the most important.
The main contributions to research made by this thesis were, first of all, that the task-based user evaluations employing simulated search scenarios revealed how and why users make decisions while interacting with the textual features of structure and layout within a discourse community, and, secondly, an extensive evaluation of the quantitative data revealed the features that were used by the participants in the user studies and the effects of the interpretation of genre in the search and categorisation process as well as the perceptual processes used in the various communities. This will be of benefit for the re-development of information systems. As far as is known, this is the first detailed and systematic investigation into the types of features, value of form, perception of features, and layout of genre using eye tracking in online communities, such as Wikipedia.
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
Recently, Multimodal Large Language Models (MLLMs) that enable Large Language
Models (LLMs) to interpret images through visual instruction tuning have
achieved significant success. However, existing visual instruction tuning
methods only utilize image-language instruction data to align the language and
image modalities, lacking a more fine-grained cross-modal alignment. In this
paper, we propose Position-enhanced Visual Instruction Tuning (PVIT), which
extends the functionality of MLLMs by integrating an additional region-level
vision encoder. This integration promotes a more detailed comprehension of
images for the MLLM. In addition, to efficiently achieve a fine-grained
alignment between the vision modules and the LLM, we design multiple data
generation strategies to construct an image-region-language instruction
dataset. Finally, we present both quantitative experiments and qualitative
analysis that demonstrate the superiority of the proposed model. Code and data
will be released at https://github.com/PVIT-official/PVIT
Unobtrusive and pervasive video-based eye-gaze tracking
Eye-gaze tracking has long been considered a desktop technology that finds its use inside the traditional office setting, where the operating conditions may be controlled. Nonetheless, recent advancements in mobile technology and a growing interest in capturing natural human behaviour have motivated an emerging interest in tracking eye movements within unconstrained real-life conditions, referred to as pervasive eye-gaze tracking. This critical review focuses on emerging passive and unobtrusive video-based eye-gaze tracking methods in recent literature, with the aim of identifying different research avenues that are being followed in response to the challenges of pervasive eye-gaze tracking. Different eye-gaze tracking approaches are discussed in order to bring out their strengths and weaknesses, and to identify any limitations, within the context of pervasive eye-gaze tracking, that have yet to be considered by the computer vision community.
Artificial Intelligence: Robots, Avatars, and the Demise of the Human Mediator
Published in cooperation with the American Bar Association Section of Dispute Resolution
Artificial Intelligence: Robots, Avatars and the Demise of the Human Mediator
As technology has advanced, many have wondered whether (or simply when) artificially intelligent devices will replace the humans who perform complex, interactive, interpersonal tasks such as dispute resolution. Has science now progressed to the point that artificial intelligence devices can replace human mediators, arbitrators, dispute resolvers and problem solvers? Can humanoid robots, attractive avatars and other relational agents create the requisite level of trust and elicit the truthful, perhaps intimate or painful, disclosures often necessary to resolve a dispute or solve a problem? This article will explore these questions. Regardless of whether the reader is convinced that the demise of the human mediator or arbitrator is imminent, one cannot deny that artificial intelligence now has the capability to assume many of the responsibilities currently being performed by alternative dispute resolution (ADR) practitioners. It is fascinating (and perhaps unsettling) to realize the complexity and seriousness of tasks currently delegated to avatars and robots. This article will review some of those delegations and suggest how the artificial intelligence developed to complete those assignments may be relevant to dispute resolution and problem solving. “Relational Agents,” which can have a physical presence such as a robot, be embodied in an avatar, or have no detectable form whatsoever and exist only as software, are able to create long-term socio-emotional relationships with users built on trust, rapport and therapeutic goals. Relational agents are interacting with humans in circumstances that have significant consequences in the physical world. These interactions provide insights as to how robots and avatars can participate productively in dispute resolution processes. Can human mediators and arbitrators be replaced by robots and avatars that not only physically resemble humans, but also act, think, and reason like humans?
And to raise a particularly interesting question, can robots, avatars and other relational agents look, move, act, think, and reason even “better” than humans?
Quality Evaluation of Requirements Models: The Case of Goal Models and Scenarios
Context: Requirements Engineering approaches provide expressive model techniques
for requirements elicitation and analysis. Yet, these approaches struggle to manage the
quality of their models, causing difficulties in understanding requirements, and increasing
development costs. The models’ quality should be a permanent concern. Objectives: We
propose a mixed-method process for the quantitative evaluation of the quality of requirements
models and their modelling activities. We applied the process to goal-oriented (i*
1.0 and iStar 2.0) and scenario-based (ARNE and ALCO use case templates) models, to
evaluate their usability in terms of appropriateness recognisability and learnability. We
defined (bio)metrics about the models and the way stakeholders interact with them, with
the GQM approach. Methods: The (bio)metrics were evaluated through a family of 16
quasi-experiments with a total of 660 participants. They performed creation, modification,
understanding, and review tasks on the models. We measured their accuracy, speed,
and ease, using metrics of task success, time, and effort, collected with eye-tracking,
electroencephalography and electro-dermal activity, and participants’ opinion, through
NASA-TLX. We characterised the participants with GenderMag, a method for evaluating
usability with a focus on gender-inclusiveness. Results: For i*, participants had better
performance and lower effort when using iStar 2.0, and produced models with lower accidental
complexity. For use cases, participants had better performance and lower effort
when using ALCO. Participants using a textual representation of requirements had higher
performance and lower effort. The results were better for ALCO, followed by ARNE, iStar
2.0, and i* 1.0. Participants with a comprehensive information processing style and a conservative
attitude towards risk (characteristics that are frequently seen in females) took
longer to start the tasks but had a higher accuracy. The visual and mental effort was also
higher for these participants. Conclusions: A mixed-method process, with (bio)metric
measurements, can provide reliable quantitative information about the success and effort
of a stakeholder while working on different requirements models’ tasks.