    Activity Recognition Using Gazed Text and Viewpoint Information for User Support Systems

    The development of information technology has added many conveniences to our lives. On the other hand, we must deal with various kinds of information, which can be a difficult task for elderly people or those who are unfamiliar with information devices. A technology that recognizes each person’s activity and provides appropriate support based on that activity could be useful for such people. In this paper, we propose a novel fine-grained activity recognition method for user support systems that focuses on identifying the text at which a user is gazing, based on the idea that the content of the text is related to the activity of the user. It is necessary to keep in mind that the meaning of a text depends on its location. To tackle this problem, we propose the simultaneous use of a wearable device and a fixed camera. To obtain the global location of the text, we perform image matching using the local features of the images obtained by these two devices. We then generate a feature vector based on this information and the content of the text. To show the effectiveness of the proposed approach, we performed activity recognition experiments with six subjects in a laboratory environment.
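    The abstract does not include code; as a rough illustration of the local-feature image-matching step it describes, the sketch below matches a wearable-camera frame against a fixed-camera frame with ORB features and estimates a homography, which could then map a gazed text region into the fixed camera's global coordinates. The function name, parameters, and overall pipeline are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (not from the paper): localize content seen by a wearable
# camera within a fixed camera's frame via ORB feature matching + homography.
import cv2
import numpy as np

def locate_in_fixed_view(wearable_img, fixed_img, min_matches=10):
    """Estimate a homography mapping wearable-camera pixels to fixed-camera pixels."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(wearable_img, None)
    kp2, des2 = orb.detectAndCompute(fixed_img, None)
    if des1 is None or des2 is None:
        return None

    # Hamming distance suits ORB's binary descriptors; cross-check drops weak matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    if len(matches) < min_matches:
        return None

    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    # A gaze point p in wearable coordinates maps to the fixed view via
    # cv2.perspectiveTransform(p, H), giving the text's global location.
    return H
```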

    What is the influence of genre during the perception of structured text for retrieval and search?

    This thesis presents an investigation into the value of structured text (or form) in the context of genre within Information Retrieval: in particular, how are these structured texts perceived, and why are they not more heavily used within the Information Retrieval & Search communities? The main motivation is to show the ways in which people can exploit genre within Information Search & Retrieval, in particular during categorisation and search tasks. To do this, it was vital to record and analyse how and why this was done during typical tasks. The literature review highlighted two previous studies (Toms & Campbell 1999a; Watt 2009) that reported pilot studies consisting of genre categorisation and information searching. Both studies, and other findings within the literature review, inspired the work contained within this thesis. Genre is notoriously hard to define, but a very useful framework of Purpose and Form, developed by Yates & Orlikowski (1992), was utilised to design the two user studies reported within the thesis. The two studies consisted of, first, a categorisation task (e-mails) and, second, a set of six simulated situations in Wikipedia, both of which collected quantitative data from eye-tracking experiments as well as qualitative user data. The results of both studies showed the extent to which participants utilised the form features of the stimuli presented: in particular, how these features were used, which ocular behaviours (skimming or scanning) accompanied them, and which features were the most important. The main contributions of this thesis are, first, that the task-based user evaluations employing simulated search scenarios revealed how and why users make decisions while interacting with the textual features of structure and layout within a discourse community, and, second, that an extensive evaluation of the quantitative data revealed the features used by the participants, the effects of the interpretation of genre on the search and categorisation process, and the perceptual processes used in the various communities. This will benefit the redevelopment of information systems. As far as is known, this is the first detailed and systematic investigation into the types of features, the value of form, the perception of features, and the layout of genre, using eye tracking in online communities such as Wikipedia.

    Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models

    Recently, Multimodal Large Language Models (MLLMs) that enable Large Language Models (LLMs) to interpret images through visual instruction tuning have achieved significant success. However, existing visual instruction tuning methods only utilize image-language instruction data to align the language and image modalities, lacking a more fine-grained cross-modal alignment. In this paper, we propose Position-enhanced Visual Instruction Tuning (PVIT), which extends the functionality of MLLMs by integrating an additional region-level vision encoder. This integration promotes a more detailed comprehension of images for the MLLM. In addition, to efficiently achieve a fine-grained alignment between the vision modules and the LLM, we design multiple data generation strategies to construct an image-region-language instruction dataset. Finally, we present both quantitative experiments and qualitative analysis that demonstrate the superiority of the proposed model. Code and data will be released at https://github.com/PVIT-official/PVIT.
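    As a hedged sketch of the region-level integration the abstract describes (not the authors' released code; see the repository above for that), one can picture projecting region-level visual features into the LLM's embedding space and interleaving them with global image tokens and text tokens. All module names and dimensions below are illustrative assumptions.

```python
# Illustrative region-level feature fusion for an MLLM; names and dimensions
# are assumptions, not the PVIT release.
import torch
import torch.nn as nn

class RegionEnhancedFusion(nn.Module):
    def __init__(self, region_dim=1024, image_dim=1024, llm_dim=4096):
        super().__init__()
        # Separate linear projectors align each vision encoder with the LLM space.
        self.image_proj = nn.Linear(image_dim, llm_dim)
        self.region_proj = nn.Linear(region_dim, llm_dim)

    def forward(self, image_feats, region_feats, text_embeds):
        # image_feats:  (B, N_img, image_dim) from the global image encoder
        # region_feats: (B, N_reg, region_dim) from the region-level encoder
        # text_embeds:  (B, N_txt, llm_dim) instruction token embeddings
        img_tokens = self.image_proj(image_feats)
        reg_tokens = self.region_proj(region_feats)
        # Interleave: [image tokens][region tokens][text tokens] as LLM input.
        return torch.cat([img_tokens, reg_tokens, text_embeds], dim=1)

fusion = RegionEnhancedFusion()
x = fusion(torch.randn(1, 256, 1024), torch.randn(1, 8, 1024), torch.randn(1, 32, 4096))
print(x.shape)  # torch.Size([1, 296, 4096])
```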

    Unobtrusive and pervasive video-based eye-gaze tracking

    Eye-gaze tracking has long been considered a desktop technology that finds its use inside the traditional office setting, where the operating conditions may be controlled. Nonetheless, recent advancements in mobile technology and a growing interest in capturing natural human behaviour have motivated an emerging interest in tracking eye movements within unconstrained real-life conditions, referred to as pervasive eye-gaze tracking. This critical review focuses on emerging passive and unobtrusive video-based eye-gaze tracking methods in recent literature, with the aim to identify different research avenues that are being followed in response to the challenges of pervasive eye-gaze tracking. Different eye-gaze tracking approaches are discussed in order to bring out their strengths and weaknesses, and to identify any limitations, within the context of pervasive eye-gaze tracking, that have yet to be considered by the computer vision community.

    Artificial Intelligence: Robots, Avatars, and the Demise of the Human Mediator

    Published in cooperation with the American Bar Association Section of Dispute Resolution.

    Artificial Intelligence: Robots, Avatars and the Demise of the Human Mediator

    As technology has advanced, many have wondered whether (or simply when) artificially intelligent devices will replace the humans who perform complex, interactive, interpersonal tasks such as dispute resolution. Has science now progressed to the point that artificial intelligence devices can replace human mediators, arbitrators, dispute resolvers and problem solvers? Can humanoid robots, attractive avatars and other relational agents create the requisite level of trust and elicit the truthful, perhaps intimate or painful, disclosures often necessary to resolve a dispute or solve a problem? This article will explore these questions. Regardless of whether the reader is convinced that the demise of the human mediator or arbitrator is imminent, one cannot deny that artificial intelligence now has the capability to assume many of the responsibilities currently being performed by alternative dispute resolution (ADR) practitioners. It is fascinating (and perhaps unsettling) to realize the complexity and seriousness of tasks currently delegated to avatars and robots. This article will review some of those delegations and suggest how the artificial intelligence developed to complete those assignments may be relevant to dispute resolution and problem solving. “Relational Agents,” which can have a physical presence such as a robot, be embodied in an avatar, or have no detectable form whatsoever and exist only as software, are able to create long-term socio-economic relationships with users built on trust, rapport and therapeutic goals. Relational agents are interacting with humans in circumstances that have significant consequences in the physical world. These interactions provide insights as to how robots and avatars can participate productively in dispute resolution processes. Can human mediators and arbitrators be replaced by robots and avatars that not only physically resemble humans, but also act, think, and reason like humans? And, to raise a particularly interesting question, can robots, avatars and other relational agents look, move, act, think, and reason even “better” than humans?

    Quality Evaluation of Requirements Models: The Case of Goal Models and Scenarios

    Context: Requirements Engineering approaches provide expressive modelling techniques for requirements elicitation and analysis. Yet these approaches struggle to manage the quality of their models, which causes difficulties in understanding requirements and increases development costs. The models’ quality should be a permanent concern. Objectives: We propose a mixed-method process for the quantitative evaluation of the quality of requirements models and their modelling activities. We applied the process to goal-oriented (i* 1.0 and iStar 2.0) and scenario-based (ARNE and ALCO use case templates) models, to evaluate their usability in terms of appropriateness recognisability and learnability. Using the GQM approach, we defined (bio)metrics about the models and the way stakeholders interact with them. Methods: The (bio)metrics were evaluated through a family of 16 quasi-experiments with a total of 660 participants, who performed creation, modification, understanding, and review tasks on the models. We measured their accuracy, speed, and ease, using metrics of task success, time, and effort, collected with eye tracking, electroencephalography, and electrodermal activity, and participants’ opinion, collected through NASA-TLX. We characterised the participants with GenderMag, a method for evaluating usability with a focus on gender-inclusiveness. Results: For i*, participants had better performance and lower effort when using iStar 2.0, and produced models with lower accidental complexity. For use cases, participants had better performance and lower effort when using ALCO. Participants using a textual representation of requirements had higher performance and lower effort. Overall, the results were best for ALCO, followed by ARNE, iStar 2.0, and i* 1.0. Participants with a comprehensive information-processing style and a conservative attitude towards risk (characteristics more frequently seen in females) took longer to start the tasks but had higher accuracy; their visual and mental effort was also higher. Conclusions: A mixed-method process, with (bio)metric measurements, can provide reliable quantitative information about the success and effort of a stakeholder while working on tasks over different requirements models.
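    The abstract gives no formulas; as a small illustrative sketch (not the authors' pipeline), per-participant performance and workload for one task might be summarised as accuracy, completion time, and a raw NASA-TLX score, conventionally the unweighted mean of the six subscale ratings. All field and function names below are assumptions.

```python
# Illustrative per-task measurement summary (not the paper's actual pipeline).
from dataclasses import dataclass
from statistics import mean

@dataclass
class TaskRecord:
    correct_steps: int   # steps the participant completed correctly
    total_steps: int     # steps in the task
    seconds: float       # completion time
    tlx: dict            # six NASA-TLX subscale ratings, each on a 0-100 scale

def summarize(task: TaskRecord) -> dict:
    # Raw TLX ("RTLX") is commonly reported as the unweighted mean of the six
    # subscales: mental, physical, temporal, performance, effort, frustration.
    return {
        "accuracy": task.correct_steps / task.total_steps,
        "time_s": task.seconds,
        "raw_tlx": mean(task.tlx.values()),
    }

record = TaskRecord(
    correct_steps=9, total_steps=10, seconds=312.0,
    tlx={"mental": 70, "physical": 10, "temporal": 55,
         "performance": 25, "effort": 60, "frustration": 40},
)
print(summarize(record))  # {'accuracy': 0.9, 'time_s': 312.0, 'raw_tlx': 43.33...}
```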