2,387 research outputs found

    Personalization in cultural heritage: the road travelled and the one ahead

    Get PDF
    Over the last 20 years, cultural heritage has been a favored domain for personalization research. For years, researchers have experimented with the cutting-edge technology of the day; now, with the convergence of internet and wireless technology, and the increasing adoption of the Web as a platform for publishing information, the visitor is able to exploit cultural heritage material before, during, and after the visit, with different goals and requirements in each phase. However, cultural heritage sites have a huge amount of information to present, which must be filtered and personalized so that the individual user can easily access it. Personalization of cultural heritage information requires a system that can model the user (e.g., interests, knowledge, and other personal characteristics) as well as contextual aspects, select the most appropriate content, and deliver it in the most suitable way. Achieving this is extremely challenging in the case of first-time users, such as tourists who visit a cultural heritage site for the first (and perhaps only) time in their lives. In addition, as tourism is a social activity, adapting to the individual is not enough: groups and communities have to be modeled and supported as well, taking into account their mutual interests, previous mutual experience, and requirements. How to model and represent the user(s) and the context of the visit, and how to reason about the information that is available, are the challenges faced by researchers in personalization of cultural heritage. Notwithstanding the effort invested so far, a definitive solution is far from being reached, mainly because new technology and new aspects of personalization are constantly being introduced. This article surveys the research in this area. Starting from the earlier systems, which presented cultural heritage information in kiosks, it traces the evolution of personalization techniques through museum websites, virtual collections and mobile guides, up to the recent extension of cultural heritage toward the Semantic and Social Web. The paper concludes with current challenges and points out areas where future research is needed.
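    To make the survey's notion of user and context modeling concrete, here is a minimal sketch of content selection driven by a user model; all types, fields, and the scoring heuristic are illustrative assumptions, not taken from any surveyed system.

```python
from dataclasses import dataclass, field

# Minimal sketch of user-model-driven content selection; all names and the
# scoring heuristic below are invented for illustration.

@dataclass
class UserModel:
    interests: dict = field(default_factory=dict)   # topic -> interest weight
    knowledge: dict = field(default_factory=dict)   # topic -> familiarity in [0, 1]

@dataclass
class Exhibit:
    title: str
    topics: list
    detail_level: float   # 0 = introductory, 1 = expert

def score(exhibit: Exhibit, user: UserModel, visit_phase: str) -> float:
    """Rank by interest overlap, penalizing detail the user lacks background for."""
    interest = sum(user.interests.get(t, 0.0) for t in exhibit.topics)
    background = min((user.knowledge.get(t, 0.0) for t in exhibit.topics), default=0.0)
    mismatch = max(0.0, exhibit.detail_level - background)
    bonus = 0.2 if visit_phase == "during" else 0.0   # favor on-site content mid-visit
    return interest - mismatch + bonus

def personalize(exhibits, user, visit_phase, k=3):
    return sorted(exhibits, key=lambda e: score(e, user, visit_phase), reverse=True)[:k]

visitor = UserModel(interests={"impressionism": 1.0}, knowledge={"impressionism": 0.3})
gallery = [Exhibit("Monet room", ["impressionism"], 0.2),
           Exhibit("Conservation lab", ["restoration"], 0.8)]
print([e.title for e in personalize(gallery, visitor, "during")])  # Monet room first
```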

    Learning to Retrieve Videos by Asking Questions

    Full text link
    The majority of traditional text-to-video retrieval systems operate in static environments, i.e., there is no interaction between the user and the agent beyond the initial textual query provided by the user. This can be sub-optimal if the initial query is ambiguous, leading to many falsely retrieved videos. To overcome this limitation, we propose a novel framework for Video Retrieval using Dialog (ViReD), which enables the user to interact with an AI agent via multiple rounds of dialog, refining the retrieved results by answering questions generated by the agent. Our novel multimodal question generator learns to ask questions that maximize subsequent video retrieval performance, using (i) the video candidates retrieved during the last round of interaction with the user and (ii) the text-based dialog history documenting all previous interactions, to generate questions that incorporate both visual and linguistic cues relevant to video retrieval. Furthermore, to generate maximally informative questions, we propose Information-Guided Supervision (IGS), which guides the question generator to ask questions that would boost subsequent video retrieval accuracy. We validate the effectiveness of our interactive ViReD framework on the AVSD dataset, showing that our interactive method performs significantly better than traditional non-interactive video retrieval systems. We also demonstrate that our approach generalizes to real-world settings involving interactions with real humans, demonstrating the robustness and generality of our framework.
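    As a concrete illustration of the interactive loop described above, the toy sketch below retrieves over a handful of hypothetical video captions, asks about the term that best discriminates among the current candidates, and re-ranks using the user's answers. The real ViReD system uses learned multimodal models; none of these names or data come from its codebase.

```python
from collections import Counter

# Toy, runnable sketch of a ViReD-style interactive retrieval loop over a
# hypothetical corpus of video captions. Retrieval here is plain word
# overlap; the "question generator" asks about a discriminative term.

VIDEOS = {
    "v1": "a dog catches a frisbee in a park",
    "v2": "a dog sleeps on a couch",
    "v3": "a cat catches a toy mouse",
}

def retrieve(context_words, top_k=2):
    # Rank videos by word overlap with everything learned so far.
    scores = {v: len(context_words & set(cap.split())) for v, cap in VIDEOS.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def interactive_retrieval(query, answer_fn, rounds=3):
    context, asked = set(query.split()), set()
    candidates = retrieve(context)
    for _ in range(rounds):
        # Ask about a term that appears in some, but not all, candidates.
        counts = Counter(w for v in candidates for w in VIDEOS[v].split())
        splitters = [w for w, c in counts.items()
                     if 0 < c < len(candidates) and w not in asked]
        if not splitters:
            break
        word = splitters[0]
        asked.add(word)
        if answer_fn(f"Does the video involve '{word}'?"):
            context.add(word)        # positive answers refine the query context
        candidates = retrieve(context)
    return candidates

# Simulated user who has video v1 in mind:
print(interactive_retrieval("a dog catches something",
                            lambda q: q.split("'")[1] in VIDEOS["v1"].split()))
```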

    Application of Common Sense Computing for the Development of a Novel Knowledge-Based Opinion Mining Engine

    Get PDF
    The ways people express their opinions and sentiments have radically changed in the past few years thanks to the advent of social networks, web communities, blogs, wikis and other online collaborative media. The distillation of knowledge from this huge amount of unstructured information can be a key factor for marketers who want to create an image or identity in the minds of their customers for their product, brand, or organisation. These online social data, however, remain largely inaccessible to computers, as they are specifically meant for human consumption. The automatic analysis of online opinions, in fact, involves a deep understanding of natural language text by machines, from which we are still very far. Hitherto, online information retrieval has been mainly based on algorithms relying on the textual representation of web pages. Such algorithms are very good at retrieving texts, splitting them into parts, checking spelling and counting words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Existing approaches to opinion mining and sentiment analysis, in particular, can be grouped into three main categories: keyword spotting, in which text is classified into categories based on the presence of fairly unambiguous affect words; lexical affinity, which assigns arbitrary words a probabilistic affinity for a particular emotion; and statistical methods, which calculate the valence of affective keywords and word co-occurrence frequencies on the basis of a large training corpus. Early works aimed to classify entire documents as containing overall positive or negative polarity, or to rate the scores of reviews. Such systems were mainly based on supervised approaches relying on manually labelled samples, such as movie or product reviews where the opinionist's overall positive or negative attitude was explicitly indicated. However, opinions and sentiments do not occur only at document level, nor are they limited to a single valence or target. Contrary or complementary attitudes toward the same topic or multiple topics can be present across the span of a document. In more recent works, text analysis granularity has been taken down to segment and sentence level, e.g., by using the presence of opinion-bearing lexical items (single words or n-grams) to detect subjective sentences, or by exploiting association rule mining for a feature-based analysis of product reviews. These approaches, however, are still far from being able to infer the cognitive and affective information associated with natural language, as they mainly rely on knowledge bases that are still too limited to efficiently process text at sentence level. In this thesis, common sense computing techniques are further developed and applied to bridge the semantic gap between word-level natural language data and the concept-level opinions conveyed by them. In particular, the ensemble application of graph mining and multi-dimensionality reduction techniques on two common sense knowledge bases was exploited to develop a novel intelligent engine for open-domain opinion mining and sentiment analysis. The proposed approach, termed sentic computing, performs a clause-level semantic analysis of text, which allows the inference of both the conceptual and emotional information associated with natural language opinions and, hence, a more efficient passage from (unstructured) textual information to (structured) machine-processable data.
    The engine was tested on three different resources, namely a Twitter hashtag repository, a LiveJournal database and a PatientOpinion dataset, and its performance was compared both with results obtained using standard sentiment analysis techniques and with results based on different state-of-the-art knowledge bases such as Princeton's WordNet, MIT's ConceptNet and Microsoft's Probase. Unlike most currently available opinion mining services, the developed engine does not base its analysis on a limited set of affect words and their co-occurrence frequencies, but rather on common sense concepts and the cognitive and affective valence conveyed by them. This allows the engine to be domain-independent and, hence, to be embedded in any opinion mining system for the development of intelligent applications in multiple fields such as the Social Web, HCI and e-health. Looking ahead, the combined novel use of different knowledge bases and of common sense reasoning techniques for opinion mining proposed in this work will eventually pave the way for the development of more bio-inspired approaches to the design of natural language processing systems capable of handling knowledge, retrieving it when necessary, making analogies and learning from experience.
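    The contrast the thesis draws between word-level keyword spotting and concept-level analysis can be illustrated with a toy example; the hand-made concept lexicon below is a placeholder and bears no relation to SenticNet's actual resources or reasoning.

```python
# Toy contrast between word-level keyword spotting and concept-level
# analysis; the concept lexicon is invented for this example.

CONCEPT_VALENCE = {
    ("buy", "christmas", "present"): 0.6,    # pleasant despite no affect word
    ("wait", "in", "line"): -0.4,
}

AFFECT_WORDS = {"good": 0.7, "bad": -0.7, "happy": 0.8, "sad": -0.8}

def keyword_spotting(clause: str) -> float:
    # Word-level baseline: fires only on explicit affect words.
    return sum(AFFECT_WORDS.get(w, 0.0) for w in clause.lower().split())

def concept_level(clause: str) -> float:
    # Concept-level: match known multi-word concepts inside the clause.
    words = set(clause.lower().split())
    return sum(v for concept, v in CONCEPT_VALENCE.items()
               if all(w in words for w in concept))

clause = "I went to buy a christmas present"
print(keyword_spotting(clause))   # 0.0 -> the baseline misses the sentiment
print(concept_level(clause))      # 0.6 -> the concept carries implicit valence
```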


    Agent AI: Surveying the Horizons of Multimodal Interaction

    Full text link
    Multimodal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems leverage existing foundation models as the basic building blocks for the creation of embodied agents. Embedding agents within such environments facilitates the models' ability to process and interpret visual and contextual data, which is critical for the creation of more sophisticated and context-aware AI systems. For example, a system that can perceive user actions, human behavior, environmental objects, audio expressions, and the collective sentiment of a scene can be used to inform and direct agent responses within the given environment. To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally grounded data, and can produce meaningful embodied actions. In particular, we explore systems that aim to improve agents based on next embodied action prediction by incorporating external knowledge, multi-sensory inputs, and human feedback. We argue that by developing agentic AI systems in grounded environments, one can also mitigate the hallucinations of large foundation models and their tendency to generate environmentally incorrect outputs. The emerging field of Agent AI subsumes the broader embodied and agentic aspects of multimodal interactions. Beyond agents acting and interacting in the physical world, we envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
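    The perceive-then-act loop that the survey calls next embodied action prediction might be sketched as follows; the Observation fields and the rule-based predict_action stand-in for a foundation-model policy are assumptions for illustration, not an interface defined in the paper.

```python
from dataclasses import dataclass

# Schematic sketch of a "next embodied action prediction" step: map a
# multimodal, environmentally grounded observation to an embodied action.
# Everything here is an illustrative placeholder.

@dataclass
class Observation:
    frame: bytes        # visual stimulus (e.g., a camera image)
    utterance: str      # language input from the user
    scene_state: dict   # grounded environment data (objects, poses, audio tags)

def predict_action(obs: Observation, knowledge: dict, feedback: list) -> str:
    """Stand-in for a policy conditioned on external knowledge and
    accumulated human feedback."""
    if "pick up" in obs.utterance and obs.scene_state.get("graspable"):
        return f"grasp({obs.scene_state['graspable'][0]})"
    return "ask_clarification()"   # grounding check mitigates hallucinated actions

# One step of the loop, on a constructed observation:
obs = Observation(frame=b"", utterance="please pick up the cup",
                  scene_state={"graspable": ["cup"]})
print(predict_action(obs, knowledge={}, feedback=[]))   # -> grasp(cup)
```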

    Human-Robot interaction with low computational-power humanoids

    Get PDF
    This article investigates the possibilities of human-humanoid interaction with robots whose computational power is limited. The project was carried out during a year of work at the Computer and Robot Vision Laboratory (VisLab), part of the Institute for Systems and Robotics in Lisbon, Portugal. Communication, the basis of interaction, is simultaneously visual, verbal, and gestural. The robot's algorithm provides users with natural language communication, capturing and understanding the person's needs and feelings. The design of the system should consequently give it the capability to dialogue with people in a way that makes it possible to understand their needs. To remain natural, the whole experience is independent of the GUI, which is used only as an auxiliary instrument. Furthermore, the humanoid can communicate through gestures, touch, and visual perception and feedback. This creates a totally new type of interaction in which the robot is not just a machine to use, but a figure to interact and talk with: a social robot.
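    As a rough illustration of the kind of lightweight multimodal fusion a low-compute humanoid could afford, the sketch below lets the verbal, gestural, and visual channels vote for an intent with cheap rules instead of heavy models; the intents, keywords, and fusion order are invented for this example and do not describe the project's actual algorithm.

```python
from typing import Optional

# Invented sketch of rule-based multimodal fusion for a low-compute
# humanoid: cheap per-channel rules, fused by priority.

INTENT_KEYWORDS = {
    "greet": {"hello", "hi"},
    "help": {"help", "assist", "need"},
}

def verbal_intent(utterance: str) -> Optional[str]:
    words = set(utterance.lower().split())
    for intent, keys in INTENT_KEYWORDS.items():
        if words & keys:
            return intent
    return None

def fuse(utterance: str, gesture: Optional[str], face_detected: bool) -> str:
    # Prefer the verbal channel, then gesture, then raw visual presence.
    intent = verbal_intent(utterance)
    if intent:
        return intent
    if gesture == "wave":
        return "greet"
    return "engage" if face_detected else "idle"

print(fuse("hi robot", None, True))   # -> greet (verbal channel)
print(fuse("", "wave", True))         # -> greet (gesture fallback)
```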

    Multimodal language learning environment of the Korean digital kitchen: a study on the impact of physicality and technological affordances on Korean vocabulary learning

    Get PDF
    PhD thesis. Task-Based Language Learning and Teaching (TBLT) has been integrated with computer-assisted language learning (CALL), contributing to pedagogical developments in the field of foreign/second language teaching and learning (Thomas and Reinders, 2010). While the majority of studies have used the integrated pedagogy inside the classroom context, little attention has been paid to settings outside the classroom (Seedhouse et al., 2013; Seedhouse et al., 2014; Preston et al., 2015). This issue has recently been addressed by the European Digital Kitchen (EDK) project team (Seedhouse, 2017), which has successfully investigated the efficacy of digital technology for foreign language learning outside the classroom. However, as the EDK was designed as a holistic learning environment in which many different environmental factors would contribute to learning, there was a need to disaggregate some of these factors and discover which were more or less significant. To isolate one of these environmental factors, this study used the technological components of the previous project to create Korean pedagogical materials. This formed the Korean Digital Kitchen (KDK), a real-world kitchen environment in which students can simultaneously learn Korean language and culture by carrying out the real-world task of cooking. Korean is one of the important global languages to be taught, according to an Ethnologue report (Lewis et al., 2016). Based on the literature on vocabulary learning, especially Nattinger's (1988) claim that touching and manipulating real objects, as opposed to merely seeing them, increases learnability, this study explored whether the kinesthetic mode adds extra value to foreign language learning processes. Would there be any significant difference between vocabulary learning that involves only seeing the learning items in a classroom and learning that also involves touching the items in the KDK? Thus, this study examined the power of physicality. Furthermore, the salience of real-world and pedagogical tasks was investigated as a factor contributing to different levels of vocabulary learning. To this end, a quasi-experimental design was employed in which participants conducted two cooking sessions, one in a digital kitchen using real objects and the other in a classroom using pictures/photos in a textbook. Participants were 48 adults of both British and international origins, living in Newcastle, UK, and coming from 20 different countries. To determine which of the two environments, the digital kitchen or the classroom, is more conducive to vocabulary learning, participants carried out two different recipes in the two locations in order to control for a practice effect. Subjects went through the real-life cooking activities in the three stages of TBLT in both settings, using two different recipes with two different sets of vocabulary. Tests were administered before and after cooking, and the scores compared, to examine the results of learning. Ten vocabulary noun items were targeted in this research. In addition to test score data, three more data sources were employed for triangulation, namely questionnaires, semi-structured interviews and video observations, revealing the outcomes and processes of learning in the two learning environments. The data sets clearly demonstrated which of the two settings was more effective for learning foreign language vocabulary and culture, and what learners' attitudes towards a digitalized learning environment were.
    Findings suggest that physicality in the KDK helps students link words and cultural aspects to memory better than simply looking at photos of objects in the classroom; the learning differences reached statistical significance. Other environmental factors, such as the technology and its affordances, may also have contributed to the different learning outcomes, playing a role in learners' positive attitudes (Stricker et al., 2004). In contrast, users in the conventional setting demonstrated relatively less learning, due to the involvement of fewer senses and to typical classroom features such as the relationship with a teacher, less interaction with peers (Shen et al., 2008) and boredom. It is these differences that contributed to the different results and processes of learning in the two settings. From these findings, it can be concluded that the digital kitchen provides a motivating learning environment which is multi-modal, multi-sensory, multi-interactional, multi-experiential and multi-layered. It is physicality, meaningful tasks and computer technology that foster the learning of vocabulary and cultural aspects. This project contributes one more psycholinguistic dimension to research on language learning, and supports the development of innovative ICT for foreign language learning across the world.
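    The study's within-subjects pre/post design suggests a paired comparison of gain scores between the two settings; the sketch below shows one conventional way to run such an analysis, on randomly generated placeholder scores that are emphatically not the study's data.

```python
import numpy as np
from scipy import stats

# Compare vocabulary gain scores (post - pre, out of 10 target nouns)
# between the digital-kitchen and classroom conditions with a paired
# t-test. All scores below are random placeholders.

rng = np.random.default_rng(0)
n = 48  # participants, each tested in both conditions

gain_kdk = rng.integers(5, 11, n) - rng.integers(0, 5, n)   # kitchen condition
gain_cls = rng.integers(3, 9, n) - rng.integers(0, 5, n)    # classroom condition

t, p = stats.ttest_rel(gain_kdk, gain_cls)   # paired: same learners in both settings
print(f"mean gain KDK={gain_kdk.mean():.2f}, "
      f"classroom={gain_cls.mean():.2f}, t={t:.2f}, p={p:.4f}")
```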

    An HCI-Centric Survey and Taxonomy of Human-Generative-AI Interactions

    Full text link
    Generative AI (GenAI) has shown remarkable capabilities in generating diverse and realistic content across formats such as images, videos, and text. Human involvement is essential in generative AI, and thus the HCI literature has investigated how to effectively create collaborations between humans and GenAI systems. However, the current literature lacks a comprehensive framework for understanding Human-GenAI Interactions, as the holistic aspects of human-centered GenAI systems are rarely analyzed systematically. In this paper, we present a survey of 291 papers, providing a novel taxonomy and analysis of Human-GenAI Interactions from both the human and the GenAI perspective. The dimensions of the design space include 1) Purposes of Using Generative AI, 2) Feedback from Models to Users, 3) Control from Users to Models, 4) Levels of Engagement, 5) Application Domains, and 6) Evaluation Strategies. Our work is timely at the current stage of GenAI development, where Human-GenAI interaction design is of paramount importance. We also highlight challenges and opportunities to guide the future design of human-centered GenAI applications.
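    One way to picture the taxonomy is as a coding schema applied to each surveyed paper; the enum values and field types below are invented examples keyed to the six dimensions, not the survey's exact category names.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical encoding of the survey's six design-space dimensions as a
# paper-coding schema; category names are illustrative only.

class Purpose(Enum):
    CREATION = "content creation"
    IDEATION = "ideation"
    ASSISTANCE = "task assistance"

class Engagement(Enum):
    PASSIVE = "passive consumption"
    TURN_TAKING = "turn-taking collaboration"
    CO_CREATION = "continuous co-creation"

@dataclass
class PaperCoding:
    purpose: Purpose        # 1) purpose of using generative AI
    feedback: str           # 2) feedback from models to users
    control: str            # 3) control from users to models
    engagement: Engagement  # 4) level of engagement
    domain: str             # 5) application domain
    evaluation: str         # 6) evaluation strategy

example = PaperCoding(Purpose.CREATION, "inline previews", "prompting",
                      Engagement.TURN_TAKING, "writing", "user study")
print(example)
```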