    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest that Gestalt grouping is not used as a strategy in these tasks, and it lends further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
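
The reported statistic, F(1,4)=2.565 with p=0.185, can be checked by hand. The sketch below is only an illustration and is not taken from the paper: it relies on the standard identity that an F statistic with (1, ν) degrees of freedom is the square of a t statistic with ν degrees of freedom, and on the closed-form Student t CDF for ν = 4. The function names are hypothetical.

```python
import math


def t_cdf_df4(t: float) -> float:
    """Closed-form CDF of Student's t distribution with 4 degrees of freedom."""
    u = t / math.sqrt(1.0 + t * t / 4.0)
    return 0.5 + 0.375 * u * (1.0 - u * u / 12.0)


def p_value_f_1_4(f_stat: float) -> float:
    """Upper-tail p-value for an F(1, 4) statistic.

    Uses F(1, 4) = t(4)^2, so the F upper tail equals the
    two-sided tail of the corresponding t statistic.
    """
    t = math.sqrt(f_stat)
    return 2.0 * (1.0 - t_cdf_df4(t))


if __name__ == "__main__":
    # Statistic quoted in the abstract above.
    print(round(p_value_f_1_4(2.565), 3))
```

With only four denominator degrees of freedom, a p-value of this size leaves the comparison inconclusive, which matches the abstract's hedged wording.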

    Attention Allocation for Human Multi-Robot Control: Cognitive Analysis based on Behavior Data and Hidden States

    Human multi-robot interaction exploits both the human operator's high-level decision-making skills and the robotic agents' strong computing and motion capabilities. While controlling multi-robot teams, an operator's attention must constantly shift between individual robots to maintain sufficient situation awareness. To conserve an operator's attentional resources, a robot that can self-reflect on its abnormal status can help the operator focus her attention on emergent tasks rather than unneeded routine checks. With the proposed self-reflection aids, human-robot interaction becomes a queuing framework in which the robots act as clients that request interaction and the operator acts as the server that responds to these job requests. This paper examined two types of queuing schemes: the self-paced Open-queue, which identifies all robots' normal/abnormal conditions, and the forced-paced shortest-job-first (SJF) queue, which shows a single robot's request at a time following the SJF approach. Because a robot may misreport the failures it experiences in various situations, the effects of imperfect automation were also investigated. The results suggest that the SJF attentional scheduling approach can provide stable performance in both the primary task (locating potential targets) and the secondary task (resolving robots' failures), regardless of the system's reliability level. However, conventional results (e.g., number of targets marked) present little information about users' underlying cognitive strategies and may fail to reflect the user's true intent. As understanding users' intentions is critical to providing appropriate cognitive aids that enhance task performance, a Hidden Markov Model (HMM) is used to examine operators' underlying cognitive intent and identify the unobservable cognitive states. The HMM results demonstrate fundamental differences among the queuing mechanisms and reliability conditions. The findings suggest that HMMs can be helpful in investigating the use of human cognitive resources in multitasking environments.
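
To make the shortest-job-first idea concrete, here is a minimal sketch of an SJF queue. It is not the authors' implementation: the robot identifiers, the estimated service times, and the function names are all hypothetical, and the arrival-order tie-break is an added assumption.

```python
import heapq


def sjf_order(requests):
    """Return robot ids in shortest-job-first order.

    `requests` is a list of (robot_id, estimated_service_time) pairs.
    Ties are broken by arrival order (an assumption of this sketch).
    """
    heap = [(t, i, rid) for i, (rid, t) in enumerate(requests)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, rid = heapq.heappop(heap)
        order.append(rid)
    return order


def mean_wait(service_times):
    """Mean time each request waits before its service starts."""
    waited, elapsed = 0.0, 0.0
    for t in service_times:
        waited += elapsed
        elapsed += t
    return waited / len(service_times)


if __name__ == "__main__":
    pending = [("R1", 30.0), ("R2", 5.0), ("R3", 12.0)]  # hypothetical requests
    print(sjf_order(pending))
    print(mean_wait([30.0, 5.0, 12.0]))  # served in arrival order
    print(mean_wait([5.0, 12.0, 30.0]))  # served shortest job first
```

For a fixed set of pending jobs, serving the shortest first minimises mean waiting time, which is one reason such a policy might keep secondary-task performance stable.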

    Hybrid Temporal Dynamics Feature Extraction in Recommendation Systems for Improved Ranking of Items

    In today's retail landscape, shopping malls and e-commerce platforms employ various psychological tactics to influence customer behavior and increase profits. In line with these strategies, this paper introduces a method for recognizing sentiment patterns, with a specific emphasis on the evolving temporal aspects of user interests within Recommendation Systems (RS). The proposed method, called Temporal Dynamic Features based User Sentiment Pattern for Recommendation System (TDF-USPRS), aims to enhance the performance of RS by leveraging sentiment trends derived from a user's past preferences. TDF-USPRS uses a hybrid model that combines the Short Time Fourier Transform (STFT) with a layered architecture based on Bidirectional Long Short-Term Memory (BiLSTM) to retrieve temporal dynamics and discern a user's sentiment trend. By examining a user's sequential history of item preferences, TDF-USPRS produces sentiment patterns that yield highly relevant recommendations, even for sparse datasets. Several popular datasets, including MovieLens, Amazon Rating Beauty, YOOCHOOSE, and CiaoDVD, are used to assess the proposed technique. According to the experimental data, the TDF-USPRS model outperforms existing approaches, producing recommendations with greater accuracy and relevance. Compared with existing approaches, the proposed model shows a 6.5% reduction in RMSE and a 4.5% gain in precision. Specifically, the model achieves RMSEs of 0.7623 and 0.996 on the MovieLens and CiaoDVD datasets, respectively, and precision scores of 0.5963 and 0.165 on the YOOCHOOSE and Amazon datasets, respectively.

    3D Medical Image Segmentation based on multi-scale MPU-Net

    A high cancer cure rate is inextricably linked to physicians' accuracy in diagnosis and treatment, so a model that can accomplish high-precision tumor segmentation has become a necessity in many medical applications. Such a model can effectively lower the rate of misdiagnosis while considerably lessening the burden on clinicians. However, fully automated target organ segmentation is difficult because of the irregular stereo structure of 3D volume organs. U-Net excels as a basic model for this class of applications. It can learn certain global and local features, but it still lacks the capacity to capture long-range spatial relationships and contextual information at multiple scales. This paper proposes MPU-Net, a tumor segmentation model for patient volume CT images that is inspired by the Transformer and uses a global attention mechanism. By combining image serialization with the Position Attention Module, the model attempts to capture deeper contextual dependencies and achieve precise localization. Each layer of the decoder is also equipped with a multi-scale module and a cross-attention mechanism. This enhances feature extraction and integration at different levels, and the hybrid loss function developed in this study can better exploit high-resolution feature information. The proposed architecture is evaluated on the Liver Tumor Segmentation Challenge 2017 (LiTS 2017) dataset. Compared with the benchmark model U-Net, MPU-Net shows excellent segmentation results. The Dice, accuracy, precision, specificity, IoU, and MCC metrics for the best model's segmentation results are 92.17%, 99.08%, 91.91%, 99.52%, 85.91%, and 91.74%, respectively. These strong results across several metrics illustrate the performance of this framework in automatic medical image segmentation.
    Comment: 37 pages
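
For readers unfamiliar with the six metrics listed above, the following sketch computes them from voxel-level confusion counts using their standard definitions. It is not the paper's evaluation code: the function name and the example counts are made up for illustration and are not LiTS 2017 results.

```python
import math


def segmentation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary segmentation metrics from confusion counts.

    tp/fp/fn/tn are counts of true-positive, false-positive,
    false-negative and true-negative voxels.
    """
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to give round numbers.
    for name, value in segmentation_metrics(tp=90, fp=10, fn=10, tn=890).items():
        print(f"{name}: {value:.4f}")
```

Accuracy and specificity are dominated by the background class when tumors occupy few voxels, so Dice, IoU, and MCC are usually the more informative figures.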

    Explain and Conquer: Personalised Text-based Reviews to Achieve Transparency

    Dyadic data are present in many contexts, with social networks being a well-known example. In these contexts, pairs of elements are linked, building a network that reflects interactions. Explaining why these relationships are established is essential for transparency, an increasingly important notion. These explanations are often presented as text, thanks to the spread of natural language understanding tasks. Our aim is to represent and explain pairs established by any agent (e.g., a recommender system or a paid promotion mechanism) so that text-based personalisation is taken into account. We have focused on the TripAdvisor platform, while considering the applicability to other dyadic data contexts. The items are a subset of users and restaurants, and the interactions are the reviews posted by these users. We propose the PTER (Personalised TExt-based Reviews) model. From the available reviews for a given restaurant, we predict those that fit a specific user's interactions. PTER leverages the BERT (Bidirectional Encoder Representations from Transformers) transformer-encoder model. We customised a deep neural network following the feature-based approach, with an LTR (Learning To Rank) downstream task. We compared our proposal with a random baseline and other state-of-the-art models, following the EXTRA (EXplanaTion RAnking) benchmark. Our method outperforms other collaborative filtering proposals.

    QUESTION ANSWERING, GROUNDING, AND GENERATION FOR VISION AND LANGUAGE

    One ultimate goal of AI is to develop an artificially intelligent (AI) system that can communicate with people in a natural way. Such communication includes, but is not limited to, asking humans questions, answering our questions, conducting dialogue with human beings, and performing actions to better serve people. Imagine a future where service robots are everywhere and we could ask our home robot to “grab me the red cup on the table.” To perform this command, the AI system needs to understand the spoken English sentence, perceive the visual world, navigate to the right place (“table”), recognize the right object (“the red cup”), then grab it and finally bring it back to the person who gave the command. This single command already involves many techniques, such as speech recognition, language understanding, scene understanding, embodied navigation, object recognition, pose estimation, and robot manipulation. None of these techniques is well solved yet, but progress is rapid. This thesis advances our knowledge by exploring various connections between vision, language, and beyond, in pursuit of this ultimate goal. We study three popular vision and language tasks: visual question answering, language grounding, and image-to-text language generation. Within each, we introduce a novel task, accompanied by a high-quality dataset and well-performing data-driven approaches. Specifically, we first introduce Visual Madlibs for image-based and region-based question answering. Then we introduce referring expressions, where we study both referring expression comprehension and generation, covering both language grounding and generation. Next, we study album summarization, which not only selects the key photos in an album but also generates a natural language story describing the whole album. Last but not least, we describe multi-target embodied question answering, a task that is even closer to our ultimate goal and requires both language understanding and navigation ability from the AI system.
    Doctor of Philosophy

    Listener expectations and the perceptual accommodation of talker variability: A pre-registered replication

    Published: 04 May 2021
    Researchers have hypothesized that in order to accommodate variability in how talkers produce their speech sounds, listeners must perform a process of talker normalization. Consistent with this proposal, several studies have shown that spoken word recognition is slowed when speech is produced by multiple talkers compared with when all speech is produced by one talker (a multitalker processing cost). Nusbaum and colleagues have argued that talker normalization is modulated by attention (e.g., Nusbaum & Morin, 1992, Speech Perception, Production and Linguistic Structure, pp. 113–134). Some of the strongest evidence for this claim is from a speeded monitoring study where a group of participants who expected to hear two talkers showed a multitalker processing cost, but a separate group who expected one talker did not (Magnuson & Nusbaum, 2007, Journal of Experimental Psychology, 33[2], 391–409). In that study, however, the sample size was small and the crucial interaction was not significant. In this registered report, we present the results of a well-powered attempt to replicate those findings. In contrast to the previous study, we did not observe multitalker processing costs in either of our groups. To rule out the possibility that the null result was due to task constraints, we conducted a second experiment using a speeded classification task. As in Experiment 1, we found no influence of expectations on talker normalization, with no multitalker processing cost observed in either group. Our data suggest that the previous findings of Magnuson and Nusbaum (2007) be regarded with skepticism and that talker normalization may not be permeable to high-level expectations.
    This research was supported by NSF 1754284, NSF IGERT 1144399 & NSF NRT 1747486 (PI: JSM) and NSF BCS 1554810 & NIH R01 DC013064 (PI: EBM). This research was also supported in part by the Basque Government through the BERC 2018-2021 program and by the Agencia Estatal de Investigación through BCBL Severo Ochoa excellence accreditation SEV-2015-0490. SL was supported by an NSF Graduate Research Fellowship.

    Experimental Effects and Individual Differences in Linear Mixed Models: Estimating the Relationship between Spatial, Object, and Attraction Effects in Visual Attention

    Linear mixed models (LMMs) provide a still underused methodological perspective on combining experimental and individual-differences research. Here we illustrate this approach with two-rectangle cueing in visual attention (Egly et al., 1994). We replicated previous experimental cue-validity effects relating to a spatial shift of attention within an object (spatial effect), to an attention switch between objects (object effect), and to the attraction of attention toward the display centroid (attraction effect), also taking into account the design-inherent imbalance of valid and other trials. We simultaneously estimated variance/covariance components of subject-related random effects for these spatial, object, and attraction effects in addition to their mean reaction times (RTs). The spatial effect showed a strong positive correlation with mean RT and a strong negative correlation with the attraction effect. The analysis of individual differences suggests that slow subjects engage attention more strongly at the cued location than fast subjects. We compare this joint LMM analysis of experimental effects and associated subject-related variances and correlations with two frequently used alternative statistical procedures.