2,129 research outputs found
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) there is further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
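As a quick sanity check on the reported statistic, the sketch below recomputes the p-value for F(1,4)=2.565. It relies on the standard identity that an F statistic with one numerator degree of freedom is the square of a t statistic with the same denominator degrees of freedom, and it integrates the t density numerically with Simpson's rule. The function names are illustrative and not taken from the paper.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def p_value_f_1_df(f_stat, df2, steps=2000):
    """Two-sided p-value for F(1, df2), using F(1, df2) = t(df2) squared.

    Integrates the t density on [0, sqrt(F)] with Simpson's rule
    (`steps` must be even) and returns 1 - 2 * area.
    """
    t = math.sqrt(f_stat)
    h = t / steps
    total = t_pdf(0.0, df2) + t_pdf(t, df2)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df2)
    area = total * h / 3
    return 1 - 2 * area


if __name__ == "__main__":
    print(round(p_value_f_1_df(2.565, 4), 3))
```

The result should agree with the reported p=0.185 to rounding, which is consistent with the abstract's conclusion that the difference was not significant at conventional thresholds.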
Attention Allocation for Human Multi-Robot Control: Cognitive Analysis based on Behavior Data and Hidden States
Human multi-robot interaction exploits both the human operator’s high-level decision-making skills and the robotic agents’ strong computing and motion abilities. While controlling multi-robot teams, an operator’s attention must constantly shift between individual robots to maintain sufficient situation awareness. To conserve an operator’s attentional resources, a robot with the capability to self-reflect on its abnormal status can help the operator focus her attention on emergent tasks rather than unneeded routine checks. With the proposed self-reflection aids, human-robot interaction becomes a queuing framework in which the robots act as clients that request interaction and the operator acts as the server that responds to these job requests. This paper examined two types of queuing schemes: the self-paced Open-queue, which identifies all robots’ normal/abnormal conditions, and the forced-paced shortest-job-first (SJF) queue, which shows a single robot’s request at a time following the SJF approach. As a robot may misreport its experienced failures in various situations, the effects of imperfect automation were also investigated. The results suggest that the SJF attentional scheduling approach can provide stable performance in both primary (locate potential targets) and secondary (resolve robots’ failures) tasks, regardless of the system’s reliability level. However, conventional results (e.g., number of targets marked) present little information about users’ underlying cognitive strategies and may fail to reflect the user’s true intent. As understanding users’ intentions is critical to providing appropriate cognitive aids to enhance task performance, a Hidden Markov Model (HMM) is used to examine operators’ underlying cognitive intent and identify the unobservable cognitive states. The HMM results demonstrate fundamental differences among the queuing mechanisms and reliability conditions.
The findings suggest that HMMs can be helpful in investigating the use of human cognitive resources in multitasking environments.
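The shortest-job-first queue described in this abstract can be illustrated with a small priority queue. This is a minimal sketch only: the `Request` type, the robot identifiers, and the estimated service times are hypothetical, and the paper's actual interface and timing model are not given in the abstract.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    """A robot's request for operator attention (hypothetical structure)."""
    est_service_time: float               # only this field is used for ordering
    robot_id: str = field(compare=False)


def sjf_order(requests):
    """Return robot ids in the order an SJF queue would present them.

    The operator (the server) sees one request at a time, always the one
    with the shortest estimated service time among those still waiting.
    """
    heap = list(requests)
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap).robot_id)
    return order


if __name__ == "__main__":
    pending = [Request(30.0, "r1"), Request(5.0, "r2"), Request(12.0, "r3")]
    print(sjf_order(pending))
```

By contrast, an open queue in the sense of the abstract would expose all pending requests at once and leave the choice of which to serve to the operator.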
Hybrid Temporal Dynamics Feature Extraction in Recommendation Systems for Improved Ranking of Items
In today's retail landscape, shopping malls and e-commerce platforms employ various psychological tactics to influence customer behavior and increase profits. In line with these strategies, this paper introduces a method for recognizing sentiment patterns, with a specific emphasis on the evolving temporal aspects of user interests within Recommendation Systems (RS). The proposed method, called Temporal Dynamic Features based User Sentiment Pattern for Recommendation System (TDF-USPRS), aims to enhance the performance of RS by leveraging sentiment trends derived from a user's past preferences. TDF-USPRS uses a hybrid model combining the Short-Time Fourier Transform (STFT) and a layered architecture based on Bidirectional Long Short-Term Memory (BiLSTM) to extract temporal dynamics and discern a user's sentiment trend. By examining a user's sequential history of item preferences, TDF-USPRS produces sentiment patterns to offer relevant recommendations, even for sparse datasets. Several popular datasets, including MovieLens, Amazon Rating Beauty, YOOCHOOSE, and CiaoDVD, are used to assess the proposed technique. According to the experimental data, the TDF-USPRS model outperforms existing approaches, resulting in recommendations with greater accuracy and relevance. Compared with existing approaches, the proposed model shows a 6.5% reduction in RMSE and a 4.5% gain in precision. Specifically, the model achieves an RMSE of 0.7623 and 0.996 on the MovieLens and CiaoDVD datasets, while attaining a precision score of 0.5963 and 0.165 on the YOOCHOOSE and Amazon datasets, respectively.
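To make the STFT step more concrete, the sketch below computes a short-time Fourier transform of a one-dimensional sequence in plain Python. It is only an illustration of the general technique: the Hann window, the window and hop sizes, and the idea of treating a user's rating history as the input signal are assumptions, and the abstract does not say how TDF-USPRS configures the transform or feeds it to the BiLSTM.

```python
import cmath
import math


def stft_magnitudes(signal, window_size, hop):
    """Magnitude spectra of overlapping, Hann-windowed frames.

    Returns one list per frame, holding |X[k]| for k = 0 .. window_size // 2.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window_size - 1))
              for n in range(window_size)]
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(window_size)]
        spectrum = []
        for k in range(window_size // 2 + 1):
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                      for n in range(window_size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


if __name__ == "__main__":
    # Hypothetical preference signal that repeats every 4 interactions.
    history = [math.cos(math.pi * n / 2) for n in range(16)]
    for spectrum in stft_magnitudes(history, window_size=8, hop=4):
        print([round(m, 3) for m in spectrum])
```

Each frame summarises which periodicities dominate a short stretch of the history, so a sequence of frames gives a time-varying description that a sequence model could consume.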
3D Medical Image Segmentation based on multi-scale MPU-Net
The high cure rate of cancer is inextricably linked to physicians' accuracy
in diagnosis and treatment, therefore a model that can accomplish
high-precision tumor segmentation has become a necessity in many applications
of the medical industry. It can effectively lower the rate of misdiagnosis
while considerably lessening the burden on clinicians. However, fully automated
target organ segmentation is problematic due to the irregular stereo structure
of 3D volume organs. As a basic model for this class of real applications,
U-Net excels. It can learn certain global and local features, but still lacks
the capacity to grasp spatial long-range relationships and contextual
information at multiple scales. This paper proposes a tumor segmentation model
MPU-Net for patient volume CT images, which is inspired by Transformer with a
global attention mechanism. By combining image serialization with the Position
Attention Module, the model attempts to comprehend deeper contextual
dependencies and accomplish precise positioning. Each layer of the decoder is
also equipped with a multi-scale module and a cross-attention mechanism. The
capability of feature extraction and integration at different levels has been
enhanced, and the hybrid loss function developed in this study can better
exploit high-resolution characteristic information. Moreover, the suggested
architecture is tested and evaluated on the Liver Tumor Segmentation Challenge
2017 (LiTS 2017) dataset. Compared with the benchmark model U-Net, MPU-Net
shows excellent segmentation results. The dice, accuracy, precision,
specificity, IOU, and MCC metrics for the best model segmentation results are
92.17%, 99.08%, 91.91%, 99.52%, 85.91%, and 91.74%, respectively. Outstanding
indicators in various aspects illustrate the exceptional performance of this
framework in automatic medical image segmentation.
Comment: 37 pages
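The six metrics reported in this abstract all have standard definitions in terms of voxel-level confusion counts. The sketch below shows those definitions for a single binary mask; the function name and the example counts are hypothetical, and the abstract does not say how the paper aggregates scores across cases or classes.

```python
import math


def segmentation_metrics(tp, fp, fn, tn):
    """Standard binary segmentation metrics from confusion counts.

    tp/fp/fn/tn are counts of true-positive, false-positive,
    false-negative and true-negative voxels.
    """
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "iou": tp / (tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


if __name__ == "__main__":
    # Hypothetical counts for a 100-voxel volume.
    for name, value in segmentation_metrics(tp=8, fp=2, fn=2, tn=88).items():
        print(f"{name}: {value:.4f}")
```

Because background voxels usually dominate a CT volume, accuracy and specificity tend to sit close to 100%, which is why overlap measures such as Dice and IoU are the more informative figures for tumor segmentation.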
Explain and Conquer: Personalised Text-based Reviews to Achieve Transparency
There are many contexts in which dyadic data are present. Social networks are
a well-known example. In these contexts, pairs of elements are linked building
a network that reflects interactions. Explaining why these relationships are
established is essential to obtain transparency, an increasingly important
notion. These explanations are often presented using text, thanks to the spread
of the natural language understanding tasks. Our aim is to represent and
explain pairs established by any agent (e.g., a recommender system or a paid
promotion mechanism), so that text-based personalisation is taken into account.
We have focused on the TripAdvisor platform, considering the applicability to
other dyadic data contexts. The items are a subset of users and restaurants and
the interactions are the reviews posted by these users. We propose the PTER
(Personalised TExt-based Reviews) model. We predict, from the available reviews
for a given restaurant, those that fit the specific user interactions. PTER
leverages the BERT (Bidirectional Encoder Representations from Transformers)
transformer-encoder model. We customised a deep neural network following the
feature-based approach, presenting an LTR (Learning To Rank) downstream task. We
carried out several comparisons of our proposal with a random baseline and
other models of the state of the art, following the EXTRA (EXplanaTion RAnking)
benchmark. Our method outperforms other collaborative filtering proposals.
QUESTION ANSWERING, GROUNDING, AND GENERATION FOR VISION AND LANGUAGE
One ultimate goal of AI is to develop an artificially intelligent system that can communicate with people in a natural way. Such communication includes but is not limited to asking humans questions, answering our questions, conducting dialogue with human beings, and performing actions to better serve people. Imagine a future where service robots are everywhere, and we could ask our home robot to “grab me the red cup on the table.” To perform this command, the AI system needs to understand this spoken English sentence, perceive the visual world, navigate to the right place (“table”), recognize the right object (“the red cup”), then grab it and finally bring it back to the person who issued the command. This single command already involves many techniques, such as speech recognition, language understanding, scene understanding, embodied navigation, object recognition, pose estimation, and robot manipulation. None of these techniques is well solved yet, but progress is rapid. This thesis advances our knowledge by exploring various connections between vision, language, and beyond to push toward this ultimate goal. We study three popular vision and language tasks: visual question answering, language grounding, and image-to-text language generation. Within each, we introduce our proposed novel task, accompanied by a high-quality dataset and well-performing data-driven approaches. Specifically, we first introduce Visual Madlibs for image-based and region-based question answering. Then we introduce referring expressions, where we study both referring expression comprehension and generation, covering both language grounding and generation. Next, we study album summarization, which not only selects the key photos inside an album but also generates a natural language story describing the whole album.
Last but not least, we describe multi-target embodied question answering, a task that is even closer to our ultimate goal and that requires both language understanding and navigation ability from the AI system.
Doctor of Philosophy
Listener expectations and the perceptual accommodation of talker variability: A pre-registered replication
Published: 04 May 2021
Researchers have hypothesized that in order to accommodate variability in how talkers produce their speech sounds, listeners
must perform a process of talker normalization. Consistent with this proposal, several studies have shown that spoken word
recognition is slowed when speech is produced by multiple talkers compared with when all speech is produced by one talker (a
multitalker processing cost). Nusbaum and colleagues have argued that talker normalization is modulated by attention (e.g.,
Nusbaum & Morin, 1992, Speech Perception, Production and Linguistic Structure, pp. 113–134). Some of the strongest
evidence for this claim is from a speeded monitoring study where a group of participants who expected to hear two talkers
showed a multitalker processing cost, but a separate group who expected one talker did not (Magnuson & Nusbaum, 2007,
Journal of Experimental Psychology, 33[2], 391–409). In that study, however, the sample size was small and the crucial
interaction was not significant. In this registered report, we present the results of a well-powered attempt to replicate those
findings. In contrast to the previous study, we did not observe multitalker processing costs in either of our groups. To rule out the
possibility that the null result was due to task constraints, we conducted a second experiment using a speeded classification task.
As in Experiment 1, we found no influence of expectations on talker normalization, with no multitalker processing cost observed
in either group. Our data suggest that the previous findings of Magnuson and Nusbaum (2007) be regarded with skepticism and
that talker normalization may not be permeable to high-level expectations.
This research was supported by NSF 1754284, NSF
IGERT 1144399 & NSF NRT 1747486 (PI: JSM) and NSF BCS
1554810 & NIH R01 DC013064 (PI: EBM). This research was also
supported in part by the Basque Government through the BERC 2018-2021
program and by the Agencia Estatal de Investigación through
BCBL Severo Ochoa excellence accreditation SEV-2015-0490. SL was
supported by an NSF Graduate Research Fellowship.
Experimental Effects and Individual Differences in Linear Mixed Models: Estimating the Relationship between Spatial, Object, and Attraction Effects in Visual Attention
Linear mixed models (LMMs) provide a still underused methodological perspective on combining experimental and individual-differences research. Here we illustrate this approach with two-rectangle cueing in visual attention (Egly et al., 1994). We replicated previous experimental cue-validity effects relating to a spatial shift of attention within an object (spatial effect), to attention switch between objects (object effect), and to the attraction of attention toward the display centroid (attraction effect), also taking into account the design-inherent imbalance of valid and other trials. We simultaneously estimated variance/covariance components of subject-related random effects for these spatial, object, and attraction effects in addition to their mean reaction times (RTs). The spatial effect showed a strong positive correlation with mean RT and a strong negative correlation with the attraction effect. The analysis of individual differences suggests that slow subjects engage attention more strongly at the cued location than fast subjects. We compare this joint LMM analysis of experimental effects and associated subject-related variances and correlations with two frequently used alternative statistical procedures