
    Object-based visual attention for computer vision

    Get PDF
    Abstract: In this paper, a novel model of object-based visual attention extending Duncan's Integrated Competition Hypothesis [Phil. Trans. R. Soc. London B 353 (1998) 1307–1317] is presented. In contrast to the attention mechanisms used in most previous machine vision systems, which drive attention based on the spatial-location hypothesis, the mechanisms that direct visual attention in our system are object-driven as well as feature-driven. The competition to gain visual attention occurs not only within an object but also between objects. For this purpose, two new mechanisms in the proposed model are described and analyzed in detail. The first mechanism computes the visual salience of objects and groupings; the second implements the hierarchical selectivity of attentional shifts. The results of the new approach on synthetic and natural images are reported.
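
    The salience-of-groupings mechanism can be sketched as a contrast measure between a grouping and its surround. The feature vectors and the norm-based contrast below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def grouping_salience(group_feats, surround_feats):
    # Hypothetical sketch of grouping-based salience: a grouping is
    # salient to the extent that its mean feature vector contrasts
    # with the mean feature vector of its surround. Feature choice and
    # the Euclidean contrast are illustrative, not the paper's model.
    g = np.mean(group_feats, axis=0)
    s = np.mean(surround_feats, axis=0)
    return float(np.linalg.norm(g - s))
```

    Under this reading, a grouping identical to its context gets zero salience, and salience grows with feature contrast, which is what lets competition between objects favour the one that stands out.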

    Structure of Peer-to-Peer Social Networks

    Get PDF
    This paper presents a statistical analysis of the structure of Peer-to-Peer (P2P) social networks that capture social associations of distributed peers in resource sharing. Peer social networks appear to be composed mainly of pure resource providers, which guarantee the high resource availability and reliability of P2P systems. Peers that both provide and request resources make up only a small fraction. The connectivity between peers, including undirected, directed (out and in) and weighted connections, is scale-free, and the social networks of all peers and of the major peers are small-world networks. The analysis also confirms that peer social networks show in general disassortative correlations, except that active providers are connected to each other and to active requesters. The study presented in this paper gives a better understanding of peer relationships in resource sharing, which may help improve the design of future P2P networks and open the path to the study of transport processes on top of real P2P topologies. Comment: APS style, 8 pages, 5 figures and 4 tables. Final version.
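
    The disassortative correlations the abstract reports can be quantified with the standard degree-assortativity coefficient. A minimal sketch on an undirected edge list (the edge-list input format is an assumption):

```python
import numpy as np

def degree_assortativity(edges):
    # Pearson correlation between the degrees at the two ends of each
    # undirected edge. Negative values (disassortative mixing) mean
    # high-degree peers tend to attach to low-degree peers, as the
    # analysis reports for peer social networks.
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # each undirected edge contributes both (deg u, deg v) and (deg v, deg u)
    x, y = [], []
    for u, v in edges:
        x += [degree[u], degree[v]]
        y += [degree[v], degree[u]]
    return float(np.corrcoef(x, y)[0, 1])
```

    For example, a star graph (one hub, three leaves) is maximally disassortative and scores -1.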

    A Natural Hand Gesture System for People with Brachial Plexus Injuries

    Get PDF
    This paper focuses on a design case study of a natural hand gesture system for users with intact motion control of the metacarpophalangeal joint and thumb basal joint of the hand after brachial plexus injuries. The lexicon of hand gestures had eight entries and was demonstrated to be natural and ergonomic given the limited hand motions. A cooperative multi-cue system was proposed for the key hand-posture recognition of the proposed hand gestures. We applied the designed system to remote smart-car control and electric wheelchair control. Experimental results demonstrated the robustness and potential feasibility of the system in human-computer interaction for the proposed users.

    Hierarchical Object-Based Visual Attention for Machine Vision

    Get PDF
    Institute of Perception, Action and Behaviour. Human vision uses mechanisms of covert attention to selectively process interesting information, and overt eye movements to extend this selective ability. Thus, visual tasks can be dealt with effectively by limited processing resources. Modelling visual attention for machine vision systems is not only critical but also challenging. Many conventional attention models have been developed in the machine vision literature, but they are all space-based only and cannot perform object-based selection. In consequence, they fail to work in real-world visual environments due to the intrinsic limitations of the space-based attention theory upon which these models are built. The aim of the work presented in this thesis is to provide a novel human-like visual selection framework based on the object-based attention theory recently developed in psychophysics. The proposed solution, a Hierarchical Object-based Attention Framework (HOAF) based on grouping competition, consists of two closely coupled visual selection models: (1) hierarchical object-based visual (covert) attention, and (2) object-based attention-driven (overt) saccadic eye movements. The Hierarchical Object-based Attention Model (HOAM) is the primary selection mechanism, and the Object-based Attention-Driven Saccading model (OADS) has a supporting role; both are combined in the integrated visual selection framework HOAF. This thesis first describes the proposed object-based attention model HOAM, the primary component of the selection framework HOAF. The model is based on recent psychophysical results on object-based visual attention and adopts grouping-based competition to integrate object-based and space-based attention, so as to achieve object-based hierarchical selectivity. The behaviour of the model is demonstrated on a number of synthetic images simulating psychophysical experiments and on real-world natural scenes.
    The experimental results showed that the performance of our object-based attention model HOAM concurs with the main findings in the psychophysical literature on object-based and space-based visual attention. Moreover, HOAM achieves outstanding hierarchical selectivity from far to near and from coarse to fine, by features, objects, spatial regions, and their groupings in complex natural scenes. This successful performance arises from three original mechanisms in the model: grouping-based saliency evaluation, integrated competition between groupings, and hierarchical selectivity. The model is the first implemented machine vision model of integrated object-based and space-based visual attention. The thesis then addresses the other proposed model, of Object-based Attention-Driven Saccadic eye movements (OADS), built upon the object-based attention model HOAM as an overt saccading component within the object-based selection framework HOAF. This model, like our object-based attention model HOAM, is also the first implemented machine vision saccading model that makes a clear distinction between (covert) visual attention and overt saccading movements in a two-level selection system, an important feature of human vision not yet explored in conventional machine vision saccading systems. In the saccading model OADS, a log-polar retina-like sensor is employed to simulate human-like foveated imaging for space-variant sensing. Through a novel mechanism for attention-driven orienting, the sensor fixates on new destinations determined by object-based attention. It thus helps attention to selectively process interesting objects located at the periphery of the whole field of view, accomplishing large-scale visual selection tasks. Through another proposed novel mechanism for temporary inhibition of return, OADS can simulate human saccading/attention behaviour to refixate/reattend interesting objects for further detailed inspection.
    This thesis concludes that the proposed human-like visual selection solution, HOAF, which is inspired by psychophysical object-based attention theory and grouping-based competition, is particularly useful for machine vision. HOAF is a general and effective visual selection framework integrating object-based attention and attention-driven saccadic eye movements, with biological plausibility and object-based hierarchical selectivity from coarse to fine in a space-time context.
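
    The temporary inhibition of return described above can be sketched as a decaying suppression map over attended items; the decay rate, suppression strength, and dictionary interface below are illustrative assumptions, not the thesis's actual mechanism.

```python
class InhibitionOfReturn:
    # Hypothetical sketch of temporary inhibition of return: the item
    # just attended is suppressed, but the suppression decays on every
    # shift, so an interesting object can later be reattended for
    # further detailed inspection.
    def __init__(self, decay=0.5):
        self.decay = decay
        self.inhibition = {}

    def attend(self, salience):
        # decay earlier inhibition, then pick the strongest net winner
        self.inhibition = {k: v * self.decay for k, v in self.inhibition.items()}
        scores = {k: s - self.inhibition.get(k, 0.0) for k, s in salience.items()}
        winner = max(scores, key=scores.get)
        self.inhibition[winner] = max(salience.values())  # suppress winner
        return winner
```

    With two items of salience 1.0 and 0.6, attention shifts to the second item once the first is inhibited, then returns to the first after the inhibition has decayed, which is the refixation behaviour the model is said to reproduce.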

    Task-oriented Memory-efficient Pruning-Adapter

    Full text link
    The outstanding performance and growing size of Large Language Models have led to increased attention to parameter-efficient learning. The two predominant approaches are Adapters and Pruning. Adapters freeze the model and attach new trainable weight matrices on the side, which can significantly reduce the time and memory cost of training, but at the price of increased time and memory consumption during evaluation and testing. Pruning cuts off some weights and re-distributes the remaining ones, which trades extremely high memory use and training time for a relatively low cost of evaluation and testing. Thus, training efficiency and inference efficiency cannot be obtained at the same time. In this work, we propose a task-oriented Pruning-Adapter method that achieves high memory efficiency in training, speeds up training, and ensures no significant decrease in accuracy on GLUE tasks, achieving training and inference efficiency at the same time.
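
    The adapter side of the trade-off can be sketched as a small residual bottleneck next to a frozen layer; the shapes, initialization, and class name below are illustrative assumptions, not this paper's method.

```python
import numpy as np

class BottleneckAdapter:
    # Minimal sketch of the adapter idea the abstract contrasts with
    # pruning: the frozen layer output h passes through a small
    # down-projection, a nonlinearity, and an up-projection, added
    # back residually. Only W_down and W_up would be trained; the base
    # model stays frozen.
    def __init__(self, d_model, d_bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        self.W_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        # zero-initialised up-projection: the adapter starts as identity
        self.W_up = np.zeros((d_bottleneck, d_model))

    def __call__(self, h):
        z = np.maximum(h @ self.W_down, 0.0)  # ReLU bottleneck
        return h + z @ self.W_up              # residual connection
```

    Only the two small matrices are trainable, which is why adapter training is cheap; the extra projections remain in the forward pass, which is why inference gets slower.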

    Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout

    Full text link
    Goal-conditioned hierarchical reinforcement learning (HRL) presents a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks via temporal abstraction. Yet most goal-conditioned HRL algorithms have focused on subgoal discovery while neglecting inter-level coupling. In essence, for hierarchical systems, increased inter-level communication and coordination can induce more stable and robust policy improvement. Here, we present a goal-conditioned HRL framework with Guided Cooperation via Model-based Rollout (GCMR), which estimates forward dynamics to promote inter-level cooperation. GCMR alleviates the state-transition error within off-policy correction through a model-based rollout, further improving sample efficiency. Meanwhile, to avoid being disrupted by these corrected but possibly unseen or faraway goals, lower-level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioural policy. In addition, we propose one-step rollout-based planning to further facilitate inter-level cooperation, where the higher-level Q-function is used to guide the lower-level policy by estimating the value of future states, so that global task information is transmitted downwards to avoid local pitfalls. Experimental results demonstrate that incorporating the proposed GCMR framework with ACLG, a disentangled variant of HIGL, yields more stable and robust policy improvement than the baselines and substantially outperforms previous state-of-the-art (SOTA) HRL algorithms on both hard-exploration problems and robotic control.
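
    The one-step rollout-based planning idea can be sketched as scoring candidate lower-level actions by the higher-level value of the model-predicted next state. The scalar state/action types and function signatures below are illustrative assumptions; GCMR's actual interfaces differ.

```python
import numpy as np

def one_step_rollout_plan(state, candidate_actions, dynamics, q_high):
    # Hypothetical sketch: each candidate lower-level action is scored
    # by the higher-level Q-value of the *predicted* next state, so
    # global task information guides the local choice instead of the
    # lower level optimising its subgoal in isolation.
    scores = [q_high(dynamics(state, a)) for a in candidate_actions]
    return candidate_actions[int(np.argmax(scores))]
```

    With a toy dynamics model s' = s + a and a higher-level value peaked at state 5, the planner prefers the action whose predicted outcome the higher level rates best, illustrating how value information flows downwards.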

    Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation

    Full text link
    Within the multimodal field, the key to integrating vision and language lies in establishing a good alignment strategy. Recently, benefiting from the success of self-supervised learning, significant progress has been made in multimodal semantic representation based on pre-trained vision and language models. However, there is still room for improvement in visual semantic representation. The lack of spatial semantic coherence and the vulnerability to noise make it challenging for current pixel- or patch-based methods to accurately extract complex scene boundaries. To this end, this paper develops superpixels as a compact, comprehensive representation of learnable image data, which effectively reduces the number of visual primitives for subsequent processing by clustering perceptually similar pixels. To mine more precise topological relations, we propose a Multiscale Difference Graph Convolutional Network (MDGCN). It parses the entire image as a fine-to-coarse hierarchy of constituent visual patterns, and captures multiscale features by progressively merging adjacent superpixels as graph nodes. Moreover, we predict the differences between adjacent nodes through the graph structure, facilitating key-information aggregation across graph nodes to reason about actual semantic relations. Afterward, we design a multi-level fusion rule in a bottom-up manner to avoid understanding deviations by learning complementary spatial information at different regional scales. Our proposed method can be applied to multiple downstream learning tasks. Extensive experiments demonstrate that our method is competitive with other state-of-the-art methods in visual reasoning. Our code will be released upon publication.
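
    One coarsening step of the fine-to-coarse hierarchy can be sketched as merging the most similar pair of adjacent superpixels into a single graph node. Feature averaging and Euclidean similarity are illustrative choices, not MDGCN's actual update rule.

```python
import numpy as np

def merge_most_similar(features, adjacency):
    # Hypothetical coarsening step: find the pair of adjacent
    # superpixels with the most similar mean features, merge them into
    # one node whose feature is the pair's average, and rewrite the
    # adjacency so edges now point at the merged node.
    best, best_dist = None, np.inf
    for i, j in adjacency:
        d = np.linalg.norm(features[i] - features[j])
        if d < best_dist:
            best, best_dist = (i, j), d
    i, j = best
    merged = {k: v for k, v in features.items() if k not in (i, j)}
    merged[(i, j)] = (features[i] + features[j]) / 2.0
    new_adjacency = set()
    for a, b in adjacency:
        a2 = (i, j) if a in (i, j) else a
        b2 = (i, j) if b in (i, j) else b
        if a2 != b2:  # drop the edge collapsed by the merge
            new_adjacency.add((a2, b2))
    return merged, new_adjacency
```

    Applying this step repeatedly yields the progressively coarser graphs over which multiscale features are captured.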

    A computer vision model for visual-object-based attention and eye movements

    Get PDF
    This is the post-print version of the final paper published in Computer Vision and Image Understanding. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2008 Elsevier B.V. This paper presents a new computational framework for modelling visual-object-based attention and attention-driven eye movements within an integrated system, in a biologically inspired approach. Attention operates at multiple levels of visual selection, by space, feature, object and group, depending on the nature of the targets and the visual task. Attentional shifts and gaze shifts are built upon common processing circuits and control mechanisms but are separated according to their different functional roles, working together to fulfil flexible visual selection tasks in complicated visual environments. The framework integrates the important aspects of human visual attention and eye movements, resulting in sophisticated performance in complicated natural scenes. The proposed approach aims at exploring a useful visual selection system for computer vision, especially for use in cluttered natural visual environments. Funded by the National Natural Science Foundation of China.
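
    Space-variant sensing of the kind used for attention-driven gaze shifts is commonly realised with a log-polar sampling grid. A sketch of assigning pixels to (ring, wedge) bins around a fixation point; the bin counts and log spacing are illustrative, not this paper's sensor parameters.

```python
import numpy as np

def logpolar_bins(height, width, cx, cy, n_rings=32, n_wedges=64):
    # Every pixel is assigned a (ring, wedge) bin centred on the
    # fixation point (cx, cy). Ring index grows with log-distance, so
    # sampling is densest at the fovea and sparsest at the periphery.
    ys, xs = np.mgrid[0:height, 0:width]
    dx, dy = xs - cx, ys - cy
    r = np.hypot(dx, dy)
    # log-spaced rings; log1p avoids log(0) at the fixation point
    ring = np.floor(n_rings * np.log1p(r) / np.log1p(r.max())).astype(int)
    ring = np.clip(ring, 0, n_rings - 1)
    wedge = np.floor((np.arctan2(dy, dx) + np.pi) / (2 * np.pi) * n_wedges).astype(int)
    wedge = np.clip(wedge, 0, n_wedges - 1)
    return ring, wedge
```

    Refixation then amounts to recomputing the bins around the new gaze centre, so whichever object attention selects receives the highest-resolution sampling.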

    The pairwise phase consistency in cortical network and its relationship with neuronal activation

    No full text
    Gamma-band neuronal oscillation and synchronization in the 30-90 Hz range are a ubiquitous phenomenon across numerous brain areas and various species, and are correlated with many cognitive functions. The phase of the oscillation, as one aspect of the CTC (Communication through Coherence) hypothesis, underlies various functions in feature coding, memory processing and behavioural performance. The PPC (Pairwise Phase Consistency), an improved coherence measure, statistically quantifies the strength of phase synchronization. To evaluate the PPC and its relationships with the input stimulus, neuronal activation and firing rate, a simplified spiking neuronal network is constructed to simulate orientation columns in primary visual cortex. If the input orientation stimulus is preferred by a certain orientation column, neurons within that column obtain a higher firing rate and stronger neuronal activation, which consequently engender higher PPC values, with higher PPC corresponding to higher firing rate. In addition, we investigate the PPC in a time-resolved analysis with a sliding window.
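
    The PPC named above has a standard closed form: the mean cosine of all pairwise phase differences. The sketch below follows that definition, with spike phases supplied as a plain array (an assumption about the input format).

```python
import numpy as np

def pairwise_phase_consistency(phases):
    # PPC: the average cosine of all pairwise phase differences,
    # computed via the identity
    #   |sum_j e^{i*theta_j}|^2 = n + 2 * sum_{j<k} cos(theta_j - theta_k),
    # which avoids the sample-size bias of the squared resultant length.
    phases = np.asarray(phases, dtype=float)
    n = phases.size
    if n < 2:
        raise ValueError("PPC needs at least two phases")
    resultant_sq = np.abs(np.exp(1j * phases).sum()) ** 2
    return float((resultant_sq - n) / (n * (n - 1)))
```

    Identical phases give a PPC of 1, while uniformly scattered phases give a value near 0; sliding this computation over windows of spike phases yields the time-resolved analysis mentioned above.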
