    Feature and Region Selection for Visual Learning

    Visual learning problems such as object classification and action recognition are typically approached using extensions of the popular bag-of-words (BoW) model. Despite its great success, it is unclear what visual features the BoW model is learning: Which regions in the image or video are used to discriminate among classes? Which are the most discriminative visual words? Answering these questions is fundamental for understanding existing BoW models and inspiring better models for visual recognition. To answer these questions, this paper presents a method for feature selection and region selection in the visual BoW model. This allows for an intermediate visualization of the features and regions that are important for visual learning. The main idea is to assign latent weights to the features or regions, and jointly optimize these latent variables with the parameters of a classifier (e.g., support vector machine). There are four main benefits of our approach: (1) our approach accommodates non-linear additive kernels such as the popular χ² and intersection kernels; (2) our approach is able to handle both regions in images and spatio-temporal regions in videos in a unified way; (3) the feature selection problem is convex, and both problems can be solved using a scalable reduced gradient method; (4) we point out strong connections with multiple kernel learning and multiple instance learning approaches. Experimental results on PASCAL VOC 2007, MSR Action Dataset II, and YouTube illustrate the benefits of our approach.
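The additive kernels named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's formulation: the function names, the `eps` guard, and the per-dimension weight vector `w` are assumptions introduced here. The weighted variant shows one plausible reading of "latent weights on features", which works because an additive kernel is a sum of per-dimension terms.

```python
import numpy as np

def chi2_terms(x, y, eps=1e-12):
    """Per-dimension terms of the additive chi-squared kernel: 2*x_i*y_i / (x_i + y_i)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # eps (an assumption of this sketch) avoids division by zero for empty bins.
    return 2.0 * x * y / (x + y + eps)

def intersection_terms(x, y):
    """Per-dimension terms of the histogram intersection kernel: min(x_i, y_i)."""
    return np.minimum(np.asarray(x, dtype=float), np.asarray(y, dtype=float))

def additive_kernel(x, y, terms=chi2_terms, w=None):
    """Sum the per-dimension terms, optionally scaled by non-negative weights w.

    The weights are hypothetical here: they stand in for latent feature weights
    that would be learned jointly with the classifier.
    """
    t = terms(x, y)
    if w is None:
        return float(t.sum())
    return float(np.dot(np.asarray(w, dtype=float), t))
```

With `w` set to all ones the weighted kernel reduces to the plain additive kernel; setting a weight to zero removes that visual word from the similarity entirely.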

    Implicit Attentional Selection of Bound Visual Features

    Summary: Traditionally, research on visual attention has been focused on the processes involved in conscious, explicit selection of task-relevant sensory input. Recently, however, it has been shown that attending to a specific feature of an object automatically increases neural sensitivity to this feature throughout the visual field. Here we show that directing attention to a specific color of an object results in attentional modulation of the processing of task-irrelevant and not consciously perceived motion signals that are spatiotemporally associated with this color throughout the visual field. Such implicit cross-feature spreading of attention takes place according to the veridical physical associations between the color and motion signals, even under special circumstances when they are perceptually misbound. These results imply that the units of implicit attentional selection are spatiotemporally colocalized feature clusters that are automatically bound throughout the visual field.

    Coding of details in very low bit-rate video systems

    In this paper, the importance of including small image features at the initial levels of a progressive second-generation video coding scheme is presented. It is shown that a number of meaningful small features, called details, should be coded even at very low bit-rates in order to match their perceptual significance to the human visual system. We propose a method for extracting, perceptually selecting, and coding visual details in a video sequence using morphological techniques. Its application in the framework of a multiresolution segmentation-based coding algorithm yields better results than pure segmentation techniques at higher compression ratios, if the selection step fits some main subjective requirements. Details are extracted and coded separately from the region structure and included in the reconstructed images at a later stage. The bet of considering the local background of a given detail for its perceptual selection breaks the concept of ...
    Peer reviewed. Postprint (published version).

    Audio-Visual Automatic Speech Recognition Using PZM, MFCC and Statistical Analysis

    Audio-Visual Automatic Speech Recognition (AV-ASR) has become the most promising research area for situations in which the audio signal is corrupted by noise. The main objective of this paper is to select the important and discriminative audio and visual speech features for recognizing audio-visual speech. This paper proposes the Pseudo Zernike Moment (PZM) and a feature selection method for audio-visual speech recognition. Visual information is captured from the lip contour, and its moments are computed for lip reading. We extract 19th-order Mel Frequency Cepstral Coefficients (MFCC) as speech features from the audio. Since the 19 speech features are not all equally important, feature selection algorithms are used to select the most efficient features. Statistical tests such as Analysis of Variance (ANOVA), the Kruskal-Wallis test, and the Friedman test are employed, together with the Incremental Feature Selection (IFS) technique: the statistical analysis assesses the significance of the speech features, and IFS then selects the speech feature subset. Furthermore, multiclass Support Vector Machine (SVM), Artificial Neural Network (ANN), and Naive Bayes (NB) machine learning techniques are used to recognize speech in both the audio and visual modalities. A combined decision is taken from the two individual recognition systems based on their recognition rates. This paper compares the results achieved by the proposed model and the existing model for both audio and visual speech recognition. The Zernike Moment (ZM) is compared with PZM, showing that our proposed model using PZM extracts more discriminative features for visual speech recognition. This study also shows that audio feature selection using statistical analysis outperforms methods without any feature selection technique.
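The ranking-then-incremental-selection idea in this abstract can be sketched as below. This is only an illustration under stated assumptions: the abstract does not give the exact IFS procedure, so the loop here (add features in descending ANOVA F order and keep the prefix with the best validation accuracy) is one common interpretation. A nearest-centroid classifier stands in for the SVM, ANN, and NB models the paper uses, and all function names are hypothetical.

```python
import numpy as np

def anova_f(X, y):
    """One-way ANOVA F statistic for each feature column of X, grouped by label y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        ss_between += Xc.shape[0] * (class_mean - grand_mean) ** 2
        ss_within += ((Xc - class_mean) ** 2).sum(axis=0)
    # Small constant guards against zero within-class variance.
    return (ss_between / (k - 1)) / (ss_within / (n - k) + 1e-12)

def nearest_centroid_accuracy(X_train, y_train, X_val, y_val):
    """Stand-in classifier (assumption): assign each sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y_val).mean())

def incremental_feature_selection(X_train, y_train, X_val, y_val):
    """Add features in descending F order; return the best-scoring prefix and its accuracy."""
    order = np.argsort(-anova_f(X_train, y_train))
    best_acc, best_k = -1.0, 0
    for k in range(1, order.size + 1):
        cols = order[:k]
        acc = nearest_centroid_accuracy(X_train[:, cols], y_train, X_val[:, cols], y_val)
        if acc > best_acc:
            best_acc, best_k = acc, k
    return order[:best_k], best_acc
```

Keeping the ranking step separate from the subset search makes it straightforward to swap the F statistic for a rank-based test such as Kruskal-Wallis.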

    Separable mechanisms underlying global feature-based attention

    Feature-based attention is known to operate in a spatially global manner, in that the selection of attended features is not bound to the spatial focus of attention. Here we used electromagnetic recordings in human observers to characterize the spatiotemporal signature of such global selection of an orientation feature. Observers performed a simple orientation-discrimination task while ignoring task-irrelevant orientation probes outside the focus of attention. We observed that global feature-based selection, indexed by the brain response to unattended orientation probes, is composed of separable functional components. One such component reflects global selection based on the similarity of the probe with task-relevant orientation values ("template matching"), which is followed by a component reflecting selection based on the similarity of the probe with the orientation value under discrimination in the focus of attention ("discrimination matching"). Importantly, template matching occurs at ~150 ms after stimulus onset, ~80 ms before the onset of discrimination matching. Moreover, source activity underlying template matching and discrimination matching was found to originate from ventral extrastriate cortex, with the former being generated in more anterolateral and the latter in more posteromedial parts, suggesting template matching to occur in visual cortex higher up in the visual processing hierarchy than discrimination matching. We take these observations to indicate that the population-level signature of global feature-based selection reflects a sequence of hierarchically ordered operations in extrastriate visual cortex, in which the selection based on task relevance has temporal priority over the selection based on the sensory similarity between input representations.

    Weighted feature selection criteria for visual servoing of a telerobot

    Because of the continually changing environment of a space station, visual feedback is a vital element of a telerobotic system. A real-time visual servoing system would allow a telerobot to track and manipulate randomly moving objects. Methodologies are devised for the automatic selection of image features used to visually control the relative position between an eye-in-hand telerobot and a known object. A weighted criteria function with both image recognition and control components is used to select the combination of image features that provides the best control. Simulation and experimental results are discussed for a PUMA robot arm visually tracking a randomly moving carburetor gasket with a visual update time of 70 milliseconds.

    Adaptive sequential feature selection in visual perception and pattern recognition

    In the human visual system, one of the most prominent functions of the extensive feedback from the higher brain areas within and outside of the visual cortex is attentional modulation. The feedback helps the brain to concentrate its resources on visual features that are relevant for recognition, i.e., it iteratively selects certain aspects of the visual scene for refined processing by the lower areas until the inference process in the higher areas converges to a single hypothesis about this scene. In order to minimize the number of required selection-refinement iterations, one has to find a short sequence of maximally informative portions of the visual input. Since the feedback is not static, the selection process is adapted to the scene that should be recognized. To find a scene-specific subset of informative features, the adaptive selection process on every iteration utilizes the results of previous processing in order to reduce the remaining uncertainty about the visual scene. This phenomenon inspired us to develop a computational algorithm for visual classification that incorporates this principle of adaptive feature selection. It is especially interesting because feature selection methods are usually not adaptive: they define a single set of informative features for a task and use it for classifying all objects. An adaptive algorithm, in contrast, selects the features that are most informative for the particular input. Thus, the selection process should be driven by statistics of the environment concerning the current task and the object to be classified. Applied to a classification task, our adaptive feature selection algorithm favors features that maximally reduce the current class uncertainty, which is iteratively updated with the values of the previously selected features that are observed on the testing sample.
    In information-theoretical terms, the selection criterion is the mutual information of a class variable and a feature candidate conditioned on the already selected features, which take the values observed on the current testing sample. The main question investigated in this thesis is whether, and in which situations, the proposed adaptive way of selecting features is advantageous over conventional feature selection. Further, we studied whether the proposed adaptive information-theoretical selection scheme, which is a computationally complex algorithm, is utilized by humans while they perform a visual classification task. For this, we constructed a psychophysical experiment in which people had to select the image parts that they think are relevant for classifying these images. We present an analysis of the behavioral data in which we investigate whether human strategies of task-dependent selective attention can be explained by a simple ranker based on mutual information, by a more complex feature selection algorithm based on conventional static mutual information, or by the adaptive feature selector proposed here, which mimics a mechanism of iterative hypothesis refinement. The main contribution of this work is the adaptive feature selection criterion based on conditional mutual information. It is also shown that such an adaptive selection strategy is indeed used by people while performing visual classification.
    Contents: 1. Introduction; 2. Conventional feature selection; 3. Adaptive feature selection; 4. Experimental investigations of ACMIFS; 5. Information-theoretical strategies of selective attention; 6. Discussion; Appendix; Bibliography
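The selection criterion described in this abstract (mutual information between the class and a candidate feature, conditioned on the values the already selected features take on the test sample) can be sketched for discrete features as follows. This is a naive empirical estimate, not the thesis's ACMIFS algorithm: conditioning is done by filtering the training rows that match the test sample on the selected features, and the function names are hypothetical.

```python
import numpy as np

def mutual_information(a, b):
    """Empirical mutual information (in bits) between two discrete arrays."""
    a = np.asarray(a)
    b = np.asarray(b)
    mi = 0.0
    for va in np.unique(a):
        pa = np.mean(a == va)
        for vb in np.unique(b):
            pb = np.mean(b == vb)
            pab = np.mean((a == va) & (b == vb))
            if pab > 0:
                mi += pab * np.log2(pab / (pa * pb))
    return float(mi)

def adaptive_select(X_train, y_train, x_test, n_select):
    """Greedily pick features for one test sample.

    At each step the candidate with the highest class mutual information is
    chosen, estimated only on training rows that agree with x_test on every
    feature selected so far (the conditioning step).
    """
    X_train = np.asarray(X_train)
    y_train = np.asarray(y_train)
    mask = np.ones(X_train.shape[0], dtype=bool)
    selected = []
    for _ in range(n_select):
        candidates = [j for j in range(X_train.shape[1]) if j not in selected]
        if not candidates or mask.sum() == 0:
            break  # nothing left to choose, or no training rows match the sample
        scores = [mutual_information(X_train[mask, j], y_train[mask]) for j in candidates]
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        # Condition on the value this feature takes on the test sample.
        mask &= X_train[:, best] == x_test[best]
    return selected
```

Because the mask depends on the test sample, two samples can receive different feature sequences, which is what distinguishes this adaptive scheme from a static ranking computed once for all inputs.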