4 research outputs found

    Attention Mechanism for Recognition in Computer Vision

    It has been proven that humans do not focus their attention on an entire scene at once when they perform a recognition task. Instead, they pay attention to the most important parts of the scene to extract the most discriminative information. Inspired by this observation, this dissertation studies the importance of the attention mechanism in computer vision recognition tasks by designing novel attention-based models. Specifically, four scenarios are investigated that represent the most important aspects of the attention mechanism. First, an attention-based model is designed to reduce the dimensionality of the visual features by selectively processing only a small subset of the data. We study this aspect of the attention mechanism in a framework based on object recognition in distributed camera networks. Second, an attention-based image retrieval system (i.e., person re-identification) is proposed that learns to focus on the most discriminative regions of the person's image and to process those regions with more computational power using a deep convolutional neural network. Furthermore, we show how visualizing the attention maps can make deep neural networks more interpretable: the maps reveal the regions of the input image that the network relies on to make a decision. Third, a model is proposed for estimating the importance of the objects in a scene for a given task. More specifically, the proposed model estimates the importance of the road users that a driver (or an autonomous vehicle) should pay attention to in a driving scenario in order to navigate safely. In this scenario, the attention estimate is the final output of the model. 
Fourth, an attention-based module and a new loss function are proposed for a meta-learning-based few-shot learning system in order to incorporate the context of the task into the feature representations of the samples and to increase few-shot recognition accuracy. In this dissertation, we showed that attention can be multifaceted and studied the attention mechanism from the perspectives of feature selection, computational cost reduction, interpretable deep learning models, task-driven importance estimation, and context incorporation. Through the study of four scenarios, we further advanced the field in which "attention is all you need".
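
The idea of focusing on the most discriminative regions and then inspecting the attention map can be illustrated with a generic soft-attention pooling step. This is a minimal sketch and not the dissertation's model: the dot-product scoring, the `attend` function, and the toy `regions` and `query` values are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, query):
    """Score each region against a query, then pool regions by attention weight.

    Returns (weights, pooled): weights is the attention map over regions,
    pooled is the attention-weighted average feature vector.
    Dot-product scoring is an assumption, not the source's method.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, pooled

# Hypothetical toy input: three image regions with 2-D features.
regions = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
query = [1.0, 0.0]
weights, pooled = attend(regions, query)
print(weights)  # the attention map: one weight per region
print(pooled)   # the attended feature
```

In this toy setup the third region matches the query best, so it receives the largest weight. Printing or plotting `weights` is the simplest form of the attention-map visualization the abstract describes.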

    Learning Multimodal Structures in Computer Vision

    A phenomenon or event can be observed by various kinds of detectors or under different conditions. Each such acquisition framework is a modality of the phenomenon. Due to the relation between the modalities of multimodal phenomena, a single modality cannot fully describe the event of interest. Having several modalities report on the same event introduces new challenges compared to exploiting each modality separately. We are interested in designing new algorithmic tools for applying sensor fusion techniques within the signal representation of sparse coding, a popular methodology in signal processing, machine learning, and statistics for representing data. This coding scheme is based on a machine learning technique and has been demonstrated to be capable of representing many modalities, such as natural images. We consider situations where we want the support of the model not only to be sparse but also to reflect prior knowledge about the application at hand. Our goal is to extract a discriminative representation of the multimodal data that makes its essential characteristics easy to find in the subsequent analysis step, e.g., regression and classification. To be more precise, sparse coding represents signals as linear combinations of a small number of bases from a dictionary. The idea is to learn a dictionary that encodes intrinsic properties of the multimodal data in a decomposition coefficient vector that favors maximal discriminatory power. We carefully design a multimodal representation framework to learn discriminative feature representations by fully exploiting both the modality-shared information, which is the information shared by the various modalities, and the modality-specific information, which is the information content of each modality individually. In addition, the framework automatically learns the weights for the various feature components in a data-driven scheme. 
In other words, the physical interpretation of our learning framework is to fully exploit the correlated characteristics of the available modalities while at the same time leveraging the modality-specific character of each modality and adapting their corresponding weights for different parts of the feature in recognition.
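
The core statement above, that sparse coding represents a signal as a linear combination of a few dictionary atoms, can be sketched for a single modality with a fixed dictionary. The example below is an illustration and not the dissertation's multimodal framework. It assumes the standard L1-regularized objective 0.5 * ||x - D a||^2 + lam * ||a||_1, solved with iterative soft-thresholding (ISTA). The `sparse_code` function, the toy `atoms`, and the value of `lam` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (proximal step for the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sparse_code(x, atoms, lam=0.1, iters=500):
    """ISTA for min_a 0.5 * ||x - D a||^2 + lam * ||a||_1.

    `atoms` is a list of dictionary atoms, each the same length as `x`.
    The step size uses the squared Frobenius norm of D, a safe upper
    bound on the Lipschitz constant of the gradient.
    """
    step = 1.0 / sum(v * v for atom in atoms for v in atom)
    coeffs = [0.0] * len(atoms)
    for _ in range(iters):
        # Reconstruction D a and residual D a - x.
        recon = [sum(c * atom[i] for c, atom in zip(coeffs, atoms))
                 for i in range(len(x))]
        residual = [r - xi for r, xi in zip(recon, x)]
        # Gradient of the quadratic term: D^T (D a - x).
        grad = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        coeffs = [soft_threshold(c - step * g, step * lam)
                  for c, g in zip(coeffs, grad)]
    return coeffs

# Hypothetical overcomplete dictionary: four unit-norm atoms in 3-D.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
x = [2.0, 0.0, 0.0]
codes = sparse_code(x, atoms)
print(codes)
```

Because `x` lies along the first atom, the code concentrates on that one coefficient even though the fourth atom is also correlated with the signal. The L1 penalty shrinks the active coefficient slightly below 2, which is the usual bias of this formulation.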

    Wide-Area Control Schemes to Improve Small Signal Stability in Power Systems

    One of the main concerns for the secure and reliable operation of power systems is the small signal stability problem. In the complex and highly interconnected structure of future power systems, relying solely on operator responses and conventional controls cannot assure reliability. Therefore, there is a need for advanced Wide-Area Control Schemes (WACS) that can automatically respond to degradation of reliability in the system. The main objective of this dissertation is to address two key challenges regarding the design and implementation of wide-area control schemes for damping inter-area oscillations. First is the high communication cost associated with optimal centralized control approaches. As power networks are large-scale systems, both the synthesis and the implementation of centralized controllers suggested by most of the previous studies are often impossible in practice. Second is the difficulty of obtaining accurate system-wide dynamic models for initiating and updating the control design. In this research, we introduced wide-area damping control strategies that not only ensure the small signal stability with the desired performance but also consider communication and model information limitations in the design. A state feedback formulation is proposed that aims to simultaneously optimize a standard Linear Quadratic Regulator (LQR) cost criterion and induce a pre-defined communication structure. We solved the proposed problem with three different objectives to target a specific wide-area damping control design challenge in each setting. First, the communication structure is enforced as a constraint in the optimization and solved for a large idealized power network with information symmetry. Second, to make the method suitable for systems with arbitrary structures and information patterns, we proposed a group-sparse regularization to be added to the optimization cost function. 
Applications of the method for inducing the desired communication network and finding effective measurement and control signal combinations were also investigated. Third, we paired the proposed optimal control with a real-time model identification approach to create a wide-area control framework that is capable of dealing with model information limitations and inaccuracies in online implementation. The performance of the proposed wide-area damping control architectures is validated through nonlinear simulations on different test systems.
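
The link between a group-sparse regularizer and the communication structure can be made concrete with a small sketch. The abstract does not give the regularizer's formula, so the example assumes one common form: the sum of Frobenius norms of the off-diagonal blocks of a state-feedback gain K, with controllers and states grouped by area. The function names, the grouping, and the toy gain matrix are hypothetical.

```python
import math

def block_norm(K, rows, cols):
    """Frobenius norm of the sub-block of K given by the row and column indices."""
    return math.sqrt(sum(K[r][c] ** 2 for r in rows for c in cols))

def group_sparse_penalty(K, input_groups, state_groups, weight=1.0):
    """Sum of block norms over inter-area blocks (assumed form of the regularizer).

    Local blocks (controller area i using states of area i) are left
    unpenalized, since they need no wide-area communication.
    """
    return weight * sum(
        block_norm(K, rows, cols)
        for i, rows in enumerate(input_groups)
        for j, cols in enumerate(state_groups)
        if i != j
    )

def communication_links(K, input_groups, state_groups, tol=1e-9):
    """List (controller_area, state_area) pairs whose gain block is nonzero."""
    return [
        (i, j)
        for i, rows in enumerate(input_groups)
        for j, cols in enumerate(state_groups)
        if i != j and block_norm(K, rows, cols) > tol
    ]

# Hypothetical two-area system: one control input and two states per area.
K = [
    [1.0, 0.5, 0.0, 0.0],  # area-0 controller uses only area-0 states
    [0.3, 0.4, 2.0, 1.0],  # area-1 controller also uses area-0 states
]
input_groups = [[0], [1]]
state_groups = [[0, 1], [2, 3]]

penalty = group_sparse_penalty(K, input_groups, state_groups)
links = communication_links(K, input_groups, state_groups)
print(penalty, links)
```

Adding such a penalty to the LQR cost drives whole inter-area blocks of K to zero, and each block that stays nonzero corresponds to one communication link that must be implemented. Here only the link from area 0's states to area 1's controller remains.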