3,525 research outputs found

    Data driven techniques for modal decomposition and reduced-order modelling of fluids

    Get PDF
    In this thesis, a number of data-driven techniques are proposed for the analysis and extraction of reduced-order models of fluid flows. Throughout the thesis, there has been an emphasis on the practicality and interpretability of data-driven feature-extraction techniques to aid practitioners in flow-control and estimation. The first contribution uses a graph theoretic approach to analyse the similarity of modes extracted using data-driven modal decomposition algorithms to give a more intuitive understanding of the degrees of freedom in the underlying system. The method extracts clusters of spatially and spectrally similar modes by post-processing the modes extracted using DMD and its variants. The second contribution proposes a method for extracting coherent structures, using snapshots of high dimensional measurements, that can be mapped to a low dimensional output of the system. The importance of finding such coherent structures is that in the context of active flow control and estimation, the practitioner often has to rely on a limited number of measurable outputs to estimate the state of the flow. Therefore, ensuring that the extracted flow features can be mapped to the measured outputs of the system can be beneficial for estimating the state of the flow. The third contribution concentrates on using neural networks for exploiting the nonlinear relationships amongst linearly extracted modal time series to find a reduced order state, which can then be used for modelling the dynamics of the flow. The method utilises recurrent neural networks to find an encoding of a high dimensional set of modal time series, and fully connected neural networks to find a mapping between the encoded state and the physically interpretable modal coefficients. As a result of this architecture, the significantly reduced-order representation maintains an automatically extracted relationship to a higher-dimensional, interpretable state.Open Acces

    Deep Architectures for Visual Recognition and Description

    Get PDF
    In recent times, digital media contents are inherently of multimedia type, consisting of the form text, audio, image and video. Several of the outstanding computer Vision (CV) problems are being successfully solved with the help of modern Machine Learning (ML) techniques. Plenty of research work has already been carried out in the field of Automatic Image Annotation (AIA), Image Captioning and Video Tagging. Video Captioning, i.e., automatic description generation from digital video, however, is a different and complex problem altogether. This study compares various existing video captioning approaches available today and attempts their classification and analysis based on different parameters, viz., type of captioning methods (generation/retrieval), type of learning models employed, the desired output description length generated, etc. This dissertation also attempts to critically analyze the existing benchmark datasets used in various video captioning models and the evaluation metrics for assessing the final quality of the resultant video descriptions generated. A detailed study of important existing models, highlighting their comparative advantages as well as disadvantages are also included. In this study a novel approach for video captioning on the Microsoft Video Description (MSVD) dataset and Microsoft Video-to-Text (MSR-VTT) dataset is proposed using supervised learning techniques to train a deep combinational framework, for achieving better quality video captioning via predicting semantic tags. We develop simple shallow CNN (2D and 3D) as feature extractors, Deep Neural Networks (DNNs and Bidirectional LSTMs (BiLSTMs) as tag prediction models and Recurrent Neural Networks (RNNs) (LSTM) model as the language model. The aim of the work was to provide an alternative narrative to generating captions from videos via semantic tag predictions and deploy simpler shallower deep model architectures with lower memory requirements as solution so that it is not very memory extensive and the developed models prove to be stable and viable options when the scale of the data is increased. This study also successfully employed deep architectures like the Convolutional Neural Network (CNN) for speeding up automation process of hand gesture recognition and classification of the sign languages of the Indian classical dance form, ‘Bharatnatyam’. This hand gesture classification is primarily aimed at 1) building a novel dataset of 2D single hand gestures belonging to 27 classes that were collected from (i) Google search engine (Google images), (ii) YouTube videos (dynamic and with background considered) and (iii) professional artists under staged environment constraints (plain backgrounds). 2) exploring the effectiveness of CNNs for identifying and classifying the single hand gestures by optimizing the hyperparameters, and 3) evaluating the impacts of transfer learning and double transfer learning, which is a novel concept explored for achieving higher classification accuracy

    Pedestrian Attribute Recognition: A Survey

    Full text link
    Recognizing pedestrian attributes is an important task in computer vision community due to it plays an important role in video surveillance. Many algorithms has been proposed to handle this task. The goal of this paper is to review existing works using traditional methods or based on deep learning networks. Firstly, we introduce the background of pedestrian attributes recognition (PAR, for short), including the fundamental concepts of pedestrian attributes and corresponding challenges. Secondly, we introduce existing benchmarks, including popular datasets and evaluation criterion. Thirdly, we analyse the concept of multi-task learning and multi-label learning, and also explain the relations between these two learning algorithms and pedestrian attribute recognition. We also review some popular network architectures which have widely applied in the deep learning community. Fourthly, we analyse popular solutions for this task, such as attributes group, part-based, \emph{etc}. Fifthly, we shown some applications which takes pedestrian attributes into consideration and achieve better performance. Finally, we summarized this paper and give several possible research directions for pedestrian attributes recognition. The project page of this paper can be found from the following website: \url{https://sites.google.com/view/ahu-pedestrianattributes/}.Comment: Check our project page for High Resolution version of this survey: https://sites.google.com/view/ahu-pedestrianattributes

    Knowledge extraction from the learning of sequences in a long short term memory (LSTM) architecture

    Get PDF
    International audienceTransparency and trust in machine learning algorithms have been deemed to be fundamental and yet, from a practical point of view, they remain difficult to implement. Particularly, explainability and interpretability are certainly among the most difficult capabilities to be addressed and imply to be able to understand a decision in terms of simple cues and rules. In this article, we address this specific problem in the context of sequence learning by recurrent neuronal models (and more specifically Long Short Term Memory model). We introduce a general method to extract knowledge from the latent space based on the clustering of the internal states. From these hidden states, we explain how to build and validate an automaton that corresponds to the underlying (unknown) grammar, and allows to predict if a given sequence is valid or not. Finally, we show that it is possible for such complex recurrent model, to extract the knowledge that is implicitly encoded in the sequences and we report a high rate of recognition of the sequences extracted from the original grammar. This method is illustrated on artificial grammars (Reber grammar variants) as well as on a real use-case in the electrical domain, whose underlying grammar is unknown

    Evolution of A Common Vector Space Approach to Multi-Modal Problems

    Get PDF
    A set of methods to address computer vision problems has been developed. Video un- derstanding is an activate area of research in recent years. If one can accurately identify salient objects in a video sequence, these components can be used in information retrieval and scene analysis. This research started with the development of a course-to-fine frame- work to extract salient objects in video sequences. Previous work on image and video frame background modeling involved methods that ranged from simple and efficient to accurate but computationally complex. It will be shown in this research that the novel approach to implement object extraction is efficient and effective that outperforms the existing state-of-the-art methods. However, the drawback to this method is the inability to deal with non-rigid motion. With the rapid development of artificial neural networks, deep learning approaches are explored as a solution to computer vision problems in general. Focusing on image and text, the image (or video frame) understanding can be achieved using CVS. With this concept, modality generation and other relevant applications such as automatic im- age description, text paraphrasing, can be explored. Specifically, video sequences can be modeled by Recurrent Neural Networks (RNN), the greater depth of the RNN leads to smaller error, but that makes the gradient in the network unstable during training.To overcome this problem, a Batch-Normalized Recurrent Highway Network (BNRHN) was developed and tested on the image captioning (image-to-text) task. In BNRHN, the highway layers are incorporated with batch normalization which diminish the gradient vanishing and exploding problem. In addition, a sentence to vector encoding framework that is suitable for advanced natural language processing is developed. This semantic text embedding makes use of the encoder-decoder model which is trained on sentence paraphrase pairs (text-to-text). With this scheme, the latent representation of the text is shown to encode sentences with common semantic information with similar vector rep- resentations. In addition to image-to-text and text-to-text, an image generation model is developed to generate image from text (text-to-image) or another image (image-to- image) based on the semantics of the content. The developed model, which refers to the Multi-Modal Vector Representation (MMVR), builds and encodes different modalities into a common vector space that achieve the goal of keeping semantics and conversion between text and image bidirectional. The concept of CVS is introduced in this research to deal with multi-modal conversion problems. In theory, this method works not only on text and image, but also can be generalized to other modalities, such as video and audio. The characteristics and performance are supported by both theoretical analysis and experimental results. Interestingly, the MMVR model is one of the many possible ways to build CVS. In the final stages of this research, a simple and straightforward framework to build CVS, which is considered as an alternative to the MMVR model, is presented
    • …
    corecore