
    A Comprehensive Approach to Automated Sign Language Translation

    Many sign languages are bona fide natural languages with grammatical rules and lexicons, and can therefore benefit from neural machine translation methods. As significant advances are being made in natural language processing (specifically neural machine translation) and in computer vision (specifically image and video captioning), related methods can be further researched to boost automated sign language understanding. This is an especially challenging AI research area due to the involvement of a continuous visual-spatial modality, where meaning is often derived from context. To this end, this thesis focuses on the study and development of new computational methods and training mechanisms to enhance sign language translation in two directions: signs to text and text to signs. This work introduces a new, realistic phrase-level American Sign Language dataset (ASL/ASLing) and investigates the role of different types of visual features (CNN embeddings, human body keypoints, and optical flow vectors) in translating ASL to spoken American English. Additionally, the research considers the role of multiple features for improved translation via various fusion architectures. As an added benefit, with continuous sign language being challenging to segment, this work also explores the use of overlapping scaled visual segments across the video for simultaneously segmenting and translating signs. Finally, a quintessential interpreting agent not only understands sign language and translates it to text, but also understands text and translates it to signs. Hence, to facilitate two-way sign language communication, i.e. visual sign to spoken language translation and spoken to visual sign language translation, a dual neural machine translation model, SignNet, is presented. Various training paradigms are investigated for improved translation using SignNet. By exploiting the notion of similarity (and dissimilarity) of visual signs, a metric embedding learning process proved most useful in training SignNet. The resulting processes outperformed their state-of-the-art counterparts, showing noteworthy improvements in BLEU-1 to BLEU-4 scores.
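    The abstract names metric embedding learning over visual sign features but gives no implementation details; as a rough, illustrative sketch only (the GRU encoder, feature dimensions, and the use of a triplet margin loss are assumptions, not the thesis's exact method), such a training step might look like:

        import torch
        import torch.nn as nn

        class SignEmbedder(nn.Module):
            """Maps a sequence of per-frame visual features (CNN embeddings,
            keypoints, or optical flow vectors) to a fixed-size sign embedding."""
            def __init__(self, feat_dim=1024, embed_dim=256):
                super().__init__()
                self.encoder = nn.GRU(feat_dim, embed_dim, batch_first=True)

            def forward(self, frames):            # frames: (batch, time, feat_dim)
                _, h = self.encoder(frames)       # h: (1, batch, embed_dim)
                return nn.functional.normalize(h.squeeze(0), dim=-1)

        # Similar signs are pulled together, dissimilar signs pushed apart.
        triplet_loss = nn.TripletMarginLoss(margin=0.2)

        def training_step(model, anchor, positive, negative, optimizer):
            optimizer.zero_grad()
            loss = triplet_loss(model(anchor), model(positive), model(negative))
            loss.backward()
            optimizer.step()
            return loss.item()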

    Time-Series Embedded Feature Selection Using Deep Learning: Data Mining Electronic Health Records for Novel Biomarkers

    As health information technologies continue to advance, the routine collection and digitisation of patient health records in the form of electronic health records present an ideal opportunity for data mining and exploratory analysis of biomarkers and risk factors indicative of a potentially diverse domain of patient outcomes. Patient records have become more widely available through various initiatives enabling open access whilst maintaining critical patient privacy. In spite of such progress, health records remain under-used in current clinical statistical analysis due to the challenges posed by such “big data”. Deep learning based temporal modelling approaches present an ideal solution to these challenges through automated self-optimisation of representation learning, manageably composing the high-dimensional domain of patient records into representations able to model complex data associations. Such representations can serve to condense and reduce dimensionality, emphasising feature sparsity and importance through novel embedded feature selection approaches. Accordingly, application to patient records enables complex modelling and analysis of the full domain of clinical features to select biomarkers of predictive relevance. Firstly, we propose a novel entropy-regularised neural network ensemble able to highlight risk factors associated with hospitalisation risk of individuals with dementia. Its application reduced a large domain of unique medical events to a small set of relevant risk factors that maintain hospitalisation discrimination. Following on, we continue our work on ensemble architectures with a novel cascading LSTM ensemble to predict severe sepsis onset in patients in an ICU critical care centre, demonstrating state-of-the-art performance that outperforms current related literature. Finally, we propose a novel embedded feature selection approach dubbed 1D convolution feature selection, using sparsity regularisation. This methodology was evaluated on both the dementia and sepsis prediction objectives to highlight model capability and generalisability. We further report a selection of potential biomarkers for the aforementioned case studies, highlighting clinical relevance and potential novelty value for future clinical analysis. Accordingly, we demonstrate the effective capability of embedded feature selection approaches, through the application of temporal deep learning architectures, in the discovery of effective biomarkers across a variety of challenging clinical applications.
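    The 1D convolution feature selection method is only named above; as a minimal sketch of the general idea (the per-feature gate, layer sizes, and the form of the L1 penalty are illustrative assumptions, not the authors' exact architecture), embedded feature selection with sparsity regularisation in front of a temporal 1D convolution could be written as:

        import torch
        import torch.nn as nn

        class SparseFeatureSelector(nn.Module):
            """Learns one non-negative gate per clinical feature; an L1 penalty
            drives most gates to zero, leaving a small set of selected features."""
            def __init__(self, n_features, n_filters=64, kernel_size=3):
                super().__init__()
                self.gates = nn.Parameter(torch.ones(n_features))
                self.conv = nn.Conv1d(n_features, n_filters, kernel_size, padding=1)
                self.head = nn.Linear(n_filters, 1)          # e.g. sepsis-onset risk

            def forward(self, x):                            # x: (batch, time, n_features)
                x = x * torch.relu(self.gates)               # soft feature selection
                h = torch.relu(self.conv(x.transpose(1, 2))) # (batch, n_filters, time)
                return torch.sigmoid(self.head(h.mean(dim=-1)))

            def sparsity_penalty(self):                      # add lambda * penalty to the loss
                return torch.relu(self.gates).sum()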

    Attribute Selection for Unsupervised and Language Independent Classification of Documents

    Raw, unstructured text is the most common form in which documents are written, so it contains most of the available information. It is therefore desirable to have tools capable of extracting the core content of each document and, through it, identifying the group to which the document belongs, since unstructured texts usually have no designated place for indicating the document class. Nowadays, English is not the only language in which documents appear in the available repositories. This suggests the construction of tools that, if possible, do not depend on the language in which the texts are written, which is a challenge. This dissertation focuses mainly on clustering documents according to their content, using no class labels, that is, unsupervised clustering. It aims to mine and create features from text in order to achieve that purpose. It also intends to classify new documents, in a supervised approach, according to the classes identified in the unsupervised training phase. To this end, the proposed solution finds the best features inside the documents and uses their discriminative power to provide clustering. In order to summarise the core content of each cluster found by this approach, key expressions are automatically extracted from its documents.
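    The dissertation's exact pipeline is not reproduced above; purely as an illustrative sketch (the vectoriser settings, cluster count, and the choice of character n-grams for language independence are assumptions), the overall idea of unsupervised clustering followed by key-expression extraction and supervised classification could look like:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.cluster import KMeans
        from sklearn.linear_model import LogisticRegression

        docs = ["o gato dorme no sofá", "the cat sleeps on the couch",
                "markets closed lower today", "as bolsas fecharam em queda",
                "the striker scored twice", "o avançado marcou dois golos"]

        # Character n-grams keep the features largely language independent.
        vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
        X = vectorizer.fit_transform(docs)

        # Unsupervised phase: group documents by content, with no class labels.
        clusterer = KMeans(n_clusters=3, n_init=10, random_state=0)
        clusters = clusterer.fit_predict(X)

        # Key expressions per cluster: features with the highest centroid weight.
        terms = vectorizer.get_feature_names_out()
        for c, centroid in enumerate(clusterer.cluster_centers_):
            print(c, [terms[i] for i in centroid.argsort()[-5:][::-1]])

        # Supervised phase: classify new documents into the discovered classes.
        classifier = LogisticRegression(max_iter=1000).fit(X, clusters)
        print(classifier.predict(vectorizer.transform(["o gato come peixe"])))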

    Economics of Conflict and Terrorism

    This book contributes to the literature on conflict and terrorism through a selection of articles that deal with theoretical, methodological and empirical issues related to the topic. The papers study important problems, are original in their approach, and are innovative in the techniques used. The book will be useful for researchers in the fields of game theory, economics and political science.

    Extending the Work of DT-Fixup: Examining the Effects of PowerNorm and MADGRAD Optimization on DT-Fixup Performance

    With the introduction of the attention mechanism, Bidirectional Encoder Representations from Transformers (BERT) has greatly advanced the study of sequence-to-sequence tasks in Natural Language Processing (NLP). When task-specific annotations are limited, NLP tasks are commonly performed by pre-training a model using the transformer technique on large-scale general corpora, followed by fine-tuning the model on domain-specific data. Instead of using shallow neural components for fine-tuning, additional transformer layers can be introduced into the architecture. Recent research shows that, by resolving some initialization and optimization issues, these augmented transformer layers can lead to performance gains despite the limited size of the available data, especially for well-structured data. Along this direction, we perform comprehensive experiments on the DT-Fixup algorithm, which is designed to mitigate the mentioned issues. For possible performance improvement on DT-Fixup, we propose to study the applicability of power normalization and of the Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization (MADGRAD) in this setting. This is motivated by recent literature showing that power normalization, which stems from the batch normalization widely adopted in computer vision, outperforms the layer normalization usually found in transformers. In the family of AdaGrad adaptive gradient methods, MADGRAD is a new optimization technique that performs exceptionally well on deep learning optimization problems from a variety of fields, including classification and image-to-image tasks in vision, and recurrent and bidirectionally-masked models in natural language processing. Even on problems where adaptive methods typically perform poorly, MADGRAD matches or beats both SGD and Adam in test set performance on each of these tasks. This research is performed on the ReClor and LogiQA datasets, selected according to their structure.
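    Neither modification requires changing the overall DT-Fixup setup; as a rough sketch only (the RMS-style layer below is a simplification of power normalization, not the PowerNorm reference implementation, and the madgrad import assumes Facebook Research's published package is installed), the two ideas could be wired into a fine-tuning script as follows:

        import torch
        import torch.nn as nn
        from madgrad import MADGRAD          # assumed: pip install madgrad

        class SimplePowerNorm(nn.Module):
            """Simplified power-normalization-style layer: rescales activations by a
            running quadratic mean instead of per-sample mean/variance statistics."""
            def __init__(self, dim, eps=1e-5, momentum=0.1):
                super().__init__()
                self.gain = nn.Parameter(torch.ones(dim))
                self.register_buffer("running_quad", torch.ones(dim))
                self.eps, self.momentum = eps, momentum

            def forward(self, x):                # x: (..., dim)
                if self.training:
                    quad = (x ** 2).mean(dim=tuple(range(x.dim() - 1)))
                    self.running_quad.mul_(1 - self.momentum).add_(self.momentum * quad.detach())
                return self.gain * x / torch.sqrt(self.running_quad + self.eps)

        # Extra layers on top of a pre-trained encoder output (dimensions assumed).
        head = nn.Sequential(nn.Linear(768, 768), SimplePowerNorm(768), nn.ReLU(), nn.Linear(768, 2))
        optimizer = MADGRAD(head.parameters(), lr=1e-3, weight_decay=0.01)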

    Visual Concept Detection in Images and Videos

    The rapidly increasing proliferation of digital images and videos leads to a situation where content-based search in multimedia databases becomes more and more important. A prerequisite for effective image and video search is to analyze and index media content automatically. Current approaches in the field of image and video retrieval focus on semantic concepts serving as an intermediate description to bridge the “semantic gap” between the data representation and the human interpretation. Due to the large complexity and variability in the appearance of visual concepts, the detection of arbitrary concepts represents a very challenging task. In this thesis, the following aspects of visual concept detection systems are addressed.
    First, enhanced local descriptors for mid-level feature coding are presented. Based on the observation that scale-invariant feature transform (SIFT) descriptors with different spatial extents yield large performance differences, a novel concept detection system is proposed that combines feature representations for different spatial extents using multiple kernel learning (MKL). A multi-modal video concept detection system is presented that relies on Bag-of-Words representations for visual and, in particular, for audio features. Furthermore, a method for the SIFT-based integration of color information, called color moment SIFT, is introduced. Comparative experimental results demonstrate the superior performance of the proposed systems on the Mediamill and the VOC Challenge.
    Second, an approach is presented that systematically utilizes the results of object detectors. Novel object-based features are generated from object detection results using different pooling strategies. For videos, detection results are assembled into object sequences, and a shot-based confidence score as well as further features, such as position, frame coverage or movement, are computed for each object class. These features are used as additional input for the support vector machine (SVM)-based concept classifiers, so that other related concepts can also profit from object-based features. Extensive experiments on the Mediamill, VOC and TRECVid Challenge show significant improvements in retrieval performance, not only for the object classes but in particular for a large number of indirectly related concepts. Moreover, it has been demonstrated that a few object-based features are beneficial for a large number of concept classes. On the VOC Challenge, the additional use of object-based features led to a superior performance of 63.8% mean average precision (AP) for the image classification task. Furthermore, the generalization capabilities of concept models are investigated. It is shown that different source and target domains lead to a severe loss in concept detection performance; in these cross-domain settings, object-based features achieve a significant performance improvement. Since it is inefficient to run a large number of single-class object detectors, it is additionally demonstrated how a concurrent multi-class object detection system can be constructed to speed up the detection of many object classes in images.
    Third, a novel, purely web-supervised learning approach for modeling heterogeneous concept classes in images is proposed. Tags and annotations of multimedia data in the WWW are rich sources of information that can be employed for learning visual concepts. The presented approach is aimed at continuous long-term learning of appearance models and at improving these models periodically. For this purpose, several components have been developed: a crawling component, a multi-modal clustering component for spam detection and subclass identification, a novel learning component called “random savanna”, a validation component, an updating component, and a scalability manager. Only a single word describing the visual concept is required to initiate the learning process. Experimental results demonstrate the capabilities of the individual components.
    Finally, a generic concept detection system is applied to support interdisciplinary research efforts in the fields of psychology and media science. The psychological research question addressed in the field of behavioral sciences is whether and how playing violent content in computer games may induce aggression. Therefore, novel semantic concepts, most notably “violence”, are detected in computer game videos to gain insights into the interrelationship between violent game events and the brain activity of the player. Experimental results demonstrate the excellent performance of the proposed automatic concept detection approach for such interdisciplinary research.
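    As background for the Bag-of-Words representations mentioned above (the thesis's MKL fusion, color moment SIFT, and object-based features are not reproduced here), a minimal sketch of a SIFT-based Bag-of-Visual-Words concept classifier, assuming OpenCV and scikit-learn are available and that train_paths/train_labels are hypothetical lists of image paths and concept labels, might look like:

        import cv2
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.svm import SVC

        sift = cv2.SIFT_create()                     # requires OpenCV >= 4.4

        def sift_descriptors(path):
            gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
            _, desc = sift.detectAndCompute(gray, None)
            return desc if desc is not None else np.empty((0, 128), np.float32)

        def bow_histogram(desc, vocabulary):
            words = vocabulary.predict(desc) if len(desc) else []
            hist, _ = np.histogram(words, bins=np.arange(vocabulary.n_clusters + 1))
            return hist / max(hist.sum(), 1)         # L1-normalised visual-word histogram

        # train_paths / train_labels: assumed lists of image paths and per-image concept labels.
        train_desc = [sift_descriptors(p) for p in train_paths]
        vocabulary = KMeans(n_clusters=500, n_init=4).fit(np.vstack(train_desc))
        X = np.array([bow_histogram(d, vocabulary) for d in train_desc])
        concept_classifier = SVC(kernel="rbf", C=10.0, probability=True).fit(X, train_labels)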

    Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations

    Institute for Adaptive and Neural Computation
    Non-parametric models and techniques enjoy a growing popularity in the field of machine learning, and among these, Bayesian inference for Gaussian process (GP) models has recently received significant attention. We feel that GP priors should be part of the standard toolbox for constructing models relevant to machine learning, in the same way as parametric linear models are, and the results in this thesis help to remove some obstacles on the way towards this goal. In the first main chapter, we provide a distribution-free finite-sample bound on the difference between generalisation and empirical (training) error for GP classification methods. While the general theorem (the PAC-Bayesian bound) is not new, we give a much simplified and somewhat generalised derivation and point out the underlying core technique (convex duality) explicitly. Furthermore, the application to GP models is novel (to our knowledge). A central feature of this bound is that its quality depends crucially on task knowledge being encoded faithfully in the model and prior distributions, so there is a mutual benefit between a sharp theoretical guarantee and empirically well-established statistical practices. Extensive simulations on real-world classification tasks indicate an impressive tightness of the bound, in spite of the fact that many previous bounds for related kernel machines fail to give non-trivial guarantees in this practically relevant regime. In the second main chapter, sparse approximations are developed to address the unfavourable scaling of most GP techniques with large training sets. Due to its high importance in practice, this problem has received a lot of attention recently. We demonstrate the tractability and usefulness of simple greedy forward selection with information-theoretic criteria previously used in active learning (or sequential design), and we develop generic schemes for automatic model selection with many (hyper)parameters. We suggest two new generic schemes and evaluate some of their variants on large real-world classification and regression tasks. These schemes and their underlying principles (which are clearly stated and analysed) can be applied to obtain sparse approximations for a wide range of GP models far beyond the special cases studied here.
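    For reference, and as a hedged illustration rather than the thesis's exact statement, the PAC-Bayesian theorem in the binary-KL form commonly attributed to Langford and Seeger can be written as follows: with probability at least 1 - delta over an i.i.d. training sample S of size n, simultaneously for all posteriors Q over classifiers,

        \mathrm{kl}\!\left( \hat{R}_S(Q) \,\big\|\, R(Q) \right) \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{n+1}{\delta}}{n},
        \qquad \text{where } \mathrm{kl}(q \,\|\, p) = q \ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p},

    P is the prior (here induced by the GP prior), \hat{R}_S(Q) is the empirical Gibbs error on S, and R(Q) is the corresponding generalisation error.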