6 research outputs found

    Array of Multilayer Perceptrons with No-class Resampling Training for Face Recognition

    A face recognition (FR) problem involves face detection, representation and classification steps. Once a face is located in an image, it has to be represented through a feature extraction process before a proper face classification task can be performed. The most widely used approach for feature extraction is the eigenfaces method, where an eigenspace is established from the image training samples using principal component analysis. In the classification phase, an input face is projected onto the obtained eigenspace and classified by an appropriate classifier. Neural network classifiers based on multilayer perceptron models have proven to be well suited to this task. This paper presents an array of multilayer perceptron neural networks trained with a novel no-class resampling strategy, which takes into account the balance problem between class and no-class examples and increases the generalization capabilities. The proposed model is compared against a classical multilayer perceptron classifier for face recognition over the AT&T database of faces, obtaining results that show an improvement over the classification rates of a classical classifier.
    Fil: Capello, D.. Universidad Tecnológica Nacional. Facultad Regional Santa Fe; Argentina
    Fil: Martínez, César Ernesto. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas; Argentina
    Fil: Milone, Diego Humberto. Universidad Nacional de Entre Ríos; Argentina
    Fil: Stegmayer, Georgina. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas; Argentina
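The eigenspace-plus-MLP pipeline described above can be sketched as follows. This is a minimal illustration using synthetic stand-in images and scikit-learn; the number of principal components, the hidden-layer size, and the data are assumptions, not the authors' configuration, and it does not include the no-class resampling strategy.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for face images: 10 "subjects", each a noisy
# variation of a per-subject template (the paper used the AT&T faces).
rng = np.random.default_rng(0)
templates = rng.normal(size=(10, 32 * 32))
y = np.repeat(np.arange(10), 20)                     # 20 images per subject
X = templates[y] + 0.3 * rng.normal(size=(200, 32 * 32))

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Build the eigenspace from the training samples only (eigenfaces via PCA).
pca = PCA(n_components=20, whiten=True).fit(X_tr)

# Classify faces projected onto the eigenspace with a multilayer perceptron.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
clf.fit(pca.transform(X_tr), y_tr)
acc = clf.score(pca.transform(X_te), y_te)
```

Training only the PCA on the training split mirrors the paper's setup, where the eigenspace is established from the training samples and test faces are merely projected onto it.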

    Predicting from aggregated data

    Aggregated data, a collection of data summarized from multiple sources, is commonly used in different fields of research including healthcare, web applications, and sensor networks. Aggregation is often employed to handle issues such as privacy, scalability, and reliability. However, accurately predicting individual outcomes from grouped datasets can be very difficult. In this thesis, we designed a new learning method, a Mixture of Experts (MoE) model, focused on individual-level prediction when the training variables are aggregated. We utilized the MoE model, trained and validated on the eICU Collaborative Research patient datasets, to conduct a series of studies. Our results showed that applying grouping functions to the classification of aggregated data across demographic and behavior metrics can remain effective. This technique was verified by comparing two separately trained MoE models evaluated on the same datasets. Finally, we estimated non-aggregated datasets from spatio-temporal aggregated records by expressing the problem in the frequency domain, and trained an autoregressive model for predicting future stock prices. This process can be repeated, offering a potential solution to the issue of learning from aggregated data.
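The mixture-of-experts structure mentioned above can be sketched in a few lines: a softmax gate assigns a weight to each expert, and the final prediction is the gate-weighted mixture of the experts' outputs. The linear experts, the gating network, and all shapes below are illustrative assumptions, not the thesis architecture.

```python
import numpy as np

# A softmax gate weights each expert; the prediction is the mixture.
rng = np.random.default_rng(1)
n_features, n_experts = 5, 3

x = rng.normal(size=n_features)
W_experts = rng.normal(size=(n_experts, n_features))  # one linear expert per row
W_gate = rng.normal(size=(n_experts, n_features))     # gating-network weights

expert_out = W_experts @ x                            # each expert's prediction
gate_logits = W_gate @ x
gate = np.exp(gate_logits) / np.exp(gate_logits).sum()  # softmax over experts

y_hat = gate @ expert_out                             # gate-weighted mixture
```

In training, both the gate and the experts would be fit jointly so that the gate learns which expert to trust for each input region.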

    Low Resolution Face Recognition Using Mixture of Experts

    Human activity analysis is a major concern in a wide variety of applications, such as video surveillance, human-computer interfaces and face image database management. Detecting and recognizing faces is a crucial step in these applications. Furthermore, major advancements and initiatives in security applications in recent years have propelled face recognition technology into the spotlight. The performance of existing face recognition systems declines significantly if the resolution of the face image falls below a certain level. This is especially critical in surveillance imagery where often, for many reasons, only low-resolution video of faces is available. If these low-resolution images are passed to a face recognition system, the performance is usually unacceptable. Hence, resolution plays a key role in face recognition systems. In this paper we introduce a new low resolution face recognition system based on mixture of experts neural networks. To produce the low resolution input images, we down-sampled the 48 × 48 ORL images to 12 × 12 using the nearest neighbor interpolation method; applying the bicubic interpolation method afterwards yields enhanced images, which are given to the Principal Component Analysis feature extractor. Comparison with some of the most closely related methods indicates that the proposed model yields excellent recognition rates in low resolution face recognition: 100% for the training set and 96.5% for the test set.
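The down-sample/up-sample pre-processing described above can be sketched as follows, using a random array as a stand-in for a 48 × 48 ORL image. `scipy.ndimage.zoom` with `order=3` (cubic spline interpolation) stands in here for the bicubic step.

```python
import numpy as np
from scipy.ndimage import zoom

# A random array stands in for one 48x48 ORL image.
rng = np.random.default_rng(0)
img48 = rng.random((48, 48))

# 1) Down-sample 48x48 -> 12x12 with nearest-neighbour (keep every 4th pixel).
img12 = img48[::4, ::4]

# 2) Up-sample back to 48x48 with cubic interpolation; this enhanced image
#    is what would be fed to the PCA feature extractor.
enhanced = zoom(img12, 4, order=3)

print(img12.shape, enhanced.shape)  # (12, 12) (48, 48)
```

The point of the round trip is that the recognizer only ever sees information present in the 12 × 12 image; the interpolation restores the spatial dimensions expected by the feature extractor, not the lost detail.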

    Visual question answering with modules and language modeling

    The primary focus of this thesis is to learn modularized representations for the task of Visual Question Answering (VQA). Learning such representations holds the potential to generalize to the higher order reasoning prevalent in human beings. Chapter 1 discusses the literature related to VQA, modular networks and neural structure optimization. In particular, it first details the different datasets proposed to study this task. The models for VQA can be categorized into two categories based on the datasets they are suited for. The first covers open-ended questions about natural images. These questions are mostly about a few objects/persons present in the image and do not require any significant reasoning capability to answer. The second category comprises questions (mostly on synthetic images) which test the ability of models to perform compositional reasoning. We discuss the different architectural variants of Neural Module Networks (NMN). Finally, we discuss approaches to learn neural network structures or modules for tasks other than VQA. In Chapter 2, we describe a way to sparsely execute a CNN model (ResNeXt [110]) and save computations in the process. Here, we use a mixture of experts formulation to execute only the top-K experts in each convolutional block.
The most important set of experts is selected by a gate controller which uses a question-guided attention map followed by fully-connected layers to assign weights to the set of experts. Our experiments show that it is possible to obtain huge savings in the FLOP count with only a minimal degradation in performance. Chapter 3 is a prologue to Chapter 4. It mentions the key contributions and provides an introduction to the research problem which we try to address in the article. Chapter 4 contains the contents of the article. Here, we are interested in learning the internal structure of the modules for Neural Module Networks (NMN) [3, 37]. We introduce a novel form of module structure which uses elementary arithmetic operations, so the task is now to learn the weights of these operations to form the module structure. We cast the problem as a bi-level optimization in which the model takes alternating gradient descent steps in the architecture and weight spaces. Chapter 5 discusses additional experiments and ablation studies that were done in the context of the previous article. Most works in the literature use a recurrent neural network such as an LSTM [33] or GRU [13] to model the question features. However, LSTMs can fail to properly encode syntactic features of the question which could be vital to answering some VQA questions [87]. Recently, [76] showed the utility of language modeling for question answering. With this motivation, we try to learn a better language model which can be trained in an unsupervised manner. In Chapter 6, we discuss a recursive network for language modeling whose structure aligns with natural language. More technically, we make use of an unsupervised parsing model (the Parsing Reading Predict Network, or PRPN [86]) and augment its prediction step with a TreeLSTM [99] model which uses the intermediate tree structure given by the PRPN model to output a hidden state.
The predict step of the PRPN model then uses a hidden state which is a weighted combination of the TreeLSTM's hidden state and the one obtained from structured attention. In this way, the model can perform unsupervised parsing and also capture long-term dependencies, as the structure now exists explicitly in the model. Our experiments demonstrate that this model improves on the language modeling task over the PRPN baseline on the Penn Treebank dataset.
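The top-K expert selection described for Chapter 2 can be sketched as follows: a controller scores every expert, only the K best-scoring experts in a block are executed, and their outputs are combined with softmax weights. The controller scores, the linear experts, and all sizes below are illustrative assumptions, not the actual question-guided controller.

```python
import numpy as np

# Score all experts, but execute only the top-K of them.
rng = np.random.default_rng(0)
n_experts, k, dim = 8, 2, 16

x = rng.normal(size=dim)
experts = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]
gate_scores = rng.normal(size=n_experts)   # stand-in for the controller output

top_k = np.argsort(gate_scores)[-k:]       # indices of the K best experts
weights = np.exp(gate_scores[top_k])       # softmax over the selected experts
weights /= weights.sum()

# Only the selected experts run, saving the FLOPs of the remaining ones.
out = sum(w * (experts[i] @ x) for w, i in zip(weights, top_k))
```

The FLOP saving comes from never evaluating the unselected experts; with k = 2 of 8 experts, roughly three quarters of the per-block expert computation is skipped.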

    Developing and evaluating second level teachers’ technology integration in classroom practice

    Over the past twenty years, significant advances have been made in addressing intrinsic and extrinsic barriers to technology integration, which has led to increased use of technology-supported teaching and learning in the classroom. However, a key challenge remains in the design and implementation of professional development programmes, for both pre-service and in-service teachers, that can increase the impact of technology-enhanced classroom practices. This thesis presents three studies that examined second level teachers' technology integration in their classroom practices. The first and second case studies discuss the practices of two cohorts of teachers (n=15). The first cohort was awarded a set of tablet devices for a whole year group; the second cohort was provided tablets by the research team for one academic term. The final study, which built upon the findings of case studies one and two, discusses the design and implementation of an undergraduate module for second level pre-service science teachers (n=10), with no prior teaching experience, to extend their technological pedagogical knowledge. These studies present data collected from teachers' lesson plans, interviews, and independent classroom observations. The Technological Pedagogical Content Knowledge (TPACK) framework, proposed by Mishra & Koehler (2006), was used as an operational framework to discuss the teachers' classroom practices. The results of this thesis highlight that even though the barriers to technology integration have been significantly reduced, and in some cases eliminated, teachers continue to struggle to integrate technology into their pedagogical practices. While school-based professional development was shown to increase in-service teachers' use of technology-enhanced strategies, the teachers felt they required significantly more support, both to design and to implement changes in their classroom practices. 
The pre-service teachers believed that exposure to new technologies and tools enhanced their confidence in, and attitudes towards, integrating technology into their pedagogical approaches. However, observations from a micro-teaching session with this cohort illustrated that these pre-service teachers had good levels of technology literacy but generally low TPACK levels. This research has focussed on the teachers' approaches to technology-enhanced classroom practices; however, further research needs to be conducted to examine the impact on student learning. In addition, it highlights the need for extended studies on the design and implementation of different models of professional learning programmes that can impact the technology-enhanced classroom practices of both pre- and in-service teachers.