455 research outputs found

    Learning visual representations with neural networks for video captioning and image generation

    Full text link
    La recherche sur les réseaux de neurones a permis de réaliser de larges progrès durant la dernière décennie. Non seulement les réseaux de neurones ont été appliqués avec succès pour résoudre des problèmes de plus en plus complexes; mais ils sont aussi devenus l’approche dominante dans les domaines où ils ont été testés tels que la compréhension du langage, les agents jouant à des jeux de manière automatique ou encore la vision par ordinateur, grâce à leurs capacités calculatoires et leurs efficacités statistiques. La présente thèse étudie les réseaux de neurones appliqués à des problèmes en vision par ordinateur, où les représentations sémantiques abstraites jouent un rôle fondamental. Nous démontrerons, à la fois par la théorie et par l’expérimentation, la capacité des réseaux de neurones à apprendre de telles représentations à partir de données, avec ou sans supervision. Le contenu de la thèse est divisé en deux parties. La première partie étudie les réseaux de neurones appliqués à la description de vidéo en langage naturel, nécessitant l’apprentissage de représentation visuelle. Le premier modèle proposé permet d’avoir une attention dynamique sur les différentes trames de la vidéo lors de la génération de la description textuelle pour de courtes vidéos. Ce modèle est ensuite amélioré par l’introduction d’une opération de convolution récurrente. Par la suite, la dernière section de cette partie identifie un problème fondamental dans la description de vidéo en langage naturel et propose un nouveau type de métrique d’évaluation qui peut être utilisé empiriquement comme un oracle afin d’analyser les performances de modèles concernant cette tâche. La deuxième partie se concentre sur l’apprentissage non-supervisé et étudie une famille de modèles capables de générer des images. En particulier, l’accent est mis sur les “Neural Autoregressive Density Estimators (NADEs), une famille de modèles probabilistes pour les images naturelles. Ce travail met tout d’abord en évidence une connection entre les modèles NADEs et les réseaux stochastiques génératifs (GSN). De plus, une amélioration des modèles NADEs standards est proposée. Dénommés NADEs itératifs, cette amélioration introduit plusieurs itérations lors de l’inférence du modèle NADEs tout en préservant son nombre de paramètres. Débutant par une revue chronologique, ce travail se termine par un résumé des récents développements en lien avec les contributions présentées dans les deux parties principales, concernant les problèmes d’apprentissage de représentation sémantiques pour les images et les vidéos. De prometteuses directions de recherche sont envisagées.The past decade has been marked as a golden era of neural network research. Not only have neural networks been successfully applied to solve more and more challenging real- world problems, but also they have become the dominant approach in many of the places where they have been tested. These places include, for instance, language understanding, game playing, and computer vision, thanks to neural networks’ superiority in computational efficiency and statistical capacity. This thesis applies neural networks to problems in computer vision where high-level and semantically meaningful representations play a fundamental role. It demonstrates both in theory and in experiment the ability to learn such representations from data with and without supervision. The main content of the thesis is divided into two parts. The first part studies neural networks in the context of learning visual representations for the task of video captioning. Models are developed to dynamically focus on different frames while generating a natural language description of a short video. Such a model is further improved by recurrent convolutional operations. The end of this part identifies fundamental challenges in video captioning and proposes a new type of evaluation metric that may be used experimentally as an oracle to benchmark performance. The second part studies the family of models that generate images. While the first part is supervised, this part is unsupervised. The focus of it is the popular family of Neural Autoregressive Density Estimators (NADEs), a tractable probabilistic model for natural images. This work first makes a connection between NADEs and Generative Stochastic Networks (GSNs). The standard NADE is improved by introducing multiple iterations in its inference without increasing the number of parameters, which is dubbed iterative NADE. With a historical view at the beginning, this work ends with a summary of recent development for work discussed in the first two parts around the central topic of learning visual representations for images and videos. A bright future is envisioned at the end

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Get PDF
    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object based paradigm has many undeniable benefits, numerous technical challenges remain before the applications becomes pervasive, particularly on computational constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery powered mobile computing devices, the additional algorithmic complexity of semantic object based processing compared to conventional video processing is highly undesirable both from a real-time operation and battery life perspective. This thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to be offloaded from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object based shape encoding, a novel energy efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourable with the relevant prior art

    Energy load forecast in smart buildings with deep learning techniques

    Get PDF
    Predicting energy load is a growing problem these days. The need to study in advance how electricity consumption will behave is key to resource management. Especially interesting is the case of the so-called Smart Buildings, buildings born from the trend towards sustainable development and consumption which is increasingly in vogue, becoming mandatory by law in many countries. One type of model that constitutes an important part of the state of the art are the models based on Deep Learning. These models represented great advances in Artificial Intelligence recently, since although they were born in the 20th century, it has not been until 10 years ago that they have re-emerged thanks to the computational advances that allow them to be trained by the general public. In this Final Degree Project, advanced Deep Learning techniques applied to the problem of load prediction in Smart Buildings are presented, mainly basing the development on the data from the Alice Perry building of the National University of Ireland Galway, in collaboration with the Informatics Research Unit for Sustainable Engineering of the same university. The datasets used were obtained from the time series of aggregated electricity consumption of the air handling units (AHUs) in the Alice Perry building. Along with this information, historical weather data were also collected from the weather station in the same building in order to study if these climatic variables help to a better prediction in the models. Time series prediction on this energy load data will be made in two different ways with hourly granularity: one-step prediction in which studying the previous observations an estimate of the value of the load in the next hour is obtained and sequence prediction, in which we will try to predict the behaviour of the series in the next hours from the previous values.La predicción de carga energética es un problema al alza actualmente. La necesidad de estudiar con antelación cómo se va a comportar el consumo eléctrico es clave para la gestión de recursos. Especialmente interesante es el caso de los llamados Smart Buildings, edificios nacidos por la tendencia hacia un desarrollo y consumo sostenible el cual cada vez está más en boga, llegando a ser obligatorio por ley en muchos países. Un tipo de modelos que constituyen una parte importante del estado del arte son los modelos basados en Deep Learning. Estos modelos supusieron grandes avances en la Inteligencia Artificial recientemente, ya que aunque nacidos en el Siglo XX, no ha sido hasta escasos 10 años cuando han resurgido gracias a los avances computacionales que permiten entrenarlos por el público general. En este trabajo de fin de grado se presentan técnicas avanzadas de Deep Learning aplicadas al problema de la predicción de carga en Smart Buildings, principalmente basando el desarrollo en los datos del edificio Alice Perry de la National University of Ireland Galway, en colaboración con el grupo Informatics Research Unit for Sustainable Engineering de la misma universidad. Los conjuntos de datos utilizados se obtuvieron datos sobre la serie temporal de consumo eléctrico agregado de los aires acondicionados en el edificio Alice Perry. Junto a esta información, se recopilaron también datos meteorológicos históricos de la estación meteorológica en el mismo edificio con el objetivo de estudiar si estas variables climáticas ayudan a una mejor predicción en los modelos. La predicción de series temporales sobre estos datos de carga energética se realizará en dos modos con granularidad horaria: La predicción a un paso en la que estudiando las observaciones anteriores se obtiene una estimación del valor de la carga en la próxima hora y predicción de secuencias, en la que se intentará predecir el comportamiento de la serie en las próximas horas a partir de los valores anteriores.Grado en Ingeniería Informátic

    Modeling and Optimization of Active Distribution Network Operation Based on Deep Learning

    Get PDF

    Applications of Machine Learning to Optimizing Polyolefin Manufacturing

    Full text link
    This chapter is a preprint from our book by , focusing on leveraging machine learning (ML) in chemical and polyolefin manufacturing optimization. It's crafted for both novices and seasoned professionals keen on the latest ML applications in chemical processes. We trace the evolution of AI and ML in chemical industries, delineate core ML components, and provide resources for ML beginners. A detailed discussion on various ML methods is presented, covering regression, classification, and unsupervised learning techniques, with performance metrics and examples. Ensemble methods, deep learning networks, including MLP, DNNs, RNNs, CNNs, and transformers, are explored for their growing role in chemical applications. Practical workshops guide readers through predictive modeling using advanced ML algorithms. The chapter culminates with insights into science-guided ML, advocating for a hybrid approach that enhances model accuracy. The extensive bibliography offers resources for further research and practical implementation. This chapter aims to be a thorough primer on ML's practical application in chemical engineering, particularly for polyolefin production, and sets the stage for continued learning in subsequent chapters. Please cite the original work [169,170] when referencing

    Data analytics for mobile traffic in 5G networks using machine learning techniques

    Get PDF
    This thesis collects the research works I pursued as Ph.D. candidate at the Universitat Politecnica de Catalunya (UPC). Most of the work has been accomplished at the Mobile Network Department Centre Tecnologic de Telecomunicacions de Catalunya (CTTC). The main topic of my research is the study of mobile network traffic through the analysis of operative networks dataset using machine learning techniques. Understanding first the actual network deployments is fundamental for next-generation network (5G) for improving the performance and Quality of Service (QoS) of the users. The work starts from the collection of a novel type of dataset, using an over-the-air monitoring tool, that allows to extract the control information from the radio-link channel, without harming the users’ identities. The subsequent analysis comprehends a statistical characterization of the traffic and the derivation of prediction models for the network traffic. A wide group of algorithms are implemented and compared, in order to identify the highest performances. Moreover, the thesis addresses a set of applications in the context mobile networks that are prerogatives in the future mobile networks. This includes the detection of urban anomalies, the user classification based on the demanded network services, the design of a proactive wake-up scheme for efficient-energy devices.Esta tesis recoge los trabajos de investigación que realicé como Ph.D. candidato a la Universitat Politecnica de Catalunya (UPC). La mayor parte del trabajo se ha realizado en el Centro Tecnológico de Telecomunicaciones de Catalunya (CTTC) del Departamento de Redes Móviles. El tema principal de mi investigación es el estudio del tráfico de la red móvil a través del análisis del conjunto de datos de redes operativas utilizando técnicas de aprendizaje automático. Comprender primero las implementaciones de red reales es fundamental para la red de próxima generación (5G) para mejorar el rendimiento y la calidad de servicio (QoS) de los usuarios. El trabajo comienza con la recopilación de un nuevo tipo de conjunto de datos, utilizando una herramienta de monitoreo por aire, que permite extraer la información de control del canal de radioenlace, sin dañar las identidades de los usuarios. El análisis posterior comprende una caracterización estadística del tráfico y la derivación de modelos de predicción para el tráfico de red. Se implementa y compara un amplio grupo de algoritmos para identificar los rendimientos más altos. Además, la tesis aborda un conjunto de aplicaciones en el contexto de redes móviles que son prerrogativas en las redes móviles futuras. Esto incluye la detección de anomalías urbanas, la clasificación de usuarios basada en los servicios de red demandados, el diseño de un esquema de activación proactiva para dispositivos de energía eficiente.Postprint (published version
    corecore