
    Neural Architecture Search: Insights from 1000 Papers

    In the past decade, advances in deep learning have resulted in breakthroughs in a variety of areas, including computer vision, natural language understanding, speech recognition, and reinforcement learning. Specialized, high-performing neural architectures are crucial to the success of deep learning in these areas. Neural architecture search (NAS), the process of automating the design of neural architectures for a given task, is an inevitable next step in automating machine learning and has already outpaced the best human-designed architectures on many tasks. In the past few years, research in NAS has been progressing rapidly, with over 1000 papers released since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized and comprehensive guide to neural architecture search. We give a taxonomy of search spaces, algorithms, and speedup techniques, and we discuss resources such as benchmarks, best practices, other surveys, and open-source libraries.
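
    The survey's organizing split is between the search space (which architectures can be expressed) and the search algorithm (how candidates are proposed and evaluated). As a minimal illustration of that split, the sketch below runs plain random search, the simplest NAS baseline, over a toy search space; the space, the scoring function, and all names are hypothetical stand-ins, not anything taken from the survey.

        import random

        # Toy cell-style search space: each key is one architectural choice
        # (purely illustrative; real spaces encode layers, operations, wiring).
        SEARCH_SPACE = {
            "num_layers": [4, 8, 12],
            "width": [32, 64, 128],
            "op": ["conv3x3", "conv5x5", "sep_conv", "skip"],
        }

        def sample_architecture(rng):
            # Draw one architecture encoding uniformly from the space.
            return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}

        def evaluate(arch, rng):
            # Stand-in for "train the network and measure validation accuracy".
            # Real NAS spends its budget here, or replaces this step with a
            # speedup technique (performance prediction, weight sharing, ...).
            score = 0.05 * SEARCH_SPACE["num_layers"].index(arch["num_layers"])
            score += 0.03 * SEARCH_SPACE["width"].index(arch["width"])
            return 0.5 + score + rng.uniform(0.0, 0.05)  # noisy proxy objective

        def random_search(budget, seed=0):
            # Sample, evaluate, keep the best: the baseline every NAS
            # algorithm is expected to beat.
            rng = random.Random(seed)
            best_arch, best_score = None, float("-inf")
            for _ in range(budget):
                arch = sample_architecture(rng)
                score = evaluate(arch, rng)
                if score > best_score:
                    best_arch, best_score = arch, score
            return best_arch, best_score

        print(random_search(budget=20))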

    Information-Theoretic GAN Compression with Variational Energy-based Model

    We propose an information-theoretic knowledge distillation approach for the compression of generative adversarial networks, which aims to maximize the mutual information between teacher and student networks via a variational optimization based on an energy-based model. Because the direct computation of the mutual information in continuous domains is intractable, our approach instead optimizes the student network by maximizing a variational lower bound on the mutual information. To achieve a tight lower bound, we introduce an energy-based model relying on a deep neural network to represent a flexible variational distribution that handles high-dimensional images and effectively captures spatial dependencies between pixels. Since the proposed method is a generic optimization algorithm, it can be conveniently incorporated into arbitrary generative adversarial networks and even dense prediction networks, e.g., image enhancement models. We demonstrate that the proposed algorithm consistently achieves outstanding model-compression performance when combined with several existing generative adversarial networks.
    Comment: Accepted at NeurIPS 202
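
    The variational lower bound referred to here can be made concrete. A standard form (the Barber-Agakov bound; the paper's exact parameterization may differ) replaces the intractable conditional of teacher features t given student features s with a tractable variational distribution q_\phi:

        \[
        I(T;S) \;=\; H(T) - H(T \mid S) \;\ge\; H(T) + \mathbb{E}_{p(t,s)}\!\left[\log q_\phi(t \mid s)\right],
        \]

    with equality when q_\phi(t \mid s) = p(t \mid s). Since H(T) does not depend on the student, maximizing the expectation term maximizes the bound. An energy-based choice q_\phi(t \mid s) = \exp(-E_\phi(t,s)) / Z_\phi(s), with E_\phi a deep network, gives the flexible variational family the abstract describes.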

    Accurate and Interpretable Solution of the Inverse Rig for Realistic Blendshape Models with Quadratic Corrective Terms

    We propose a new model-based algorithm for solving the inverse rig problem in facial animation retargeting that exhibits a more accurate fit and a sparser, more interpretable weight vector than the state of the art (SOTA). The proposed method targets a specific subdomain of human face animation: highly realistic blendshape models used in the production of movies and video games. We formulate an optimization problem that takes into account all the requirements of the targeted models. Our objective goes beyond a linear blendshape model and employs the quadratic corrective terms necessary for correctly fitting fine details of the mesh. We show that the solution to the proposed problem yields highly accurate mesh reconstruction even when general-purpose solvers, such as SQP, are used. The results obtained using SQP are highly accurate in the mesh space but do not exhibit favorable weight sparsity and smoothness; for this reason, we further propose a novel algorithm relying on an MM (majorization-minimization) technique. The algorithm is specifically suited to the proposed objective, yielding a high-accuracy mesh fit while respecting the constraints and producing a sparse, smooth set of weights that is easy for artists to manipulate and interpret. Benchmarked against SOTA approaches, our algorithm shows overall superior results, yielding a smooth animation reconstruction with a relative improvement of up to 45 percent in root mean squared mesh error while keeping the weight cardinality comparable to benchmark methods. The paper provides a comprehensive set of evaluation metrics covering different aspects of the solution, including mesh accuracy, sparsity of the weights, smoothness of the animation curves, and the appearance of the produced animation as evaluated by human experts.
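
    As a sketch of the kind of objective being described (the notation is a reconstruction under common blendshape conventions, not the paper's own), a linear blendshape model with quadratic corrective terms and a sparsity-regularized inverse-rig fit can be written as

        \[
        f(\mathbf{w}) \;=\; \mathbf{b}_0 + B\,\mathbf{w} \;+\; \sum_{(i,j)} w_i\, w_j\, \mathbf{b}_{ij},
        \qquad
        \min_{\mathbf{0} \le \mathbf{w} \le \mathbf{1}} \;\; \bigl\| f(\mathbf{w}) - \widehat{\mathbf{b}} \bigr\|_2^2 \;+\; \lambda \|\mathbf{w}\|_1,
        \]

    where \mathbf{b}_0 is the neutral mesh, the columns of B are the blendshape offsets, \mathbf{b}_{ij} are corrective shapes activated when controllers i and j fire together, and \widehat{\mathbf{b}} is the target mesh. The box constraint and the \ell_1 penalty push toward the sparse, artist-interpretable weights the abstract emphasizes, while the cross terms w_i w_j make the problem non-convex and motivate a majorization-minimization scheme.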

    Single Image Depth Prediction Made Better: A Multivariate Gaussian Take

    Neural-network-based single image depth prediction (SIDP) is a challenging task whose goal is to predict the scene's per-pixel depth at test time. Since the problem is by definition ill-posed, the fundamental goal is an approach that can reliably model scene depth from a set of training examples. In the pursuit of perfect depth estimation, most existing state-of-the-art learning techniques predict a single scalar depth value per pixel. Yet it is well known that the trained model has accuracy limits and can predict imprecise depth. Therefore, an SIDP approach must be mindful of the expected depth variations in the model's prediction at test time. Accordingly, we introduce an approach that performs continuous modeling of per-pixel depth, where we can predict and reason about the per-pixel depth and its distribution. To this end, we model per-pixel scene depth using a multivariate Gaussian distribution. Moreover, contrary to existing uncertainty modeling methods in the same spirit, where per-pixel depth is assumed to be independent, we introduce per-pixel covariance modeling that encodes each pixel's depth dependency with respect to all other scene points. Unfortunately, per-pixel depth covariance modeling leads to a computationally expensive continuous loss function, which we evaluate efficiently using a learned low-rank approximation of the overall covariance matrix. Notably, when tested on benchmark datasets such as KITTI, NYU, and SUN-RGB-D, the SIDP model obtained by optimizing our loss function shows state-of-the-art results, and our method (named MG) ranks among the top on the KITTI depth-prediction benchmark leaderboard.
    Comment: Accepted to IEEE/CVF CVPR 2023. Draft info: 17 pages, 13 figures, 9 tables
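
    A hedged sketch of how such a low-rank Gaussian loss can be evaluated in practice: with covariance \Sigma = U U^\top + \mathrm{diag}(d), the Woodbury identity brings the cost of the log-density down from cubic to linear in the number of pixels for a fixed rank. PyTorch ships this as torch.distributions.LowRankMultivariateNormal; the shapes, the rank, and the toy inputs below are illustrative assumptions, not the paper's actual configuration.

        import torch
        from torch.distributions import LowRankMultivariateNormal

        def lowrank_gaussian_nll(mean, cov_factor, cov_diag, depth_gt):
            # mean:       (B, N)    predicted per-pixel depth, N = H*W pixels
            # cov_factor: (B, N, r) low-rank factor U, so Sigma = U U^T + diag(d)
            # cov_diag:   (B, N)    positive per-pixel variances d
            # depth_gt:   (B, N)    ground-truth depth
            dist = LowRankMultivariateNormal(mean, cov_factor, cov_diag)
            return -dist.log_prob(depth_gt).mean()

        # Toy usage: a downsampled 8x8 "image" with a rank-4 covariance.
        B, N, r = 2, 64, 4
        mean = torch.randn(B, N)
        cov_factor = 0.1 * torch.randn(B, N, r)
        cov_diag = torch.nn.functional.softplus(torch.randn(B, N)) + 1e-4
        depth_gt = torch.randn(B, N)
        print(lowrank_gaussian_nll(mean, cov_factor, cov_diag, depth_gt))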

    Machine Learning Research Trends in Africa: A 30 Years Overview with Bibliometric Analysis Review

    In this paper, a critical bibliometric analysis study is conducted, coupled with an extensive literature survey of recent developments and associated applications in machine learning research, with a perspective on Africa. The bibliometric analysis covers 2761 machine learning-related documents, of which 98% were articles with at least 482 citations, published in 903 journals during the past 30 years. The collated documents were retrieved from the Science Citation Index EXPANDED and comprise research publications from 54 African countries between 1993 and 2021. The study visualizes the current landscape and future trends in machine learning research and its applications, with the aim of facilitating collaborative research and knowledge exchange among authors from research institutions across the African continent.

    Decoding spatial location of attended audio-visual stimulus with EEG and fNIRS

    When analyzing complex scenes, humans often focus their attention on an object at a particular spatial location in the presence of background noise and irrelevant visual objects. The ability to decode the attended spatial location would facilitate brain-computer interfaces (BCIs) for complex scene analysis. Here, we tested two different neuroimaging technologies and investigated their capability to decode audio-visual spatial attention in the presence of competing stimuli from multiple locations. For functional near-infrared spectroscopy (fNIRS), we targeted the dorsal frontoparietal network, including the frontal eye field (FEF) and intraparietal sulcus (IPS), as well as the superior temporal gyrus/planum temporale (STG/PT); all of these regions were shown in previous functional magnetic resonance imaging (fMRI) studies to be activated by auditory, visual, or audio-visual spatial tasks. We found that fNIRS provides robust decoding of attended spatial locations for most participants and correlates with behavioral performance. Moreover, we found that the FEF makes a large contribution to decoding performance. Surprisingly, performance was significantly above chance level 1 s after cue onset, well before the peak of the fNIRS response. For electroencephalography (EEG), while there are several successful EEG-based decoding algorithms, to date all of them have focused exclusively on the auditory modality, where eye-related artifacts are minimized or controlled. Successful integration into more ecologically typical usage requires careful consideration of eye-related artifacts, which are inevitable. We showed that fast and reliable decoding can be achieved with or without an ocular-artifact removal algorithm. Our results show that EEG and fNIRS are promising platforms for compact, wearable technologies that could be applied to decode attended spatial locations and reveal the contributions of specific brain regions during complex scene analysis.
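
    "Decoding" here means training a classifier to predict the cued location from trial-wise neural features and testing it on held-out trials. The pipeline below is a minimal, generic sketch of that idea (synthetic data, invented channel counts, and a linear classifier; it is not the study's actual analysis):

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        # Synthetic stand-in: 120 trials, 16 channels, one mean-response
        # feature per channel (a real analysis would use windowed time
        # courses and artifact handling).
        rng = np.random.default_rng(0)
        X = rng.normal(size=(120, 16))      # trial-by-channel features
        y = rng.integers(0, 2, size=120)    # attended location: left/right

        clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
        scores = cross_val_score(clf, X, y, cv=5)  # held-out decoding accuracy
        print(f"decoding accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
        # Chance level is 0.5 for two locations; accuracy reliably above
        # chance means the attended location is decodable from the signal.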

    Learning disentangled speech representations

    A variety of informational factors are contained within the speech signal, and a single short recording of speech reveals much more than the spoken words. The best method to extract and represent informational factors from the speech signal ultimately depends on which factors are desired and how they will be used. In addition, some methods capture more than one informational factor at the same time, such as speaker identity, spoken content, and speaker prosody. The goal of this dissertation is to explore different ways to deconstruct the speech signal into abstract representations that can be learned and later reused in various speech technology tasks. This task of deconstruction, also known as disentanglement, is a form of distributed representation learning. As a general approach to disentanglement, there are guiding principles that describe what a learned representation should contain and how it should function. In particular, learned representations should contain all of the requisite information in a more compact manner, be interpretable, remove nuisance factors of irrelevant information, be useful in downstream tasks, and be independent of the task at hand. The learned representations should also be able to answer counterfactual questions. In some cases, learned speech representations can be reassembled in different ways according to the requirements of downstream applications. For example, in a voice conversion task, the speech content is retained while the speaker identity is changed; in a content-privacy task, some targeted content may be concealed without affecting how the surrounding words sound. While there is no single best method to disentangle all types of factors, some end-to-end approaches demonstrate a promising degree of generalization to diverse speech tasks. This thesis explores a variety of use cases for disentangled representations, including phone recognition, speaker diarization, linguistic code-switching, voice conversion, and content-based privacy masking. Speech representations can also be utilised for automatically assessing the quality and authenticity of speech, such as automatic MOS rating or deep-fake detection. The meaning of the term "disentanglement" is not well defined in previous work and has acquired several meanings depending on the domain (e.g. image vs. speech); sometimes it is used interchangeably with "factorization". This thesis proposes that disentanglement of speech is distinct, and it offers a viewpoint of disentanglement that can be considered both theoretically and practically.

    Direct gasification of biomass for fuel gas production (Gasificação direta de biomassa para produção de gás combustível)

    The excessive consumption of fossil fuels to satisfy the world's needs for energy and commodities has led to the emission of large amounts of greenhouse gases in recent decades, contributing significantly to the greatest environmental threat of the 21st century: climate change. The answer to this man-made disaster is not simple and can only be found if distinct stakeholders and governments cooperate and work together. This is mandatory if we want to move to a more sustainable economy based on renewable materials, with energy provided by perpetual natural sources (e.g., wind, solar). In this regard, biomass can play a major role as an adjustable, renewable feedstock that allows the replacement of fossil fuels in various applications, and conversion by gasification provides the flexibility necessary for that purpose. In fact, fossil fuels are just biomass that underwent extreme pressure and heat for millions of years. Furthermore, biomass is a resource that, if not used or managed, increases wildfire risk, so we also have an obligation to valorize and use it. This work obtained new scientific knowledge to support the development of direct (air) gasification of biomass in bubbling fluidized bed reactors, aiming at a fuel gas with suitable properties to replace natural gas in industrial gas burners. This is the first step toward the integration and development of gasification-based biorefineries, which will produce a diverse range of value-added products from biomass and compete with current petrochemical refineries. To this end, solutions for improving raw producer gas quality and process efficiency parameters were defined and analyzed. First, the addition of superheated steam as a primary measure increased the H2 concentration and the H2/CO molar ratio in the producer gas without compromising process stability; however, this measure mainly showed potential for the direct (air) gasification of high-density biomass (e.g., pellets), because char must accumulate in the reactor bottom bed for char-steam reforming reactions to occur. Second, adding refuse-derived fuel to the biomass feedstock enhanced the gasification products, a highly promising strategy for the economic viability and environmental benefits of future gasification-based biorefineries given the high availability and low cost of wastes; nevertheless, integrated techno-economic and life-cycle analyses must still be performed to fully characterize the process. Third, the application of low-cost catalysts as a primary measure improved producer gas quality (e.g., H2 and CO concentration, lower heating value) and process efficiency parameters with distinct solid materials; in particular, concrete, synthetic fayalite, and wood pellet chars showed promising results. Finally, the economic viability of integrating direct (air) biomass gasification into the pulp and paper industry was also demonstrated, although the determined parameters are not yet attractive to potential investors. In this context, government policies and appropriate economic instruments are of major relevance to increasing the implementation of these projects.
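
    The effect of the steam addition follows from standard gasification chemistry (these are the textbook reactions, not equations from the thesis): the accumulated char reacts with steam, and the shifted gas equilibria then raise both the H2 concentration and the H2/CO molar ratio:

        \[
        \begin{aligned}
        \text{char-steam gasification:} \quad & \mathrm{C} + \mathrm{H_2O} \;\rightleftharpoons\; \mathrm{CO} + \mathrm{H_2} \\
        \text{water-gas shift:} \quad & \mathrm{CO} + \mathrm{H_2O} \;\rightleftharpoons\; \mathrm{CO_2} + \mathrm{H_2} \\
        \text{steam reforming of methane:} \quad & \mathrm{CH_4} + \mathrm{H_2O} \;\rightleftharpoons\; \mathrm{CO} + 3\,\mathrm{H_2}
        \end{aligned}
        \]

    All three routes need steam plus char or char-derived gas, which is why the measure only pays off for high-density feedstocks that leave enough char in the bottom bed.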
    This work was funded by The Navigator Company and by National Funds through the Fundação para a Ciência e a Tecnologia (FCT). Programa Doutoral em Engenharia da Refinação, Petroquímica e Química.

    Modeling Uncertainty for Reliable Probabilistic Modeling in Deep Learning and Beyond

    This thesis is framed at the intersection between modern machine learning techniques, such as deep neural networks, and reliable probabilistic modeling. In many machine learning applications, we care not only about the prediction made by a model (e.g. this lung image presents cancer) but also about how confident the model is in that prediction (e.g. this lung image presents cancer with 67% probability). In such applications, the model assists the decision-maker (in this case a doctor) in making the final decision. As a consequence, the probabilities provided by a model need to reflect the true proportions present in the set of outcomes to which those probabilities are assigned; otherwise, the model is useless in practice. When this holds, we say the model is perfectly calibrated. This thesis explores three ways to provide better-calibrated models. First, it shows how to implicitly calibrate models that are decalibrated by data augmentation techniques, introducing a cost function that resolves this decalibration, starting from ideas derived from decision making with Bayes' rule. Second, it shows how to calibrate models using a post-calibration stage implemented with a Bayesian neural network. Finally, based on the limitations observed in the Bayesian neural network, which we hypothesize stem from a misspecified prior, a new stochastic process is introduced that serves as the prior distribution in a Bayesian inference problem.
    Maroñas Molano, J. (2022). Modeling Uncertainty for Reliable Probabilistic Modeling in Deep Learning and Beyond [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/181582
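
    "Perfectly calibrated" has a direct empirical check: among all predictions assigned confidence p, a fraction p should be correct. A minimal sketch of the standard binned estimate of this gap (the expected calibration error, a common metric in this literature, though not necessarily the thesis's exact evaluation) follows:

        import numpy as np

        def expected_calibration_error(confidences, correct, n_bins=10):
            # Population-weighted gap between mean confidence and observed
            # accuracy inside each confidence bin.
            confidences = np.asarray(confidences, dtype=float)
            correct = np.asarray(correct, dtype=float)
            edges = np.linspace(0.0, 1.0, n_bins + 1)
            ece = 0.0
            for lo, hi in zip(edges[:-1], edges[1:]):
                in_bin = (confidences > lo) & (confidences <= hi)
                if in_bin.any():
                    gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
                    ece += in_bin.mean() * gap
            return ece

        # Toy usage, echoing the lung example above: an overconfident model
        # that reports ~90% confidence but is right only ~67% of the time.
        rng = np.random.default_rng(0)
        conf = rng.uniform(0.85, 0.95, size=1000)
        corr = rng.random(1000) < 0.67
        print(f"ECE = {expected_calibration_error(conf, corr):.3f}")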