Neural Architecture Search: Insights from 1000 Papers
In the past decade, advances in deep learning have resulted in breakthroughs
in a variety of areas, including computer vision, natural language
understanding, speech recognition, and reinforcement learning. Specialized,
high-performing neural architectures are crucial to the success of deep
learning in these areas. Neural architecture search (NAS), the process of
automating the design of neural architectures for a given task, is an
inevitable next step in automating machine learning, and NAS-designed
architectures have already outperformed the best human-designed
architectures on many tasks. In the past few years,
research in NAS has been progressing rapidly, with over 1000 papers released
since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized
and comprehensive guide to neural architecture search. We give a taxonomy of
search spaces, algorithms, and speedup techniques, and we discuss resources
such as benchmarks, best practices, other surveys, and open-source libraries.
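The search-space/algorithm split in such a taxonomy can be made concrete with the simplest NAS baseline, random search. The sketch below is a toy illustration, not any benchmark's API: the operation set, edge count, and scoring function are all invented for the example (a real NAS loop would train or estimate the accuracy of each sampled network).

```python
import random

# Toy cell-based search space: each of 4 edges picks one operation.
OPS = ["conv3x3", "conv5x5", "skip", "maxpool"]

def sample_architecture(n_edges=4, rng=random):
    """Sample one architecture: an operation choice per edge."""
    return tuple(rng.choice(OPS) for _ in range(n_edges))

def evaluate(arch):
    """Placeholder fitness; a real NAS loop would train the network
    or use a performance estimator here."""
    score = {"conv3x3": 3, "conv5x5": 2, "skip": 1, "maxpool": 2}
    return sum(score[op] for op in arch)

def random_search(n_trials=50, seed=0):
    """Random search: the simplest NAS algorithm over the space above."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng=rng)
        s = evaluate(arch)
        if s > best_score:
            best_arch, best_score = arch, s
    return best_arch, best_score
```

Stronger NAS algorithms (evolutionary search, one-shot weight sharing, Bayesian optimization) replace the sampling loop but keep this same sample-then-score structure.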
Information-Theoretic GAN Compression with Variational Energy-based Model
We propose an information-theoretic knowledge distillation approach for the
compression of generative adversarial networks, which aims to maximize the
mutual information between teacher and student networks via a variational
optimization based on an energy-based model. Because the direct computation of
the mutual information in continuous domains is intractable, our approach
alternatively optimizes the student network by maximizing the variational lower
bound of the mutual information. To achieve a tight lower bound, we introduce
an energy-based model relying on a deep neural network to represent a flexible
variational distribution that deals with high-dimensional images and considers
spatial dependencies between pixels effectively. Since the proposed method is
a generic optimization algorithm, it can be conveniently incorporated into
arbitrary generative adversarial networks and even dense prediction networks,
e.g., image enhancement models. We demonstrate that the proposed algorithm
achieves outstanding performance in model compression of generative adversarial
networks consistently when combined with several existing models.
Comment: Accepted at Neurips202
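The variational lower bound described here is in the spirit of the Barber–Agakov bound, I(T; S) ≥ H(T) + E[log q(T | S)]. The paper parameterizes q with an energy-based model; the toy sketch below instead uses a fixed-variance Gaussian q whose mean is a least-squares linear map of the student features, purely to show the shape of the bound term that distillation maximizes.

```python
import numpy as np

def variational_mi_lower_bound(t_feats, s_feats):
    """
    Student-dependent term of the Barber-Agakov bound:
        I(T; S) >= H(T) + E[log q(T | S)].
    Here q(t|s) is a unit-variance Gaussian centred on a linear map of s
    (the paper uses an energy-based model instead). Maximising this term
    w.r.t. the student tightens the mutual-information lower bound.
    Returns the batch average of log q(t | s).
    """
    n, d = t_feats.shape
    # Least-squares fit of the variational mean: t ~ s @ W
    W, *_ = np.linalg.lstsq(s_feats, t_feats, rcond=None)
    resid = t_feats - s_feats @ W
    sigma2 = 1.0
    # log N(t; s @ W, sigma2 * I), averaged over the batch
    log_q = (-0.5 * (resid ** 2).sum(axis=1) / sigma2
             - 0.5 * d * np.log(2 * np.pi * sigma2))
    return log_q.mean()
```

When the student's features carry more information about the teacher's, the residual shrinks and the bound term grows, which is exactly the behaviour a distillation loss rewards.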
Accurate and Interpretable Solution of the Inverse Rig for Realistic Blendshape Models with Quadratic Corrective Terms
We propose a new model-based algorithm solving the inverse rig problem in
facial animation retargeting, exhibiting a more accurate fit and a sparser,
more interpretable weight vector compared to SOTA. The proposed method
targets a specific subdomain of human face animation - highly-realistic
blendshape models used in the production of movies and video games. In this
paper, we formulate an optimization problem that takes into account all the
requirements of targeted models. Our objective goes beyond a linear blendshape
model and employs the quadratic corrective terms necessary for correctly
fitting fine details of the mesh. We show that the solution to the proposed
problem yields highly accurate mesh reconstruction even when general-purpose
solvers, like SQP, are used. The results obtained using SQP are highly accurate
in the mesh space but do not exhibit favorable qualities in terms of weight
sparsity and smoothness, and for this reason, we further propose a novel
algorithm relying on a majorization-minimization (MM) technique. The algorithm is specifically suited for
solving the proposed objective, yielding a high-accuracy mesh fit while
respecting the constraints and producing a sparse and smooth set of weights
easy to manipulate and interpret by artists. Our algorithm is benchmarked with
SOTA approaches and shows overall superior results, yielding a smooth
animation reconstruction with a relative improvement of up to 45 percent in
root mean squared mesh error while keeping the cardinality comparable with
benchmark methods. This paper gives a comprehensive set of evaluation metrics
that cover different aspects of the solution, including mesh accuracy, sparsity
of the weights, and smoothness of the animation curves, as well as the
appearance of the produced animation, which was evaluated by human experts.
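A minimal sketch of the kind of objective described here, assuming a quadratic blendshape model of the form f(w) = Bw + Σ_{i<j} w_i w_j c_ij plus an L1 sparsity term; the array shapes and names are ours, not the paper's, and a real solver (SQP or the MM algorithm) would minimize this under 0 ≤ w ≤ 1 constraints.

```python
import numpy as np

def rig_mesh(w, B, C):
    """
    Quadratic blendshape model (shapes follow the idea in the abstract,
    names are ours): B has one column of mesh offsets per blendshape, and
    C[i, j] holds the corrective offset activated when blendshapes i and j
    fire together.
    """
    mesh = B @ w
    n = len(w)
    for i in range(n):
        for j in range(i + 1, n):
            mesh = mesh + w[i] * w[j] * C[i, j]
    return mesh

def inverse_rig_objective(w, B, C, target, lam=0.1):
    """Data-fit term plus an L1 penalty encouraging the sparse,
    artist-friendly weight vectors the paper targets."""
    resid = rig_mesh(w, B, C) - target
    return 0.5 * resid @ resid + lam * np.abs(w).sum()
```

The L1 term is what separates the sparse, interpretable solutions from a plain least-squares fit: with lam = 0 the objective reduces to the mesh error alone.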
Single Image Depth Prediction Made Better: A Multivariate Gaussian Take
Neural-network-based single image depth prediction (SIDP) is a challenging
task where the goal is to predict the scene's per-pixel depth at test time.
Since the problem, by definition, is ill-posed, the fundamental goal is to come
up with an approach that can reliably model the scene depth from a set of
training examples. In the pursuit of perfect depth estimation, most existing
state-of-the-art learning techniques predict a single scalar depth value
per-pixel. Yet, it is well-known that the trained model has accuracy limits and
can predict imprecise depth. Therefore, an SIDP approach must be mindful of the
expected depth variations in the model's prediction at test time. Accordingly,
we introduce an approach that performs continuous modeling of per-pixel depth,
where we can predict and reason about the per-pixel depth and its distribution.
To this end, we model per-pixel scene depth using a multivariate Gaussian
distribution. Moreover, in contrast to existing uncertainty modeling methods
of the same spirit, which assume per-pixel depth to be independent, we
introduce per-pixel covariance modeling that encodes each pixel's depth
dependency w.r.t. all the scene points. Unfortunately, per-pixel depth
covariance modeling leads
to a computationally expensive continuous loss function, which we solve
efficiently using the learned low-rank approximation of the overall covariance
matrix. Notably, when tested on benchmark datasets such as KITTI, NYU, and
SUN-RGB-D, the SIDP model obtained by optimizing our loss function shows
state-of-the-art results. Our method (named MG) ranks among the top entries
on the KITTI depth-prediction benchmark leaderboard.
Comment: Accepted to IEEE/CVF CVPR 2023. Draft info: 17 pages, 13 figures, 9 tables
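One standard way to make such a covariance tractable, consistent with the low-rank approximation mentioned above, is the parameterization Σ = diag(d) + UUᵀ: the Woodbury identity and the matrix determinant lemma then reduce the Gaussian negative log-likelihood to O(nk²) work for n pixels and rank k. The paper's exact parameterization may differ; this is a generic sketch of the trick.

```python
import numpy as np

def lowrank_gaussian_nll(resid, d, U):
    """
    Negative log-likelihood of resid ~ N(0, Sigma) with
        Sigma = diag(d) + U @ U.T   (diagonal-plus-low-rank).
    Uses the Woodbury identity for Sigma^{-1} r and the matrix
    determinant lemma for log|Sigma|, avoiding any O(n^3) factorization.
    """
    n, k = U.shape
    Dinv_r = resid / d                    # D^{-1} r
    Dinv_U = U / d[:, None]               # D^{-1} U
    S = np.eye(k) + U.T @ Dinv_U          # capacitance: I + U^T D^{-1} U
    # r^T Sigma^{-1} r = r^T D^{-1} r - (r^T D^{-1} U) S^{-1} (U^T D^{-1} r)
    quad = resid @ Dinv_r - Dinv_r @ U @ np.linalg.solve(S, U.T @ Dinv_r)
    # log|Sigma| = log|S| + sum(log d)
    logdet = np.linalg.slogdet(S)[1] + np.log(d).sum()
    return 0.5 * (quad + logdet + n * np.log(2 * np.pi))
```

For k much smaller than the number of pixels, this makes the continuous covariance-aware loss cheap enough to use as a training objective.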
Machine Learning Research Trends in Africa: A 30 Years Overview with Bibliometric Analysis Review
In this paper, a critical bibliometric analysis study is conducted, coupled
with an extensive literature survey on recent developments and associated
applications in machine learning research with a perspective on Africa. The
presented bibliometric analysis study consists of 2761 machine learning-related
documents, of which 98% were articles with at least 482 citations published in
903 journals during the past 30 years. Furthermore, the collated documents were
retrieved from the Science Citation Index EXPANDED, comprising research
publications from 54 African countries between 1993 and 2021. The bibliometric
study shows the visualization of the current landscape and future trends in
machine learning research and its application to facilitate future
collaborative research and knowledge exchange among authors from different
research institutions scattered across the African continent.
Decoding spatial location of attended audio-visual stimulus with EEG and fNIRS
When analyzing complex scenes, humans often focus their attention on an object at a particular spatial location in the presence of background noises and irrelevant visual objects. The ability to decode the attended spatial location would facilitate brain-computer interfaces (BCI) for complex scene analysis. Here, we tested two different neuroimaging technologies and investigated their capability to decode audio-visual spatial attention in the presence of competing stimuli from multiple locations. For functional near-infrared spectroscopy (fNIRS), we targeted the dorsal frontoparietal network, including the frontal eye field (FEF) and intraparietal sulcus (IPS), as well as the superior temporal gyrus/planum temporale (STG/PT). All of these regions were shown in previous functional magnetic resonance imaging (fMRI) studies to be activated by auditory, visual, or audio-visual spatial tasks. We found that fNIRS provides robust decoding of attended spatial locations for most participants and correlates with behavioral performance. Moreover, we found that FEF makes a large contribution to decoding performance. Surprisingly, the performance was significantly above chance level 1 s after cue onset, which is well before the peak of the fNIRS response.
For electroencephalography (EEG), while there are several successful EEG-based algorithms, to date all of them have focused exclusively on the auditory modality, where eye-related artifacts are minimized or controlled. Successful integration into more ecologically typical usage requires careful consideration of eye-related artifacts, which are inevitable. We showed that fast and reliable decoding can be done with or without an ocular-artifact removal algorithm. Our results show that EEG and fNIRS are promising platforms for compact, wearable technologies that could be applied to decode attended spatial location and reveal contributions of specific brain regions during complex scene analysis.
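As a minimal illustration of what "decoding the attended location" means computationally, the sketch below classifies trials by nearest class centroid and compares accuracy to chance level; the actual pipelines in this work use richer neural features and classifiers.

```python
import numpy as np

def nearest_centroid_decode(train_X, train_y, test_X):
    """
    Minimal spatial-attention decoder sketch (real fNIRS/EEG pipelines are
    more elaborate): assign each test trial's feature vector to the class
    whose training-trial centroid is nearest in Euclidean distance.
    """
    classes = np.unique(train_y)
    centroids = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    # Squared distance from every test trial to every class centroid
    dists = ((test_X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]
```

Decoding "works" when accuracy on held-out trials is significantly above chance (1 / number of candidate locations), which is the comparison the abstract's chance-level statements refer to.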
Learning disentangled speech representations
A variety of informational factors are contained within the speech signal, and a single short recording of speech reveals much more than the spoken words. The best method to extract and represent informational factors from the speech signal ultimately depends on which informational factors are desired and how they will be used. In addition, methods will sometimes capture more than one informational factor at the same time, such as speaker identity, spoken content, and speaker prosody.
The goal of this dissertation is to explore different ways to deconstruct the speech signal into abstract representations that can be learned and later reused in various speech technology tasks. This task of deconstructing, also known as disentanglement, is a form of distributed representation learning. As a general approach to disentanglement, there are some guiding principles that elaborate what a learned representation should contain as well as how it should function. In particular, learned representations should contain all of the requisite information in a more compact manner, be interpretable, remove nuisance factors of irrelevant information, be useful in downstream tasks, and be independent of the task at hand. The learned representations should also be able to answer counter-factual questions.
In some cases, learned speech representations can be re-assembled in different ways according to the requirements of downstream applications. For example, in a voice conversion task, the speech content is retained while the speaker identity is changed. And in a content-privacy task, some targeted content may be concealed without affecting how surrounding words sound. While there is no single-best method to disentangle all types of factors, some end-to-end approaches demonstrate a promising degree of generalization to diverse speech tasks.
This thesis explores a variety of use-cases for disentangled representations including phone recognition, speaker diarization, linguistic code-switching, voice conversion, and content-based privacy masking. Speech representations can also be utilised for automatically assessing the quality and authenticity of speech, such as automatic MOS ratings or detecting deepfakes. The meaning of the term "disentanglement" is not well defined in previous work, and it has acquired several meanings depending on the domain (e.g. image vs. speech). Sometimes the term "disentanglement" is used interchangeably with the term "factorization". This thesis proposes that disentanglement of speech is distinct, and offers a viewpoint of disentanglement that can be considered both theoretically and practically.
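The re-assembly idea from the voice-conversion example can be caricatured with factors that live in separate slices of a code vector. Real systems must learn this separation from data; in the toy sketch below it holds by construction, which is precisely what makes it a caricature rather than a method.

```python
import numpy as np

def encode(content_vec, speaker_vec):
    """Toy 'disentangled' code: the content and speaker factors occupy
    separate slices of one vector (learned systems must discover this)."""
    return np.concatenate([content_vec, speaker_vec])

def convert_voice(code, new_speaker_vec, content_dim):
    """Voice-conversion analogue: keep the content slice of the code,
    swap in a different speaker's slice."""
    return np.concatenate([code[:content_dim], new_speaker_vec])
```

The content-privacy task mentioned above is the dual operation: the speaker slice is kept while targeted portions of the content slice are masked.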
Direct biomass gasification for fuel gas production
The excessive consumption of fossil fuels to satisfy the world's needs for
energy and commodities has led to the emission of large amounts of greenhouse
gases in the last decades, contributing significantly to the greatest
environmental threat of the 21st century: Climate Change. The answer to this
man-made disaster is not simple and can only be found if distinct stakeholders
and governments are brought to cooperate and work together. This is
mandatory if we want to move our economy toward a more sustainable one,
based on renewable materials and powered by perennial natural energy
sources (e.g., wind, solar). In this regard, biomass can have a main role
as an adjustable and renewable feedstock that allows the replacement of fossil
fuels in various applications, and the conversion by gasification allows the
necessary flexibility for that purpose. In fact, fossil fuels are just biomass that
underwent extreme pressures and heat for millions of years. Furthermore,
biomass is a resource that, if not used or managed, increases wildfire risks.
Consequently, we also have an obligation to valorize and use this
resource.
In this work, new scientific knowledge was obtained to support the
development of direct (air) gasification of biomass in bubbling fluidized bed
reactors to obtain a fuel gas with suitable properties to replace natural gas in
industrial gas burners. This is the first step for the integration and development
of gasification-based biorefineries, which will produce a diverse number of
value-added products from biomass and compete with current petrochemical
refineries in the future. In this regard, solutions for the improvement of the raw
producer gas quality and process efficiency parameters were defined and
analyzed. First, addition of superheated steam as primary measure allowed the
increase of H2 concentration and H2/CO molar ratio in the producer gas without
compromising the stability of the process. However, the measure mainly
showed potential for the direct (air) gasification of high-density biomass (e.g.,
pellets), due to the necessity of having char accumulation in the reactor bottom
bed for char-steam reforming reactions. Secondly, addition of refuse-derived
fuel to the biomass feedstock led to enhanced gasification products, revealing
itself as a highly promising strategy in terms of economic viability and
environmental benefits of future gasification-based biorefineries, due to the
high availability and low costs of wastes. Nevertheless, integrated
techno-economic and life cycle analyses must be performed to fully characterize the
process. Thirdly, application of low-cost catalysts as a primary measure revealed
potential by allowing the improvement of the producer gas quality (e.g., H2 and
CO concentration, lower heating value) and process efficiency parameters with
distinct solid materials; particularly, the application of concrete, synthetic
fayalite and wood pellets chars, showed promising results. Finally, the
economic viability of the integration of direct (air) biomass gasification
processes in the pulp and paper industry was also shown, although it remains
insufficiently attractive to potential investors. In this context, the role of
government policies and appropriate economic instruments is of major relevance
to increasing the
implementation of these projects.
This work was funded by The Navigator Company and by National Funds through the Fundação para a Ciência e a Tecnologia (FCT). Programa Doutoral em Engenharia da Refinação, Petroquímica e Química
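Two of the gas-quality metrics tracked throughout this abstract, the H2/CO molar ratio and the lower heating value (LHV) of the producer gas, follow directly from the gas composition. The composition below is illustrative, not data from this work; the per-species LHVs are standard approximate values in MJ/Nm3.

```python
# Standard approximate lower heating values of combustible species, MJ/Nm3.
LHV_SPECIES = {"H2": 10.8, "CO": 12.6, "CH4": 35.8}

def gas_lhv(mole_fractions):
    """LHV of the mixture: species LHVs weighted by mole fraction
    (inert N2 and CO2 contribute zero)."""
    return sum(LHV_SPECIES.get(sp, 0.0) * x for sp, x in mole_fractions.items())

def h2_co_ratio(mole_fractions):
    """H2/CO molar ratio, a key quality parameter of the producer gas."""
    return mole_fractions["H2"] / mole_fractions["CO"]

# Illustrative direct (air) gasification producer gas, diluted by N2 from air:
gas = {"H2": 0.10, "CO": 0.15, "CH4": 0.04, "CO2": 0.16, "N2": 0.55}
```

The heavy N2 dilution is why air-blown producer gas has a much lower LHV than natural gas, and why measures that raise H2 and CO concentrations matter for burner retrofitting.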
Modeling Uncertainty for Reliable Probabilistic Modeling in Deep Learning and Beyond
This thesis is framed at the intersection between modern Machine Learning techniques, such as Deep Neural Networks, and reliable probabilistic modeling. In many machine learning applications, we do not only care about the prediction made by a model (e.g. this lung image presents cancer) but also about how confident the model is in making this prediction (e.g. this lung image presents cancer with 67% probability). In such applications, the model assists the decision-maker (in this case a doctor) in making the final decision. As a consequence, the probabilities provided by a model need to reflect the true proportions present in the set to which those probabilities are assigned; otherwise, the model is useless in practice. When this happens, we say that a model is perfectly calibrated.
In this thesis three ways are explored to provide more calibrated models. First, it is shown how to implicitly calibrate models that are decalibrated by data augmentation techniques; a cost function is introduced that resolves this decalibration, taking as its starting point ideas derived from decision making with Bayes' rule. Second, it is shown how to calibrate models using a post-calibration stage implemented with a Bayesian neural network. Finally, based on the limitations observed in the Bayesian neural network, which we hypothesize stem from a misspecified prior, a new stochastic process is introduced that serves as a prior distribution in a Bayesian inference problem.
Maroñas Molano, J. (2022). Modeling Uncertainty for Reliable Probabilistic Modeling in Deep Learning and Beyond [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/181582
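Calibration in the sense used here is commonly quantified with the Expected Calibration Error (ECE); the sketch below is a generic binned estimator, not code from the thesis.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """
    Expected Calibration Error: bin predictions by confidence and average,
    weighted by bin size, the gap between each bin's mean confidence and
    its empirical accuracy. A perfectly calibrated model has ECE ~ 0.
    """
    conf = probs.max(axis=1)                       # predicted confidence
    pred = probs.argmax(axis=1)                    # predicted class
    correct = (pred == labels).astype(float)
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece
```

For example, a model that always reports 90% confidence but is right only half the time has an ECE of 0.4, exactly the kind of miscalibration the thesis aims to remove.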