
    Connecting mathematical models for image processing and neural networks

    This thesis deals with the connections between mathematical models for image processing and deep learning. While data-driven deep learning models such as neural networks are flexible and perform well, they are often used as a black box. This makes it hard to provide theoretical model guarantees and scientific insights. On the other hand, more traditional, model-driven approaches such as diffusion, wavelet shrinkage, and variational models offer a rich set of mathematical foundations. Our goal is to transfer these foundations to neural networks. To this end, we pursue three strategies. First, we design trainable variants of traditional models and reduce their parameter set after training to obtain transparent and adaptive models. Moreover, we investigate the architectural design of numerical solvers for partial differential equations and translate them into building blocks of popular neural network architectures. This yields criteria for stable networks and inspires novel design concepts. Lastly, we present novel hybrid models for inpainting that rely on our theoretical findings. These strategies provide three ways of combining the best of the two worlds of model- and data-driven approaches. Our work contributes to the overarching goal of closing the gap in performance and understanding that still exists between these worlds. (ERC Advanced Grant INCOVI)
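
    The second strategy above connects explicit numerical solvers for diffusion-type PDEs with residual network blocks. As a minimal sketch of that correspondence (assumptions: NumPy only, unit grid spacing, a Perona-Malik-type diffusivity, and illustrative values for the time step tau and contrast parameter lam; this is not code from the thesis), one explicit diffusion step can be written as an identity branch plus an update branch, mirroring a ResNet-style skip connection:

        # One explicit step of nonlinear diffusion written as a residual update:
        # u_new = u + tau * div( g(|grad u|^2) * grad u ).
        import numpy as np

        def diffusivity(s, lam=1.0):
            # Perona-Malik-type diffusivity; lam is an illustrative contrast parameter.
            return 1.0 / (1.0 + s / lam**2)

        def diffusion_step(u, tau=0.1):
            # Forward differences for the gradient (reflecting boundaries).
            ux = np.diff(np.pad(u, ((0, 1), (0, 0)), mode="edge"), axis=0)
            uy = np.diff(np.pad(u, ((0, 0), (0, 1)), mode="edge"), axis=1)
            g = diffusivity(ux**2 + uy**2)
            # Backward differences for the divergence of the flux (zero flux at the border).
            div = (np.diff(np.pad(g * ux, ((1, 0), (0, 0))), axis=0)
                   + np.diff(np.pad(g * uy, ((0, 0), (1, 0))), axis=1))
            # Residual form: identity branch plus structured update branch.
            return u + tau * div

        u = np.random.rand(64, 64)
        for _ in range(10):
            u = diffusion_step(u)

    Loosely speaking, replacing the fixed diffusivity and time step with trainable parameters is how such solver steps become trainable network building blocks.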

    Geographic Location Encoding with Spherical Harmonics and Sinusoidal Representation Networks

    Learning feature representations of geographical space is vital for any machine learning model that integrates geolocated data, spanning application domains such as remote sensing, ecology, or epidemiology. Recent work mostly embeds coordinates using sine and cosine projections based on Double Fourier Sphere (DFS) features; these embeddings assume a rectangular data domain even on global data, which can lead to artifacts, especially at the poles. At the same time, relatively little attention has been paid to the exact design of the neural network architectures these functional embeddings are combined with. This work proposes a novel location encoder for globally distributed geographic data that combines spherical harmonic basis functions, natively defined on spherical surfaces, with sinusoidal representation networks (SirenNets) that can be interpreted as a learned Double Fourier Sphere embedding. We systematically evaluate the cross-product of positional embeddings and neural network architectures across various classification and regression benchmarks and synthetic evaluation datasets. In contrast to previous approaches that require the combination of both positional encoding and neural networks to learn meaningful representations, we show that both spherical harmonics and sinusoidal representation networks are competitive on their own but set state-of-the-art performance across tasks when combined. We provide source code at www.github.com/marccoru/locationencode
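
    As a rough illustration of the encoder described above (assumptions: NumPy and SciPy are available, the maximum harmonic degree and layer widths are invented for the example, and the SIREN weights are random rather than trained; this is not the authors' implementation), longitude/latitude can be expanded into real spherical-harmonic features and passed through a small sine-activated MLP:

        # Spherical-harmonic location embedding fed into a SIREN-style MLP.
        # Untrained random weights; L_MAX and layer widths are illustrative.
        import numpy as np
        from scipy.special import sph_harm

        L_MAX = 5  # maximum spherical-harmonic degree (hypothetical choice)

        def sh_embed(lon_deg, lat_deg):
            # Convert to SciPy's convention: theta = azimuth (lon), phi = colatitude.
            theta = np.deg2rad(lon_deg)
            phi = np.deg2rad(90.0 - lat_deg)
            feats = []
            for deg in range(L_MAX + 1):
                for m in range(-deg, deg + 1):
                    y = sph_harm(m, deg, theta, phi)
                    # Use the real form of the basis function as a feature.
                    feats.append(np.real(y) if m >= 0 else np.imag(y))
            return np.array(feats)

        def siren_forward(x, widths=(64, 64, 1), omega0=30.0, seed=0):
            # Sine-activated MLP (SIREN); weights here are random, i.e. untrained.
            rng = np.random.default_rng(seed)
            h = x
            for i, w in enumerate(widths):
                W = rng.uniform(-1, 1, size=(w, h.shape[0])) / h.shape[0]
                b = rng.uniform(-1, 1, size=w)
                z = W @ h + b
                h = np.sin(omega0 * z) if i < len(widths) - 1 else z
            return h

        emb = sh_embed(lon_deg=8.7, lat_deg=49.4)   # example coordinates
        print(siren_forward(emb))                   # untrained scalar output

    In the paper's setting the sine-activated network is trained on top of such functional embeddings; the forward pass here only shows how the two pieces compose.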

    Fluvial Processes in Motion: Measuring Bank Erosion and Suspended Sediment Flux using Advanced Geomatic Methods and Machine Learning

    Excessive erosion and fine sediment delivery to river corridors and receiving waters degrade aquatic habitat, add to nutrient loading, and impact infrastructure. Understanding the sources and movement of sediment within watersheds is critical for assessing ecosystem health and developing management plans to protect natural and human systems. As our changing climate continues to cause shifts in hydrological regimes (e.g., increased precipitation and streamflow in the northeast U.S.), the development of tools to better understand sediment dynamics takes on even greater importance. In this research, advanced geomatics and machine learning are applied to improve the (1) monitoring of streambank erosion, (2) understanding of event sediment dynamics, and (3) prediction of sediment loading using meteorological data as inputs. Streambank movement is an integral part of geomorphic changes along river corridors and also a significant source of fine sediment to receiving waters. Advances in unmanned aircraft systems (UAS) and photogrammetry provide opportunities for rapid and economical quantification of streambank erosion and deposition at variable scales. We assess the performance of UAS-based photogrammetry to capture streambank topography and quantify bank movement. UAS data were compared to terrestrial laser scanner (TLS) and GPS surveying from Vermont streambank sites that featured a variety of bank conditions and vegetation. Cross-sectional analysis of UAS and TLS data revealed that the UAS reliably captured the bank surface and was able to quantify the net change in bank area where movement occurred. Although it was necessary to consider overhanging bank profiles and vegetation, UAS-based photogrammetry showed significant promise for capturing bank topography and movement at fine resolutions in a flexible and efficient manner. This study also used a new machine-learning tool to improve the analysis of sediment dynamics using three years of high-resolution suspended sediment data collected in the Mad River watershed. A restricted Boltzmann machine (RBM), a type of artificial neural network (ANN), was used to classify individual storm events based on the visual hysteresis patterns present in the suspended sediment-discharge data. The work expanded the classification scheme typically used for hysteresis analysis. The results provided insights into the connectivity and sources of sediment within the Mad River watershed and its tributaries. A recurrent counterpropagation network (rCPN) was also developed to predict suspended sediment discharge at ungauged locations using only local meteorological data as inputs. The rCPN captured the nonlinear relationships between meteorological data and suspended sediment discharge, and outperformed the traditional sediment rating curve approach. The combination of machine-learning tools for analyzing storm-event dynamics and estimating loading at ungauged locations in a river network provides a robust method for estimating sediment production from catchments that informs watershed management.
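
    The hysteresis patterns mentioned above compare suspended-sediment concentration on the rising versus falling limb of a storm hydrograph. As a much simpler stand-in for the RBM-based classification used in the thesis (assumptions: NumPy only, synthetic data, and an illustrative loop-index formula), the sketch below computes a basic hysteresis index for one event; positive values indicate a clockwise loop (sediment peaking before discharge) and negative values a counter-clockwise loop:

        # Simple hysteresis index for one storm event: compare suspended-sediment
        # concentration (SSC) on the rising vs. falling limb at matched discharge levels.
        # Illustrative stand-in, not the restricted Boltzmann machine used in the thesis.
        import numpy as np

        def hysteresis_index(q, ssc):
            """q: discharge series, ssc: suspended-sediment series (one event)."""
            peak = int(np.argmax(q))
            rising_q, rising_c = q[: peak + 1], ssc[: peak + 1]
            falling_q, falling_c = q[peak:][::-1], ssc[peak:][::-1]
            # Compare SSC at common discharge levels between the two limbs.
            levels = np.linspace(max(rising_q.min(), falling_q.min()),
                                 min(rising_q.max(), falling_q.max()), 20)
            c_rise = np.interp(levels, rising_q, rising_c)
            c_fall = np.interp(levels, falling_q, falling_c)
            # Positive -> clockwise loop, negative -> counter-clockwise loop.
            return float(np.mean(c_rise - c_fall) / (np.mean(ssc) + 1e-9))

        # Synthetic event: discharge peaks mid-event, sediment peaks slightly earlier.
        t = np.linspace(0, 1, 100)
        q = np.exp(-((t - 0.5) ** 2) / 0.02)
        ssc = np.exp(-((t - 0.45) ** 2) / 0.02)
        print(hysteresis_index(q, ssc))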

    Data Reduction and Deep-Learning Based Recovery for Geospatial Visualization and Satellite Imagery

    The storage, retrieval and distribution of data are critical aspects of big data management. Data scientists and decision-makers often need to share large datasets and make decisions on archiving or deleting historical data to cope with resource constraints. As a consequence, there is an urgent need to reduce storage and transmission requirements. A potential approach to mitigate such problems is to reduce big datasets into smaller ones, which will not only lower storage requirements but also allow lighter transfers over the network. High-dimensional data often exhibit high repetitiveness and recurring patterns across different dimensions. Data carefully prepared by removing redundancies, along with a machine learning model capable of reconstructing the whole dataset from its reduced version, can improve storage scalability and data transfer, and speed up the overall data management pipeline. In this thesis, we explore some data reduction strategies for big datasets, while ensuring that the data can be transferred and used ubiquitously by all stakeholders, i.e., the entire dataset can be reconstructed with high quality whenever necessary. One of our data reduction strategies follows a straightforward uniform pattern, which guarantees a minimum of 75% data size reduction. We also propose a novel variance-based reduction technique, which focuses on removing only redundant data and offers an additional 1% to 2% deletion rate. We have adopted various traditional machine learning and deep learning approaches for high-quality reconstruction. We evaluated our pipelines with big geospatial data and satellite imagery. Among them, our deep learning approaches have performed very well both quantitatively and qualitatively, with the capability of reconstructing high-quality features. We also show how to leverage temporal data for better reconstruction. For uniform deletion, the reconstruction accuracy observed is as high as 98.75% on average for spatial meteorological data (e.g., soil moisture and albedo), and 99.09% for satellite imagery. Pushing the deletion rate further with the variance-based deletion method, the decrease in accuracy remains within 1% for spatial meteorological data and 7% for satellite imagery.
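
    As a small illustration of the uniform reduction pattern described above (assumptions: NumPy and SciPy, a synthetic smooth field, and plain bilinear resampling in place of the learned deep reconstruction models the thesis actually uses), keeping one value of every 2x2 block removes exactly 75% of the data, and the field is later rebuilt from the retained samples:

        # Uniform 75% reduction (keep 1 of every 2x2 block) and a placeholder recovery.
        # The thesis uses learned (deep) reconstruction; bilinear resampling here is
        # only a simple baseline to show the pipeline shape.
        import numpy as np
        from scipy.ndimage import zoom

        def reduce_uniform(field):
            # Keep every second row and column: 1 of 4 values survives (75% reduction).
            return field[::2, ::2]

        def reconstruct(reduced, full_shape):
            # Placeholder recovery: bilinear-style resampling back to the original grid.
            factors = (full_shape[0] / reduced.shape[0], full_shape[1] / reduced.shape[1])
            return zoom(reduced, factors, order=1)

        # Synthetic smooth field as a stand-in for soil moisture or albedo data.
        y, x = np.mgrid[0:128, 0:128]
        field = np.sin(x / 10.0) * np.cos(y / 15.0)

        reduced = reduce_uniform(field)
        recovered = reconstruct(reduced, field.shape)
        rel_err = np.abs(recovered - field).mean() / (np.abs(field).mean() + 1e-9)
        print(f"retained fraction: {reduced.size / field.size:.2f}, mean relative error: {rel_err:.4f}")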

    Beyond Multilayer Perceptrons: Investigating Complex Topologies in Neural Networks

    In this study, we explore the impact of network topology on the approximation capabilities of artificial neural networks (ANNs), with a particular focus on complex topologies. We propose a novel methodology for constructing complex ANNs based on various topologies, including Barabási-Albert, Erdős-Rényi, Watts-Strogatz, and multilayer perceptrons (MLPs). The constructed networks are evaluated on synthetic datasets generated from manifold learning generators, with varying levels of task difficulty and noise. Our findings reveal that complex topologies lead to superior performance in high-difficulty regimes compared to traditional MLPs. This performance advantage is attributed to the ability of complex networks to exploit the compositionality of the underlying target function. However, this benefit comes at the cost of increased forward-pass computation time and reduced robustness to graph damage. Additionally, we investigate the relationship between various topological attributes and model performance. Our analysis shows that no single attribute can account for the observed performance differences, suggesting that the influence of network topology on approximation capabilities may be more intricate than a simple correlation with individual topological attributes. Our study sheds light on the potential of complex topologies for enhancing the performance of ANNs and provides a foundation for future research exploring the interplay between multiple topological attributes and their impact on model performance.
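
    As a toy illustration of constructing a network from a complex topology (assumptions: NumPy and NetworkX, an invented input/output split, untrained random weights, and a simple low-to-high node ordering to turn the undirected graph into a DAG; the study's actual construction may differ), a Watts-Strogatz graph can be wired into a sparse feedforward network as follows:

        # Build a sparse feedforward network from a Watts-Strogatz graph.
        # Nodes are ordered and edges oriented low -> high index to obtain a DAG;
        # weights are random and untrained. Sizes below are illustrative.
        import numpy as np
        import networkx as nx

        rng = np.random.default_rng(0)
        N_IN, N_OUT, N_TOTAL = 4, 1, 32          # hypothetical input/output split
        G = nx.watts_strogatz_graph(n=N_TOTAL, k=4, p=0.3, seed=0)

        # Orient every edge from the lower-indexed to the higher-indexed node.
        edges = [(u, v) if u < v else (v, u) for u, v in G.edges()]
        weights = {e: rng.normal(scale=0.5) for e in edges}

        def forward(x):
            """x: array of length N_IN; returns activations of the last N_OUT nodes."""
            act = np.zeros(N_TOTAL)
            act[:N_IN] = x
            for node in range(N_IN, N_TOTAL):
                incoming = [(u, v) for (u, v) in edges if v == node]
                s = sum(weights[(u, v)] * act[u] for (u, v) in incoming)
                act[node] = np.tanh(s)           # hidden/output nonlinearity
            return act[-N_OUT:]

        print(forward(np.array([0.1, -0.2, 0.3, 0.4])))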

    Deep Learning Techniques for Music Generation -- A Survey

    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis:
    - Objective: What musical content is to be generated (e.g., melody, polyphony, accompaniment or counterpoint)? For what destination and for what use: to be performed by a human (in the case of a musical score) or by a machine (in the case of an audio file)?
    - Representation: What are the concepts to be manipulated (e.g., waveform, spectrogram, note, chord, meter and beat)? What format is to be used (e.g., MIDI, piano roll or text)? How will the representation be encoded (e.g., scalar, one-hot or many-hot; see the sketch after this abstract)?
    - Architecture: What type(s) of deep neural network is (are) to be used (e.g., feedforward network, recurrent network, autoencoder or generative adversarial networks)?
    - Challenge: What are the limitations and open challenges (e.g., variability, interactivity and creativity)?
    - Strategy: How do we model and control the process of generation (e.g., single-step feedforward, iterative feedforward, sampling or input manipulation)?
    For each dimension, we conduct a comparative analysis of various models and techniques and propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described and used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and some prospects.
    Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
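
    As a concrete example of the representation dimension above (assumptions: NumPy, and an illustrative pitch range and time step; encoding conventions vary across the surveyed systems), a short melody can be encoded as a one-hot piano-roll matrix of shape (time steps, pitches):

        # Encode a short melody as a one-hot piano roll (time steps x pitches),
        # one of the representation choices discussed in the survey.
        # Pitch range and time step are illustrative assumptions.
        import numpy as np

        PITCH_LOW, PITCH_HIGH = 60, 72          # MIDI C4..C5 (hypothetical range)
        N_PITCHES = PITCH_HIGH - PITCH_LOW + 1

        def to_piano_roll(melody):
            """melody: list of (midi_pitch, n_steps) pairs; returns a one-hot matrix."""
            total_steps = sum(steps for _, steps in melody)
            roll = np.zeros((total_steps, N_PITCHES), dtype=np.float32)
            t = 0
            for pitch, steps in melody:
                roll[t:t + steps, pitch - PITCH_LOW] = 1.0   # one-hot along the pitch axis
                t += steps
            return roll

        # C major arpeggio: C4, E4, G4, C5, each held for two time steps.
        melody = [(60, 2), (64, 2), (67, 2), (72, 2)]
        roll = to_piano_roll(melody)
        print(roll.shape)           # (8, 13)
        print(roll.argmax(axis=1))  # active pitch index per time step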

    Goal-directed cross-system interactions in brain and deep learning networks

    Deep neural networks (DNNs) have recently emerged as promising models for the mammalian ventral visual stream. However, how the ventral stream adapts to various goal-directed influences and coordinates with higher-level brain regions during learning remains poorly understood. By incorporating top-down influences involving attentional cues, linguistic labels and novel category learning into DNN models, the thesis offers an explanation, via a theoretical modelling approach, for how the tasks we do shape representations across levels in models and in related brain regions, including the ventral visual stream, the hippocampus (HPC) and the ventromedial prefrontal cortex (vmPFC). The thesis includes three main contributions. In the first contribution, I developed a goal-directed attention mechanism that extends a general-purpose DNN with the ability to reconfigure itself to better suit the current task goal, much like the PFC modulates activity along the ventral stream. In the second contribution, I uncovered how linguistic labelling shapes semantic representation by amending an existing DNN to predict both the meaning and the categorical label of an object. Supported by simulation results involving fine-grained and coarse-grained labels, I concluded that differences in label use, whether across languages or levels of expertise, manifest in differences in the semantic representations that support label discrimination. In the third contribution, I aimed to better understand cross-brain mechanisms in a novel learning task by combining insights on labelling and attention obtained from the preceding efforts. Integrating a DNN with a novel clustering model built on SUSTAIN, the proposed account captures human category learning behaviour and the underlying neural mechanisms across multiple interacting brain areas, including the HPC, vmPFC and the ventral visual stream. By extending models of the ventral stream to incorporate goal-directed cross-system coordination, I hope the thesis can inform understanding of the neurobiology supporting object recognition and category learning, which in turn may help advance the design of deep learning models.
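
    As a generic illustration of the goal-directed attention idea in the first contribution (assumptions: NumPy, and an invented feature dimensionality and gain values; this is a simple feature-gating sketch, not the model developed in the thesis), a task cue can reweight a layer's feature channels multiplicatively, loosely analogous to top-down, PFC-like modulation of ventral-stream activity:

        # Goal-directed attention as multiplicative gating of feature channels.
        # A task cue selects per-channel gains that reweight a layer's activations.
        # Dimensions and gain values are illustrative, not taken from the thesis.
        import numpy as np

        N_FEATURES = 8
        rng = np.random.default_rng(1)
        features = rng.random(N_FEATURES)          # stand-in for a DNN layer's activations

        # Hypothetical task-specific gains: each goal emphasises different channels.
        task_gains = {
            "find_animals":  np.array([2.0, 2.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5]),
            "find_vehicles": np.array([0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 2.0, 2.0]),
        }

        def attend(features, goal):
            gated = task_gains[goal] * features    # multiplicative attention
            return gated / (np.linalg.norm(gated) + 1e-9)   # renormalise the pattern

        print(attend(features, "find_animals"))
        print(attend(features, "find_vehicles"))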
    • 

    corecore