
    Challenging Common Assumptions in Multi-task Learning

    While multi-task learning (MTL) has gained significant attention in recent years, its underlying mechanisms remain poorly understood. Recent methods have not yielded consistent performance improvements over single-task learning (STL) baselines, underscoring the need for deeper insight into the challenges specific to MTL. In our study, we challenge common assumptions in MTL in the context of STL. First, the choice of optimizer has been only lightly investigated in MTL. We show the pivotal role of common STL tools such as the Adam optimizer in MTL, and we attribute the effectiveness of Adam to its partial loss-scale invariance. Second, the notion of gradient conflicts has often been framed as a problem specific to MTL. We examine the role of gradient conflicts in MTL and compare it to STL. For angular gradient alignment, we find no evidence that this is a problem unique to MTL. Instead, we identify differences in gradient magnitude as the main distinguishing factor. Lastly, we compare the transferability of features learned through MTL and STL on common image corruptions, and find no conclusive evidence that MTL leads to superior transferability. Overall, we find surprising similarities between STL and MTL, which suggests considering methods from both fields in a broader context.
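
    The loss-scale invariance mentioned in this abstract can be illustrated with a small sketch. It is not taken from the paper: the toy quadratic loss, the helper names (`grad`, `run_sgd`, `run_adam`), and the hyperparameters are assumptions chosen for illustration. Adam divides the first-moment estimate by the square root of the second-moment estimate, so multiplying the loss by a constant cancels out of the update except for the small `eps` term, which is one reason the invariance is only partial. Plain gradient descent has no such cancellation.

```python
import numpy as np

def grad(theta, scale):
    # Gradient of the toy loss L(theta) = scale * 0.5 * ||theta||^2 (assumed example).
    return scale * theta

def run_sgd(scale, steps=100, lr=0.01):
    # Plain gradient descent: the step size grows linearly with the loss scale.
    theta = np.array([1.0, -2.0])
    for _ in range(steps):
        theta = theta - lr * grad(theta, scale)
    return theta

def run_adam(scale, steps=100, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Standard Adam update with bias correction.
    theta = np.array([1.0, -2.0])
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad(theta, scale)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        # The scale factor appears in both m_hat and sqrt(v_hat) and cancels,
        # up to the eps term in the denominator.
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

if __name__ == "__main__":
    print("SGD,  scale 1:  ", run_sgd(1.0))
    print("SGD,  scale 100:", run_sgd(100.0))
    print("Adam, scale 1:  ", run_adam(1.0))
    print("Adam, scale 100:", run_adam(100.0))
```

    In this sketch the two Adam runs end at nearly the same parameters, while the two gradient-descent runs do not. In an MTL setting, the analogous effect would apply to tasks whose losses differ in scale, though how far this carries over to real networks is the paper's question, not something this toy example settles.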

    Selected Inductive Biases in Neural Networks To Generalize Beyond the Training Domain

    Artificial neural networks in computer vision have yet to approach the broad performance of human vision. Unlike humans, artificial networks can be derailed by almost imperceptible perturbations, lack strong generalization capabilities beyond the training data, and still mostly require enormous amounts of data to learn novel tasks. Thus, current applications based on neural networks are often limited to a narrow range of controlled environments and do not transfer well across tasks. This thesis presents four publications that address these limitations and advance visual representation learning algorithms. In the first publication, we aim to push the field of disentangled representation learning towards more realistic settings. We observe that natural factors of variation describing scenes, e.g., the position of pedestrians, have temporally sparse transitions in videos: their changes over time follow a generalized Laplace distribution. We leverage this sparseness as a weak learning signal to train neural networks for provable disentangled visual representation learning. We achieve competitive results on the disentanglement_lib benchmark datasets and on our own contributed datasets, which include natural transitions. The second publication investigates whether various visual representation learning approaches generalize along partially observed factors of variation. In contrast to prior robustness benchmarks that add unseen types of perturbations at test time, we compose, interpolate, or extrapolate the factors observed during training. We find that the tested models mostly struggle to generalize on our proposed benchmark. Instead of predicting the correct factors, models tend to predict values in previously observed ranges. This behavior is quite common across models. Despite their limited out-of-distribution performance, the models can be fairly modular: even when some factors are out-of-distribution, other in-distribution factors are still mostly inferred correctly. The third publication presents an adversarial noise training method for neural networks inspired by the local correlation structure of common corruptions caused by rain, blur, or noise. On the ImageNet-C classification benchmark, we show that networks trained with our method are less susceptible to common corruptions than those trained with existing methods. Finally, the fourth publication introduces a generative approach that outperforms existing approaches according to multiple robustness metrics on the MNIST digit classification benchmark. Perceptually, our generative model is more aligned with human vision than previous approaches, as images of digits at our model's decision boundary can also appear ambiguous to humans. In a nutshell, this work investigates ways of improving adversarial and corruption robustness, and disentanglement, in visual representation learning algorithms. Thus, we alleviate some limitations in machine learning and narrow the gap towards human capabilities.

    Deep currents and the eastward salinity tongue in the equatorial Atlantic: Results from an eddy-resolving, primitive equation model

    The high-resolution model of the wind-driven and thermohaline circulation in the Atlantic Ocean developed in recent years as a “community modeling effort” for the World Ocean Circulation Experiment is examined for the temporal and spatial structure of the deep equatorial current field and its effect on the spreading of North Atlantic Deep Water (NADW). Under seasonally varying wind forcing, the model reveals a system of basin-wide zonal currents of O(5 cm s⁻¹), alternating east-west, and oscillating at an annual period. The current fluctuations are induced by the seasonal cycle of the wind stress in the equatorial Atlantic and show characteristics of long equatorial Rossby waves with westward phase propagation of about 15 cm s⁻¹. The mean flow in the deep western tropical Atlantic is governed by a deep western boundary current (DWBC) with core velocities of more than 10 cm s⁻¹. Only a small fraction of the DWBC branches off at the equator, with correspondingly low mean eastward currents of only about 1 cm s⁻¹. Despite this weak advection along the equator, a well-developed salinity tongue is observed in the model, which is reminiscent of observed property distributions at the upper NADW level. The model evaluation indicates the salinity pattern to be a result of a balance between mean zonal advection and meridional diffusion of salt. The presence of the zonal current oscillations appears to have no significance for the existence of the salinity tongue.

    Understanding Neural Coding on Latent Manifolds by Sharing Features and Dividing Ensembles

    Systems neuroscience relies on two complementary views of neural data, characterized by single neuron tuning curves and analysis of population activity. These two perspectives combine elegantly in neural latent variable models that constrain the relationship between latent variables and neural activity, modeled by simple tuning curve functions. This has recently been demonstrated using Gaussian processes, with applications to realistic and topologically relevant latent manifolds. Those and previous models, however, missed crucial shared coding properties of neural populations. We propose feature sharing across neural tuning curves, which significantly improves performance and leads to better-behaved optimization. We also propose a solution to the problem of ensemble detection, whereby different groups of neurons, i.e., ensembles, can be modulated by different latent manifolds. This is achieved through a soft clustering of neurons during training, thus allowing for the separation of mixed neural populations in an unsupervised manner. These innovations lead to more interpretable models of neural population activity that train well and perform better even on mixtures of complex latent manifolds. Finally, we apply our method to a recently published grid cell dataset, recovering distinct ensembles, inferring toroidal latents, and predicting neural tuning curves, all in a single integrated modeling framework.

    Somali Current rings in the eastern Gulf of Aden

    Author Posting. © American Geophysical Union, 2006. This article is posted here by permission of American Geophysical Union for personal use, not for redistribution. The definitive version was published in Journal of Geophysical Research 111 (2006): C09039, doi:10.1029/2005JC003338. New satellite-based observations reveal that westward translating anticyclonic rings are generated as a portion of the Somali Current accelerates northward through the Socotra Passage near the mouth of the Gulf of Aden. Rings thus formed exhibit azimuthal geostrophic velocities exceeding 50 cm/s, are comparable in overall diameter to the width of the Gulf of Aden (250 km), and translate westward into the gulf at 5–8 cm/s. Ring generation is most notable in satellite ocean color imagery in November immediately following the transition between southwest (boreal summer) and northeast (winter) monsoon regimes. The observed rings contain anomalous fluid within their core which reflects their origin in the equator-crossing Somali Current system. Estimates of Socotra Passage flow variability derived from satellite altimetry provide evidence for a similar ring generation process in May following the winter-to-summer monsoon transition. Cyclonic recirculation eddies are observed to spin up on the eastern flank of newly formed rings with the resulting vortex pair translating westward together. Recent shipboard and Lagrangian observations indicate that vortices of both signs have substantial vertical extent and may dominate the lateral circulation at all depths in the eastern Gulf of Aden. This investigation is a component of the Red Sea Outflow Experiment (REDSOX) sponsored by the U.S. National Science Foundation through grants OCE 98-18464 and OCE 04-24647 to the Woods Hole Oceanographic Institution and OCE 98-19506 and OCE 03-51116 to the University of Miami.

    IT-based Architecture for Power Market Oriented Optimization at Multiple Levels in Production Processes

    Given the increasingly volatile prices on the power markets, it is becoming ever more important economically for companies to develop and implement flexible strategies for energy consumption. A continuous adaptation of production processes to current power prices can take place on several levels of the automation pyramid, where each level has its own characteristics and requirements. In this paper, we present an optimization architecture based on an IT platform that meets the challenges of complex multilayered production processes. We introduce layer-specific optimization strategies as well as an associated information flow, which facilitates holistic and well-coordinated optimization.

    Ocean preconditioning of Cyclone Nargis in the Bay of Bengal: interaction between Rossby waves, surface fresh waters, and sea surface temperatures

    Author Posting. © American Meteorological Society, 2011. This article is posted here by permission of American Meteorological Society for personal use, not for redistribution. The definitive version was published in Journal of Physical Oceanography 41 (2011): 1741–1755, doi:10.1175/2011JPO4437.1. An in-depth data analysis was conducted to understand the occurrence of a strong sea surface temperature (SST) front in the central Bay of Bengal before the formation of Cyclone Nargis in April 2008. Nargis changed its course after encountering the front and tracked along the front until making landfall. One unique feature of this SST front was its coupling with high sea surface height anomalies (SSHAs), which is unusual for a basin where SST is normally uncorrelated with SSHA. The high SSHAs were associated with downwelling Rossby waves, and the interaction between downwelling and surface fresh waters was a key mechanism to account for the observed SST–SSHA coupling. The near-surface salinity field in the bay is characterized by strong stratification and a pronounced horizontal gradient, with low salinity in the northeast. During the passage of downwelling Rossby waves, freshening of the surface layer was observed when surface velocities were southwestward. Horizontal convergence of freshwater associated with downwelling Rossby waves increased the buoyancy of the upper layer and caused the mixed layer to shoal to within a few meters of the surface. Surface heating trapped in the thin mixed layer caused the fresh layer to warm, whereas the increase in buoyancy from low-salinity waters enhanced the high SSHA associated with Rossby waves. Thus, high SST coincided with high SSHA. The dominant role of salinity in controlling high SSHA suggests that caution should be exercised when computing hurricane heat potential in the bay from SSHA. This situation is different from most tropical oceans, where temperature has the dominant effect on SSHA. This work was supported by the NOAA/Office of Climate Observation (OCO) program.