    Invariance in deep representations

    In this thesis, Invariance in Deep Representations, we propose novel solutions to the problem of learning invariant representations. We adopt two distinct notions of invariance: one rooted in symmetry groups and the other in causality. Although the two notions were developed independently of each other, we aim to take a first step towards unifying them. The thesis consists of four main sections: (i) We propose a neural network-based, permutation-invariant aggregation operator that corresponds to the attention mechanism, and use it to develop a novel approach to set classification. (ii) We demonstrate that causal concepts can explain the success of data augmentation by describing how augmentation can weaken the spurious correlation between the observed domains and the task labels, and we show that data augmentation can serve as a tool for simulating interventional data. (iii) We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders with a single latent confounder that lives in the same space as the treatment variable, without changing the observational and interventional distributions entailed by the causal model. After the reduction, we parameterize the reduced causal model using a flexible class of transformations known as normalizing flows. (iv) We propose the Domain Invariant Variational Autoencoder, a generative model that tackles the problem of domain shifts by learning three independent latent subspaces: one for the domain, one for the class, and one for any residual variations.
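    The permutation-invariant aggregation in (i) can be illustrated with a minimal NumPy sketch of attention-based pooling: each set element receives a score, the scores are normalized with a softmax over the set, and the set is summarized as the attention-weighted sum of its elements. The tanh scoring form and the parameter names V and w are assumptions for illustration (in practice they would be learned); this is not presented as the thesis's exact operator.

```python
import numpy as np

def attention_pool(H, V, w):
    """Aggregate a set of instance embeddings into a single vector.

    H: (n, d) array with one row per set element (row order is irrelevant).
    V: (k, d) projection and w: (k,) scoring vector are hypothetical
    stand-ins for learned parameters.
    """
    scores = np.tanh(H @ V.T) @ w            # (n,) one score per element
    scores = scores - scores.max()           # subtract max for numerical stability
    a = np.exp(scores) / np.exp(scores).sum()  # softmax over set elements
    return a @ H                             # (d,) attention-weighted sum

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))   # a set of 5 elements with 4 features each
V = rng.normal(size=(3, 4))
w = rng.normal(size=3)

z = attention_pool(H, V, w)
z_perm = attention_pool(H[rng.permutation(5)], V, w)  # same set, shuffled
print(np.allclose(z, z_perm))  # True: the output ignores element order
```

    Because the weights are computed per element and then summed, shuffling the rows shuffles scores and weights consistently, so the pooled vector is unchanged; this is what makes the operator usable for set classification.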

    Dynamics of Circumstellar Disks III: The case of GG Tau A

    (abridged) We present two-dimensional hydrodynamic simulations using the Smoothed Particle Hydrodynamics (SPH) code VINE to model a self-gravitating binary system similar to the GG Tau A system. We simulate systems configured with semi-major axes of either $a = 62$ AU ('wide') or $a = 32$ AU ('close'), and with eccentricity of either $e = 0$ or $e = 0.3$. Strong spiral structures are generated, with large material streams extending inwards. A small fraction of this material accretes onto the circumstellar disks, with most returning to the torus. Structures also propagate outwards, generating a net outward mass flow and eventually losing coherence at large distances. The torus becomes significantly eccentric in shape. Accretion onto the stars occurs at a rate of a few $\times 10^{-8}\,M_\odot$/yr, implying disk lifetimes shorter than $\sim 10^4$ yr without replenishment. Only wide configurations retain disks by virtue of robust accretion. In eccentric configurations, accretion is episodic and occurs preferentially onto the secondary, at rates that peak near binary periapse. We conclude that the GG Tau A torus is strongly self-gravitating and that a major contribution to its thermal energy is shock dissipation. We interpret its observed features as manifestations of spiral structures, and the low-density material surrounding it as an excretion disk created by outward mass flux. We interpret GG Tau A as a coplanar system with an eccentric torus, and attribute its supposed mutual inclination to a degeneracy between the interpretation of inclination and eccentricity. Although the disks persist for long enough to permit planet formation, the environment remains unfavorable because of high temperatures. We conclude that the GG Tau A system is in an eccentric orbit with $a \sim 62$ AU.
    Comment: Accepted for publication in the Astrophysical Journal

    Visual analytics and artificial intelligence for marketing

    In today’s online environments, such as social media platforms and e-commerce websites, consumers are overloaded with information and firms are competing for their attention. Most of the data on these platforms comes in the form of text, images, or other unstructured data. It is important to understand which information on company websites and social media platforms is enticing or likeable to consumers. The impact of online visual content, in particular, remains largely unknown. Finding the drivers behind likes and clicks can help (1) understand how consumers interact with the information presented to them and (2) leverage this knowledge to improve marketing content. The main goal of this dissertation is to learn more about why consumers like and click on visual content online. To reach this goal, visual analytics are used to automatically extract relevant information from visual content. This information can then be related, at scale, to consumers and their decisions.

    On discovering and learning structure under limited supervision

    Shapes, surfaces, events, and objects (living and non-living) constitute the world. The intelligence of natural agents, such as humans, goes beyond pattern recognition. We excel at building representations and distilling knowledge to understand and infer the structure of the world. Critically, such reasoning capabilities can develop even with limited supervision. On the other hand, despite the phenomenal development of machine learning, its major successes, in particular those of deep learning models, lie primarily in tasks that have access to large annotated datasets. In this dissertation, we propose novel solutions to help close this gap by enabling machine learning models to learn structure and reason effectively in weakly supervised settings. The recurring theme of the thesis revolves around the question "How can a perceptual system learn to organize sensory information into useful knowledge under limited supervision?" It discusses the themes of geometry, composition, and associations in four separate articles, with applications to computer vision (CV) and reinforcement learning (RL). Our first contribution, Pix2Shape, presents an analysis-by-synthesis approach (also referred to as inverse graphics) to perception. Pix2Shape leverages probabilistic generative models to learn 3D-aware representations from single 2D images. The resulting formalism allows us to perform novel view synthesis of a scene and to produce powerful representations of images. We achieve this by augmenting unsupervised learning with physically based inductive biases that decompose a scene's structure into geometry, pose, reflectance, and lighting. Our second contribution, MILe, addresses ambiguity in single-labeled datasets such as ImageNet.
It is often inappropriate to describe an image with a single label when it contains more than one prominent object. We show that integrating ideas from the cognitive linguistics literature and imposing appropriate inductive biases helps distill multiple possible descriptions from such weakly labeled datasets. Next, moving to the RL setting, we consider an agent interacting with its environment without a reward signal. Our third contribution, HaC, is a curiosity-based unsupervised approach to learning associations between the visual and tactile modalities. This helps the agent explore the environment in a self-guided fashion and then use this knowledge to adapt to downstream tasks. In the absence of reward supervision, intrinsic motivation is useful for generating meaningful behavior in a self-supervised manner. In our final contribution, we address the representation learning bottleneck in unsupervised RL agents, which has a detrimental effect on performance with high-dimensional, pixel-based inputs. Our model-based approach combines reward-free exploration and planning to efficiently fine-tune unsupervised pre-trained models, achieving results comparable to task-specific baselines. This is a step towards building agents that can generalize quickly to more than a single task using image inputs alone.
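    The curiosity-driven association learning described for HaC can be illustrated in a simplified form. A common way to turn curiosity into an intrinsic reward is to use the error of a model that predicts one signal from another; the sketch below assumes a linear predictor from visual to tactile features, trained by gradient descent. The class name CrossModalCuriosity, the linear model, and the learning rate are hypothetical choices for illustration, not HaC's actual architecture.

```python
import numpy as np

class CrossModalCuriosity:
    """Hypothetical linear predictor of tactile features from visual features.

    The intrinsic reward is the squared prediction error: interactions whose
    touch signal cannot yet be predicted from vision count as 'surprising'.
    """

    def __init__(self, vis_dim, tac_dim, lr=0.1):
        self.W = np.zeros((tac_dim, vis_dim))  # start knowing nothing
        self.lr = lr

    def reward(self, vis, tac):
        err = tac - self.W @ vis
        return float(err @ err)

    def update(self, vis, tac):
        # One gradient-descent step on 0.5 * ||tac - W @ vis||^2
        err = tac - self.W @ vis
        self.W += self.lr * np.outer(err, vis)

rng = np.random.default_rng(1)
true_map = rng.normal(size=(2, 3))  # hidden vision-to-touch relation (synthetic)
vis = rng.normal(size=3)
vis /= np.linalg.norm(vis)          # unit-norm input keeps the step size stable
tac = true_map @ vis

model = CrossModalCuriosity(vis_dim=3, tac_dim=2)
before = model.reward(vis, tac)
for _ in range(50):
    model.update(vis, tac)
after = model.reward(vis, tac)
print(after < before)  # True: a repeated experience becomes less rewarding
```

    As the predictor improves on a repeated visual-tactile pair, the reward for that pair shrinks, which is what pushes a curiosity-driven agent towards interactions it cannot yet predict.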

    Natural Image Statistics for Digital Image Forensics

    We describe a set of natural image statistics built upon two multi-scale image decompositions: the quadrature mirror filter pyramid decomposition and the local angular harmonic decomposition. These image statistics consist of first- and higher-order statistics that capture certain statistical regularities of natural images. We propose to apply these image statistics, together with classification techniques, to three problems in digital image forensics: (1) differentiating photographic images from computer-generated photorealistic images, (2) generic steganalysis, and (3) rebroadcast image detection. We also apply these image statistics to traditional art authentication, for forgery detection and for the identification of artists in an artwork. For each application, we show the effectiveness of these image statistics and analyze their sensitivity and robustness.
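    As a rough illustration of first- and higher-order subband statistics, the sketch below substitutes a simple 2-D Haar decomposition for the quadrature mirror filter pyramid (a deliberate simplification) and computes the mean, variance, skewness, and kurtosis of every detail subband at each scale. The function names and the choice of three levels are assumptions; the described statistics may include more than these marginal moments. The resulting feature vector is the kind of input a classifier would receive in the forensic applications listed above.

```python
import numpy as np

def haar_level(img):
    """One level of a 2-D Haar decomposition (stand-in for a QMF pyramid).

    Returns the low-pass image and three detail subbands.
    Assumes the image height and width are even.
    """
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    low = (a + b + c + d) / 4
    d1 = (a + b - c - d) / 4  # top rows minus bottom rows
    d2 = (a - b + c - d) / 4  # left columns minus right columns
    d3 = (a - b - c + d) / 4  # diagonal difference
    return low, (d1, d2, d3)

def moments(x):
    """Mean, variance, skewness, and (non-excess) kurtosis of coefficients."""
    x = x.ravel()
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd if sd > 0 else np.zeros_like(x)
    return [mu, sd**2, (z**3).mean(), (z**4).mean()]

def subband_features(img, levels=3):
    """Concatenate the four moments of every detail subband at each scale."""
    feats = []
    low = np.asarray(img, dtype=float)
    for _ in range(levels):
        low, details = haar_level(low)  # recurse on the low-pass image
        for band in details:
            feats.extend(moments(band))
    return np.array(feats)

rng = np.random.default_rng(2)
img = rng.normal(size=(64, 64))  # synthetic stand-in for a grayscale image
f = subband_features(img)
print(f.shape)  # (36,): 3 levels x 3 subbands x 4 statistics
```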