66 research outputs found

    Generative factorization for object-centric representation learning

    Get PDF
    Empowering machines to understand compositionality is considered by many (Lake et al., 2017; Lake and Baroni, 2018; Schölkopf et al., 2021) a promising path towards improved representational interpretability and out-of-distribution generalization. Yet, discovering the compositional structures of raw sensory data requires solving a factorization problem, i.e. decomposing the unstructured observations into modular components. Handling the factorization problem presents numerous technical challenges, especially in unsupervised settings which we explore to avoid the heavy burden of human annotation. In this thesis, we approach the factorization problem from a generative perspective. Specifically, we develop unsupervised machine learning models to recover the compositional data-generation mechanisms around objects from visual scene observations. First, we present MulMON as the first feasible unsupervised solution to the multi-view object-centric representation learning problem. MulMON resolves the spatial ambiguities arising from single-image observations of static scenes, e.g. optical illusions and occlusion, with a multi-view inference design. We demonstrate that not only can MulMON perform better scene object factorization with less uncertainty than single-view methods, but it can also predict a scene's appearances and object segmentations for novel viewpoints. Next, we present a technique, namely for latent duplicate suppression (abbr. LDS), and demonstrate its effectiveness in fixing a common scene object factorization issue that exists in various unsupervised object-centric learning models---i.e. inferring duplicate representations for the same objects. Finally, we present DyMON as the first unsupervised learner that can recover object-centric compositional generative mechanism from moving-view-dynamic-scene observational data. We demonstrate that not only can DyMON factorize dynamic scenes in terms of objects, but it can also factorize the entangled effects of observer motions and object dynamics that function independently. Furthermore, we demonstrate that DyMON can predict a scene's appearances and segmentations at arbitrary times (querying across time) and from arbitrary viewpoints (querying across space)---i.e. answer counterfactual questions. The scene modeling explored in this thesis is a proof of concept, which we hope will inspire: 1) a broader range of downstream applications (e.g. "world modelling'' and environment interactions) and 2) generative factorization research that targets more complex compositional structures (e.g. complex textures, multi-granularity compositions)

    Proceedings of the 36th International Workshop Statistical Modelling July 18-22, 2022 - Trieste, Italy

    Get PDF
    The 36th International Workshop on Statistical Modelling (IWSM) is the first one held in presence after a two year hiatus due to the COVID-19 pandemic. This edition was quite lively, with 60 oral presentations and 53 posters, covering a vast variety of topics. As usual, the extended abstracts of the papers are collected in the IWSM proceedings, but unlike the previous workshops, this year the proceedings will be not printed on paper, but it is only online. The workshop proudly maintains its almost unique feature of scheduling one plenary session for the whole week. This choice has always contributed to the stimulating atmosphere of the conference, combined with its informal character, encouraging the exchange of ideas and cross-fertilization among different areas as a distinguished tradition of the workshop, student participation has been strongly encouraged. This IWSM edition is particularly successful in this respect, as testified by the large number of students included in the program

    From pixels to people : recovering location, shape and pose of humans in images

    Get PDF
    Humans are at the centre of a significant amount of research in computer vision. Endowing machines with the ability to perceive people from visual data is an immense scientific challenge with a high degree of direct practical relevance. Success in automatic perception can be measured at different levels of abstraction, and this will depend on which intelligent behaviour we are trying to replicate: the ability to localise persons in an image or in the environment, understanding how persons are moving at the skeleton and at the surface level, interpreting their interactions with the environment including with other people, and perhaps even anticipating future actions. In this thesis we tackle different sub-problems of the broad research area referred to as "looking at people", aiming to perceive humans in images at different levels of granularity. We start with bounding box-level pedestrian detection: We present a retrospective analysis of methods published in the decade preceding our work, identifying various strands of research that have advanced the state of the art. With quantitative exper- iments, we demonstrate the critical role of developing better feature representations and having the right training distribution. We then contribute two methods based on the insights derived from our analysis: one that combines the strongest aspects of past detectors and another that focuses purely on learning representations. The latter method outperforms more complicated approaches, especially those based on hand- crafted features. We conclude our work on pedestrian detection with a forward-looking analysis that maps out potential avenues for future research. We then turn to pixel-level methods: Perceiving humans requires us to both separate them precisely from the background and identify their surroundings. To this end, we introduce Cityscapes, a large-scale dataset for street scene understanding. This has since established itself as a go-to benchmark for segmentation and detection. We additionally develop methods that relax the requirement for expensive pixel-level annotations, focusing on the task of boundary detection, i.e. identifying the outlines of relevant objects and surfaces. Next, we make the jump from pixels to 3D surfaces, from localising and labelling to fine-grained spatial understanding. We contribute a method for recovering 3D human shape and pose, which marries the advantages of learning-based and model- based approaches. We conclude the thesis with a detailed discussion of benchmarking practices in computer vision. Among other things, we argue that the design of future datasets should be driven by the general goal of combinatorial robustness besides task-specific considerations.Der Mensch steht im Zentrum vieler Forschungsanstrengungen im Bereich des maschinellen Sehens. Es ist eine immense wissenschaftliche Herausforderung mit hohem unmittelbarem Praxisbezug, Maschinen mit der FĂ€higkeit auszustatten, Menschen auf der Grundlage von visuellen Daten wahrzunehmen. Die automatische Wahrnehmung kann auf verschiedenen Abstraktionsebenen erfolgen. Dies hĂ€ngt davon ab, welches intelligente Verhalten wir nachbilden wollen: die FĂ€higkeit, Personen auf der BildflĂ€che oder im 3D-Raum zu lokalisieren, die Bewegungen von Körperteilen und KörperoberflĂ€chen zu erfassen, Interaktionen einer Person mit ihrer Umgebung einschließlich mit anderen Menschen zu deuten, und vielleicht sogar zukĂŒnftige Handlungen zu antizipieren. In dieser Arbeit beschĂ€ftigen wir uns mit verschiedenen Teilproblemen die dem breiten Forschungsgebiet "Betrachten von Menschen" gehören. Beginnend mit der FußgĂ€ngererkennung prĂ€sentieren wir eine Analyse von Methoden, die im Jahrzehnt vor unserem Ausgangspunkt veröffentlicht wurden, und identifizieren dabei verschiedene ForschungsstrĂ€nge, die den Stand der Technik vorangetrieben haben. Unsere quantitativen Experimente zeigen die entscheidende Rolle sowohl der Entwicklung besserer Bildmerkmale als auch der Trainingsdatenverteilung. Anschließend tragen wir zwei Methoden bei, die auf den Erkenntnissen unserer Analyse basieren: eine Methode, die die stĂ€rksten Aspekte vergangener Detektoren kombiniert, eine andere, die sich im Wesentlichen auf das Lernen von Bildmerkmalen konzentriert. Letztere ĂŒbertrifft kompliziertere Methoden, insbesondere solche, die auf handgefertigten Bildmerkmalen basieren. Wir schließen unsere Arbeit zur FußgĂ€ngererkennung mit einer vorausschauenden Analyse ab, die mögliche Wege fĂŒr die zukĂŒnftige Forschung aufzeigt. Anschließend wenden wir uns Methoden zu, die Entscheidungen auf Pixelebene betreffen. Um Menschen wahrzunehmen, mĂŒssen wir diese sowohl praezise vom Hintergrund trennen als auch ihre Umgebung verstehen. Zu diesem Zweck fĂŒhren wir Cityscapes ein, einen umfangreichen Datensatz zum VerstĂ€ndnis von Straßenszenen. Dieser hat sich seitdem als Standardbenchmark fĂŒr Segmentierung und Erkennung etabliert. DarĂŒber hinaus entwickeln wir Methoden, die die Notwendigkeit teurer Annotationen auf Pixelebene reduzieren. Wir konzentrieren uns hierbei auf die Aufgabe der Umgrenzungserkennung, d. h. das Erkennen der Umrisse relevanter Objekte und OberflĂ€chen. Als nĂ€chstes machen wir den Sprung von Pixeln zu 3D-OberflĂ€chen, vom Lokalisieren und Beschriften zum prĂ€zisen rĂ€umlichen VerstĂ€ndnis. Wir tragen eine Methode zur SchĂ€tzung der 3D-KörperoberflĂ€che sowie der 3D-Körperpose bei, die die Vorteile von lernbasierten und modellbasierten AnsĂ€tzen vereint. Wir schließen die Arbeit mit einer ausfĂŒhrlichen Diskussion von Evaluationspraktiken im maschinellen Sehen ab. Unter anderem argumentieren wir, dass der Entwurf zukĂŒnftiger DatensĂ€tze neben aufgabenspezifischen Überlegungen vom allgemeinen Ziel der kombinatorischen Robustheit bestimmt werden sollte

    Healthy Living: The European Congress of Epidemiology, 2015

    Get PDF

    Practicing Safe Sects

    Get PDF
    In Practicing Safe Sects F. LeRon Shults provides scientific and philosophical resources for having “the talk” about religious reproduction: where do gods come from – and what are the costs of bearing them in our culturally pluralistic, ecologically fragile environment?; Readership: All interested in the promotion of peaceful and healthy societies, and anyone concerned with the role of religion in fostering superstition and segregation
    • 

    corecore