
    NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations

    This paper introduces the Non-Autonomous Input-Output Stable Network (NAIS-Net), a very deep architecture where each stacked processing block is derived from a time-invariant non-autonomous dynamical system. Non-autonomy is implemented by skip connections from the block input to each of the unrolled processing stages and allows stability to be enforced, so that blocks can be unrolled adaptively to a pattern-dependent processing depth. NAIS-Net induces non-trivial, Lipschitz input-output maps, even for an infinite unroll length. We prove that the network is globally asymptotically stable, so that for every initial condition there is exactly one input-dependent equilibrium with tanh units, and multiple stable equilibria with ReLU units. An efficient implementation that enforces stability under the derived conditions for both fully-connected and convolutional layers is also presented. Experimental results show how NAIS-Net exhibits stability in practice, yielding a significant reduction in generalization gap compared to ResNets. Comment: NIPS 2018
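The unrolled block dynamics can be sketched in a few lines. This is an illustrative toy, not the paper's exact parameterization: constructing a negative-definite `A` is just one simple way to make the unroll stable, and all names are hypothetical.

```python
import numpy as np

def nais_net_block(u, steps=400, h=0.05, seed=0):
    """One unrolled block: the state x repeatedly passes through the same
    time-invariant update, with a skip connection from the block input u
    at every step (the "non-autonomous" part)."""
    rng = np.random.default_rng(seed)
    d = u.shape[0]
    R = rng.standard_normal((d, d)) / np.sqrt(d)
    A = -(R.T @ R) - 0.5 * np.eye(d)    # negative-definite A keeps the unroll stable
    B = rng.standard_normal((d, d)) / np.sqrt(d)
    x = np.zeros(d)
    deltas = []                          # per-step movement of the state
    for _ in range(steps):
        x_new = x + h * np.tanh(A @ x + B @ u)
        deltas.append(np.linalg.norm(x_new - x))
        x = x_new
    return x, deltas

u = np.array([1.0, -0.5, 0.3, 0.8])
x_star, deltas = nais_net_block(u)
# the step sizes shrink as x approaches its input-dependent equilibrium
```

Unrolling "adaptively to a pattern-dependent processing depth" then amounts to stopping once the per-step movement falls below a threshold.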

    Multi-View Stereo with Single-View Semantic Mesh Refinement

    While 3D reconstruction is a well-established and widely explored research topic, semantic 3D reconstruction has only recently witnessed an increasing share of attention from the Computer Vision community. Semantic annotations in fact allow enforcing strong class-dependent priors, such as planarity for ground and walls, which can be exploited to refine the reconstruction, often resulting in non-trivial performance improvements. State-of-the-art methods propose volumetric approaches to fuse RGB image data with semantic labels; even if successful, they do not scale well and fail to output high-resolution meshes. In this paper we propose a novel method to refine both the geometry and the semantic labeling of a given mesh. We refine the mesh geometry by applying a variational method that optimizes a composite energy made of a state-of-the-art pairwise photometric term and a single-view term that models the semantic consistency between the labels of the 3D mesh and those of the segmented images. We also update the semantic labeling through a novel Markov Random Field (MRF) formulation that, together with the classical data and smoothness terms, takes into account class-specific priors estimated directly from the annotated mesh. This is in contrast to state-of-the-art methods, which are typically based on handcrafted or learned priors. We are the first, jointly with the very recent and seminal work of [M. Blaha et al., arXiv:1706.08336, 2017], to propose the use of semantics inside a mesh refinement framework. Differently from [M. Blaha et al., arXiv:1706.08336, 2017], which adopts a more classical pairwise comparison to estimate the flow of the mesh, we apply a single-view comparison between the semantically annotated image and the current 3D mesh labels; this improves the robustness in case of noisy segmentations. Comment: 3D Reconstruction Meets Semantics, ICCV workshop
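The MRF labeling step can be illustrated with a toy energy of this form. The sketch below uses generic unary (data) costs plus a Potts smoothness term and minimizes greedily with Iterated Conditional Modes; the paper's class-specific priors estimated from the mesh are omitted, and all names are hypothetical.

```python
import numpy as np

def icm_labeling(unary, edges, lam=0.5, iters=10):
    """Greedy MRF labeling: 'unary' is an (n_nodes, n_labels) data-cost
    matrix, 'edges' a list of (i, j) neighbour pairs, and lam the Potts
    penalty for neighbouring nodes that disagree."""
    n, k = unary.shape
    labels = unary.argmin(axis=1)            # initialize from the data term alone
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(iters):
        changed = False
        for i in range(n):
            costs = unary[i].copy()
            for j in nbrs[i]:
                for l in range(k):
                    if l != labels[j]:
                        costs[l] += lam      # Potts penalty for disagreeing with j
            best = costs.argmin()
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break                            # local minimum reached
    return labels

unary = np.array([[0.0, 1.0],
                  [0.6, 0.4],
                  [0.0, 1.0]])
labels = icm_labeling(unary, edges=[(0, 1), (1, 2)])
# smoothness flips the noisy middle node to agree with its neighbours
```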

    Attention Mechanisms for Object Recognition with Event-Based Cameras

    Event-based cameras are neuromorphic sensors capable of efficiently encoding visual information in the form of sparse sequences of events. Being biologically inspired, they are commonly used to exploit some of the computational and power-consumption benefits of biological vision. In this paper we focus on a specific feature of vision: visual attention. We propose two attentive models for event-based vision: an algorithm that tracks event activity within the field of view to locate regions of interest, and a fully-differentiable attention procedure based on the DRAW neural model. We highlight the strengths and weaknesses of the proposed methods on four datasets, the Shifted N-MNIST, Shifted MNIST-DVS, CIFAR10-DVS and N-Caltech101 collections, using the Phased LSTM recognition network as a baseline reference model, obtaining improvements in terms of both translation and scale invariance. Comment: WACV 2019 camera-ready submission
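The first, activity-tracking model can be caricatured as follows; the brute-force sliding-window scan over an event-count map is a deliberately naive stand-in for the paper's tracking algorithm, and all names are hypothetical.

```python
import numpy as np

def locate_roi(events, shape, win=8):
    """Accumulate events into a 2D count map and return the top-left
    corner of the win x win window with the most activity.
    'events' is an array of (x, y) event coordinates."""
    h, w = shape
    counts = np.zeros((h, w))
    for x, y in events:
        counts[y, x] += 1                    # one vote per event
    best, best_pos = -1.0, (0, 0)
    for r in range(h - win + 1):             # exhaustive window scan
        for c in range(w - win + 1):
            s = counts[r:r + win, c:c + win].sum()
            if s > best:
                best, best_pos = s, (r, c)
    return best_pos

# a burst of events around (20, 20) plus one stray event
events = np.array([(20, 20)] * 5 + [(21, 21)] * 5 + [(2, 2)])
r, c = locate_roi(events, (32, 32))
# the returned window covers the active cluster
```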

    ReConvNet: Video Object Segmentation with Spatio-Temporal Features Modulation

    We introduce ReConvNet, a recurrent convolutional architecture for semi-supervised video object segmentation that can quickly adapt its features to focus on any specific object of interest at inference time. Generalization to new objects never observed during training is known to be a hard task for supervised approaches, which would need to be retrained. To tackle this problem, we propose a more efficient solution that learns spatio-temporal features self-adapting to the object of interest via conditional affine transformations. This approach is simple, can be trained end-to-end and does not necessarily require extra training steps at inference time. Our method shows competitive results on DAVIS2016 with respect to state-of-the-art approaches that use online fine-tuning, and outperforms them on DAVIS2017. ReConvNet also shows promising results on the DAVIS Challenge 2018, winning the 10th position. Comment: CVPR Workshop - DAVIS Challenge 2018
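The conditional affine transformations can be sketched as FiLM-style per-channel modulation; the mapping from the conditioning vector to scales and shifts (a single random linear layer here) is illustrative, not ReConvNet's actual network.

```python
import numpy as np

def modulate(features, cond, seed=0):
    """Map a conditioning vector to per-channel scale (gamma) and shift
    (beta) parameters that adapt a (C, H, W) feature map to a specific
    object of interest. Weight shapes are illustrative only."""
    rng = np.random.default_rng(seed)
    c = features.shape[0]
    W = rng.standard_normal((2 * c, cond.shape[0])) * 0.1
    gamma_beta = W @ cond
    gamma = 1.0 + gamma_beta[:c]             # scales, initialized around identity
    beta = gamma_beta[c:]                    # shifts, initialized around zero
    return features * gamma[:, None, None] + beta[:, None, None]

f = np.ones((4, 3, 3))
out = modulate(f, np.array([0.5, -0.2]))
# a zero conditioning vector leaves the features unchanged (identity modulation)
```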

    ReSeg: A Recurrent Neural Network-based Model for Semantic Segmentation

    We propose a structured prediction architecture, which exploits the local generic features extracted by Convolutional Neural Networks and the capacity of Recurrent Neural Networks (RNNs) to retrieve distant dependencies. The proposed architecture, called ReSeg, is based on the recently introduced ReNet model for image classification. We modify and extend it to perform the more challenging task of semantic segmentation. Each ReNet layer is composed of four RNNs that sweep the image horizontally and vertically in both directions, encoding patches or activations and providing relevant global information. Moreover, ReNet layers are stacked on top of pre-trained convolutional layers, benefiting from generic local features. Upsampling layers follow the ReNet layers to recover the original image resolution in the final predictions. The proposed ReSeg architecture is efficient, flexible and suitable for a variety of semantic segmentation tasks. We evaluate ReSeg on several widely-used semantic segmentation datasets: Weizmann Horse, Oxford Flower, and CamVid, achieving state-of-the-art performance. Results show that ReSeg can act as a suitable architecture for semantic segmentation tasks, and may have further applications in other structured prediction problems. The source code and model hyperparameters are available at https://github.com/fvisin/reseg. Comment: In CVPR Deep Vision Workshop, 2016
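The four-sweep idea can be sketched as a toy ReNet layer: two RNNs sweep each row in opposite directions, then two more sweep each column of the result, so every output position sees context from the whole image. The 1x1 "patches" and plain tanh RNNs are simplifications of the original gated units, and all names are hypothetical.

```python
import numpy as np

def rnn_sweep(seq, Wx, Wh):
    # simple tanh RNN over a sequence of vectors; returns all hidden states
    h = np.zeros(Wh.shape[0])
    out = []
    for x in seq:
        h = np.tanh(Wx @ x + Wh @ h)
        out.append(h)
    return np.stack(out)

def renet_layer(img, hidden=4, seed=0):
    """img is (H, W, C); returns a (H, W, 2*hidden) feature map."""
    rng = np.random.default_rng(seed)
    H, W, C = img.shape
    def weights(c_in):
        return (rng.standard_normal((hidden, c_in)) * 0.1,
                rng.standard_normal((hidden, hidden)) * 0.1)
    Wx1, Wh1 = weights(C)
    Wx2, Wh2 = weights(C)
    rows = np.stack([                        # horizontal sweeps, both directions
        np.concatenate([rnn_sweep(img[i], Wx1, Wh1),
                        rnn_sweep(img[i][::-1], Wx2, Wh2)[::-1]], axis=1)
        for i in range(H)])                  # (H, W, 2*hidden)
    Wx3, Wh3 = weights(2 * hidden)
    Wx4, Wh4 = weights(2 * hidden)
    cols = np.stack([                        # vertical sweeps over the row features
        np.concatenate([rnn_sweep(rows[:, j], Wx3, Wh3),
                        rnn_sweep(rows[:, j][::-1], Wx4, Wh4)[::-1]], axis=1)
        for j in range(W)], axis=1)          # (H, W, 2*hidden)
    return cols

out = renet_layer(np.random.rand(5, 6, 3))
```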

    Endothelial Function in Pre-diabetes, Diabetes and Diabetic Cardiomyopathy: A Review

    Diabetes mellitus worsens the cardiovascular risk profile of affected individuals. Its increasing worldwide prevalence and its negative effects on vascular wall morphology and function promote the development of several comorbidities that worsen patients' clinical condition and reduce survival. Although overt diabetes increases mortality through its pathogenesis, little information is available in the literature on the role of pre-diabetes and family history of diabetes mellitus in the outcome of the general population. This emphasizes the importance of early detection of vascular impairment in subjects at risk of developing diabetes. Identifying the early stages of atherosclerotic disease in diabetic patients is a fundamental step in the risk-stratification protocols that physicians follow in order to obtain a complete overview of the clinical status of such individuals. Common carotid intima-media thickness, flow-mediated vasodilatation and pulse wave velocity are instrumental tools able to detect early impairment of the cardiovascular system and stratify individuals' cardiovascular risk. The aim of this review is to provide a general perspective on the complex relationship between the onset of cardiovascular disease, pre-diabetes and family history of diabetes. Furthermore, it points out the influence of diabetes on heart function, up to the expression of the so-called diabetic cardiomyopathy.

    On iterative and conditional computation for visual representation learning

    Learning effective representations is crucial for scaling the performance of machine learning methods. Deep Neural Networks are flexible models that can learn powerful hierarchical representations by stacking several layers of computation. However, once learned, adapting the representation to new data or behaviours is nontrivial. In this thesis, we take a step in the direction of learning adaptive representations for visual data, addressing the problem from both a practical and a theoretical perspective. First, we study Residual Networks from a dynamical-system perspective and augment them with a mechanism to automatically adapt the number of processing steps based on the characteristics of the data. Then, we focus on the problem of learning effective asynchronous representations for event-based data. We propose a recurrent mechanism that automatically learns how to incrementally build a two-dimensional representation from events, which can be used as input to convolutional frame-based architectures to improve their performance on optical flow prediction and image recognition tasks with respect to hand-designed features. Finally, we focus on the challenging problem of One-Shot Video Object Segmentation, where the model is asked to segment specific objects in unseen videos after observing a single annotated frame. We tackle the problem from a Meta-Learning perspective by showing that it is possible to adapt a generic meta-representation to specific task representations, by modulating the activations of a segmentation network conditioned on the given instance. (Ph.D. thesis, Dipartimento di Elettronica, Informazione e Bioingegneria, Computer Science and Engineering, cycle 32; SILVANO, CRISTINA; PERNICI, BARBARA)

    Asynchronous Convolutional Networks for Object Detection in Neuromorphic Cameras

    Event-based cameras, also known as neuromorphic cameras, are bio-inspired sensors able to perceive changes in the scene at high frequency with low power consumption. Since they became available only very recently, a limited amount of work addresses object detection on these devices. In this paper we propose two neural network architectures for object detection: YOLE, which integrates the events into surfaces and uses a frame-based model to process them, and fcYOLE, an asynchronous event-based fully convolutional network which uses a novel and general formalization of the convolutional and max-pooling layers to exploit the sparsity of camera events. We evaluate the algorithms on different extensions of publicly available datasets and on a novel synthetic dataset. Comment: accepted at the CVPR 2019 Event-based Vision Workshop
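Integrating events into a surface can be done in many ways; an exponentially decayed time surface is one common choice, shown below as a sketch (the surfaces used by YOLE may differ, and all names are hypothetical).

```python
import numpy as np

def time_surface(events, shape, tau=0.05):
    """Each pixel stores an exponentially decayed trace of its most recent
    event. 'events' is a time-ordered list of (x, y, t) tuples; the surface
    is evaluated at the timestamp of the last event."""
    h, w = shape
    last_t = np.full((h, w), -np.inf)        # -inf marks pixels with no events
    for x, y, t in events:
        last_t[y, x] = t                     # keep the most recent timestamp
    t_end = events[-1][2]
    # 1.0 at the newest event, decaying toward 0 for older events; exactly
    # 0 for pixels that never fired (exp(-inf) underflows to zero)
    return np.exp((last_t - t_end) / tau)

surface = time_surface([(1, 1, 0.0), (2, 2, 0.1)], (4, 4))
# pixel (2, 2) fired last and has value 1.0; (1, 1) has decayed; others are 0
```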

    Communication-Efficient Heterogeneous Federated Learning with Generalized Heavy-Ball Momentum

    Federated Learning (FL) is the state-of-the-art approach for learning from decentralized data in privacy-constrained scenarios. As the current literature reports, the main problems associated with FL are system and statistical challenges: the former demand efficient learning from edge devices, including lowering communication bandwidth and frequency, while the latter require algorithms robust to non-iidness. State-of-the-art approaches either guarantee convergence at increased communication cost or are not sufficiently robust to handle extremely heterogeneous local distributions. In this work we propose a novel generalization of the heavy-ball momentum, and present FedHBM to effectively address statistical heterogeneity in FL without introducing any communication overhead. We conduct extensive experimentation on common FL vision and NLP datasets, showing that our FedHBM algorithm empirically yields better model quality and higher convergence speed w.r.t. the state of the art, especially in pathological non-iid scenarios. While being designed for cross-silo settings, we show how FedHBM is applicable in moderate-to-high cross-device scenarios, and how good model initializations (e.g. pre-training) can be exploited for prompt acceleration. Extended experimentation on large-scale real-world federated datasets further corroborates the effectiveness of our approach for real-world FL applications.
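As a baseline picture of momentum in FL, the sketch below applies classical heavy-ball momentum on the server of a toy FedAvg loop with two heterogeneous quadratic clients. It is not FedHBM's generalized momentum, only the standard idea it generalizes; all names are hypothetical.

```python
import numpy as np

def fedavg_with_server_momentum(w0, client_grads_fn, rounds=200, lr=0.1, beta=0.9):
    """One FL round: collect a gradient from each client at the current
    global model w, average them, and apply a heavy-ball update on the
    server."""
    w, m = w0.copy(), np.zeros_like(w0)
    for _ in range(rounds):
        g = np.mean(client_grads_fn(w), axis=0)  # aggregate per-client gradients
        m = beta * m - lr * g                    # heavy-ball momentum buffer
        w = w + m
    return w

# two "clients" with heterogeneous quadratic losses f_i(w) = 0.5*||w - c_i||^2
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
client_grads = lambda w: [w - c for c in centers]  # gradient of each client's loss
w_star = fedavg_with_server_momentum(np.zeros(2), client_grads)
# the average gradient vanishes at the mean of the client optima, (0.5, 0.5)
```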