18 research outputs found

    Classifying Signals on Irregular Domains via Convolutional Cluster Pooling

    We present a novel hierarchical approach for supervised classification of signals defined over a fixed graph that reflects shared properties of the dataset. To this end, we introduce a Convolutional Cluster Pooling layer that exploits multi-scale clustering to highlight, at different resolutions, locally connected regions of the input graph. Our proposal generalises well-established neural models such as Convolutional Neural Networks (CNNs) to irregular and complex domains by exploiting the weight-sharing property in a graph-oriented architecture. In this work, weight sharing is based on the centrality of each vertex within its soft-assigned cluster. Extensive experiments on NTU RGB+D, CIFAR-10 and 20NEWS demonstrate the effectiveness of the proposed technique in capturing both local and global patterns in graph-structured data from different domains.
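To make the idea of pooling over soft-assigned clusters concrete, the following is a minimal sketch of a generic weighted-average pooling step. It is not the Convolutional Cluster Pooling layer itself: the function name `soft_cluster_pool`, the list-based data layout, and the choice of a weighted mean are all assumptions made for illustration, and the centrality-based weight sharing described in the abstract is not modelled here.

```python
def soft_cluster_pool(signal, assignment):
    """Pool a per-vertex signal into one value per cluster (hypothetical helper).

    signal:      list of floats, one value per graph vertex.
    assignment:  assignment[i][k] is the soft membership of vertex i in cluster k.
    Returns the membership-weighted mean of the signal for each cluster.
    """
    n_clusters = len(assignment[0])
    pooled = []
    for k in range(n_clusters):
        # Total membership mass assigned to cluster k.
        weight = sum(row[k] for row in assignment)
        # Membership-weighted sum of the vertex signal.
        value = sum(x * row[k] for x, row in zip(signal, assignment))
        pooled.append(value / weight if weight > 0 else 0.0)
    return pooled


# Three vertices, two clusters; the middle vertex belongs equally to both.
pooled = soft_cluster_pool(
    [1.0, 3.0, 5.0],
    [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
)
```

Applying such a step repeatedly, with coarser clusterings each time, would give a multi-resolution view of the graph signal in the spirit of the abstract.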

    Latent Space Autoregression for Novelty Detection

    Novelty detection is commonly referred to as the discrimination of observations that do not conform to a learned model of regularity. Despite its importance in different application settings, designing a novelty detector is highly complex due to the unpredictable nature of novelties and their inaccessibility during training, factors which expose the unsupervised nature of the problem. We design a general framework in which a deep autoencoder is equipped with a parametric density estimator that learns the probability distribution underlying its latent representations through an autoregressive procedure. We show that a maximum likelihood objective, optimized in conjunction with the reconstruction of normal samples, effectively acts as a regularizer for the task at hand by minimizing the differential entropy of the distribution spanned by latent vectors. In addition to providing a very general formulation, our model delivers on-par or superior performance compared to state-of-the-art methods in extensive experiments on publicly available datasets, in both one-class and video anomaly detection settings. Unlike prior works, our proposal does not make any assumption about the nature of the novelties, making it readily applicable to diverse contexts.
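The joint objective described above (reconstruction plus maximum likelihood over autoregressively factorized latents) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the function names, the squared-error reconstruction term, the `weight` balancing factor, and the idea of passing in precomputed conditional probabilities p(z_i | z_<i) are all hypothetical.

```python
import math


def reconstruction_error(x, x_hat):
    """Squared L2 distance between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def autoregressive_nll(cond_probs):
    """Negative log-likelihood of a latent code.

    cond_probs[i] is assumed to hold p(z_i | z_<i), so the joint likelihood
    is their product and the NLL is the sum of the negative logs.
    """
    return -sum(math.log(p) for p in cond_probs)


def joint_objective(x, x_hat, cond_probs, weight=1.0):
    """Reconstruction term plus a weighted latent NLL term (assumed form)."""
    return reconstruction_error(x, x_hat) + weight * autoregressive_nll(cond_probs)


# A well-reconstructed sample with a likely latent code...
typical = joint_objective([1.0, 2.0], [1.0, 2.0], [0.5, 0.5])
# ...versus a poorly reconstructed sample with an unlikely latent code.
unusual = joint_objective([1.0, 2.0], [0.0, 0.0], [0.1, 0.1])
```

In this sketch a sample that is both hard to reconstruct and unlikely under the latent density receives a larger value, which is the intuition behind using such quantities to flag novelties.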

    Conditional Channel Gated Networks for Task-Aware Continual Learning

    Convolutional Neural Networks experience catastrophic forgetting when optimized on a sequence of learning problems: as they meet the objective of the current training examples, their performance on previous tasks drops drastically. In this work, we introduce a novel framework to tackle this problem with conditional computation. We equip each convolutional layer with task-specific gating modules that select which filters to apply to a given input. This yields two appealing properties. Firstly, the execution patterns of the gates make it possible to identify and protect important filters, ensuring no loss in the performance of the model on previously learned tasks. Secondly, a sparsity objective promotes the selection of a limited set of kernels, so that the model retains sufficient capacity to digest new tasks. Existing solutions require, at test time, awareness of the task to which each example belongs. This knowledge, however, may not be available in many practical scenarios. Therefore, we additionally introduce a task classifier that predicts the task label of each example, to deal with settings in which a task oracle is not available. We validate our proposal on four continual learning datasets. Results show that our model consistently outperforms existing methods both in the presence and in the absence of a task oracle. Notably, on the Split SVHN and Imagenet-50 datasets, our model yields up to 23.98% and 17.42% improvement in accuracy w.r.t. competing methods. Comment: CVPR 2020 (oral)
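The gating mechanism can be illustrated with a small sketch. Everything here is a simplification assumed for illustration: gates are plain 0/1 lists rather than learned modules, and the helper names `apply_gates`, `protected_filters`, and `sparsity_penalty` are hypothetical, not taken from the paper.

```python
def apply_gates(filter_outputs, gates):
    """Zero out the outputs of filters whose gate is off for this task."""
    return [out if g else 0.0 for out, g in zip(filter_outputs, gates)]


def protected_filters(gates_per_task):
    """Indices of filters switched on by at least one previously learned task.

    In this sketch, these are the filters that would be kept frozen while
    training on a new task, so earlier tasks are not disturbed.
    """
    protected = set()
    for gates in gates_per_task.values():
        protected.update(i for i, g in enumerate(gates) if g)
    return protected


def sparsity_penalty(gates):
    """Fraction of active gates; lower means more filters left for new tasks."""
    return sum(gates) / len(gates)


# Hypothetical gate patterns for two earlier tasks over a 4-filter layer.
history = {"task_a": [1, 0, 0, 1], "task_b": [0, 0, 1, 1]}
frozen = protected_filters(history)
free = [i for i in range(4) if i not in frozen]
```

Here `free` lists the filters that remain available to a new task, which is why encouraging sparse gate patterns preserves capacity.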

    Region-of-Interest Based Neural Video Compression

    Humans do not perceive all parts of a scene with the same resolution, but rather focus on a few regions of interest (ROIs). Traditional object-based codecs take advantage of this biological intuition and are capable of non-uniform allocation of bits in favor of salient regions, at the expense of increased distortion in the remaining areas: such a strategy allows a boost in perceptual quality under low rate constraints. Recently, several neural codecs have been introduced for video compression, yet they operate uniformly over all spatial locations and lack the capability of ROI-based processing. In this paper, we introduce two models for ROI-based neural video coding. First, we propose an implicit model that is fed a binary ROI mask and is trained by de-emphasizing the distortion of the background. Secondly, we design an explicit latent scaling method that allows control over the quantization binwidth for different spatial regions of the latent variables, conditioned on the ROI mask. Through extensive experiments, we show that our methods outperform all our baselines in terms of Rate-Distortion (R-D) performance in the ROI. Moreover, they generalize to different datasets and to arbitrary ROIs at inference time. Finally, they do not require expensive pixel-level annotations during training, as synthetic ROI masks can be used with little to no degradation in performance. To the best of our knowledge, our proposals are the first solutions that integrate ROI-based capabilities into neural video compression models. Comment: Updated arXiv version to the camera-ready version after acceptance at British Machine Vision Conference (BMVC) 202
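The two ideas in this abstract, a background-de-emphasized distortion and ROI-dependent quantization bin widths, can be sketched on flat lists of values. This is an illustrative toy under assumptions, not the paper's models: the function names, the default weights and bin widths, and the use of simple rounding as the quantizer are all hypothetical.

```python
def roi_weighted_mse(frame, recon, roi_mask, background_weight=0.1):
    """Mean squared error where background pixels count less than ROI pixels.

    roi_mask[i] is 1 inside the region of interest and 0 elsewhere.
    The sum is divided by the number of pixels, not by the sum of weights.
    """
    total = 0.0
    for p, q, m in zip(frame, recon, roi_mask):
        w = 1.0 if m else background_weight
        total += w * (p - q) ** 2
    return total / len(frame)


def quantize_with_roi(latents, roi_mask, roi_bin=0.5, background_bin=2.0):
    """Quantize each latent to a grid whose bin width depends on the ROI mask.

    A narrower bin in the ROI keeps more detail there; a wider bin in the
    background spends fewer bits on it.
    """
    out = []
    for z, m in zip(latents, roi_mask):
        s = roi_bin if m else background_bin
        out.append(round(z / s) * s)
    return out


# The same latent value is quantized finely in the ROI, coarsely outside it.
quantized = quantize_with_roi([1.3, 1.3], [1, 0])
```

With these defaults the ROI latent is snapped to the nearest multiple of 0.5 and the background latent to the nearest multiple of 2.0, so the same input value is preserved more faithfully inside the ROI.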

    Normal and pathogenic variation of RFC1 repeat expansions: implications for clinical diagnosis

    Cerebellar Ataxia, Neuropathy and Vestibular Areflexia Syndrome (CANVAS) is an autosomal recessive neurodegenerative disease, usually caused by biallelic AAGGG repeat expansions in RFC1. In this study, we leveraged whole genome sequencing (WGS) data from nearly 10,000 individuals recruited within the Genomics England sequencing project to investigate the normal and pathogenic variation of the RFC1 repeat. We identified three novel repeat motifs, AGGGC (n=6 from 5 families), AAGGC (n=2 from 1 family) and AGAGG (n=1), associated with CANVAS in the homozygous state or in the compound heterozygous state with the common pathogenic AAGGG expansion. While AAAAG, AAAGGG and AAGAG expansions appear to be benign, here we show a pathogenic role for large AAAGG repeat configuration expansions (n=5). Long-read sequencing was used to fully characterise the entire repeat sequence and revealed a pure AGGGC expansion in six patients, whereas the other patients presented complex motifs with AAGGG or AAAGG interruptions. All pathogenic motifs seem to have arisen from a common haplotype and are predicted to form highly stable G-quadruplexes, which have previously been demonstrated to affect gene transcription in other conditions. The assessment of these novel configurations is warranted in CANVAS patients with negative or inconclusive genetic testing. Particular attention should be paid to carriers of compound AAGGG/AAAGG expansions, since the AAAGG motif may be pathogenic when very large (>500 repeats) or in the presence of AAGGG interruptions. Accurate sizing and full sequencing of the satellite repeat with long reads is recommended in clinically selected cases, in order to achieve an accurate molecular diagnosis and to counsel patients and their families.
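The distinction between a "pure" expansion and a complex one with interruptions can be illustrated with a toy motif counter. This is only a sketch for intuition and not a diagnostic or analysis pipeline: the function names are hypothetical, the input is assumed to be an already extracted repeat tract that starts in frame with the motif, and real long-read data would need alignment, error handling, and rotation-aware motif matching.

```python
from collections import Counter


def motif_composition(repeat_seq, unit_len=5):
    """Count each consecutive, non-overlapping unit of length unit_len.

    Any trailing bases that do not fill a whole unit are ignored.
    """
    units = [
        repeat_seq[i:i + unit_len]
        for i in range(0, len(repeat_seq) - unit_len + 1, unit_len)
    ]
    return Counter(units)


def is_pure(repeat_seq, motif):
    """True if every unit in the tract equals the given motif."""
    counts = motif_composition(repeat_seq, len(motif))
    return set(counts) == {motif}


# A short made-up tract: AAAGG units with a single AAGGG interruption.
tract = "AAAGG" * 2 + "AAGGG" + "AAAGG"
composition = motif_composition(tract)
```

For the made-up tract above, `composition` reports three AAAGG units and one AAGGG unit, so `is_pure(tract, "AAAGG")` is false, mirroring the "complex motif with interruptions" case described in the abstract.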

    Identificazione di anomalie nell’attenzione del guidatore e nel comportamento delle persone (Identifying anomalies in driver attention and in people's behavior)

    As the world becomes increasingly connected and digitized, with sensors and computing devices growing ever more pervasive, new opportunities emerge for artificial intelligence. In particular, public monitoring stands out as a critical theme, and computer vision has the potential to become the leading technology in building a safer world. In this thesis, we present solutions for public safety in two different application areas. First, we address safety at the wheel by developing a system capable of predicting where a driver is likely to focus attention while driving. Despite its vast potential to improve driving safety, this prediction is highly complex, since driving a car is a complicated task and is highly subjective from an attentional perspective. To this end, we collect and release DR(eye)VE, a dataset consisting of driver-centric and car-centric clips annotated with the driver's fixation points on the outer urban scene. Next, we inspect these data in depth to establish which factors most influence a driver's gaze, in terms of both motion and semantics. Guided by this evidence, we finally develop a deep neural network that, given a car-centric urban scene, identifies which regions are likely to capture the driver's attention. Secondly, we address safety in video surveillance by introducing an anomaly detection model capable of learning the traits that characterize normal (safe) situations and, therefore, of raising an alert whenever unexpected events appear. Learning such models without examples of abnormal conditions is the aim of anomaly detection (a.k.a. novelty detection) research. Despite its importance and a plethora of prior work, the unpredictable nature of anomalous events and their inaccessibility during training severely degrade the effectiveness of existing systems. In this context, we propose a general model consisting of a deep autoencoder equipped with a parametric density estimator, which learns the distribution of its latent representations through an autoregressive procedure. We show that a maximum likelihood objective in latent space effectively regularizes the autoencoder's reconstruction objective and minimizes the differential entropy of the distribution spanned by latent vectors. Intuitively, such joint optimization forces the model to describe (and reconstruct) each example in terms of features that appear frequently in the training set (and are therefore more representative of normality). Extensive experiments and comparisons with the state of the art show the effectiveness of both our proposals.