9 research outputs found

Continuous Spatiotemporal Transformers

Author: Caro Josue Ortega
Fonseca Antonio H. de O.
van Dijk David
Zappala Emanuele
Publication date: 28/07/2023

Modeling spatiotemporal dynamical systems is a fundamental challenge in machine learning. Transformer models have been very successful in NLP and computer vision where they provide interpretable representations of data. However, a limitation of transformers in modeling continuous dynamical systems is that they are fundamentally discrete time and space models and thus have no guarantees regarding continuous sampling. To address this challenge, we present the Continuous Spatiotemporal Transformer (CST), a new transformer architecture that is designed for the modeling of continuous systems. This new framework guarantees a continuous and smooth output via optimization in Sobolev space. We benchmark CST against traditional transformers as well as other spatiotemporal dynamics modeling methods and achieve superior performance in a number of tasks on synthetic and real systems, including learning brain dynamics from calcium imaging data.

Comment: Updated version, after review

arXiv.org e-Print Archive

Local Convolutions Cause an Implicit Bias towards High Frequency Adversarial Examples

Author: Anselmi Fabio
Brendel Wieland
Caro Josue Ortega
Dey Sourav
Ju Yilong
Patel Ankit
Pyle Ryan
Publication date: 07/12/2021

Adversarial attacks are still a significant challenge for neural networks. Recent work has shown that adversarial perturbations typically contain high-frequency features, but the root cause of this phenomenon remains unknown. Inspired by theoretical work on linear full-width convolutional models, we hypothesize that the local (i.e. bounded-width) convolutional operations commonly used in current neural networks are implicitly biased to learn high-frequency features, and that this is one of the root causes of high-frequency adversarial examples. To test this hypothesis, we analyzed the impact of different choices of linear and nonlinear architectures on the implicit bias of the learned features and the adversarial perturbations, in both spatial and frequency domains. We find that the high-frequency adversarial perturbations are critically dependent on the convolution operation because the spatially-limited nature of local convolutions induces an implicit bias towards high-frequency features. The explanation for the latter involves the Fourier Uncertainty Principle: a spatially-limited (local in the space domain) filter cannot also be frequency-limited (local in the frequency domain). Furthermore, using larger convolution kernel sizes or avoiding convolutions (e.g. by using a Vision Transformer architecture) significantly reduces this high-frequency bias, but not the overall susceptibility to attacks. Looking forward, our work strongly suggests that understanding and controlling the implicit bias of architectures will be essential for achieving adversarial robustness.

Comment: 20 pages, 11 figures, 12 tables
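The Fourier uncertainty trade-off named in this abstract can be illustrated numerically. The following is a minimal sketch, not the paper's experiment: it assumes 1-D Gaussian kernels as stand-ins for convolution filters, and the helper names `gaussian_kernel` and `spectral_spread` are hypothetical. It compares how widely the power spectrum of a 3-tap kernel and a 31-tap kernel spreads over frequency.

```python
import numpy as np

def gaussian_kernel(width, sigma):
    """Normalized 1-D Gaussian filter with `width` taps (illustrative stand-in for a conv kernel)."""
    x = np.arange(width) - (width - 1) / 2
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def spectral_spread(kernel, n=256):
    """RMS width of the kernel's power spectrum, in cycles per sample, after zero-padding to n."""
    power = np.abs(np.fft.fft(kernel, n=n)) ** 2
    freqs = np.fft.fftfreq(n)          # frequencies in [-0.5, 0.5)
    weights = power / power.sum()      # treat the power spectrum as a distribution
    return float(np.sqrt(np.sum(weights * freqs ** 2)))

# A spatially narrow (local) kernel versus a spatially wide one.
narrow_spread = spectral_spread(gaussian_kernel(3, 0.5))
wide_spread = spectral_spread(gaussian_kernel(31, 5.0))

print(f"3-tap kernel spectral spread:  {narrow_spread:.3f}")
print(f"31-tap kernel spectral spread: {wide_spread:.3f}")
```

Under these assumptions the narrow kernel's spectrum is spread over a much wider band than the wide kernel's, which is the sense in which a spatially limited filter cannot also be frequency-limited.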

arXiv.org e-Print Archive

Understanding Robustness and Generalization of Artificial Neural Networks Through Fourier Masks

Author: Anselmi Fabio
Besier Emma
Karantzas Nikos
Ortega Caro Josue
Patel Ankit B
Pitkow Xaq
Tolias Andreas S
Publication date: 01/01/2022

Despite the enormous success of artificial neural networks (ANNs) in many disciplines, the characterization of their computations and the origin of key properties such as generalization and robustness remain open questions. Recent literature suggests that robust networks with good generalization properties tend to be biased toward processing low frequencies in images. To explore the frequency bias hypothesis further, we develop an algorithm that allows us to learn modulatory masks highlighting the essential input frequencies needed for preserving a trained network's performance. We achieve this by imposing invariance in the loss with respect to such modulations in the input frequencies. We first use our method to test the low-frequency preference hypothesis of adversarially trained or data-augmented networks. Our results suggest that adversarially robust networks indeed exhibit a low-frequency bias but we find this bias is also dependent on directions in frequency space. However, this is not necessarily true for other types of data augmentation. Our results also indicate that the essential frequencies in question are effectively the ones used to achieve generalization in the first place. Surprisingly, images seen through these modulatory masks are not recognizable and resemble texture-like patterns.

Archivio istituzionale della ricerca - Università di Trieste

arXiv.org e-Print Archive

PubMed Central

Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics

Author: Andy Lu
Aneel Damaraju
Ankit B. Patel
Fabio Anselmi
Josue Ortega Caro
Justin Sahs
Onur Tavaslioglu
Ryan Pyle
Publication date: 01/01/2022

Understanding the learning dynamics and inductive bias of neural networks (NNs) is hindered by the opacity of the relationship between NN parameters and the function represented. Partially, this is due to symmetries inherent within the NN parameterization, allowing multiple different parameter settings to result in an identical output function, resulting in both an unclear relationship and redundant degrees of freedom. The NN parameterization is invariant under two symmetries: permutation of the neurons and a continuous family of transformations of the scale of weight and bias parameters. We propose taking a quotient with respect to the second symmetry group and reparametrizing ReLU NNs as continuous piecewise linear splines. Using this spline lens, we study learning dynamics in shallow univariate ReLU NNs, finding unexpected insights and explanations for several perplexing phenomena. We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, Hessian, and Hessian spectrum. We also show that standard weight initializations yield very flat initial functions, and that this flatness, together with overparametrization and the initial weight scale, is responsible for the strength and type of implicit regularization, consistent with previous work. Our implicit regularization results are complementary to recent work, showing that initialization scale critically controls implicit regularization via a kernel-based argument. Overall, removing the weight scale symmetry enables us to prove these results more simply and enables us to prove new results and gain new insights while offering a far more transparent and intuitive picture. Looking forward, our quotiented spline-based approach will extend naturally to the multivariate and deep settings, and alongside the kernel-based view, we believe it will play a foundational role in efforts to understand neural networks. 
Videos of learning dynamics using a spline-based visualization are available at http://shorturl.at/tFWZ2
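The reparametrization described in this abstract can be sketched for a shallow univariate ReLU network without output bias, f(x) = sum_i v_i relu(w_i x + b_i). This is an illustrative sketch under that assumption, not the authors' code; the names `relu_net`, `to_spline`, and `spline_eval`, and the particular choice of spline coordinates (breakpoint, orientation, slope), are hypothetical. It checks that rescaling each neuron by a positive factor leaves both the function and the spline coordinates unchanged.

```python
import numpy as np

def relu_net(x, w, b, v):
    """Shallow univariate ReLU network: f(x) = sum_i v_i * relu(w_i * x + b_i)."""
    return np.maximum(np.outer(x, w) + b, 0.0) @ v

def to_spline(w, b, v):
    """Map (w, b, v) to per-neuron spline coordinates (assumes every w_i != 0)."""
    breakpoints = -b / w          # where each neuron switches on or off
    orientation = np.sign(w)      # +1: active to the right, -1: active to the left
    slope = v * np.abs(w)         # slope contributed on the active side
    return breakpoints, orientation, slope

def spline_eval(x, breakpoints, orientation, slope):
    """Evaluate the same function directly from the spline coordinates."""
    return np.maximum(orientation * (x[:, None] - breakpoints), 0.0) @ slope

rng = np.random.default_rng(0)
w, b, v = rng.normal(size=5), rng.normal(size=5), rng.normal(size=5)

# Scale symmetry: (w, b, v) -> (a*w, a*b, v/a) with a > 0 leaves f unchanged.
a = rng.uniform(0.5, 2.0, size=5)
w2, b2, v2 = a * w, a * b, v / a

x = np.linspace(-3.0, 3.0, 101)
same_function = np.allclose(relu_net(x, w, b, v), relu_net(x, w2, b2, v2))
same_coords = all(
    np.allclose(p, q) for p, q in zip(to_spline(w, b, v), to_spline(w2, b2, v2))
)
matches_spline = np.allclose(relu_net(x, w, b, v), spline_eval(x, *to_spline(w, b, v)))
print(same_function, same_coords, matches_spline)
```

In this sketch the two rescaled parameter settings collapse to one point in spline coordinates, which is the redundancy the quotient removes.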

Archivio istituzionale della ricerca - Università di Trieste

PubMed Central

Recurrent computations for visual pattern completion

Author: Caro Josue Ortega
Cox David
Hardesty Walter
Kreiman Gabriel
Lotter William
Moerman Charlotte
Paredes Ana
Schrimpf Martin
Tang Hanlin
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/01/2018

Making inferences from partial information constitutes a critical aspect of cognition. During visual perception, pattern completion enables recognition of poorly visible or occluded objects. We combined psychophysics, physiology, and computational models to test the hypothesis that pattern completion is implemented by recurrent computations and present three pieces of evidence that are consistent with this hypothesis. First, subjects robustly recognized objects even when they were rendered <15% visible, but recognition was largely impaired when processing was interrupted by backward masking. Second, invasive physiological responses along the human ventral cortex exhibited visually selective responses to partially visible objects that were delayed compared with whole objects, suggesting the need for additional computations. These physiological delays were correlated with the effects of backward masking. Third, state-of-the-art feed-forward computational architectures were not robust to partial visibility. However, recognition performance was recovered when the model was augmented with attractor-based recurrent connectivity. The recurrent model was able to predict which images of heavily occluded objects were easier or harder for humans to recognize, could capture the effect of introducing a backward mask on recognition behavior, and was consistent with the physiological delays along the human ventral visual stream. These results provide a strong argument of plausibility for the role of recurrent computations in making visual inferences from partial information.

arXiv.org e-Print Archive

OPUS Augsburg

Open Access LMU

AMPNet: Attention as Message Passing for Graph Neural Networks

Author: Abdallah Chadi G.
Averill Christopher
Bagherian Maryam
Brbic Maria
Caro Josue Ortega
Christensen Benjamin
Dhodapkar Rahul Madhav
Fonseca Antonio H. O.
Lyu Haoran
Nguyen Nhi
Rizvi Syed Asad
van Dijk David
Ying Rex
Zappala Emanuele
Publication date: 06/10/2023

Graph Neural Networks (GNNs) have emerged as a powerful representation learning framework for graph-structured data. A key limitation of conventional GNNs is their representation of each node with a singular feature vector, potentially overlooking intricate details about individual node features. Here, we propose an Attention-based Message-Passing layer for GNNs (AMPNet) that encodes individual features per node and models feature-level interactions through cross-node attention during message-passing steps. We demonstrate the abilities of AMPNet through extensive benchmarking on real-world biological systems such as fMRI brain activity recordings and spatial genomic data, improving over existing baselines by 20% on fMRI signal reconstruction, and further improving another 8% with positional embedding added. Finally, we validate the ability of AMPNet to uncover meaningful feature-level interactions through case studies on biological systems. We anticipate that our architecture will be highly applicable to graph-structured data where node entities encompass rich feature-level information.

Comment: 16 pages (12 + 4 pages appendix). 5 figures and 7 tables

arXiv.org e-Print Archive

Robust deep learning object recognition models rely on low frequency information in natural images.

Author: Andreas S Tolias
Ankit B Patel
Evgenia Rusak
Fabio Anselmi
Josue Ortega Caro
Matthias Bethge
Wieland Brendel
Xaq Pitkow
Zhe Li
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2023

Machine learning models have difficulty generalizing to data outside of the distribution they were trained on. In particular, vision models are usually vulnerable to adversarial attacks or common corruptions, to which the human visual system is robust. Recent studies have found that regularizing machine learning models to favor brain-like representations can improve model robustness, but it is unclear why. We hypothesize that the increased model robustness is partly due to the low spatial frequency preference inherited from the neural representation. We tested this simple hypothesis with several frequency-oriented analyses, including the design and use of hybrid images to probe model frequency sensitivity directly. We also examined many other publicly available robust models that were trained on adversarial images or with data augmentation, and found that all these robust models showed a greater preference for low spatial frequency information. We show that preprocessing by blurring can serve as a defense mechanism against both adversarial attacks and common corruptions, further confirming our hypothesis and demonstrating the utility of low spatial frequency information in robust object recognition.

Archivio istituzionale della ricerca - Università di Trieste

Directory of Open Access Journals

Recurrent computations for visual pattern completion

Author: Ana Paredes
Charlotte Moerman
David Cox
Fyall
Gabriel Kreiman
Hanlin Tang
Josue Ortega Caro
Li
Martin Schrimpf
Panzeri
Walter Hardesty
William Lotter
Publication venue: 'Proceedings of the National Academy of Sciences'

Crossref