Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval
Zero-shot sketch-based image retrieval (SBIR) is an emerging task in computer vision, allowing natural images to be retrieved with sketch queries whose categories may not have been seen in the training phase. Existing works either require aligned sketch-image pairs or an inefficient memory fusion layer for mapping the visual information to a semantic space. In this work, we propose a semantically aligned paired cycle-consistent generative (SEM-PCYC) model for zero-shot SBIR, where each branch maps the visual information to a common semantic space via adversarial training. Each of these branches maintains a cycle consistency that requires supervision only at the category level, avoiding the need for costly aligned sketch-image pairs. A classification criterion on the generators' outputs ensures that the visual-to-semantic mapping is discriminative. Furthermore, we propose to combine textual and hierarchical side information via a feature selection auto-encoder that selects discriminating side information within the same end-to-end model. Our results demonstrate a significant boost in zero-shot SBIR performance over the state of the art on the challenging Sketchy and TU-Berlin datasets.
Funding: European Union Horizon 2020.
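The training objective outlined in the abstract combines three terms: adversarial alignment to the semantic space, a cycle back to the visual space, and category-level classification. The following is a minimal PyTorch sketch of that combination; the placeholder dimensions, the simple MLP branches, and the names gen_sketch, gen_image and gen_back are our assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 512-d visual features, 300-d semantic space, 100 classes.
VIS, SEM, CLS = 512, 300, 100

gen_sketch = nn.Sequential(nn.Linear(VIS, SEM), nn.ReLU(), nn.Linear(SEM, SEM))  # sketch -> semantic
gen_image  = nn.Sequential(nn.Linear(VIS, SEM), nn.ReLU(), nn.Linear(SEM, SEM))  # image  -> semantic
gen_back   = nn.Linear(SEM, VIS)          # semantic -> visual, closes the cycle
disc       = nn.Linear(SEM, 1)            # adversarial critic on semantic embeddings
classifier = nn.Linear(SEM, CLS)          # category-level supervision only

bce, l1, ce = nn.BCEWithLogitsLoss(), nn.L1Loss(), nn.CrossEntropyLoss()

def generator_loss(x_sketch, x_image, labels):
    """One step of the category-supervised, cycle-consistent objective."""
    s = gen_sketch(x_sketch)              # map sketch features to semantic space
    i = gen_image(x_image)                # map image features to semantic space
    # Adversarial term: generated embeddings should fool the critic.
    adv = bce(disc(s), torch.ones(s.size(0), 1)) + bce(disc(i), torch.ones(i.size(0), 1))
    # Cycle consistency back to the visual space; needs no aligned pairs.
    cyc = l1(gen_back(s), x_sketch) + l1(gen_back(i), x_image)
    # Classification criterion keeps the semantic mapping discriminative.
    cls = ce(classifier(s), labels) + ce(classifier(i), labels)
    return adv + cyc + cls
```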
Semantically tied paired cycle consistency for any-shot sketch-based image retrieval
Low-shot sketch-based image retrieval is an emerging task in computer vision, allowing natural images to be retrieved with hand-drawn sketch queries that are rarely seen during the training phase. Related prior works either require aligned sketch-image pairs that are costly to obtain or an inefficient memory fusion layer for mapping the visual information to a semantic space. In this paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image retrieval (SBIR) tasks, and introduce the few-shot setting for SBIR. To solve these tasks, we propose a semantically aligned paired cycle-consistent generative adversarial network (SEM-PCYC) for any-shot SBIR, where each branch of the generative adversarial network maps the visual information from sketch and image to a common semantic space via adversarial training. Each of these branches maintains cycle consistency that requires supervision only at the category level, avoiding the need for aligned sketch-image pairs. A classification criterion on the generators' outputs ensures that the visual-to-semantic mapping is class-specific. Furthermore, we propose to combine textual and hierarchical side information via an auto-encoder that selects discriminating side information within the same end-to-end model. Our results demonstrate a significant boost in any-shot SBIR performance over the state of the art on the extended versions of the challenging Sketchy, TU-Berlin and QuickDraw datasets.
Funding: European Union Marie Skłodowska-Curie Grant; European Research Council (ERC).
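At test time, any-shot retrieval reduces to nearest-neighbour search in the learned common semantic space. The sketch below reuses the hypothetical gen_sketch and gen_image generators from the previous block; cosine similarity is our illustrative choice of ranking metric, not necessarily the paper's.

```python
import torch
import torch.nn.functional as F

def retrieve(sketch_feat, gallery_feats, k=10):
    """Rank gallery images by cosine similarity to a sketch query in semantic space."""
    q = F.normalize(gen_sketch(sketch_feat), dim=-1)    # 1 x SEM query embedding
    g = F.normalize(gen_image(gallery_feats), dim=-1)   # N x SEM gallery embeddings
    scores = g @ q.squeeze(0)                           # cosine similarity per image
    return scores.topk(k).indices                       # indices of the top-k matches
```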
"Task-relevant autoencoding" enhances machine learning for human neuroscience
In human neuroscience, machine learning can help reveal lower-dimensional neural representations relevant to subjects' behavior. However, state-of-the-art models typically require large datasets to train, so they are prone to overfitting on human neuroimaging data, which often has few samples but many input dimensions. Here, we capitalized on the fact that the features we seek in human neuroscience are precisely those relevant to subjects' behavior. We thus developed a Task-Relevant Autoencoder via Classifier Enhancement (TRACE), and tested its ability to extract behaviorally relevant, separable representations against a standard autoencoder, a variational autoencoder, and principal component analysis on two severely truncated machine learning datasets. We then evaluated all models on fMRI data from 59 subjects who observed animals and objects. TRACE outperformed all other models nearly across the board, showing up to 12% higher classification accuracy and up to 56% improvement in discovering "cleaner", task-relevant representations. These results showcase TRACE's potential for a wide variety of data related to human behavior.
Comment: 41 pages, 11 figures, 5 tables including supplemental material.
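The core mechanism, an autoencoder whose bottleneck is jointly shaped by a classifier on the behavioral labels, can be sketched as follows. The layer sizes, the loss weighting alpha, and all names are illustrative assumptions, not the published TRACE configuration.

```python
import torch
import torch.nn as nn

class TaskRelevantAE(nn.Module):
    """Autoencoder with a classifier head on the latent code (TRACE-style)."""
    def __init__(self, n_voxels=5000, n_latent=20, n_classes=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_voxels, 256), nn.ReLU(), nn.Linear(256, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 256), nn.ReLU(), nn.Linear(256, n_voxels))
        self.head = nn.Linear(n_latent, n_classes)  # pulls the latent toward task-relevant axes

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.head(z)

def trace_loss(model, x, y, alpha=1.0):
    recon, logits = model(x)
    # Reconstruction keeps the code faithful to the data;
    # classification keeps it relevant to behavior.
    return nn.functional.mse_loss(recon, x) + alpha * nn.functional.cross_entropy(logits, y)
```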
Receptive fields optimization in deep learning for enhanced interpretability, diversity, and resource efficiency.
In both supervised and unsupervised learning settings, deep neural networks (DNNs) are known to perform hierarchical and discriminative representation of data. They are capable of automatically extracting an excellent hierarchy of features from raw data without the need for manual feature engineering. Over the past few years, the general trend has been for DNNs to grow deeper and larger, amounting to a huge number of parameters and a highly nonlinear cascade of features, thus improving the flexibility and accuracy of the resulting models. To account for the scale, diversity and difficulty of the data DNNs learn from, architectural complexity and an excessive number of weights are often deliberately built into their design. This flexibility and performance usually come with high computational and memory demands during both training and inference. In addition, insight into the mappings DNN models perform, and the human ability to understand them, remains very limited. This dissertation addresses some of these limitations by balancing three conflicting objectives: computational/memory demands, interpretability, and accuracy.

The dissertation first introduces some unsupervised feature learning methods in the broader context of dictionary learning. It also sets the stage for deep autoencoder learning and constraints on data representations, aimed at removing some of the aforementioned bottlenecks, such as improving the feature interpretability of deep learning models through nonnegativity constraints on receptive fields. In addition, the two main classes of solution to the drawbacks associated with overparameterization/over-complete representation in deep learning models are presented.

Subsequently, two novel methods, one for each solution class, are presented to address the problems resulting from the over-complete representations exhibited by most deep learning models. The first method achieves inference-cost-efficient models by eliminating redundant features with negligible deterioration of prediction accuracy; this is especially important for deploying deep learning models on resource-limited portable devices. The second method diversifies the features of DNNs during the learning phase to improve their performance without undermining their size and capacity. Lastly, feature diversification is used to stabilize adversarial learning, and extensive experimental results show that these methods have the potential to advance the current state of the art on different learning tasks and benchmark datasets.
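As one concrete illustration of the feature-diversification idea, a decorrelation penalty on a layer's weight rows pushes receptive fields away from one another. This is a generic stand-in sketch under our own assumptions, not the dissertation's exact regularizer.

```python
import torch

def diversity_penalty(weight):
    """Penalize pairwise similarity between a layer's (normalized) filters."""
    w = torch.nn.functional.normalize(weight, dim=1)  # one row per filter / receptive field
    gram = w @ w.t()                                  # cosine similarities between filters
    off_diag = gram - torch.eye(gram.size(0))         # ignore each filter's self-similarity
    return (off_diag ** 2).sum()                      # zero iff filters are orthogonal
```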
Enhancing the Discovery of Neural Representations: Integrating Task-Relevant Dimensionality Reduction and Domain Adaptation
In human neuroscience, machine learning models can be used to discover lower-dimensional neural representations relevant to behavior. However, these models often require large datasets and can overfit with the small sample sizes typical in neuroimaging. To address this, we developed the Task-Relevant Autoencoder via Classifier Enhancement (TRACE) to extract behaviorally relevant representations. When tested against standard autoencoders and principal component analysis on fMRI data from the ventral temporal cortex (VTC) of 59 subjects, TRACE showed up to 12% higher classification accuracy and a 56% improvement in discovering task-relevant representations, highlighting its potential for behavioral data.

Applications of machine learning models also extend to predictive modeling and pattern discovery in modern biology. However, these models often fail to generalize across different datasets due to statistical differences. The same issue exists in neuroscience, where data are collected across various laboratories using different experimental setups. Domain adaptation can align statistical distributions across datasets, enabling model transfer and mitigating overfitting. In the second chapter, we discuss domain adaptation in the context of small-scale, heterogeneous biological data, outlining its benefits, challenges, and key methodologies, and we advocate for integrating domain adaptation techniques into computational biology, with further customized developments.

Building on these insights, we used domain adaptation to understand brain-region interactions during visual processing. We examine the ventral temporal cortex (VTC) and prefrontal cortex (PFC) using Domain Adaptive Task-Relevant Autoencoding via Classifier Enhancement (DATRACE) to explore shared neural representations. DATRACE leverages domain adaptation techniques within an encoder-decoder architecture to predict voxel activities from a shared latent space, ensuring relevance for object recognition tasks. Preliminary results indicate that the shared representations capture similar object categories in both VTC and PFC. We computed the representational dissimilarity matrix (RDM) of the shared representation between VTC and PFC and contrasted it with the RDM obtained from the low-dimensional representation of VTC. Our results suggest that the information shared with PFC is very similar to that encoded in VTC. Additionally, feature perturbation analysis suggests the need for further studies to reveal the semantic interpretation of the shared dimensions in these brain regions. This integrated approach underscores the potential of advanced machine learning techniques in both neuroscience and biology.
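The RDM comparison described above is a standard computation: one minus the pairwise Pearson correlation between condition-wise activity patterns, with two RDMs compared via the correlation of their upper triangles. A minimal NumPy sketch, where array shapes are our assumptions:

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between condition patterns.

    patterns: (n_conditions, n_features) array, e.g. category-averaged voxel
    activities or latent codes from a shared VTC-PFC representation.
    """
    return 1.0 - np.corrcoef(patterns)

def rdm_similarity(a, b):
    """Correlate the upper triangles of two RDMs to compare representational geometry."""
    iu = np.triu_indices_from(a, k=1)
    return np.corrcoef(a[iu], b[iu])[0, 1]
```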