Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval
Zero-shot sketch-based image retrieval (SBIR) is an emerging task in computer vision, allowing natural images to be retrieved with sketch queries whose categories may not have been seen in the training phase. Existing works either require aligned sketch-image pairs or an inefficient memory fusion layer for mapping the visual information to a semantic space. In this work, we propose a semantically aligned paired cycle-consistent generative (SEM-PCYC) model for zero-shot SBIR, where each branch maps the visual information to a common semantic space via adversarial training. Each of these branches maintains a cycle consistency that requires supervision only at the category level, avoiding the need for costly aligned sketch-image pairs. A classification criterion on the generators' outputs ensures that the visual-to-semantic mapping is discriminative. Furthermore, we propose to combine textual and hierarchical side information via a feature selection auto-encoder that selects discriminating side information within the same end-to-end model. Our results demonstrate a significant boost in zero-shot SBIR performance over the state of the art on the challenging Sketchy and TU-Berlin datasets.
Funding: European Union Horizon 2020.
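The training objective outlined in the abstract combines three terms: adversarial alignment to the semantic space, a cycle back to the visual space, and category-level classification. The following is a minimal PyTorch sketch of that combination; the placeholder dimensions, the simple MLP branches, and the names gen_sketch, gen_image and gen_back are our assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 512-d visual features, 300-d semantic space, 100 classes.
VIS, SEM, CLS = 512, 300, 100

gen_sketch = nn.Sequential(nn.Linear(VIS, SEM), nn.ReLU(), nn.Linear(SEM, SEM))  # sketch -> semantic
gen_image  = nn.Sequential(nn.Linear(VIS, SEM), nn.ReLU(), nn.Linear(SEM, SEM))  # image  -> semantic
gen_back   = nn.Linear(SEM, VIS)          # semantic -> visual, closes the cycle
disc       = nn.Linear(SEM, 1)            # adversarial critic on semantic embeddings
classifier = nn.Linear(SEM, CLS)          # category-level supervision only

bce, l1, ce = nn.BCEWithLogitsLoss(), nn.L1Loss(), nn.CrossEntropyLoss()

def generator_loss(x_sketch, x_image, labels):
    """One step of the category-supervised, cycle-consistent objective."""
    s = gen_sketch(x_sketch)              # map sketch features to semantic space
    i = gen_image(x_image)                # map image features to semantic space
    # Adversarial term: generated embeddings should fool the critic.
    adv = bce(disc(s), torch.ones(s.size(0), 1)) + bce(disc(i), torch.ones(i.size(0), 1))
    # Cycle consistency back to the visual space; needs no aligned pairs.
    cyc = l1(gen_back(s), x_sketch) + l1(gen_back(i), x_image)
    # Classification criterion keeps the semantic mapping discriminative.
    cls = ce(classifier(s), labels) + ce(classifier(i), labels)
    return adv + cyc + cls
```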
Semantically tied paired cycle consistency for any-shot sketch-based image retrieval
Low-shot sketch-based image retrieval is an emerging task in computer vision, allowing natural images to be retrieved with hand-drawn sketch queries that are rarely seen during the training phase. Related prior works either require aligned sketch-image pairs that are costly to obtain or an inefficient memory fusion layer for mapping the visual information to a semantic space. In this paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image retrieval (SBIR) tasks, and introduce the few-shot setting for SBIR. To solve these tasks, we propose a semantically aligned paired cycle-consistent generative adversarial network (SEM-PCYC) for any-shot SBIR, where each branch of the generative adversarial network maps the visual information from sketch and image to a common semantic space via adversarial training. Each of these branches maintains cycle consistency that requires supervision only at the category level, avoiding the need for aligned sketch-image pairs. A classification criterion on the generators' outputs ensures that the visual-to-semantic mapping is class-specific. Furthermore, we propose to combine textual and hierarchical side information via an auto-encoder that selects discriminating side information within the same end-to-end model. Our results demonstrate a significant boost in any-shot SBIR performance over the state of the art on the extended versions of the challenging Sketchy, TU-Berlin and QuickDraw datasets.
Funding: European Union Marie Skłodowska-Curie Grant; European Research Council (ERC).
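At test time, any-shot retrieval reduces to nearest-neighbour search in the learned common semantic space. The sketch below reuses the hypothetical gen_sketch and gen_image generators from the previous block; cosine similarity is our illustrative choice of ranking metric, not necessarily the paper's.

```python
import torch
import torch.nn.functional as F

def retrieve(sketch_feat, gallery_feats, k=10):
    """Rank gallery images by cosine similarity to a sketch query in semantic space."""
    q = F.normalize(gen_sketch(sketch_feat), dim=-1)    # 1 x SEM query embedding
    g = F.normalize(gen_image(gallery_feats), dim=-1)   # N x SEM gallery embeddings
    scores = g @ q.squeeze(0)                           # cosine similarity per image
    return scores.topk(k).indices                       # indices of the top-k matches
```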
"Task-relevant autoencoding" enhances machine learning for human neuroscience
In human neuroscience, machine learning can help reveal lower-dimensional neural representations relevant to subjects' behavior. However, state-of-the-art models typically require large datasets to train, so they are prone to overfitting on human neuroimaging data, which often has few samples but many input dimensions. Here, we capitalized on the fact that the features we seek in human neuroscience are precisely those relevant to subjects' behavior. We thus developed a Task-Relevant Autoencoder via Classifier Enhancement (TRACE), and tested its ability to extract behaviorally relevant, separable representations against a standard autoencoder, a variational autoencoder, and principal component analysis on two severely truncated machine learning datasets. We then evaluated all models on fMRI data from 59 subjects who observed animals and objects. TRACE outperformed all other models nearly across the board, showing up to 12% higher classification accuracy and up to 56% improvement in discovering "cleaner", task-relevant representations. These results showcase TRACE's potential for a wide variety of data related to human behavior.
Comment: 41 pages, 11 figures, 5 tables including supplemental material.
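The core mechanism, an autoencoder whose bottleneck is jointly shaped by a classifier on the behavioral labels, can be sketched as follows. The layer sizes, the loss weighting alpha, and all names are illustrative assumptions, not the published TRACE configuration.

```python
import torch
import torch.nn as nn

class TaskRelevantAE(nn.Module):
    """Autoencoder with a classifier head on the latent code (TRACE-style)."""
    def __init__(self, n_voxels=5000, n_latent=20, n_classes=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_voxels, 256), nn.ReLU(), nn.Linear(256, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 256), nn.ReLU(), nn.Linear(256, n_voxels))
        self.head = nn.Linear(n_latent, n_classes)  # pulls the latent toward task-relevant axes

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.head(z)

def trace_loss(model, x, y, alpha=1.0):
    recon, logits = model(x)
    # Reconstruction keeps the code faithful to the data;
    # classification keeps it relevant to behavior.
    return nn.functional.mse_loss(recon, x) + alpha * nn.functional.cross_entropy(logits, y)
```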
Receptive fields optimization in deep learning for enhanced interpretability, diversity, and resource efficiency.
In both supervised and unsupervised learning settings, deep neural networks (DNNs) are known to perform hierarchical and discriminative representation of data. They are capable of automatically extracting an excellent hierarchy of features from raw data without the need for manual feature engineering. Over the past few years, the general trend has been for DNNs to grow deeper and larger, amounting to a huge number of parameters and a highly nonlinear cascade of features, thus improving the flexibility and accuracy of the resulting models. To account for the scale, diversity and difficulty of the data DNNs learn from, architectural complexity and an excessive number of weights are often deliberately built into their design. This flexibility and performance usually come with high computational and memory demands during both training and inference. In addition, insight into the mappings DNN models perform, and the human ability to understand them, remains very limited. This dissertation addresses some of these limitations by balancing three conflicting objectives: computational/memory demands, interpretability, and accuracy.

The dissertation first introduces some unsupervised feature learning methods in the broader context of dictionary learning. It also sets the stage for deep autoencoder learning and constraints on data representations, aimed at removing some of the aforementioned bottlenecks, such as improving the feature interpretability of deep learning models through nonnegativity constraints on receptive fields. In addition, the two main classes of solution to the drawbacks associated with overparameterization/over-complete representation in deep learning models are presented.

Subsequently, two novel methods, one for each solution class, are presented to address the problems resulting from the over-complete representations exhibited by most deep learning models. The first method achieves inference-cost-efficient models by eliminating redundant features with negligible deterioration of prediction accuracy; this is especially important for deploying deep learning models on resource-limited portable devices. The second method diversifies the features of DNNs during the learning phase to improve their performance without undermining their size and capacity. Lastly, feature diversification is used to stabilize adversarial learning, and extensive experimental results show that these methods have the potential to advance the current state of the art on different learning tasks and benchmark datasets.
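As one concrete illustration of the feature-diversification idea, a decorrelation penalty on a layer's weight rows pushes receptive fields away from one another. This is a generic stand-in sketch under our own assumptions, not the dissertation's exact regularizer.

```python
import torch

def diversity_penalty(weight):
    """Penalize pairwise similarity between a layer's (normalized) filters."""
    w = torch.nn.functional.normalize(weight, dim=1)  # one row per filter / receptive field
    gram = w @ w.t()                                  # cosine similarities between filters
    off_diag = gram - torch.eye(gram.size(0))         # ignore each filter's self-similarity
    return (off_diag ** 2).sum()                      # zero iff filters are orthogonal
```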
Enhancing the Discovery of Neural Representations: Integrating Task-Relevant Dimensionality Reduction and Domain Adaptation
In human neuroscience, machine learning models can be used to discover lower-dimensional neural representations relevant to behavior. However, these models often require large datasets and can overfit with the small sample sizes typical in neuroimaging. To address this, we developed the Task-Relevant Autoencoder via Classifier Enhancement (TRACE) to extract behaviorally relevant representations. When tested against standard autoencoders and principal component analysis on fMRI data from the ventral temporal cortex (VTC) of 59 subjects, TRACE showed up to 12% higher classification accuracy and a 56% improvement in discovering task-relevant representations, highlighting its potential for behavioral data.

Applications of machine learning models also extend to predictive modeling and pattern discovery in modern biology. However, these models often fail to generalize across different datasets due to statistical differences. The same issue exists in neuroscience, where data are collected across various laboratories using different experimental setups. Domain adaptation can align statistical distributions across datasets, enabling model transfer and mitigating overfitting. In the second chapter, we discuss domain adaptation in the context of small-scale, heterogeneous biological data, outlining its benefits, challenges, and key methodologies, and we advocate for integrating domain adaptation techniques into computational biology, with further customized developments.

Building on these insights, we used domain adaptation to understand brain-region interactions during visual processing. We examine the ventral temporal cortex (VTC) and prefrontal cortex (PFC) using Domain Adaptive Task-Relevant Autoencoding via Classifier Enhancement (DATRACE) to explore shared neural representations. DATRACE leverages domain adaptation techniques within an encoder-decoder architecture to predict voxel activities from a shared latent space, ensuring relevance for object recognition tasks. Preliminary results indicate that the shared representations capture similar object categories in both VTC and PFC. We computed the representational dissimilarity matrix (RDM) of the shared representation between VTC and PFC and contrasted it with the RDM obtained from the low-dimensional representation of VTC. Our results suggest that the information shared with PFC is very similar to that encoded in VTC. Additionally, feature perturbation analysis suggests the need for further studies to reveal the semantic interpretation of the shared dimensions in these brain regions. This integrated approach underscores the potential of advanced machine learning techniques in both neuroscience and biology.
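The RDM comparison described above is a standard computation: one minus the pairwise Pearson correlation between condition-wise activity patterns, with two RDMs compared via the correlation of their upper triangles. A minimal NumPy sketch, where array shapes are our assumptions:

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between condition patterns.

    patterns: (n_conditions, n_features) array, e.g. category-averaged voxel
    activities or latent codes from a shared VTC-PFC representation.
    """
    return 1.0 - np.corrcoef(patterns)

def rdm_similarity(a, b):
    """Correlate the upper triangles of two RDMs to compare representational geometry."""
    iu = np.triu_indices_from(a, k=1)
    return np.corrcoef(a[iu], b[iu])[0, 1]
```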