    Multimodal representation and learning

    Recent years have seen an explosion of multimodal data on the web, making multimodal learning essential for understanding web content. However, joining different modalities is challenging because each modality has its own representation and correlational structure. Moreover, different modalities generally carry complementary kinds of information that can enrich understanding; for example, the visual appearance of a flower may evoke happiness, while its scent might not be pleasant. Combining such multimodal information supports more informed decisions. We therefore focus on improving representations from individual modalities to enhance multimodal representation and learning. In this doctoral thesis, we present techniques to enhance representations from individual and multiple modalities for multimodal applications, including classification, cross-modal retrieval, matching, and verification, on various benchmark datasets.
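    The abstract does not specify an architecture; purely as an illustration of fusing per-modality representations for classification, the sketch below shows late fusion in PyTorch. All names and dimensions here (LateFusionClassifier, img_dim=2048, txt_dim=768, shared_dim=256) are hypothetical and not taken from the thesis.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Minimal late-fusion sketch (illustrative, not the thesis's model):
    project each modality into a shared space, concatenate, classify."""

    def __init__(self, img_dim=2048, txt_dim=768, shared_dim=256, n_classes=10):
        super().__init__()
        # Per-modality projections reconcile the modalities' differing
        # representations by mapping them into a common space.
        self.img_proj = nn.Sequential(nn.Linear(img_dim, shared_dim), nn.ReLU())
        self.txt_proj = nn.Sequential(nn.Linear(txt_dim, shared_dim), nn.ReLU())
        # Joint head operates on the fused (concatenated) representation.
        self.head = nn.Linear(2 * shared_dim, n_classes)

    def forward(self, img_feat, txt_feat):
        z_img = self.img_proj(img_feat)          # (batch, shared_dim)
        z_txt = self.txt_proj(txt_feat)          # (batch, shared_dim)
        fused = torch.cat([z_img, z_txt], dim=-1)
        return self.head(fused)

# Usage with random features standing in for pretrained encoder outputs.
model = LateFusionClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 10])
```

    Projecting each modality into a shared space before concatenation is one common way to handle the differing representations and correlational structures the abstract mentions; the same fused embedding can also feed retrieval or matching heads.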

    Hand written characters recognition via deep metric learning

    Deep metric learning plays an important role in measuring similarity through distance metrics over arbitrary groups of data. The MNIST dataset is typically used for this purpose; however, it contains few visually similar classes, making it a less effective benchmark for deep metric learning methods. In this paper, we create a new handwritten dataset, named Urdu-Characters, with a set of classes well suited to deep metric learning. On this dataset, we compare the performance of two state-of-the-art deep metric learning methods, the Siamese network and the Triplet network. We show that a Triplet network is more powerful than a Siamese network, and that the performance of either can be further improved by using a more powerful underlying convolutional neural network architecture.
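    As an illustrative sketch only (the paper's exact backbone, input size, and hyperparameters are not given in this abstract), the following PyTorch snippet shows the Triplet-network idea: a shared embedding network maps an anchor, a same-class positive, and a different-class negative, and a margin loss pulls the anchor toward the positive and away from the negative. EmbeddingNet, the 28x28 input size, and margin=0.2 are assumptions.

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """Shared embedding network (a stronger CNN backbone would replace
    this small stack in practice, as the paper's results suggest)."""

    def __init__(self, emb_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(64 * 7 * 7, emb_dim)  # assumes 28x28 grayscale input

    def forward(self, x):
        h = self.conv(x).flatten(1)
        # L2-normalized embeddings so distances live on the unit sphere.
        return nn.functional.normalize(self.fc(h), dim=-1)

net = EmbeddingNet()
triplet_loss = nn.TripletMarginLoss(margin=0.2)

# Anchor and positive share a class; negative comes from a different class.
anchor, positive, negative = (torch.randn(8, 1, 28, 28) for _ in range(3))
loss = triplet_loss(net(anchor), net(positive), net(negative))
loss.backward()
```

    A Siamese network differs only in its sampling and objective: it embeds pairs rather than triplets and applies a contrastive loss, which is one intuition for why the triplet formulation, with its explicit relative comparison, tends to be more powerful.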