956 research outputs found

    Hashing for Multimedia Similarity Modeling and Large-Scale Retrieval

    In recent years, the amount of multimedia data such as images, texts, and videos has been growing rapidly on the Internet. Motivated by this trend, this thesis is dedicated to exploiting hashing-based solutions to reveal multimedia data correlations and support intra-media and inter-media similarity search over huge volumes of multimedia data. We start by investigating a hashing-based solution for audio-visual similarity modeling and apply it to the audio-visual sound source localization problem. We show that synchronized signals in the audio and visual modalities exhibit similar temporal changing patterns in certain feature spaces. We propose a permutation-based random hashing technique that captures the temporal order dynamics of audio and visual features by hashing them along the temporal axis into a common Hamming space. In this way, the audio-visual correlation problem is transformed into a similarity search problem in the Hamming space. Our hashing-based audio-visual similarity modeling shows superior performance in the localization and segmentation of sounding objects in videos. The success of the permutation-based hashing method motivates us to generalize and formally define the supervised ranking-based hashing problem, and to study its application to large-scale image retrieval. Specifically, we propose an effective supervised learning procedure to learn optimized ranking-based hash functions that can be used for large-scale similarity search. Compared with the randomized version, the optimized ranking-based hash codes are much more compact and discriminative. Moreover, the method can be easily extended to kernel space to discover more complex ranking structures that cannot be revealed in linear subspaces. Experiments on large image datasets demonstrate the effectiveness of the proposed method for image retrieval. We further study the ranking-based hashing method for the cross-media similarity search problem.
    Specifically, we propose two optimization methods to jointly learn two groups of linear subspaces, one for each media type, so that features' ranking orders in different linear subspaces maximally preserve the cross-media similarities. Additionally, we extend this ranking-based hashing method in the cross-media context into a flexible hashing framework with a more general solution. We demonstrate through extensive experiments on several real-world datasets that the proposed cross-media hashing method achieves superior cross-media retrieval performance against several state-of-the-art algorithms. Lastly, to make better use of the supervisory label information, and to further improve the efficiency and accuracy of supervised hashing, we propose a novel multimedia discrete hashing framework that optimizes an instance-wise loss objective, as opposed to pairwise losses, using an efficient discrete optimization method. In addition, the proposed method decouples binary code learning and hash function learning into two separate stages, making it equally applicable to both single-media and cross-media search. Extensive experiments on both single-media and cross-media retrieval tasks demonstrate the effectiveness of the proposed method.
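    The randomized permutation-based ranking hash described above can be sketched roughly as follows. This is a minimal Winner-Take-All-style illustration, not the thesis's actual implementation; the function name `ranking_hash` and the window size `k` are assumptions made for the example:

```python
import numpy as np

def ranking_hash(features, permutations, k=4):
    """Hash a feature vector into rank-order codes: for each random
    permutation, record which of the first k permuted dimensions holds
    the largest value (a Winner-Take-All-style rank statistic)."""
    codes = []
    for perm in permutations:
        window = features[perm[:k]]           # reorder dims, keep first k
        codes.append(int(np.argmax(window)))  # rank-order statistic
    return codes

rng = np.random.default_rng(0)
dim = 16
# The same random permutations act as shared hash functions for all items.
perms = [rng.permutation(dim) for _ in range(8)]

x = rng.standard_normal(dim)
codes = ranking_hash(x, perms)
print(codes)  # eight small integers, each in [0, k)
```

    Because the codes depend only on relative orderings of feature values, they are invariant to monotone transformations such as positive scaling, which is what makes rank-order statistics well suited to similarity search in a common Hamming-like space.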

    Proceedings of the Workshop on Change of Representation and Problem Reformulation

    The proceedings of the third Workshop on Change of Representation and Problem Reformulation are presented. In contrast to the first two workshops, this workshop focused on analytic or knowledge-based approaches, as opposed to the statistical or empirical approaches known as 'constructive induction'. The organizing committee believes that there is potential for combining analytic and inductive approaches at a future date. However, it became apparent at the previous two workshops that the communities pursuing these different approaches are currently interested in largely non-overlapping issues. The constructive induction community has been holding its own workshops, principally in conjunction with the machine learning conference. While this workshop is more focused on analytic approaches, the organizing committee has made an effort to include more application domains, greatly expanding from the workshop's origins in the machine learning community. Participants in this workshop come from the full spectrum of AI application domains, including planning, qualitative physics, software engineering, knowledge representation, and machine learning.

    Proceedings of the XIII Global Optimization Workshop: GOW'16

    [Excerpt] Preface: Past Global Optimization Workshops have been held in Sopron (1985 and 1990), Szeged (WGO, 1995), Florence (GO'99, 1999), Hanmer Springs (Let's GO, 2001), Santorini (Frontiers in GO, 2003), San José (GO'05, 2005), Mykonos (AGO'07, 2007), Skukuza (SAGO'08, 2008), Toulouse (TOGO'10, 2010), Natal (NAGO'12, 2012) and Málaga (MAGO'14, 2014), with the aim of stimulating discussion between senior and junior researchers on the topic of Global Optimization. In 2016, the XIII Global Optimization Workshop (GOW'16) takes place in Braga and is organized by three researchers from the University of Minho. Two of them belong to the Systems Engineering and Operational Research Group of the Algoritmi Research Centre, and the other to the Statistics, Applied Probability and Operational Research Group of the Centre of Mathematics. The event received more than 50 submissions from 15 countries across Europe, South America and North America. We want to express our gratitude to the invited speaker Panos Pardalos for accepting the invitation and sharing his expertise, helping us to meet the workshop objectives. GOW'16 would not have been possible without the valuable contributions of the authors and the International Scientific Committee members. We thank you all. This proceedings book presents an overview of the topics addressed in the workshop, with the goal of contributing to interesting and fruitful discussions between the authors and participants. After the event, high-quality papers can be submitted to a special issue of the Journal of Global Optimization dedicated to the workshop. [...]

    ๊ณต๋™ ๋Œ€์กฐ์  ํ•™์Šต์„ ์ด์šฉํ•œ ๋น„์ง€๋„ ๋„๋ฉ”์ธ ์ ์‘ ๊ธฐ๋ฒ• ์—ฐ๊ตฌ

    Thesis (M.S.) -- Seoul National University Graduate School: Dept. of Electrical and Computer Engineering, College of Engineering, February 2021. Supervised by Sungroh Yoon. Domain adaptation is introduced to exploit the label information of a source domain when labels are not available for the target domain. Previous methods minimized domain discrepancy in a latent space to enable transfer learning. These studies are based on the theoretical analysis that the target error is upper bounded by the sum of the source error, the domain discrepancy, and the joint error of the ideal hypothesis. However, feature discriminability is sacrificed while enhancing feature transferability by matching marginal distributions. In particular, the ideal joint hypothesis error in the target error upper bound, which was previously considered to be minute, has been found to be significant, impairing the theoretical guarantee. In this paper, to manage the joint error, we propose an alternative upper bound on the target error that explicitly considers it. Based on this theoretical analysis, we suggest a joint optimization framework that combines the source and target domains. To minimize the joint error, we further introduce Joint Contrastive Learning (JCL), which finds class-level discriminative features. With a solid theoretical framework, JCL employs a contrastive loss to maximize the mutual information between a feature and its label, which is equivalent to maximizing the Jensen-Shannon divergence between conditional distributions. Extensive experiments on domain adaptation datasets demonstrate that JCL outperforms existing state-of-the-art methods.
    Contents:
    1 Introduction
    2 Background
      2.1 Domain Adaptation
        2.1.1 Problem Setting and Notations
        2.1.2 Theoretical Background
      2.2 Approaches for Domain Adaptation
        2.2.1 Marginal Distribution Alignment Based Approaches
        2.2.2 Conditional Distribution Matching Approaches
      2.3 Contrastive Learning
    3 Method
      3.1 An Alternative Upper Bound
      3.2 Joint Contrastive Learning
        3.2.1 Theoretical Guarantees
        3.2.2 Generalization to Multiclass Classification
        3.2.3 Training Procedure
    4 Experiments
      4.1 Datasets and Baselines
      4.2 Implementation Details
      4.3 Results
      4.4 Ablation Studies
    5 Conclusion
    Abstract (In Korean)
    Master's thesis.
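    A class-level contrastive objective of the kind the abstract describes, where each anchor's positives are the other samples sharing its label, can be sketched as follows. This is a generic supervised-contrastive formulation, an assumption made for illustration rather than JCL's exact loss; the function name and temperature value are hypothetical:

```python
import numpy as np

def class_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss: for each anchor, positives are all
    other samples with the same label. Generic formulation in the spirit
    of a class-level contrastive objective (not the exact JCL loss).
    Assumes every anchor has at least one positive."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature              # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)           # exclude self-pairs
    row_max = sim.max(axis=1, keepdims=True)
    lse = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - lse                     # log-softmax over non-self pairs
    pos = labels[:, None] == labels[None, :]
    np.fill_diagonal(pos, False)
    # mean negative log-probability of positive pairs, per anchor
    per_anchor = -np.where(pos, log_prob, 0.0).sum(1) / pos.sum(1)
    return float(per_anchor.mean())

feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
print(class_contrastive_loss(feats, labels))  # small: classes well separated
```

    Minimizing this loss pulls same-label features together and pushes different-label features apart, which is what produces the class-level discriminative latent space the abstract refers to.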

    Reformulation and Decomposition: Multitask learning approaches to Long Document Problems

    Recent advances in Natural Language Processing (NLP) have led to success across a wide range of tasks including machine translation, summarization, and classification. Yet the field still faces major challenges. This thesis addresses two key under-researched areas: the absence of general multitask learning capabilities, and the inability to scale to long, complex documents. Firstly, this thesis explores a form of multitasking where NLP tasks are reformulated as question answering problems. I examine existing models and measure their robustness to paraphrasing of their input. I contribute an annotated dataset which enables detailed analysis of model failures, as well as evaluating methods for improving model robustness. Secondly, a set of long document tasks, MuLD, is introduced, which forms a benchmark for evaluating the performance of models on large inputs with long-range dependencies. I show that these are challenging tasks for baseline models. I then design an approach using task decomposition to provide an interpretable solution which easily allows for multitask learning. I then explore how these themes of task reformulation for multitask learning and task decomposition for long inputs can be applied to other modalities. I show how visual modelling, a visual analogue of language modelling, can be used to predict missing frames from videos of simple physics simulations, and probe what knowledge about the physical world this induces in such models. Finally, I demonstrate how this task can be used to unite vision and NLP in the same framework, describing how task reformulation and task decomposition can be used for this purpose.
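    The idea of reformulating NLP tasks as question answering can be illustrated with a toy template. The task names and record schema below are assumptions made for the example, not the thesis's actual dataset format:

```python
def to_qa(task, example):
    """Recast a labelled classification example as a question-answering
    instance, so a single QA model can multitask across formats."""
    templates = {
        "sentiment": "Is the sentiment of the passage positive or negative?",
        "topic": "What topic does the passage discuss?",
    }
    if task not in templates:
        raise ValueError(f"no QA template for task {task!r}")
    return {
        "question": templates[task],   # natural-language task description
        "context": example["text"],    # original input
        "answer": example["label"],    # original label, now a QA answer
    }

qa = to_qa("sentiment", {"text": "A delightful film.", "label": "positive"})
print(qa["answer"])  # positive
```

    Once every task is expressed as (question, context, answer) triples, one model can be trained on all of them jointly; the question text is what tells the model which task to perform, which is also why robustness to paraphrases of that question matters.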

    Semantic multimedia modelling & interpretation for search & retrieval

    The proliferation of multimedia-equipped devices has led to an explosion of image and video data, and these data have become part of our daily life. This rate of data production now outstrips our capacity to make sense of the data; information overload is perhaps one of the most prevalent problems of the digital era. Until now, progress in image and video retrieval research has achieved only limited success, owing to its interpretation of images and videos in terms of primitive features, whereas humans generally access multimedia assets in terms of semantic concepts. The retrieval of digital images and videos is impeded by the semantic gap: the discrepancy between a user's high-level interpretation of an image and the information that can be extracted from the image's physical properties. Content-based image and video retrieval systems are particularly vulnerable to the semantic gap due to their dependence on low-level visual features for describing image content. The semantic gap can be narrowed by including high-level features, since high-level descriptions of images and videos are more capable of capturing the semantic meaning of image and video content. It is generally understood that the problem of image and video retrieval is still far from being solved. This thesis proposes an approach for intelligent multimedia semantic extraction for search and retrieval, intended to bridge the gap between visual features and semantics. It proposes a Semantic Query Interpreter (SQI) for images and videos, which selects the pertinent terms from the user query and analyses them lexically and semantically. The proposed SQI reduces both the semantic and the vocabulary gap between users and the machine.
    This thesis also explores a novel ranking strategy for image search and retrieval. SemRank is a novel system that incorporates Semantic Intensity (SI) in exploring the semantic relevancy between the user query and the available data. Semantic Intensity captures the concept dominancy factor of an image: an image is a combination of various concepts, and some of these concepts are more dominant than the others. SemRank ranks the retrieved images on the basis of Semantic Intensity. The investigations are conducted on the LabelMe image and LabelMe video datasets. Experiments show that the proposed approach is successful in bridging the semantic gap, and that the proposed system outperforms traditional image retrieval systems.
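    A dominance-weighted ranking of the kind SemRank performs can be sketched as follows. The per-image "si" weights and all names here are hypothetical stand-ins for the thesis's Semantic Intensity measure, invented for the example:

```python
def sem_rank(query_concepts, images):
    """Rank images by the summed Semantic Intensity (concept-dominance
    weight) of the concepts they share with the query. The "si" weights
    are a hypothetical stand-in for the thesis's SI measure."""
    def score(img):
        # Sum the dominance weight of every query concept the image contains.
        return sum(img["si"].get(c, 0.0) for c in query_concepts)
    return sorted(images, key=score, reverse=True)

# Toy index: each image carries per-concept dominance weights summing to 1.
images = [
    {"name": "beach.jpg", "si": {"sea": 0.7, "sky": 0.3}},
    {"name": "city.jpg",  "si": {"building": 0.8, "sky": 0.2}},
]
ranked = sem_rank({"sea", "sky"}, images)
print([img["name"] for img in ranked])  # ['beach.jpg', 'city.jpg']
```

    Both images contain "sky", but the beach image is ranked first because the queried concepts dominate it (total weight 1.0 versus 0.2), which is the concept-dominancy behaviour the abstract describes.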

    Knowledge-directed intelligent information retrieval for research funding.

    Thesis (M.Sc.) -- University of Natal, Pietermaritzburg, 2001. Researchers have always found difficulty in attaining funding from the National Research Foundation (NRF) for new research interests. The field of Artificial Intelligence (AI) holds the promise of improving the matching of research proposals to funding sources through Intelligent Information Retrieval (IIR). IIR is a fairly new AI technique that has evolved from traditional IR systems to solve real-world problems. Typically, an IIR system contains three main components, namely a knowledge base, an inference engine and a user interface. Due to its inferential capabilities, IIR has been found to be applicable to domains for which traditional techniques, such as the use of databases, have not been well suited. This applicability has led it to become a viable AI technique from both a research and an application perspective. This dissertation concentrates on researching and implementing an IIR system in LPA Prolog, which we call FUND, to assist in matching the research proposals of prospective researchers to funding sources within the National Research Foundation (NRF). The reasoning strategy of FUND's inference engine is backward chaining, which carries out a depth-first search over its knowledge representation structure, namely a semantic network. The distance constraint of the Constrained Spreading Activation (CSA) technique is incorporated within the search strategy to help prune non-relevant returns by FUND. The evolution of IIR from IR is covered in detail. Various reasoning strategies and knowledge representation schemes were reviewed to find the combination best suited to the problem domain and programming language chosen. FUND accommodates a depth-4, a depth-5 and an exhaustive search algorithm. FUND's effectiveness was tested in relation to the different searches, with respect to their precision and recall, and in comparison to other similar systems.
    FUND's performance in providing researchers with better funding advice in the South African situation proved to be favourably comparable to that of similar systems elsewhere.
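    The distance-constrained search over a semantic network can be sketched in a few lines. FUND itself is written in LPA Prolog; this Python sketch only illustrates the pruning idea, and the network data and function name are hypothetical:

```python
def related_concepts(network, start, max_depth):
    """Depth-first traversal of a semantic network with a distance
    constraint: nodes beyond max_depth links from the start node are
    pruned, in the spirit of constrained spreading activation (CSA)."""
    found, stack = {}, [(start, 0)]
    while stack:
        node, depth = stack.pop()
        if node in found and found[node] <= depth:
            continue  # already reached at least this cheaply
        found[node] = depth
        if depth < max_depth:  # the CSA-style distance constraint
            for neighbour in network.get(node, []):
                stack.append((neighbour, depth + 1))
    return found  # concept -> link distance from the start node

# A toy semantic network of research areas (hypothetical data).
net = {
    "machine_learning": ["artificial_intelligence", "statistics"],
    "artificial_intelligence": ["knowledge_representation"],
    "statistics": ["probability"],
}
print(sorted(related_concepts(net, "machine_learning", 1)))
# ['artificial_intelligence', 'machine_learning', 'statistics']
```

    Raising `max_depth` from 1 to 2 admits second-hop concepts such as "probability"; the depth-4, depth-5 and exhaustive variants mentioned above correspond to progressively looser settings of this same constraint.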