114 research outputs found
Knowledge Extraction in Video Through the Interaction Analysis of Activities
Video constitutes a massive amount of data containing complex interactions between moving objects. Extracting knowledge from this type of information creates a demand for video analytics systems that uncover statistical relationships between activities and learn the correspondence between content and labels. However, these remain open research problems of high complexity when multiple actors perform activities simultaneously, videos contain noise, and streaming scenarios are considered. The techniques introduced in this dissertation provide a basis for analyzing video. The primary contributions of this research are new algorithms for the efficient search of activities in video, scene understanding based on interactions between activities, and the prediction of labels for new scenes.
Data-Efficient Learning via Minimizing Hyperspherical Energy
Deep learning on large-scale data is dominant nowadays. The unprecedented
scale of data has been arguably one of the most important driving forces for
the success of deep learning. However, there still exist scenarios where
collecting data or labels could be extremely expensive, e.g., medical imaging
and robotics. To fill this gap, this paper considers the problem of
data-efficient learning from scratch using a small amount of representative
data. First, we characterize this problem by active learning on homeomorphic
tubes of spherical manifolds. This naturally generates a feasible hypothesis
class. With homologous topological properties, we identify an important
connection -- finding tube manifolds is equivalent to minimizing hyperspherical
energy (MHE) in physical geometry. Inspired by this connection, we propose an
MHE-based active learning (MHEAL) algorithm and provide comprehensive
theoretical guarantees for MHEAL, covering convergence and generalization
analysis. Finally, we demonstrate the empirical performance of MHEAL in a wide
range of applications on data-efficient learning, including deep clustering,
distribution matching, version space sampling, and deep active learning.
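The central quantity is easy to sketch. The following minimal NumPy example is illustrative only, not the paper's implementation: it computes a Riesz-style hyperspherical energy (sum of inverse pairwise distances of unit vectors) and reduces it with normalized projected gradient descent, so that an initially random set of points spreads out toward a uniform arrangement on the sphere.

```python
import numpy as np

def hyperspherical_energy(x):
    """Riesz energy of unit vectors: the sum of inverse pairwise distances.
    Lower energy means the points are spread more uniformly on the sphere."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    iu = np.triu_indices(len(x), k=1)          # each distinct pair once
    return float((1.0 / d[iu]).sum())

def minimize_energy(x, steps=500, lr=0.02):
    """Normalized gradient descent on the energy, projecting back onto
    the unit hypersphere after every step."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    for _ in range(steps):
        diff = x[:, None, :] - x[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, np.inf)         # ignore self-pairs
        # gradient of sum 1/d w.r.t. x_i is -sum_j (x_i - x_j) / d_ij^3
        grad = -(diff / dist[..., None] ** 3).sum(axis=1)
        step = grad / (np.linalg.norm(grad, axis=1, keepdims=True) + 1e-12)
        x = x - lr * step
        x = x / np.linalg.norm(x, axis=1, keepdims=True)  # project to sphere
    return x

rng = np.random.default_rng(0)
pts = rng.normal(size=(16, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
print(hyperspherical_energy(pts), hyperspherical_energy(minimize_energy(pts)))
```

The fixed-length (normalized) step keeps the repulsive 1/d^3 forces from blowing up when two points start close together; the printed energy after optimization is lower than the initial one.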
Improving Representation Learning for Deep Clustering and Few-shot Learning
The amounts of data in the world have increased dramatically in recent years, and it is quickly becoming infeasible for humans to label all these data. It is therefore crucial that modern machine learning systems can operate with few or no labels. The introduction of deep learning and deep neural networks has led to impressive advancements in several areas of machine learning. These advancements are largely due to the unprecedented ability of deep neural networks to learn powerful representations from a wide range of complex input signals. This ability is especially important when labeled data is limited, as the absence of a strong supervisory signal forces models to rely more on intrinsic properties of the data and its representations.
This thesis focuses on two key concepts in deep learning with few or no labels. First, we aim to improve representation quality in deep clustering - both for single-view and multi-view data. Current models for deep clustering face challenges related to properly representing semantic similarities, which is crucial for the models to discover meaningful clusterings. This is especially challenging with multi-view data, since the information required for successful clustering might be scattered across many views. Second, we focus on few-shot learning, and how geometrical properties of representations influence few-shot classification performance. We find that a large number of recent methods for few-shot learning embed representations on the hypersphere. Hence, we seek to understand what makes the hypersphere a particularly suitable embedding space for few-shot learning.
Our work on single-view deep clustering addresses the susceptibility of deep clustering models to finding trivial solutions with non-meaningful representations. To address this issue, we present a new auxiliary objective that - when compared to the popular autoencoder-based approach - better aligns with the main clustering objective, resulting in improved clustering performance. Similarly, our work on multi-view clustering focuses on how representations can be learned from multi-view data in order to make them suitable for the clustering objective. Whereas recent methods for deep multi-view clustering have focused on aligning view-specific representations, we find that this alignment procedure might actually be detrimental to representation quality. We investigate the effects of representation alignment and provide novel insights into when alignment is beneficial and when it is not. Based on our findings, we present several new methods for deep multi-view clustering - both alignment-based and non-alignment-based - that outperform current state-of-the-art methods.
Our first work on few-shot learning aims to tackle the hubness problem, which has been shown to harm few-shot classification performance. To this end, we present two new methods for embedding representations on the hypersphere for few-shot learning. Further, we provide both theoretical and experimental evidence indicating that embedding representations as uniformly as possible on the hypersphere reduces hubness and improves classification accuracy. Building on these findings on hyperspherical embeddings for few-shot learning, we then seek to improve the understanding of representation norms. In particular, we ask what type of information the norm carries, and why it is often beneficial to discard the norm in classification models. We answer this question by presenting a novel hypothesis on the relationship between the representation norm and the number of objects of a certain class in the image. We then analyze our hypothesis both theoretically and experimentally, presenting promising results that corroborate it.
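Hubness is straightforward to measure empirically. The sketch below is illustrative only (the `k_occurrence` count and its skewness are standard conventions from the hubness literature, not the thesis code): it counts how often each point appears among other points' k nearest neighbors, where a heavily skewed count distribution signals hubness.

```python
import numpy as np

def k_occurrence(X, k=5):
    """Count how often each point appears in other points' k-nearest-neighbor
    lists. A few points ('hubs') appearing in almost everyone's neighbor list
    yields a heavily skewed count distribution."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbor
    counts = np.zeros(len(X), dtype=int)
    for row in dists:
        counts[np.argsort(row)[:k]] += 1       # vote for this row's k neighbors
    return counts

def hubness_skew(counts):
    """Skewness of the k-occurrence distribution, a common hubness score."""
    c = counts - counts.mean()
    return float((c ** 3).mean() / (c ** 2).mean() ** 1.5)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64))                      # high-dimensional points
Z = X / np.linalg.norm(X, axis=1, keepdims=True)    # embedded on the hypersphere
print(hubness_skew(k_occurrence(X)), hubness_skew(k_occurrence(Z)))
```

Each point casts exactly k votes, so the counts always sum to n*k; hubness shows up in how unevenly those votes concentrate.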
Momentum Contrastive Autoencoder: Using Contrastive Learning for Latent Space Distribution Matching in WAE
The Wasserstein autoencoder (WAE) shows that matching two distributions is
equivalent to minimizing a simple autoencoder (AE) loss under the constraint
that the latent space of this AE matches a pre-specified prior distribution.
This latent space distribution matching is a core component of WAE, and a
challenging task. In this paper, we propose to use the contrastive learning
framework, which has been shown to be effective for self-supervised
representation learning, as a means to resolve this problem. We do so by
exploiting the fact that contrastive learning objectives optimize the latent
space distribution to be uniform over the unit hyper-sphere, which can be
easily sampled from. We show that using the contrastive learning framework to
optimize the WAE loss achieves faster convergence and more stable optimization
compared with existing popular algorithms for WAE. This is also reflected in
the FID scores on CelebA and CIFAR-10 datasets, and the realistic generated
image quality on the CelebA-HQ dataset.
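The uniformity property the paper exploits can be made concrete with the pairwise-Gaussian-potential objective known from analyses of contrastive learning. The snippet below is an illustrative NumPy sketch of that objective, not the paper's training code: collapsed embeddings score a high (bad) loss, while well-dispersed embeddings score low.

```python
import numpy as np

def uniformity_loss(z, t=2.0):
    """Log of the mean pairwise Gaussian potential of L2-normalized
    embeddings. Minimizing it drives the latent distribution toward the
    uniform distribution on the unit hypersphere, which is trivial to
    sample from."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # project to hypersphere
    sq = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    iu = np.triu_indices(len(z), k=1)                  # distinct pairs only
    return float(np.log(np.exp(-t * sq[iu]).mean()))

rng = np.random.default_rng(0)
collapsed = rng.normal(size=(64, 8)) * 0.01 + 1.0   # embeddings bunched together
dispersed = rng.normal(size=(64, 8))                # roughly uniform directions
print(uniformity_loss(collapsed), uniformity_loss(dispersed))
```

Because every pairwise distance of the collapsed batch is tiny, its potential terms are all near 1 and the loss is near 0; the dispersed batch's loss is strongly negative.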
On Obtaining Stable Rankings
Decision making is challenging when there is more than one criterion to
consider. In such cases, it is common to assign a goodness score to each item
as a weighted sum of its attribute values and rank them accordingly. Clearly,
the ranking obtained depends on the weights used for this summation. Ideally,
one would want the ranked order not to change if the weights are changed
slightly. We call this property "stability" of the ranking. A consumer of a
ranked list may trust the ranking more if it has high stability. A producer of
a ranked list prefers to choose weights that result in a stable ranking, both
to earn the trust of potential consumers and because a stable ranking is
intrinsically likely to be more meaningful. In this paper, we develop a
framework that can be used to assess the stability of a provided ranking and to
obtain a stable ranking within an "acceptable" range of weight values (called
"the region of interest"). We address the case where the user cares about the
rank order of the entire set of items, and also the case where the user cares
only about the top-k items. Using a geometric interpretation, we propose
algorithms that produce stable rankings. In addition to theoretical analyses,
we conduct extensive experiments on real datasets that validate our proposal.
Complex Query Operators on Modern Parallel Architectures
Identifying interesting objects in a large data collection is a fundamental problem for multi-criteria decision-making applications. In Relational Database Management Systems (RDBMS), the most popular complex query operators used to solve this type of problem are the Top-K selection operator and the Skyline operator. Top-K selection retrieves the k highest-ranking tuples from a given relation, as determined by a user-defined aggregation function. Skyline selection retrieves those tuples whose attributes offer (Pareto-)optimal trade-offs in a given relation. Efficient Top-K query processing entails minimizing tuple evaluations through elaborate processing schemes combined with sophisticated data structures that enable early termination. Skyline query evaluation involves processing strategies geared towards early termination and the pruning of incomparable tuples.
The rapid increase in memory capacity and decreasing memory costs have been the main drivers behind the development of main-memory database systems. Although migrating query processing in-memory has created many opportunities to improve query latency, attaining such improvements has been challenging due to the growing gap between processor and main-memory speeds. Addressing this limitation has been made easier by the rapid proliferation of multi-core and many-core architectures. However, their utilization in real systems has been hindered by the lack of suitable parallel algorithms that focus on algorithmic efficiency.
In this thesis, we study in depth the Top-K and Skyline selection operators in the context of emerging parallel architectures. Our ultimate goal is to provide practical guidelines for developing work-efficient algorithms suitable for parallel main-memory processing. We concentrate on multi-core (CPU), many-core (GPU), and processing-in-memory (PIM) architectures, developing solutions optimized for high throughput and low latency. The first part of this thesis focuses on Top-K selection, presenting early termination algorithms that we developed specifically for parallel architectures and various types of accelerators (i.e., GPU, PIM). The second part concentrates on Skyline selection and the development of a massively parallel, load-balanced algorithm for PIM architectures. Our work consolidates performance results across different parallel architectures using synthetic and real data over variable query parameters and distributions for both of the aforementioned problems. The experimental results demonstrate improvements of several orders of magnitude in throughput and query latency, validating the effectiveness of our proposed solutions for the Top-K and Skyline selection operators.
Privacy Aware Parallel Computation of Skyline Sets Queries from Distributed Databases
A skyline query finds the objects in a given set that are not dominated by any other object. Skyline queries help filter unnecessary information efficiently and provide clues for various decision-making tasks. However, we cannot use skyline queries in a privacy-aware environment, since individual record values must be hidden even when no ID information is present. Therefore, we consider skyline sets queries. A skyline sets query returns skyline sets from all possible sets, each of which is composed of some objects in a database. With the growth of network infrastructure, data are increasingly stored in distributed databases. In this paper, we extend the idea to compute skyline sets queries in parallel from distributed databases without disclosing individual records to others. The proposed method utilizes an agent-based parallel computing framework that can efficiently compute skyline sets queries and solve the privacy problems of skyline queries in distributed environments. The computation of skyline sets is performed simultaneously across all databases, which increases parallelism and reduces computation time.
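A brute-force sketch conveys the skyline sets idea (illustrative only, using attribute-wise sums as the set aggregate; the paper's agent-based parallel framework is far more efficient and handles the distributed, privacy-preserving computation): every fixed-size set of objects is reduced to a single aggregate tuple, and the skyline is taken over those aggregates, so only set-level totals, never individual record values, are exposed.

```python
from itertools import combinations

def dominates(a, b):
    """a dominates b: no worse on every attribute, strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline_sets(objects, set_size):
    """Brute-force skyline sets query: aggregate every set of `set_size`
    objects by attribute-wise sum, then keep the sets whose aggregates are
    not dominated by any other set's aggregate."""
    agg = {s: tuple(map(sum, zip(*s)))
           for s in combinations(objects, set_size)}
    return {s: v for s, v in agg.items()
            if not any(dominates(u, v) for u in agg.values())}

objs = [(3, 1), (1, 3), (2, 2), (0, 1)]
print(skyline_sets(objs, 2))   # sets whose attribute totals are Pareto-optimal
```

For these four objects, the pairs summing to (4, 4), (5, 3), and (3, 5) survive; each remaining pair's total is dominated, yet no single object's values can be read off from the surviving totals alone.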