Task-related edge density (TED) - a new method for revealing large-scale network formation in fMRI data of the human brain
The formation of transient networks in response to external stimuli or as a
reflection of internal cognitive processes is a hallmark of human brain
function. However, its identification in fMRI data of the human brain is
notoriously difficult. Here we propose a new method of fMRI data analysis that
tackles this problem by considering large-scale, task-related synchronisation
networks. Networks consist of nodes and edges connecting them, where nodes
correspond to voxels in fMRI data, and the weight of an edge is determined via
task-related changes in dynamic synchronisation between their respective time
series. Based on these definitions, we developed a new data analysis algorithm
that identifies edges in a brain network that differentially respond in unison
to a task onset and that occur in dense packs with similar characteristics.
Hence, we call this approach "Task-related Edge Density" (TED). TED proved to
be a very strong marker for dynamic network formation that easily lends itself
to statistical analysis using large-scale statistical inference. A major
advantage of TED compared to other methods is that it does not depend on any
specific hemodynamic response model, and it also does not require a
presegmentation of the data for dimensionality reduction as it can handle large
networks consisting of tens of thousands of voxels. We applied TED to fMRI data
of a fingertapping task provided by the Human Connectome Project. TED revealed
network-based involvement of a large number of brain areas that evaded
detection using traditional GLM-based analysis. We show that our proposed
method provides an entirely new window into the immense complexity of human
brain function.
Comment: 21 pages, 11 figures
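The edge definition above (voxels as nodes, edge weights from task-related changes in dynamic synchronisation of their time series) can be illustrated with a toy sketch. This is not the authors' TED algorithm: the window length, the pre/post-onset comparison, and the sinusoidal "task" signal are all illustrative assumptions, and real TED additionally identifies dense packs of co-responding edges.

```python
import numpy as np

def sliding_corr(x, y, win):
    """Pearson correlation of x and y in sliding windows of length `win`."""
    n = len(x) - win + 1
    return np.array([np.corrcoef(x[i:i + win], y[i:i + win])[0, 1]
                     for i in range(n)])

def task_related_edge_weight(x, y, onset, win=20):
    """Toy edge weight: change in mean windowed synchronisation after task onset."""
    c = sliding_corr(x, y, win)
    pre = c[:onset - win + 1]   # windows entirely before the onset
    post = c[onset:]            # windows entirely after the onset
    return post.mean() - pre.mean()

rng = np.random.default_rng(0)
t = np.arange(200)
onset = 100
shared = np.sin(t / 5.0)                 # common "task" signal, switched on at onset
x = rng.normal(size=200)
y = rng.normal(size=200)
x[onset:] += shared[onset:]
y[onset:] += shared[onset:]

w = task_related_edge_weight(x, y, onset)
print(w > 0)   # synchronisation increases after onset, so the edge weight is positive
```

In a full analysis, weights like `w` would be computed for all voxel pairs and thresholded with large-scale statistical inference; this sketch only shows the single-edge quantity.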
Improving Representation Learning for Deep Clustering and Few-shot Learning
The amounts of data in the world have increased dramatically in recent years, and it is quickly becoming infeasible for humans to label all these data. It is therefore crucial that modern machine learning systems can operate with few or no labels. The introduction of deep learning and deep neural networks has led to impressive advancements in several areas of machine learning. These advancements are largely due to the unprecedented ability of deep neural networks to learn powerful representations from a wide range of complex input signals. This ability is especially important when labeled data is limited, as the absence of a strong supervisory signal forces models to rely more on intrinsic properties of the data and its representations.
This thesis focuses on two key concepts in deep learning with few or no labels. First, we aim to improve representation quality in deep clustering - both for single-view and multi-view data. Current models for deep clustering face challenges related to properly representing semantic similarities, which is crucial for the models to discover meaningful clusterings. This is especially challenging with multi-view data, since the information required for successful clustering might be scattered across many views. Second, we focus on few-shot learning, and how geometrical properties of representations influence few-shot classification performance. We find that a large number of recent methods for few-shot learning embed representations on the hypersphere. Hence, we seek to understand what makes the hypersphere a particularly suitable embedding space for few-shot learning.
Our work on single-view deep clustering addresses the susceptibility of deep clustering models to find trivial solutions with non-meaningful representations. To address this issue, we present a new auxiliary objective that - when compared to the popular autoencoder-based approach - better aligns with the main clustering objective, resulting in improved clustering performance. Similarly, our work on multi-view clustering focuses on how representations can be learned from multi-view data, in order to make the representations suitable for the clustering objective. Where recent methods for deep multi-view clustering have focused on aligning view-specific representations, we find that this alignment procedure might actually be detrimental to representation quality. We investigate the effects of representation alignment, and provide novel insights into when alignment is beneficial, and when it is not. Based on our findings, we present several new methods for deep multi-view clustering - both alignment-based and non-alignment-based - that outperform current state-of-the-art methods.
Our first work on few-shot learning aims to tackle the hubness problem, which has been shown to have negative effects on few-shot classification performance. To this end, we present two new methods to embed representations on the hypersphere for few-shot learning. Further, we provide both theoretical and experimental evidence indicating that embedding representations as uniformly as possible on the hypersphere reduces hubness and improves classification accuracy. Furthermore, based on our findings on hyperspherical embeddings for few-shot learning, we seek to improve the understanding of representation norms. In particular, we ask what type of information the norm carries, and why it is often beneficial to discard the norm in classification models. We answer this question by presenting a novel hypothesis on the relationship between representation norm and the number of a certain class of objects in the image. We then analyze our hypothesis both theoretically and experimentally, presenting promising results that corroborate the hypothesis.
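The hyperspherical embedding idea discussed above can be sketched with a minimal nearest-prototype classifier that discards representation norms and classifies by cosine similarity. This is a generic prototypical-network-style sketch on toy 2-D data, not any of the thesis's methods; the data, function names, and the simple mean-prototype rule are illustrative assumptions.

```python
import numpy as np

def to_hypersphere(z):
    """Project representations onto the unit hypersphere (discard the norm)."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def prototype_classify(support, support_labels, query):
    """Nearest-prototype few-shot classification with cosine similarity."""
    classes = np.unique(support_labels)
    protos = np.stack([to_hypersphere(support[support_labels == c]).mean(axis=0)
                       for c in classes])
    protos = to_hypersphere(protos)            # prototypes live on the sphere too
    sims = to_hypersphere(query) @ protos.T    # cosine similarity to each prototype
    return classes[sims.argmax(axis=1)]

rng = np.random.default_rng(1)
# two well-separated toy classes, 5 support examples each
s0 = rng.normal(loc=[3.0, 0.0], scale=0.3, size=(5, 2))
s1 = rng.normal(loc=[0.0, 3.0], scale=0.3, size=(5, 2))
support = np.vstack([s0, s1])
labels = np.array([0] * 5 + [1] * 5)

query = np.array([[2.5, 0.2], [0.1, 2.8]])
print(prototype_classify(support, labels, query))   # prints [0 1]
```

Because every representation is normalized before comparison, only the *direction* of a representation matters here, which is exactly the setting in which hubness on the hypersphere becomes the relevant geometric concern.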
One-class classifiers based on entropic spanning graphs
One-class classifiers offer valuable tools to assess the presence of outliers
in data. In this paper, we propose a design methodology for one-class
classifiers based on entropic spanning graphs. Our approach takes into account
the possibility to process also non-numeric data by means of an embedding
procedure. The spanning graph is learned on the embedded input data and the
outcoming partition of vertices defines the classifier. The final partition is
derived by exploiting a criterion based on mutual information minimization.
Here, we compute the mutual information by using a convenient formulation
provided in terms of the α-Jensen difference. Once training is
completed, in order to associate a confidence level with the classifier
decision, a graph-based fuzzy model is constructed. The fuzzification process
is based only on topological information of the vertices of the entropic
spanning graph. As such, the proposed one-class classifier is suitable also for
data characterized by complex geometric structures. We provide experiments on
well-known benchmarks containing both feature vectors and labeled graphs. In
addition, we apply the method to the protein solubility recognition problem by
considering several representations for the input samples. Experimental results
demonstrate the effectiveness and versatility of the proposed method with
respect to other state-of-the-art approaches.
Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canada
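A much-simplified illustration of the spanning-graph intuition: score a test point by its distance to the nominal class, scaled by the typical edge length of the training set's Euclidean minimum spanning tree. This is not the paper's mutual-information-minimization design nor its fuzzy confidence model; Prim's algorithm and the scoring rule below are illustrative assumptions.

```python
import numpy as np

def mst_edge_lengths(X):
    """Edge lengths of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = d[0].copy()          # cheapest connection of each vertex to the tree
    lengths = []
    for _ in range(n - 1):
        best[in_tree] = np.inf  # never re-add a tree vertex
        j = int(best.argmin())
        lengths.append(best[j])
        in_tree[j] = True
        best = np.minimum(best, d[j])
    return np.array(lengths)

def one_class_score(X_train, x):
    """Outlier score: distance to the training set in units of the mean MST edge."""
    scale = mst_edge_lengths(X_train).mean()
    return np.linalg.norm(X_train - x, axis=1).min() / scale

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))                       # nominal (one-class) data
inlier = one_class_score(X, np.array([0.0, 0.0]))
outlier = one_class_score(X, np.array([8.0, 8.0]))
print(inlier < outlier)                            # the far point scores higher
```

The MST edge lengths give a data-driven length scale, so the same score threshold can transfer across datasets with different densities; the paper's entropic spanning graphs refine this idea considerably.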
Hubness Reduction Improves Sentence-BERT Semantic Spaces
Semantic representations of text, i.e. representations of natural language
which capture meaning by geometry, are essential for areas such as information
retrieval and document grouping. High-dimensional trained dense vectors have
received much attention in recent years as such representations. We investigate
the structure of semantic spaces that arise from embeddings made with
Sentence-BERT and find that the representations suffer from a well-known
problem in high dimensions called hubness. Hubness results in asymmetric
neighborhood relations, such that some texts (the hubs) are neighbours of many
other texts while most texts (so-called anti-hubs) are neighbours of few or no
other texts. We quantify the semantic quality of the embeddings using hubness
scores and the error rate of a neighbourhood-based classifier. We find that when
hubness is high, we can reduce error rate and hubness using hubness reduction
methods. We identify a combination of two methods as resulting in the best
reduction. For example, on one of the tested pretrained models, this combined
method can reduce hubness by about 75% and error rate by about 9%. Thus, we
argue that mitigating hubness in the embedding space provides better semantic
representations of text.
Comment: Accepted at NLDL 202
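Hubness and one simple mitigation step can be demonstrated on synthetic data: measure the skewness of the k-occurrence distribution N_k (how often each point appears among the k nearest neighbours of others), then recompute it after centering. Centering is a known hubness-reduction technique for cosine spaces, but it is only an illustrative stand-in here; the paper's best-performing combination of two methods is not reproduced.

```python
import numpy as np

def k_occurrence(X, k=5):
    """N_k(i): how often point i is among the k cosine-nearest neighbours of others."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    s = Xn @ Xn.T
    np.fill_diagonal(s, -np.inf)               # a point is not its own neighbour
    nn = np.argsort(-s, axis=1)[:, :k]
    return np.bincount(nn.ravel(), minlength=len(X))

def skew(x):
    """Sample skewness; a heavy right tail of N_k indicates hubs."""
    x = x - x.mean()
    return (x ** 3).mean() / (x ** 2).mean() ** 1.5

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 50)) + 3.0            # toy "embeddings" with a common offset
hub_before = skew(k_occurrence(X))

X_centered = X - X.mean(axis=0)                 # centering: one simple reduction step
hub_after = skew(k_occurrence(X_centered))

print(hub_before > hub_after)                   # centering lowers N_k skewness here
```

Real Sentence-BERT vectors would replace `X`; the quoted improvements (about 75% less hubness, about 9% lower error) come from the paper's method combination, not from this sketch.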