Towards Sybil Resilience in Decentralized Learning
Federated learning is a privacy-enforcing machine learning technology but
suffers from limited scalability. This limitation mostly originates from the
internet connection and memory capacity of the central parameter server, and
the complexity of the model aggregation function. Decentralized learning has
recently emerged as a promising alternative to federated learning. This
novel technology eliminates the need for a central parameter server by
decentralizing the model aggregation across all participating nodes. Numerous
studies have been conducted on improving the resilience of federated learning
against poisoning and Sybil attacks, whereas the resilience of decentralized
learning remains largely unstudied. This research gap serves as the main
motivator for this study, in which our objective is to improve the Sybil
poisoning resilience of decentralized learning.
We present SybilWall, an innovative algorithm focused on increasing the
resilience of decentralized learning against targeted Sybil poisoning attacks.
By combining a Sybil-resistant aggregation function based on similarity between
Sybils with a novel probabilistic gossiping mechanism, we establish a new
benchmark for scalable, Sybil-resilient decentralized learning.
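As a rough illustration of what a similarity-based, Sybil-resistant aggregation function can look like, the sketch below down-weights peers whose model updates are nearly identical to another peer's update, loosely in the spirit of cosine-similarity defenses such as FoolsGold. It is not SybilWall's actual aggregation rule and it leaves out the probabilistic gossiping mechanism entirely. The names `similarity_weights` and `aggregate` are hypothetical, and the weighting "1 minus the maximum cosine similarity to any other peer" is an assumption chosen for simplicity.

```python
import numpy as np

def similarity_weights(updates: np.ndarray) -> np.ndarray:
    """Hypothetical weighting: 1 - max cosine similarity to any other peer.

    updates: array of shape (n_peers, n_params), one model update per peer.
    Sybils pushing the same poisoned direction produce near-identical rows,
    so their maximum pairwise similarity approaches 1 and their weight 0.
    """
    norms = np.linalg.norm(updates, axis=1, keepdims=True)
    unit = updates / np.clip(norms, 1e-12, None)   # avoid division by zero
    sim = unit @ unit.T                            # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                 # ignore self-similarity
    return np.clip(1.0 - sim.max(axis=1), 0.0, 1.0)

def aggregate(updates: np.ndarray) -> np.ndarray:
    """Weighted average of the updates using the similarity-based weights."""
    w = similarity_weights(updates)
    if w.sum() == 0:
        return np.zeros(updates.shape[1])
    return (w[:, None] * updates).sum(axis=0) / w.sum()

# Two honest peers with diverse updates, three Sybils sharing one direction.
honest = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
sybils = np.tile([0.0, 0.0, 1.0], (3, 1))
updates = np.vstack([honest, sybils])
print(similarity_weights(updates))  # honest peers keep weight 1, Sybils get 0
print(aggregate(updates))           # average of the honest updates only
```

Because colluding Sybils submit near-identical updates, adding more of them does not increase their combined influence under this kind of weighting, which matches the intuition behind the reduced utility of creating many Sybils.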
A comprehensive empirical evaluation demonstrated that SybilWall outperforms
existing state-of-the-art solutions designed for federated learning scenarios
and is the only algorithm to obtain consistent accuracy over a range of
adversarial attack scenarios. We also found SybilWall to diminish the utility
of creating many Sybils, as our evaluations demonstrate a higher success rate
among adversaries employing fewer Sybils. Finally, we suggest a number of
possible improvements to SybilWall and highlight promising future research
directions.
On information captured by neural networks: connections with memorization and generalization
Despite the popularity and success of deep learning, there is limited
understanding of when, how, and why neural networks generalize to unseen
examples. Since learning can be seen as extracting information from data, we
formally study information captured by neural networks during training.
Specifically, we start by viewing learning in the presence of noisy labels from
an information-theoretic perspective and derive a learning algorithm that
limits label noise information in weights. We then define a notion of unique
information that an individual sample provides to the training of a deep
network, shedding some light on the behavior of neural networks on examples
that are atypical, ambiguous, or belong to underrepresented subpopulations. We
relate example informativeness to generalization by deriving nonvacuous
generalization gap bounds. Finally, by studying knowledge distillation, we
highlight the important role of data and label complexity in generalization.
Overall, our findings contribute to a deeper understanding of the mechanisms
underlying neural network generalization. Comment: PhD thesis
Machine learning in solar physics
The application of machine learning in solar physics has the potential to
greatly enhance our understanding of the complex processes that take place in
the atmosphere of the Sun. By using techniques such as deep learning, we are
now in the position to analyze large amounts of data from solar observations
and identify patterns and trends that may not have been apparent using
traditional methods. This can help us improve our understanding of explosive
events like solar flares, which can strongly affect the Earth's
environment. Predicting such hazardous events is crucial for our
technological society. Machine learning can also improve our understanding of
the inner workings of the Sun itself by allowing us to go deeper into the data
and to propose more complex models to explain them. Additionally, the use of
machine learning can help to automate the analysis of solar data, reducing the
need for manual labor and increasing the efficiency of research in this field. Comment: 100 pages, 13 figures, 286 references, accepted for publication as a
Living Review in Solar Physics (LRSP)
Non-parametric online market regime detection and regime clustering for multidimensional and path-dependent data structures
In this work we present a non-parametric online market regime detection
method for multidimensional data structures using a path-wise two-sample test
derived from a maximum mean discrepancy-based similarity metric on path space
that uses rough path signatures as a feature map. The latter similarity metric
has been developed and applied as a discriminator in recent generative models
for small data environments, and has been optimised here to the setting where
the size of new incoming data is particularly small, for faster reactivity.
On the same principles, we also present a path-wise method for regime
clustering which extends our previous work. The presented regime clustering
techniques were designed as ex-ante market analysis tools that can identify
periods of approximately similar market activity, but the new results also
apply to path-wise, high-dimensional, and non-Markovian settings, as well as
to data structures that exhibit autocorrelation.
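The detection idea above compares a window of new paths against a reference set with a maximum mean discrepancy (MMD) two-sample test on signature features. The sketch below illustrates that pipeline under simplifying assumptions that differ from the authors' method: it truncates the signature at level 2 (computed with Chen's identity for piecewise-linear paths), applies a Gaussian RBF kernel to those features, uses the biased MMD² estimator, and obtains a p-value by permutation. The authors use a signature kernel tuned for very small samples; every function name here (`signature_level2`, `mmd2`, `permutation_test`, `random_walks`) is hypothetical.

```python
import numpy as np

def signature_level2(path: np.ndarray) -> np.ndarray:
    """Level-1 and level-2 signature terms of a piecewise-linear path.

    path: array of shape (T, d). Returns a vector of length d + d*d.
    """
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment dx.
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)
        s1 += dx
    return np.concatenate([s1, s2.ravel()])

def mmd2(X: np.ndarray, Y: np.ndarray, gamma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD with a Gaussian RBF kernel."""
    def kernel(A, B):
        sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq_dists)
    return float(kernel(X, X).mean() + kernel(Y, Y).mean() - 2 * kernel(X, Y).mean())

def permutation_test(X, Y, n_perm: int = 200, seed: int = 0):
    """Observed MMD^2 and a permutation p-value for H0: same distribution."""
    rng = np.random.default_rng(seed)
    observed = mmd2(X, Y)
    pooled = np.vstack([X, Y])
    n = len(X)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if mmd2(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Reference regime: driftless 2-D random walks; new regime: the same walks with drift.
rng = np.random.default_rng(1)

def random_walks(n: int, drift: float) -> np.ndarray:
    steps = rng.normal(drift, 0.1, size=(n, 19, 2))
    return np.concatenate([np.zeros((n, 1, 2)), np.cumsum(steps, axis=1)], axis=1)

reference = np.array([signature_level2(p) for p in random_walks(30, 0.0)])
incoming = np.array([signature_level2(p) for p in random_walks(30, 0.2)])
stat, p_value = permutation_test(reference, incoming)
print(stat, p_value)
```

A small p-value indicates that the incoming window is unlikely to come from the reference regime; in an online setting the test would be rerun as each new window of data arrives.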
We demonstrate our clustering tools on easily verifiable synthetic datasets
of increasing complexity, and also show how the outlined regime detection
techniques can be used as fast online automatic regime change detectors or as
outlier detection tools, including a fully automated pipeline. Finally, we
apply the fine-tuned algorithms to real-world historical data including
high-dimensional baskets of equities and the recent price evolution of crypto
assets, and we show that our methodology swiftly and accurately indicated
historical periods of market turmoil. Comment: 65 pages, 52 figures
EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval
Large neural models (such as Transformers) achieve state-of-the-art
performance for information retrieval (IR). In this paper, we aim to improve
distillation methods that pave the way for the resource-efficient deployment of
such models in practice. Inspired by our theoretical analysis of the
teacher-student generalization gap for IR models, we propose a novel
distillation approach that leverages the relative geometry among queries and
documents learned by the large teacher model. Unlike existing teacher
score-based distillation methods, our proposed approach employs embedding
matching tasks to provide a stronger signal to align the representations of the
teacher and student models. In addition, it utilizes query generation to
explore the data manifold to reduce the discrepancies between the student and
the teacher where training data is sparse. Furthermore, our analysis also
motivates novel asymmetric architectures for student models that realize
better embedding alignment without increasing online inference cost. On
standard benchmarks like MSMARCO, we show that our approach successfully
distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to
1/10th size asymmetric students that can retain 95-97% of the teacher
performance.
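The difference between score-based distillation and embedding matching can be made concrete with two toy losses. In the sketch below, the assumptions are that teacher and student embeddings share one dimension, relevance scores are dot products, and `score_distill_loss` and `embedding_match_loss` are hypothetical names rather than the paper's exact objectives. A student that applies an orthogonal rotation to the teacher's embeddings reproduces every teacher score exactly, so the score loss is zero, while the embedding-matching loss still registers that the student's representation differs from the teacher's.

```python
import numpy as np

def log_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def score_distill_loss(t_q, t_d, s_q, s_d) -> float:
    """KL(teacher || student) between per-query softmax distributions
    over in-batch query-document dot-product scores."""
    log_p = log_softmax(t_q @ t_d.T)  # teacher distribution per query
    log_q = log_softmax(s_q @ s_d.T)  # student distribution per query
    return float((np.exp(log_p) * (log_p - log_q)).sum(axis=1).mean())

def embedding_match_loss(t_emb, s_emb) -> float:
    """Mean squared Euclidean distance between paired teacher/student embeddings."""
    return float(((t_emb - s_emb) ** 2).sum(axis=1).mean())

# Toy teacher embeddings: 4 queries and 4 documents in 8 dimensions.
rng = np.random.default_rng(0)
t_q, t_d = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))

# A "student" that applies a random orthogonal rotation to the teacher.
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
s_q, s_d = t_q @ rotation, t_d @ rotation

print(score_distill_loss(t_q, t_d, s_q, s_d))  # ~0: all scores are preserved
print(embedding_match_loss(t_q, s_q))          # > 0: the geometry differs
```

One way to read the "stronger signal" claim is that matching embeddings constrains the student's representation directly, whereas matching scores only constrains quantities derived from it.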
Segmentation of Pathology Images: A Deep Learning Strategy with Annotated Data
Cancer has significantly threatened human life and health for many years. In the clinic, histopathology image segmentation is the gold standard for predicting patient prognosis and evaluating treatment outcome. Generally, manually labelling tumour regions in hundreds of high-resolution histopathological images is time-consuming and expensive for pathologists. Recently, advances in hardware and computer vision have made deep-learning-based methods the mainstream approach to automatic tumour segmentation, significantly reducing the workload of pathologists. However, most current methods rely on large-scale labelled histopathological images. Therefore, this research studies label-efficient tumour segmentation methods using deep-learning paradigms to relieve the annotation burden. Chapter 3 proposes an ensemble framework for fully-supervised tumour segmentation. Usually, the performance of an individually trained network is limited by significant morphological variance in histopathological images. We propose a fully-supervised ensemble fusion model that uses both shallow and deep U-Nets, trained with images of different resolutions and subsets of images, for robust predictions of tumour regions. Noise elimination is achieved with Convolutional Conditional Random Fields. Two open datasets are used to evaluate the proposed method: the ACDC@LungHP challenge at ISBI2019 and the DigestPath challenge at MICCAI2019. With a Dice coefficient of 79.7%, the proposed method takes third place in ACDC@LungHP. In DigestPath 2019, the proposed method achieves a Dice coefficient of 77.3%. Well-annotated images are an indispensable part of training fully-supervised segmentation strategies. However, in clinical practice, large-scale histopathology images are rarely annotated finely. It is common for labels to be of poor quality or for only a few images to be manually marked by experts. Consequently, fully-supervised methods cannot perform well in these cases. 
Chapter 4 proposes a self-supervised contrastive learning framework for tumour segmentation that reduces label dependency. An innovative contrastive learning scheme is developed to represent tumour features based on unlabelled images. Unlike a standard U-Net, the backbone is a patch-based segmentation network. Additionally, data augmentation and contrastive losses are applied to improve the discriminability of tumour features. A convolutional Conditional Random Field is used to smooth predictions and eliminate noise. Three labelled and fourteen unlabelled images are collected from a private skin cancer dataset called BSS. Experimental results show that the proposed method achieves better tumour segmentation performance than other popular self-supervised methods. However, when evaluated on the same public dataset as in Chapter 3, the proposed self-supervised method struggles with fine-grained segmentation around tumour boundaries compared with the supervised method proposed in Chapter 3. Chapter 5 proposes a sketch-based weakly-supervised tumour segmentation method. To segment tumour regions precisely from coarse annotations, a sketch-supervised method is proposed, containing a dual CNN-Transformer network and a global normalised class activation map. The CNN-Transformer networks simultaneously model global and local tumour features. With the global normalised class activation map, a gradient-based tumour representation can be obtained from the dual network predictions. We invited experts to mark fine and coarse annotations in the private BSS and the public PAIP2019 datasets to facilitate reproducible performance comparisons. On the BSS dataset, the proposed method achieves 76.686% IoU and 86.6% Dice scores, outperforming state-of-the-art methods. Additionally, the proposed method achieves a Dice gain of 8.372% over U-Net on the PAIP2019 dataset. 
The thesis presents three approaches to segmenting cancers from histology images: fully-supervised, self-supervised, and weakly-supervised methods. This research effectively segments tumour regions based on histopathological annotations and well-designed modules. Our studies comprehensively demonstrate label-efficient automatic histopathological image segmentation. Experimental results show that our methods achieve state-of-the-art segmentation performance on private and public datasets. In the future, we plan to integrate more tumour feature representation technologies with other medical modalities and apply them to clinical research.
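The results above are reported as Dice and IoU scores. For a predicted binary mask P and a ground-truth mask G, Dice = 2|P ∩ G| / (|P| + |G|) and IoU = |P ∩ G| / |P ∪ G|, usually quoted as percentages. The sketch below computes both for one pair of masks; the function name `dice_and_iou` and the smoothing constant `eps` (which makes two empty masks score 1) are illustrative choices, not taken from the thesis.

```python
import numpy as np

def dice_and_iou(pred, target, eps: float = 1e-7) -> tuple[float, float]:
    """Dice coefficient and IoU (Jaccard index) for a pair of binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    size_sum = pred.sum() + target.sum()
    # eps keeps both scores defined (and equal to 1) when both masks are empty.
    dice = (2 * intersection + eps) / (size_sum + eps)
    iou = (intersection + eps) / (union + eps)
    return float(dice), float(iou)

# One pixel overlaps; the prediction also contains one false-positive pixel.
dice, iou = dice_and_iou([[1, 1, 0, 0]], [[1, 0, 0, 0]])
print(round(dice, 3), round(iou, 3))  # 0.667 0.5
```

For a single mask pair the two metrics are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU.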
Efficient Approximations of Complete Interatomic Potentials for Crystal Property Prediction
We study property prediction for crystal materials. A crystal structure
consists of a minimal unit cell that is repeated infinitely in 3D space. How to
accurately represent such repetitive structures in machine learning models
remains unresolved. Current methods construct graphs by establishing edges only
between nearby nodes, thereby failing to faithfully capture infinite repeating
patterns and distant interatomic interactions. In this work, we propose several
innovations to overcome these limitations. First, we propose to model
physics-principled interatomic potentials directly instead of only using
distances as in many existing methods. These potentials include the Coulomb
potential, London dispersion potential, and Pauli repulsion potential. Second,
we model the complete set of potentials among all atoms, instead of only
between nearby atoms as in existing methods. This is enabled by our
approximations of infinite potential summations with provable error bounds. We
further develop efficient algorithms to compute the approximations. Finally, we
propose to incorporate our computations of complete interatomic potentials into
message passing neural networks for representation learning. We perform
experiments on the JARVIS and Materials Project benchmarks for evaluation.
Results show that the use of interatomic potentials and complete interatomic
potentials leads to consistent performance improvements with reasonable
computational costs. Our code is publicly available as part of the AIRS library
(https://github.com/divelab/AIRS).
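The abstract contrasts graphs built only from nearby atoms with a sum of interatomic potentials over all periodic images of the unit cell. The sketch below writes that infinite sum naively, truncated to a cube of `n_cells` lattice translations in each direction, using common model forms with all physical constants set to 1 (an assumption): Coulomb 1/r, London dispersion -1/r^6, and an exponential Pauli-repulsion term exp(-alpha·r). This brute-force truncation is the kind of computation the authors' approximations with provable error bounds are meant to avoid; `periodic_pair_sum` is a hypothetical helper, not part of the AIRS library.

```python
import itertools
import numpy as np

# Illustrative potential forms with all physical constants set to 1 (assumption).
def coulomb(r: float) -> float:
    return 1.0 / r

def london_dispersion(r: float) -> float:
    return -1.0 / r**6

def pauli_repulsion(r: float, alpha: float = 1.0) -> float:
    return float(np.exp(-alpha * r))

def periodic_pair_sum(x_i, x_j, lattice, potential, n_cells: int) -> float:
    """Sum potential(|x_j + k @ lattice - x_i|) over integer translations k
    with every component in [-n_cells, n_cells], skipping zero distances.

    lattice: 3x3 array whose rows are the lattice vectors of the unit cell.
    """
    total = 0.0
    offsets = range(-n_cells, n_cells + 1)
    for k in itertools.product(offsets, repeat=3):
        r = np.linalg.norm(x_j + np.array(k) @ lattice - x_i)
        if r > 1e-12:  # exclude an atom's interaction with itself
            total += potential(r)
    return total

# Simple cubic lattice with one atom at the origin: sum over its own images.
lattice = np.eye(3)
origin = np.zeros(3)
for n in (2, 3, 4):
    print(n,
          round(periodic_pair_sum(origin, origin, lattice, coulomb, n), 2),
          round(periodic_pair_sum(origin, origin, lattice, london_dispersion, n), 4))
```

Running the loop shows the dispersion sum settling quickly as `n_cells` grows, while the Coulomb sum for this single uncompensated charge keeps growing with the cutoff. Slowly decaying terms like 1/r are why a fixed neighbourhood cutoff misses long-range interactions and why dedicated approximations (for Coulomb interactions, classically Ewald summation) are needed.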
Reinforcement learning in large state action spaces
Reinforcement learning (RL) is a promising framework for training intelligent agents that learn to optimize long-term utility by directly interacting with the environment. Creating RL methods that scale to large state-action spaces is a critical step towards the real-world deployment of RL systems. However, several challenges limit the applicability of RL to large-scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints such as decentralization, and a lack of guarantees about important properties such as performance, generalization, and robustness in potentially unseen scenarios.
This thesis aims to bridge the aforementioned gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings (single-agent and multi-agent systems (MAS), including the many variants of the latter, prediction and control, model-based and model-free methods, value-based and policy-based methods). In this work, we present the first results on several different problems, e.g., tensorization of the Bellman equation, which allows exponential sample efficiency gains (Chapter 4); provable suboptimality arising from structural constraints in MAS (Chapter 3); combinatorial generalization results in cooperative MAS (Chapter 5); generalization results on observation shifts (Chapter 7); and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we shed light on the generalization aspects of the agents under different frameworks. These properties have been driven by the use of several advanced tools (e.g., statistical machine learning, state abstraction, variational inference, tensor theory).
In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large-scale, real-world applications.
Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning
In this paper, we investigate the impact of compression on stochastic
gradient algorithms for machine learning, a technique widely used in
distributed and federated learning. We underline differences in terms of
convergence rates between several unbiased compression operators, that all
satisfy the same condition on their variance, thus going beyond the classical
worst-case analysis. To do so, we focus on the case of least-squares regression
(LSR) and analyze a general stochastic approximation algorithm for minimizing
quadratic functions relying on a random field. We consider weak assumptions on
the random field, tailored to the analysis (specifically, expected Hölder
regularity), and on the noise covariance, enabling the analysis of various
randomizing mechanisms, including compression. We then extend our results to
the case of federated learning.
More formally, we highlight the impact on the convergence of the covariance
$\mathfrak{C}_{\mathrm{ania}}$ of the additive noise induced by the algorithm.
We demonstrate that, despite the non-regularity of the stochastic field, the
limit variance term scales with $\mathrm{Tr}(\mathfrak{C}_{\mathrm{ania}} H^{-1})/K$
(where $H$ is the Hessian of the optimization problem and $K$ the number of
iterations), generalizing the rate for the vanilla LSR case, where it is
$\sigma^2 \mathrm{Tr}(H H^{-1})/K = \sigma^2 d/K$ (Bach and Moulines, 2013).
Then, we analyze the dependency of $\mathfrak{C}_{\mathrm{ania}}$ on the
compression strategy and ultimately its impact on convergence, first in the
centralized case, then in two heterogeneous FL frameworks.
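The abstract compares unbiased compression operators $C$ that all satisfy the same variance condition, typically $\mathbb{E}[C(x)] = x$ and $\mathbb{E}\|C(x) - x\|^2 \le \omega \|x\|^2$. Two standard operators of this kind are random-k sparsification (rescaled by d/k) and QSGD-style stochastic quantization. The sketch below implements both and plugs them into plain SGD for least-squares regression with Polyak-Ruppert averaging, the setting of Bach and Moulines (2013). The choice of operators, the step size, and all function names are assumptions for illustration, not the exact algorithm analyzed in the paper.

```python
import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Random-k sparsification: keep k of d coordinates, rescaled by d/k so E[C(x)] = x."""
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = x[idx] * (d / k)
    return out

def stochastic_quantize(x: np.ndarray, s: int, rng: np.random.Generator) -> np.ndarray:
    """QSGD-style quantization of |x_i| / ||x|| onto s levels; randomized
    rounding keeps the operator unbiased."""
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return np.zeros_like(x)
    level = np.abs(x) / norm * s
    lower = np.floor(level)
    # Round up with probability equal to the fractional part.
    q = lower + (rng.random(x.shape) < level - lower)
    return np.sign(x) * norm * q / s

def compressed_sgd_lsr(X, y, compress, lr, n_iter, rng):
    """SGD on 0.5 * (x_i @ w - y_i)^2 with each stochastic gradient compressed.
    Returns the Polyak-Ruppert average of the iterates."""
    w = np.zeros(X.shape[1])
    w_avg = np.zeros_like(w)
    for t in range(n_iter):
        i = rng.integers(len(y))
        grad = (X[i] @ w - y[i]) * X[i]
        w = w - lr * compress(grad)
        w_avg += (w - w_avg) / (t + 1)  # running mean of the iterates
    return w_avg

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_star = rng.normal(size=5)
y = X @ w_star  # noiseless targets for a clean check
w_hat = compressed_sgd_lsr(X, y, lambda g: rand_k(g, 2, rng), lr=0.01, n_iter=20000, rng=rng)
print(np.linalg.norm(w_hat - w_star))
```

Both operators are unbiased, but their noise has different covariance structure even when they share the same worst-case variance bound, which is the kind of difference the analysis above is designed to capture.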
Collective variables between large-scale states in turbulent convection
The dynamics of a confined turbulent convection flow are dominated by multiple
long-lived macroscopic circulation states, which the system visits one after
another in a Markov-type hopping process. In the present work, we analyze
the short transition paths between these subsequent macroscopic system states
by a data-driven learning algorithm that extracts the low-dimensional
transition manifold and the related new coordinates, which we term collective
variables, in the state space of the complex turbulent flow. In doing so, we
transfer and extend concepts for conformation transitions in stochastic
microscopic systems, such as in the dynamics of macromolecules, to a
deterministic macroscopic flow. Our analysis is based on long-term direct
numerical simulation trajectories of turbulent convection in a closed cubic
cell at a single Prandtl number and two Rayleigh numbers, using a fixed time
lag measured in convective free-fall time units. The simulations
resolve vortices and plumes of all physically relevant scales, resulting in a
state space spanned by more than 3.5 million degrees of freedom. The transition
dynamics between the large-scale circulation states can be captured by the
transition manifold analysis with only two collective variables which implies a
reduction of the data dimension by a factor of more than a million. Our method
demonstrates that cessations and subsequent reversals of the large-scale flow
are unlikely in the present setup and thus paves the way to the development of
efficient reduced-order models of the macroscopic complex nonlinear dynamical
system. Comment: 24 pages, 12 figures, 1 table