Machine learning in solar physics
The application of machine learning in solar physics has the potential to greatly enhance our understanding of the complex processes that take place in the atmosphere of the Sun. Using techniques such as deep learning, we are now in a position to analyze large amounts of data from solar observations and to identify patterns and trends that may not have been apparent with traditional methods. This can improve our understanding of explosive events like solar flares, which can strongly affect the Earth's environment; predicting such hazardous events is crucial for our technological society. Machine learning can also improve our understanding of the inner workings of the Sun itself, by allowing us to go deeper into the data and to propose more complex models to explain them. Additionally, machine learning can help automate the analysis of solar data, reducing the need for manual labor and increasing the efficiency of research in this field.
Comment: 100 pages, 13 figures, 286 references; accepted for publication as a Living Review in Solar Physics (LRSP)
Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging
Neural networks can be significantly compressed by pruning, leading to sparse
models requiring considerably less storage and floating-point operations while
maintaining predictive performance. Model soups (Wortsman et al., 2022) improve
generalization and out-of-distribution performance by averaging the parameters
of multiple models into a single one without increased inference time. However,
identifying models in the same loss basin to leverage both sparsity and
parameter averaging is challenging, as averaging arbitrary sparse models
reduces the overall sparsity due to differing sparse connectivities. In this
work, we address these challenges by demonstrating that exploring a single
retraining phase of Iterative Magnitude Pruning (IMP) with varying
hyperparameter configurations, such as batch ordering or weight decay, produces
models that are suitable for averaging and share the same sparse connectivity
by design. Averaging these models significantly enhances generalization
performance compared to their individual components. Building on this idea, we
introduce Sparse Model Soups (SMS), a novel method for merging sparse models by
initiating each prune-retrain cycle with the averaged model of the previous
phase. SMS maintains sparsity, exploits the benefits of sparse networks, is modular and fully parallelizable, and substantially improves IMP's performance.
Additionally, we demonstrate that SMS can be adapted to enhance the performance of state-of-the-art pruning-during-training approaches.
Comment: 9 pages, 5 pages of references, 7 pages of appendix
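The key mechanism is that all soup ingredients inherit one pruning mask, so coordinate-wise averaging cannot destroy sparsity. A minimal PyTorch sketch of that averaging step (hypothetical helper, not the authors' code):

    import torch

    def average_sparse_models(state_dicts):
        # Average parameters of models retrained from the same pruned
        # checkpoint: they share identical sparse connectivity, so the
        # mean has zeros exactly where every ingredient has zeros.
        return {
            name: torch.stack([sd[name] for sd in state_dicts]).mean(dim=0)
            for name in state_dicts[0]
        }

Because the zero pattern is shared by design, the averaged model keeps the original sparsity level, unlike an average of arbitrarily pruned models with differing connectivities.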
Implicit Loss of Surjectivity and Facial Reduction: Theory and Applications
Facial reduction, pioneered by Borwein and Wolkowicz, is a preprocessing method that is commonly used to obtain strict feasibility in the reformulated, reduced constraint system.
The importance of strict feasibility is often addressed in the context of the convergence results for interior point methods.
Beyond the theoretical properties that facial reduction conveys, we show that facial reduction, not limited to interior point methods, leads to strong numerical performance in different classes of algorithms.
In this thesis we study various consequences and the broad applicability of facial reduction.
The thesis is organized in two parts.
In the first part, we show the instabilities that accompany the absence of strict feasibility through the lens of facially reduced systems.
In particular, we exploit the implicit redundancies revealed by each nontrivial facial reduction step, which result in an implicit loss of surjectivity.
This leads to a two-step facial reduction and two novel related notions of singularity.
For the area of semidefinite programming, we use these singularities to strengthen a known bound on the solution rank, the Barvinok-Pataki bound.
For the area of linear programming, we reveal degeneracies caused by the implicit redundancies.
Furthermore, we propose a preprocessing tool that uses the simplex method.
In the second part of this thesis, we continue with the semidefinite programs that do not have strictly feasible points.
We focus on the doubly-nonnegative relaxation of the binary quadratic program and a semidefinite program with a nonlinear objective function.
We closely work with two classes of algorithms, the splitting method and the Gauss-Newton interior point method.
We elaborate on the advantages in building models from facial reduction. Moreover, we develop algorithms for real-world problems including the quadratic assignment problem, the protein side-chain positioning problem, and the key rate computation for quantum key distribution.
Facial reduction continues to play an important role in providing robust reformulated models, in both theory and practice, resulting in strong numerical performance.
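For readers unfamiliar with the technique, one facial reduction step for a spectrahedron can be summarized as follows (a standard statement in the Borwein-Wolkowicz framework, not a result specific to this thesis). If the feasible set \(F = \{X \succeq 0 : \mathcal{A}(X) = b\}\) is nonempty but has no strictly feasible point, then there is an exposing vector:

\[
\exists\, y:\quad 0 \neq Z := \mathcal{A}^{*}(y) \succeq 0, \qquad \langle b, y \rangle = 0,
\]

and every feasible \(X\) satisfies \(ZX = 0\), so

\[
F \subseteq V\,\mathbb{S}^{r}_{+}\,V^{T}, \qquad \operatorname{range}(V) = \operatorname{null}(Z), \quad r = n - \operatorname{rank}(Z),
\]

which lets the problem be reformulated over the smaller cone \(\mathbb{S}^{r}_{+}\), where strict feasibility may be recovered (possibly after repeating the step).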
Modular lifelong machine learning
Deep learning has drastically improved the state-of-the-art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, it is expensive to train a deep neural network on a machine learning problem. The overall training cost further increases when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems, which become available one at a time. New problems are solved with fewer resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021). As a result, they neglect some knowledge-transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a large sequence of problems remains a challenge.
Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to only reuse the subset of modules which are useful for the task at hand.
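To make the idea concrete, here is a minimal PyTorch sketch (illustrative names and shapes, not code from the thesis) of reusing a pre-trained module from a library while training only a new one:

    import torch.nn as nn

    # Library of modules accumulated over previously solved problems.
    library = {
        "encoder": nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU()),
    }

    encoder = library["encoder"]
    for p in encoder.parameters():
        p.requires_grad = False      # freeze reused knowledge: no forgetting

    head = nn.Linear(128, 10)        # new module, trained on the new problem
    model = nn.Sequential(encoder, head)

Only the subset of modules useful for the new problem is reused, and frozen modules cannot be overwritten, which is how modularity sidesteps catastrophic forgetting.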
This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning, and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can be used to achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale in order to retain said properties on longer sequences of problems.
First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired from an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures.
Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of a neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allow us to efficiently identify promising module combinations.
Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improvement in anytime performance in the HPO setting and discuss how this can in turn be used to augment modular LML methods.
Overall, this thesis identifies a number of important LML properties, which have not all been attained in past methods, and presents an LML algorithm which can achieve all of them, apart from backward transfer.
Beam scanning by liquid-crystal biasing in a modified SIW structure
A fixed-frequency beam-scanning 1D antenna based on Liquid Crystals (LCs) is designed for application in 2D scanning with lateral alignment. The 2D array environment imposes full decoupling of adjacent 1D antennas, which often conflicts with the LC requirement of DC biasing: the proposed design accommodates both. The LC medium is placed inside a Substrate Integrated Waveguide (SIW), modified to work as a Groove Gap Waveguide with radiating slots etched on the upper broad wall, which radiates as a Leaky-Wave Antenna (LWA). This allows effective application of the DC bias voltage needed for tuning the LCs. At the same time, the RF field remains laterally confined, enabling several antennas to be laid in parallel to achieve 2D beam scanning. The design is validated by simulation employing the actual properties of a commercial LC medium.
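The underlying scanning mechanism follows the standard leaky-wave relation (textbook LWA theory, not an equation quoted from the paper):

\[
\sin\theta_{m} \;\approx\; \frac{\beta\!\left(\varepsilon_{r}^{\mathrm{LC}}\right)}{k_{0}},
\]

where \(\theta_{m}\) is the main-beam angle from broadside, \(k_{0}\) the free-space wavenumber, and \(\beta\) the phase constant of the leaky mode. The DC bias reorients the LC molecules and thus changes the effective permittivity \(\varepsilon_{r}^{\mathrm{LC}}\), shifting \(\beta\) and steering the beam at a fixed frequency.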
Virtual Stiffness: A Novel Biomechanical Approach to Estimate Limb Stiffness of a Multi-Muscle and Multi-Joint System
In recent years, different groups have developed algorithms to control the stiffness of a robotic device through the electromyographic activity collected from a human operator. However, the approaches proposed so far require an initial calibration, rely on a complex subject-specific muscle model, or consider the activity of only a few pairs of antagonist muscles. This study described and tested an approach based on a biomechanical model to estimate the limb stiffness of a multi-joint, multi-muscle system from muscle activations. The “virtual stiffness” method approximates the generated stiffness as the stiffness due to the component of the muscle-activation vector that does not generate any endpoint force. Such a component is calculated by projecting the vector of muscle activations, estimated from the electromyographic signals, onto the null space of the linear mapping of muscle activations onto the endpoint force. The proposed method was tested using an upper-limb model made of two joints and six Hill-type muscles and data collected during an isometric force-generation task performed with the upper limb. The null-space projection of the muscle-activation vector approximated the major axis of the stiffness ellipse or ellipsoid. The model provides a good approximation of the voluntary stiffening performed by participants and could be directly implemented in wearable, myoelectrically controlled devices that estimate, in real time, the endpoint forces or endpoint movement from the mapping between muscle activation and force, without any additional calibration.
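The null-space projection at the core of the method is a few lines of linear algebra. A minimal numpy sketch (assumed names and random placeholder data, not the authors' code):

    import numpy as np

    def virtual_stiffness_component(A, a):
        # P projects onto null(A): the part of the activation vector a
        # that produces no endpoint force under the linear map F = A @ a.
        P = np.eye(A.shape[1]) - np.linalg.pinv(A) @ A
        return P @ a

    A = np.random.randn(2, 6)   # placeholder activation-to-force map: 2 force dims, 6 muscles
    a = np.random.rand(6)       # muscle activations estimated from EMG
    a_null = virtual_stiffness_component(A, a)
    assert np.allclose(A @ a_null, 0)   # this component generates no endpoint force

In practice, A would come from the biomechanical model (here it is random for illustration), and the norm and direction of a_null drive the stiffness estimate.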
Introduction to Facial Micro Expressions Analysis Using Color and Depth Images: A Matlab Coding Approach (Second Edition, 2023)
The book provides a gentle introduction to the field of Facial Micro Expressions Recognition (FMER) using color and depth images, with the aid of the MATLAB programming environment. FMER is a subset of image processing and a multidisciplinary topic whose analysis requires familiarity with other areas of Artificial Intelligence (AI), such as machine learning, digital image processing, and psychology. It is therefore a great opportunity to write a book which covers all of these topics for beginner to professional readers in the field of AI, even those without a background in AI. Our goal is to provide a standalone introduction to the field of FMER analysis in the form of theoretical descriptions for readers with no background in image processing, together with reproducible MATLAB practical examples. We also describe the basic definitions for FMER analysis and the MATLAB library used in the text, which helps the reader apply the experiments to real-world applications. We believe that this book is suitable for students, researchers, and professionals alike who need to develop practical skills, along with a basic understanding of the field. We expect that, after reading this book, the reader will feel comfortable with key stages such as color and depth image processing, color and depth image representation, classification, machine learning, facial micro-expressions recognition, feature extraction, and dimensionality reduction.
Comment: This is the second edition of the book
Knowledge Distillation and Continual Learning for Optimized Deep Neural Networks
Over the past few years, deep learning (DL) has been achieving state-of-the-art performance on various human tasks such as speech generation, language translation, image segmentation, and object detection. While traditional machine learning models require hand-crafted features, deep learning algorithms can automatically extract discriminative features and learn complex knowledge from large datasets. This powerful learning ability makes deep learning models attractive to both academia and big corporations.
Despite their popularity, deep learning methods still have two main limitations: large memory consumption and catastrophic knowledge forgetting. First, DL algorithms use very deep neural networks (DNNs) with many billions of parameters, which results in a large model size and slow inference speed. This restricts the application of DNNs in resource-constrained devices such as mobile phones and autonomous vehicles. Second, DNNs are known to suffer from catastrophic forgetting. When incrementally learning new tasks, the model's performance on old tasks drops significantly. The ability to accommodate new knowledge while retaining previously learned knowledge is called continual learning. Since the real-world environments in which the model operates are always evolving, a robust neural network needs this continual learning ability to adapt to new changes.
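As context for the first limitation, the standard knowledge distillation objective (in the style of Hinton et al., 2015; shown here as a generic sketch, not necessarily the thesis' exact formulation) trains a small student to match a large teacher's temperature-softened outputs alongside the hard labels:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        # Soft term: KL between softened student and teacher distributions.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)                      # T^2 keeps gradient magnitudes comparable
        # Hard term: ordinary supervised loss on the true labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

The temperature T and mixing weight alpha are tuning knobs; the result is a compact student that retains much of the teacher's accuracy at a fraction of the memory and inference cost.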
Extending the reach of uncertainty quantification in nuclear theory
The theory of the strong interaction—quantum chromodynamics (QCD)—is unsuited to practical calculations of nuclear observables, and approximate models for nuclear interaction potentials are required. In contrast to phenomenological models, chiral effective field theories (χEFTs) of QCD grant a handle on the theoretical uncertainty arising from the truncation of the chiral expansion. Uncertainties in χEFT are preferably quantified using Bayesian inference, but quantifying reliable posterior predictive distributions for nuclear observables presents several challenges. First, χEFT is parametrized by unknown low-energy constants (LECs) whose values must be inferred from low-energy data of nuclear structure and reaction observables. There are 31 LECs at fourth order in Weinberg power counting, leading to a high-dimensional inference problem which I approach by developing an advanced sampling protocol using Hamiltonian Monte Carlo (HMC). This allows me to quantify LEC posteriors up to and including the fourth chiral order. Second, the χEFT truncation error is correlated across independent variables such as scattering energies and angles; I model these correlations using a Gaussian process. Third, the computational cost of computing few- and many-nucleon observables typically precludes their direct use in Bayesian parameter estimation, as each observable must be computed in excess of 100,000 times during HMC sampling. The one exception is nucleon-nucleon scattering observables, but even these incur a substantial computational cost in the present applications. I sidestep such issues using eigenvector-continuation emulators, which accurately mimic exact calculations while dramatically reducing the computational cost. Equipped with Bayesian posteriors for the LECs and a model for the truncation error, I explore the predictive ability of χEFT, presenting the results as the probability distributions they always were.
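A common convention for the truncation error (in the style of Furnstahl et al., 2015; assumed here for illustration, not necessarily the thesis' exact notation) writes an observable as a chiral expansion and identifies the error at order k with the omitted tail:

\[
y = y_{\mathrm{ref}} \sum_{n=0}^{\infty} c_n Q^{n},
\qquad
\delta y_k = y_{\mathrm{ref}} \sum_{n=k+1}^{\infty} c_n Q^{n},
\]

where \(Q\) is the EFT expansion parameter and the dimensionless coefficients \(c_n\) are treated as draws from a common distribution. Modeling \(c_n(x)\) as a Gaussian process in the independent variables \(x\) (scattering energy, angle) is what induces the correlated truncation error described above.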
An end-to-end, interactive Deep Learning based Annotation system for cursive and print English handwritten text
With the surging inclination towards carrying out tasks on computational devices and digital media, any method that converts a previously manual task into a digitized version is welcome. Despite the many documentation tasks that can be done online today, there are still many applications and domains where handwritten text is inevitable, which makes the digitization of handwritten documents an essential task. Over the past decades, there has been extensive research on offline handwritten text recognition. In the recent past, most of these attempts have shifted to machine learning and deep learning based approaches. In order to design more complex and deeper networks and ensure stellar performance, it is essential to have larger quantities of annotated data. Most of the databases available for offline handwritten text recognition today have either been manually annotated or semi-automatically annotated with substantial manual involvement. These processes are very time-consuming and prone to human error. To tackle this problem, we present an innovative, complete end-to-end pipeline that annotates offline handwritten manuscripts written in both print and cursive English, using deep learning and user-interaction techniques. This novel method, which combines a detection system built upon a state-of-the-art text detection model with a custom-made deep learning model for the recognition system, is paired with an easy-to-use interactive interface, aiming to improve the accuracy of the detection, segmentation, serialization, and recognition phases, in order to ensure high-quality annotated data with minimal human interaction.
Comment: 17 pages, 8 figures, 2 tables
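The detect, recognize, and review stages the abstract describes compose naturally into a loop. A minimal Python sketch (assumed component names, not the authors' pipeline):

    def annotate_page(image, detector, recognizer, review):
        # Detection: bounding boxes (x1, y1, x2, y2) around text regions.
        boxes = detector(image)
        annotations = []
        # Serialization: order regions top-to-bottom, then left-to-right.
        for box in sorted(boxes, key=lambda b: (b[1], b[0])):
            crop = image[box[1]:box[3], box[0]:box[2]]
            text = recognizer(crop)     # transcription hypothesis
            text = review(crop, text)   # user confirms or corrects via the UI
            annotations.append((box, text))
        return annotations

The interactive review step is where minimal human effort buys high-quality labels: the user only corrects model mistakes instead of transcribing every word from scratch.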