363 research outputs found
Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering
Despite the empirical success and practical significance of (relational)
knowledge distillation that matches (the relations of) features between teacher
and student models, the corresponding theoretical interpretations remain
limited for various knowledge distillation paradigms. In this work, we take an
initial step toward a theoretical understanding of relational knowledge
distillation (RKD), with a focus on semi-supervised classification problems. We
start by casting RKD as spectral clustering on a population-induced graph
unveiled by a teacher model. Via a notion of clustering error that quantifies
the discrepancy between the predicted and ground truth clusterings, we
illustrate that RKD over the population provably leads to low clustering error.
Moreover, we provide a sample complexity bound for RKD with limited unlabeled
samples. For semi-supervised learning, we further demonstrate the label
efficiency of RKD through a general framework of cluster-aware semi-supervised
learning that assumes low clustering errors. Finally, by unifying data
augmentation consistency regularization into this cluster-aware framework, we
show that despite the common effect of learning accurate clusterings, RKD
facilitates a "global" perspective through spectral clustering, whereas
consistency regularization focuses on a "local" perspective via expansion
Traction force microscopy with optimized regularization and automated Bayesian parameter selection for comparing cells
Adherent cells exert traction forces onto their environment, which allows
them to migrate, to maintain tissue integrity, and to form complex
multicellular structures. This traction can be measured in a perturbation-free
manner with traction force microscopy (TFM). In TFM, traction is usually
calculated via the solution of a linear system, which is complicated by
undersampled input data, acquisition noise, and large condition numbers for
some methods. Therefore, standard TFM algorithms employ either data filtering
or regularization. However, these approaches require a manual selection of
filter or regularization parameters and consequently exhibit a substantial
degree of subjectivity. This shortcoming is particularly serious when cells
in different conditions are to be compared because optimal noise suppression
needs to be adapted for every situation, which invariably results in systematic
errors. Here, we systematically test the performance of new methods from
computer vision and Bayesian inference for solving the inverse problem in TFM.
We compare two classical schemes, L1- and L2-regularization, with three
previously untested schemes, namely Elastic Net regularization, Proximal
Gradient Lasso, and Proximal Gradient Elastic Net. Overall, we find that
Elastic Net regularization, which combines L1 and L2 regularization,
outperforms all other methods with regard to accuracy of traction
reconstruction. Next, we develop two methods, Bayesian L2 regularization and
Advanced Bayesian L2 regularization, for automatic, optimal L2 regularization.
Using artificial data and experimental data, we show that these methods enable
robust reconstruction of traction without requiring a difficult selection of
regularization parameters specifically for each data set. Thus, Bayesian
methods can mitigate the considerable uncertainty inherent in comparing
cellular traction forces
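As a rough illustration of the L2 (Tikhonov) regularization this abstract refers to, the sketch below solves a small, ill-conditioned linear system with and without a penalty term. The matrix `A`, the data `b`, and the parameter `lam` are hypothetical toy values, not a TFM Green's function, and the parameter is chosen by hand rather than by the Bayesian selection the authors describe.

```python
def solve_linear(M, y):
    """Solve M x = y by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [y[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def tikhonov(A, b, lam):
    """Minimize ||A x - b||^2 + lam * ||x||^2 via the normal equations."""
    rows, cols = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(rows)) for j in range(cols)]
           for i in range(cols)]
    Atb = [sum(A[k][i] * b[k] for k in range(rows)) for i in range(cols)]
    for i in range(cols):
        AtA[i][i] += lam  # the L2 penalty adds lam to the diagonal
    return solve_linear(AtA, Atb)


# Toy ill-conditioned system with slightly inconsistent ("noisy") data.
A = [[1.0, 1.0], [1.0, 1.001]]
b = [2.0, 2.1]
x_plain = tikhonov(A, b, 0.0)   # unregularized: noise is strongly amplified
x_reg = tikhonov(A, b, 0.01)    # regularized: much smaller solution norm
print(x_plain, x_reg)
```

The regularized solution trades a small residual for a far smaller norm, which is the noise-suppression effect that makes the choice of `lam` matter when comparing data sets.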
A Data-Driven Approach for Generating Vortex Shedding Regime Maps for an Oscillating Cylinder
Recent developments in wind energy extraction methods from vortex-induced vibration (VIV) have
fueled the research into vortex shedding behaviour. The vortex shedding map is vital for the consistent
use of normalized amplitude and wavelength to validate the predictive power of forced vibration
experiments. However, there is a lack of demonstrated methods of generating this map at Reynolds
numbers feasible for energy generation due to the high computational cost and complex dynamics.
Leveraging data-driven methods addresses the limitations of the traditional experimental vortex
shedding map generation, which requires large amounts of data and intensive supervision that is
unsuitable for many applications and Reynolds numbers. This thesis presents a data-driven approach for
generating vortex shedding maps of a cylinder undergoing forced vibration that requires less data and
supervision while accurately extracting the underlying vortex structure patterns.
The quantitative analysis in this dissertation requires the univariate time series signatures of local fluid
flow measurements in the wake of an oscillating cylinder experiencing forced vibration. The datasets
were extracted from a two-dimensional computational fluid dynamics (CFD) simulation of a cylinder
oscillating at various normalized amplitude and wavelength parameters conducted at two discrete
Reynolds numbers of 4000 and 10,000. First, the validity of clustering local flow measurements was
demonstrated by proposing a vortex shedding mode classification strategy using supervised machine
learning models, namely random forest and k-nearest neighbour, which achieved 99.3% and 99.8%
classification accuracy using the velocity sensors oriented transverse to the predominant flow (),
respectively. Next, the dataset of local flow measurement of the -component of velocity was used to
develop the procedure of generating vortex shedding maps using unsupervised clustering techniques. The
clustering task was conducted on subsequences of repeated patterns from the whole time series extracted
using the novel matrix profile method. The vortex shedding map was validated by reproducing a
benchmark map produced at a low Reynolds number. The method was extended to a higher Reynolds
number case of vortex shedding and demonstrated the insight gained into the underlying dynamical
regimes of the physical system. The proposed multi-step clustering methods, denoted Hybrid Method B,
combining Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and
Agglomerative algorithms, and Hybrid Method C, combining k-Means and Agglomerative algorithms,
demonstrated the ability to extract meaningful clusters from more complex vortex structures that become
increasingly indistinguishable. The data-driven methods yield exceptional performance and versatility,
which significantly improves the map generation method while reducing the data input and supervision
required
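The matrix profile mentioned in this abstract can be illustrated with a brute-force sketch: for every subsequence of length `m`, it records the z-normalized Euclidean distance to its nearest non-overlapping neighbour, so low values mark repeated patterns. This is a naive quadratic version for illustration only, not the optimized algorithms used in practice; `series`, `m`, and the exclusion-zone width are assumptions.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to zeros."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / len(seq))
    if std == 0.0:
        return [0.0] * len(seq)
    return [(v - mean) / std for v in seq]


def matrix_profile(series, m):
    """Return (profile, index): nearest-neighbour distance and position
    for each length-m subsequence, skipping trivially overlapping matches."""
    n_sub = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    excl = max(1, m // 2)  # assumed exclusion zone around each subsequence
    profile, index = [], []
    for i in range(n_sub):
        best, best_j = math.inf, -1
        for j in range(n_sub):
            if abs(i - j) < excl:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index


# Hypothetical periodic signal: the same four-sample pattern repeated.
series = [0.0, 1.0, 0.0, -1.0] * 4
profile, index = matrix_profile(series, 4)
print(profile)
print(index)
```

Subsequences whose profile value is near zero are candidates for the repeated patterns that a later clustering step would group.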
Automatic Development and Adaptation of Concise Nonlinear Models for System Identification
Mathematical descriptions of natural and man-made processes are the bedrock of science, used by humans to understand, estimate, predict and control the natural and built world around them. The goal of system identification is to enable the inference of mathematical descriptions of the true behavior and dynamics of processes from their measured observations. The crux of this task is the identification of the dynamic model form (topology) in addition to its parameters. Model structures must be concise to offer insight to the user about the process in question. To that end, this dissertation proposes three methods to improve the ability of system identification to identify succinct nonlinear model structures.
The first is a model structure adaptation method (MSAM) that modifies first principles models to increase their predictive ability while maintaining intelligibility. Model structure identification is achieved by this method despite the presence of parametric error through a novel means of estimating the gradient of model structure perturbations. I demonstrate MSAM's ability to identify underlying nonlinear dynamic models starting from linear models in the presence of parametric uncertainty. The main contribution of this method is the ability to adapt the structure of existing models of processes such that they more closely match the process observations.
The second method, known as epigenetic linear genetic programming (ELGP), conducts symbolic regression without a priori knowledge of the form of the model or its parameters. ELGP incorporates a layer of genetic regulation into genetic programming (GP) and adapts it by local search to tune the resultant model structures for accuracy and conciseness. The introduction of epigenetics is made simple by the use of a stack-based program representation. This method, tested on hundreds of dynamics problems, demonstrates the ability of epigenetic local search to improve GP by producing simpler and more accurate models.
The third method relies on a multidimensional GP approach (M4GP) for solving multiclass classification problems. The proposed method uses stack-based GP to conduct nonlinear feature transformations to optimize the clustering of data according to their classes. In comparison to several state-of-the-art methods, M4GP is able to classify test data better on several real-world problems. The main contribution of M4GP is its demonstrated ability to combine the strengths of GP (e.g. nonlinear feature transformations and feature selection) with the strengths of distance-based classification.
MSAM, ELGP and M4GP improve the identification of succinct nonlinear model structures for continuous dynamic processes with starting models, continuous dynamic processes without starting models, and multiclass dynamic processes without starting models, respectively. A considerable portion of this dissertation is devoted to the application of these methods to these three classes of real-world dynamic modeling problems. MSAM is applied to the restructuring of controllers to improve the closed-loop system response of nonlinear plants. ELGP is used to identify the closed-loop dynamics of an industrial scale wind turbine and to define a reduced-order model of fluid-structure interaction. Lastly, M4GP is used to identify a dynamic behavioral model of bald eagles from collected data. The methods are analyzed alongside many other state-of-the-art system identification methods in the context of model accuracy and conciseness
Reduced Order Models for Hydrodynamic Analysis of Pipelines based on Modal Analysis and Machine Learning
Bluff bodies are widely used in engineering. For instance, marine risers, jumpers, umbilicals, and bundle flowlines are examples of circular bluff bodies used in offshore engineering. They can be fixed or flexibly supported, and arranged in single or tandem configurations. Subsea structures are subjected to external hydrodynamic loads such as cyclic loads, vibrations, and high pressure. For example, one commonly observed damaging flow feature associated with the hydrodynamics of flexibly supported bluff bodies is vortex shedding. Therefore, the instantaneous flow structures and the hydrodynamic forces acting on bluff bodies under operational conditions need to be investigated in order to prevent degradation mechanisms and increase the service life of subsea structures. Modern techniques are now used to analyse flow field data with high efficiency. For example, neural networks (NNs) trained with massive experimental or numerical simulation data to predict the spatio-temporal evolution of the dominant coherent structures of the flow field and the structural behaviour can be considered an alternative to conventional computational fluid dynamics (CFD) simulations. In the present thesis, numerical investigations of the flow around cylindrical bluff bodies in the upper transition Reynolds number regime (Re = 3.6 × 10^6) are performed. Two pipeline operational conditions are considered. The first is a tandem configuration of two stationary pipelines subjected to steady flow. The second is a pipeline undergoing vortex-induced vibrations (VIV) in a steady current. The two-dimensional (2D) Unsteady Reynolds-Averaged Navier-Stokes (URANS) equations combined with the standard k-ω SST turbulence model are solved. The open source CFD toolbox OpenFOAM v2012 is employed to perform the simulations. Reduced Order Models (ROMs), which can provide a low-dimensional representation of the simulation data with reduced computational time and cost, are designed. Dynamic mode decomposition (DMD) and proper orthogonal decomposition (POD) techniques are implemented for the first and second cases, respectively. In addition, the ROMs for the VIV cylinder case are developed further by implementing a long short-term memory neural network (LSTM-NN). The neural network based model makes it possible to predict the dominant hydrodynamic characteristics of the flow around cylindrical bluff bodies subjected to a high Reynolds number flow at future time instances with a reduced computational cost
Simultaneous Behavior Onset Detection and Task Classification for Patients with Parkinson Disease Using Subthalamic Nucleus Local Field Potentials
This thesis aims to develop methods for behavior onset detection in patients with Parkinson's disease (PD), as well as to investigate models for classifying the different behavioral tasks performed by PD patients. The detection is based on recorded Local Field Potentials (LFP) of the Subthalamic nucleus (STN), captured during the Deep Brain Stimulation (DBS) process.
One main part of this work is dedicated to studying various properties and features of the STN LFP signals for several patients' behavior conditions. Features based on temporal and time-frequency analysis of the signals are developed and implemented. The features are evaluated and compared on several patients' data during a classification process, using onset windows of preprocessed signals.
Another part of this research concentrates on automated onset detection of behavioral tasks for patients with PD using the LFP signals collected during DBS implantation surgeries. Using time-frequency signal processing methods, features are extracted and clustered in the feature space for onset detection. Then, a supervised model is employed that uses Discrete Hidden Markov Models (DHMM) to specify the onset location of the behavior in the LFP signal.
Finally, a method for simultaneous onset detection and task classification for patients with PD is presented, which classifies the tasks into motor, language, and combined motor and language behaviors, using LFP signals collected during DBS implantation surgeries. Again, time-frequency signal processing methods are applied, and features are extracted and clustered in the feature space. The features extracted from the automatically detected onsets are used to classify the behavior task into predefined categories. DHMM is merged with SVM in a two-layer classifier to boost the behavior classification rate to 84%, and the presented methodology is justified using the experimental results
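To make the idea of locating an onset with a discrete hidden Markov model more concrete, here is a minimal sketch using Viterbi decoding on a two-state model. Everything in it is assumed for illustration: the state names `rest` and `active`, the binary observation symbols (standing in for clustered feature labels), and all probabilities. It is not the thesis's trained DHMM.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely hidden state sequence for the observations."""
    scores = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        step, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: scores[-1][p] + log_trans[p][s])
            step[s] = scores[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            ptr[s] = prev
        scores.append(step)
        back.append(ptr)
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


states = ["rest", "active"]
# Hypothetical left-to-right model: once active, the state never returns to rest.
log_start = {"rest": safe_log(1.0), "active": safe_log(0.0)}
log_trans = {
    "rest": {"rest": safe_log(0.9), "active": safe_log(0.1)},
    "active": {"rest": safe_log(0.0), "active": safe_log(1.0)},
}
log_emit = {
    "rest": {0: safe_log(0.9), 1: safe_log(0.1)},
    "active": {0: safe_log(0.2), 1: safe_log(0.8)},
}

obs = [0, 0, 0, 1, 1, 0, 1, 1]       # assumed cluster labels per time window
path = viterbi(obs, states, log_start, log_trans, log_emit)
onset = path.index("active")         # first window decoded as active
print(path, onset)
```

The left-to-right transition structure forces a single switch from `rest` to `active`, so the decoded switch point can be read directly as the estimated onset window.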
A new hybrid algorithm for multi‐objective reactive power planning via FACTS devices and renewable wind resources
The power system planning problem, considering the system loss function, the voltage profile function, the cost function of FACTS (flexible alternating current transmission system) devices, and the stability function, is investigated in this paper. With the growth of electronic technologies, FACTS devices have improved stability and made reactive power (RP) planning more reliable. In addition, in modern power systems, renewable resources have an inevitable effect on power system planning. Wind resources therefore complicate the planning problem because of conflicting functions and non-linear constraints. This conflict arises from the stochastic nature of the cost, loss, and voltage functions, which cannot be summarized in a single function. A multi-objective hybrid algorithm that combines particle swarm optimization (PSO) and the virus colony search (VCS) is proposed to solve this problem while considering the linear and non-linear constraints. VCS is a new optimization method based on viruses' search function to destroy host cells and cause the penetration of the best virus into a cell for reproduction. In the proposed model, PSO is used to enhance local and global search. In addition, non-dominated sorting based on the Pareto criterion is used to sort the data. The optimization results on different scenarios reveal that the proposed hybrid algorithm can improve parameters such as convergence time, the index of voltage stability, and the absolute magnitude of voltage deviation, and that it can reduce the total transmission line losses. In addition, the presence of wind resources has a positive effect on the mentioned issue
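The non-dominated (Pareto) filtering step mentioned in this abstract can be sketched as follows, assuming all objectives are minimized. The candidate tuples are hypothetical (loss, cost) pairs invented for illustration; this shows only the dominance test, not the paper's PSO and VCS hybrid.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate plans scored as (line loss, device cost).
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(candidates)
print(front)
```

A full non-dominated sort would repeat this filter on the remaining points to build successive fronts; the single pass here returns only the first front.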