Dimensionality Detection and the Geometric Median on Data Manifolds
In many applications, high-dimensional observations are assumed to lie on or near a low-dimensional manifold embedded in an ambient Euclidean space. In this thesis, ideas from differential geometry are extended to equation-free data analysis to better understand high-dimensional datasets. In particular, two questions are addressed: (1) how can the intrinsic dimensionality of a manifold-valued dataset be determined? and (2) how can this intrinsic dimensionality be leveraged to obtain a better notion of centrality? For (1), two common methods for estimating global dimensionality are stated and a novel approach is proposed to obtain an estimator for local dimensionality. Then, for (2), a novel approach is presented to estimate the geometric median on manifolds for which no prior knowledge of the underlying geometry is available. These methods are first applied to synthetic datasets and then to real-world neurological measurements to create a biomarker for the development of epilepsy in an animal model.
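As background for the second question, the geometric median in the Euclidean (ambient-space) setting is classically computed with Weiszfeld's algorithm. The sketch below is this standard Euclidean version only, not the manifold-aware estimator developed in the thesis:

```python
import numpy as np

def geometric_median(X, n_iter=100, eps=1e-8):
    """Weiszfeld's algorithm: iteratively re-weighted mean that converges
    to the point minimizing the sum of Euclidean distances to the rows of X."""
    y = X.mean(axis=0)                      # initialize at the centroid
    for _ in range(n_iter):
        d = np.linalg.norm(X - y, axis=1)
        d = np.maximum(d, eps)              # guard against division by zero
        w = 1.0 / d
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y
```

Unlike the mean, this estimator has a 0.5 breakdown point, which is why a manifold analogue is attractive as a robust notion of centrality.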
Distance Learner: Incorporating Manifold Prior to Model Training
The manifold hypothesis (real-world data concentrates near low-dimensional manifolds) is suggested as the principle behind the effectiveness of machine learning algorithms in the very high-dimensional problems that are common in domains such as vision and speech. Multiple methods have been proposed to explicitly incorporate the manifold hypothesis as a prior in modern Deep Neural Networks (DNNs), with varying success. In this paper, we propose a new method, Distance Learner, to incorporate this prior for DNN-based classifiers. Distance Learner is trained to predict the distance of a point from the underlying manifold of each class, rather than the class label. For classification, Distance Learner then chooses the class corresponding to the closest predicted class manifold. Distance Learner can also identify points as being out of distribution (belonging to neither class) if the distance to the closest manifold is higher than a threshold. We evaluate our method on multiple synthetic datasets and show that Distance Learner learns much more meaningful classification boundaries compared to a standard classifier. We also evaluate our method on the task of adversarial robustness, and find that it not only outperforms a standard classifier by a large margin, but also performs on par with classifiers trained via state-of-the-art adversarial training.
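The decision rule described in the abstract can be sketched as follows; the function name is illustrative, and the per-class distances would come from the trained Distance Learner network:

```python
import numpy as np

def distance_classify(pred_distances, threshold):
    """Classify each point by its nearest predicted class manifold.
    pred_distances: (n_points, n_classes) array of predicted distances.
    Points whose smallest distance exceeds `threshold` are flagged as
    out of distribution and labeled -1."""
    pred_distances = np.asarray(pred_distances)
    nearest = pred_distances.argmin(axis=1)          # closest class manifold
    rejected = pred_distances.min(axis=1) > threshold
    return np.where(rejected, -1, nearest)
```

For example, with predicted distances `[[0.1, 0.9], [0.8, 0.2], [2.0, 3.0]]` and threshold `1.0`, the first two points are assigned classes 0 and 1, and the third is rejected as out of distribution.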
Learning Structured Representations of Data
Bayesian networks have shown themselves to be useful tools for the analysis and modelling of large data sets. However, their complete generality leads to computational and modelling complexities that have limited their applicability. We propose an approach to simplify and constrain Bayesian networks that strikes a more useful compromise between generality and tractability. These constrained graphical models will allow us to build computationally tractable models for large high-dimensional data sets. We also describe examples of data sets drawn from image and speech processing on which we can (1) further explore this constrained set of graphical models, and (2) analyse their performance as a general-purpose statistical data analysis tool.
Evaluation of Deep Learning Models Using Data-Guided Neural Networks for Nonlinear Materials
Nonlinear materials are often difficult to model with classical methods like the Finite Element Method: they have a complex and sometimes inaccurate physical and mathematical description, or we simply do not know how to describe such materials in terms of relations between external and internal variables. In many disciplines, neural network methods have arisen as powerful tools to deal with nonlinear problems. In this work, the very recently developed concept of Physically-Guided Neural Networks with Internal Variables (PGNNIV) is applied to nonlinear materials, providing us with a tool to add physically meaningful constraints to deep neural networks from a model-free perspective. These networks outperform classical simulation methods in terms of computational cost when predicting external and especially internal variables, since they are less computationally intensive and easily scalable. Furthermore, in comparison with classical neural networks, they filter numerical noise, converge faster, are less data demanding, and can have improved extrapolation capacity. In addition, as they are not based on conventional parametric models (model-free character), the time required to develop material models is reduced compared to methods such as Finite Elements. In this work, it is shown that the same PGNNIV is capable of achieving good predictions regardless of the nature of the elastic material considered (linear, or with hardening or softening behavior), being able to unravel the constitutive law of the material and explain its nature. The results show that PGNNIV is a useful tool for the problems of solid mechanics, both for predicting the response to new load situations and for explaining the behavior of materials, placing the method within what is known as Explainable Artificial Intelligence (XAI).
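The PGNNIV architecture itself is not specified in this abstract; as a minimal, purely illustrative sketch of the underlying idea (augmenting a data-driven fit with a physically meaningful constraint), the toy example below fits a 1-D linear-elastic law stress = E·strain while penalizing the unphysical case of negative stiffness. All names and the penalty form are assumptions for illustration, not the authors' method:

```python
import numpy as np

# Synthetic 1-D elastic data: stress = E * strain with E = 2 plus small noise
rng = np.random.default_rng(0)
strain = rng.uniform(0.0, 1.0, 200)
stress = 2.0 * strain + 0.01 * rng.normal(size=200)

w = -1.0    # deliberately unphysical starting stiffness
lam = 10.0  # weight of the physics penalty  lam * max(0, -w)
lr = 0.1
for _ in range(500):
    pred = w * strain
    grad_data = 2.0 * np.mean((pred - stress) * strain)  # d(MSE)/dw
    grad_phys = -lam if w < 0 else 0.0                   # d(penalty)/dw
    w -= lr * (grad_data + grad_phys)
# w now approximates the true stiffness E = 2, an interpretable
# "internal variable" recovered alongside the data fit
```

The point of the toy example is only that the physics term steers the parameter out of unphysical regions while the data term does the fitting; the actual PGNNIV framework embeds such constraints inside deep networks with internal-variable layers.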
Robustness in Dimensionality Reduction
Dimensionality reduction is widely used in many statistical applications, such as image analysis, microarray analysis, or text mining. This thesis focuses on three problems that relate to the robustness in dimension reduction.
The first topic is performance analysis in dimension reduction, that is, quantitatively assessing the performance of an algorithm on a given dataset. A criterion for success is established from a geometric point of view to address this issue. A family of goodness measures, called "local rank correlation", is developed to assess the performance of dimensionality reduction methods. The potential application of the local rank correlation in selecting tuning parameters of dimension reduction algorithms is also explored. The second topic is sensitivity analysis in dimension reduction. Two types of influence functions are developed as measures of robustness, based on which we develop graphical display strategies for visualizing the robustness of a dimension reduction method and flagging potential outliers. In the third part of the thesis, a novel robust PCA framework, called "Performance-Weighted Bagging PCA", is proposed from the perspective of model averaging. It obtains a robust linear subspace by weighted averaging of a collection of subspaces produced from subsamples. Robustness against outliers is achieved by a proper weighting scheme, and possible choices of weighting scheme are investigated.
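The model-averaging idea behind the third part can be sketched as follows. This is a minimal illustration under assumed design choices (the subsample fraction, the reciprocal-error weighting, and the eigendecomposition step are all hypothetical here; the thesis investigates the actual weighting schemes):

```python
import numpy as np

def bagging_pca(X, k, n_bags=50, frac=0.5, seed=None):
    """Illustrative performance-weighted bagging PCA: average projection
    matrices from PCAs of random subsamples, weighting each subspace by
    its reconstruction quality on the full (centered) data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    P_avg = np.zeros((p, p))
    total_w = 0.0
    for _ in range(n_bags):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        sub = Xc[idx] - Xc[idx].mean(axis=0)
        _, _, Vt = np.linalg.svd(sub, full_matrices=False)
        V = Vt[:k].T                       # top-k basis from the subsample
        P = V @ V.T                        # projector onto that subspace
        err = np.mean(np.sum((Xc - Xc @ P) ** 2, axis=1))
        w = 1.0 / (1.0 + err)              # lower error -> larger weight
        P_avg += w * P
        total_w += w
    P_avg /= total_w
    # Recover a k-dimensional basis from the averaged projector
    vals, vecs = np.linalg.eigh(P_avg)
    return vecs[:, np.argsort(vals)[::-1][:k]]
```

Subsamples heavily contaminated by outliers produce subspaces that reconstruct the bulk of the data poorly, so they receive small weights, which is the mechanism by which the average becomes robust.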