Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization
We study the generalization capability of nearly-interpolating linear
regressors: $\beta$'s whose training error $\tau$ is positive but
small, i.e., below the noise floor. Under a random matrix theoretic assumption
on the data distribution and an eigendecay assumption on the data covariance
matrix $\Sigma$, we demonstrate that any near-interpolator
exhibits rapid norm growth: for $\tau$ fixed, $\beta$ has squared
$\ell_2$-norm $\mathbb{E}[\|\beta\|_2^2] = \Omega(n^{\alpha})$, where $n$ is the number of samples and $\alpha > 1$ is the
exponent of the eigendecay, i.e., $\lambda_i(\Sigma) \sim i^{-\alpha}$. This implies that existing data-independent norm-based bounds are
necessarily loose. On the other hand, in the same regime we precisely
characterize the asymptotic trade-off between interpolation and generalization.
Our characterization reveals that larger norm scaling exponents $\alpha$
correspond to worse trade-offs between interpolation and generalization. We
verify empirically that a similar phenomenon holds for nearly-interpolating
shallow neural networks.
Comment: AISTATS 2024
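To make the scaling claim concrete, here is a minimal simulation sketch, assuming a ridge-regression family of near-interpolators; it is not code from the paper. It draws data with covariance eigenvalues $\lambda_i = i^{-\alpha}$, bisects the ridge penalty until the training error hits the fixed target $\tau$ (below the noise floor $\sigma^2$), and prints the squared norm of the tuned regressor as $n$ grows. All constants are illustrative choices.

```python
# Hypothetical simulation of rapid norm growth for near-interpolators.
# alpha, tau, sigma, and the overparameterization ratio d = 4n are
# illustrative choices, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
alpha, sigma, tau = 2.0, 1.0, 0.5   # eigendecay exponent, noise std, target train MSE

def near_interpolator_sq_norm(n, d):
    """Return ||beta||_2^2 for a ridge regressor tuned so its train MSE is ~ tau."""
    eigs = np.arange(1, d + 1) ** (-alpha)            # lambda_i ~ i^(-alpha)
    X = rng.standard_normal((n, d)) * np.sqrt(eigs)   # rows ~ N(0, diag(eigs))
    beta_star = rng.standard_normal(d) / np.sqrt(d)
    y = X @ beta_star + sigma * rng.standard_normal(n)
    lo, hi = 1e-10, 1e6                               # bisect over the ridge penalty
    for _ in range(60):
        lam = np.sqrt(lo * hi)
        beta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        if np.mean((X @ beta - y) ** 2) > tau:
            hi = lam                                  # over-regularized: shrink penalty
        else:
            lo = lam
    return beta @ beta

for n in [100, 200, 400]:                             # squared norm should grow with n
    print(n, near_interpolator_sq_norm(n, d=4 * n))
```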
Unsupervised Metric Learning in Presence of Missing Data
For many machine learning tasks, the input data lie on a low-dimensional
manifold embedded in a high dimensional space and, because of this
high-dimensional structure, most algorithms are inefficient. The typical
solution is to reduce the dimension of the input data using standard dimension
reduction algorithms such as Isomap, Laplacian eigenmaps, or LLE. This
approach, however, does not always work in practice as these algorithms require
that we have somewhat ideal data. Unfortunately, most data sets either have
missing entries or unacceptably noisy values. That is, real data are far from
ideal and we cannot use these algorithms directly. In this paper, we focus on
the case when we have missing data. Some techniques, such as matrix completion,
can be used to fill in missing data but these methods do not capture the
non-linear structure of the manifold. Here, we present a new algorithm
MR-MISSING that extends these previous algorithms and can be used to compute
low dimensional representation on data sets with missing entries. We
demonstrate the effectiveness of our algorithm by running three different
experiments. We visually verify the effectiveness of our algorithm on synthetic
manifolds, we numerically compare our projections against those computed by
first filling in data using nlPCA and mDRUR on the MNIST data set, and we also
show that we can do classification on MNIST with missing data. We also provide
a theoretical guarantee for MR-MISSING under some simplifying assumptions.
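For contrast with the fill-then-embed baseline discussed above, here is a hedged sketch of that baseline (not of MR-MISSING itself): hide entries of a synthetic swiss roll, impute them with a generic linear method, then embed with off-the-shelf Isomap. The 20% masking rate and the scikit-learn components are illustrative assumptions.

```python
# Baseline pipeline the abstract argues against: linear imputation ignores
# the manifold's curvature, which motivates methods like MR-MISSING.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.impute import SimpleImputer
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
X, _ = make_swiss_roll(n_samples=1000, random_state=0)

mask = rng.random(X.shape) < 0.2            # hide 20% of entries at random
X_missing = np.where(mask, np.nan, X)

X_filled = SimpleImputer(strategy="mean").fit_transform(X_missing)
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X_filled)
print(embedding.shape)                      # (1000, 2)
```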
Generalization Error without Independence: Denoising, Linear Regression, and Transfer Learning
Studying the generalization abilities of linear models with real data is a
central question in statistical learning. While there exist a limited number of
important prior works (Loureiro et al. (2021a, 2021b); Wei et al., 2022) that do
validate theoretical work with real data, these works have limitations due to
technical assumptions. These assumptions include having a well-conditioned
covariance matrix and having independent and identically distributed data.
These assumptions are not necessarily valid for real data. Additionally, prior
works that do address distributional shifts usually make technical assumptions
on the joint distribution of the train and test data (Tripuraneni et al. 2021,
Wu and Xu 2020), and do not test on real data.
In an attempt to address these issues and better model real data, we look at
data that is not i.i.d. but has a low-rank structure. Further, we address
distributional shift by decoupling assumptions on the training and test
distribution. We provide analytical formulas for the generalization error of
the denoising problem that are asymptotically exact. These are used to derive
theoretical results for linear regression, data augmentation, principal
component regression, and transfer learning. We validate all of our theoretical
results on real data, observing a low relative mean squared error of around 1%
between the empirical risk and our estimated risk.
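The abstract does not reproduce the risk formulas, so the following is only a sketch of the empirical side of such a validation: draw low-rank signals plus noise, fit a least-squares linear denoiser on training data, and measure the held-out risk that the paper's analytical expressions are meant to predict. Dimensions, rank, and noise level are illustrative.

```python
# Empirical test risk of a linear denoiser on low-rank data plus noise.
# The paper's closed-form risk estimates are not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
d, r, n_train, n_test, noise = 200, 5, 500, 500, 0.5

U = np.linalg.qr(rng.standard_normal((d, r)))[0]   # rank-r column space

def sample(n):
    """Return clean low-rank signals X and their noisy observations Y."""
    X = U @ rng.standard_normal((r, n))
    return X, X + noise * rng.standard_normal((d, n))

X_tr, Y_tr = sample(n_train)
X_te, Y_te = sample(n_test)

# Least-squares denoiser: W = argmin_W ||X_tr - W Y_tr||_F^2.
W = X_tr @ Y_tr.T @ np.linalg.pinv(Y_tr @ Y_tr.T)

print("empirical test risk:", np.mean((X_te - W @ Y_te) ** 2))
```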
Demographic characteristics and association of serum Vitamin B12, ferritin and thyroid function with premature canities in Indian patients from an urban skin clinic of North India: A retrospective analysis of 71 cases
Background: The incidence of self-reported premature hair graying (PHG) seems to be on the rise. PHG has a profound impact on the patient's quality of life, yet its etiology remains incompletely understood and treatment options are limited and modest. Aim: To evaluate the demographic and clinical profile of patients with premature canities, and to explore the association of this entity with certain systemic disorders suspected to be related to its etiology. Methods: Seventy-one cases of premature canities (onset noticed by patients before 25 years of age) presenting to an urban skin clinic in Gurugram, India, between September 2012 and September 2015 were retrospectively analyzed. Patient records were retrieved, providing details of the onset, duration and pattern of involvement, history, and examination findings (scalp, cutis, and general physical). Since all these patients had been screened for anemia, thyroid disorder, fasting blood glucose, and Vitamin B12 levels at the time of presentation, these parameters were also available for analysis. Results: The mean age at onset of graying was 10.2 ± 3.6 years (range: 5–19 years), with an almost equal gender distribution. The earliest recorded age of onset was 5 years. A positive family history of PHG (in at least one biological parent or sibling) was obtained in 64 (90.1%) of the cases. The temporal regions of the scalp (35.2%) were most commonly involved, followed by the frontal region (18.3%). Hypovitaminosis B12 and hypothyroidism showed a significant association with the disorder, whereas anemia, serum ferritin, and fasting blood glucose did not. Conclusion: The age of onset of hair graying can be as low as 5 years. The temporal and frontal areas are the most commonly involved sites. A strong family history, Vitamin B12 deficiency, and hypothyroidism are strongly associated with PHG. Larger case–control studies are needed to discern the correlation of these and other risk factors with PHG.
Alternative Fuels for Diesel Engines: New Frontiers
The world at present is mainly dependent upon petroleum-derived fuels to meet its energy requirements. However, volatile crude oil prices, concerns about the long-term availability of these fuels, and the environmental degradation caused by their combustion have put renewable alternative fuels at the forefront of policy makers' agendas. Diesel engines are considered the workhorse of the global economy due to their better thermal efficiency, ruggedness, and load-carrying capacity. They are, however, also a main contributor to air pollution, as they emit more oxides of nitrogen and suspended particulate matter than gasoline engines. The fuels with the most potential to supplement or substitute for diesel are biodiesel, butanol, producer gas, dimethyl ether, hydrogen, and so on. This chapter presents developments in the use of alternative fuels in diesel engines. An exhaustive review of the literature reveals the main trends in the development of alternative fuels around the world. The chapter also describes research directions on the production and use of alternative fuels in off-road and transport vehicles powered by diesel engines.
Metric and Representation Learning
All data has some inherent mathematical structure. I am interested in understanding the intrinsic geometric and probabilistic structure of data to design effective algorithms and tools that can be applied to machine learning and across all branches of science.
The focus of this thesis is to increase the effectiveness of machine learning techniques by developing a mathematical and algorithmic framework with which, given any type of data, we can learn an optimal representation. Representation learning is done for many reasons: to repair corrupted data, or to learn a low-dimensional or simpler representation given high-dimensional data or a very complex representation. It could also be that the current representation of the data does not capture its important geometric features.
One of the many challenges in representation learning is determining ways to judge the quality of the representation learned. In many cases, the consensus is that if $d$ is the natural metric on the representation, then this metric should provide meaningful information about the data. Many examples of this can be seen in areas such as metric learning, manifold learning, and graph embedding. However, most algorithms that solve these problems learn a representation in a metric space first and then extract a metric.
A large part of my research explores what happens if the order is switched, that is, if we learn the appropriate metric first and the embedding later. The philosophy behind this approach is that understanding the inherent geometry of the data is the most crucial part of representation learning. Often, studying the properties of the appropriate metric on the input data indicates the type of space we should seek for the representation, and hence yields more robust representations. Optimizing for the appropriate metric can also help overcome issues such as missing and noisy data. My projects fall into three different areas of representation learning.
1) Geometric and probabilistic analysis of representation learning methods.
2) Developing methods to learn optimal metrics on large datasets.
3) Applications.
For the category of geometric and probabilistic analysis of representation learning methods, we have three projects. First, designing optimal training data for denoising autoencoders. Second, formulating a new optimal transport problem and understanding the geometric structure. Third, analyzing the robustness to perturbations of the solutions obtained from the classical multidimensional scaling algorithm versus that of the true solutions to the multidimensional scaling problem.
For learning optimal metrics, we are given a dissimilarity matrix $D$, some function $f$, and some subset $\mathcal{M}$ of the space of all metrics, and we want to find the metric $d \in \mathcal{M}$ that minimizes $f(d, D)$. In this thesis, we consider the version of the problem in which $\mathcal{M}$ is the space of metrics defined on a fixed graph. That is, given a graph $G$, we let $\mathcal{M}$ be the space of all metrics defined via $G$. For this $\mathcal{M}$, we consider a sparse objective function as well as convex objective functions. We also consider the problem in which we want to learn a tree. Finally, we show how the ideas behind learning the optimal metric can be applied to dimensionality reduction in the presence of missing data.
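As a generic illustration of repairing a dissimilarity matrix into a metric (a standard baseline, not one of the thesis's algorithms), the sketch below takes the shortest-path closure of a noisy symmetric matrix: the pointwise-largest decrease-only modification that satisfies every triangle inequality. The noise model and problem size are illustrative.

```python
# Decrease-only metric repair via Floyd-Warshall shortest-path closure.
import numpy as np

def shortest_path_closure(D):
    """Return the pointwise-largest matrix <= D satisfying the triangle inequality."""
    M = D.copy()
    for k in range(len(M)):
        M = np.minimum(M, M[:, k:k + 1] + M[k:k + 1, :])
    return M

rng = np.random.default_rng(0)
P = rng.random((6, 2))                                  # points in the plane
D = np.linalg.norm(P[:, None] - P[None, :], axis=2)     # a true metric
D_noisy = D * (1 + 0.3 * rng.random(D.shape))           # perturb: may break triangles
D_noisy = (D_noisy + D_noisy.T) / 2                     # keep symmetry
np.fill_diagonal(D_noisy, 0.0)

D_fixed = shortest_path_closure(D_noisy)
i, j, k = 0, 1, 2                                       # spot-check one triangle
assert D_fixed[i, j] <= D_fixed[i, k] + D_fixed[k, j] + 1e-12
```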
Finally, we look at an application to real-world data, specifically trying to reconstruct ancient Greek text.
PhD, Applied and Interdisciplinary Mathematics
University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/169738/1/rsonthal_1.pd
Generalized Metric Repair on Graphs
Many modern data analysis algorithms either assume, or are considerably more efficient if, the distances between the data points satisfy a metric. However, as real data sets are noisy, they often do not possess this fundamental property. For this reason, Gilbert and Jain [A. Gilbert and L. Jain, 2017] and Fan et al. [C. Fan et al., 2018] introduced the closely related sparse metric repair and metric violation distance problems. Given a matrix representing all distances, the goal is to repair as few entries as possible to ensure they satisfy a metric. This problem was shown to be APX-hard, and an O(OPT^{1/3})-approximation was given, where OPT is the optimal solution size.
In this paper, we generalize the problem by describing distances via a possibly incomplete, positively weighted graph, where again our goal is to find the smallest number of weight modifications so that the weights satisfy a metric. This natural generalization is more flexible, as it takes into account different relationships among the data points. We demonstrate the inherent combinatorial structure of the problem and give an approximation-preserving reduction from MULTICUT, which is hard to approximate within any constant factor assuming the Unique Games Conjecture (UGC). Conversely, we show that for any fixed constant ρ, for the large class of ρ-chordal graphs, the problem is fixed-parameter tractable, answering an open question from previous work. Call a cycle broken if it contains an edge whose weight is larger than the sum of all its other edges, and call the amount of this difference its deficit. We present approximation algorithms, one depending on the maximum number of edges in a broken cycle and one depending on the number of distinct deficit values, both quantities that may naturally be small. Finally, we give an improved analysis of previous algorithms for complete graphs.
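The broken-cycle and deficit definitions above are simple enough to state directly in code; this small sketch (with an assumed edge-weight-dictionary representation) computes the deficit of a given cycle.

```python
# Deficit of a cycle: weight of its heaviest edge minus the sum of the rest.
# The cycle is broken exactly when the deficit is positive.
def cycle_deficit(weights, cycle):
    """weights: dict frozenset({u, v}) -> positive weight; cycle: vertex list."""
    edges = [frozenset((cycle[i], cycle[(i + 1) % len(cycle)]))
             for i in range(len(cycle))]
    w = sorted(weights[e] for e in edges)
    return w[-1] - sum(w[:-1])

# Triangle with weights 1, 2, 5 is broken, since 5 > 1 + 2; deficit = 2.
weights = {frozenset(e): wt for e, wt in
           [(("a", "b"), 1), (("b", "c"), 2), (("a", "c"), 5)]}
print(cycle_deficit(weights, ["a", "b", "c"]))  # 2
```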