235 research outputs found

    Unsupervised Metric Learning in Presence of Missing Data

    For many machine learning tasks, the input data lie on a low-dimensional manifold embedded in a high-dimensional space, and because the ambient dimension is high, most algorithms are inefficient. The typical solution is to reduce the dimension of the input data using standard dimension reduction algorithms such as Isomap, Laplacian Eigenmaps, or LLE. This approach, however, does not always work in practice, as these algorithms require somewhat ideal data. Unfortunately, most data sets either have missing entries or unacceptably noisy values; that is, real data are far from ideal, and we cannot use these algorithms directly. In this paper, we focus on the case of missing data. Some techniques, such as matrix completion, can be used to fill in missing entries, but these methods do not capture the non-linear structure of the manifold. Here, we present a new algorithm, MR-MISSING, that extends these previous algorithms and can be used to compute low-dimensional representations of data sets with missing entries. We demonstrate the effectiveness of our algorithm in three different experiments: we visually verify its effectiveness on synthetic manifolds, we numerically compare our projections against those computed by first filling in data using nlPCA and mDRUR on the MNIST data set, and we show that we can do classification on MNIST with missing data. We also provide a theoretical guarantee for MR-MISSING under some simplifying assumptions.
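
    For orientation, the kind of baseline the abstract compares against (fill in the missing entries first, then reduce dimension) can be sketched in a few lines. This is a minimal illustration only, not the MR-MISSING algorithm: it assumes a simple iterative SVD imputation and scikit-learn's Isomap, and the rank, neighbourhood size, and toy data are arbitrary choices.

        # Baseline sketch (not MR-MISSING): rank-r SVD imputation, then Isomap.
        import numpy as np
        from sklearn.manifold import Isomap

        def svd_impute(X, mask, rank=5, n_iters=100):
            """Iteratively fill missing entries (mask == False) with a rank-`rank`
            SVD approximation of the current estimate."""
            X_hat = np.where(mask, X, np.mean(X[mask]))      # crude initial fill
            for _ in range(n_iters):
                U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
                low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
                X_hat = np.where(mask, X, low_rank)           # keep observed entries
            return X_hat

        # Example: a Swiss-roll-like point cloud with roughly 20% missing entries.
        rng = np.random.default_rng(0)
        t = rng.uniform(0, 4 * np.pi, 500)
        X = np.column_stack([t * np.cos(t), rng.uniform(0, 10, 500), t * np.sin(t)])
        mask = rng.random(X.shape) > 0.2                      # True = observed entry
        X_filled = svd_impute(X, mask, rank=2)
        Y = Isomap(n_neighbors=10, n_components=2).fit_transform(X_filled)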

    Generalization Error without Independence: Denoising, Linear Regression, and Transfer Learning

    Studying the generalization ability of linear models on real data is a central question in statistical learning. While a limited number of important prior works (Loureiro et al. 2021a, 2021b; Wei et al. 2022) do validate theoretical results on real data, they have limitations due to technical assumptions, including a well-conditioned covariance matrix and independent and identically distributed data. These assumptions are not necessarily valid for real data. Additionally, prior works that do address distributional shift usually make technical assumptions on the joint distribution of the train and test data (Tripuraneni et al. 2021, Wu and Xu 2020) and do not test on real data. To address these issues and better model real data, we consider data that are not i.i.d. but have a low-rank structure. Further, we address distributional shift by decoupling the assumptions on the training and test distributions. We provide analytical formulas for the generalization error of the denoising problem that are asymptotically exact, and we use them to derive theoretical results for linear regression, data augmentation, principal component regression, and transfer learning. We validate all of our theoretical results on real data and obtain a low relative mean squared error, of around 1%, between the empirical risk and our estimated risk.
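
    To make the setup concrete, the following is a minimal sketch of how one might empirically measure the generalization error of ridge regression when the training and test data are low-rank and drawn from different distributions. It illustrates the problem setting only, not the paper's analytical formulas; the dimensions, rank, regularization, and noise level are arbitrary choices.

        # Empirical generalization error under a low-rank train/test shift (illustrative).
        import numpy as np

        rng = np.random.default_rng(1)
        n_train, n_test, d, r = 500, 500, 200, 10

        def low_rank_data(n, d, r, rng):
            """Draw rows x = U z with U in R^{d x r}, so the covariance has rank r."""
            U = rng.normal(size=(d, r)) / np.sqrt(r)
            Z = rng.normal(size=(n, r))
            return Z @ U.T

        beta = rng.normal(size=d) / np.sqrt(d)
        X_tr = low_rank_data(n_train, d, r, rng)              # training distribution
        X_te = low_rank_data(n_test, d, r, rng)               # shifted test distribution (fresh subspace)
        y_tr = X_tr @ beta + 0.1 * rng.normal(size=n_train)
        y_te = X_te @ beta + 0.1 * rng.normal(size=n_test)

        lam = 1e-2                                            # ridge regularization
        beta_hat = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)
        test_risk = np.mean((X_te @ beta_hat - y_te) ** 2)    # empirical generalization error
        print(test_risk)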

    Demographic characteristics and association of serum Vitamin B12, ferritin and thyroid function with premature canities in Indian patients from an urban skin clinic of North India: A retrospective analysis of 71 cases

    Background: The incidence of self-reported premature hair graying (PHG) seems to be on the rise. PHG has a profound impact on the patient's quality of life, yet its etiology remains incompletely understood and treatment options are limited and modest. Aim: To evaluate the demographic and clinical profile of patients with premature canities, and to explore the association of this entity with certain systemic disorders suspected to be related to its etiology. Methods: Seventy-one cases of premature canities (onset noticed by the patient before 25 years of age) presenting to an urban skin clinic in Gurugram, India, between September 2012 and September 2015 were retrospectively analyzed. Patient records were retrieved, providing details of the onset, duration, and pattern of involvement, history, and examination findings (scalp, cutis, and general physical). Since all these patients had been screened for anemia, thyroid disorder, fasting blood glucose, and Vitamin B12 levels at the time of presentation, these parameters were also available for analysis. Results: The mean age at onset of graying was 10.2 ± 3.6 years (range: 5–19 years), with an almost equal gender distribution. The earliest recorded age of onset was 5 years. A positive family history of PHG (in at least one biological parent or sibling) was obtained in 64 (90.1%) of the cases. The temporal regions of the scalp (35.2%) were most commonly involved, followed by the frontal region (18.3%). Hypovitaminosis B12 and hypothyroidism showed a significant association with the disorder, whereas anemia, serum ferritin, and fasting blood glucose did not. Conclusion: The age of onset of hair graying can be as low as 5 years. The temporal and frontal areas are the most commonly involved sites. A strong family history, Vitamin B12 deficiency, and hypothyroidism are strongly associated with PHG. Larger case–control studies are warranted to discern the correlation of these and other risk factors with PHG.

    Alternative Fuels for Diesel Engines: New Frontiers

    The world at present depends mainly on petroleum-derived fuels to meet its energy requirements. However, volatility in crude prices, concerns about the long-term availability of these fuels, and the environmental degradation caused by their combustion have put renewable alternative fuels at the forefront of policy makers' agendas. Diesel engines are considered the workhorse of the global economy owing to their better thermal efficiency, ruggedness, and load-carrying capacity. They are, however, also a major contributor to air pollution, as they emit more oxides of nitrogen and suspended particulate matter than gasoline engines. The most promising fuels for supplementing or substituting diesel include biodiesel, butanol, producer gas, dimethyl ether, and hydrogen. This chapter presents developments in the use of alternative fuels in diesel engines. An exhaustive review of the literature traces the main trends in the development of alternative fuels around the world. The chapter also describes research directions on the production and use of alternative fuels in off-road and transport vehicles powered by diesel engines.

    Metric and Representation Learning

    All data has some inherent mathematical structure. I am interested in understanding the intrinsic geometric and probabilistic structure of data in order to design effective algorithms and tools that can be applied to machine learning and across all branches of science. The focus of this thesis is to increase the effectiveness of machine learning techniques by developing a mathematical and algorithmic framework with which, given any type of data, we can learn an optimal representation. Representation learning is done for many reasons: to repair corrupted data, to learn a low-dimensional or simpler representation from high-dimensional data or a very complex representation, or because the current representation of the data does not capture its important geometric features. One of the many challenges in representation learning is determining how to judge the quality of the representation learned. In many cases, the consensus is that if d is the natural metric on the representation, then this metric should provide meaningful information about the data. Many examples of this can be seen in areas such as metric learning, manifold learning, and graph embedding. However, most algorithms that solve these problems learn a representation in a metric space first and then extract a metric. A large part of my research explores what happens if the order is switched, that is, if we learn the appropriate metric first and the embedding later. The philosophy behind this approach is that understanding the inherent geometry of the data is the most crucial part of representation learning. Often, studying the properties of the appropriate metric on the input data indicates the type of space we should be seeking for the representation, giving us more robust representations. Optimizing for the appropriate metric can also help overcome issues such as missing and noisy data. My projects fall into three different areas of representation learning: 1) geometric and probabilistic analysis of representation learning methods, 2) methods to learn optimal metrics on large datasets, and 3) applications. In the first category, we have three projects: designing optimal training data for denoising autoencoders; formulating a new optimal transport problem and understanding its geometric structure; and analyzing the robustness to perturbations of the solutions obtained from the classical multidimensional scaling algorithm versus that of the true solutions to the multidimensional scaling problem. For learning an optimal metric, we are given a dissimilarity matrix D̂, a function f, and a subset S of the space of all metrics, and we want to find D in S that minimizes f(D, D̂). In this thesis, we consider the version of the problem in which S is the space of metrics defined on a fixed graph: given a graph G, we let S be the space of all metrics defined via G. For this S, we consider a sparse objective function as well as convex objective functions. We also study the problem in which we want to learn a tree, and we show how the ideas behind learning the optimal metric can be applied to dimensionality reduction in the presence of missing data. Finally, we look at an application to real-world data, specifically reconstructing ancient Greek text.
    PhD, Applied and Interdisciplinary Mathematics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/169738/1/rsonthal_1.pd
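
    The "metric first, embedding later" idea can be illustrated with a minimal sketch: treat shortest-path distances on a weighted graph as the learned metric, then embed the points with classical multidimensional scaling (MDS). This is an illustrative pipeline on assumed toy data, not any specific algorithm from the thesis.

        # Graph metric via shortest paths, then classical MDS on that metric.
        import numpy as np
        from scipy.sparse import csr_matrix
        from scipy.sparse.csgraph import shortest_path

        # Toy weighted graph on 4 nodes (symmetric adjacency; 0 means no edge).
        W = np.array([[0, 1, 0, 0],
                      [1, 0, 2, 0],
                      [0, 2, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
        D = shortest_path(csr_matrix(W), directed=False)      # the graph metric

        # Classical MDS: double-center the squared distances, take top eigenvectors.
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n
        B = -0.5 * J @ (D ** 2) @ J
        eigvals, eigvecs = np.linalg.eigh(B)
        idx = np.argsort(eigvals)[::-1][:2]                   # top-2 components
        X = eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))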

    Generalized Metric Repair on Graphs

    Many modern data analysis algorithms either assume, or are considerably more efficient when, the distances between the data points satisfy a metric. However, as real data sets are noisy, they often do not possess this fundamental property. For this reason, Gilbert and Jain [A. Gilbert and L. Jain, 2017] and Fan et al. [C. Fan et al., 2018] introduced the closely related sparse metric repair and metric violation distance problems. Given a matrix representing all pairwise distances, the goal is to repair as few entries as possible so that they satisfy a metric. This problem was shown to be APX-hard, and an O(OPT^{1/3})-approximation was given, where OPT is the size of the optimal solution. In this paper, we generalize the problem by describing the distances via a possibly incomplete, positively weighted graph, where again the goal is to find the smallest number of weight modifications so that the resulting weights satisfy a metric. This natural generalization is more flexible, as it takes into account different relationships among the data points. We demonstrate the inherent combinatorial structure of the problem and give an approximation-preserving reduction from MULTICUT, which is hard to approximate within any constant factor assuming the Unique Games Conjecture (UGC). Conversely, we show that for any fixed constant ?, for the large class of ?-chordal graphs, the problem is fixed parameter tractable, answering an open question from previous work. Call a cycle broken if it contains an edge whose weight is larger than the sum of all its other edges, and call the amount of this difference its deficit. We present approximation algorithms, one depending on the maximum number of edges in a broken cycle and one depending on the number of distinct deficit values, both quantities that may naturally be small. Finally, we give an improved analysis of previous algorithms for complete graphs.
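
    The definitions of a broken cycle and its deficit translate directly into a few lines of code; the following is a minimal sketch of those definitions only, not of the paper's approximation algorithms.

        # A cycle (given as its list of edge weights) is "broken" if one edge weighs
        # more than the sum of the remaining edges; its deficit is that gap.
        def cycle_deficit(weights):
            total = sum(weights)
            heaviest = max(weights)
            gap = heaviest - (total - heaviest)
            return max(gap, 0.0)

        def is_broken(weights):
            return cycle_deficit(weights) > 0

        # Example: the triangle with weights (1, 2, 5) is broken with deficit 2.
        assert is_broken([1, 2, 5]) and cycle_deficit([1, 2, 5]) == 2
        assert not is_broken([1, 2, 3])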