Moving Beyond Sub-Gaussianity in High-Dimensional Statistics: Applications in Covariance Estimation and Linear Regression
Concentration inequalities form an essential toolkit in the study of high
dimensional (HD) statistical methods. Most of the relevant statistics
literature in this regard is based on sub-Gaussian or sub-exponential tail
assumptions. In this paper, we first bring together various probabilistic
inequalities for sums of independent random variables under much weaker
exponential type (namely sub-Weibull) tail assumptions. These results extract a
partly sub-Gaussian tail behavior in finite samples, matching the asymptotics
governed by the central limit theorem, and are compactly represented in terms
of a new Orlicz quasi-norm - the Generalized Bernstein-Orlicz norm - that
typifies such tail behaviors.
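
For orientation, the tail assumptions named above can be written out as follows. This is a sketch of the standard Orlicz-norm definitions, in notation assumed here rather than taken from the abstract; the Generalized Bernstein-Orlicz norm itself is not reproduced.

```latex
% Assumed notation: \psi_\alpha(x) = \exp(x^\alpha) - 1 for \alpha > 0.
% Orlicz (quasi-)norm of a real random variable X:
\[
  \|X\|_{\psi_\alpha}
  = \inf\bigl\{\eta > 0 : \mathbb{E}\,\psi_\alpha(|X|/\eta) \le 1\bigr\}.
\]
% X is called sub-Weibull of order \alpha when this quantity is finite.
% Markov's inequality applied to \exp((|X|/\eta)^\alpha) then gives
\[
  \mathbb{P}\bigl(|X| \ge t\bigr)
  \le 2\exp\bigl(-(t/\|X\|_{\psi_\alpha})^{\alpha}\bigr),
  \qquad t \ge 0.
\]
% \alpha = 2 recovers sub-Gaussian tails and \alpha = 1 sub-exponential tails;
% \alpha < 1 allows heavier tails, where \|\cdot\|_{\psi_\alpha} is only a quasi-norm.
```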
We illustrate the usefulness of these inequalities through the analysis of
four fundamental problems in HD statistics. In the first two problems, we study
the rate of convergence of the sample covariance matrix in terms of the maximum
elementwise norm and the maximum k-sub-matrix operator norm which are key
quantities of interest in bootstrap, HD covariance matrix estimation and HD
inference. The third example concerns the restricted eigenvalue condition,
required in HD linear regression, which we verify for all sub-Weibull random
vectors through a unified analysis, and also prove a more general result
related to restricted strong convexity in the process. In the final example, we
consider the Lasso estimator for linear regression and establish its rate of
convergence under much weaker than usual tail assumptions (on the errors as
well as the covariates), while also allowing for misspecified models and both
fixed and random design. To our knowledge, these are the first such results for
Lasso obtained in this generality. The common feature in all our results over
all the examples is that the convergence rates under most exponential tails
match the usual ones under sub-Gaussian assumptions.
Comment: 64 pages; Revised version (discussions added and some results
modified in Section 4, minor changes made throughout)
Three essays in development economics
This thesis examines the employment struggles of youth in a small developing country, Sri Lanka. Youth unemployment rates have consistently exceeded adult unemployment rates for many decades, but the root causes of the poor youth transition from school to work have not been explored. As a result, many important labor market policies intended to ameliorate this situation are being adopted without a firm grasp of the realities on the ground. This study aims to provide detailed, systematic evidence on the school-to-work transition for Sri Lankan youth, which would, in turn, help to improve the policy responses that the Government of Sri Lanka has adopted to tackle this problem.
The contributions of this thesis are as follows:
1. It lays out the difficulties that Sri Lankan youth face in making the transition from school to work. It addresses the issue of whether school leavers have the appropriate skills to thrive in the labor market. It highlights the need to target early school dropouts with various learning opportunities that can help improve their employment prospects. Key interventions here are work-skills training through job training programs and stand-alone vocational training programs provided by both the private sector and NGOs.
2. It provides evidence that early out-of-work experiences tend to be damaging to future job prospects. Our study constitutes the first attempt ever to provide rigorous statistical estimates on this issue for Sri Lanka.
3. It provides a strong evidence-based framework for assessing training programs aimed at improving the labor market prospects of Sri Lankan youth, by undertaking rigorous evaluation of these programs. By doing so, we improve knowledge about youth employment in a country that has traditionally underemphasized the collection of labor market outcomes data.
The role of E-cadherin/β-catenin signalling in the development of an asthmatic airway epithelial phenotype
Asthma is characterized by reversible narrowing of the airways and airway obstruction, caused by inflammation, airway wall thickening and mucus production. Cells lining the airways, called epithelial cells, are thought to drive the development of asthma. When damaged, airway epithelial cells secrete signals called cytokines, such as CCL20 and GM-CSF, which attract and activate immune cells to induce airway inflammation. Airway epithelial cells form a tight barrier against the inhaled environment with the help of junction proteins, including E-cadherin and β-catenin. This barrier can be disrupted upon exposure to environmental insults such as house dust mite (HDM). E-cadherin is reduced in the airways of asthma patients, leading to loss of barrier function. Additionally, E-cadherin loss leads to the release of β-catenin into the cell, where it serves as a signal to activate genes involved in inflammatory responses, airway wall remodelling and differentiation of epithelial cells towards mucus-producing cells. We hypothesized that activation of β-catenin signalling leads to abnormalities of airway epithelial cells as observed in asthma. To test this, we used the small molecule inhibitor ICG-001 to specifically block β-catenin signalling. We cultured airway epithelial cells from asthma and healthy donors, exposed these to HDM, and studied the effect of ICG-001 treatment. We found that ICG-001 improved airway epithelial barrier function, reduced the release of HDM-induced CCL20 and GM-CSF and inhibited differentiation towards mucus-producing cells. In a mouse model of asthma, ICG-001 also inhibited mucus production upon repeated HDM inhalation. Finally, we also observed that deletion of the E-cadherin gene in mice was sufficient to cause spontaneous airway inflammation and did not further increase HDM-induced inflammation. In conclusion, we show that β-catenin signalling contributes to the development of asthma features and could therefore be a novel target for therapeutic intervention.
Linking Rare and Popular Tags in CQA Sites
Community Question Answer (CQA) sites are popular means for sharing knowledge in the form of questions and answers. These sites rely on tags for many purposes, such as content organization, question routing, content searching, etc. Each CQA site has thousands of tags, making it challenging for the users to manually annotate their question posts with the appropriate tags. Understanding the semantic relationships between tags could aid in the tagging process, thereby properly routing the questions to the experts. Although it is relatively easier to mine the semantic relationships amongst the frequently used popular tags, it is difficult to do so for the less commonly used rare tags due to a lack of information about them. Most often, the rare tags are specific concepts subsumed by popular tags. For the questions to be routed to the right experts, they must be annotated with a proper mix of both popular and rare tags. In this paper, we propose a novel approach to mine the semantic relationships between the rare and the popular tags. In addition, we show that the methods that are proposed to mine semantic relationships between popular tags cannot be used for rare tags. Specifically, we identify the top-k popular tags that are semantically related to a given rare tag, which is done using a set of semantic and topological features. Extensive evaluations on CQA datasets show the superiority of our proposed method over state-of-the-art methods.
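
To make the retrieval task concrete, the sketch below ranks popular tags for a given rare tag by raw co-occurrence counts. This is not the method proposed in the paper, which relies on semantic and topological features. It is a naive baseline, and the function name, the `popular_min_count` threshold and the toy data are all hypothetical.

```python
from collections import Counter


def top_k_popular_tags(questions, rare_tag, k=2, popular_min_count=3):
    """Return up to k popular tags that co-occur most often with rare_tag.

    A tag counts as "popular" when it appears on at least popular_min_count
    questions (an assumed threshold, chosen only for illustration).
    """
    # Count how many questions each tag appears on.
    tag_counts = Counter(tag for tags in questions for tag in set(tags))
    popular = {t for t, c in tag_counts.items() if c >= popular_min_count}

    # Count co-occurrences of each popular tag with the rare tag.
    co_counts = Counter()
    for tags in questions:
        tag_set = set(tags)
        if rare_tag in tag_set:
            for tag in tag_set & popular:
                if tag != rare_tag:
                    co_counts[tag] += 1

    # Highest co-occurrence first; ties broken alphabetically.
    ranked = sorted(co_counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


# Toy question posts, each represented by its list of tags (made-up data).
questions = [
    ["python", "pandas", "dataframe-merge"],
    ["python", "pandas"],
    ["python", "numpy"],
    ["java", "spring"],
    ["java", "jvm"],
    ["java", "python"],
    ["pandas", "dataframe-merge"],
]

print(top_k_popular_tags(questions, "dataframe-merge", k=2))
```

A rare tag has few posts by definition, so counts like these are sparse and noisy, which is consistent with the abstract's point that approaches designed for popular tags do not carry over directly.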