3,079 research outputs found

Interpretable statistics for complex modelling: quantile and topological learning

Author: Padellini Tullia
Publication date: 22/02/2019

As the complexity of our data has increased exponentially in recent decades, so has our need for interpretable features. This thesis revolves around two paradigms for approaching this quest for insights. In the first part we focus on parametric models, where the problem of interpretability can be seen as a “parametrization selection”. We introduce a quantile-centric parametrization and show the advantages of our proposal in the context of regression, where it allows us to bridge the gap between classical generalized linear (mixed) models and increasingly popular quantile methods. The second part of the thesis, concerned with topological learning, tackles the problem from a non-parametric perspective. As topology can be thought of as a way of characterizing data in terms of their connectivity structure, it allows us to represent complex and possibly high-dimensional data through a few features, such as the number of connected components, loops and voids. We illustrate how the emerging branch of statistics devoted to recovering topological structures in the data, Topological Data Analysis, can be exploited for both exploratory and inferential purposes, with a special emphasis on kernels that preserve the topological information in the data. Finally, we show with an application how these two approaches can borrow strength from one another in the identification and description of brain activity through fMRI data from the ABIDE project.

Archivio della ricerca- Università di Roma La Sapienza

Short-term power load probability density forecasting method using kernel-based support vector quantile regression and Copula theory

Author: He Yaoyao
Li Haiyan
Liu Rui
Lu Xiaofen
Wang Shuo
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

University of Birmingham Research Portal

The impact of short tandem repeat variation on gene expression.

Author: Fotsing Stephanie Feupe
Goren Alon
Gymrek Melissa
Margoliash Jonathan
Saini Shubham
Shleizer-Burko Sharona
Wang Catherine
Yanicky Richard
Publication venue: eScholarship, University of California
Publication date: 01/11/2019

Short tandem repeats (STRs) have been implicated in a variety of complex traits in humans. However, genome-wide studies of the effects of STRs on gene expression thus far have had limited power to detect associations and provide insights into putative mechanisms. Here, we leverage whole-genome sequencing and expression data for 17 tissues from the Genotype-Tissue Expression Project to identify more than 28,000 STRs for which repeat number is associated with expression of nearby genes (eSTRs). We use fine-mapping to quantify the probability that each eSTR is causal and characterize the top 1,400 fine-mapped eSTRs. We identify hundreds of eSTRs linked with published genome-wide association study signals and implicate specific eSTRs in complex traits, including height, schizophrenia, inflammatory bowel disease and intelligence. Overall, our results support the hypothesis that eSTRs contribute to a range of human phenotypes, and our data should serve as a valuable resource for future studies of complex traits.

eScholarship - University of California

Using genomic annotations increases statistical power to detect eGenes.

Author: Duong Dat
Ernst Jason
Eskin Eleazar
Han Buhm
Hormozdiari Farhad
Sul Jae Hoon
Zou Jennifer
Publication venue: eScholarship, University of California
Publication date: 01/06/2016

Motivation: Expression quantitative trait loci (eQTLs) are genetic variants that affect gene expression. In eQTL studies, one important task is to find eGenes, or genes whose expression is associated with at least one eQTL. The standard statistical method to determine whether a gene is an eGene requires association testing at all nearby variants and a permutation test to correct for multiple testing. The standard method, however, does not consider genomic annotation of the variants. In practice, variants near gene transcription start sites (TSSs) or certain histone modifications are likely to regulate gene expression. In this article, we introduce a novel eGene detection method that considers this empirical evidence and thereby increases the statistical power. Results: We applied our method to the liver Genotype-Tissue Expression (GTEx) data using distance from TSSs, DNase hypersensitivity sites, and six histone modifications as the genomic annotations for the variants. Each of these annotations helped us detect more candidate eGenes. Distance from TSS appears to be the most important annotation; specifically, using this annotation, our method discovered 50% more candidate eGenes than the standard permutation method.

SNU Open Repository and Archive

PubMed Central

eScholarship - University of California

Additive models for quantile regression: model selection and confidence bandaids

Author: Roger Koenker

Additive models for conditional quantile functions provide an attractive framework for nonparametric regression applications focused on features of the response beyond its central tendency. Total variation roughness penalties can be used to control the smoothness of the additive components much as squared Sobolev penalties are used for classical L2 smoothing splines. We describe a general approach to estimation and inference for additive models of this type. We focus attention primarily on selection of smoothing parameters and on the construction of confidence bands for the nonparametric components. Both pointwise and uniform confidence bands are introduced; the uniform bands are based on the Hotelling (1939) tube approach. Some simulation evidence is presented to evaluate finite sample performance and the methods are also illustrated with an application to modeling childhood malnutrition in India.

Research Papers in Economics

Early Detection Techniques for Market Risk Failure

Author: Olmo J.
Pouliot W.
Publication venue: Department of Economics, City University London
Publication date: 01/01/2008

The implementation of appropriate statistical techniques for monitoring conditional VaR models, i.e., backtesting, reported by institutions is fundamental to determine their exposure to market risk. Backtesting techniques are important since the severity of the departures of the VaR model from market results determines the penalties imposed for inadequate VaR models. In this paper we make six contributions to backtesting techniques. In particular, we show that the Kupiec test can be viewed as a combination of CUSUM change point tests; we detail the lack of power of CUSUM methods in detecting violations of VaR as soon as these occur; we develop an alternative technique based on weighted U-statistic type processes that have power against wrong specifications of the risk measure and early detection; we show these new backtesting techniques are robust to the presence of estimation risk; we construct a new class of weight functions that can be used to weight our processes; and our methods are applicable both under conditional and unconditional VaR settings.

City Research Online

Mathematical Statistics of Partially Identified Objects

Publication venue: Zürich : EMS Publ. House
Publication date: 01/01/2013

The workshop brought together leading experts in mathematical statistics, theoretical econometrics and bio-mathematics interested in mathematical objects occurring in the analysis of partially identified structures. The mathematical core of these ubiquitous structures has an impact on all three research areas and is expected to lead to the development of new algorithms for solving such problems.

Repositorium für Naturwissenschaften und Technik

Intersection bounds: estimation and inference

Author: Adam Rosen
Sokbae 'Simon' Lee
Victor Chernozhukov

We develop a practical and novel method for inference on intersection bounds, namely bounds defined by either the infimum or supremum of a parametric or nonparametric function, or equivalently, the value of a linear programming problem with a potentially infinite constraint set. Our approach is especially convenient for models comprised of a continuum of inequalities that are separable in parameters, and also applies to models with inequalities that are non-separable in parameters. Since analog estimators for intersection bounds can be severely biased in finite samples, routinely underestimating the size of the identified set, we also offer a median-bias-corrected estimator of such bounds as a natural by-product of our inferential procedures. We develop theory for large sample inference based on the strong approximation of a sequence of series or kernel-based empirical processes by a sequence of "penultimate" Gaussian processes. These penultimate processes are generally not weakly convergent, and thus non-Donsker. Our theoretical results establish that we can nonetheless perform asymptotically valid inference based on these processes. Our construction also provides new adaptive inequality/moment selection methods. We provide conditions for the use of nonparametric kernel and series estimators, including a novel result that establishes strong approximation for any general series estimator admitting linearization, which may be of independent interest.

Research Papers in Economics

A global transcriptional network connecting noncoding mutations to changes in tumor gene expression.

Author: Bojorquez-Gomez Ana
Carter Hannah
Chen Kevin
Farley Emma K
Fraley Stephanie I
Huang Justin K
Ideker Trey
Kreisberg Jason F
Licon Katherine
Melton Collin
Olson Katrina M
Sanchez Kyle S
Shen John Paul
Snyder Michael
Velez Daniel Ortiz
Xu Guorong
Yu Michael Ku
Zhang Wei
Publication venue: eScholarship, University of California
Publication date: 01/04/2018

Although cancer genomes are replete with noncoding mutations, the effects of these mutations remain poorly characterized. Here we perform an integrative analysis of 930 tumor whole genomes and matched transcriptomes, identifying a network of 193 noncoding loci in which mutations disrupt target gene expression. These 'somatic eQTLs' (expression quantitative trait loci) are frequently mutated in specific cancer tissues, and the majority can be validated in an independent cohort of 3,382 tumors. Among these, we find that the effects of noncoding mutations on DAAM1, MTG2 and HYI transcription are recapitulated in multiple cancer cell lines and that increasing DAAM1 expression leads to invasive cell migration. Collectively, the noncoding loci converge on a set of core pathways, permitting a classification of tumors into pathway-based subtypes. The somatic eQTL network is disrupted in 88% of tumors, suggesting widespread impact of noncoding mutations in cancer.

eScholarship - University of California