
Random projections for Bayesian regression

Author: Geppert Leo N.
Ickstadt Katja
Munteanu Alexander
Quedenfeld Jens
Sohler Christian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/11/2015

This article deals with random projections applied as a data reduction technique for Bayesian regression analysis. We show sufficient conditions under which the entire $d$-dimensional distribution is approximately preserved under random projections by reducing the number of data points from $n$ to $k \in O(\operatorname{poly}(d/\varepsilon))$ in the case $n \gg d$. Under mild assumptions, we prove that evaluating a Gaussian likelihood function based on the projected data instead of the original data yields a $(1+O(\varepsilon))$-approximation in terms of the $\ell_2$ Wasserstein distance. Our main result shows that the posterior distribution of Bayesian linear regression is approximated up to a small error depending on only an $\varepsilon$-fraction of its defining parameters. This holds when using arbitrary Gaussian priors or the degenerate case of uniform distributions over $\mathbb{R}^d$ for $\beta$. Our empirical evaluations involve different simulated settings of Bayesian linear regression. Our experiments underline that the proposed method is able to recover the regression model up to small error while considerably reducing the total running time.
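A minimal sketch of the idea, assuming a Gaussian likelihood with known noise variance `sigma2` and a Gaussian prior $N(m_0, S_0)$ on $\beta$: the stacked data $[X, y]$ is compressed from $n$ to $k$ rows with a single random sketching matrix, and the conjugate posterior is then computed from the compressed rows only. CountSketch is used here merely as one example of a sparse random projection. The helper names `countsketch` and `blr_posterior` are hypothetical, and the sketch size `k` is picked by hand rather than derived from the article's bounds.

```python
import numpy as np


def countsketch(A, k, rng):
    """Compress the rows of A (n x m) to k rows with a CountSketch projection.

    Each original row is added, with a random sign, to one uniformly chosen
    output row, so S @ A is computed in O(n * m) time without forming S.
    """
    n = A.shape[0]
    buckets = rng.integers(0, k, size=n)          # target row for each data point
    signs = rng.choice([-1.0, 1.0], size=n)       # random +-1 sign per data point
    SA = np.zeros((k, A.shape[1]))
    np.add.at(SA, buckets, signs[:, None] * A)    # unbuffered scatter-add
    return SA


def blr_posterior(X, y, sigma2, m0, S0):
    """Conjugate Gaussian posterior of beta for y ~ N(X beta, sigma2 * I)."""
    P0 = np.linalg.inv(S0)                        # prior precision
    precision = P0 + X.T @ X / sigma2
    cov = np.linalg.inv(precision)
    mean = cov @ (P0 @ m0 + X.T @ y / sigma2)
    return mean, cov


# Example: posterior from k = 500 sketched rows vs. from all n = 5000 rows.
rng = np.random.default_rng(1)
n, d, k, sigma2 = 5000, 3, 500, 0.01
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=np.sqrt(sigma2), size=n)

m_full, C_full = blr_posterior(X, y, sigma2, np.zeros(d), np.eye(d))
Z = countsketch(np.column_stack([X, y]), k, rng)  # sketch X and y with the SAME S
m_sketch, C_sketch = blr_posterior(Z[:, :d], Z[:, d], sigma2, np.zeros(d), np.eye(d))
```

Sketching `X` and `y` jointly matters: both must be multiplied by the same random matrix, otherwise the projected likelihood no longer corresponds to the original one. How large `k` must be for a given accuracy is exactly what the article's guarantees address; this toy example does not attempt to reproduce them.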

arXiv.org e-Print Archive

Springer - Publisher Connector

Bayesian and frequentist regression approaches for very large data sets

Author: Geppert Leo Nikolaus

This thesis is concerned with the analysis of frequentist and Bayesian regression models for data sets with a very large number of observations. Such large data sets pose a challenge for regression analysis because of the memory required (mainly for frequentist regression models) and the running time of the analysis (mainly for Bayesian regression models). I present two different approaches that can be employed in this setting. The first approach is based on random projections and reduces the number of observations to a manageable level as a first step before the regression analysis. The reduced number of observations depends on the number of variables in the data set and the desired quality of the approximation. It is, however, independent of the number of observations in the original data set, making it especially useful for very large data sets. Theoretical guarantees for Bayesian linear regression are presented, which extend known guarantees for the frequentist case. The fundamental theorem covers Bayesian linear regression with arbitrary normal distributions or non-informative uniform distributions as prior distributions. I evaluate how close the posterior distributions of the original model and of the reduced data set are, both for this theoretically covered case and for extensions towards hierarchical models and models using q-generalised normal distributions as priors. The second approach transfers the Merge & Reduce principle from data structures to regression models. In computer science, Merge & Reduce is employed to enable the use of static data structures in a streaming setting. Here, I present three possibilities of employing Merge & Reduce directly on regression models. This enables sequential or parallel analysis of subsets of the data set. The partial results are then combined in a way that recovers the regression model on the full data set well. This approach is suitable for a wide range of regression models.
I evaluate the performance on simulated and real-world data sets using linear and Poisson regression models. Both approaches are able to recover regression models on the original data set well. They thus offer scalable versions of frequentist or Bayesian regression analysis for linear regression as well as extensions to generalised linear models, hierarchical models, and q-generalised normal distributions as prior distributions. Application to data streams or in distributed settings is also possible. Both approaches can be combined with multiple algorithms for frequentist or Bayesian regression analysis.

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Streaming statistical models via Merge & Reduce

Author: Geppert Leo N.
Ickstadt Katja
Munteanu Alexander
Sohler Christian
Publication date: 12/06/2020

Merge & Reduce is a general algorithmic scheme in the theory of data structures. Its main purpose is to transform static data structures, which support only queries, into dynamic data structures, which allow insertions of new elements, with as little overhead as possible. This can be used to turn classic offline algorithms for summarizing and analyzing data into streaming algorithms. We transfer these ideas to the setting of statistical data analysis in streaming environments. Our approach is conceptually different from previous settings where Merge & Reduce has been employed. Instead of summarizing the data, we combine the Merge & Reduce framework directly with statistical models. This enables performing computationally demanding data analysis tasks on massive data sets. The computations are divided into small tractable batches whose size is independent of the total number of observations $n$. The results are combined in a structured way at the cost of a bounded $O(\log n)$ factor in their memory requirements. It is only necessary, though nontrivial, to choose an appropriate statistical model and design merge and reduce operations on a casewise basis for the specific type of model. We illustrate our Merge & Reduce schemes on simulated and real-world data employing (Bayesian) linear regression models, Gaussian mixture models and generalized linear models.
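A minimal sketch of a Merge & Reduce scheme applied directly to a regression model. It assumes a hypothetical reduce step that summarizes a batch by its least-squares fit together with $X^\top X$, and a hypothetical merge step that combines two summaries by precision weighting. This merge rule is chosen only because it is easy to check (for ordinary least squares it reproduces the full-data estimate exactly); it is not claimed to be one of the merge operations designed in the article. The binary-counter layout stores at most one summary per level, so roughly $\log_2$ of the number of batches summaries are kept at any time.

```python
import numpy as np


def reduce_batch(X, y):
    """Hypothetical 'reduce': summarize one batch by (X^T X, OLS coefficients)."""
    gram = X.T @ X
    beta = np.linalg.solve(gram, X.T @ y)
    return gram, beta


def merge_summaries(s1, s2):
    """Hypothetical 'merge': precision-weighted combination of two OLS fits.

    Since gram @ beta equals X^T y for each summary, the merged estimate equals
    the OLS fit on the union of the two underlying batches.
    """
    (G1, b1), (G2, b2) = s1, s2
    G = G1 + G2
    return G, np.linalg.solve(G, G1 @ b1 + G2 @ b2)


def merge_and_reduce(batches):
    """Process a stream of (X, y) batches, keeping at most one summary per level."""
    levels = {}                       # level -> summary covering 2**level batches
    for X, y in batches:
        summary, level = reduce_batch(X, y), 0
        while level in levels:        # carry, as when incrementing a binary counter
            summary = merge_summaries(levels.pop(level), summary)
            level += 1
        levels[level] = summary
    # Combine the remaining summaries (one per set bit of the batch count).
    remaining = list(levels.values())
    result = remaining[0]
    for s in remaining[1:]:
        result = merge_summaries(result, s)
    return result, len(levels)


# Example: 13 batches of 50 observations each.
rng = np.random.default_rng(0)
X = rng.normal(size=(650, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=650)
batches = [(X[i:i + 50], y[i:i + 50]) for i in range(0, 650, 50)]
(gram, beta), n_stored = merge_and_reduce(batches)  # 13 = 0b1101, so 3 summaries stored
```

Each batch has a fixed size independent of the stream length, and only the per-level summaries are retained, which mirrors the logarithmic memory overhead described above. For models without exact sufficient statistics, the merge step would only approximate the full-data fit.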

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Providing Information by Resource-Constrained Data Analysis

The Collaborative Research Center SFB 876 (Providing Information by Resource-Constrained Data Analysis) brings together the research fields of data analysis (Data Mining, Knowledge Discovery in Data Bases, Machine Learning, Statistics) and embedded systems and enhances their methods such that information from distributed, dynamic masses of data becomes available anytime and anywhere. The research center approaches these problems with new algorithms respecting the resource constraints in the different scenarios. This Technical Report presents the work of the members of the integrated graduate school.

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Validity of stable isotope data in doping control: perspectives and proposals

Author: Flenker Ulrich
Geppert Leo N.
Ickstadt Katja
Publication venue: 'Wiley'
Publication date: 01/01/2012

Δ13C and δ13C values of endogenous urinary steroids represent physiological random variables. Measurement uncertainty and biological scatter likewise contribute to the variances. The statistical distributions of negative controls are well investigated, but there is little knowledge about the corresponding distributions of steroid users. For these reasons, valid discrimination of steroid users from non-users by 13C/12C analysis of endogenous steroids requires elaborate statistical treatment. Corresponding Bayesian approaches are presented following an introduction to the rationale. The use of mixture models appears appropriate. The distribution of routine data has been deconvolved and characterized accordingly. The mixture components, which presumably represent steroid users and non-users, exhibit considerable overlap. The validity of a given result depends on both the analytical uncertainty and the prior probability of doping offenses. Low analytical uncertainties and high prior probabilities facilitate valid detection of doping offenses. Two recommendations can be deduced. First, before starting a 13C/12C analysis, any initial suspicion should be well-substantiated. This precludes use of permissive criteria derived from the steroid profile. Second, knowledge of relevant 13C/12C distributions is required. This must cover representative numbers of authentic steroid users. Finally, it is desirable that the conditional probability for steroid administration, rather than the measurement uncertainty, is calculated and reported. This quantity possesses superior validity and is largely independent of laboratory bias. The findings suggest and facilitate flexible handling of decision limits. Proposals for the evaluation of stable isotope data are presented. Copyright (c) 2012 John Wiley & Sons, Ltd
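The point that validity depends on both the analytical uncertainty and the prior probability of doping can be illustrated with Bayes' rule on a two-component normal mixture. The sketch below assumes one normal component for non-users and one for users, and models measurement uncertainty by adding its variance to each component's biological scatter. The function name `prob_steroid_use` and all numbers are hypothetical and are not taken from the article.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_steroid_use(x, prior_user, nonuser, user, meas_sd=0.0):
    """Posterior probability of steroid use given a measured value x.

    nonuser and user are (mean, sd) pairs describing the biological scatter of
    each hypothetical mixture component; meas_sd is the analytical uncertainty,
    combined with each component's sd in quadrature.
    """
    sd0 = math.hypot(nonuser[1], meas_sd)
    sd1 = math.hypot(user[1], meas_sd)
    like_user = prior_user * normal_pdf(x, user[0], sd1)
    like_nonuser = (1.0 - prior_user) * normal_pdf(x, nonuser[0], sd0)
    return like_user / (like_user + like_nonuser)


# Hypothetical components (per mil): non-users centred at 0, users at -4.
for prior in (0.01, 0.5):
    print(prior, prob_steroid_use(-3.0, prior, nonuser=(0.0, 1.0), user=(-4.0, 1.0)))
```

With these made-up numbers the same measurement of -3 yields a posterior probability of about 0.36 at a prior of 0.01 but about 0.98 at a prior of 0.5, which is why a well-substantiated initial suspicion matters. Increasing `meas_sd` widens both components and pulls the result back towards the prior.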

Crossref

Kölner UniversitätsPublikationsServer

Random projections for Bayesian regression

Author: Geppert Leo N.
Ickstadt Katja
Munteanu Alexander
Sohler Christian

This article introduces random projections applied as a data reduction technique for Bayesian regression analysis. We show sufficient conditions under which the entire $d$-dimensional distribution is preserved under random projections by reducing the number of data points from $n$ to $k \in O(\operatorname{poly}(d/\varepsilon))$ in the case $n \gg d$. Under mild assumptions, we prove that evaluating a Gaussian likelihood function based on the projected data instead of the original data yields a $(1+O(\varepsilon))$-approximation in the $\ell_2$-Wasserstein distance. Our main result states that the posterior distribution of a Bayesian linear regression is approximated up to a small error depending on only an $\varepsilon$-fraction of its defining parameters when using either improper non-informative priors or arbitrary Gaussian priors. Our empirical evaluations involve different simulated settings of Bayesian linear regression. Our experiments underline that the proposed method is able to recover the regression model while considerably reducing the total run-time.
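For two Gaussian distributions the $\ell_2$-Wasserstein distance has a closed form, $W_2^2 = \lVert m_1 - m_2\rVert^2 + \operatorname{tr}\bigl(\Sigma_1 + \Sigma_2 - 2(\Sigma_2^{1/2}\Sigma_1\Sigma_2^{1/2})^{1/2}\bigr)$, which makes it straightforward to compare, for example, a Gaussian posterior computed from the original data with one computed from projected data. The following is a minimal NumPy sketch of that formula; the function names are hypothetical, and this is not claimed to be the evaluation code used in the article.

```python
import numpy as np


def psd_sqrt(M):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    w, V = np.linalg.eigh((M + M.T) / 2.0)        # symmetrize against round-off
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def wasserstein2_gaussian(m1, S1, m2, S2):
    """l2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    r2 = psd_sqrt(S2)
    cross = psd_sqrt(r2 @ S1 @ r2)
    w2_sq = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return float(np.sqrt(max(w2_sq, 0.0)))        # clip tiny negative round-off
```

The matrix square roots are taken through an eigendecomposition, which is valid here because both matrices passed to `psd_sqrt` are symmetric positive semi-definite covariance-type matrices.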

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Comparison of the capacity of murine and human class I MHC molecules to stimulate T cell activation

Author: Brodsky
Brorson
Connolly
Dasgupta
Ellis
Evans
Geppert
Geppert
Geppert
Geppert
Gilliland
Guild
Guild
Guild
Gur
Hahn
Hansen
Herzenberg
Houlden
Imboden
Klein
Kroczek
Lee
Leo
Malek
Mittler
Ozato
Ploegh
Potter
Potter
Rebaï
Salter
Salter
Schraven
Sharon
Taurog
Taylor
Turco
Wacholtz
Publication venue: 'Elsevier BV'

Crossref

Mortality Among Very Low-Birthweight Infants in Hospitals Serving Minority Populations

Author: Carpenter Joseph
Geppert Jeffrey
Horbar Jeffrey D.
Kenny Michael
Morales Leo S.
Rogowski Jeannette
Staiger Douglas
Publication date: 01/12/2005

Objective. We investigated whether the proportion of Black very low-birth-weight (VLBW) infants treated by hospitals is associated with neonatal mortality for Black and White VLBW infants. Methods. We analyzed medical records linked to secondary data sources for 74,050 Black and White VLBW infants (501 g to 1500 g) treated by 332 hospitals participating in the Vermont Oxford Network from 1995 to 2000. Hospitals where more than 35% of VLBW infants treated were Black were defined as “minority-serving.” Results. Compared with hospitals where less than 15% of the VLBW infants were Black, minority-serving hospitals had significantly higher risk-adjusted neonatal mortality rates (White infants: odds ratio [OR] = 1.30, 95% confidence interval [CI] = 1.09, 1.56; Black infants: OR = 1.29, 95% CI = 1.01, 1.64; pooled: OR = 1.28, 95% CI = 1.10, 1.50). Higher neonatal mortality in minority-serving hospitals was not explained by either hospital or treatment variables. Conclusions. Minority-serving hospitals may provide lower quality of care to VLBW infants compared with other hospitals. Because VLBW Black infants are disproportionately treated by minority-serving hospitals, higher neonatal mortality rates at these hospitals may contribute to racial disparities in infant mortality in the United States.

Crossref

PubMed Central

Influence of breast cancer risk factors and intramammary biotransformation on estrogen homeostasis in the human breast

Author: Esch Harald L.
Geppert Leo N.
Hauptstein René
Ickstadt Katja
Kleider Carolin
Lehmann Leane
Pemp Daniela
Schmalbach Katja
Wigmann Claudia
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Understanding intramammary estrogen homeostasis constitutes the basis of understanding the role of lifestyle factors in breast cancer etiology. Thus, the aim of the present study was to identify, using multiple linear regression models, variables influencing the levels of the estrogens 17β-estradiol, estrone, estrone-3-sulfate, and 2-methoxy-estrone in normal breast glandular and adipose tissues (GLT and ADT). Explanatory variables (exVARs) considered were (a) levels of metabolic precursors as well as levels of transcripts encoding proteins involved in estrogen (biotrans)formation, (b) data on breast cancer risk factors (i.e., body mass index (BMI), intake of estrogen-active drugs, and smoking) collected by questionnaire, and (c) tissue characteristics (i.e., mass percentage of oil (oil%) and lobule type of the GLT). Levels of estrogens in GLT and ADT were influenced by extramammary production (menopausal status, intake of estrogen-active drugs, and BMI), showing for the first time that variables known to affect levels of circulating estrogens also influence estrogen levels in breast tissues. Moreover, intratissue (biotrans)formation (by aromatase, hydroxysteroid-17beta-dehydrogenase 2, and beta-glucuronidase) also influenced intratissue estrogen levels. Distinct differences were observed between the exVARs exhibiting significant influence on (a) levels of specific estrogens and (b) the same dependent variables in GLT and ADT. Since oil% and lobule type of GLT influenced levels of some estrogens, these variables may be included in tissue characterization to prevent sample bias. In conclusion, evidence for the intracrine activity of the human breast supports biotransformation-based strategies for breast cancer prevention. The susceptibility of estrogen homeostasis to systemic and tissue-specific modulation makes both beneficial and adverse effects of further variables associated with lifestyle and the environment possible.

Online-Publikations-Server der Universität Würzburg

Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (hbz)

Influence of breast cancer risk factors on proliferation and DNA damage in human breast glandular tissues: role of intracellular estrogen levels, oxidative stress and estrogen biotransformation

Author: Cecil Alexander
Dandekar Thomas
Esch Harald L.
Geppert Leo N.
Hauptstein René
Ickstadt Katja
Lehmann Leane
Mahdiani Maryam
Pemp Daniela
Schmalbach Katja
Wunder Juliane
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

Breast cancer etiology is associated with both proliferation and DNA damage induced by estrogens. Breast cancer risk factors (BCRF) such as body mass index (BMI), smoking, and intake of estrogen-active drugs were recently shown to influence intratissue estrogen levels. Thus, the aim of the present study was to investigate the influence of BCRF on estrogen-induced proliferation and DNA damage in 41 well-characterized breast glandular tissues derived from women without breast cancer. The influence of intramammary estrogen levels and BCRF on estrogen receptor (ESR) activation, ESR-related proliferation (indicated by levels of marker transcripts), oxidative stress (indicated by levels of GCLC transcript and oxidative derivatives of cholesterol), and levels of transcripts encoding enzymes involved in estrogen biotransformation was identified by multiple linear regression models. Metabolic fluxes to adducts of estrogens with DNA (E-DNA) were assessed by a metabolic network model (MNM), which was validated by comparing calculated fluxes with data on methoxylated and glucuronidated estrogens determined by GC- and UHPLC-MS/MS. Intratissue estrogen levels significantly influenced ESR activation and fluxes to E-DNA within the MNM. Likewise, all BCRF directly and/or indirectly influenced ESR activation, proliferation, and key flux constraints influencing E-DNA (i.e., levels of estrogens, CYP1B1, SULT1A1, SULT1A2, and GSTP1). However, no unambiguous total effect of BCRF on proliferation became apparent. Furthermore, BMI was the only BCRF that actually influenced fluxes to E-DNA (via congruent adverse influence on levels of estrogens, CYP1B1, and SULT1A2).

PubMed Central

Online-Publikations-Server der Universität Würzburg

Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (hbz)