
    Statistical Methods in Topological Data Analysis for Complex, High-Dimensional Data

    The utilization of statistical methods and their applications within the new field of study known as Topological Data Analysis has tremendous potential for broadening our exploration and understanding of complex, high-dimensional data spaces. This paper provides an introductory overview of the mathematical underpinnings of Topological Data Analysis, the workflow to convert samples of data to topological summary statistics, and some of the statistical methods developed for performing inference on these topological summary statistics. The intention of this non-technical overview is to motivate statisticians who are interested in learning more about the subject. Comment: 15 pages, 7 figures, 27th Annual Conference on Applied Statistics in Agriculture.
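    As a minimal illustration of the workflow described above (point cloud, then persistence diagram, then topological summary statistic), the Python sketch below uses the third-party ripser and numpy packages. The sample data and the choice of total persistence as the summary are illustrative assumptions, not examples taken from the paper.

```python
import numpy as np
from ripser import ripser  # third-party persistent homology package (assumed available)

# Sample a noisy circle: a point cloud whose H1 (loop) diagram should show
# one long-lived feature.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
cloud = np.column_stack([np.cos(theta), np.sin(theta)])
cloud += rng.normal(0, 0.05, cloud.shape)

# Step 1: convert the sample to persistence diagrams for H0 and H1.
diagrams = ripser(cloud, maxdim=1)["dgms"]

# Step 2: reduce the H1 diagram to a topological summary statistic,
# here the total persistence (sum of lifetimes of the finite features).
h1 = diagrams[1]
finite = h1[np.isfinite(h1[:, 1])]
total_persistence = np.sum(finite[:, 1] - finite[:, 0])
print(f"H1 features: {len(finite)}, total persistence: {total_persistence:.3f}")
```

    Statistical inference would then operate on such summaries computed across repeated samples.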

    STATISTICAL THRESHOLD VALUES FOR LOCATING QUANTITATIVE TRAIT LOCI

    The detection and location of quantitative trait loci (QTL) that control quantitative characters is a problem of great interest to the genetic mapping community. Interval mapping has proved to be a useful tool in locating QTL, but has recently been challenged by faster, more sophisticated regression methods (e.g., composite interval mapping). Regardless of the method used to locate QTL, the distribution of the test statistic (LOD score or likelihood ratio test) is unknown. Because the quantitative trait values follow a mixture distribution rather than a single distribution, the asymptotic distribution of the test statistic is not from a standard family, such as chi-square. The purpose of this work is to introduce interval mapping, discuss the distribution of the resulting test statistic, and then present empirical threshold values for the declaration of major QTL, as well as minor QTL. Empirical threshold values are obtained by permuting the actual experimental trait data, under a fixed and known genetic map, for the purpose of representing the distribution of the test statistic under the null hypothesis of no QTL effect. Not only is a permutation test statistically justified in this case, the test also reflects the specifics of the experimental situation under investigation (i.e., sample size, marker density, skewing, etc.), and may be used in a conditional sense to derive thresholds for minor QTL once a major effect has been determined.
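    To make the permutation procedure concrete, here is a hedged Python sketch: trait values are shuffled against a fixed marker matrix, the genome-wide maximum test statistic is recorded for each shuffle, and the empirical threshold is the upper quantile of those maxima. The simple single-marker F statistic used here is a hypothetical stand-in for whatever interval-mapping or regression scan is actually employed.

```python
import numpy as np

def max_scan_statistic(genotypes, phenotypes):
    """Stand-in for a genome scan: a single-marker regression F statistic at
    each marker, returning the genome-wide maximum. (Hypothetical placeholder,
    not the paper's interval-mapping scan.)"""
    n, m = genotypes.shape
    stats = np.empty(m)
    y = phenotypes - phenotypes.mean()
    for j in range(m):
        x = genotypes[:, j] - genotypes[:, j].mean()
        beta = (x @ y) / (x @ x)
        resid = y - beta * x
        ss_model = beta * (x @ y)
        ss_error = resid @ resid
        stats[j] = ss_model / (ss_error / (n - 2))
    return stats.max()

def permutation_threshold(genotypes, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Shuffle trait values against the fixed genetic map, record the maximal
    statistic each time; the (1 - alpha) quantile of these maxima is the
    empirical genome-wide threshold."""
    rng = np.random.default_rng(seed)
    null_max = np.array([
        max_scan_statistic(genotypes, rng.permutation(phenotypes))
        for _ in range(n_perm)
    ])
    return np.quantile(null_max, 1 - alpha)
```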

    Intersection tests for single marker QTL analysis can be more powerful than two marker QTL analysis

    BACKGROUND: It has been reported in the quantitative trait locus (QTL) literature that when testing for QTL location and effect, the statistical power supporting methodologies based on two markers and their estimated genetic map is higher than for the genetic-map-independent methodologies known as single marker analyses. Close examination of these reports reveals that the two marker approaches are more powerful than single marker analyses only in certain cases. Simulation studies are a commonly used tool to determine the behavior of test statistics under known conditions. We conducted a simulation study to assess the general behavior of an intersection test and a two marker test under a variety of conditions. The study was designed to reveal whether two marker tests are always more powerful than intersection tests, or whether there are cases when an intersection test may outperform the two marker approach. We present a reanalysis of a data set from a QTL study of ovariole number in Drosophila melanogaster. RESULTS: Our simulation study results show that there are situations where the single marker intersection test equals or outperforms the two marker test. The intersection test and the two marker test identify overlapping regions in the reanalysis of the Drosophila melanogaster data. The region identified is consistent with a regression-based interval mapping analysis. CONCLUSION: We find that the intersection test is appropriate for analysis of QTL data. This approach has the advantage of simplicity and, for certain situations, supplies equivalent or more powerful results than a comparable two marker test.
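    The Python sketch below illustrates one common form of intersection (intersection-union) test on the two markers flanking an interval: a QTL is declared only if both single-marker tests are significant, which is equivalent to comparing the larger of the two p-values against the significance level. The per-marker t-test and the 0/1 genotype coding are simplifying assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
from scipy import stats

def single_marker_pvalue(marker, phenotype):
    """p-value from a simple two-group comparison at one marker
    (e.g. backcross genotype classes coded 0/1)."""
    g0 = phenotype[marker == 0]
    g1 = phenotype[marker == 1]
    return stats.ttest_ind(g0, g1, equal_var=True).pvalue

def intersection_test(left_marker, right_marker, phenotype, alpha=0.05):
    """Intersection-union test for a QTL in the interval: declare a QTL only
    if BOTH flanking single-marker tests are significant, i.e. the maximum of
    the two p-values falls below alpha."""
    p_left = single_marker_pvalue(left_marker, phenotype)
    p_right = single_marker_pvalue(right_marker, phenotype)
    return max(p_left, p_right) < alpha, (p_left, p_right)
```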

    Tax Policy for the Wider Cryptoverse

    The rapid rise of Bitcoin and other "cryptoassets" offers many interesting technological capabilities but also comes with uncertainty and volatility in the markets for these assets. The diversity of types of cryptoassets is increasing rapidly, while public understanding and government policy have generally been slow to take account of this diversity. In regard to taxation policy related to cryptoassets, current IRS guidance merely categorizes cryptoassets as general property. The policy implications of this classification run contrary to fundamental goals of tax policy by inhibiting how people use cryptoassets, making compliance more complex and ambiguous than necessary, and taxing cryptoasset transactions differently than analogous currency transactions and like kind exchanges, in addition to contradicting broader domestic and foreign policy goals. A more optimal tax policy would include (1) a general currency classification for cryptoassets; (2) a de minimis exemption for use of cryptoassets as a medium of exchange; and (3) an additional non-recognition exemption for gains realized on all transactions involving only cryptoassets, such as like kind exchanges. This proposed model would greatly improve the efficiency, equity, and administrability of taxation related to cryptoassets in addition to better serving public policy in other areas.

    Estimating the Proportion of True Null Hypotheses for Multiple Comparisons

    Whole genome microarray investigations (e.g. differential expression, differential methylation, ChIP-Chip) provide opportunities to test millions of features in a genome. Traditional multiple comparison procedures such as familywise error rate (FWER) controlling procedures are too conservative. Although false discovery rate (FDR) procedures have been suggested as having greater power, the control itself is not exact and depends on the proportion of true null hypotheses. Because this proportion is unknown, it has to be estimated accurately (with small bias and small variance), preferably using a simple calculation that can be made accessible to the general scientific community. We propose an easy-to-implement method for estimating the proportion of true null hypotheses and make the R code available. This estimate has relatively small bias and small variance, as demonstrated by comparisons with four existing procedures on simulated and real data. Although presented here in the context of microarrays, this estimate is applicable to many multiple comparison situations.
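    The paper supplies its own R implementation; purely to illustrate the quantity being estimated, the Python sketch below shows a standard alternative, a Storey-type lambda estimator of the proportion of true nulls. It is not the authors' estimator, and the simulated p-values are made up for the example.

```python
import numpy as np

def pi0_storey(pvalues, lam=0.5):
    """Storey-style estimate of the proportion of true null hypotheses:
    p-values from true nulls are uniform on (0, 1), so the fraction of
    p-values above lambda, rescaled by (1 - lambda), estimates pi0."""
    pvalues = np.asarray(pvalues)
    pi0 = np.mean(pvalues > lam) / (1.0 - lam)
    return min(pi0, 1.0)  # pi0 is a proportion, so cap at 1

# Example: 90% true nulls (uniform p-values), 10% alternatives (small p-values).
rng = np.random.default_rng(1)
p = np.concatenate([rng.uniform(size=9000), rng.beta(0.5, 20, size=1000)])
print(pi0_storey(p))  # should be close to 0.9
```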

    Combining Affymetrix microarray results

    BACKGROUND: As the use of microarray technology becomes more prevalent, it is not unusual to find several laboratories employing the same microarray technology to identify genes related to the same condition in the same species. Although the experimental specifics are similar, a different list of statistically significant genes typically results from each data analysis. RESULTS: We propose a statistically-based meta-analytic approach to microarray analysis for the purpose of systematically combining results from the different laboratories. This approach provides a more precise view of genes that are significantly related to the condition of interest while simultaneously allowing for differences between laboratories. Of particular interest is the widely used Affymetrix oligonucleotide array, the results of which are naturally suited to a meta-analysis. A simulation model based on the Affymetrix platform is developed to examine the adaptive nature of the meta-analytic approach and to illustrate the usefulness of such an approach in combining microarray results across laboratories. The approach is then applied to real data involving a mouse model for multiple sclerosis. CONCLUSION: The quantitative estimates from the meta-analysis model tend to be closer to the "true" degree of differential expression than those from any single lab. Meta-analytic methods can systematically combine Affymetrix results from different laboratories to gain a clearer understanding of genes' relationships to specific conditions of interest.
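    As a generic illustration of combining per-laboratory results for a single gene, the Python sketch below performs a fixed-effect, inverse-variance weighted meta-analysis. This is a textbook combination rule, not the paper's specific meta-analytic model, and the numbers are invented for the example.

```python
import numpy as np

def fixed_effect_meta(estimates, std_errors):
    """Combine per-laboratory effect estimates (e.g. log fold changes for one
    gene) with an inverse-variance weighted average; return the pooled
    estimate and its standard error."""
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(std_errors, dtype=float) ** 2
    pooled = np.sum(weights * estimates) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    return pooled, pooled_se

# Three labs estimating the same gene's log2 fold change:
est, se = fixed_effect_meta([1.2, 0.8, 1.5], [0.4, 0.3, 0.6])
print(f"pooled log2 FC = {est:.2f} +/- {se:.2f}")
```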

    A NON-PARAMETRIC EMPIRICAL BAYES APPROACH FOR ESTIMATING TRANSCRIPT ABUNDANCE IN UN-REPLICATED NEXT-GENERATION SEQUENCING DATA

    Empirical Bayes approaches have been widely used to analyze data from high throughput sequencing devices. These approaches rely on borrowing information available for all the genes across samples to get better estimates of gene level expression. To date, transcript abundance in data from next generation sequencing (NGS) technologies has been estimated using parametric approaches for analyzing count data, namely the gamma-Poisson model, negative binomial model, and over-dispersed logistic model. One serious limitation of these approaches is that they cannot be applied in the absence of replication. The high cost of NGS technologies imposes a serious restriction on the number of biological replicates that can be assessed. In this work, a simple non-parametric empirical Bayes modeling approach is suggested for the estimation of transcript abundances in un-replicated NGS data. The empirical Bayes analysis of NGS data follows naturally from the empirical Bayes analysis of microarray data by modifying the distributional assumption on the observations. The analysis is presented for transcript abundance estimation for two treatment groups in an un-replicated experiment, but it is easily extended to more treatment groups and replicated experiments.
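    As one classical example of non-parametric empirical Bayes borrowing of information across genes, the Python sketch below applies Robbins' estimator to a single un-replicated sample of per-gene counts. It is offered only as an illustration of the general idea; it is not the model proposed in the paper, and the simulated counts are made up.

```python
import numpy as np
from collections import Counter

def robbins_poisson_eb(counts):
    """Robbins' non-parametric empirical Bayes estimate of each gene's
    underlying Poisson mean: E[lambda | X = x] ~ (x + 1) * f(x + 1) / f(x),
    where f is the empirical frequency of count values across all genes.
    A classical illustration of borrowing strength across genes."""
    counts = [int(x) for x in counts]
    m = len(counts)
    freq = Counter(counts)
    f = lambda x: freq.get(x, 0) / m
    estimates = []
    for x in counts:
        fx = f(x)  # always > 0 because x was observed
        estimates.append((x + 1) * f(x + 1) / fx if fx > 0 else float(x))
    return np.array(estimates)

# Example: one un-replicated sample; shrink each gene's observed count toward
# a value informed by the count distribution over all genes in that sample.
rng = np.random.default_rng(2)
gene_counts = rng.poisson(rng.gamma(2.0, 5.0, size=5000))
print(robbins_poisson_eb(gene_counts)[:5])
```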