Now the wars are over: The past, present and future of Scottish battlefields
Battlefield archaeology has provided a new way of appreciating historic battlefields. This paper summarises the long history of warfare and conflict in Scotland, which has given rise to a large number of battlefield sites. Recent moves to highlight the archaeological importance of these sites, in the form of Historic Scotland’s Battlefields Inventory, are discussed, along with some of the problems associated with the preservation and management of these important cultural sites
Alliances, assemblages, and affects: Three moments of building collective working-class literacies
© 2018 by the National Council of Teachers of English. All rights reserved. This article explores how assemblage and affect theories can enable research into the formation of a collective working-class identity, inclusive of written, print, publication, and organizational literacies, through the origins of the Federation of Worker Writers and Community Publishers, an organization that expanded its collectivity as new heritages, ethnicities, and immigrant identities altered the organization’s membership and “class” identity
Baby-Step Giant-Step Algorithms for the Symmetric Group
We study discrete logarithms in the setting of group actions. Suppose that $G$
is a group that acts on a set $S$. When $r, s \in S$, a solution $g \in G$
to $r^g = s$ can be thought of as a kind of logarithm. In this paper, we study
the case where $G = S_n$, and develop analogs to the Shanks baby-step /
giant-step procedure for ordinary discrete logarithms. Specifically, we compute
two sets $A, B \subseteq S_n$ such that every permutation of $S_n$ can be
written as a product $ab$ of elements $a \in A$ and $b \in B$. Our
deterministic procedure is optimal up to constant factors, in the sense that
$A$ and $B$ can be computed in optimal asymptotic complexity, and $|A|$ and $|B|$
are a small constant from $\sqrt{n!}$ in size. We also analyze randomized
"collision" algorithms for the same problem
Supervised Distance Matrices: Theory and Applications to Genomics
We propose a new approach to studying the relationship between a very high dimensional random variable and an outcome. Our method is based on a novel concept, the supervised distance matrix, which quantifies pairwise similarity between variables based on their association with the outcome. A supervised distance matrix is derived in two stages. The first stage involves a transformation based on a particular model for association. In particular, one might regress the outcome on each variable and then use the residuals or the influence curve from each regression as a data transformation. In the second stage, a choice of distance measure is used to compute all pairwise distances between variables in this transformed data. When the outcome is right-censored, we show that the supervised distance matrix can be consistently estimated using inverse probability of censoring weighted (IPCW) estimators based on the mean and covariance of the transformed data. The proposed methodology is illustrated with examples of gene expression data analysis with a survival outcome. This approach is widely applicable in genomics and other fields where high-dimensional data is collected on each subject
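The two-stage construction described above can be sketched roughly as follows, for the simple case of an uncensored outcome. The choice of simple linear regression residuals in stage one and Euclidean distance in stage two is an assumption made for illustration, as are the function names; the IPCW estimator for right-censored outcomes is not shown.

```python
from math import sqrt

def residuals(x, y):
    """Residuals from the least-squares fit of y on x (with intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0   # constant variable: fit the intercept only
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]

def supervised_distance_matrix(variables, outcome):
    """variables: list of p variables, each a list of n values; outcome: n values."""
    # Stage 1: transform each variable through its association with the outcome.
    transformed = [residuals(v, outcome) for v in variables]
    # Stage 2: pairwise Euclidean distances between the transformed variables.
    p = len(transformed)
    dist = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j + 1, p):
            d = sqrt(sum((a - b) ** 2
                         for a, b in zip(transformed[j], transformed[k])))
            dist[j][k] = dist[k][j] = d
    return dist
```

Under this choice, two variables that predict the outcome equally well and leave the same residuals are at distance zero, regardless of how different their raw values are.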
Resampling-based Multiple Testing: Asymptotic Control of Type I Error and Applications to Gene Expression Data
We define a general statistical framework for multiple hypothesis testing and show that the correct null distribution for the test statistics is obtained by projecting the true distribution of the test statistics onto the space of mean zero distributions. For common choices of test statistics (based on an asymptotically linear parameter estimator), this distribution is asymptotically multivariate normal with mean zero and the covariance of the vector influence curve for the parameter estimator. This test statistic null distribution can be estimated by applying the non-parametric or parametric bootstrap to correctly centered test statistics. We prove that this bootstrap estimated null distribution provides asymptotic control of most type I error rates. We show that obtaining a test statistic null distribution from a data null distribution (e.g. projecting the data generating distribution onto the space of all distributions satisfying the complete null) only provides the correct test statistic null distribution if the covariance of the vector influence curve is the same under the data null distribution as under the true data distribution. This condition is a weak version of the subset pivotality condition. We show that our multiple testing methodology controlling type I error is equivalent to constructing an error-specific confidence region for the true parameter and checking if it contains the hypothesized value. We also study the two-sample problem and show that the permutation distribution produces an asymptotically correct null distribution if (i) the sample sizes are equal or (ii) the populations have the same covariance structure. We include a discussion of the application of multiple testing to gene expression data, where the dimension typically far exceeds the sample size. An analysis of a cancer gene expression data set illustrates the methodology
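The idea of bootstrapping centered test statistics can be illustrated with the minimal sketch below, which tests whether each column mean is zero. The unstandardized statistic, the centering at the bootstrap mean, and the single-step maximum-statistic adjustment are assumptions chosen for illustration, not details given in the abstract; the function name and parameters are hypothetical.

```python
import random
from math import sqrt

def maxt_bootstrap_pvalues(data, n_boot=500, seed=0):
    """Adjusted p-values for H0_j: the mean of column j is zero.

    data: list of n observations, each a list of m values.
    Illustrative sketch: non-parametric bootstrap of sqrt(n) * mean,
    centered at the bootstrap mean, with a single-step max-|T| adjustment.
    """
    rng = random.Random(seed)
    n, m = len(data), len(data[0])

    def stats(rows):
        return [sqrt(n) * sum(r[j] for r in rows) / n for j in range(m)]

    observed = stats(data)
    # Resample whole observations so the dependence between columns is kept.
    boot = [stats([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_boot)]
    # Center the bootstrap statistics so the estimated null has mean zero.
    centre = [sum(b[j] for b in boot) / n_boot for j in range(m)]
    max_null = [max(abs(b[j] - centre[j]) for j in range(m)) for b in boot]
    # Adjusted p-value: share of null maxima at least as large as |T_j|.
    return [sum(z >= abs(t) for z in max_null) / n_boot for t in observed]
```

Resampling entire observations rather than individual columns is what lets the estimated null distribution reflect the joint covariance of the test statistics.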
Statistical Inference for Simultaneous Clustering of Gene Expression Data
Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function of the true data generating distribution, and an estimate is obtained by applying this function to the empirical distribution. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as parameters which are compositions of individual mappings for clustering patients and genes. This framework allows one to assess classical properties of clustering methods, such as consistency, and to formally study statistical inference regarding the clustering parameter. We present results of simulations designed to assess the asymptotic validity of different bootstrap methods for estimating the distributions of estimated simultaneous clustering parameters. The method is illustrated on a publicly available data set
From treebank resources to LFG F-structures
We present two methods for automatically annotating treebank resources with functional structures. Both methods define systematic patterns of correspondence between partial PS configurations and functional structures. These are applied to PS rules extracted from treebanks, or directly to constraint set encodings of treebank PS trees
PhylOTU: a high-throughput procedure quantifies microbial community diversity and resolves novel taxa from metagenomic data.
Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity