35,811 research outputs found
Bayesian Inference in a Sample Selection Model
This paper develops methods of Bayesian inference in a sample selection model. The main feature of this model is that the outcome variable is only partially observed. We first present a Gibbs sampling algorithm for a model in which the selection and outcome errors are normally distributed. The algorithm is then extended to analyze models that are characterized by nonnormality. Specifically, we use a Dirichlet process prior and model the distribution of the unobservables as a mixture of normal distributions with a random number of components. The posterior distribution in this model can simultaneously detect the presence of selection effects and departures from normality. Our methods are illustrated using some simulated data and an abstract from the RAND health insurance experiment
Succinct Representations of Permutations and Functions
We investigate the problem of succinctly representing an arbitrary
permutation, \pi, on {0,...,n-1} so that \pi^k(i) can be computed quickly for
any i and any (positive or negative) integer power k. A representation taking
(1+\epsilon) n lg n + O(1) bits suffices to compute arbitrary powers in
constant time, for any positive constant \epsilon <= 1. A representation taking
the optimal \ceil{\lg n!} + o(n) bits can be used to compute arbitrary powers
in O(lg n / lg lg n) time.
We then consider the more general problem of succinctly representing an
arbitrary function, f: [n] \rightarrow [n] so that f^k(i) can be computed
quickly for any i and any integer power k. We give a representation that takes
(1+\epsilon) n lg n + O(1) bits, for any positive constant \epsilon <= 1, and
computes arbitrary positive powers in constant time. It can also be used to
compute f^k(i), for any negative integer k, in optimal O(1+|f^k(i)|) time.
We place emphasis on the redundancy, or the space beyond the
information-theoretic lower bound that the data structure uses in order to
support operations efficiently. A number of lower bounds have recently been
shown on the redundancy of data structures. These lower bounds confirm the
space-time optimality of some of our solutions. Furthermore, the redundancy of
one of our structures "surpasses" a recent lower bound by Golynski [Golynski,
SODA 2009], thus demonstrating the limitations of this lower bound.Comment: Preliminary versions of these results have appeared in the
Proceedings of ICALP 2003 and 2004. However, all results in this version are
improved over the earlier conference versio
Provenance for SPARQL queries
Determining trust of data available in the Semantic Web is fundamental for
applications and users, in particular for linked open data obtained from SPARQL
endpoints. There exist several proposals in the literature to annotate SPARQL
query results with values from abstract models, adapting the seminal works on
provenance for annotated relational databases. We provide an approach capable
of providing provenance information for a large and significant fragment of
SPARQL 1.1, including for the first time the major non-monotonic constructs
under multiset semantics. The approach is based on the translation of SPARQL
into relational queries over annotated relations with values of the most
general m-semiring, and in this way also refuting a claim in the literature
that the OPTIONAL construct of SPARQL cannot be captured appropriately with the
known abstract models.Comment: 22 pages, extended version of the ISWC 2012 paper including proof
Computable de Finetti measures
We prove a computable version of de Finetti's theorem on exchangeable
sequences of real random variables. As a consequence, exchangeable stochastic
processes expressed in probabilistic functional programming languages can be
automatically rewritten as procedures that do not modify non-local state. Along
the way, we prove that a distribution on the unit interval is computable if and
only if its moments are uniformly computable.Comment: 32 pages. Final journal version; expanded somewhat, with minor
corrections. To appear in Annals of Pure and Applied Logic. Extended abstract
appeared in Proceedings of CiE '09, LNCS 5635, pp. 218-23
A comprehensive evaluation of alignment algorithms in the context of RNA-seq.
Transcriptome sequencing (RNA-Seq) overcomes limitations of previously used RNA quantification methods and provides one experimental framework for both high-throughput characterization and quantification of transcripts at the nucleotide level. The first step and a major challenge in the analysis of such experiments is the mapping of sequencing reads to a transcriptomic origin including the identification of splicing events. In recent years, a large number of such mapping algorithms have been developed, all of which have in common that they require algorithms for aligning a vast number of reads to genomic or transcriptomic sequences. Although the FM-index based aligner Bowtie has become a de facto standard within mapping pipelines, a much larger number of possible alignment algorithms have been developed also including other variants of FM-index based aligners. Accordingly, developers and users of RNA-seq mapping pipelines have the choice among a large number of available alignment algorithms. To provide guidance in the choice of alignment algorithms for these purposes, we evaluated the performance of 14 widely used alignment programs from three different algorithmic classes: algorithms using either hashing of the reference transcriptome, hashing of reads, or a compressed FM-index representation of the genome. Here, special emphasis was placed on both precision and recall and the performance for different read lengths and numbers of mismatches and indels in a read. Our results clearly showed the significant reduction in memory footprint and runtime provided by FM-index based aligners at a precision and recall comparable to the best hash table based aligners. Furthermore, the recently developed Bowtie 2 alignment algorithm shows a remarkable tolerance to both sequencing errors and indels, thus, essentially making hash-based aligners obsolete
- …