Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models
Topic models, and more specifically the class of Latent Dirichlet Allocation
(LDA) models, are widely used for probabilistic modeling of text. MCMC sampling from
the posterior distribution is typically performed using a collapsed Gibbs
sampler. We propose a parallel sparse partially collapsed Gibbs sampler and
compare its speed and efficiency to state-of-the-art samplers for topic models
on five well-known text corpora of differing sizes and properties. In
particular, we propose and compare two different strategies for sampling the
parameter block with latent topic indicators. The experiments show that the
increase in statistical inefficiency from only partial collapsing is smaller
than commonly assumed, and can be more than compensated by the speedup from
parallelization and sparsity on larger corpora. We also prove that the
partially collapsed samplers scale well with the size of the corpus. The
proposed algorithm is fast, efficient, exact, and can be used in more modeling
situations than the ordinary collapsed sampler.
Comment: Accepted for publication in Journal of Computational and Graphical Statistics.
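The serial baseline this work builds on, the standard collapsed Gibbs update for LDA, can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's parallel partially collapsed sampler; the toy corpus, hyperparameters, and variable names are all assumptions made for the example.

```python
import numpy as np

# Minimal collapsed Gibbs sampler for LDA (the serial baseline, not the
# paper's parallel sampler). Toy corpus and hyperparameters are illustrative.
rng = np.random.default_rng(0)

docs = [[0, 1, 2, 1], [3, 4, 3, 5], [0, 2, 4, 5]]  # word ids per document
V, K, alpha, beta = 6, 2, 0.1, 0.01                # vocab size, topics, priors

# Count tables: n_dk[d,k] topic counts per doc, n_kw[k,w] word counts per topic
n_dk = np.zeros((len(docs), K))
n_kw = np.zeros((K, V))
n_k = np.zeros(K)
z = []  # topic assignment for every token
for d, doc in enumerate(docs):
    zd = []
    for w in doc:
        k = rng.integers(K)
        zd.append(k); n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    z.append(zd)

for _ in range(50):  # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]  # remove this token's assignment from the counts
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            # Collapsed conditional: p(z=k | rest) ∝ (n_dk+α)(n_kw+β)/(n_k+Vβ)
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k; n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
```

Partial collapsing, as proposed in the paper, keeps some parameter blocks explicit instead of integrating them out, which is what permits sampling documents in parallel while exploiting sparsity in the count tables.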
Making Sense of the Census: Classifying and Counting Ethnicity in Oceania, 1965-2011
As the flagship government effort to count and classify its population, censuses are a key site for rendering and making visible group boundaries. Despite claims to objective rationality, however, census taking is a political and inherently subjective exercise. Censuses help shape the very categories they claim to capture: censuses do more than reflect social reality, they also participate in the social construction of this reality (Kertzer and Arel, 2002b, p. 2). While ethnicity – as a social construct – is imagined, its effects are far from imaginary, and census categorisations may have significant material consequences for the lives of citizens.
Although an increasing number of studies have examined how and why governments in particular times or places count their populations by ethnicity, studies that are both cross-national and longitudinal are rare. Attempting to bridge this gap in part, this thesis studies census questionnaires from 1965 to 2011 for 24 countries in Oceania. In doing so, it explores three general questions: 1) how ethnicity is conceptualised and categorised in Oceanic censuses over time; 2) the relationship between ethnic counting in territories to that of their metropoles; and 3) Oceanic approaches towards multiple ethnic identities. Spread over an area of thirty million square kilometres of the Pacific Ocean, Oceania provides an interesting context to study ethnic counting. The countries and territories that make up the region present an enormous diversity in physical geography and culture, languages and social organisation, size and resource endowment. As the last region in the world to decolonise, Oceania includes a mix of dependencies and sovereign states.
The study finds that engagement with ethnic classification and counting is near-ubiquitous across the time period, with most countries having done so in all five cross-sectional census rounds. In general terms, the ‘racial’ terminology of race and ancestry in ethnic census questions has been displaced over the focal period by the ‘ethnic’ terminology of ethnicity and ethnic origin. Overall, the concept of ethnic origins predominates, although interestingly it is paired with race in the US territories, reflecting the ongoing social and political salience of race in the metropole. With respect to ethnic categories provided on census forms (and thus imbued with the legitimacy of explicit state recognition), the study finds a shift away from the imagined and flawed Melanesian/Micronesian/Polynesian racial typology and other colonial impositions to more localised and self-identified Pacific identities. It is theorised that these shifts are emblematic of broader global changes in the impetuses for ethnic counting, from colonially-influenced ‘top down’ counting serving exclusionary ends to more inclusive, ‘bottom up’ approaches motivated by concerns for minority rights and inclusive policy-making.
Delayed Sampling and Automatic Rao-Blackwellization of Probabilistic Programs
We introduce a dynamic mechanism for the solution of analytically-tractable
substructure in probabilistic programs, using conjugate priors and affine
transformations to reduce variance in Monte Carlo estimators. For inference
with Sequential Monte Carlo, this automatically yields improvements such as
locally-optimal proposals and Rao-Blackwellization. The mechanism maintains a
directed graph alongside the running program that evolves dynamically as
operations are triggered upon it. Nodes of the graph represent random
variables, edges the analytically-tractable relationships between them. Random
variables remain in the graph for as long as possible, to be sampled only when
they are used by the program in a way that cannot be resolved analytically. In
the meantime, they are conditioned on as many observations as possible. We
demonstrate the mechanism with a few pedagogical examples, as well as a
linear-nonlinear state-space model with simulated data, and an epidemiological
model with real data of a dengue outbreak in Micronesia. In all cases one or
more variables are automatically marginalized out to significantly reduce
variance in estimates of the marginal likelihood, in the final case
facilitating a random-weight or pseudo-marginal-type importance sampler for
parameter estimation. We have implemented the approach in Anglican and a new
probabilistic programming language called Birch.
Comment: 13 pages, 4 figures.
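The core idea, conditioning a random variable analytically for as long as possible and sampling it only when the program forces a concrete value, can be illustrated for a single conjugate pair. This is a hand-rolled sketch of that idea for a Normal prior with Normal observations; the class and method names are illustrative and are not the Birch or Anglican API.

```python
import numpy as np

# Sketch of the delayed-sampling idea for one conjugate pair: a Normal prior
# on a mean, conditioned analytically on Normal observations, and sampled
# only if the program later needs a concrete value. Names are illustrative.
class DelayedNormal:
    def __init__(self, mu0, var0):
        self.mu, self.var = mu0, var0   # current (posterior) parameters
        self.value = None               # None while still marginalized

    def observe(self, y, lik_var):
        # Conjugate Normal-Normal update instead of sampling-then-weighting
        assert self.value is None
        prec = 1 / self.var + 1 / lik_var
        self.mu = (self.mu / self.var + y / lik_var) / prec
        self.var = 1 / prec

    def sample(self, rng):
        # Called only when the program uses the variable non-analytically
        if self.value is None:
            self.value = rng.normal(self.mu, np.sqrt(self.var))
        return self.value

rng = np.random.default_rng(1)
x = DelayedNormal(mu0=0.0, var0=1.0)
for y in [0.9, 1.1, 1.0]:
    x.observe(y, lik_var=0.5)  # all three observations absorbed analytically
posterior_mean = x.mu          # exact, with zero Monte Carlo variance
```

Because the observations are absorbed into closed-form posterior updates, the variable is Rao-Blackwellized: no Monte Carlo variance is incurred for it unless `sample` is eventually triggered. The mechanism in the paper generalizes this to a dynamically maintained graph over many variables and conjugate relationships.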
Real-Time Probabilistic Programming
Complex cyber-physical systems interact in real-time and must consider both
timing and uncertainty. Developing software for such systems is both expensive
and difficult, especially when modeling, inference, and real-time behavior need
to be developed from scratch. Recently, a new kind of language has emerged --
the probabilistic programming language (PPL) -- that simplifies modeling and
inference by separating the concerns of probabilistic modeling and
inference algorithm implementation. However, these languages have primarily
been designed for offline problems, not online real-time systems. In this
paper, we combine PPLs and real-time programming primitives by introducing the
concept of real-time probabilistic programming languages (RTPPL). We develop an
RTPPL called ProbTime and demonstrate its usability on an automotive testbed
performing indoor positioning and braking. Moreover, we study fundamental
properties and design alternatives for runtime behavior, including a new
fairness-guided approach that automatically optimizes the accuracy of a
ProbTime system under schedulability constraints.
Automatic Alignment in Higher-Order Probabilistic Programming Languages
Probabilistic Programming Languages (PPLs) allow users to encode statistical
inference problems and automatically apply an inference algorithm to solve
them. Popular inference algorithms for PPLs, such as sequential Monte Carlo
(SMC) and Markov chain Monte Carlo (MCMC), are built around checkpoints --
relevant events for the inference algorithm during the execution of a
probabilistic program. Deciding the location of checkpoints is, in current
PPLs, not done optimally. To solve this problem, we present a static analysis
technique that automatically determines checkpoints in programs, relieving PPL
users of this task. The analysis identifies a set of checkpoints that execute
in the same order in every program run -- they are aligned. We formalize
alignment, prove the correctness of the analysis, and implement the analysis as
part of the higher-order functional PPL Miking CorePPL. By utilizing the
alignment analysis, we design two novel inference algorithm variants: aligned
SMC and aligned lightweight MCMC. We show, through real-world experiments, that
they significantly improve inference execution time and accuracy compared to
standard PPL versions of SMC and MCMC.
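The role of aligned checkpoints can be seen in a toy bootstrap SMC where every particle reaches the same sequence of observes in the same order, so each observe is a safe place to resample. This is an illustrative sketch only; the model, names, and numbers are assumptions, and the syntax is plain Python, not Miking CorePPL.

```python
import numpy as np

# Toy bootstrap SMC in which resampling happens at checkpoints (observes)
# that every particle reaches in the same order, i.e. aligned checkpoints.
rng = np.random.default_rng(2)
ys = [0.5, 1.0, 1.5]          # observations
N = 200                       # number of particles

def normal_logpdf(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))

xs = np.zeros(N)              # particle states
log_Z = 0.0                   # log marginal-likelihood estimate
for y in ys:                  # each observe is an aligned checkpoint
    xs = xs + rng.normal(0, 1, N)          # propagate through the prior
    logw = normal_logpdf(y, xs, 1.0)       # weight at the observe
    log_Z += np.log(np.mean(np.exp(logw)))
    # Resample: because the checkpoint is aligned, all N particles take
    # part in the same resampling step, keeping the estimator well-behaved.
    idx = rng.choice(N, size=N, p=np.exp(logw) / np.exp(logw).sum())
    xs = xs[idx]
```

When observes sit inside data-dependent branches, particles can hit them in different orders or different numbers of times; the static analysis in the paper identifies the subset of checkpoints that are provably aligned, so inference can resample only there.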
Bíborsügérek (Hemichromis guttatus Günther, 1862) a Hévízi-tó termálvizében = Jewel cichlids (Hemichromis guttatus Günther, 1862) in thermal water of Lake Hévíz (Western Hungary)
We contend that repeatability of execution times is crucial to the validity of testing of real-time systems. However, computer architecture designs fail to deliver repeatable timing, a consequence of aggressive techniques that improve average-case performance. This paper introduces the Precision-Timed ARM (PTARM), a precision-timed (PRET) microarchitecture implementation that exhibits repeatable execution times without sacrificing performance. The PTARM employs a repeatable thread-interleaved pipeline with an exposed memory hierarchy, including a repeatable DRAM controller. Our benchmarks show an improved throughput compared to a single-threaded in-order five-stage pipeline, given sufficient parallelism in the software.