Zipf and Heaps laws from dependency structures in component systems
Complex natural and technological systems can be considered, on a
coarse-grained level, as assemblies of elementary components: for example,
genomes as sets of genes, or texts as sets of words. On one hand, the joint
occurrence of components emerges from architectural and specific constraints in
such systems. On the other hand, general regularities may unify different
systems, such as the broadly studied Zipf and Heaps laws, respectively
concerning the distribution of component frequencies and their number as a
function of system size. Dependency structures (i.e., directed networks
encoding the dependency relations between the components in a system) were
proposed recently as a possible organizing principle underlying some of the
regularities observed. However, the consequences of this assumption were
explored only in binary component systems, where solely the presence or absence
of components is considered, and multiple copies of the same component are not
allowed. Here, we consider a simple model that generates, from a given ensemble
of dependency structures, a statistical ensemble of sets of components,
allowing for components to appear with any multiplicity. Our model is a minimal
extension that is memoryless, and therefore accessible to analytical
calculations. A mean-field analytical approach (analogous to the "Zipfian
ensemble" in the linguistics literature) captures the relevant laws describing
the component statistics as we show by comparison with numerical computations.
In particular, we recover a power-law Zipf rank plot, with a set of core
components, and a Heaps law displaying three consecutive regimes (linear,
sub-linear and saturating) that we characterize quantitatively.
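As a rough illustration of the two quantities named in this abstract, the sketch below computes a Zipf rank-frequency list and a Heaps curve from a plain sequence of components. It is not the dependency-structure model of the paper; the function names and the toy word sequence are assumptions made for the example.

```python
from collections import Counter


def zipf_rank_frequencies(components):
    """Return component counts sorted in decreasing order (rank 1 first)."""
    return sorted(Counter(components).values(), reverse=True)


def heaps_curve(components):
    """Return the number of distinct components seen after each position."""
    seen = set()
    curve = []
    for component in components:
        seen.add(component)
        curve.append(len(seen))
    return curve


# Toy "text as a set of words" input (hypothetical data).
sample = "the cat and the dog and the bird".split()

print(zipf_rank_frequencies(sample))  # [3, 2, 1, 1, 1]
print(heaps_curve(sample))            # [1, 2, 3, 3, 4, 4, 4, 5]
```

Plotting the first list against rank on log-log axes gives the Zipf rank plot, and plotting the second list against position gives the Heaps curve, whose growth slows as repeated components accumulate.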
Polynomial Size Analysis of First-Order Shapely Functions
We present a size-aware type system for first-order shapely function
definitions. Here, a function definition is called shapely when the size of the
result is determined exactly by a polynomial in the sizes of the arguments.
Examples of shapely function definitions may be implementations of matrix
multiplication and the Cartesian product of two lists. The type system is
proved to be sound w.r.t. the operational semantics of the language. The type
checking problem is shown to be undecidable in general. We define a natural
syntactic restriction such that the type checking becomes decidable, even
though size polynomials are not necessarily linear or monotonic. Furthermore,
we have shown that the type-inference problem is at least semi-decidable (under
this restriction). We have implemented a procedure that combines run-time
testing and type-checking to automatically obtain size dependencies. It
terminates on total typable function definitions. Comment: 35 pages, 1 figure
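To make the notion of shapeliness concrete, the sketch below checks by run-time testing that the Cartesian product of two lists has output size exactly n * m. It only illustrates the testing idea in the simplest possible form; it is not the type system or the inference procedure of the paper, and the helper `size_matches` is a hypothetical name.

```python
def cartesian_product(xs, ys):
    """Return all pairs (x, y) with x from xs and y from ys."""
    return [(x, y) for x in xs for y in ys]


def size_matches(poly, max_n=4, max_m=4):
    """Test a candidate size polynomial against measured output sizes.

    poly(n, m) is the hypothesised output length for inputs of
    lengths n and m; every pair of sizes up to the bounds is checked.
    """
    return all(
        len(cartesian_product(list(range(n)), list(range(m)))) == poly(n, m)
        for n in range(max_n + 1)
        for m in range(max_m + 1)
    )


print(size_matches(lambda n, m: n * m))  # True
print(size_matches(lambda n, m: n + m))  # False
```

Passing such tests only supports a candidate polynomial on the sampled sizes; in the abstract's setting, type checking is what establishes it for all inputs.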
Why is the snowflake schema a good data warehouse design?
Database design for data warehouses is based on the notion of the snowflake schema and its important special case, the star schema. The snowflake schema represents a dimensional model composed of a central fact table and a set of constituent dimension tables, which can be further broken up into subdimension tables. We formalise the concept of a snowflake schema in terms of an acyclic database schema whose join tree satisfies certain structural properties. We then define a normal form for snowflake schemas which captures their intuitive meaning with respect to a set of functional and inclusion dependencies. We show that snowflake schemas in this normal form are independent as well as separable when the relation schemas are pairwise incomparable. This implies that relations in the data warehouse can be updated independently of each other as long as referential integrity is maintained. In addition, we show that a data warehouse in snowflake normal form can be queried by joining the relation over the fact table with the relations over its dimension and subdimension tables. We also examine an information-theoretic interpretation of the snowflake schema and show that the redundancy of the primary key of the fact table is zero.
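The querying pattern described here, joining the fact table with its dimension and subdimension tables, can be sketched with an in-memory SQLite database. All table names, column names, and rows below are hypothetical, and the schema is a minimal illustration, not the formal snowflake normal form defined in the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Subdimension table (hypothetical)
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,
    name TEXT
);
-- Dimension table, referencing its subdimension
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    category_id INTEGER REFERENCES category(category_id)
);
-- Central fact table, referencing the dimension
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES product(product_id),
    amount REAL
);
INSERT INTO category VALUES (1, 'fruit'), (2, 'tools');
INSERT INTO product VALUES (10, 'apple', 1), (11, 'pear', 1), (12, 'hammer', 2);
INSERT INTO sales VALUES (100, 10, 2.5), (101, 11, 1.5), (102, 12, 9.0), (103, 10, 1.0);
""")

# Join fact -> dimension -> subdimension and aggregate per category.
rows = conn.execute("""
SELECT c.name, SUM(s.amount)
FROM sales AS s
JOIN product AS p ON s.product_id = p.product_id
JOIN category AS c ON p.category_id = c.category_id
GROUP BY c.name
ORDER BY c.name
""").fetchall()

print(rows)  # [('fruit', 5.0), ('tools', 9.0)]
```

Because each table refers to the next only through a foreign key, a row in `category` can be renamed without touching `product` or `sales`, which loosely mirrors the independent-update property stated in the abstract.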