Many complex systems are modular. Such systems can be represented as
"component systems", i.e., sets of elementary components, such as LEGO bricks
in LEGO sets. The bricks found in a LEGO set reflect a target architecture,
which can be built following a set-specific list of instructions. In other
component systems, however, the underlying functional design and constraints
are not known a priori, and detecting them is often a challenge of both
scientific and practical importance that requires a clear understanding of
component statistics. Importantly, some quantitative invariants appear to be
common to many component systems, most notably a broad distribution of
component abundances, which often resembles the well-known Zipf's law. Such
"laws" affect component statistics in a general and non-trivial way,
potentially hindering the identification of system-specific functional
constraints or generative processes. Here, we specifically focus on the
statistics of shared components, i.e., the distribution of the number of
components shared by different system-realizations, such as the common bricks
found in different LEGO sets. To account for the effects of component
heterogeneity, we consider a simple null model, which builds
system-realizations by random draws from a universe of possible components.
Under general assumptions on abundance heterogeneity, we provide analytical
estimates of component occurrence that fully characterize the statistics
of shared components. Surprisingly, this simple null model accounts for
important features of empirical component-occurrence distributions
obtained from data on bacterial genomes, LEGO sets, and book chapters. Specific
architectural features and functional constraints can be detected from
occurrence patterns as deviations from these null predictions, as we show for
the illustrative case of the "core" genome in bacteria.

Comment: 18 pages, 7 main figures, 7 supplementary figures
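To make the null model concrete, here is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes a universe of N components with Zipf-like frequencies p_i proportional to i^(-alpha), builds each of R realizations by M independent draws with replacement, and counts each component's occurrence (the number of realizations that contain it). For a component of frequency p_i, the probability of appearing at least once in M draws is q_i = 1 - (1 - p_i)^M, so its occurrence across R independent realizations is Binomial(R, q_i). By linearity of expectation, summing these binomial probabilities over components gives the expected occurrence distribution. All function names, the exponent alpha, the equal realization size M, and the parameter values are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def zipf_frequencies(n_components, alpha=1.0):
    """Assumed Zipf-like abundances: p_i proportional to i**(-alpha), i = 1..n, normalized to sum to 1."""
    weights = [i ** (-alpha) for i in range(1, n_components + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_realizations(freqs, n_realizations, size, seed=0):
    """Null model (assumed form): each realization is `size` independent draws with
    replacement from the component universe; only the set of distinct components is kept."""
    rng = random.Random(seed)
    universe = range(len(freqs))
    return [set(rng.choices(universe, weights=freqs, k=size))
            for _ in range(n_realizations)]


def occurrences(realizations, n_components):
    """Occurrence of component i = number of realizations that contain it."""
    counts = Counter(c for r in realizations for c in r)
    return [counts[i] for i in range(n_components)]


def expected_occurrence_fraction(freqs, size):
    """q_i = 1 - (1 - p_i)**size: probability that component i appears at least once in a realization."""
    return [1.0 - (1.0 - p) ** size for p in freqs]


def occurrence_histogram(occ, n_realizations):
    """Simulated occurrence distribution: number of components found in exactly k realizations, k = 0..R."""
    hist = [0] * (n_realizations + 1)
    for o in occ:
        hist[o] += 1
    return hist


def expected_occurrence_histogram(q, n_realizations):
    """Expected number of components with occurrence k: sum over i of the Binomial(R, q_i) pmf at k."""
    R = n_realizations
    return [sum(math.comb(R, k) * qi ** k * (1.0 - qi) ** (R - k) for qi in q)
            for k in range(R + 1)]


if __name__ == "__main__":
    # Hypothetical parameters: universe size, number of realizations, draws per realization.
    N, R, M = 1000, 50, 500
    freqs = zipf_frequencies(N, alpha=1.0)
    reals = sample_realizations(freqs, R, M)
    occ = occurrences(reals, N)
    q = expected_occurrence_fraction(freqs, M)
    simulated = occurrence_histogram(occ, R)
    expected = expected_occurrence_histogram(q, R)
    for k in (0, 1, R // 2, R - 1, R):
        print(f"k={k:2d}  simulated={simulated[k]:4d}  expected={expected[k]:7.1f}")
```

Fixing a single realization size M keeps the estimate short. With realizations of different sizes, such as genomes of different lengths, each realization would instead get its own q_i, and the occurrence of a component would be a sum of non-identical Bernoulli variables rather than a binomial.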