Scaling laws and fluctuations in the statistics of word frequencies
In this paper we combine statistical analysis of large text databases and
simple stochastic models to explain the appearance of scaling laws in the
statistics of word frequencies. Besides the sublinear scaling of the vocabulary
size with database size (Heaps' law), here we report a new scaling of the
fluctuations around this average (fluctuation scaling analysis). We explain
both scaling laws by modeling the usage of words by simple stochastic processes
in which the overall distribution of word-frequencies is fat tailed (Zipf's
law) and the frequency of a single word is subject to fluctuations across
documents (as in topic models). In this framework, the mean and the variance of
the vocabulary size can be expressed as quenched averages, implying that: i)
the inhomogeneous dissemination of words causes a reduction of the average
vocabulary size in comparison to the homogeneous case, and ii) correlations in
the co-occurrence of words lead to an increase in the variance and the
vocabulary size becomes a non-self-averaging quantity. We address the
implications of these observations for the measurement of lexical richness. We
test our results in three large text databases (Google-ngram, English
Wikipedia, and a collection of scientific articles).
Comment: 19 pages, 4 figures
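As a rough illustration of the homogeneous baseline mentioned in the abstract above, the sketch below draws tokens independently from a Zipf-distributed vocabulary and tracks how many distinct words have appeared. It is not the authors' model: the function names, the finite vocabulary size `n_types`, the exponent `alpha`, and the fitting range `n_min` are all assumptions chosen for illustration, and the sketch ignores the topic-dependent fluctuations the paper studies.

```python
import math
import random
from itertools import accumulate


def vocabulary_growth(n_tokens, n_types=50_000, alpha=1.0, seed=0):
    """Return V(N) for N = 1..n_tokens under independent Zipf sampling.

    The word of rank r is drawn with probability proportional to r**(-alpha).
    All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Cumulative (unnormalized) Zipf weights for ranks 1..n_types.
    cum_weights = list(accumulate(r ** -alpha for r in range(1, n_types + 1)))
    tokens = rng.choices(range(n_types), cum_weights=cum_weights, k=n_tokens)

    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))  # vocabulary size after this token
    return growth


def heaps_exponent(growth, n_min=100):
    """Least-squares slope of log V(N) against log N for N >= n_min."""
    xs = [math.log(n) for n in range(n_min, len(growth) + 1)]
    ys = [math.log(growth[n - 1]) for n in range(n_min, len(growth) + 1)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    growth = vocabulary_growth(20_000)
    print("final vocabulary size:", growth[-1])
    print("fitted Heaps exponent:", round(heaps_exponent(growth), 3))
```

A fitted slope below 1 corresponds to the sublinear vocabulary growth that Heaps' law describes. Repeating the run with different seeds gives a crude sense of the spread around the mean, although independent sampling of this kind omits the correlations between words that the abstract identifies as the source of the larger variance.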
Challenges in Complex Systems Science
FuturICT foundations are social science, complex systems science, and ICT.
The main concerns and challenges in the science of complex systems in the
context of FuturICT are laid out in this paper with special emphasis on the
Complex Systems route to Social Sciences. These include complex systems having:
many heterogeneous interacting parts; multiple scales; complicated transition
laws; unexpected or unpredicted emergence; sensitive dependence on initial
conditions; path-dependent dynamics; networked hierarchical connectivities;
interaction of autonomous agents; self-organisation; non-equilibrium dynamics;
combinatorial explosion; adaptivity to changing environments; co-evolving
subsystems; ill-defined boundaries; and multilevel dynamics. In this context,
science is seen as the process of abstracting the dynamics of systems from
data. This presents many challenges including: data gathering by large-scale
experiment, participatory sensing and social computation, managing huge
distributed dynamic and heterogeneous databases; moving from data to dynamical
models, going beyond correlations to cause-effect relationships, understanding
the relationship between simple and comprehensive models with appropriate
choices of variables, ensemble modeling and data assimilation, modeling systems
of systems of systems with many levels between micro and macro; and formulating
new approaches to prediction, forecasting, and risk, especially in systems that
can reflect on and change their behaviour in response to predictions, and
systems whose apparently predictable behaviour is disrupted by apparently
unpredictable rare or extreme events. These challenges are part of the FuturICT
agenda.
Modeling views in the layered view model for XML using UML
In data engineering, view formalisms are used to provide flexibility to users and user applications by allowing them to extract and elaborate data from the stored data sources. Meanwhile, since its introduction, Extensible Markup Language (XML) has been fast emerging as the dominant standard for storing, describing, and interchanging data among various web and heterogeneous data sources. In combination with XML Schema, XML provides rich facilities for defining and constraining user-defined data semantics and properties, a feature that is unique to XML. In this context, it is interesting to investigate traditional database features, such as view models and view design techniques, for XML. However, traditional view formalisms are strongly coupled to the data language and its syntax, so supporting views for semi-structured data models proves difficult. Therefore, in this paper we propose a Layered View Model (LVM) for XML with conceptual and schemata extensions. Our work is three-fold. First, we propose an approach that separates the implementation and conceptual aspects of views, providing a clear separation of concerns and allowing the analysis and design of views to be separated from their implementation. Second, we define representations to express and construct these views at the conceptual level. Third, we define a view transformation methodology for XML views in the LVM, which carries out automated transformation to a view schema and a view query expression in an appropriate query language. Finally, to validate and apply the LVM concepts, methods, and transformations developed, we propose a view-driven application development framework with the flexibility to develop web and database applications for XML at varying levels of abstraction.
- …