32 research outputs found

    Predicting birth-rates through German micro-census data: a comparison of probit and Boolean regression

    This paper investigates the complex interrelationships of qualitative socio-economic variables in the context of Boolean Regression. The data come from the German Micro-census waves of 1996-2002 and comprise about 400,000 observations. Boolean Regression is used to predict how birth events depend on the socio-economic characteristics of women and their male partners, and is compared to Probit. The data set is split into two halves in order to determine which method yields more accurate predictions. Probit turns out to be superior if a given socio-economic type is backed by fewer than about 30 observations, whereas Boolean Regression is superior if the type is backed by more than about 30 observations. Therefore a "hybrid" estimation method, combining Probit and Boolean Regression, is proposed and used in the remainder of the paper. Different methods of interpreting the estimation results are introduced, relying mainly on simulation techniques. With respect to the reasons for the prevailing low German fertility rates, it is evident that these could be decisively higher if people had higher incomes and could earn more with relative ease. From a methodological perspective, the paper demonstrates that the recently released Scientific Use Files of socio-economic data, which comprise hundreds of thousands or even millions of observations, are the natural field of application for Boolean Regression. Possible consequences for future social and economic research are discussed.
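
The switching rule behind the "hybrid" estimator can be sketched as follows. This is only an illustration of the roughly-30-observations threshold reported in the abstract; the function name `hybrid_predictions`, the dictionary layout, and the treatment of a cell with exactly 30 observations are assumptions, and the paper's actual combination procedure may differ.

```python
def hybrid_predictions(cell_counts, probit_pred, boolean_pred, threshold=30):
    """Pick, per socio-economic type, the prediction of the method that the
    abstract reports as more accurate for that cell size.

    cell_counts  -- {type: number of observations backing that type}
    probit_pred  -- {type: predicted birth probability from Probit}
    boolean_pred -- {type: predicted birth probability from Boolean Regression}
    threshold    -- assumed cut-off (the abstract says "about 30")
    """
    return {
        cell: boolean_pred[cell] if n > threshold else probit_pred[cell]
        for cell, n in cell_counts.items()
    }


# Hypothetical toy numbers, not taken from the paper.
counts = {"type_a": 12, "type_b": 250}
probit = {"type_a": 0.08, "type_b": 0.05}
boolean = {"type_a": 0.00, "type_b": 0.06}
chosen = hybrid_predictions(counts, probit, boolean)
```

Sparse types fall back to the parametric Probit estimate, while well-populated types use the Boolean Regression estimate.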

    Identification of binary cellular automata from spatiotemporal binary patterns using a Fourier representation

    The identification of binary cellular automata from spatio-temporal binary patterns is investigated in this paper. Instead of the usual Boolean or multilinear polynomial representation, the Fourier transform representation of Boolean functions is employed, expressed in terms of a Fourier basis. In this way, the orthogonal forward regression least-squares algorithm can be applied directly to detect the significant terms and to estimate the associated parameters. Compared with conventional methods, the new approach is much more robust to noise. Examples are provided to illustrate the effectiveness of the proposed approach.
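
To make the Fourier representation concrete, the sketch below computes the Fourier (Walsh) coefficients of an elementary cellular automaton rule by direct summation over its truth table. It assumes the common encoding in which bit b is mapped to (-1)^b, so that XOR becomes a product. The helper names `rule_table` and `fourier_coefficients` are hypothetical, and this is not the orthogonal forward regression estimator from the paper, which works from noisy observed patterns rather than a known rule.

```python
from itertools import combinations, product


def rule_table(rule_number):
    """Truth table of an elementary CA rule (Wolfram numbering):
    maps (left, centre, right) bits to the next centre bit."""
    return {
        (l, c, r): (rule_number >> (l * 4 + c * 2 + r)) & 1
        for l, c, r in product((0, 1), repeat=3)
    }


def fourier_coefficients(table, n=3):
    """Coefficient for each subset S of input positions:
    the average over all inputs x of (-1)^f(x) * (-1)^(sum of x_i for i in S)."""
    coeffs = {}
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            total = 0
            for bits, out in table.items():
                parity = sum(bits[i] for i in subset) % 2
                total += (-1) ** out * (-1) ** parity
            coeffs[subset] = total / 2 ** n
    return coeffs


# Rule 90 is left XOR right, so only the subset {left, right} is non-zero.
significant = {s: v for s, v in fourier_coefficients(rule_table(90)).items() if v != 0}
```

For Rule 90 the only significant term is the one indexed by positions 0 and 2, which is the kind of sparsity a term-selection algorithm can exploit.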

    The dynamic of bicycle finals: A theoretical and empirical analysis of slipstreaming

    The finals of bicycle races have certain peculiarities compared to other sports. The leading group in a bicycle race rides comparatively slowly up to a few meters before the finishing line, until one of the competitors tries to shake off his opponents. Only then do all riders perform to the limit. The rider who is in front just before the final sprint is seldom the one who wins in the end. This raises the question of who takes the thankless early lead in the wind, and why. By means of the relevant physics it can be shown theoretically that, first, the better rider will always be able to win the race and, second and more surprisingly, the better rider will always be in the slipstream at the start of the sprint. These findings are confirmed empirically by means of several logistic regressions. 49 final sprints of road races between two and seven professional racing cyclists with varying performance potentials were analyzed with respect to the order of the riders at the beginning of the final sprint and the final outcome of the race. Finally, possibilities for further research and implications for sport economics are described.

    Refining mutation variants in Cartesian genetic programming

    In this work, we improve upon two frequently used mutation algorithms and, in doing so, introduce three refined mutation strategies for Cartesian Genetic Programming (CGP). First, we take the probabilistic concept of a mutation rate and split it into two mutation rates, one for active nodes and one for inactive nodes. Second, we extend the mutation method Single, which mutates nodes until an active node is hit. Our extension mutates nodes until a predefined number n (greater than one) of active nodes has been hit. Finally, we build on this concept and introduce a decay rate for n, so that the required number of active nodes hit per mutation step decreases during CGP’s training process. We show empirically on different classification, regression and Boolean regression benchmarks that all methods lead to better fitness values. This is further supported by probabilistic comparison methods such as the Bayesian comparison of classifiers and the Mann-Whitney U test. However, these improvements come at the cost of more mutation steps, which in turn lengthens the training time. The third variant, in which n is decreased, does not differ from the second mutation strategy listed
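
The first two strategies described above can be sketched as follows. This is a minimal illustration under simplifying assumptions: the genome is a flat list of nodes, the set of active node indices is computed beforehand and passed in as `active`, and `mutate_node` is a caller-supplied function. All names are hypothetical and not taken from the paper. The decay schedule for n is omitted because the abstract does not specify its form.

```python
import random


def mutate_until_n_active(genome, active, mutate_node, n=1, rng=random):
    """Mutate randomly chosen nodes until n active nodes have been hit.
    With n=1 this corresponds to the Single strategy."""
    if not active:
        raise ValueError("at least one active node is required")
    child = list(genome)
    hits = steps = 0
    while hits < n:
        idx = rng.randrange(len(child))
        child[idx] = mutate_node(child[idx], rng)
        steps += 1
        if idx in active:  # activity is judged against the pre-mutation set
            hits += 1
    return child, steps


def mutate_split_rates(genome, active, mutate_node, p_active, p_inactive, rng=random):
    """Probabilistic mutation with separate rates for active and inactive nodes."""
    child = list(genome)
    for idx in range(len(child)):
        rate = p_active if idx in active else p_inactive
        if rng.random() < rate:
            child[idx] = mutate_node(child[idx], rng)
    return child


# Toy usage: nodes are integers and a "mutation" just increments the node.
bump = lambda node, rng: node + 1
child, steps = mutate_until_n_active([0] * 10, {2, 5}, bump, n=3, rng=random.Random(0))
```

Because inactive nodes are also mutated along the way, the number of mutation steps generally exceeds n, which matches the longer training time noted in the abstract.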

    Are three points for a win really better than two? Theoretical and empirical evidence for German soccer

    The effects of the three-point rule in German first-league soccer are tested empirically and compared to games from the German cup competition. The inclusion of cup games ensures that changes in league games can be attributed to the three-point rule. Because draws are devalued relative to wins, the number of draws should decrease. Furthermore, an increase in the number of close wins is expected. The strategy of a leading team becomes more defensive, resulting in fewer shots on goal by that team, as well as fewer shooting opportunities for the opponent. Empirical evidence supporting these effects is found.

    Clustering households by time use patterns ; an empirical investigation using the German Time Use Survey 2001/2002

    Clustering individuals or households on the basis of socio-economic variables has become a widespread practice in German social research over the past few decades. This paper is part of a research project that explores the results that can be obtained when time use patterns are chosen as the basis of numerical classification. Over the past few years, the authors have published results relating to single households. The present paper extends the analysis to families. The investigation uses data from the German Time Use Survey 2001/2002. It is shown that the clustering process fulfils the criteria required by stochastic and qualitative social science. Furthermore, evidence is provided that including cluster memberships as dummy variables in a regressor set increases the predictive capabilities of a common multivariate analysis of correlations between socio-economic variables. Meaningful interconnections are detected in particular between household styles and health state.

    Geometric Semantic Grammatical Evolution

    This is the author accepted manuscript. The final version is available from Springer via the DOI in this record. Geometric Semantic Genetic Programming (GSGP) is a novel form of Genetic Programming (GP), based on a geometric theory of evolutionary algorithms, which directly searches the semantic space of programs. In this chapter, we extend this framework to Grammatical Evolution (GE) and refer to the new method as Geometric Semantic Grammatical Evolution (GSGE). We formally derive new mutation and crossover operators for GE which are guaranteed to see a simple unimodal fitness landscape. This surprising result shows that the GE genotype-phenotype mapping does not necessarily imply low genotype-fitness locality. To complement the theory, we present extensive experimental results on three standard domains (Boolean, Arithmetic and Classifier).
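
As background for the Boolean domain, the sketch below shows the geometric semantic crossover commonly used in GSGP, where the offspring is (P1 AND M) OR (P2 AND NOT M) for a random mask function M. It is not the GE operator derived in the chapter; programs are modelled as plain Python callables, and the names `semantics`, `geometric_crossover` and `hamming` are hypothetical.

```python
from itertools import product


def semantics(fn, n_inputs):
    """Output vector of a Boolean program over all input combinations."""
    return tuple(fn(*bits) for bits in product((False, True), repeat=n_inputs))


def geometric_crossover(parent1, parent2, mask):
    """Offspring copies parent1 where the mask is true and parent2 elsewhere."""
    return lambda *x: (parent1(*x) and mask(*x)) or (parent2(*x) and not mask(*x))


def hamming(a, b):
    """Number of positions at which two output vectors differ."""
    return sum(u != v for u, v in zip(a, b))


# Toy parents and mask over three Boolean inputs.
p1 = lambda a, b, c: a and b
p2 = lambda a, b, c: a != c
mask = lambda a, b, c: b or c
child = geometric_crossover(p1, p2, mask)

s1, s2, sc = semantics(p1, 3), semantics(p2, 3), semantics(child, 3)
# The child lies on a Hamming segment between its parents in semantic space.
on_segment = hamming(s1, sc) + hamming(sc, s2) == hamming(s1, s2)
```

Since every output of the child equals the corresponding output of one of its parents, the child can never be further from the target semantics than the worse parent, which is the intuition behind the unimodal landscape.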

    Polytypic Genetic Programming

    Program synthesis via heuristic search often requires a great deal of boilerplate code to adapt program APIs to the search mechanism. In addition, the majority of existing approaches are not type-safe, i.e. they can fail at runtime because the search mechanisms lack the strict type information often available to the compiler. In this article, we describe Polytope, a Scala framework that uses polytypic programming, a relatively recent advance in program abstraction. Polytope requires a minimum of boilerplate code and supports a form of strong typing in which type rules are automatically enforced by the compiler, even for search operations such as mutation that are applied at runtime. By operating directly on language-native expressions, it provides an embeddable optimization procedure for existing code. We give a tutorial example of the specific polytypic approach we adopt and compare both runtime efficiency and required lines of code against the well-known EpochX GP framework, showing comparable performance in the former and the complete elimination of boilerplate for the latter.