9,236 research outputs found

    Causal inference using the algorithmic Markov condition

    Inferring the causal structure that links n observables is usually based upon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when only single observations are present. We develop a theory of how to generate causal graphs explaining similarities between single objects. To this end, we replace the notion of conditional stochastic independence in the causal Markov condition with the vanishing of conditional algorithmic mutual information and describe the corresponding causal inference rules. We explain why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that also takes into account the complexity of conditional probability densities, making it possible to select among Markov equivalent causal graphs. This insight provides a theoretical foundation for a heuristic principle proposed in earlier work. We also discuss how to replace Kolmogorov complexity with decidable complexity criteria. This can be seen as an algorithmic analog of replacing the empirically undecidable question of statistical independence with practical independence tests that are based on implicit or explicit assumptions on the underlying distribution.
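
The idea of replacing Kolmogorov complexity with a decidable criterion can be illustrated with an off-the-shelf compressor. The sketch below is not the paper's method: it assumes zlib's compressed length as a crude, computable stand-in for complexity, and the function names `compressed_length` and `approx_mutual_information` are hypothetical. It estimates the algorithmic mutual information of two byte strings as C(x) + C(y) - C(xy).

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a computable proxy for complexity)."""
    return len(zlib.compress(data, 9))


def approx_mutual_information(x: bytes, y: bytes) -> int:
    """Estimate I(x:y) as C(x) + C(y) - C(xy), using compressed lengths for C."""
    return compressed_length(x) + compressed_length(y) - compressed_length(x + y)


# Illustrative inputs (assumed, not from the paper): two independent random strings.
rng = random.Random(0)
x = bytes(rng.getrandbits(8) for _ in range(2000))
z = bytes(rng.getrandbits(8) for _ in range(2000))

shared = approx_mutual_information(x, x)       # x compared with an exact copy of itself
unrelated = approx_mutual_information(x, z)    # x compared with an independent string
```

Under these assumptions, `shared` should be much larger than `unrelated`, since the compressor can describe the second copy of `x` almost for free. A real compressor only bounds complexity from above, so values near zero suggest, but do not establish, algorithmic independence.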

    Family names as indicators of Britain’s changing regional geography

    In recent years the geography of surnames has become increasingly researched in genetics, epidemiology, linguistics and geography. Surnames provide a useful data source for the analysis of population structure, migrations, genetic relationships and levels of cultural diffusion and interaction between communities. The Worldnames database (www.publicprofiler.org/worldnames) of 300 million people from 26 countries, georeferenced in many cases to the equivalent of UK Postcode level, provides a rich source of surname data. This work has focused on the UK component of this dataset, that is, the 2001 Enhanced Electoral Roll, georeferenced to Output Area level. Exploratory analysis of the distribution of surnames across the UK shows that clear regions exist, such as Cornwall, Central Wales and Scotland, in agreement with anecdotal evidence. This study is concerned with applying a wide range of methods to the UK dataset to test their sensitivity and consistency in identifying surname regions. Methods used thus far are hierarchical and non-hierarchical clustering, barrier algorithms such as the Monmonier Algorithm, and Multidimensional Scaling. These, to varying degrees, have highlighted the regionality of UK surnames and provide strong foundations for future work and refinement in the UK context. Establishing a firm methodology has enabled comparisons to be made with data from the Great British 1881 census, developing insights into population movements from within and outside Great Britain.

    Comparison of a Genetic Algorithm Variable Selection and Interval Partial Least Squares for quantitative analysis of lactate in PBS

    Blood lactate is an important biomarker that has been linked to morbidity and mortality of critically ill patients, acute ischemic stroke, septic shock, lung injuries, insulin resistance in diabetic patients, and cancer. Currently, the clinical measurement of blood lactate is done by collecting intermittent blood samples. Therefore, noninvasive, optical measurement of this significant biomarker would lead to a big leap in healthcare. This study presents a quantitative analysis of the optical properties of lactate. The benefits of wavelength selection for the development of accurate, robust, and interpretable predictive models have been highlighted in the literature. Additionally, there is an obvious time- and cost-saving benefit to focusing on narrower segments of the electromagnetic spectrum in practical applications. To this end, a dataset consisting of 47 spectra of Na-lactate and Phosphate Buffer Solution (PBS) was produced using a Fourier transform infrared spectrometer, and subsequently a comparative study of the application of a genetic algorithm-based wavelength selection and two interval selection methods was carried out. The high accuracy of predictions using the developed models underlines the potential for optical measurement of lactate. Moreover, an interesting finding is the emergence of local features in the proposed genetic algorithm even though, unlike in the investigated interval selection methods, no explicit constraints on the locality of features were imposed. Finally, the proposed genetic algorithm suggests the formation of α-hydroxy-esters methyl lactate in the solutions, while the other investigated methods fail to indicate this.

    The influence of mutation on population dynamics in multiobjective genetic programming

    Using multiobjective genetic programming with a complexity objective to overcome tree bloat is usually very successful but can sometimes lead to undesirable collapse of the population to all single-node trees. In this paper we report a detailed examination of why and when collapse occurs. We have used different types of crossover and mutation operators (depth-fair and sub-tree), different evolutionary approaches (generational and steady-state), and different datasets (6-parity Boolean and a range of benchmark machine learning problems) to strengthen our conclusion. We conclude that mutation has a vital role in preventing population collapse by counterbalancing parsimony pressure and preserving population diversity. Also, mutation controls the size of the generated individuals, which tends to dominate the time needed for fitness evaluation and therefore the whole evolutionary process. Further, the average size of the individuals in a GP population depends on the evolutionary approach employed. We also demonstrate that mutation has a wider role than merely culling single-node individuals from the population; even within a diversity-preserving algorithm such as SPEA2, mutation has a role in preserving diversity.
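
To make the parsimony pressure mentioned above concrete, here is a minimal sketch of Pareto dominance over two minimised objectives, error and tree size. It is an illustration only, not the paper's algorithm or SPEA2; the `(error, node_count)` encoding and the sample population are assumptions chosen for the example.

```python
from typing import List, Tuple

# Hypothetical encoding: (error, node_count), both objectives to be minimised.
Individual = Tuple[float, int]


def dominates(a: Individual, b: Individual) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(population: List[Individual]) -> List[Individual]:
    """Return the individuals that no other member of the population dominates."""
    return [p for p in population if not any(dominates(q, p) for q in population)]


# Illustrative population (assumed values): one single-node tree and four larger trees.
population = [(0.50, 1), (0.30, 7), (0.30, 15), (0.10, 31), (0.45, 9)]
front = pareto_front(population)
```

In this example the single-node tree `(0.50, 1)` stays on the front despite having the worst error, because nothing can beat it on size. That is one way to see why a complexity objective can keep pulling a population toward single-node trees unless another operator, such as mutation, counterbalances it.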