Causal inference using the algorithmic Markov condition
Inferring the causal structure that links n observables is usually based upon
detecting statistical dependences and choosing simple graphs that make the
joint measure Markovian. Here we argue why causal inference is also possible
when only single observations are present.
We develop a theory of how to generate causal graphs explaining similarities
between single objects. To this end, we replace the notion of conditional
stochastic independence in the causal Markov condition with the vanishing of
conditional algorithmic mutual information and describe the corresponding
causal inference rules.
We explain why a consistent reformulation of causal inference in terms of
algorithmic complexity implies a new inference principle that also takes into
account the complexity of conditional probability densities, making it
possible to select among Markov equivalent causal graphs. This insight provides
a theoretical foundation of a heuristic principle proposed in earlier work.
We also discuss how to replace Kolmogorov complexity with decidable
complexity criteria. This can be seen as an algorithmic analog of replacing the
empirically undecidable question of statistical independence with practical
independence tests that are based on implicit or explicit assumptions on the
underlying distribution. Comment: 16 figures
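
The idea of swapping Kolmogorov complexity for a computable criterion can be illustrated with a general-purpose compressor. The sketch below is not the paper's method: it assumes zlib's compressed length as a crude stand-in for K, and approximates the algorithmic mutual information via the symmetry-of-information identity I(x:y) ≈ K(x) + K(y) − K(x,y), with concatenation standing in for the pair (x,y). The function names and the test strings are hypothetical.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Compressed size in bytes; an assumed, computable proxy for K(data)."""
    return len(zlib.compress(data, 9))


def approx_mutual_information(x: bytes, y: bytes) -> int:
    """Proxy for I(x:y) ~ K(x) + K(y) - K(x,y), using x + y for the pair."""
    return compressed_length(x) + compressed_length(y) - compressed_length(x + y)


rng = random.Random(0)  # fixed seed so the illustration is reproducible

# x is incompressible noise; y_related copies most of x, y_unrelated does not.
x = bytes(rng.getrandbits(8) for _ in range(2000))
y_related = x[:1500] + bytes(rng.getrandbits(8) for _ in range(500))
y_unrelated = bytes(rng.getrandbits(8) for _ in range(2000))

mi_related = approx_mutual_information(x, y_related)
mi_unrelated = approx_mutual_information(x, y_unrelated)

print("related pair:  ", mi_related)
print("unrelated pair:", mi_unrelated)
```

Under these assumptions, two objects that share a long substring score a much larger value than two independent ones, which is the kind of single-object similarity the abstract proposes to explain causally. A real compressor only upper-bounds K, so small values should not be read as evidence of exact independence.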
Family names as indicators of Britain's changing regional geography
In recent years the geography of surnames has been increasingly researched in genetics, epidemiology, linguistics and geography. Surnames provide a useful data source for the analysis of population structure, migrations, genetic relationships and levels of cultural diffusion and interaction between communities. The Worldnames database (www.publicprofiler.org/worldnames) of 300 million people from 26 countries, georeferenced in many cases to the equivalent of UK Postcode level, provides a rich source of surname data. This work has focused on the UK component of this dataset, that is, the 2001 Enhanced Electoral Roll, georeferenced to Output Area level. Exploratory analysis of the distribution of surnames across the UK shows that clear regions exist, such as Cornwall, Central Wales and Scotland, in agreement with anecdotal evidence. This study is concerned with applying a wide range of methods to the UK dataset to test their sensitivity and consistency in detecting surname regions. Methods used thus far are hierarchical and non-hierarchical clustering, barrier algorithms such as the Monmonier Algorithm, and Multidimensional Scaling. These, to varying degrees, have highlighted the regionality of UK surnames and provide strong foundations for future work and refinement in the UK context. Establishing a firm methodology has enabled comparisons to be made with data from the 1881 census of Great Britain, developing insights into population movements from within and outside Great Britain.
Comparison of a Genetic Algorithm Variable Selection and Interval Partial Least Squares for quantitative analysis of lactate in PBS
Blood lactate is an important biomarker that has been linked to morbidity and mortality of critically ill patients, acute ischemic stroke, septic shock, lung injuries, insulin resistance in diabetic patients, and cancer. Currently, the clinical measurement of blood lactate is done by collecting intermittent blood samples. Therefore, noninvasive, optical measurement of this significant biomarker would lead to a big leap in healthcare. This study presents a quantitative analysis of the optical properties of lactate. The benefits of wavelength selection for the development of accurate, robust, and interpretable predictive models have been highlighted in the literature. Additionally, there is an obvious time- and cost-saving benefit to focusing on narrower segments of the electromagnetic spectrum in practical applications. To this end, a dataset consisting of 47 spectra of Na-lactate and Phosphate Buffer Solution (PBS) was produced using a Fourier transform infrared spectrometer, and subsequently a comparative study of the application of a genetic algorithm-based wavelength selection and two interval selection methods was carried out. The high accuracy of predictions using the developed models underlines the potential for optical measurement of lactate. Moreover, an interesting finding is the emergence of local features in the proposed genetic algorithm even though, unlike in the investigated interval selection methods, no explicit constraints on the locality of features were imposed. Finally, the proposed genetic algorithm suggests the formation of α-hydroxy-esters methyl lactate in the solutions, while the other investigated methods fail to indicate this.
The influence of mutation on population dynamics in multiobjective genetic programming
Using multiobjective genetic programming with a complexity objective to overcome tree bloat is usually very successful but can sometimes lead to an undesirable collapse of the population to all single-node trees. In this paper we report a detailed examination of why and when collapse occurs. We have used different types of crossover and mutation operators (depth-fair and sub-tree), different evolutionary approaches (generational and steady-state), and different datasets (6-parity Boolean and a range of benchmark machine learning problems) to strengthen our conclusion. We conclude that mutation has a vital role in preventing population collapse by counterbalancing parsimony pressure and preserving population diversity. Also, mutation controls the size of the generated individuals, which tends to dominate the time needed for fitness evaluation and therefore the whole evolutionary process. Further, the average size of the individuals in a GP population depends on the evolutionary approach employed. We also demonstrate that mutation has a wider role than merely culling single-node individuals from the population; even within a diversity-preserving algorithm such as SPEA2, mutation has a role in preserving diversity.
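
To make the parsimony pressure mentioned above concrete, here is a minimal sketch of Pareto dominance over two minimised objectives, prediction error and tree size. It is an illustration under assumed, made-up fitness values, not the authors' implementation; the names `dominates` and `pareto_front` and the example population are hypothetical.

```python
from typing import List, Tuple

# An individual is summarised as (error, node_count); both are minimised.
Individual = Tuple[float, int]


def dominates(a: Individual, b: Individual) -> bool:
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(population: List[Individual]) -> List[Individual]:
    """Individuals that no other member of the population dominates."""
    return [p for p in population
            if not any(dominates(q, p) for q in population)]


# Made-up values: the first entry is a single-node tree with poor error.
population = [(0.40, 1), (0.25, 7), (0.25, 15), (0.10, 31), (0.30, 9)]
front = pareto_front(population)
print(front)
```

In this example the single-node tree stays on the front despite its high error, because nothing can be smaller. That is one intuition for why a complexity objective can pull a population toward trivial trees unless an operator such as mutation keeps reintroducing larger, more diverse individuals.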