56 research outputs found
Ignorance based inference of optimality in thermodynamic processes
We derive ignorance based prior distribution to quantify incomplete
information and show its use to estimate the optimal work characteristics of a
heat engine.Comment: Latex, 10 pages, 3 figure
Deciphering the enigma of undetected species, phylogenetic, and functional diversity based on Good-Turing theory
Estimating the species, phylogenetic, and functional diversity of a community is challenging because rare species are often undetected, even with intensive sampling. The Good-Turing frequency formula, originally developed for cryptography, estimates in an ecological context the true frequencies of rare species in a single assemblage based on an incomplete sample of individuals. Until now, this formula has never been used to estimate undetected species, phylogenetic, and functional diversity. Here, we first generalize the Good-Turing formula to incomplete sampling of two assemblages. The original formula and its two-assemblage generalization provide a novel and unified approach to notation, terminology, and estimation of undetected biological diversity. For species richness, the Good-Turing framework offers an intuitive way to derive the non-parametric estimators of the undetected species richness in a single assemblage, and of the undetected species shared between two assemblages. For phylogenetic diversity, the unified approach leads to an estimator of the undetected Faith\u27s phylogenetic diversity (PD, the total length of undetected branches of a phylogenetic tree connecting all species), as well as a new estimator of undetected PD shared between two phylogenetic trees. For functional diversity based on species traits, the unified approach yields a new estimator of undetected Walker et al.\u27s functional attribute diversity (FAD, the total species-pairwise functional distance) in a single assemblage, as well as a new estimator of undetected FAD shared between two assemblages. Although some of the resulting estimators have been previously published (but derived with traditional mathematical inequalities), all taxonomic, phylogenetic, and functional diversity estimators are now derived under the same framework. All the derived estimators are theoretically lower bounds of the corresponding undetected diversities; our approach reveals the sufficient conditions under which the estimators are nearly unbiased, thus offering new insights. Simulation results are reported to numerically verify the performance of the derived estimators. We illustrate all estimators and assess their sampling uncertainty with an empirical dataset for Brazilian rain forest trees. These estimators should be widely applicable to many current problems in ecology, such as the effects of climate change on spatial and temporal beta diversity and the contribution of trait diversity to ecosystem multi-functionality
Scientific discovery as a combinatorial optimisation problem: How best to navigate the landscape of possible experiments?
A considerable number of areas of bioscience, including gene and drug discovery, metabolic engineering for the biotechnological improvement of organisms, and the processes of natural and directed evolution, are best viewed in terms of a ‘landscape’ representing a large search space of possible solutions or experiments populated by a considerably smaller number of actual solutions that then emerge. This is what makes these problems ‘hard’, but as such these are to be seen as combinatorial optimisation problems that are best attacked by heuristic methods known from that field. Such landscapes, which may also represent or include multiple objectives, are effectively modelled in silico, with modern active learning algorithms such as those based on Darwinian evolution providing guidance, using existing knowledge, as to what is the ‘best’ experiment to do next. An awareness, and the application, of these methods can thereby enhance the scientific discovery process considerably. This analysis fits comfortably with an emerging epistemology that sees scientific reasoning, the search for solutions, and scientific discovery as Bayesian processes
On being a good Bayesian
Bayesianism is fast becoming the dominant paradigm in archaeological chronology construction. This paradigm shift has been brought about in large part by widespread access to tailored computer software which provides users with powerful tools for complex statistical inference with little need to learn about statistical modelling or computer programming. As a result, we run the risk that such software will be reduced to the status of black boxes. This would be a dangerous position for our community since good, principled use of Bayesian methods requires mindfulness when selecting the initial model, defining prior information, checking the reliability and sensitivity of the software runs and interpreting the results obtained. In this article, we provide users with a brief review of the nature of the care required and offer some comments and suggestions to help ensure that our community continues to be respected for its philosophically rigorous scientific approach
The fallacy of placing confidence in confidence intervals
Interval estimates – estimates of parameters that include an allowance for sampling uncertainty – have long been touted as a key component of statistical analyses. There are several kinds of interval estimates, but the most popular are confidence intervals (CIs): intervals that contain the true parameter value in some known proportion of repeated samples, on average. The width of confidence intervals is thought to index the precision of an estimate; CIs are thought to be a guide to which parameter values are plausible or reasonable; and the confidence coefficient of the interval (e.g., 95 %) is thought to index the plausibility that the true parameter is included in the interval. We show in a number of examples that CIs do not necessarily have any of these properties, and can lead to unjustified or arbitrary inferences. For this reason, we caution against relying upon confidence interval theory to justify interval estimates, and suggest that other theories of interval estimation should be used instead
Model selection in historical research using approximate Bayesian computation
Formal Models and History Computational models are increasingly being used to study historical dynamics. This new trend, which could be named Model-Based History, makes use of recently published datasets and innovative quantitative methods to improve our understanding of past societies based on their written sources. The extensive use of formal models allows historians to reevaluate hypotheses formulated decades ago and still subject to debate due to the lack of an adequate quantitative framework. The initiative has the potential to transform the discipline if it solves the challenges posed by the study of historical dynamics. These difficulties are based on the complexities of modelling social interaction, and the methodological issues raised by the evaluation of formal models against data with low sample size, high variance and strong fragmentation. This work examines an alternate approach to this evaluation based on a Bayesian-inspired model selection method. The validity of the classical Lanchester's laws of combat is examined against a dataset comprising over a thousand battles spanning 300 years. Four variations of the basic equations are discussed, including the three most common formulations (linear, squared, and logarithmic) and a new variant introducing fatigue. Approximate Bayesian Computation is then used to infer both parameter values and model selection via Bayes Factors. Results indicate decisive evidence favouring the new fatigue model. The interpretation of both parameter estimations and model selection provides new insights into the factors guiding the evolution of warfare. At a methodological level, the case study shows how model selection methods can be used to guide historical research through the comparison between existing hypotheses and empirical evidence.Funding for this work was provided by the
SimulPast Consolider Ingenio project (CSD2010-00034) of the former Ministry for Science and Innovation of the Spanish Government and the European Research Council Advanced Grant EPNet (340828).Peer ReviewedPostprint (published version
- …