Where does good evidence come from?
This paper started as a debate between the two authors. Both authors present a series of propositions about quality standards in education research. Cook’s propositions, as might be expected, concern the importance of experimental trials for establishing the security of causal evidence, but they also include some important practical and acceptable alternatives such as regression discontinuity analysis. Gorard’s propositions, again as might be expected, tend to place experimental trials within a larger mixed method sequence of research activities, treating them as important but without giving them primacy. The paper concludes with a synthesis of these ideas, summarising the many areas of agreement and clarifying the few areas of disagreement. The latter include what proportion of available research funds should be devoted to trials, how urgent the need for more trials is, and whether the call for more truly mixed methods work requires a major shift in the community.
Model selection via Bayesian information capacity designs for generalised linear models
The first investigation is made of designs for screening experiments where
the response variable is approximated by a generalised linear model. A Bayesian
information capacity criterion is defined for the selection of designs that are
robust to the form of the linear predictor. For binomial data and logistic
regression, the effectiveness of these designs for screening is assessed
through simulation studies using all-subsets regression and model selection via
maximum penalised likelihood and a generalised information criterion. For
Poisson data and log-linear regression, similar assessments are made using
maximum likelihood and the Akaike information criterion for minimally-supported
designs that are constructed analytically. The results show that effective
screening, that is, high power with moderate type I error rate and false
discovery rate, can be achieved through suitable choices for the number of
design support points and experiment size. Logistic regression is shown to
present a more challenging problem than log-linear regression. Some areas for
future work are also indicated.
Bayesian Structural Causal Inference with Probabilistic Programming
Reasoning about causal relationships is central to the human experience. This evokes a natural question in our pursuit of human-like artificial intelligence: how might we imbue intelligent systems with similar causal reasoning capabilities? Better yet, how might we imbue intelligent systems with the ability to learn cause and effect relationships from observation and experimentation? Unfortunately, reasoning about cause and effect requires more than just data: it also requires partial knowledge about data generating mechanisms. Given this need, our task then as computational scientists is to design data structures for representing partial causal knowledge, and algorithms for updating that knowledge in light of observations and experiments. In this dissertation, I explore the Bayesian structural approach to causal inference in which probability distributions over structural causal models are one such data structure, and probabilistic inference in multi-world transformations of those models as the corresponding algorithmic task. Specifically, I demonstrate that this approach has two distinct advantages over the dominant computational paradigm of causal graphical models: (i) it expands the breadth of compatible assumptions; and (ii) it seamlessly integrates with modern Bayesian modeling and inference technologies to facilitate quantification of uncertainty about causal structure and the effects of interventions.
Specifically, doing so allows the emerging and powerful technology of probabilistic programming to be brought to bear on a large and diverse set of causal inference problems. In Chapter 3, I present an example-driven pedagogical introduction to the Bayesian structural approach to causal inference, demonstrating how priors over structural causal models induce joint distributions over observed and latent counterfactual random variables, and how the resulting posterior distributions capture common motifs in causal inference. In particular, I show how various assumptions about latent confounding influence our ability to estimate causal effects from data and I provide examples of common observational and quasi-experimental designs expressed as probabilistic programs. In Chapter 4, I present an advanced application of the Bayesian structural approach for modeling hierarchical relational dependencies with latent confounders, and how to combine such assumptions with flexible Gaussian process models. In Chapter 5, I present a prototype software implementation for causal inference using probabilistic programming, accommodating a broad class of multi-source observational and experimental data. Finally, in Chapter 6, I present Simulation-Based Identifiability, a gradient-based optimization method for determining if any differentiable and bounded prior over structural causal models converges to a unique causal conclusion asymptotically.
Toward data science in biophotonics: biomedical investigations-based study
Biophotonics aims to grasp and investigate the characteristics of biological samples based on their interaction with incident light. Over the past decades, numerous biophotonic technologies have been developed, delivering various sorts of biological and chemical information from the studied samples. Such information is usually contained in high-dimensional data that need to be translated into high-level information like disease biomarkers. This data translation is not straightforward, but it can be achieved using the advances in computer and data science. The scientific contributions presented in this thesis were established to cover two main aspects of data science in biophotonics: the design of experiments and the data-driven modeling and validation. For the design of experiments, the scientific contributions focus on estimating the sample size required for group differentiation and on evaluating the influence of experimental factors on unbalanced multifactorial designs. Both methods were designed for multivariate data and were checked on Raman spectral datasets. Thereafter, automatic detection and identification were examined for three diagnostic tasks by combining several image processing techniques with machine learning (ML) algorithms. In the first task, an improved ML pipeline to predict the antibiotic susceptibilities of E. coli bacteria was presented and evaluated based on bright-field microscopic images. Then, transfer learning-based classification of bladder cancer was demonstrated using blue light cystoscopic images. Finally, different ML techniques and validation strategies were combined to perform the automatic detection of breast cancer based on a small-sized dataset of nonlinear multimodal images. The obtained results exhibited the benefits of data science tools in improving the experimental planning and the translation of biophotonic-associated data into high-level information for various biophotonic technologies.
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.
Estimation of bias in dose-response curve fitting and experimental strategies to its reduction
One of the biggest hurdles in cancer patient care is the lack of response to treatment. With the support of high-throughput drug screening, it is now feasible to conduct large numbers of drug sensitivity assays, aiding in the identification of samples that are sensitive or resistant to chemical perturbations. In an oncology setting, drug screening is the process by which patient cells are examined experimentally for their response to distinct drugs and analysed via dose-response curve fitting. However, reproducing and replicating drug screening outcomes with high confidence has proved to be a challenge that needs to be addressed. Inefficient experimental designs and a lack of standard protocols to control both biological and technical factors in such cell-based assays are at the core of a steep influx of experimental biases. Hence, additional effort is needed to provide less biased estimates of drug effects. This thesis focuses on reducing erroneous inferences (i.e., bias) from dose-response data in the curve fitting step, thereby improving the reproducibility of drug sensitivity screening through efficient dose selection. A novel two-step experimental design is introduced which significantly improves the estimation of dose-response curves while keeping the amount of cellular and chemical materials feasible.
Exploration of Reaction Pathways and Chemical Transformation Networks
For the investigation of chemical reaction networks, the identification of
all relevant intermediates and elementary reactions is mandatory. Many
algorithmic approaches exist that perform explorations efficiently and
automatedly. These approaches differ in their application range, the level of
completeness of the exploration, as well as the amount of heuristics and human
intervention required. Here, we describe and compare the different approaches
based on these criteria. Future directions leveraging the strengths of chemical
heuristics, human interaction, and physical rigor are discussed. Comment: 48 pages, 4 figures
Research and Education in Computational Science and Engineering
Over the past two decades the field of computational science and engineering
(CSE) has penetrated both basic and applied research in academia, industry, and
laboratories to advance discovery, optimize systems, support decision-makers,
and educate the scientific and engineering workforce. Informed by centuries of
theory and experiment, CSE performs computational experiments to answer
questions that neither theory nor experiment alone is equipped to answer. CSE
provides scientists and engineers of all persuasions with algorithmic
inventions and software systems that transcend disciplines and scales. Carried
on a wave of digital technology, CSE brings the power of parallelism to bear on
troves of data. Mathematics-based advanced computing has become a prevalent
means of discovery and innovation in essentially all areas of science,
engineering, technology, and society; and the CSE community is at the core of
this transformation. However, a combination of disruptive
developments---including the architectural complexity of extreme-scale
computing, the data revolution that engulfs the planet, and the specialization
required to follow the applications to new frontiers---is redefining the scope
and reach of the CSE endeavor. This report describes the rapid expansion of CSE
and the challenges to sustaining its bold advances. The report also presents
strategies and directions for CSE research and education for the next decade. Comment: Major revision, to appear in SIAM Review