
    Using Ramsey theory to measure unavoidable spurious correlations in Big Data

    Given a dataset, we quantify how many patterns must always exist in it. Formally, this is done through the lens of the Ramsey theory of graphs and a quantitative bound known as Goodman's theorem. Combining statistical tools with Ramsey theory gives a nuanced understanding of how far a dataset is from random and of what qualifies as a meaningful pattern. The method is applied to a dataset of repeated voters in the 1984 US Congress to quantify how homogeneous a subset of congressional voters is, and to measure how transitive a subset of voters is. Statistical Ramsey theory is also applied to global economic trading data to provide evidence that global markets are quite transitive.
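
    For orientation, a minimal sketch (not taken from the paper; the agree matrix and its agreement threshold are hypothetical) of the baseline this method compares against: Goodman's theorem forces a minimum number of monochromatic triangles in any two-colouring of the complete graph, and only the excess over that bound can count as meaningful structure.

        from itertools import combinations

        def goodman_lower_bound(n):
            # Minimum number of monochromatic triangles in ANY red/blue colouring
            # of the complete graph K_n (Goodman's theorem).
            if n % 2 == 0:
                return n * (n - 2) * (n - 4) // 24
            if n % 4 == 1:
                return n * (n - 1) * (n - 5) // 24
            return (n + 1) * (n - 3) * (n - 4) // 24  # n = 3 (mod 4)

        def monochromatic_triangles(agree):
            # agree[i][j] is True if voters i and j agree on most roll calls and
            # False otherwise, so every pair carries one of two colours of K_n.
            n = len(agree)
            return sum(1 for i, j, k in combinations(range(n), 3)
                       if agree[i][j] == agree[j][k] == agree[i][k])

        # Any excess of monochromatic_triangles(agree) over goodman_lower_bound(n)
        # is structure not already forced by Ramsey theory alone.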

    The Agnostic Structure of Data Science Methods

    In this paper we discuss the changing role of mathematics in science as a way to examine some methodological trends at work in big data science. More specifically, we show how the role of mathematics has changed dramatically from its classical applications. Classically, any application of mathematical techniques requires a prior understanding of the phenomena and of the mutual relations among the relevant data; modern data analysis instead appeals to mathematics in order to identify possible invariants uniquely attached to the specific questions we may ask about the phenomena of interest. In other terms, the new paradigm for the application of mathematics does not require any understanding of the phenomenon; rather, it relies on mathematics to organize data in such a way as to reveal possible invariants that may or may not provide further understanding of the phenomenon per se, but that nevertheless provide an answer to the relevant question.

    Using the Literature to Identify Confounders

    Prior work in causal modeling has focused primarily on learning graph structures and parameters to model data-generating processes from observational or experimental data, while the focus of the literature-based discovery paradigm has been to identify novel therapeutic hypotheses in publicly available knowledge. The critical contribution of this dissertation is to refashion the literature-based discovery paradigm as a means to populate causal models with relevant covariates to abet causal inference. In particular, this dissertation describes a generalizable framework for mapping from causal propositions in the literature to subgraphs populated by instantiated variables that reflect observational data. The observational data are derived from electronic health records, and the purpose of the causal inference is to detect adverse drug event signals. The Principle of the Common Cause is exploited as a heuristic for a defeasible practical logic. The fundamental intuition is that improbable co-occurrences can be “explained away” with reference to a common cause, or confounder. Semantic constraints in literature-based discovery can be leveraged to identify such covariates. Further, the asymmetric semantic constraints of causal propositions map directly to the topology of causal graphs as directed edges. The hypothesis is that causal models conditioned on sets of such covariates will improve upon the performance of purely statistical techniques for detecting adverse drug event signals. By improving upon previous work in purely EHR-based pharmacovigilance, these results establish the utility of this scalable approach to automated causal inference.
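
    As a minimal, self-contained sketch of the “explained away” intuition (the simulation, variable names, and rates below are invented for illustration, not drawn from the dissertation): a common cause C drives both drug exposure and the adverse event, so the crude drug-event odds ratio looks alarming, while the stratum-specific odds ratios conditioned on C fall back toward 1.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 100_000
        # Hypothetical confounder C (say, an indication) that raises both the
        # probability of receiving the drug and the probability of the event;
        # the drug itself has no effect in this simulation.
        C = rng.random(n) < 0.3
        drug = rng.random(n) < np.where(C, 0.6, 0.1)
        event = rng.random(n) < np.where(C, 0.3, 0.05)

        def odds_ratio(exposed, outcome):
            a = np.sum(exposed & outcome);  b = np.sum(exposed & ~outcome)
            c = np.sum(~exposed & outcome); d = np.sum(~exposed & ~outcome)
            return (a * d) / (b * c)

        print("crude OR:    ", odds_ratio(drug, event))           # well above 1
        print("OR given C=0:", odds_ratio(drug[~C], event[~C]))   # close to 1
        print("OR given C=1:", odds_ratio(drug[C], event[C]))     # close to 1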

    Infrastructure and economic development in Sub-Saharan Africa

    An adequate supply of infrastructure services has long been viewed by both academics and policy makers as a key ingredient for economic development. Sub-Saharan Africa ranks consistently at the bottom of all developing regions in terms of infrastructure performance, and an increasing number of observers point to deficient infrastructure as a major obstacle to growth and poverty reduction across the region. This paper offers an empirical assessment of the impact of infrastructure development on growth and inequality, with a focus on Sub-Saharan Africa. The paper uses a comparative cross-regional perspective to place Africa's experience in the international context. Drawing on an updated data set of infrastructure quantity and quality indicators covering more than 100 countries and spanning the years 1960-2005, the paper estimates empirical growth and inequality equations including a standard set of control variables augmented by infrastructure quantity and quality measures, while controlling for the potential endogeneity of the latter. The estimates illustrate the potential contribution of infrastructure development to growth and equity across Africa. Topics: Transport Economics Policy & Planning; Infrastructure Economics; Public Sector Economics & Finance; Banks & Banking Reform; Economic Theory & Research.
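
    A generic form of the growth equation described above, as a sketch only (the paper's exact regressors, infrastructure indices, and estimator may differ):

        g_{i,t} = \alpha y_{i,t-1} + \beta' X_{i,t} + \gamma_1 INFQ_{i,t} + \gamma_2 INFK_{i,t} + \mu_i + \eta_t + \epsilon_{i,t}

    where g_{i,t} is per-capita growth, y_{i,t-1} lagged income, X_{i,t} the standard controls, INFQ and INFK stand here for infrastructure quantity and quality indices, and \mu_i, \eta_t are country and period effects; endogeneity of the infrastructure measures is typically addressed with internal instruments in a GMM setting.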

    Randomness, Determinism and Undecidability in the Economic Cycle Theory

    The scientific literature that studies business cycles contains a historical debate between random and deterministic models. On the one hand, there are models built with explanatory variables that follow a stochastic trajectory and produce, through transmission mechanisms, the studied cycles; their rationale is the so-called Slutsky-Yule effect. On the other hand, there are models in which the phase of the system at time t fixes, under the ceteris paribus condition, the phase at time t + 1. The cycle would then be the product of these variables, making prediction possible and enabling economic policies to combat recessions. The thesis of this work is as follows: the application of Chaitin's undecidability theorems shows that it is not possible to settle this debate. It is impossible to determine with absolute certainty whether the observed cycles follow a deterministic or a stochastic model. To reach this result, I outline the fundamental theories of the business cycle, providing a classification and examples of mathematical models. I review the definition of randomness and consider Chaitin's demonstration of the impossibility of deciding whether a data set is stochastic or not, a consequence, he argues, of Gödel's incompleteness theorems. I conclude that whether a string of economic data, aggregated or not, is considered random or deterministic depends on the theory, and this applies to all cyclical phenomena of any nature. Specific mathematical models have observable consequences, but probabilism and determinism are only heuristic programs that guide the progress of knowledge. Keywords: Randomness, Business cycle theories, Undecidability, Heuristic. JEL: B40, D50, E32.
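
    For reference, the undecidability result invoked above is usually stated via prefix-free Kolmogorov complexity K (a standard formulation, not quoted from the paper): if T is a consistent, computably axiomatized theory that proves only true statements about K, then there is a constant c_T such that T proves no statement of the form K(s) > c_T, even though K(s) > c_T is true for all but finitely many strings s. No fixed theory can therefore certify more than a bounded amount of algorithmic randomness in any particular data series, which is the sense in which the stochastic-versus-deterministic question is undecidable.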

    Decoherence in Solid State Qubits

    Interaction of solid state qubits with environmental degrees of freedom strongly affects the qubit dynamics and leads to decoherence. In quantum information processing with solid state qubits, decoherence significantly limits the performance of such devices, so it is necessary to fully understand the mechanisms that lead to it. In this review we discuss how decoherence affects two of the most successful realizations of solid state qubits, namely spin qubits and superconducting qubits. In the former, the qubit is encoded in the spin 1/2 of the electron and is implemented by confining the electron spin in a semiconductor quantum dot. Superconducting devices show quantum behavior at low temperatures, and the qubit is encoded in the two lowest energy levels of a superconducting circuit. The electron spin in a quantum dot has two main decoherence channels: a (Markovian) phonon-assisted relaxation channel, due to the presence of spin-orbit interaction, and a (non-Markovian) spin bath constituted by the spins of the nuclei in the quantum dot, which interact with the electron spin via the hyperfine interaction. In a superconducting qubit, decoherence takes place as a result of fluctuations in the control parameters, such as bias currents, applied flux, and bias voltages, and via losses in the dissipative circuit elements.
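
    For orientation, the standard relations behind the two channels described above (generic textbook notation, not this review's): the nuclear spin bath couples through the contact hyperfine interaction H_hf = \sum_k A_k I_k · S, where S is the electron spin, I_k the k-th nuclear spin, and A_k its coupling strength; and the channels are conventionally summarized by the relaxation and coherence times via 1/T_2 = 1/(2 T_1) + 1/T_phi, with T_1 set by phonon-assisted relaxation and T_phi by pure dephasing.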

    Non-fundamental exchange rate volatility and welfare

    We lay out an empirical and a theoretical model to analyze the effects of non-fundamental exchange rate volatility on economic activity and welfare. In the first part of the paper, a GARCH-SVAR model is applied to measure empirically the effect of conditional exogenous exchange rate volatility on the conditional mean of the endogenous variables in our open-economy VAR. Our results for Canada, Germany and the UK indicate that the effects of exchange rate uncertainty are empirically small. In the second part, we investigate the effect of non-fundamental exchange rate volatility in a stochastic open economy model. The second-order approximation method of Sims [2003] is applied to the model's equilibrium conditions. We show that in a model with habit persistence, even non-fundamental exchange rate volatility that generates only small variation in the unconditional mean of the variables might induce economically significant welfare changes. JEL Classification: C32, F31, F41. Keywords: Exchange rate volatility, GARCH-SVAR, Second-order.
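
    A sketch of the kind of specification the empirical part refers to (the paper's exact identification, lag structure, and volatility-in-mean term may differ): the conditional variance of the structural exchange-rate shock follows a GARCH(1,1) process and feeds back into the conditional mean of the VAR,

        y_t = c + \sum_{i=1}^{p} A_i y_{t-i} + \Lambda h_t + u_t,        h_t = \omega + \alpha e_{t-1}^2 + \beta h_{t-1},

    where y_t stacks the open-economy variables, e_t is the exchange-rate shock, h_t its conditional variance, and \Lambda captures how that volatility shifts the conditional means of the endogenous variables.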

    The Agnostic Structure of Data Science Methods

    In this paper we argue that data science is a coherent approach to empirical problems that, in its most general form, does not build understanding about phenomena. We start by exploring the broad structure of mathematization methods in data science, organized around the belief that if enough and sufficiently diverse data are collected regarding a certain phenomenon, it is possible to answer all relevant questions about it. We call this belief ‘the microarray paradigm’ and the approach to empirical phenomena based on it ‘agnostic science’. Not all computational methods dealing with large data sets are properly within the domain of agnostic science, and we give an example of an algorithm, PageRank, that relies on large data processing, but such that the significance of its output is readily intelligible. Within the new type of mathematization at work in agnostic science, mathematical methods are not selected because of any particular relevance for a problem at hand. Rather, mathematical methods are applied to a specific problem only on the basis of their ability to reorganize the data for further analysis and the intrinsic richness of their mathematical structure. We refer to this type of mathematization as ‘forcing’. We then show that optimization methods are used in data science by forcing them on problems. This is particularly significant since virtually all methods of data science can be reinterpreted as types of optimization methods. In particular, we argue that deep learning neural networks are best understood within the context of forcing optimality. We finally explore the broader question of the appropriateness of data science methods in solving problems. We argue that this question should not be interpreted as a search for a correspondence between phenomena and specific solutions found by data science methods. Rather, it is the internal structure of data science methods that is open to forms of understanding. As an example, we offer an analysis of ensemble methods, where distinct data science methods are combined in the search for the solution of a problem, and we speculate on the general structure of the data sets that are most appropriate for such methods.
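
    Since PageRank is singled out above as the algorithm whose output remains readily intelligible, here is a minimal power-iteration sketch of it (a textbook version, not the paper's code; the damping value and dangling-node handling follow the usual conventions):

        import numpy as np

        def pagerank(adj, damping=0.85, tol=1e-10, max_iter=200):
            # adj[i, j] = 1.0 if page i links to page j; rows with no out-links
            # (dangling nodes) spread their rank uniformly over all pages.
            n = adj.shape[0]
            out_deg = adj.sum(axis=1)
            P = np.where(out_deg[:, None] > 0,
                         adj / np.maximum(out_deg, 1)[:, None],
                         1.0 / n)
            r = np.full(n, 1.0 / n)
            for _ in range(max_iter):
                r_new = (1 - damping) / n + damping * (P.T @ r)
                if np.abs(r_new - r).sum() < tol:
                    break
                r = r_new
            return r  # stationary importance scores, summing to 1

        # Tiny example: page 2 collects links from both other pages and comes out on top.
        A = np.array([[0, 1, 1],
                      [0, 0, 1],
                      [1, 0, 0]], dtype=float)
        print(pagerank(A))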