12,271 research outputs found

    Self-Supervised Learning to Prove Equivalence Between Straight-Line Programs via Rewrite Rules

    We target the problem of automatically synthesizing proofs of semantic equivalence between two programs made of sequences of statements. We represent programs using abstract syntax trees (AST), where a given set of semantics-preserving rewrite rules can be applied to a specific AST pattern to generate a transformed and semantically equivalent program. In our system, two programs are equivalent if there exists a sequence of applications of these rewrite rules that rewrites one program into the other. We propose a neural network architecture based on a transformer model to generate proofs of equivalence between program pairs. The system outputs a sequence of rewrites, and the validity of the sequence is checked simply by verifying that it can be applied. If the neural network produces no valid sequence, the system reports the programs as non-equivalent, ensuring by design that no programs are incorrectly reported as equivalent. Our system is fully implemented for a grammar that can represent straight-line programs with function calls and multiple types. To train the system to generate such sequences efficiently, we develop an original incremental training technique, named self-supervised sample selection. We extensively study the effectiveness of this novel training approach on proofs of increasing complexity and length. Our system, S4Eq, achieves 97% proof success on a curated dataset of 10,000 pairs of equivalent programs. Comment: 30 pages including appendix
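The one-sided soundness guarantee described above (a proof is accepted only if every rewrite step applies cleanly and the result equals the target program) can be sketched with a toy checker. The tuple-based term encoding, the two commutativity rules, and all names below are illustrative assumptions, not the paper's actual grammar or rule set.

```python
# A term is a nested tuple ("op", child1, child2) or a variable name string.
RULES = {
    "add_comm": lambda t: ("add", t[2], t[1]) if isinstance(t, tuple) and t[0] == "add" else None,
    "mul_comm": lambda t: ("mul", t[2], t[1]) if isinstance(t, tuple) and t[0] == "mul" else None,
}

def apply_at(term, path, rule):
    """Rewrite the subterm addressed by `path` (a list of child indices)."""
    if not path:
        return rule(term)                      # rule returns None if it does not match
    if not isinstance(term, tuple) or path[0] >= len(term) - 1:
        return None
    i = path[0] + 1                            # children start at tuple index 1
    child = apply_at(term[i], path[1:], rule)
    if child is None:
        return None
    return term[:i] + (child,) + term[i + 1:]

def check_proof(source, target, proof):
    """A proof is valid only if every step applies and the result is the target."""
    term = source
    for rule_name, path in proof:
        term = apply_at(term, path, RULES[rule_name])
        if term is None:
            return False                       # a step failed to apply: reject
    return term == target

# one commutativity step rewrites ("add", "a", "b") into ("add", "b", "a")
assert check_proof(("add", "a", "b"), ("add", "b", "a"), [("add_comm", [])])
```

By construction such a checker can reject a true equivalence (when no valid sequence is produced) but never accept a false one, mirroring the guarantee stated in the abstract.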

    Model Diagnostics meets Forecast Evaluation: Goodness-of-Fit, Calibration, and Related Topics

    Principled forecast evaluation and model diagnostics are vital in fitting probabilistic models and forecasting outcomes of interest. A common principle is that fitted or predicted distributions ought to be calibrated, ideally in the sense that the outcome is indistinguishable from a random draw from the posited distribution. Much of this thesis is centered on calibration properties of various types of forecasts. In the first part of the thesis, a simple algorithm for exact multinomial goodness-of-fit tests is proposed. The algorithm computes exact p-values based on various test statistics, such as the log-likelihood ratio and Pearson's chi-square. A thorough analysis shows improvement on extant methods. However, the runtime of the algorithm grows exponentially in the number of categories and hence its use is limited. In the second part, a framework rooted in probability theory is developed, which gives rise to hierarchies of calibration, and applies to both predictive distributions and stand-alone point forecasts. Based on a general notion of conditional T-calibration, the thesis introduces population versions of T-reliability diagrams and revisits a score decomposition into measures of miscalibration, discrimination, and uncertainty. Stable and efficient estimators of T-reliability diagrams and score components arise via nonparametric isotonic regression and the pool-adjacent-violators algorithm. For in-sample model diagnostics, a universal coefficient of determination is introduced that nests and reinterprets the classical R^2 in least squares regression. In the third part, probabilistic top lists are proposed as a novel type of prediction in classification, which bridges the gap between single-class predictions and predictive distributions. The probabilistic top list functional is elicited by strictly consistent evaluation metrics, based on symmetric proper scoring rules, which admit comparison of various types of predictions.
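The pool-adjacent-violators algorithm mentioned above has a compact implementation. The following is a minimal least-squares sketch over outcomes already sorted by forecast value; it illustrates the pooling mechanics, not the thesis's exact estimator of T-reliability diagrams.

```python
def pav(y):
    """Isotonic (non-decreasing) least-squares fit to the sequence y."""
    blocks = []                                # each block is [sum, count]
    for v in y:
        blocks.append([v, 1])
        # pool adjacent blocks while their means violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fit = []
    for s, n in blocks:
        fit.extend([s / n] * n)                # each pooled block gets its mean
    return fit

# Binary outcomes sorted by forecast value; PAV pools the violating middle pair.
print(pav([0.0, 1.0, 0.0, 1.0]))   # → [0.0, 0.5, 0.5, 1.0]
```

The fitted values can then be plotted against the forecasts to obtain a reliability diagram that is monotone by construction.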

    Machine Learning Research Trends in Africa: A 30 Years Overview with Bibliometric Analysis Review

    In this paper, a critical bibliometric analysis study is conducted, coupled with an extensive literature survey on recent developments and associated applications in machine learning research, with a perspective on Africa. The presented bibliometric analysis covers 2761 machine-learning-related documents, of which 98% were articles with at least 482 citations, published in 903 journals during the past 30 years. The collated documents were retrieved from the Science Citation Index EXPANDED, comprising research publications from 54 African countries between 1993 and 2021. The bibliometric study visualizes the current landscape and future trends in machine learning research and its applications, to facilitate future collaborative research and knowledge exchange among authors from different research institutions scattered across the African continent.

    Discovering the hidden structure of financial markets through Bayesian modelling

    Understanding what drives the price of a financial asset is a question that remains largely unanswered. In this work we go beyond classic one-step-ahead prediction and instead construct models that create new information on the behaviour of these time series. Our aim is to better understand the hidden structures that drive the moves of each financial time series and thus the market as a whole. We propose a tool to decompose multiple time series into economically meaningful variables that explain the endogenous and exogenous factors driving their underlying variability. The methodology we introduce goes beyond the direct model forecast: since our model continuously adapts its variables and coefficients, we can study the time series of coefficients and selected variables. We also present a model to construct the causal graph of relations between these time series and include them in the exogenous factors. Hence, we obtain a model able to explain what drives the move of both each specific time series and the market as a whole. In addition, the obtained graph of the time series provides new information on the underlying risk structure of this environment. With this deeper understanding of the hidden structure, we propose novel ways to detect and forecast risks in the market. We evaluate our results with inferences up to one month into the future using stocks, FX futures and ETF futures, demonstrating superior performance in terms of accuracy on large moves, longer-term prediction and consistency over time. We also go into more detail on the economic interpretation of the new variables and discuss the created graph structure of the market.
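One way to picture a model that "continuously adapts its variables and coefficients" is recursive least squares with a forgetting factor, where the path of coefficients over time itself becomes a new time series to study. The model form, forgetting factor, and synthetic data below are illustrative assumptions, not the thesis's actual decomposition.

```python
import numpy as np

def rls_path(X, y, forgetting=0.99):
    """Recursive least squares with forgetting; returns the coefficient path."""
    n, k = X.shape
    theta = np.zeros(k)
    P = np.eye(k) * 1e3                        # large initial uncertainty
    path = np.empty((n, k))
    for t in range(n):
        x = X[t]
        gain = P @ x / (forgetting + x @ P @ x)
        theta = theta + gain * (y[t] - x @ theta)
        P = (P - np.outer(gain, x @ P)) / forgetting
        path[t] = theta                        # coefficients as a time series
    return path

# Synthetic example: recover a fixed coefficient vector from noisy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=500)
path = rls_path(X, y)
# path[-1] ≈ [1.0, -2.0]; with drifting coefficients, `path` traces the drift
```

When the true coefficients drift, plotting `path` column by column shows which explanatory variable matters when, which is the kind of "new information" the abstract refers to.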

    Statistical-dynamical analyses and modelling of multi-scale ocean variability

    This thesis aims to provide a comprehensive analysis of multi-scale oceanic variabilities using various statistical and dynamical tools and explore data-driven methods for correct statistical emulation of the oceans. We considered the classical, wind-driven, double-gyre ocean circulation model in quasi-geostrophic approximation and obtained its eddy-resolving solutions in terms of potential vorticity anomaly and geostrophic streamfunctions. The reference solutions possess two asymmetric gyres of opposite circulations and a strong meandering eastward jet separating them with rich eddy activities around it, such as the Gulf Stream in the North Atlantic and the Kuroshio in the North Pacific. This thesis is divided into two parts. The first part discusses a novel scale-separation method based on local spatial correlations, called correlation-based decomposition (CBD), and provides a comprehensive analysis of mesoscale eddy forcing. In particular, we analyse the instantaneous and time-lagged interactions between the diagnosed eddy forcing and the evolving large-scale potential vorticity anomaly using novel "product integral" characteristics. The product integral time series uncover robust causality between two drastically different yet interacting flow quantities, termed "eddy backscatter". We also show data-driven augmentation of non-eddy-resolving ocean models by feeding them the eddy fields to restore the missing eddy-driven features, such as the merging western boundary currents, their eastward extension and low-frequency variabilities of gyres. In the second part, we present a systematic inter-comparison of linear regression, stochastic and deep-learning methods to build low-cost reduced-order statistical emulators of the oceans. We obtain forecasts on seasonal and centennial timescales and assess them for their skill, cost and complexity. We found that the multi-level linear stochastic model performs the best, followed by the "hybrid stochastically-augmented deep learning models". The superiority of these methods underscores the importance of incorporating core dynamics, memory effects and model errors for robust emulation of multi-scale dynamical systems, such as the oceans.
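The multi-level linear stochastic idea can be caricatured in a few lines: fit a linear one-step map at the top level, fit a second linear level to its residuals, and drive the last level with white noise when emulating. Two scalar levels and the synthetic AR(1) "truth" below are illustrative simplifications of the thesis's emulators.

```python
import numpy as np

def fit_level(x):
    """One-step linear fit x[t+1] ≈ a * x[t]; return a and the residual series."""
    a = float(x[:-1] @ x[1:]) / float(x[:-1] @ x[:-1])
    return a, x[1:] - a * x[:-1]

# Synthetic "truth": an AR(1) process standing in for a large-scale observable.
rng = np.random.default_rng(1)
x = np.zeros(2000)
for t in range(1999):
    x[t + 1] = 0.9 * x[t] + rng.normal(scale=0.1)

a1, r1 = fit_level(x)      # level 1: dynamics of the state itself
a2, r2 = fit_level(r1)     # level 2: any memory left in the residuals
sigma = r2.std()

# Emulation: integrate both levels forward, forcing the last with white noise.
xe, re = x[0], r1[0]
trajectory = []
for _ in range(2000):
    re = a2 * re + rng.normal(scale=sigma)
    xe = a1 * xe + re
    trajectory.append(xe)
# a1 recovers ≈ 0.9; for this white-noise truth a2 is near zero, but with
# red-noise forcing the second level would carry the memory effects.
```

Stacking further levels until the final residuals are white is what makes such emulators capture the memory effects the abstract highlights.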

    Annals [...].

    Pedometrics: innovation in the tropics; Legacy data: how to make it useful?; Advances in soil sensing; Pedometric guidelines for systematic soil surveys. Online event. Coordinated by: Waldir de Carvalho Junior, Helena Saraiva Koenow Pinheiro, Ricardo Simão Diniz Dalmolin.

    Knowledge-based artificial neural network modeling assessment: integrating heterogeneous genomics data to uncover lifespan regulation

    Biological analytics and more advanced data analysis techniques have made remarkable advancements as the area of machine learning continues to grow. More specifically, genetic modeling and neural network building are gaining interest as they become a fundamental piece of much of the model building we see today. We propose a Knowledge-Based Artificial Neural Network (KBANN) to predict phenotype while providing insight into affected subsystems. Within KBANN, the nodes of the input layer correspond to single Gene Ontology (GO) terms or groups of terms, and each node's input is a single number between 0 and 1 expressing how strongly the given term is expressed. The expression number compares the average number of copies that a gene produces at its current age with its average over the entire lifespan. Preliminary results show that the KBANN model can potentially be used to predict lifespan phenotype using the Genotype-Tissue Expression data.
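The GO-term wiring described above amounts to constraining the first weight matrix with a binary connectivity mask, so each hidden unit only sees the genes annotated to its term. The GO identifiers, gene names and sizes below are placeholders invented for illustration, not real annotations.

```python
import numpy as np

genes = ["geneA", "geneB", "geneC", "geneD"]   # hypothetical gene panel
go_terms = {                                    # hypothetical GO annotations
    "GO:0000001": ["geneA", "geneB"],
    "GO:0000002": ["geneC", "geneD"],
}

# Hidden unit j receives gene i only if gene i is annotated to GO term j.
mask = np.array([[gene in members for gene in genes]
                 for members in go_terms.values()], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(size=mask.shape) * mask         # knowledge-constrained first layer
w2 = rng.normal(size=len(go_terms))             # output layer over GO-term units

def predict(expression):
    """expression: per-gene values in [0, 1] -> scalar phenotype score."""
    hidden = np.tanh(W1 @ expression)           # one activation per GO term
    return float(w2 @ hidden)

score = predict(np.array([0.8, 0.2, 0.5, 0.9]))
```

During training the mask would be reapplied after every gradient step so that forbidden connections stay zero; inspecting each GO-term unit's activation is what yields the "insight into affected subsystems".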

    How to Be a God

    When it comes to questions concerning the nature of Reality, Philosophers and Theologians have the answers. Philosophers have the answers that can’t be proven right. Theologians have the answers that can’t be proven wrong. Today’s designers of Massively-Multiplayer Online Role-Playing Games create realities for a living. They can’t spend centuries mulling over the issues: they have to face them head-on. Their practical experiences can indicate which theoretical proposals actually work in practice. That’s today’s designers. Tomorrow’s will have a whole new set of questions to answer. The designers of virtual worlds are the literal gods of those realities. Suppose Artificial Intelligence comes through and allows us to create non-player characters as smart as us. What are our responsibilities as gods? How should we, as gods, conduct ourselves? How should we be gods?

    DEEP REINFORCEMENT LEARNING AND MODEL PREDICTIVE CONTROL APPROACHES FOR THE SCHEDULED OPERATION OF DOMESTIC REFRIGERATORS

    Excess capacity of the UK’s national grid is widely quoted as falling to around 4% over the coming years as a consequence of increased economic growth (and hence power usage) and reductions in power generation plants. There is concern that short-term variations in power demand could lead to serious wide-scale disruption on a national scale. This is therefore drawing greater attention to augmenting traditional generation plants with renewable and localized energy storage technologies, and to improved demand side response (DSR), where power consumers are incentivized to switch off assets when the grid is under pressure. It is estimated, for instance, that refrigeration/HVAC systems alone could account for ~14% of total UK energy usage, with refrigeration and water heating/cooling systems, in particular, being able to act as real-time ‘buffer’ technologies that can be demand-managed to accommodate transient demands by being switched off for short periods without damaging their outputs. Large populations of thermostatically controlled loads (TCLs) hold significant potential for performing ancillary services in power systems since they are well established and widely distributed around the power network. In the domestic sector, refrigerators and freezers collectively constitute a very large electrical load since they are continuously connected and are present in almost all households. The rapid proliferation of the ‘Internet of Things’ (IoT) now affords the opportunity to monitor and visualise the performance of smart-building appliances and, specifically, to schedule the operation of widely distributed domestic refrigerators and freezers to collectively improve energy efficiency and reduce peak power consumption on the electrical grid.
    To accomplish this, this research proposes the real-time estimation of the thermal mass of individual refrigerators in a network using on-line parameter identification, and the co-ordinated (ON-OFF) scheduling of the refrigerator compressors to maintain their respective temperatures within specified hysteresis bands, commensurate with accommodating food safety standards. Custom Model Predictive Control (MPC) schemes and a Machine Learning algorithm (Reinforcement Learning) are researched to realize an appropriate scheduling methodology, which is implemented through COTS IoT hardware. Benefits afforded by the proposed schemes are investigated through experimental trials, which show that the co-ordinated operation of domestic refrigerators can 1) reduce the peak power consumption as seen from the perspective of the electrical power grid (i.e. peak power shaving), 2) adaptively control the temperature hysteresis band of individual refrigerators to increase operational efficiency, and 3) contribute to a widely distributed aggregated load shed for Demand Side Response purposes in order to aid grid stability. Comparative studies of measurements from experimental trials show that the co-ordinated scheduling of refrigerators allows energy savings of between 19% and 29% compared to their traditional isolated (non-co-operative) operation. Moreover, by adaptively changing the hysteresis bands of individual fridges in response to changes in thermal behaviour, a further 20% of savings in energy are possible at the local refrigerator level, thereby providing benefits to both the network supplier and the individual consumer.
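At its simplest, the per-fridge control loop described above is ON-OFF hysteresis: switch the compressor on at the upper band edge, off at the lower one, and let a scheduler move or widen the band. The band edges and thermal constants below are illustrative assumptions, not measured values from the trials.

```python
def hysteresis_step(temp, compressor_on, low=2.0, high=5.0):
    """Return the new compressor state for one control step (temperatures in °C)."""
    if temp >= high:
        return True                # upper edge reached: start cooling
    if temp <= low:
        return False               # lower edge reached: stop cooling
    return compressor_on           # inside the band: keep the current state

def simulate(steps=200, temp=4.0, ambient=20.0):
    """Crude first-order thermal model: fixed cooling rate, ambient-driven warming."""
    on, history = False, []
    for _ in range(steps):
        on = hysteresis_step(temp, on)
        temp += -0.3 if on else 0.002 * (ambient - temp)
        history.append((temp, on))
    return history

history = simulate()
# the temperature cycles within (roughly) the 2-5 °C hysteresis band
```

A co-ordinating scheduler would stagger the ON phases across a fleet of such loops, and shift `low`/`high` per fridge, to shave the aggregate peak the way the trials describe.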