
    Temporal Feature Selection with Symbolic Regression

    Building and discovering useful features when constructing machine learning models is the central task for the machine learning practitioner. Good features are useful not only for increasing the predictive power of a model but also for illuminating the underlying drivers of a target variable. In this research we propose a novel feature learning technique in which symbolic regression is endowed with a "Range Terminal" that allows it to explore functions of the aggregate of variables over time. We test the Range Terminal on a synthetic dataset and on a real-world dataset in which we predict seasonal greenness using satellite-derived temperature and snow data over a portion of the Arctic. On the synthetic dataset, symbolic regression with the Range Terminal outperforms standard symbolic regression and Lasso regression. On the Arctic dataset it outperforms standard symbolic regression and, while it fails to beat Lasso regression, it finds useful features describing the interaction between Land Surface Temperature, snow, and seasonal vegetative growth in the Arctic.
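
    As a rough illustration of what a Range Terminal computes (a sketch, not the paper's genetic-programming implementation), the snippet below scores every aggregation window of one time-indexed variable by correlation with the target; all names and data here are invented for the example.

```python
import numpy as np

# Sketch of the "Range Terminal" idea: a candidate feature aggregates one
# time-indexed variable over a window [i, j), e.g. mean temperature from
# week 10 to week 20. Here every window is scored exhaustively instead of
# being evolved inside a symbolic-regression run as in the paper.

def range_terminal_features(X, y, agg=np.mean):
    """X: (n_samples, n_timesteps) series of one variable; y: target.
    Returns the best window (i, j), its score, and its feature values."""
    n_steps = X.shape[1]
    best = (None, -np.inf, None)
    for i in range(n_steps):
        for j in range(i + 1, n_steps + 1):
            feat = agg(X[:, i:j], axis=1)        # aggregate over window [i, j)
            r = abs(np.corrcoef(feat, y)[0, 1])  # score by |correlation|
            if r > best[1]:
                best = ((i, j), r, feat)
    return best

# Toy example: the target depends only on the mean of timesteps 5..15.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = X[:, 5:15].mean(axis=1) + 0.1 * rng.normal(size=200)
window, score, feat = range_terminal_features(X, y)
print(window, round(score, 3))   # expect a window near (5, 15)
```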

    FIBS: A Generic Framework for Classifying Interval-based Temporal Sequences

    We study the problem of classifying interval-based temporal sequences (IBTSs). Since common classification algorithms cannot be applied directly to IBTSs, the main challenge is to define a set of features that represents the data effectively enough for classifiers to be applied. Most prior work utilizes frequent pattern mining to define a feature set based on discovered patterns. However, frequent pattern mining is computationally expensive and often discovers many irrelevant patterns. To address this shortcoming, we propose the FIBS framework for classifying IBTSs. FIBS extracts features relevant to classification from IBTSs based on relative frequency and temporal relations. To avoid selecting irrelevant features, a filter-based selection strategy is incorporated into FIBS. Our empirical evaluation on eight real-world datasets demonstrates the effectiveness of our methods in practice. The results provide evidence that FIBS effectively represents IBTSs for classification algorithms, contributing to similar or significantly better accuracy compared to state-of-the-art competitors. They also suggest that the feature selection strategy is beneficial to FIBS's performance. (Comment: In: Big Data Analytics and Knowledge Discovery, DaWaK 2020, Springer, Cham.)
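
    To make the feature-extraction idea concrete, here is a minimal sketch (not the authors' implementation) that maps an interval-based temporal sequence of (label, start, end) events to relative label frequencies plus counts of coarse pairwise temporal relations; FIBS's actual relation set and selection strategy are richer than this.

```python
from collections import Counter
from itertools import combinations

def relation(a, b):
    """Coarse temporal relation between intervals a=(s1,e1) and b=(s2,e2)."""
    (s1, e1), (s2, e2) = a, b
    if e1 <= s2:
        return "before"
    if s1 <= s2 and e1 >= e2:
        return "contains"
    return "overlaps"

def ibts_features(sequence):
    """Flatten one IBTS into a feature dict of (a) relative label
    frequencies and (b) counts of pairwise temporal relations."""
    feats = Counter()
    n = len(sequence)
    for label, _, _ in sequence:
        feats[("freq", label)] += 1.0 / n
    for (l1, s1, e1), (l2, s2, e2) in combinations(sequence, 2):
        feats[("rel", l1, l2, relation((s1, e1), (s2, e2)))] += 1
    return dict(feats)

seq = [("sleep", 0, 8), ("walk", 9, 10), ("eat", 9, 9.5)]
print(ibts_features(seq))
```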

    Scallop: A Language for Neurosymbolic Programming

    We present Scallop, a language that combines the benefits of deep learning and logical reasoning. Scallop enables users to write a wide range of neurosymbolic applications and to train them in a data- and compute-efficient manner. It achieves these goals through three key features: 1) a flexible symbolic representation based on the relational data model; 2) a declarative logic programming language based on Datalog that supports recursion, aggregation, and negation; and 3) a framework for automatic and efficient differentiable reasoning based on the theory of provenance semirings. We evaluate Scallop on a suite of eight neurosymbolic applications from the literature. Our evaluation demonstrates that Scallop is capable of expressing algorithmic reasoning in diverse and challenging AI tasks, provides a succinct interface for machine learning programmers to integrate logical domain knowledge, and yields solutions comparable or superior to state-of-the-art models in terms of accuracy. Furthermore, Scallop's solutions outperform these models in aspects such as runtime and data efficiency, interpretability, and generalizability.
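
    The provenance-semiring idea can be conveyed in a few lines. The sketch below is not Scallop itself: it hand-rolls a transitive-closure query in which fact weights (e.g., outputs of a neural network) are combined with the "max-times" semiring, so a derived fact carries the probability of its most likely proof; both operations are amenable to differentiation.

```python
# Weighted edges; in a neurosymbolic pipeline these weights would come
# from a learned model rather than being fixed constants.
edges = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.3}

def reachable(edges):
    """Fixpoint of: path(x,y) <- edge(x,y); path(x,z) <- path(x,y), edge(y,z).
    Semiring product (*) combines weights along one proof; semiring sum
    (max) combines alternative proofs of the same fact."""
    path = dict(edges)                  # base case: every edge is a path
    changed = True
    while changed:
        changed = False
        for (x, y), w1 in list(path.items()):
            for (y2, z), w2 in edges.items():
                if y == y2:
                    w = w1 * w2
                    if w > path.get((x, z), 0.0):
                        path[(x, z)] = w
                        changed = True
    return path

print(reachable(edges))
# path('a','c') becomes max(0.3, 0.9 * 0.8) = 0.72
```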

    The 11th Conference of PhD Students in Computer Science


    A Formal Verification Environment for Use in the Certification of Safety-Related C Programs

    In this thesis we describe the design of an environment for the formal verification of functional properties of safety-related software written in the programming language C. The focus lies on the verification of (primarily) geometric computations. We give an overview of the applicable regulations for safety-related software systems. We define a combination of higher-order logic, as formalised in the theorem prover Isabelle, with a specification language syntactically based on C expressions. The language retains the mathematical character of higher-level specifications in code-level specifications. A memory model for C is formalised that is appropriate for modelling low-level memory operations while keeping the entailed verification overhead within tolerable bounds. Finally, a Hoare-style proof calculus is devised so that correctness proofs can be performed in one integrated framework. The applicability of the approach is demonstrated by describing its use in an industrial project.
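
    The core rule of any Hoare-style calculus is mechanical enough to sketch. The toy below uses strings instead of logical terms and handles assignments only; it bears no resemblance to the thesis's Isabelle embedding or its C memory model, but it shows the substitution step on which such calculi rest.

```python
import re

def wp_assign(var, expr, post):
    """Weakest precondition of `var := expr` w.r.t. postcondition `post`:
    post with every free occurrence of var replaced by expr."""
    return re.sub(rf"\b{re.escape(var)}\b", f"({expr})", post)

def wp_seq(stmts, post):
    """wp of a sequence of assignments, computed back to front."""
    for var, expr in reversed(stmts):
        post = wp_assign(var, expr, post)
    return post

# To verify {x >= 0} y = x + 1; z = y * y {z >= 1}, compute the wp and
# check that the stated precondition implies it:
pre = wp_seq([("y", "x + 1"), ("z", "y * y")], "z >= 1")
print(pre)   # ((x + 1) * (x + 1)) >= 1, which follows from x >= 0
```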

    Prediction of high-performance concrete compressive strength through a comparison of machine learning techniques

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. High-performance concrete (HPC) is a highly complex composite material whose characteristics are extremely difficult to model. One of those characteristics is the concrete compressive strength, a nonlinear function of the same ingredients that compose HPC: cement, fly ash, blast furnace slag, water, superplasticizer, age, and coarse and fine aggregates. Research has shown time and time again that concrete strength is not determined solely by the water-to-cement ratio, which for years was the go-to metric. In addition, traditional methods that attempt to model HPC, such as regression analysis, do not provide sufficient predictive power due to the nonlinear properties of the mixture. Therefore, this study attempts to optimize the prediction and modeling of the compressive strength of HPC by analyzing seven different machine learning (ML) algorithms: three regularization algorithms (Lasso, Ridge, and Elastic Net), three ensemble algorithms (Random Forest, Gradient Boost, and AdaBoost), and Artificial Neural Networks. All techniques were built and tested on a dataset composed of data from 17 different concrete strength test laboratories, under the same experimental conditions, which enabled a fair comparison among them and with previous studies in the field. Feature importance analysis and outlier analysis were also performed, and all models were subjected to a Wilcoxon signed-ranks test to ensure statistically significant results. The final results show that the more complex ML algorithms provided greater accuracy than the regularization techniques, with Gradient Boost being the best among them, providing more accurate predictions than the state of the art. Better results were achieved using all variables and without removing outlier observations.
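
    As a hedged reconstruction of the comparison setup (the dissertation's exact pipeline, data, and hyperparameters are not reproduced here), scikit-learn makes the seven-model benchmark compact:

```python
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.ensemble import (RandomForestRegressor,
                              GradientBoostingRegressor, AdaBoostRegressor)
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One instance per technique named in the abstract; default settings,
# whereas the study would have tuned each model.
models = {
    "Lasso": Lasso(), "Ridge": Ridge(), "ElasticNet": ElasticNet(),
    "RandomForest": RandomForestRegressor(),
    "GradientBoost": GradientBoostingRegressor(),
    "AdaBoost": AdaBoostRegressor(),
    "ANN": MLPRegressor(max_iter=2000),
}

def compare(X, y):
    """Cross-validated RMSE for each model; X would hold the mix
    ingredients (cement, fly ash, slag, water, superplasticizer,
    aggregates, age) and y the measured compressive strength."""
    for name, model in models.items():
        pipe = make_pipeline(StandardScaler(), model)
        scores = cross_val_score(pipe, X, y, cv=10,
                                 scoring="neg_root_mean_squared_error")
        print(f"{name:14s} RMSE = {-scores.mean():.2f} +/- {scores.std():.2f}")

if __name__ == "__main__":
    # Synthetic stand-in for the laboratory dataset, for demonstration only.
    from sklearn.datasets import make_regression
    X, y = make_regression(n_samples=300, n_features=8, noise=10.0,
                           random_state=0)
    compare(X, y)
```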

    Prospects for Declarative Mathematical Modeling of Complex Biological Systems

    Declarative modeling uses symbolic expressions to represent models. With such expressions one can formalize high-level mathematical computations on models that would be difficult or impossible to perform directly on a lower-level simulation program written in a general-purpose programming language. Examples of such computations on models include model analysis, relatively general-purpose model-reduction maps, and the initial phases of model implementation, all of which should preserve or approximate the mathematical semantics of a complex biological model. The potential advantages are particularly relevant in the case of developmental modeling, wherein complex spatial structures exhibit dynamics at molecular, cellular, and organogenic levels to relate genotype to multicellular phenotype. Multiscale modeling can benefit from both the expressive power of declarative modeling languages and the application of model-reduction methods to link models across scales. Based on previous work, here we define declarative modeling of complex biological systems by defining the operator algebra semantics of an increasingly powerful series of declarative modeling languages, including reaction-like dynamics of parameterized and extended objects; we define semantics-preserving implementation and semantics-approximating model-reduction transformations; and we outline a "meta-hierarchy" for organizing declarative models and the mathematical methods that can fruitfully manipulate them.
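
    A tiny example conveys the declarative premise: when a model is symbolic data, its mathematical semantics can be computed from it. The sketch below is illustrative only (the paper's operator-algebra semantics is far more general): it compiles reaction rules to mass-action ODE right-hand sides with SymPy, and because the rules are plain data, transformations such as model reduction could likewise be written as functions over them.

```python
import sympy as sp

# Each rule: (reactant stoichiometry, product stoichiometry, rate constant)
rules = [
    ({"A": 1, "B": 1}, {"C": 1}, sp.Symbol("k1")),          # A + B -> C
    ({"C": 1},         {"A": 1, "B": 1}, sp.Symbol("k2")),  # C -> A + B
]

def mass_action_odes(rules, species):
    """Compile declarative reaction rules into symbolic ODE right-hand
    sides under the law of mass action."""
    syms = {s: sp.Symbol(s) for s in species}
    rhs = {s: sp.Integer(0) for s in species}
    for reactants, products, k in rules:
        rate = k
        for s, n in reactants.items():
            rate *= syms[s] ** n          # mass-action propensity
        for s, n in reactants.items():
            rhs[s] -= n * rate            # consumption
        for s, n in products.items():
            rhs[s] += n * rate            # production
    return rhs

print(mass_action_odes(rules, ["A", "B", "C"]))
# {'A': -k1*A*B + k2*C, 'B': -k1*A*B + k2*C, 'C': k1*A*B - k2*C}
```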

    How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors?

    Surging interest in deep learning from high-stakes domains has precipitated concern over the inscrutable nature of black-box neural networks. Explainable AI (XAI) research has produced an abundance of explanation algorithms for these black boxes. Such post hoc explainers produce human-comprehensible explanations; however, their fidelity with respect to the model is not well understood, and explanation evaluation remains one of the most challenging issues in XAI. In this paper, we ask a targeted but important question: can popular feature-additive explainers (e.g., LIME, SHAP, SHAPR, MAPLE, and PDP) explain feature-additive predictors? Herein, we evaluate such explainers on ground truth that is analytically derived from the additive structure of a model. We demonstrate the efficacy of our approach in understanding these explainers applied to symbolic expressions, neural networks, and generalized additive models on thousands of synthetic and several real-world tasks. Our results suggest that all explainers eventually fail to correctly attribute the importance of features, especially when a decision-making process involves feature interactions. (Comment: Accepted to the NeurIPS Workshop "XAI in Action: Past, Present, and Future Applications". arXiv admin note: text overlap with arXiv:2106.0837)
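
    The evaluation hinges on one analytic fact: for a feature-additive model f(x) = sum_i f_i(x_i) with independent features, the Shapley value of feature i at x is exactly f_i(x_i) minus its background mean. The sketch below (with invented component functions and data, not the paper's benchmark) computes that ground truth, against which any explainer's attributions can be scored.

```python
import numpy as np

# An additive model: f(x) = sin(x1) + x2^2 + 2*x3
components = [np.sin, np.square, lambda x: 2.0 * x]

def model(X):
    return sum(fi(X[:, i]) for i, fi in enumerate(components))

def true_attributions(x, background):
    """Exact Shapley values of an additive model at point x, relative to
    a background sample of shape (n_samples, n_features): the i-th value
    is f_i(x_i) - E[f_i(X_i)]."""
    return np.array([fi(x[i]) - fi(background[:, i]).mean()
                     for i, fi in enumerate(components)])

rng = np.random.default_rng(1)
background = rng.normal(size=(10_000, 3))
x = np.array([0.5, -1.0, 2.0])
print(true_attributions(x, background))
# An explainer is faithful here iff its attributions match these values.
```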

    A Matrix-Based Heuristic Algorithm for Extracting Multiword Expressions from a Corpus

    This paper describes an algorithm for automatically extracting multiword expressions (MWEs) from a corpus. The algorithm is node-based, i.e., it extracts MWEs that contain the item specified by the user, using a fixed window size around the node. The main idea is to detect the frequency anomalies that occur at the starting and ending points of an n-gram that constitutes an MWE. This is achieved by locally comparing matrices of observed frequencies to matrices of expected frequencies and determining, for each individual input, one or more sub-sequences that have the highest probability of being an MWE. Top-performing sub-sequences are then combined in a score-aggregation and ranking stage, producing a single list of score-ranked MWE candidates without having to indiscriminately generate all possible sub-sequences of the input strings. The knowledge-poor and computationally efficient algorithm attempts to solve certain recurring problems in MWE extraction, such as the inability to deal with MWEs of arbitrary length, the repetitive counting of nested n-grams, and excessive sensitivity to frequency. Evaluation results show that the best-performing version achieves top-50 precision values between 0.71 and 0.88 on Turkish and English data and performs better than the baseline method even at n = 1000.
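
    A stripped-down version of the observed-versus-expected idea (not the paper's matrix formulation, and without its aggregation and ranking stage) can be written as follows; the toy corpus and node word are invented.

```python
from collections import Counter

def mwe_candidates(tokens, node, max_len=4):
    """Score each n-gram containing the node word by the ratio of its
    observed corpus count to the count expected if its words co-occurred
    independently, then rank candidates by that ratio."""
    n = len(tokens)
    unigram = Counter(tokens)
    ngram = Counter()
    for i in range(n):
        for L in range(2, max_len + 1):
            if i + L <= n:
                ngram[tuple(tokens[i:i + L])] += 1
    scored = []
    for gram, obs in ngram.items():
        if node not in gram:
            continue
        expected = n                 # expected count under independence:
        for w in gram:               # n * prod(count(w) / n)
            expected *= unigram[w] / n
        scored.append((obs / expected, gram))
    return sorted(scored, reverse=True)

text = ("the red tape around the project slowed everyone down the red tape "
        "was cut after the review of the red tape").split()
for score, gram in mwe_candidates(text, "tape")[:3]:
    print(round(score, 1), " ".join(gram))
```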