34 research outputs found

    Logic-based machine learning using a bounded hypothesis space: the lattice structure, refinement operators and a genetic algorithm approach

    Rich representation inherited from computational logic makes logic-based machine learning a competent method for application domains involving relational background knowledge and structured data. There is however a trade-off between the expressive power of the representation and the computational costs. Inductive Logic Programming (ILP) systems employ different kinds of biases and heuristics to cope with the complexity of the search, which is otherwise intractable. Searching the hypothesis space bounded below by a bottom clause is the basis of several state-of-the-art ILP systems (e.g. Progol and Aleph). However, the structure of the search space and the properties of the refinement operators for these systems have not been previously characterised. The contributions of this thesis can be summarised as follows: (i) characterising the properties, structure and morphisms of the bounded subsumption lattice; (ii) analysis of bounded refinement operators and stochastic refinement; and (iii) implementation and empirical evaluation of stochastic search algorithms, in particular a Genetic Algorithm (GA) approach for bounded subsumption. In this thesis we introduce the concept of bounded subsumption and study the lattice and cover structure of bounded subsumption. We show the morphisms between the lattice of bounded subsumption, an atomic lattice and the lattice of partitions. We also show that ideal refinement operators exist for bounded subsumption and that, by contrast with general subsumption, efficient least and minimal generalisation operators can be designed for bounded subsumption. In this thesis we also show how refinement operators can be adapted for a stochastic search and give an analysis of refinement operators within the framework of stochastic refinement search. We also discuss genetic search for learning first-order clauses and describe a framework for genetic and stochastic refinement search for bounded subsumption. Finally, ILP algorithms and implementations which are based on this framework are described and evaluated.

    Human-machine scientific discovery

    Humanity is facing existential, societal challenges related to food security, ecosystem conservation, antimicrobial resistance, etc., and Artificial Intelligence (AI) is already playing an important role in tackling these new challenges. Most current AI approaches are limited when it comes to ‘knowledge transfer’ with humans, i.e. it is difficult to incorporate existing human knowledge and the output knowledge is not human-comprehensible. In this chapter we demonstrate how a combination of comprehensible machine learning, text-mining and domain knowledge could enhance human-machine collaboration for the purpose of automated scientific discovery where humans and computers jointly develop and evaluate scientific theories. As a case study, we describe a combination of logic-based machine learning (which included human-encoded ecological background knowledge) and text-mining from scientific publications (to verify machine-learned hypotheses) for the purpose of automated discovery of ecological interaction networks (food-webs) to detect change in agricultural ecosystems using the Farm Scale Evaluations (FSEs) of genetically modified herbicide-tolerant (GMHT) crops dataset. The results included novel food-web hypotheses, some confirmed by subsequent experimental studies (e.g. DNA analysis) and published in scientific journals. These machine-learned food-webs were also used as the basis of a recent study revealing resilience of agro-ecosystems to changes in farming management using GMHT crops.

    Next-Generation Global Biomonitoring: Large-scale, Automated Reconstruction of Ecological Networks

    We foresee a new global-scale, ecological approach to biomonitoring emerging within the next decade that can detect ecosystem change accurately, cheaply, and generically. Next-generation sequencing (NGS) of DNA sampled from the Earth's environments would provide data for the relative abundance of operational taxonomic units or ecological functions. Machine-learning methods would then be used to reconstruct the ecological networks of interactions implicit in the raw NGS data. Ultimately, we envision the development of autonomous samplers that would sample nucleic acids and upload NGS sequence data to the cloud for network reconstruction. Large numbers of these samplers, in a global array, would allow sensitive automated biomonitoring of the Earth's major ecosystems at high spatial and temporal resolution, revolutionising our understanding of ecosystem change.

    Automated Discovery of Food Webs from Ecological Data Using Logic-Based Machine Learning

    Networks of trophic links (food webs) are used to describe and understand mechanistic routes for translocation of energy (biomass) between species. However, a relatively low proportion of ecosystems have been studied using food web approaches due to difficulties in making observations on large numbers of species. In this paper we demonstrate that Machine Learning of food webs, using a logic-based approach called A/ILP, can generate plausible and testable food webs from field sample data. Our example data come from a national-scale Vortis suction sampling of invertebrates from arable fields in Great Britain. We found that 45 invertebrate species or taxa, representing approximately 25% of the sample and about 74% of the invertebrate individuals included in the learning, were hypothesized to be linked. As might be expected, detritivore Collembola were consistently the most important prey. Generalist and omnivorous carabid beetles were hypothesized to be the dominant predators of the system. We were, however, surprised by the importance of carabid larvae suggested by the machine learning as predators of a wide variety of prey. High probability links were hypothesized for widespread, potentially destabilizing, intra-guild predation; predictions that could be experimentally tested. Many of the high probability links in the model have already been observed or suggested for this system, supporting our contention that A/ILP learning can produce plausible food webs from sample data, independent of our preconceptions about “who eats whom.” Well-characterised links in the literature correspond with links ascribed with high probability through A/ILP. We believe that this very general Machine Learning approach has great power and could be used to extend and test our current theories of agricultural ecosystem dynamics and function. In particular, we believe it could be used to support the development of a wider theory of ecosystem responses to environmental change.

    Meta-interpretive learning of higher-order dyadic datalog: predicate invention revisited

    Since the late 1990s predicate invention has been under-explored within inductive logic programming due to difficulties in formulating efficient search mechanisms. However, a recent paper demonstrated that both predicate invention and the learning of recursion can be efficiently implemented for regular and context-free grammars, by way of metalogical substitutions with respect to a modified Prolog meta-interpreter which acts as the learning engine. New predicate symbols are introduced as constants representing existentially quantified higher-order variables. The approach demonstrates that predicate invention can be treated as a form of higher-order logical reasoning. In this paper we generalise the approach of meta-interpretive learning (MIL) to that of learning higher-order dyadic datalog programs. We show that with an infinite signature the higher-order dyadic datalog class $H^2_2$ has universal Turing expressivity though $H^2_2$ is decidable given a finite signature. Additionally we show that Knuth–Bendix ordering of the hypothesis space together with logarithmic clause bounding allows our MIL implementation Metagol$_D$ to PAC-learn minimal cardinality $H^2_2$ definitions. This result is consistent with our experiments which indicate that Metagol$_D$ efficiently learns compact $H^2_2$ definitions involving predicate invention for learning robotic strategies, the East–West train challenge and NELL. Additionally higher-order concepts were learned in the NELL language learning domain. The Metagol code and datasets described in this paper have been made publicly available on a website to allow reproduction of results in this paper.

    A data-driven approach for characterising revenues of South-Asian long-haul low-cost carriers per equivalent flight capacity per block hour

    This study sets out to investigate the revenue characteristics of long-haul low-cost carriers (LHLCCs) in the Southeast Asian market using a recently established metric that calculates revenue per equivalent flight capacity per block hour (REB). REB allows for the accurate comparison of airlines on different routes with varying cabin configurations and stage lengths. The majority of the data was sourced from Sabre MIDT and OAG, while supplementary data came from various other sources. In addition to REB, the study investigates: How do LHLCCs' yields compare to those of full-service network carriers (FSNCs)? Are LHLCCs positively impacted by a smaller share of connecting passengers? Are LHLCCs positively impacted by ancillary revenues? Do LHLCCs benefit from higher load factors and seat densities? Results show that LHLCCs achieved only 26.6% lower overall REB than their FSNC counterparts despite generating 43.9% lower yield. This is a result of fewer revenue-diluting connecting passengers, higher average ancillary revenue per block hour, higher average load factors and higher average seat densities for LHLCCs. At route level, some LHLCC operations can match or outperform their competitors' revenue performance. Furthermore, the findings are mostly consistent with earlier REB research conducted on the North Atlantic market by Soyk et al. (2018). Revenue and operating characteristics showed the same trends in both markets, although to varying degrees. This study exposes, in high detail, the revenue-generating inner workings of the elusive long-haul low-cost model in its second largest market, and compares it on equal grounds to its full-service network competition. We learn that airlines can drive the airfare down while minimising the loss of revenue per flight capacity of the aircraft by adjusting for the numerous variables that directly impact it, such as seat density, cabin configuration, ancillary revenues, load factors, and percentage of connecting passengers.