
    The Data Mining OPtimization Ontology

    The Data Mining OPtimization Ontology (DMOP) has been developed to support informed decision-making at various choice points of the data mining process. The ontology can be used by data miners and deployed in ontology-driven information systems. The primary purpose for which DMOP has been developed is the automation of algorithm and model selection through semantic meta-mining, which uses an ontology-based meta-analysis of complete data mining processes to extract patterns associated with mining performance. To this end, DMOP contains detailed descriptions of data mining tasks (e.g., learning, feature selection), data, algorithms, hypotheses such as mined models or patterns, and workflows. A development methodology was used for DMOP, including items such as competency questions and foundational ontology reuse. Several non-trivial modeling problems were encountered, and due to the complexity of the data mining details, the ontology requires the use of the OWL 2 DL profile. DMOP was successfully evaluated for semantic meta-mining and used in constructing the Intelligent Discovery Assistant, deployed in the popular data mining environment RapidMiner.

    Exploring Reasoning with the DMOP Ontology

    We describe the Data Mining OPtimization Ontology (DMOP), which was developed to support informed decision-making at various choice points of the knowledge discovery (KD) process. DMOP contains in-depth descriptions of DM tasks, data, algorithms, hypotheses, and workflows. Its development raised a number of non-trivial modeling problems, the solution to which demanded maximal exploitation of OWL 2 representational potential. The choices made led to v5.4 of the DMOP ontology. We report some evaluations on processing DMOP with a standard reasoner by considering different DMOP features.

    MOODY: An ontology-driven framework for standardizing multi-objective evolutionary algorithms

    The application of semantic technologies, particularly ontologies, in the realm of multi-objective evolutionary algorithms is overlooked despite their effectiveness in knowledge representation. In this paper, we introduce MOODY, an ontology specifically tailored to formalize these kinds of algorithms, encompassing their respective parameters, and multi-objective optimization problems based on a characterization of their search space landscapes. MOODY is designed to be particularly applicable in automatic algorithm configuration, which involves searching for the parameters of an optimization algorithm that optimize its performance. In this context, we observe a notable absence of standardized components, parameters, and related considerations, such as problem characteristics and algorithm configurations. This lack of standardization introduces difficulties in the selection of valid component combinations and in the re-use of algorithmic configurations between different algorithm implementations. MOODY offers a means to infuse semantic annotations into the configurations found by automatic tools, enabling efficient querying of the results and seamless integration across diverse sources through their incorporation into a knowledge graph. We validate our proposal by presenting four case studies. Funding for Open Access charge: Universidad de Málaga / CBUA. This work has been partially funded by the Spanish Ministry of Science and Innovation via Grant PID2020-112540RB-C41 (AEI/FEDER, UE) and the Andalusian PAIDI program with grant P18-RT-2799. José F. Aldana-Martín is supported by Grant PRE2021-098594 (Spanish Ministry of Science, Innovation and Universities).
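
    As a rough illustration of why annotated configurations become queryable once they sit in a knowledge graph, the sketch below stores configurations as subject-predicate-object triples and answers a pattern query with wildcards. The predicate names (usesAlgorithm, tunedOn, crossover) and configuration identifiers are hypothetical and are not MOODY's vocabulary; a real deployment would use RDF and SPARQL.

```python
# Hypothetical triples describing two tuned configurations; NOT MOODY's vocabulary.
TRIPLES = {
    ("config1", "rdf:type", "AlgorithmConfiguration"),
    ("config1", "usesAlgorithm", "NSGA-II"),
    ("config1", "crossover", "SBX"),
    ("config1", "tunedOn", "ZDT1"),
    ("config2", "rdf:type", "AlgorithmConfiguration"),
    ("config2", "usesAlgorithm", "MOEA/D"),
    ("config2", "crossover", "DifferentialEvolution"),
    ("config2", "tunedOn", "ZDT1"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return {
        t for t in TRIPLES
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    }


def algorithms_tuned_on(problem):
    """Algorithms whose stored configurations were tuned on the given problem."""
    configs = {s for s, _, _ in match(p="tunedOn", o=problem)}
    return sorted(o for c in configs for _, _, o in match(s=c, p="usesAlgorithm"))


print(algorithms_tuned_on("ZDT1"))
```

    The design point is that once every configuration shares one vocabulary, a single query spans results produced by different tools, which is the integration benefit the abstract describes.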

    Semantic descriptor for intelligence services

    The exposition and discovery of intelligence, especially for connected devices and autonomous systems, has become an important area of research towards an all-intelligent world. In this article, a semantic description of functions is proposed and used to provide intelligence services, mainly for networked devices. The semantic descriptors aim to provide interoperability between the vocabularies, data models, and ontologies of multiple domains, so that device applications can deploy them autonomously once they are onboarded in the device or system platform. The proposed framework supports the discovery, onboarding, and updating of the services by describing their execution environment, software dependencies, policies, and required data inputs, as well as the outputs produced, so that applications are decoupled from the AI functions.
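
    The descriptor fields listed above (execution environment, software dependencies, policies, inputs, outputs) suggest how discovery and onboarding checks might work. The following is a minimal sketch under that assumption; the class names, field names, and example values are hypothetical and do not reproduce the paper's descriptor schema.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescriptor:
    """Hypothetical descriptor mirroring the fields named in the abstract."""
    name: str
    execution_environment: str
    dependencies: set = field(default_factory=set)
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)
    policies: dict = field(default_factory=dict)  # carried along, not evaluated here


@dataclass
class Platform:
    """Hypothetical view of what a device platform offers."""
    environments: set
    installed: set
    available_data: set


def providers(services, wanted_output):
    """Discovery: names of services that produce the wanted output."""
    return [s.name for s in services if wanted_output in s.outputs]


def missing_requirements(svc, platform):
    """Onboarding check: reasons svc cannot run; an empty list means it can."""
    missing = []
    if svc.execution_environment not in platform.environments:
        missing.append(f"environment: {svc.execution_environment}")
    missing += [f"dependency: {d}" for d in sorted(svc.dependencies - platform.installed)]
    missing += [f"input: {i}" for i in sorted(svc.inputs - platform.available_data)]
    return missing


detector = ServiceDescriptor("anomaly-detector", "python3", dependencies={"numpy"},
                             inputs={"temperature"}, outputs={"anomaly_score"})
edge = Platform(environments={"python3"}, installed=set(),
                available_data={"temperature", "humidity"})
print(providers([detector], "anomaly_score"), missing_requirements(detector, edge))
```

    Keeping the requirements declarative lets the platform decide whether to onboard a service without running it, which is the decoupling the abstract aims for.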

    Lightweight Knowledge Representations for Automating Data Analysis

    The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the available data, the operations that can be performed on those data, the algorithms/models that are fed the data, and how all of these facets interweave. In this work, we take the first steps towards automating a key aspect of the data science pipeline: data analysis. We present an extensible taxonomy of data analytic operations that spans domains and data, as well as a method for codifying domain-specific knowledge that links this analytics taxonomy to actual data. We validate the functionality of our analytics taxonomy by implementing a system that leverages it, alongside domain labelings for 8 distinct domains, to automatically generate a space of answerable questions and associated analytic plans. In this way, we produce information spaces over data that enable complex analyses and search over this data and pave the way for fully automated data analysis.
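
    One way to picture how a taxonomy of operations combined with a domain labeling yields "answerable questions and associated analytic plans" is to give each operation the column types it needs and enumerate column assignments that satisfy them. The operations, type labels, and question templates below are hypothetical stand-ins, not the paper's taxonomy.

```python
from itertools import permutations

# Hypothetical operations: name -> (required column types, question template).
OPERATIONS = {
    "average": (["numeric"], "What is the average {0}?"),
    "compare_groups": (["numeric", "categorical"], "How does {0} vary across {1}?"),
}


def answerable_questions(labels):
    """labels maps column name -> semantic type from a (hypothetical) domain labeling.

    Returns (question, plan) pairs, where a plan is the operation plus the columns it uses.
    """
    questions = []
    for op, (types, template) in OPERATIONS.items():
        # Try every ordered choice of distinct columns with the right arity.
        for cols in permutations(labels, len(types)):
            if all(labels[c] == t for c, t in zip(cols, types)):
                questions.append((template.format(*cols), (op, cols)))
    return questions


labels = {"price": "numeric", "rating": "numeric", "region": "categorical"}
for question, plan in answerable_questions(labels):
    print(question, plan)
```

    Adding a new operation or a new domain only means adding a dictionary entry or a labeling, which is the kind of extensibility the abstract emphasizes.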

    Meta-QSAR: a large-scale application of meta-learning to drug design and discovery.

    We investigate the learning of quantitative structure activity relationships (QSARs) as a case study of meta-learning. This application area is of the highest societal importance, as it is a key step in the development of new medicines. The standard QSAR learning problem is: given a target (usually a protein) and a set of chemical compounds (small molecules) with associated bioactivities (e.g. inhibition of the target), learn a predictive mapping from molecular representation to activity. Although almost every type of machine learning method has been applied to QSAR learning, there is no agreed single best way of learning QSARs, and therefore the problem area is well-suited to meta-learning. We first carried out the most comprehensive ever comparison of machine learning methods for QSAR learning: 18 regression methods, 3 molecular representations, applied to more than 2700 QSAR problems. (These results have been made publicly available on OpenML and represent a valuable resource for testing novel meta-learning methods.) We then investigated the utility of algorithm selection for QSAR problems. We found that this meta-learning approach outperformed the best individual QSAR learning method (random forests using a molecular fingerprint representation) by up to 13%, on average. We conclude that meta-learning outperforms base-learning methods for QSAR learning, and as this investigation is one of the most extensive comparisons of base- and meta-learning methods ever made, it provides evidence for the general effectiveness of meta-learning over base-learning.
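
    The contrast between algorithm selection and a single best method can be shown in a few lines. This sketch uses a 1-nearest-neighbour selector over problem meta-features, which is a swapped-in illustrative technique, not necessarily the selector used in the study; the meta-features, method names, and error values are invented for the example.

```python
import math

# Hypothetical meta-data: for each past problem, its meta-features and each method's error.
TRAIN = [
    {"meta": (100, 0.20), "errors": {"random_forest": 0.30, "svr": 0.40}},
    {"meta": (120, 0.25), "errors": {"random_forest": 0.32, "svr": 0.45}},
    {"meta": (2000, 0.90), "errors": {"random_forest": 0.50, "svr": 0.35}},
]


def single_best(train):
    """Baseline: the method with the lowest mean error across all past problems."""
    methods = train[0]["errors"]
    return min(methods, key=lambda m: sum(p["errors"][m] for p in train) / len(train))


def select_method(train, meta_features):
    """Algorithm selection: reuse the best method of the most similar past problem.

    Meta-features are compared unscaled here; a practical system would normalize them.
    """
    nearest = min(train, key=lambda p: math.dist(p["meta"], meta_features))
    return min(nearest["errors"], key=nearest["errors"].get)


print(single_best(TRAIN), select_method(TRAIN, (1900, 0.85)))
```

    The baseline always answers random_forest, whereas the selector switches to svr for a problem that resembles the one past case where svr won; that per-problem switching is where a meta-learner can beat the best individual method.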

    MIRO: guidelines for minimum information for the reporting of an ontology
