24 research outputs found

    Ontology of core data mining entities

    In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation and an application layer. It provides a representational framework for the description of mining structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications/use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following the practices in ontology engineering, is fully interoperable with many domain resources and is easy to extend.

    Ontologies to Enable Interoperability of Multi-Agent Electricity Markets Simulation and Decision Support

    This paper presents the AiD-EM Ontology, which provides a semantic representation of the concepts required to enable interoperability between multi-agent-based decision support systems, namely AiD-EM, and the market agents that participate in electricity market simulations. Electricity markets’ constant changes, brought about by the increasing need to adequately integrate renewable energy sources, make them complex and dynamic environments with very particular characteristics. Several modeling tools directed at the study of, and decision support in, restructured wholesale electricity markets have emerged. However, a common limitation is identified: the lack of interoperability between the various systems. This gap makes it impossible to exchange information and knowledge between them, test different market models, enable players from heterogeneous systems to interact in common market environments, and take full advantage of decision support tools. To overcome this gap, the AiD-EM Ontology includes the concepts related to the AiD-EM multi-agent decision support system needed to enable interoperability, easier cooperation and adequate communication between AiD-EM and simulated market agents wishing to take advantage of this decision support tool. This work has received funding from the EU Horizon 2020 research and innovation program under project TradeRES (grant agreement No 864276), from FEDER Funds through the COMPETE program, and from National Funds through FCT under projects CEECIND/01811/2017 and UID/EEA/00760/2019. Gabriel Santos was supported by PhD grant SFRH/BD/118487/2016 from National Funds through FCT.

    Ontologies for Predictive Maintenance with Time-Sensitive Data

    Manufacturing companies must ensure a continuous production process to remain competitive and supply manufactured goods on time and with the quality customers expect. Any disruption in the manufacturing chain may have serious consequences, reducing output and interrupting the supply chain. Manufacturing processes are composed of chains of industrial machines operating in stages: each machine has a specific task to complete, and the result of each stage is forwarded to the next. An unpredicted malfunction of one machine tends to interrupt the whole production chain. Scheduled preventive maintenance aims to avoid the causes of faults, but relies on parameters such as Mean Time Before Failure (MTBF), which represents the average expected life span of individual components based on statistical data. A maintenance task may require a period of downtime and, consequently, a production halt. Because maintenance is scheduled and executed routinely, components are replaced according to the scheduling cycle rather than according to the effective need for their replacement. This is where predictive maintenance is applicable. By collecting sensor data from industrial equipment, anomalies can be detected through reasoning and inference processes applied to the data, leading to early fault detection and time-to-failure prediction. This enables maintenance timing optimization, avoidance of unexpected failures, cost savings and improved productivity compared to preventive maintenance.
    Data supplied by sensors is time-sensitive: variations and fluctuations occur over time and must be analysed with respect to the period in which they occur. This dissertation aims to develop an ontology for predictive maintenance that describes its scope and field of application. The applicability of the ontology will be demonstrated with a tool, also to be developed, that transforms time-sensitive data collected in real time from sensors of industrial machines, provided via web services, into individuals of that ontology, representing the temporal factor of the data.

    OpenTox predictive toxicology framework: toxicological ontology and semantic media wiki-based OpenToxipedia

    Background: The OpenTox Framework, developed by the partners in the OpenTox project (http://www.opentox.org), aims at providing unified access to toxicity data, predictive models and validation procedures. Interoperability of resources is achieved using a common information model, based on the OpenTox ontologies, describing predictive algorithms, models and toxicity data. As toxicological data may come from different, heterogeneous sources, a deployed ontology, unifying the terminology and the resources, is critical for the rational and reliable organization of the data and its automatic processing.
    Results: The following related ontologies have been developed for OpenTox: a) Toxicological ontology, listing the toxicological endpoints; b) Organs system and Effects ontology, addressing organs, targets/examinations and effects observed in in vivo studies; c) ToxML ontology, representing a semi-automatic conversion of the ToxML schema; d) OpenTox ontology, representing the OpenTox framework components: chemical compounds, datasets, types of algorithms, models and validation web services; e) ToxLink, a ToxCast assays ontology; and f) OpenToxipedia, a community knowledge resource on toxicology terminology.
    OpenTox components are made available through standardized REST web services, where every compound, dataset and predictive method has a unique resolvable address (URI), used to retrieve its Resource Description Framework (RDF) representation or to initiate the associated calculations and generate new RDF-based resources. The services support the integration of toxicity and chemical data from various sources, the generation and validation of computer models for toxic effects, and seamless integration of new algorithms and scientifically sound validation routines, and they provide a flexible framework that allows building an arbitrary number of applications tailored to solving different problems by end users (e.g. toxicologists).
    Availability: The OpenTox toxicological ontology projects may be accessed via the OpenTox ontology development page http://www.opentox.org/dev/ontology; the OpenTox ontology is available as OWL at http://opentox.org/api/1.1/opentox.owl, and the ToxML-OWL conversion utility is an open source resource available at http://ambit.svn.sourceforge.net/viewvc/ambit/branches/toxml-utils/.

    The Ontology of Biological and Clinical Statistics (OBCS) for standardized and reproducible statistical analysis

    Statistics play a critical role in biological and clinical research. However, most reports of scientific results in the published literature make it difficult for the reader to reproduce the statistical analyses performed in achieving those results, because they provide inadequate documentation of the statistical tests and algorithms applied. The Ontology of Biological and Clinical Statistics (OBCS) is put forward here as a step towards solving this problem. Terms in OBCS, including ‘data collection’, ‘data transformation in statistics’, ‘data visualization’, ‘statistical data analysis’, and ‘drawing a conclusion based on data’, cover the major types of statistical processes used in basic biological research and clinical outcome studies. OBCS is aligned with the Basic Formal Ontology (BFO) and extends the Ontology of Biomedical Investigations (OBI), an OBO (Open Biological and Biomedical Ontologies) Foundry ontology supported by over 20 research communities. We discuss two examples illustrating how the ontology is being applied. In the first (biological) use case, we describe how OBCS was applied to represent the high-throughput microarray data analysis of immunological transcriptional profiles in human subjects vaccinated with an influenza vaccine. In the second (clinical outcomes) use case, we applied OBCS to represent the processing of electronic health care data to determine the associations between hospital staffing levels and patient mortality. Our case studies were designed to show how OBCS can be used for the consistent representation of statistical analysis pipelines under two different research paradigms. By representing statistics-related terms and their relations in a rigorous fashion, OBCS facilitates standard data analysis and integration, and supports reproducible biological and clinical research.

    The Data Mining OPtimization Ontology

    The Data Mining OPtimization Ontology (DMOP) has been developed to support informed decision-making at various choice points of the data mining process. The ontology can be used by data miners and deployed in ontology-driven information systems. The primary purpose for which DMOP has been developed is the automation of algorithm and model selection through semantic meta-mining: an ontology-based meta-analysis of complete data mining processes aimed at extracting patterns associated with mining performance. To this end, DMOP contains detailed descriptions of data mining tasks (e.g., learning, feature selection), data, algorithms, hypotheses such as mined models or patterns, and workflows. A development methodology was used for DMOP, including items such as competency questions and foundational ontology reuse. Several non-trivial modeling problems were encountered, and due to the complexity of the data mining details, the ontology requires the use of the OWL 2 DL profile. DMOP was successfully evaluated for semantic meta-mining and used in constructing the Intelligent Discovery Assistant, deployed in the popular data mining environment RapidMiner.

    Exploiting semantic web knowledge graphs in data mining

    Data Mining and Knowledge Discovery in Databases (KDD) is a research field concerned with deriving higher-level insights from data. The tasks performed in this field are knowledge-intensive and can often benefit from additional knowledge from various sources. Therefore, many approaches have been proposed that combine Semantic Web data with the data mining and knowledge discovery process. Semantic Web knowledge graphs are a backbone of many information systems that require access to structured knowledge. Such knowledge graphs contain factual knowledge about real-world entities and the relations between them, which can be utilized in various natural language processing, information retrieval, and data mining applications. Following the principles of the Semantic Web, Semantic Web knowledge graphs are publicly available as Linked Open Data: an open, interlinked collection of datasets in machine-interpretable form, covering most real-world domains. In this thesis, we investigate the hypothesis that Semantic Web knowledge graphs can be exploited as background knowledge in different steps of the knowledge discovery process and in different data mining tasks. More precisely, we aim to show that Semantic Web knowledge graphs can be utilized to generate valuable data mining features for use in various data mining tasks. Identifying, collecting and integrating useful background knowledge for a given data mining application can be a tedious and time-consuming task. Furthermore, most data mining tools require features in propositional form, i.e., binary, nominal or numerical features associated with an instance, while Linked Open Data sources are usually graphs by nature. Therefore, in Part I, we evaluate unsupervised feature generation strategies from types and relations in knowledge graphs, which are used in different data mining tasks, i.e., classification, regression, and outlier detection.
    As the number of generated features grows rapidly with the number of instances in the dataset, we provide a strategy for feature selection in hierarchical feature space, in order to select only the most informative and most representative features for a given dataset. Furthermore, we provide an end-to-end tool for mining the Web of Linked Data, which provides functionalities for each step of the knowledge discovery process, i.e., linking local data to a Semantic Web knowledge graph, integrating features from multiple knowledge graphs, feature generation and selection, and building machine learning models. However, we show that such feature generation strategies often lead to high-dimensional feature vectors even after dimensionality reduction, and that the reusability of such feature vectors across different datasets is limited. In Part II, we propose an approach that circumvents these shortcomings. More precisely, we develop an approach that embeds complete Semantic Web knowledge graphs in a low-dimensional feature space, where each entity and relation in the knowledge graph is represented as a numerical vector. Projecting such latent representations of entities into a lower-dimensional feature space shows that semantically similar entities appear closer to each other. We use several Semantic Web knowledge graphs to show that such latent representations of entities have high relevance for different data mining tasks. Furthermore, we show that such features can easily be reused for different datasets and different tasks. In Part III, we describe a list of applications that exploit Semantic Web knowledge graphs beyond the standard data mining tasks, like classification and regression. We show that the approaches developed in Part I and Part II can be used in applications in various domains.
    More precisely, we show that Semantic Web knowledge graphs can be exploited for analyzing statistics, building recommender systems, entity and document modeling, and taxonomy induction. Finally, we focus on semantic annotations in HTML pages, which are another realization of the Semantic Web vision. Semantic annotations are integrated into the code of HTML pages using markup languages, like Microformats, RDFa, and Microdata. While such data covers various domains and topics, and can be useful for developing various data mining applications, additional steps of cleaning and integrating the data need to be performed. In this thesis, we describe a set of approaches for processing long literals and images extracted from semantic annotations in HTML pages. We showcase the approaches in the e-commerce domain. Such approaches contribute to building and consuming Semantic Web knowledge graphs.
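A toy version of the Part I propositionalization idea: each (predicate, object) pair observed in a small invented knowledge graph becomes one binary feature per entity, yielding the propositional form most data mining tools expect. The graph and feature names below are fabricated for illustration only.

```python
# A miniature knowledge graph as (subject, predicate, object) triples
kg = [
    ("Berlin", "rdf:type", "City"),
    ("Berlin", "capitalOf", "Germany"),
    ("Munich", "rdf:type", "City"),
    ("Germany", "rdf:type", "Country"),
]

def propositionalize(entities, triples):
    """One binary feature per (predicate, object) pair seen anywhere in the graph."""
    features = sorted({(p, o) for _, p, o in triples})
    vectors = {}
    for e in entities:
        facts = {(p, o) for s, p, o in triples if s == e}
        vectors[e] = [1 if f in facts else 0 for f in features]
    return features, vectors

features, vectors = propositionalize(["Berlin", "Munich"], kg)
# features: [('capitalOf', 'Germany'), ('rdf:type', 'City'), ('rdf:type', 'Country')]
print(vectors["Berlin"])  # [1, 1, 0]
print(vectors["Munich"])  # [0, 1, 0]
```

The feature space grows with the number of distinct (predicate, object) pairs, which is exactly the dimensionality problem that motivates the hierarchical feature selection and, later, the graph embeddings of Part II.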