879 research outputs found

    Artificial Intelligence methodologies to early predict student outcome and enrich learning material

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    An overview of recent distributed algorithms for learning fuzzy models in Big Data classification

    Get PDF
    AbstractNowadays, a huge amount of data are generated, often in very short time intervals and in various formats, by a number of different heterogeneous sources such as social networks and media, mobile devices, internet transactions, networked devices and sensors. These data, identified as Big Data in the literature, are characterized by the popular Vs features, such as Value, Veracity, Variety, Velocity and Volume. In particular, Value focuses on the useful knowledge that may be mined from data. Thus, in the last years, a number of data mining and machine learning algorithms have been proposed to extract knowledge from Big Data. These algorithms have been generally implemented by using ad-hoc programming paradigms, such as MapReduce, on specific distributed computing frameworks, such as Apache Hadoop and Apache Spark. In the context of Big Data, fuzzy models are currently playing a significant role, thanks to their capability of handling vague and imprecise data and their innate characteristic to be interpretable. In this work, we give an overview of the most recent distributed learning algorithms for generating fuzzy classification models for Big Data. In particular, we first show some design and implementation details of these learning algorithms. Thereafter, we compare them in terms of accuracy and interpretability. Finally, we argue about their scalability

    XML Matchers: approaches and challenges

    Full text link
    Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in Database and Artificial Intelligence research areas for many years. In the past, it was largely investigated especially for classical database models (e.g., E/R schemas, relational databases, etc.). However, in the latest years, the widespread adoption of XML in the most disparate application fields pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, aiming at finding semantic matchings between concepts defined in DTDs and XSDs. XML Matchers do not just take well-known techniques originally designed for other data models and apply them on DTDs/XSDs, but they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs impact on the Schema Matching task. Then we introduce a template, called XML Matcher Template, that describes the main components of an XML Matcher, their role and behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template as the baseline for objectively comparing approaches that, at first glance, might appear as unrelated. The introduction of this template can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers.Comment: 34 pages, 8 tables, 7 figure

    Intelligent blockchain management for distributed knowledge graphs in IoT 5G environments

    Get PDF
    This article introduces a new problem of distributed knowledge graph, in IoT 5G setting. We developed an end-to-end solution for solving such problem by exploring the blockchain management and intelligent method for producing the better matching of the concepts and relations of the set of knowledge graphs. The concepts and the relations of the knowledge graphs are divided into several components, each of which contains similar concepts and relations. Instead of exploring the whole concepts and the relations of the knowledge graphs, only the representative of these components is compared during the matching process. The framework has outperformed state-of-the-art knowledge graph matching algorithms using different scenarios as input in the experiments. In addition, to confirm the usability of our suggested framework, an in-depth experimental analysis has been done; the results are very promising in both runtime and accuracy.publishedVersio

    Scalable Approaches for Auditing the Completeness of Biomedical Ontologies

    Get PDF
    An ontology provides a formalized representation of knowledge within a domain. In biomedicine, ontologies have been widely used in modern biomedical applications to enable semantic interoperability and facilitate data exchange. Given the important roles that biomedical ontologies play, quality issues such as incompleteness, if not addressed, can affect the quality of downstream ontology-driven applications. However, biomedical ontologies often have large sizes and complex structures. Thus, it is infeasible to uncover potential quality issues through manual effort. In this dissertation, we introduce automated and scalable approaches for auditing the completeness of biomedical ontologies. We mainly focus on two incompleteness issues -- missing hierarchical relations and missing concepts. To identify missing hierarchical relations, we develop three approaches: a lexical-based approach, a hybrid approach utilizing both lexical features and logical definitions, and an approach based on concept name transformation. To identify missing concepts, a lexical-based Formal Concept Analysis (FCA) method is proposed for concept enrichment. We also predict proper concept names for the missing concepts using deep learning techniques. Manual review by domain experts is performed to evaluate these approaches. In addition, we leverage extrinsic knowledge (i.e., external ontologies) to help validate the detected incompleteness issues. The auditing approaches have been applied to a variety of biomedical ontologies, including the SNOMED CT, National Cancer Institute (NCI) Thesaurus and Gene Ontology. In the first lexical-based approach to identify missing hierarchical relations, each concept is modeled with an enriched set of lexical features, leveraging words and noun phrases in the name of the concept itself and the concept\u27s ancestors. Given a pair of concepts that are not linked by a hierarchical relation, if the enriched lexical attributes of one concept is a superset of the other\u27s, a potentially missing hierarchical relation will be suggested. Applying this approach to the September 2017 release of SNOMED CT (US edition) suggested 38,615 potentially missing hierarchical relations. A domain expert reviewed a random sample of 100 potentially missing ones, and confirmed 90 are valid (a precision of 90%). In the second work, a hybrid approach is proposed to detect missing hierarchical relations in non-lattice subgraphs. For each concept, its lexical features are harmonized with role definitions to provide a more comprehensive semantic model. Then a two-step subsumption testing is performed to automatically suggest potentially missing hierarchical relations. This approach identified 55 potentially missing hierarchical relations in the 19.08d version of the NCI Thesaurus. 29 out of 55 were confirmed as valid by the curators from the NCI Enterprise Vocabulary Service (EVS) and have been incorporated in the newer versions of the NCI Thesaurus. 7 out of 55 further revealed incorrect existing hierarchical relations in the NCI Thesaurus. In the third work, we introduce a transformation-based method that leverages the Unified Medical Language System (UMLS) knowledge to identify missing hierarchical relations in its source ontologies. Given a concept name, noun chunks within it are identified and replaced by their more general counterparts to generate new concept names that are supposed to be more general than the original one. Applying this method to the UMLS (2019AB release), a total of 39,359 potentially missing hierarchical relations were detected in 13 source ontologies. Domain experts evaluated a random sample of 200 potentially missing hierarchical relations identified in the SNOMED CT (US edition), and 100 in the Gene Ontology. 173 out of 200 and 63 out of 100 potentially missing hierarchical relations were confirmed by domain experts, indicating our method achieved a precision of 86.5% and 63% for the SNOMED CT and Gene Ontology, respectively. In the work of concept enrichment, we introduce a lexical method based on FCA to identify potentially missing concepts. Lexical features (i.e., words appearing in the concept names) are considered as FCA attributes while generating formal context. Applying multistage intersection on FCA attributes results in newly formalized concepts along with bags of words that can be utilized to name the concepts. This method was applied to the Disease or Disorder sub-hierarchy in the 19.08d version of the NCI Thesaurus and identified 8,983 potentially missing concepts. We performed a preliminary evaluation and validated that 592 out of 8,983 potentially missing concepts were included in external ontologies in the UMLS. After obtaining new concepts and their relevant bags of words, we further developed deep learning-based approaches to automatically predict concept names that comply with the naming convention of a specific ontology. We explored simple neural network, Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN) combined with LSTM. Our experiments showed that the LSTM-based approach achieved the best performance with an F1 score of 63.41% for predicting names for newly added concepts in the March 2018 release of SNOMED CT (US Edition) and an F1 score of 73.95% for naming missing concepts revealed by our previous work. In the last part of this dissertation, extrinsic knowledge is leveraged to collect supporting evidence for the detected incompleteness issues. We present a work in which cross-ontology evaluation based on extrinsic knowledge from the UMLS is utilized to help validate potentially missing hierarchical relations, aiming at relieving the heavy workload of manual review

    Extraction of Principle Knowledge from Process Patents for Manufacturing Process Innovation

    Get PDF
    Process patents contain substantial knowledge of the principles behind manufacturing process problems-solving; however, this knowledge is implicit in lengthy texts and cannot be directly reused in innovation design. To effectively support systematic manufacturing process innovation, this paper presents an approach to extracting principle innovation knowledge from process patents. The proposed approach consists of (1) classifying process patents by taking process method, manufacturing object and manufacturing feature as the references; (2) extracting generalized process contradiction parameters and the principles behind solving such process contradictions based on patent mining and technology abstraction of TRIZ (the theory of inventive problem solving); and (3) constructing a domain process contradiction matrix and mapping the relationship between the matrix and the corresponding process patents. Finally, a case study is presented to illustrate the applicability of the proposed approach

    Can humain association norm evaluate latent semantic analysis?

    Get PDF
    This paper presents the comparison of word association norm created by a psycholinguistic experiment to association lists generated by algorithms operating on text corpora. We compare lists generated by Church and Hanks algorithm and lists generated by LSA algorithm. An argument is presented on how those automatically generated lists reflect real semantic relations

    Machine Learning And A Workflow Engine For An Agent-Based Structural Health Monitoring System

    Get PDF
    This thesis reports on work in machine learning and high-performance computing for structural heath monitoring. The data used are acoustic emission signals, and we classify these signals according to source mechanisms, those associated with crack growth being particularly significant. The work reported here is part of a larger project to develop an agent-based structural health monitoring system. The agents are proxies for communication- and computation-intensive techniques and respond to the situation at hand by determining an appropriate constellation of techniques. The techniques thus structured are executed by a workflow engine, which is part of the contribution reported here. It is critical that the system have a repertoire of classifiers with different characteristics so that a combination appropriate for the situation at hand can generally be found. The classifiers are trained using machine-learning techniques, and we report on investigations we conducted on three supervised and two unsupervised learning techniques to determine which techniques are the best to use in a particular situation
    corecore