6,304 research outputs found

    Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.

    Recent advances in high throughput methodologies offer researchers the ability to understand complex systems via high dimensional and multi-relational data. One example is the realm of molecular biology where disparate data (such as gene sequence, gene expression, and interaction information) are available for various snapshots of biological systems. This type of high dimensional and multirelational data allows for unprecedented detailed analysis, but also presents challenges in accounting for all the variability. High dimensional data often has a multitude of underlying relationships, each represented by a separate clustering structure, where the number of structures is typically unknown a priori. To address the challenges faced by traditional clustering methods on high dimensional and multirelational data, we developed three feature selection and cross-clustering methods: 1) infinite relational model with feature selection (FIRM) which incorporates the rich information of multirelational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) randomized approximation (RBHCC), based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to the inference of BCM, where the intended structure of each view (congruent loci) represents consistent evolutionary processes. We consider an application of FIRM on categorizing mRNA and microRNA. The model uses latent structures to encode the expression pattern and the gene ontology annotations. 
We also apply FIRM to recover the categories of ligands and proteins and to predict unknown drug-target interactions, where the latent categorization structure encodes drug-target interaction, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to have improved predictive performance (in terms of both cluster membership and missing value prediction) compared to traditional clustering methods. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences, where incorporating data related to varying features is often regarded as a daunting task.

    Generative Adversarial Networks (GANs): Challenges, Solutions, and Future Directions

    Generative Adversarial Networks (GANs) are a novel class of deep generative models that has recently gained significant attention. GANs learn complex and high-dimensional distributions implicitly over images, audio, and data. However, there exist major challenges in training GANs, namely mode collapse, non-convergence, and instability, due to inappropriate design of the network architecture, use of the objective function, and selection of the optimization algorithm. Recently, to address these challenges, several solutions for better design and optimization of GANs have been investigated, based on re-engineered network architectures, new objective functions, and alternative optimization algorithms. To the best of our knowledge, there is no existing survey that has particularly focused on the broad and systematic developments of these solutions. In this study, we perform a comprehensive survey of the advancements in GAN design and optimization solutions proposed to handle GAN challenges. We first identify key research issues within each design and optimization technique and then propose a new taxonomy to structure solutions by key research issues. In accordance with the taxonomy, we provide a detailed discussion of the different GAN variants proposed within each solution and their relationships. Finally, based on the insights gained, we present promising research directions in this rapidly growing field. Comment: 42 pages, Figure 13, Table

    Computational acquisition of knowledge in small-data environments: a case study in the field of energetics

    The UK’s defence industry is accelerating its implementation of artificial intelligence, including expert systems and natural language processing (NLP) tools designed to supplement human analysis. This thesis examines the limitations of NLP tools in small-data environments (common in defence) in the defence-related energetic-materials domain. A literature review identifies the domain-specific challenges of developing an expert system (specifically an ontology). The absence of domain resources such as labelled datasets and, most significantly, the preprocessing of text resources are identified as challenges. To address the latter, a novel general-purpose preprocessing pipeline specifically tailored for the energetic-materials domain is developed, and its effectiveness is evaluated. The boundary between using NLP tools in data-limited environments to supplement human analysis and using them to replace it completely is examined in a study of the subjective concept of importance. A methodology for directly comparing the ability of NLP tools and experts to identify important points in a text is presented. Results show that the participants of the study exhibit little agreement, even on which points in the text are important. The NLP tools, the expert (the author of the text being examined), and the participants agree only on general statements. However, as a group, the participants agreed with the expert. In data-limited environments, the extractive-summarisation tools examined cannot effectively identify the important points in a technical document in the way an expert can. A methodology for classifying journal articles by the technology readiness level (TRL) of the described technologies in a data-limited environment is proposed. Techniques to overcome challenges with using real-world data, such as class imbalances, are investigated. A methodology to evaluate the reliability of human annotations is presented. 
Analysis identifies a lack of agreement and consistency in the expert evaluation of document TRL.

    Second CLIPS Conference Proceedings, volume 1

    Topics covered at the 2nd CLIPS Conference, held at the Johnson Space Center, September 23-25, 1991, are given. Topics include rule groupings, fault detection using expert systems, decision making using expert systems, knowledge representation, computer aided design and debugging expert systems.

    Data Science and Analytics in Industrial Maintenance: Selection, Evaluation, and Application of Data-Driven Methods

    Data-driven maintenance bears the potential to realize various benefits based on multifaceted data assets generated in increasingly digitized industrial environments. By taking advantage of modern methods and technologies from the field of data science and analytics (DSA), it is possible, for example, to gain a better understanding of complex technical processes and to anticipate impending machine faults and failures at an early stage. However, successful implementation of DSA projects requires multidisciplinary expertise, which can rarely be covered by individual employees or single units within an organization. This expertise covers, for example, a solid understanding of the domain, analytical method and modeling skills, experience in dealing with different source systems and data structures, and the ability to transfer suitable solution approaches into information systems. Against this background, various approaches have emerged in recent years to make the implementation of DSA projects more accessible to broader user groups. These include structured procedure models, systematization and modeling frameworks, domain-specific benchmark studies to illustrate best practices, standardized DSA software solutions, and intelligent assistance systems. The present thesis ties in with previous efforts and provides further contributions for their continuation. More specifically, it aims to create supportive artifacts for the selection, evaluation, and application of data-driven methods in the field of industrial maintenance. For this purpose, the thesis covers four artifacts, which were developed in several publications. 
These artifacts include (i) a comprehensive systematization framework for the description of central properties of recurring data analysis problems in the field of industrial maintenance, (ii) a text-based assistance system that offers advice regarding the most suitable class of analysis methods based on natural language and domain-specific problem descriptions, (iii) a taxonomic evaluation framework for the systematic assessment of data-driven methods under varying conditions, and (iv) a novel solution approach for the development of prognostic decision models in cases of missing label information. Individual research objectives guide the construction of the artifacts as part of a systematic research design. The findings are presented in a structured manner by summarizing the results of the corresponding publications. Moreover, the connections between the developed artifacts as well as related work are discussed. Subsequently, a critical reflection is offered concerning the generalization and transferability of the achieved results. 
Thus, the thesis not only provides a contribution based on the proposed artifacts; it also paves the way for future opportunities, for which a detailed research agenda is outlined.
Contents: List of Figures; List of Tables; List of Abbreviations; 1 Introduction (1.1 Motivation, 1.2 Conceptual Background, 1.3 Related Work, 1.4 Research Design, 1.5 Structure of the Thesis); 2 Systematization of the Field (2.1 The Current State of Research, 2.2 Systematization Framework, 2.3 Exemplary Framework Application); 3 Intelligent Assistance System for Automated Method Selection (3.1 Elicitation of Requirements, 3.2 Design Principles and Design Features, 3.3 Prototypical Instantiation and Evaluation); 4 Taxonomic Framework for Method Evaluation (4.1 Survey of Prognostic Solutions, 4.2 Taxonomic Evaluation Framework, 4.3 Exemplary Framework Application); 5 Method Application Under Industrial Conditions (5.1 Conceptualization of a Solution Approach, 5.2 Prototypical Implementation and Evaluation); 6 Discussion of the Results (6.1 Connections Between Developed Artifacts and Related Work, 6.2 Generalization and Transferability of the Results); 7 Concluding Remarks; Bibliography; Appendix I: Implementation Details; Appendix II: List of Publications (A Publication P1: Focus Area Systematization, B Publication P2: Focus Area Method Selection, C Publication P3: Focus Area Method Selection, D Publication P4: Focus Area Method Evaluation, E Publication P5: Focus Area Method Application).