39 research outputs found

    Optimizing Ontology Alignments through NSGA-II without Using Reference Alignment

    Ontologies are widely used to address data heterogeneity on the semantic web, but the available ontologies can themselves introduce heterogeneity. To reconcile these ontologies and achieve semantic interoperability, we need to find the relationships among the entities in different ontologies; the process of identifying them is called ontology alignment. In all existing matching systems that use evolutionary approaches to optimize their parameters, a reference alignment between the two ontologies to be aligned must be given in advance, which can be very expensive to obtain, especially when the ontologies are large. To address this issue, in this paper we propose a novel approach that uses NSGA-II to optimize ontology alignments without a reference alignment. In our approach, an adaptive aggregation strategy is presented to improve the efficiency of the optimization process, and two approximate evaluation measures, namely match coverage and match ratio, are introduced to replace the classic recall and precision on a reference alignment when evaluating the quality of alignments. Experimental results show that our approach is effective and finds solutions that are very close to those obtained by approaches using a reference alignment, and that the quality of the alignments is in general better than that of state-of-the-art ontology matching systems such as GOAL and SAMBO.
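    As a point of reference for the "classic recall and precision on reference alignment" that the abstract sets out to replace, the following is a minimal sketch of how those two measures are usually computed. The function name, the representation of a correspondence as a pair of strings, and the example entities are all assumptions made for illustration; the paper's own match coverage and match ratio are not defined in the abstract and are not shown here.

```python
from typing import Set, Tuple

# Hypothetical representation: a correspondence is a pair
# (entity in ontology A, entity in ontology B).
Correspondence = Tuple[str, str]


def precision_recall(
    found: Set[Correspondence], reference: Set[Correspondence]
) -> Tuple[float, float]:
    """Classic precision and recall of an alignment against a reference."""
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Toy data, invented for this example.
found = {
    ("A:Person", "B:Human"),
    ("A:Paper", "B:Article"),
    ("A:Author", "B:Editor"),
}
reference = {
    ("A:Person", "B:Human"),
    ("A:Paper", "B:Article"),
    ("A:Author", "B:Writer"),
    ("A:Venue", "B:Conference"),
}

precision, recall = precision_recall(found, reference)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

    Both measures require the reference set, which is exactly the input the proposed approach aims to do without.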

    Semantic Biclustering

    This thesis focuses on the problem of finding interpretable and predictive patterns, expressed in the form of biclusters, with an orientation to biological data. The presented methods are collectively called semantic biclustering, a subfield of data mining. The term semantic biclustering is used because it reflects both the process of finding coherent subsets of rows and columns, that is, biclusters, in a 2-dimensional binary matrix and the semantic meaning of the elements in those biclusters. Although the work was motivated by biological data, the developed algorithms are generally applicable in any other research field; the only requirement concerns the format of the input data. The thesis introduces two novel, and in that context basic, approaches for finding semantic biclusters: Bicluster enrichment analysis and Rule and tree learning. Since these methods do not exploit the native hierarchical order of terms in the input ontologies, their run-time is generally long, or an induced hypothesis may contain redundant terms. For this reason, a new refinement operator was created. The refinement operator was incorporated into the well-known CN2 algorithm and uses two reduction procedures: Redundant Generalization and Redundant Non-potential. Both procedures help to dramatically prune the rule space and consequently speed up the entire process of rule induction in comparison with the traditional refinement operator as originally presented in CN2. The whole algorithm, together with the reduction procedures, was published as an R package that we called sem1R. To show a possible practical usage of semantic biclustering, the thesis also describes and specifically adapts the sem1R algorithm for two real biological problems. Firstly, we studied a practical application of the sem1R algorithm in an analysis of E-3 ubiquitin ligase in the gastrointestinal tract with respect to tissue regeneration potential. Secondly, besides discovering biclusters in gene expression data, we adapted the sem1R algorithm for a different task, namely finding potentially pathogenic genetic variants in a cohort of patients.
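    To make the notion of a bicluster in a binary matrix concrete, here is a small sketch that measures how coherent a chosen subset of rows and columns is, as the fraction of ones in the selected submatrix. The function name, the density criterion, and the toy matrix are assumptions for illustration only; this does not reproduce sem1R or the semantic (ontology-based) part of the thesis.

```python
from typing import List, Sequence


def bicluster_density(
    matrix: Sequence[Sequence[int]], rows: List[int], cols: List[int]
) -> float:
    """Fraction of ones in the submatrix selected by `rows` and `cols`."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return sum(cells) / len(cells) if cells else 0.0


# Toy binary matrix, invented for this example
# (e.g. rows = genes, columns = samples).
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

perfect = bicluster_density(matrix, rows=[0, 1], cols=[0, 1])
diluted = bicluster_density(matrix, rows=[0, 1, 2], cols=[0, 1])
print(perfect, diluted)
```

    Rows 0 and 1 with columns 0 and 1 form a submatrix of ones only; adding row 2 lowers the density, so that row would not belong to the same bicluster under this simple criterion.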

    Computationally Comparing Biological Networks and Reconstructing Their Evolution

    Biological networks, such as protein-protein interaction, regulatory, or metabolic networks, provide information about biological function, beyond what can be gleaned from sequence alone. Unfortunately, most computational problems associated with these networks are NP-hard. In this dissertation, we develop algorithms to tackle numerous fundamental problems in the study of biological networks. First, we present a system for classifying the binding affinity of peptides to a diverse array of immunoglobulin antibodies. Computational approaches to this problem are integral to virtual screening and modern drug discovery. Our system is based on an ensemble of support vector machines and exhibits state-of-the-art performance. It placed 1st in the 2010 DREAM5 competition. Second, we investigate the problem of biological network alignment. Aligning the biological networks of different species allows for the discovery of shared structures and conserved pathways. We introduce an original procedure for network alignment based on a novel topological node signature. The pairwise global alignments of biological networks produced by our procedure, when evaluated under multiple metrics, are both more accurate and more robust to noise than those of previous work. Next, we explore the problem of ancestral network reconstruction. Knowing the state of ancestral networks allows us to examine how biological pathways have evolved, and how pathways in extant species have diverged from that of their common ancestor. We describe a novel framework for representing the evolutionary histories of biological networks and present efficient algorithms for reconstructing either a single parsimonious evolutionary history, or an ensemble of near-optimal histories. Under multiple models of network evolution, our approaches are effective at inferring the ancestral network interactions. 
Additionally, the ensemble approach is robust to noisy input and can be used to impute missing interactions in experimental data. Finally, we introduce a framework, GrowCode, for learning network growth models. While previous work focuses on developing growth models manually, or on procedures for learning parameters for existing models, GrowCode learns fundamentally new growth models that match target networks in a flexible and user-defined way. We show that models learned by GrowCode produce networks whose target properties match those of real-world networks more closely than existing models do.

    Exploiting general-purpose background knowledge for automated schema matching

    The schema matching task is an integral part of the data integration process and is usually its first step. Schema matching is typically very complex and time-consuming, and it is therefore carried out largely by humans. One reason for the low degree of automation is that schemas are often defined with deep background knowledge that is not itself present within the schemas. Overcoming the problem of missing background knowledge is a core challenge in automating the data integration process. In this dissertation, the task of matching semantic models, so-called ontologies, with the help of external background knowledge is investigated in depth in Part I. Throughout this thesis, the focus lies on large, general-purpose resources, since domain-specific resources are rarely available for most domains. Besides new knowledge resources, this thesis also explores new strategies to exploit such resources. A technical base for the development and comparison of matching systems is presented in Part II. The framework introduced here allows for simple and modularized matcher development (with background knowledge sources) and for extensive evaluations of matching systems. Knowledge graphs, which have grown significantly in size in recent years, are among the largest structured sources of general-purpose background knowledge. However, exploiting such graphs is not trivial. In Part III, knowledge graph embeddings are explored, analyzed, and compared, and multiple improvements to existing approaches are presented. In Part IV, numerous concrete matching systems that exploit general-purpose background knowledge are presented. Furthermore, exploitation strategies and resources are analyzed and compared. This dissertation closes with a perspective on real-world applications.

    Proceedings. 19. Workshop Computational Intelligence, Dortmund, 2. - 4. Dezember 2009

    This proceedings volume contains the contributions to the 19th workshop "Computational Intelligence" of Technical Committee 5.14 of the VDI/VDE-Gesellschaft für Mess- und Automatisierungstechnik (GMA) and of the special interest group "Fuzzy-Systeme und Soft-Computing" of the Gesellschaft für Informatik (GI), held from 2 to 4 December 2009 at Haus Bommerholz near Dortmund.

    OntoSPARES: da linguagem natural às ontologias. Contributos para a classificação automática de dados históricos (séc. XVI-XVIII)

    OntoSPARES: from natural language to ontologies. Contributions to the automatic classification of historical data (16th-18th centuries). Natural language processing and ontologies are tools whose interaction allows a better understanding of stored data. This work, by combining these two areas with the elements available in a prosopographic database, made it possible to identify and classify relationships between occupation sectors as they were designated at the time, activity sectors in a format closer to today's, and the social status that these occupations had in the society of the period. The data used concern mainly members of the Holy Office from the 16th to the 18th century. To achieve this goal, the work used textual descriptions of occurrences of the time, as well as other poorly structured descriptions, available in the SPARES repository. The application of natural language processing (stopword removal and stemming), combined with the construction of two ontologies, made it possible to classify those data, allowing more effective queries. By contributing to the automatic classification of historical data, this thesis proposes methodologies that can be applied to data from any other field of knowledge, especially fields that deal intensively with time and space variables.
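    The two preprocessing steps named in the abstract, stopword removal and stemming, can be sketched as follows. This is a toy illustration under stated assumptions: the stopword list, the suffix list, the crude suffix-stripping rule, and the English example sentence are all invented here and are not the thesis's actual pipeline, which works on historical Portuguese records.

```python
import re
from typing import List

# Hypothetical, deliberately tiny resources for illustration.
STOPWORDS = {"the", "of", "and", "a", "in", "to"}
SUFFIXES = ("ing", "ers", "er", "es", "s")  # checked in this order


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> List[str]:
    """Lowercase, tokenize on letters, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


tokens = preprocess("The printers and the binders of books in Lisbon")
print(tokens)
```

    After this normalization, different surface forms of an occupation collapse to one token, which is what makes a later mapping onto ontology classes tractable; a real system would use a language-specific stemmer and stopword list.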

    Proceedings / 17. Workshop Computational Intelligence [Elektronische Ressource] : Dortmund, 5. - 7. Dezember 2007

    This proceedings volume contains the contributions to the 17th workshop "Computational Intelligence" of Technical Committee 5.14 of the VDI/VDE-Gesellschaft für Mess- und Automatisierungstechnik (GMA) and of the special interest group "Fuzzy-Systeme und Soft-Computing" of the Gesellschaft für Informatik (GI), held from 5 to 7 December 2007 at Haus Bommerholz near Dortmund. GMA Technical Committee 5.14 "Computational Intelligence" was formed in 2005 from the former technical committees "Neuronale Netze und Evolutionäre Algorithmen" (FA 5.21) and "Fuzzy Control" (FA 5.22). The workshop stands in the tradition of the earlier Fuzzy workshops but has gradually broadened its focus in recent years. Its main topics are methods, applications, and tools for fuzzy systems, artificial neural networks, evolutionary algorithms, and data mining methods, as well as the comparison of methods on industrial and benchmark problems.
    TABLE OF CONTENTS
    • T. Fober, E. Hüllermeier, M. Mernberger (Philipps-Universität Marburg): Evolutionary Construction of Multiple Graph Alignments for the Structural Analysis of Biomolecules
    • G. Heidemann, S. Klenk (Universität Stuttgart): Visual Analytics for Image Retrieval
    • F. Rügheimer (OvG-Universität Magdeburg): A Condensed Representation for Distributions over Set-Valued Attributes
    • T. Mrziglod (Bayer Technology Services GmbH, Leverkusen): Mit datenbasierten Technologien und Versuchsplanung zu erfolgreichen Produkten
    • H. Schulte (Bosch Rexroth AG, Elchingen): Approximationsgenauigkeit und dynamisches Fehlerwachstum der Modellierung mit Takagi-Sugeno Fuzzy Systemen
    • C. Burghart, R. Mikut, T. Asfour, A. Schmid, F. Kraft, O. Schrempf, H. Holzapfel, R. Stiefelhagen, A. Swerdlow, G. Bretthauer, R. Dillmann (Universität Karlsruhe, Forschungszentrum Karlsruhe GmbH): Kognitive Architekturen für humanoide Roboter: Anforderungen, Überblick und Vergleich
    • R. Mikut, C. Burghart, A. Swerdlow (Forschungszentrum Karlsruhe GmbH, Universität Karlsruhe): Ein Gedankenexperiment zum Entwurf einer integrierten kognitiven Architektur für humanoide Roboter
    • G. Milighetti, H.-B. Kuntze (FhG IITB Karlsruhe): Diskret-kontinuierliche Regelung und Überwachung von Robotern basierend auf Aktionsprimitiven und Petri-Netzen
    • N. Rosemann, W. Brockmann (Universität Osnabrück): Kontrolle dynamischer Eigenschaften des Online-Lernens in Neuro-Fuzzy-Systemen mit dem SILKE-Ansatz
    • A. Hans, D. Schneegaß, A. Schäfer, S. Udluft (Siemens AG, TU Ilmenau): Sichere Exploration für Reinforcement-Learning-basierte Regelung
    • Th. Bartz-Beielstein, M. Bongards, C. Claes, W. Konen, H. Westenberger (FH Köln): Datenanalyse und Prozessoptimierung für Kanalnetze und Kläranlagen mit CI-Methoden
    • S. Nusser, C. Otte, W. Hauptmann (Siemens AG, OvG-Universität Magdeburg): Learning Binary Classifiers for Applications in Safety-Related Domains
    • W. Jakob, A. Quinte, K.-U. Stucky, W. Süß, C. Blume (Forschungszentrum Karlsruhe GmbH; FH Köln, Campus Gummersbach): Schnelles Resource Constrained Project Scheduling mit dem Evolutionären Algorithmus GLEAM
    • M. Preuß, B. Naujoks (Universität Dortmund): Evolutionäre mehrkriterielle Optimierung bei Anwendungen mit nichtzusammenhängenden Pareto-Mengen
    • G. Rudolph, M. Preuß (Universität Dortmund): Ein mehrkriterielles Evolutionsverfahren zur Bestimmung des Phasengleichgewichts von gemischten Flüssigkeiten
    • Y. Chen, O. Burmeister, C. Bauer, R. Rupp, R. Mikut (Universität Karlsruhe, Forschungszentrum Karlsruhe GmbH, Orthopädische Universitätsklinik Heidelberg): First Steps to Future Applications of Spinal Neural Circuit Models in Neuroprostheses and Humanoid Robots
    • F. Hoffmann, J. Braun, T. Bertram, S. Hölemann (Universität Dortmund, RWTH Aachen): Multikriterielle Optimierung mit modellgestützten Evolutionsstrategien
    • S. Piana, S. Engell (Universität Dortmund): Evolutionäre Optimierung des Betriebs von rohrlosen Chemieanlagen
    • T. Runkler (Siemens AG, CT IC 4): Pareto Optimization of the Fuzzy c-Means Clustering Model Using a Multi-Objective Genetic Algorithm
    • H. J. Rommelfanger (J.W. Goethe-Universität Frankfurt am Main): Die Optimierung von Fuzzy-Zielfunktionen in Fuzzy (Mehrziel-) LP-Systemen - Ein kritischer Überblick
    • D. Gamrad, D. Söffker (Universität Duisburg-Essen): Formalisierung menschlicher Interaktionen durch Situations-Operator-Modellbildung
    • S. Ritter, P. Bretschneider (FhG AST Ilmenau): Optimale Planung und Betriebsführung der Energieversorgung im liberalisierten Energiemarkt
    • R. Seising (Medizinische Universität Wien): Heinrich Hertz, Ludwig Wittgenstein und die Fuzzy-Strukturen - Eine kleine „Bildergeschichte“ zur Erkenntnisphilosophie
    • J. Limberg, R. Seising (Medizinische Universität Wien): Sequenzvergleiche im Fuzzy-Hypercube
    • M. Steinbrecher, R. Kruse (OvG-Universität Magdeburg): Visualisierung temporaler Abhängigkeiten in Bayesschen Netzen
    • M. Schneider, R. Tillmann, U. Lehmann, J. Krone, P. Langbein, U. Stark, J. Schrickel, Ch. Ament, P. Otto (FH Südwestfalen, Airbus Deutschland GmbH, Hamburg, TU Ilmenau): Künstliches Neuronales Netz zur Analyse der Geometrie von großflächig gekrümmten Bauteilen
    • C. Frey (FhG IITB Karlsruhe): Prozessdiagnose und Monitoring feldbusbasierter Automatisierungsanlagen mittels selbstorganisierender Karte

    Computational Design and Experimental Validation of Functional Ribonucleic Acid Nanostructures

    In living cells, two major classes of ribonucleic acid (RNA) molecules can be found. The first class, messenger RNA (mRNA), contains the genetic information that the ribosome reads and translates into proteins. The second class, non-coding RNA (ncRNA), does not code for proteins and is involved in key cellular processes such as gene expression regulation, splicing, differentiation, and development. ncRNAs fold into an ensemble of thermodynamically stable secondary structures, which eventually lead the molecule to fold into a specific 3D structure. It is widely known that ncRNAs carry out their functions via their 3D structures as well as their molecular composition. The secondary structure of ncRNAs is composed of different types of structural elements (motifs) such as stacking base pairs, internal loops, hairpin loops, and pseudoknots. Pseudoknots are especially difficult to model, are abundant in nature, and are known to stabilize the functional form of the molecule. Due to the diverse range of functions of ncRNAs, their computational design and analysis have numerous applications in nanotechnology, therapeutics, synthetic biology, and materials engineering. The RNA design problem is to find novel RNA sequences that are predicted to fold into one or more target structures while satisfying specific qualitative characteristics and constraints. RNA design can be modeled as a combinatorial optimization problem (COP) and is known to be computationally challenging or, more precisely, NP-hard. Numerous algorithms for the RNA design problem have been developed over the past two decades; however, most ignore pseudoknots and therefore apply to only a slice of real-world modeling and design problems. Moreover, the few existing pseudoknot design methods, which were developed only recently, provide no evidence of the applicability of their design methodology in biological contexts. The two objectives of this thesis address these two shortcomings. First, we are interested in developing an efficient computational method for the design of RNA secondary structures, including pseudoknots, that shows significantly better in-silico quality characteristics than the state of the art. Second, we are interested in showing the real-world value of the proposed method by validating it experimentally. More precisely, our aim is to design instances of certain types of RNA enzymes (i.e. ribozymes) and demonstrate that they are functionally active. This would likely happen only if their predicted folding matched their actual folding in the in-vitro experiments. In this thesis, we present four contributions. First, we propose a novel adaptive defect weighted sampling algorithm to efficiently solve the RNA secondary structure design problem where pseudoknots are included. We compare the performance of our design algorithm with the state of the art and show that our method generates molecules that are thermodynamically more stable and less defective than those generated by state-of-the-art methods. Moreover, we show that when the effect of fitness evaluation is decoupled from the search and optimization process, our optimization method converges faster than the non-dominated sorting genetic algorithm (NSGA-II) and the ant colony optimization (ACO) algorithm. Second, we use our algorithmic development to implement an RNA design pipeline called Enzymer and make it available as an open source package useful for wet lab practitioners and RNA bioinformaticians. Enzymer uses multiple sequence alignment (MSA) data to generate initial design templates for further optimization. Our design pipeline can then be used to re-engineer naturally occurring functional RNAs such as ribozymes and riboswitches. Our first and second contributions are published in the RNA section of the journal Frontiers in Genetics. Third, we use Enzymer to re-engineer three different species of pseudoknotted ribozymes: a hammerhead ribozyme from the mouse gut metagenome, a hammerhead ribozyme from Yarrowia lipolytica, and a glmS ribozyme from Thermoanaerobacter tengcongensis. We designed a total of 18 ribozyme sequences and showed that 16 of them were active in vitro. Our experimental results have been submitted to the RNA journal and strongly suggest that Enzymer is a reliable tool for designing pseudoknotted ncRNAs with a desired secondary structure. Finally, we propose a novel architecture for a new ribozyme-based gene regulatory network in which a hammerhead ribozyme modulates expression of a reporter gene when an external stimulus, IPTG, is present. Our in-vivo experiments show the expected behavior in 7 out of 12 cases.

    Evolutionary Computation

    This book presents several recent advances in Evolutionary Computation, especially evolution-based optimization methods and hybrid algorithms for applications ranging from optimization and learning to pattern recognition and bioinformatics. It also presents new algorithms based on several analogies and metaphors, one of which is based on philosophy, specifically the philosophy of praxis and dialectics. The book also presents interesting applications in bioinformatics, especially the use of particle swarms to discover gene expression patterns in DNA microarrays. It therefore features representative work in the field of evolutionary computation and applied sciences. The intended audience is graduate and undergraduate students, researchers, and anyone who wishes to become familiar with the latest research in this field.

    LIPIcs, Volume 277, GIScience 2023, Complete Volume

    LIPIcs, Volume 277, GIScience 2023, Complete Volume