10 research outputs found

    The OBO Foundry: Coordinated Evolution of Ontologies to Support Biomedical Data Integration

    The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or ‘ontologies’. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium has set in train a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing a process of coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable, logically well-formed, and to incorporate accurate representations of biological reality. We describe the OBO Foundry initiative and provide guidelines for those who might wish to become involved in the future.

    Meta-All: a system for managing metabolic pathway information

    BACKGROUND: Many attempts are being made to understand biological subjects at a systems level. A major resource for these approaches is biological databases, which store manifold information about DNA, RNA and protein sequences, including their functional and structural motifs, molecular markers, mRNA expression levels, metabolite concentrations, protein-protein interactions, phenotypic traits or taxonomic relationships. The use of these databases is often hampered by the fact that they are designed for special application areas and thus lack universality. Databases on metabolic pathways, which provide an increasingly important foundation for many analyses of biochemical processes at a systems level, are no exception to this rule. Data stored in central databases such as KEGG, BRENDA or SABIO-RK is often limited to read-only access. If experimentalists want to store their own data, possibly still under investigation, there are two possibilities: they can either develop their own information system for managing those data, which is very time-consuming and costly, or they can try to store their data in existing systems, which is often restricted. Hence, an out-of-the-box information system for managing metabolic pathway data is needed. RESULTS: We have designed META-ALL, an information system that allows the management of metabolic pathways, including reaction kinetics, detailed locations, environmental factors and taxonomic information. Data can be stored together with quality tags and in different parallel versions. META-ALL uses Oracle DBMS and Oracle Application Express. We provide the META-ALL information system for download and use. In this paper, we describe the database structure and give information about the tools for submitting and accessing the data. As a first application of META-ALL, we show how the information contained in a detailed kinetic model can be stored and accessed. CONCLUSION: META-ALL is a system for managing information about metabolic pathways. It facilitates the handling of pathway-related data and is designed to help biochemists and molecular biologists in their daily research. It is available on the Web and can be downloaded free of charge and installed locally.
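    A minimal, hypothetical sketch of the kind of data model the abstract describes (pathways with reactions, kinetic parameters, quality tags and parallel versions). META-ALL itself is built on Oracle DBMS and Oracle Application Express and its real schema is not given in the abstract, so the table and column names below (pathway, reaction, quality_tag, km, vmax) are illustrative assumptions only, shown here with SQLite for self-containment.

# Illustrative sketch only: not the META-ALL schema, just the concepts from
# the abstract (pathways, reactions with kinetics, quality tags, versions).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pathway (
    pathway_id   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    version      INTEGER NOT NULL,      -- parallel versions of the same pathway
    quality_tag  TEXT                   -- e.g. 'measured', 'estimated', 'literature'
);
CREATE TABLE reaction (
    reaction_id  INTEGER PRIMARY KEY,
    pathway_id   INTEGER REFERENCES pathway(pathway_id),
    equation     TEXT NOT NULL,         -- textual reaction equation
    km           REAL,                  -- Michaelis-Menten constant, if known
    vmax         REAL,                  -- maximal rate, if known
    location     TEXT,                  -- e.g. compartment or tissue
    quality_tag  TEXT
);
""")

# Store one pathway in two parallel versions with different quality tags.
conn.execute("INSERT INTO pathway VALUES (1, 'glycolysis', 1, 'literature')")
conn.execute("INSERT INTO pathway VALUES (2, 'glycolysis', 2, 'measured')")
conn.execute(
    "INSERT INTO reaction VALUES (1, 2, 'glucose + ATP -> G6P + ADP', 0.1, 12.5, 'cytosol', 'measured')"
)

for row in conn.execute("SELECT name, version, quality_tag FROM pathway"):
    print(row)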

    Menetelmiä mielenkiintoisten solmujen löytämiseen verkostoista (Methods for finding interesting nodes in networks)

    With the increasing amount of graph-structured data available, finding interesting objects, i.e., nodes in graphs, becomes more and more important. In this thesis we focus on finding interesting nodes and sets of nodes in graphs or networks. We propose several definitions of node interestingness as well as different methods to find such nodes. Specifically, we propose to consider nodes interesting based on their relevance and non-redundancy or representativeness with respect to the graph topology, as well as based on how characteristic they are for a class, such as a given node attribute value. Identifying nodes that are relevant but mutually non-redundant is motivated by the need to get an overview of the different pieces of information related to a set of given nodes. Finding representative nodes is of interest, e.g., when the user needs or wants to select a few nodes that abstract a large set of nodes. Discovering nodes characteristic for a class helps to understand the causes behind that class. Next, four methods are proposed to find a representative set of interesting nodes. The first one incrementally picks one interesting node after another. The second iteratively changes the set of nodes to improve its overall interestingness. The third method clusters nodes and picks a medoid node as a representative for each cluster. Finally, the fourth method contrasts diverse sets of nodes in order to select nodes characteristic for their class, even if the classes are not identical across the selected nodes. The first three methods are relatively simple and are based on the graph topology and a similarity or distance function for nodes. For the second and third, the user needs to specify one parameter, either an initial set of k nodes or k, the size of the set. The fourth method assumes attributes and class attributes for each node, a class-related interestingness measure, and possible sets of nodes which the user wants to contrast, such as sets of nodes that represent different time points. All four methods are flexible and generic. They can, in principle, be applied to any weighted graph or network regardless of what nodes, edges, weights, or attributes represent. Application areas for the methods developed in this thesis include word co-occurrence networks, biological networks, social networks, data traffic networks, and the World Wide Web. As an illustrative example, consider a word co-occurrence network. There, finding terms (nodes in the graph) that are relevant to some given nodes, e.g. branch and root, may help to identify different shared contexts such as botany, mathematics, and linguistics. A real-life application lies in biology, where finding nodes (biological entities, e.g. biological processes or pathways) that are relevant to other, given nodes (e.g. some genes or proteins) may help identify biological mechanisms that are possibly shared by the genes and proteins.
    The dissertation deals with methods for mining networks; its goal is to find interesting information in weighted graphs. Text corpora, biological data, connections between people, or the Internet can, for example, be viewed as weighted graphs. In such graphs, nodes represent concepts (e.g. words, genes, people, or web pages) and edges the relationships between them (e.g. two words occur in the same sentence, a gene encodes a protein, two people are friends, or one web page links to another). Edge weights can correspond, for example, to the strength or reliability of a connection. The dissertation presents various definitions of node interestingness, based on the structure of the graph or on node attributes, as well as several methods for finding interesting nodes. Interestingness can be defined, for example, as relevance with respect to some given nodes combined with mutual dissimilarity among the interesting nodes; a greedy method, for instance, can find mutually dissimilar nodes one at a time. The results can be applied, for example, to a word network derived from a text corpus, in which the connection between two words is the stronger the more often they occur in the same sentences. Different usage contexts, and even meanings, of words can then be found automatically. If the target word is, say, "juuri" ("root"), then words related to it but unrelated to each other include "puu" ("tree", the botanical sense: the root of a plant), "yhtälö" ("equation", the mathematical sense: the root, i.e. solution, of an equation), and "indoeurooppalainen" ("Indo-European", the linguistic sense: the root, i.e. stem, of a word). Such methods can be applied, for example, in a search engine: results for the query "juuri" would include hits from as diverse usage contexts as possible, so that the meaning intended by the user is more likely to be covered. A major application area for the methods of the dissertation is biological networks, in which nodes represent biological concepts (e.g. genes, proteins, or diseases) and edges the relationships between them (e.g. a gene encodes a protein, or a protein is active in a certain disease). The methods can be used, for example, to search for biological mechanisms underlying diseases by locating a representative set of nodes that connect, in the network, to a disease and to the genes possibly associated with it. These can help biologists understand possible links between genes and a disease and thus focus further research on the most promising genes, proteins, and so on. The definitions of node interestingness presented in the dissertation, and the methods proposed for finding such nodes, are generic and can in principle be applied to any graph, regardless of what the nodes, edges, or weights represent. Experiments on various graphs show that they find interesting nodes.
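    As a rough illustration of the first of the four methods mentioned above (incrementally picking one interesting node after another), the Python sketch below greedily selects nodes that are relevant to some query nodes while penalising similarity to nodes already chosen. The scoring rule, the redundancy_weight parameter and the toy relevance/similarity values are assumptions made for the example, not the exact formulation used in the thesis.

# Illustrative sketch only: a greedy "relevant but non-redundant" node picker.
def pick_nodes(relevance, similarity, k, redundancy_weight=0.5):
    """relevance: dict node -> relevance to the query nodes (higher = better).
    similarity: dict (node, node) -> similarity in [0, 1].
    Greedily picks k nodes, penalising similarity to nodes already picked."""
    chosen = []
    candidates = set(relevance)
    while candidates and len(chosen) < k:
        def score(n):
            redundancy = max((similarity.get((n, c), 0.0) for c in chosen), default=0.0)
            return relevance[n] - redundancy_weight * redundancy
        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Toy word co-occurrence example: candidate nodes relevant to "root" and "branch".
relevance = {"tree": 0.9, "plant": 0.85, "equation": 0.8, "stem": 0.75, "suffix": 0.7}
similarity = {("plant", "tree"): 0.9, ("tree", "plant"): 0.9,
              ("stem", "tree"): 0.6, ("tree", "stem"): 0.6}
print(pick_nodes(relevance, similarity, k=3))  # picks mutually dissimilar contexts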

    Improving reproducibility and reuse of modelling results in the life sciences

    Research results are complex and include a variety of heterogeneous data. This entails major computational challenges: (i) to manage simulation studies, (ii) to ensure model exchangeability, stability and validity, and (iii) to foster communication between partners. I describe techniques to improve the reproducibility and reuse of modelling results. First, I introduce a method to characterise differences in computational models. Second, I present approaches to obtain shareable and reproducible research results. Altogether, my methods and tools foster the exchange and reuse of modelling results. The distributed development of complex simulation studies entails a large number of computational challenges: (i) models must be managed; (ii) the reproducibility, stability and validity of results must be ensured; and (iii) communication between partners must be improved. I present techniques to improve the reproducibility and reusability of modelling results. My implementations have been successfully integrated into international applications and foster the sharing of scientific results.

    Work flows in life science

    The introduction of computer science technology in the life science domain has resulted in a new life science discipline called bioinformatics. Bioinformaticians are biologists who know how to apply computer science technology to perform computer-based experiments, also known as in-silico or dry-lab experiments. Various tools, such as databases, web applications and scripting languages, are used to design and run in-silico experiments. As the size and complexity of these experiments grow, new types of tools are required to design and execute the experiments and to analyse the results. Workflow systems promise to fulfill this role. The bioinformatician composes an experiment by using tools and web services as building blocks and connecting them, often through a graphical user interface. Workflow systems, such as Taverna, provide access to up to a few thousand resources in a uniform way. Although workflow systems are intended to make the bioinformaticians' work easier, bioinformaticians experience difficulties in using them. This thesis is devoted to finding out which problems bioinformaticians experience when using workflow systems and to providing solutions for these problems.

    Genome visualisation and user studies in biologist-computer interaction

    We surveyed a number of genome visualisation tools used in biomedical research. We recognised that none of the tools shows all the relevant data that geneticists looking for candidate disease genes would like to see. The biological researchers we collaborate with would like to view integrated data from a variety of sources and be able to see both data overviews and details. In response to this need, we developed a new visualisation tool, VisGenome, which allows users to add their own data or data downloaded from other sources, such as Ensembl. VisGenome visualises single and comparative representations of the rat, mouse, and human chromosomes, and can easily be used for other genomes. In the context of VisGenome development we made the following research contributions. We developed a new algorithm (CartoonPlus) which allows users to see different kinds of data in cartoon scaling depending on a selected basis. Also, two user studies were conducted: an initial quantitative user study and a mixed-paradigm user study. The first study showed that neither Ensembl nor VisGenome fulfils all user requirements or can be regarded as user-friendly, as users make a significant number of mistakes during data navigation. To help users navigate their data easily, we improved existing visualisation techniques in VisGenome and added a new technique, CartoonPlus. To verify whether this solution was useful, we conducted a second user study. We saw that the users became more familiar with the tool and found new ways to use the application, both on its own and in connection with other tools. They frequently used CartoonPlus, which allowed them to see small regions of their data in a way that was not possible before.

    Eighth Biennial Report: April 2005 – March 2007


    Proceedings. 19. Workshop Computational Intelligence, Dortmund, 2. - 4. Dezember 2009

    This proceedings volume contains the contributions to the 19th workshop "Computational Intelligence" of Fachausschuss 5.14 of the VDI/VDE-Gesellschaft für Mess- und Automatisierungstechnik (GMA) and of the Fachgruppe "Fuzzy-Systeme und Soft-Computing" of the Gesellschaft für Informatik (GI), taking place at Haus Bommerholz near Dortmund on 2-4 December 2009.

    A framework for analyzing changes in health care lexicons and nomenclatures

    Ontologies play a crucial role in current web-based biomedical applications for capturing contextual knowledge in the domain of life sciences. Many of the so-called bio-ontologies and controlled vocabularies are known to be seriously defective from both terminological and ontological perspectives, and do not sufficiently comply with the standards required to be considered formal ontologies. Therefore, they are continuously evolving in order to fix the problems and provide valid knowledge. Moreover, many problems in ontology evolution often originate from incomplete knowledge about the given domain. As our knowledge improves, the related definitions in the ontologies will be altered. This problem is inadequately addressed by available tools and algorithms, mostly due to the lack of suitable knowledge representation formalisms to deal with temporal abstract notations, and the overreliance on human factors. Also, most of the current approaches have focused on changes within the internal structure of ontologies, while interactions with other existing ontologies have been widely neglected. In this research, after revealing and classifying some of the common alterations in a number of popular biomedical ontologies, we present a novel agent-based framework, RLR (Represent, Legitimate, and Reproduce), to semi-automatically manage the evolution of bio-ontologies, with emphasis on the FungalWeb Ontology, with minimal human intervention. RLR assists and guides ontology engineers through the change management process in general, and aids in tracking and representing the changes, particularly through the use of category theory. Category theory has been used as a mathematical vehicle for modeling changes in ontologies and representing agents' interactions, independent of any specific choice of ontology language or particular implementation. We have also employed rule-based hierarchical graph transformation techniques to propose a more specific semantics for analyzing ontological changes and transformations between different versions of an ontology, as well as for tracking the effects of a change at different levels of abstraction. Thus, the RLR framework enables one to manage changes in ontologies not as standalone artifacts in isolation, but in contact with other ontologies in an openly distributed semantic web environment. The emphasis on generality and abstractness makes RLR more feasible in the multi-disciplinary domain of biomedical ontology change management.
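    As a loose, hypothetical illustration of tracking ontology changes as rule applications (the abstract's category-theoretic and hierarchical graph-transformation machinery is far richer and is not reproduced here), the Python sketch below represents an ontology version as a small is-a graph, applies named rewrite rules, and records them in a change log so that two versions can be compared. The rule names (rename_class, move_class) and the toy classes are invented for the example.

# Illustrative sketch only: versioning a tiny is-a graph via logged rewrite rules.
import copy

def apply_rule(ontology, rule, log):
    """Apply a (kind, payload) rewrite rule to an is-a graph and log it."""
    new = copy.deepcopy(ontology)
    kind, payload = rule
    if kind == "rename_class":
        old, renamed = payload
        new[renamed] = new.pop(old)
        new = {c: [renamed if p == old else p for p in parents]
               for c, parents in new.items()}
    elif kind == "move_class":
        cls, new_parent = payload
        new[cls] = [new_parent]
    log.append(rule)
    return new

# Version 1 of a toy fungal ontology fragment (class -> list of parents).
v1 = {"Fungus": [], "Yeast": ["Fungus"], "BakersYeast": ["Yeast"]}
changes = []
v2 = apply_rule(v1, ("rename_class", ("BakersYeast", "Saccharomyces_cerevisiae")), changes)
v2 = apply_rule(v2, ("move_class", ("Saccharomyces_cerevisiae", "Fungus")), changes)

print(changes)   # the change log documents how v2 was derived from v1
print(v2)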

    Simulation and graph mining tools for improving gene mapping efficiency

    Gene mapping is a systematic search for genes that affect observable characteristics of an organism. In this thesis we offer computational tools to improve the efficiency of (disease) gene-mapping efforts. In the first part of the thesis we propose an efficient simulation procedure for generating realistic genetic data from isolated populations. Simulated data is useful for evaluating hypothesised gene-mapping study designs and computational analysis tools. As an example of such evaluation, we demonstrate how a population-based study design can be a powerful alternative to traditional family-based designs in association-based gene-mapping projects. In the second part of the thesis we consider the prioritisation of a (typically large) set of putative disease-associated genes acquired from an initial gene-mapping analysis. Prioritisation is necessary to be able to focus on the most promising candidates. We show how to harness current biomedical knowledge for the prioritisation task by integrating various publicly available biological databases into a weighted biological graph. We then demonstrate how to find and evaluate connections between entities, such as genes and diseases, in this unified schema by graph mining techniques. Finally, in the last part of the thesis, we define the concept of a reliable subgraph and the corresponding subgraph extraction problem. Reliable subgraphs concisely describe strong and independent connections between two given vertices in a random graph, and hence they are especially useful for visualising such connections. We propose novel algorithms for extracting reliable subgraphs from large random graphs. The efficiency and scalability of the proposed graph mining methods are backed by extensive experiments on real data. While our application focus is in genetics, the concepts and algorithms can be applied to other domains as well. We demonstrate this generality by considering coauthor graphs in addition to biological graphs in the experiments.
    Gene mapping is the systematic search of the genome for genes that affect the observable traits of an organism. The dissertation presents new methods for making the mapping of disease-predisposing genes more efficient. The first part of the dissertation considers the simulation of the genome in (typically geographically) isolated populations and presents a new simulator software suited for this purpose. Simulated data sets are useful in study design, where they can be used to assess the statistical properties of the planned data sets and the behaviour of the analysis methods to be used. As an example of such a study, the dissertation walks through a fairly extensive simulation study conducted with the presented software. Based on the results, a population-based case-control study design appears to be a statistically powerful alternative to the more expensive family- and pedigree-based designs. The second part of the dissertation deals with scoring so-called candidate genes, which possibly predispose to a disease, according to how strong their connections to the studied disease are. Scoring is important because preliminary analyses of the data typically produce a large number of candidate genes, and examining all of them would be too laborious; scoring allows further studies to be focused on the most promising candidates. The dissertation shows how biological knowledge that currently resides in separate databases can be represented in a unified graph form. It is also shown how connections between candidate genes and the studied disease can be searched for in such data and scored with the help of graph mining algorithms. Finally, the dissertation presents the reliable subgraph extraction problem and algorithms for solving it. The goal is to extract from a large graph a relatively small subgraph that contains strong and mutually independent connections between two given nodes of the graph; reliable subgraphs are therefore particularly well suited for visualising the discovered connections. Besides genetics, reliable subgraphs can also be applied in other fields, such as the analysis of social networks.
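    As an illustrative sketch only (it assumes nothing about the thesis's own algorithms, which are designed to be far more efficient than brute-force sampling), the following Python code estimates by Monte Carlo sampling how reliably each candidate gene connects to a disease node in a graph whose edges exist with given probabilities. The toy node names, the edge probabilities and the connection_reliability helper are assumptions made up for the example.

# Illustrative sketch only: Monte Carlo two-terminal connection reliability
# in a random graph, used here to rank candidate genes by how reliably they
# connect to a disease node.
import random
from collections import defaultdict

def connected(kept, source, target):
    """Traverse the kept edges to test whether source reaches target."""
    adj = defaultdict(list)
    for (u, v) in kept:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def connection_reliability(edges, source, target, samples=2000):
    """edges: dict (u, v) -> probability that the edge is truly present."""
    hits = 0
    for _ in range(samples):
        kept = [e for e, p in edges.items() if random.random() < p]
        hits += connected(kept, source, target)
    return hits / samples

# Toy biological graph: genes connect to a disease through pathway nodes.
edges = {("geneA", "pathway1"): 0.9, ("pathway1", "disease"): 0.8,
         ("geneB", "pathway2"): 0.5, ("pathway2", "disease"): 0.4}
for gene in ("geneA", "geneB"):
    print(gene, round(connection_reliability(edges, gene, "disease"), 2))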