312 research outputs found

    Exposing WikiPathways as Linked Open Data

    Biology has become a data-intensive science. Discovery of new biological facts increasingly relies on the ability to find and match appropriate biological data, for instance for functional annotation of genes of interest or for identification of pathways affected by over-expressed genes. Functional and pathway information about genes and proteins is typically distributed over a variety of databases and the literature.

Pathways are a convenient, easy-to-interpret way to describe known biological interactions. WikiPathways provides community-curated pathways. WikiPathways users integrate their knowledge with facts from the literature and biological databases. The curated pathway is then reviewed and possibly corrected or enriched. Different tools (e.g. PathVisio and Cytoscape) support the integration of WikiPathways knowledge for additional tasks, such as the integration with personal data sets.

Data from WikiPathways is also increasingly used for advanced analysis in which it is integrated or compared with other data. Currently, integration with data from different biological sources is mostly done manually. This can be a very time-consuming task because the curator often first needs to find the available resources, needs to learn about their specific content and qualities, and then often spends a lot of time technically combining them.

Semantic Web and Linked Data technologies eliminate the barriers between database silos by relying on a set of standards and best practices for representing and describing data. The architecture of the Semantic Web relies on the architecture of the web itself for integrating and mapping uniform resource identifiers (URIs), coupled with basic inference mechanisms to enable matching concepts and properties across data sources. Semantic Web and Linked Data technologies are increasingly being applied successfully as integration engines for linking biological elements.

Exposing WikiPathways content as Linked Open Data to the Semantic Web enables rapid, semi-automated integration with the growing number of biological resources available from the Linked Open Data cloud. It also allows fast queries of WikiPathways itself.

We have harmonised WikiPathways content according to a selected set of vocabularies (BioPAX, ChEMBL, etc.) common to resources already available as Linked Open Data.
WikiPathways content is now available as Linked Open Data for dynamic querying through a SPARQL endpoint: http://semantics.bigcat.unimaas.nl:8000/sparql
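
As a minimal sketch of what dynamic querying could look like, the snippet below builds a GET request URL for the endpoint above, following the SPARQL Protocol convention of passing the query text in a `query` parameter. The generic triple pattern is an assumption chosen to avoid guessing which vocabulary terms the WikiPathways data uses, and the request is only constructed, not sent, because the endpoint may no longer be reachable.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint as given in the abstract; it may no longer be online.
ENDPOINT = "http://semantics.bigcat.unimaas.nl:8000/sparql"

# Generic query (an assumption): list any ten triples, without relying
# on specific WikiPathways vocabulary terms.
QUERY = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 10
"""


def build_sparql_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


request_url = build_sparql_url(ENDPOINT, QUERY)
print(request_url)
```

The resulting URL could be passed to any HTTP client; a real query would replace the generic pattern with terms from the vocabularies the data was harmonised against.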

    Using a data triangle to understand molecular nutrition

    Until recently nutrigenomics was mainly about transcriptomics-related data. That already confronted us with overwhelming analytical problems. We learned to treat genome-wide expression studies, and studies directed at gene expression regulation, mathematically and statistically. Nutrigenomics researchers had to become bilingual, speaking English and R [1], and learned to think about co-expression, clusters and false discovery rates. The latter in fact proved to be a trap: removing all the false positives made us lose the information we were really interested in. To understand the results of our genomics experiments we often had to confront what we were measuring with what we already knew. After all, false positives are not likely to all be related to the same meaningful biological process. That asked for the development of new analytical tools like Cytoscape for network analysis and PathVisio for pathway analysis. More importantly, we had to structure what we know. Text mining and data mining helped us to do that, but what was really needed was mobilization of all the knowledge that is present in the heads of the scientific community. WikiPathways was our contribution to the rapidly emerging field of community curation. Thus we started to become able to integrate different types of technologies that span the full gene expression pipeline and to understand the results in their biological context.
Today the story repeats itself. Genome-wide genetics is becoming real. We can do Genome Wide Association Studies and soon we can sequence individual genomes in relation to food intake and phenotypic responses. And then what? How can we deal with that new avalanche of data? The oversampling problems will be a few orders of magnitude larger; after all, there can be hundreds of SNPs in every gene. There will just be too many to understand which SNPs are important from the data alone. We will again have to relate them to the biological processes. But is that enough? I think not. We will only understand the outcome of those large-scale genetics studies if we do more than attribute the SNPs to genes and thereby to pathways. We will also have to consider the actual sequences and see what functional effect the SNP causes. Is it likely to influence transcription factor binding, miRNA effects, or protein-protein interactions? This calls for new types of data integration, for which we already have the tools. And it calls for new creative ways to do that. What we really need is teams of creative minds. Some new initiatives seem to show that these are already being formed.

1: http://www.r-project.org 

    The Importance of Modularity in Bioinformatics Tools

    In the last decade the number of bioinformatics tools has increased enormously. There are tools to store, analyse, visualize, edit or generate biological data and there are still more in development. Still, the demand for increased functionality in a single piece of software must be balanced by the need for modularity to keep the software maintainable. In complex systems, the conflicting demands of features and maintainability are often solved by plug-in systems.

For example, Cytoscape, an open source platform for complex-network analysis and visualization, uses a plug-in system to allow the extension of the application without changing the core. This not only allows the integration of new functionality without a new release but also lets other developers contribute the plug-ins they need in their research.

Most tools have their own, individual plug-in system to meet the needs of the application. These are often very simple and easy to use. However, the increasing complexity of plug-ins demands more functionality of the plug-in system. We want to reuse components in different contexts, we want to have simple plug-in interfaces and we want to allow communication and dependencies between plug-ins. Many tools implemented in Java are facing these problems and there seems to be a common solution: the integration of an established modularity framework, like OSGi. To our knowledge, a number of developers of bioinformatics tools are already implementing, planning or thinking about the integration of OSGi into their applications, e.g. Cytoscape, Protege, PathVisio, ImageJ, Jalview or Chipster. The adoption of modularity frameworks in the development of bioinformatics applications is steadily increasing and should be considered in the design of new software.

By modularity in the traditional computer science sense, we mean the division of a software application into logical parts with separate concerns. To ease the development of software tools the application is separated into smaller logical parts, which are implemented individually. A set of modules can form a larger application, but only if a proper glue is used; OSGi is an example of such a glue. OSGi makes it possible to build an infrastructure into an application for adding and using different modules. It provides mechanisms that allow the individual modules to rely on and interact with each other, opening the possibility to put together different modules to solve the problem at hand. Later, modules can be removed and new ones can be added to tackle another problem. As Katy Boerner writes in her article 'Plug-and-Play Macroscopes', we should 'implement software frameworks that empower domain scientists to assemble their own continuously evolving macroscopes, adding and upgrading existing (and removing obsolete) plug-ins to arrive at a set that is truly relevant for their work'.

Some of these modules are going to be specific for one application but a lot of these modules can actually be reused by other tools. We are talking about general features like the import or export of different file formats, a layout algorithm that could be used by several visualization tools or the lookup in an external online database. Why should every tool implement its own parser or algorithm? Modularity can help to share functionality. There is no need to start from scratch and implement everything anew, thus developers can focus on new and important features.

Adding modularity, or better, a modularity framework to an existing software application is not a trivial task. The developers of Cytoscape are currently undertaking this challenge with the coming version 3. We are also working on the integration of OSGi into our pathway visualization tool PathVisio and we now want to share and compare our experiences, so others can benefit from our discoveries. This will help them not only in deciding whether OSGi is a suitable solution for them but also in the integration process itself.
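
The idea of modules that depend on and interact with each other can be illustrated with a toy registry. This is only a sketch in Python under our own assumptions, not OSGi itself (which is a Java framework with a much richer lifecycle and service model); the class `PluginRegistry` and the plug-in names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


class PluginRegistry:
    """Toy module registry: plug-ins declare dependencies, which are started first."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Tuple[Callable[[], None], Tuple[str, ...]]] = {}
        self.started: List[str] = []

    def register(self, name: str, start: Callable[[], None],
                 requires: Tuple[str, ...] = ()) -> None:
        """Add a plug-in with its start-up function and required plug-ins."""
        self._plugins[name] = (start, tuple(requires))

    def start(self, name: str, _visiting: Optional[Set[str]] = None) -> None:
        """Start a plug-in after recursively starting everything it requires."""
        if name in self.started:
            return
        visiting = _visiting if _visiting is not None else set()
        if name in visiting:
            raise ValueError(f"circular dependency involving {name!r}")
        if name not in self._plugins:
            raise KeyError(f"missing plug-in {name!r}")
        visiting.add(name)
        start_fn, requires = self._plugins[name]
        for dependency in requires:
            self.start(dependency, visiting)
        start_fn()
        self.started.append(name)


# Hypothetical plug-ins: a viewer that reuses a shared importer and layout module.
log: List[str] = []
registry = PluginRegistry()
registry.register("file-import", lambda: log.append("file-import ready"))
registry.register("layout", lambda: log.append("layout ready"))
registry.register("pathway-viewer", lambda: log.append("pathway-viewer ready"),
                  requires=("file-import", "layout"))
registry.start("pathway-viewer")
print(registry.started)
```

The shared `file-import` and `layout` modules could be required by a second tool without being reimplemented, which is the kind of reuse the abstract argues for.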

    Measuring impact in online resources with the CI-number (the CitedIn Number for online impact)

    CitedIn is a web service that can be found at www.citedin.org. It tracks online citations in databases, blogs, wikis, and community sites to the literature covered in PubMed. For biomedical scientific literature PubMed is by far the most interesting resource. PubMed is the interface to the MEDLINE repository, which covers over a century of publications and contains almost all relevant publications in the field. PubMed uses very simple unique numeric identifiers. This makes it easy to cite publications using these PubMed Identifiers (PMIDs) or to link to PubMed itself based on those. Because of this ease of use, PMIDs are often used for online citations; they are often the standard in biomedical databases and, for instance, in Wikipedia. CitedIn uses a federated search approach towards references to PubMed to find which publications contained in PubMed are cited where in a large number of web resources. At this moment (April 2011) CitedIn searches wikis (including Wikipedia), search engines for scientific blogs (Nature blog and Google blog search), databases (including some major bioinformatics databases), Google Books, some special publication sets, and social network sites (such as Connotea and CiteULike). In the near future searching through Twitter tweets will be implemented as well. A CitedIn search can be done for any set of PubMed references, either offered as a list of PMIDs or retrieved from PubMed through a set of keywords. This for instance allows searches for all papers produced by a single author and thus allows you to ask the question “where am I cited on the web, besides in scientific publications?”. CitedIn will show you the publications it reviewed and for each of those it will indicate where it was cited. It is possible to receive an overview of the whole set, in which the contribution of each resource to the set is given, and it is also possible to review an individual publication, where you can find the actual citation.
CitedIn also offers an interface for programmatic access (API) through which it can be used for automated analysis. While we initially thought about CitedIn as “just” a resource to find online citations, it also provides information that is useful to estimate the online impact of a paper or a set of papers. This offers an opportunity to assess the online impact of an author, a group of authors or a research topic. We propose to use this to calculate a metric for online scientific impact: the CI-number (the CitedIn Number for online impact). Traditionally the impact of scientific publications, journals and researchers is determined based on how often publications are cited in other publications. That leads to the number of citations per article, impact factors per journal (average number of citations for all articles in a journal) and, for instance, h-indices for researchers (number of articles cited at least as often as that number) [1-3]. Debate is ongoing about how justifiable these indices are, but the importance of scientific literature for the advance of science and technology indicates the need to somehow measure contributions, and the current system often determines academic careers, the fate of journals or even decisions to close or fund whole research institutes or research programs. Since the current methods only consider structured citations in reference lists of journal articles (and sometimes books), they miss important citations. These are roughly from four domains: 1) publications on the Internet (e.g. blogs, Wikipedia), 2) online databases (containing structured knowledge derived from papers and often referring to them), 3) social network sites (these are in part designed to share important publications, like Mendeley, CiteULike and Connotea) and 4) supplementary data (especially for reviews, long lists of references are sometimes only published online).
We have defined the CI-number (CitedIn number for online impact) as a metric to assess the impact in online resources for a set of scientific publications contained in PubMed. This metric is calculated dynamically while the numbers of citations in each resource are counted. In the calculation of the CI-number we normalize on the total number of citations covered in the resource under scrutiny. We also introduce a weight value for each resource. This weight indicates the “impact” of the resource: a citation in Wikipedia should be considered to have a higher impact than a citation in a blog. The total weight is first divided over the main groups of resources and then between the resources of that group. Individual weights and relative total weights for groups will be adjusted on a yearly basis (effectively leading to a yearly CI-number). We will start with the following arbitrarily selected group weights: Wikipedia (25%), blogs and social media (15%), small wikis (15%), databases (35%), and a rest category (10%).
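
The abstract gives the group weights but not the exact formula, so the following is only a sketch of how such a weighted, normalized score might be computed. It assumes that a group's weight is split equally over the resources in that group, that normalization means dividing the citations found for the publication set by the total number of citations covered in that resource, and that the contributions are simply summed. The resource names and counts are made up for illustration.

```python
from typing import Dict, Tuple

# Group weights as listed in the abstract (fractions of the total weight).
GROUP_WEIGHTS: Dict[str, float] = {
    "wikipedia": 0.25,
    "blogs_social": 0.15,
    "small_wikis": 0.15,
    "databases": 0.35,
    "rest": 0.10,
}

# counts[group][resource] = (citations to the publication set,
#                            total citations covered in that resource)
Counts = Dict[str, Dict[str, Tuple[int, int]]]


def ci_number(counts: Counts, group_weights: Dict[str, float] = GROUP_WEIGHTS) -> float:
    """Weighted sum of normalized citation shares (assumed formula)."""
    score = 0.0
    for group, resources in counts.items():
        if not resources:
            continue
        # Assumption: the group weight is divided equally between its resources.
        resource_weight = group_weights[group] / len(resources)
        for hits, total in resources.values():
            if total > 0:
                score += resource_weight * hits / total
    return score


# Made-up example counts for one set of publications.
example: Counts = {
    "wikipedia": {"wikipedia": (50, 1000)},
    "blogs_social": {"blog_search": (2, 100), "bookmarking_site": (0, 200)},
    "databases": {"pathway_db": (10, 500)},
}
score = ci_number(example)
print(round(score, 4))
```

With these made-up counts the Wikipedia share contributes 0.25 × 0.05, the two blog and social media resources 0.075 × 0.02 and 0.075 × 0, and the database 0.35 × 0.02, giving 0.021 in total.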

    WikiPathways: building research communities on biological pathways.

    Here, we describe the development of WikiPathways (http://www.wikipathways.org), a public wiki for pathway curation, since it was first published in 2008. New features are discussed, as well as developments in the community of contributors. New features include a zoomable pathway viewer, support for pathway ontology annotations, the ability to mark pathways as private for a limited time and the availability of stable hyperlinks to pathways and the elements therein. WikiPathways content is freely available in a variety of formats such as the BioPAX standard, and the content is increasingly adopted by external databases and tools, including Wikipedia. A recent development is the use of WikiPathways as a staging ground for centrally curated databases such as Reactome. WikiPathways is seeing steady growth in the number of users, page views and edits for each pathway. To assess whether the community curation experiment can be considered successful, here we analyze the relation between use and contribution, which gives results in line with other wiki projects. The novel use of pathway pages as supplementary material to publications, as well as the addition of tailored content for research domains, is expected to stimulate growth further.

    Maternal folate depletion during early development and high fat feeding from weaning elicit similar changes in gene expression, but not in DNA methylation, in adult offspring

    Scope: The ‘Predictive Adaptive Response’ hypothesis suggests that the in utero environment when mismatched with the post-natal environment can influence later life health. Underlying mechanisms are poorly understood, but may involve gene transcription changes regulated via epigenetic mechanisms. Methods and results: In a 2 × 2 factorial design, female C57Bl/6 mice were randomised to low or normal folate diets (0.4 mg/2 mg folic acid/kg diet) prior to and during pregnancy and lactation with offspring randomised to high- or low-fat diets at weaning. Genome-wide gene expression and promoter DNA methylation were measured using microarrays in adult male livers. Maternal folate depletion and high fat intake post-weaning influenced gene expression (1859 and 1532 genes, respectively) and promoter DNA methylation (201 and 324 loci, respectively) but changes in expression and methylation were poorly matched for both dietary interventions. Expression of 642 genes was altered in response to both maternal folate depletion and post-weaning high fat feeding, treatments imposed separately. In addition, there was evidence that the combined dietary insult (i.e. maternal folate depletion followed by high fat post-weaning) caused the largest expression change for most genes. Conclusion: Our observations align with, and provide evidence in support of, a potential underlying mechanism for the ‘Predictive Adaptive Response’ hypothesis.

    BridgeDb: standardized access to gene, protein and metabolite identifier mapping services

    Many interesting problems in bioinformatics require integration of data from various sources, for example when combining microarray data with a pathway database, or merging co-citation networks with protein-protein interaction networks. Invariably this leads to an identifier mapping problem, where different datasets are annotated with identifiers that are related, but originate from different databases.

Solutions for the identifier mapping problem exist, such as BioMart, Synergizer, Cronos, PICR, HMS and many more. This creates an opportunity for bioinformatics tool developers. Tools can be made to flexibly support multiple mapping services, or mapping services could be combined to get broader coverage. This approach requires an interface layer between tools and mapping services. BridgeDb provides such an interface layer, in the form of both a Java and a REST API.

Because of the standardized interface layer, BridgeDb is not tied to a specific source of mapping information. You can switch easily between flat files, relational databases and several different web services. Mapping services can be combined to support multi-omics experiments or to integrate custom microarray annotations. BridgeDb isn't just yet another mapping service: it tries to build further on existing work, and integrate multiple partial solutions. The framework is intended for customization and adaptation to any identifier mapping service. 

BridgeDb makes it easy to add an important capability to existing tools. BridgeDb has already been integrated into several popular bioinformatics applications, such as Cytoscape, WikiPathways, PathVisio, Vanted and Taverna. To encourage tool developers to start using BridgeDb, we've created code examples, online documentation, and a mailing list to ask questions.

We believe that, to meet the challenges that are encountered in bioinformatics today, the software development process should follow a few essential principles: user friendliness, code reuse, modularity and open source. BridgeDb adheres to these principles, and can serve as a useful model for others to follow. BridgeDb can function to increase user-friendliness of graphical applications. It re-uses work from other projects such as BioMart and MIRIAM. BridgeDb consists of several small modules, integrated through a common interface (API). Components of BridgeDb can be left out or replaced, for maximum flexibility. BridgeDb was open source from the very beginning of the project. The philosophy of open source is closely aligned to academic values, of building on top of the work of giants. 

The BridgeDb library is available at http://www.bridgedb.org.
A paper about BridgeDb was published in BMC Bioinformatics, 2010 Jan 4;11(1):5.

BridgeDb blog: http://www.helixsoft.nl/blog/?tag=bridgedb
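
To make the idea of an interface layer between tools and mapping services concrete, here is a small sketch. It is written in Python for illustration and is not the BridgeDb Java or REST API: the names `IdMapper`, `DictMapper` and `MapperStack`, as well as the data sources and identifiers, are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Set, Tuple

# An identifier together with the database it comes from, e.g. ("SourceA", "A1").
Xref = Tuple[str, str]


class IdMapper(ABC):
    """Common interface that every mapping backend implements."""

    @abstractmethod
    def map_id(self, xref: Xref, target_source: str) -> Set[Xref]:
        """Return the identifiers in target_source that correspond to xref."""


class DictMapper(IdMapper):
    """Backend over an in-memory table, standing in for a flat file or database."""

    def __init__(self, links: Dict[Xref, Set[Xref]]) -> None:
        self._links = links

    def map_id(self, xref: Xref, target_source: str) -> Set[Xref]:
        return {x for x in self._links.get(xref, set()) if x[0] == target_source}


class MapperStack(IdMapper):
    """Combines several backends and returns the union of their answers."""

    def __init__(self, mappers: List[IdMapper]) -> None:
        self._mappers = mappers

    def map_id(self, xref: Xref, target_source: str) -> Set[Xref]:
        result: Set[Xref] = set()
        for mapper in self._mappers:
            result |= mapper.map_id(xref, target_source)
        return result


# Two made-up backends with partial coverage of the same identifier.
flat_file = DictMapper({("SourceA", "A1"): {("SourceB", "B7"), ("SourceC", "C3")}})
custom_annotation = DictMapper({("SourceA", "A1"): {("SourceB", "B9")}})
stack = MapperStack([flat_file, custom_annotation])
mapped = stack.map_id(("SourceA", "A1"), "SourceB")
print(sorted(mapped))
```

A tool written against `IdMapper` does not need to know which backend answers the request, so backends can be swapped or combined for broader coverage without changing the tool.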

    Exploring pathway interactions in insulin resistant mouse liver

    Background: Complex phenotypes such as insulin resistance involve different biological pathways that may interact and influence each other. Interpretation of related experimental data would be facilitated by identifying relevant pathway interactions in the context of the dataset. Results: We developed an analysis approach to study interactions between pathways by integrating gene and protein interaction networks, biological pathway information and high-throughput data. This approach was applied to a transcriptomics dataset to investigate pathway interactions in insulin resistant mouse liver in response to a glucose challenge. We identified regulated pathway interactions at different time points following the glucose challenge and also studied the underlying protein interactions to find possible mechanisms and key proteins involved in pathway cross-talk. A large number of pathway interactions were found for the comparison between the two diet groups at t = 0. The initial response to the glucose challenge (t = 0.6) was typed by an acute stress response and pathway interactions showed large overlap between the two diet groups, while the pathway interaction networks for the late response were more dissimilar. Conclusions: Studying pathway interactions provides a new perspective on the data that complements established pathway analysis methods such as enrichment analysis. This study provided new insights in how interactions between pathways may be affected by insulin resistance. In addition, the analysis approach described here can be generally applied to different types of high-throughput data and will therefore be useful for analysis of other complex datasets as well.

    An integrated bioinformatics approach to improve two-color microarray quality-control: impact on biological conclusions

    Omics technology used for large-scale measurements of gene expression is rapidly evolving. This work points out the need for extensive bioinformatics analysis to assess array quality before and after gene expression clustering and pathway analysis. A study focused on the effect of red wine polyphenols on rat colon mucosa was used to test the impact of quality control and normalisation steps on the biological conclusions. The integration of data visualization, pathway analysis and clustering revealed an artifact problem that was solved with an adapted normalisation. We propose a possible point-by-point standard analysis procedure, based on a combination of clustering and data visualization, for the analysis of microarray data.

    CyTargetLinker app update: A flexible solution for network extension in Cytoscape

    Here, we present an update of the open-source CyTargetLinker app for Cytoscape (http://apps.cytoscape.org/apps/cytargetlinker) that introduces new automation features. CyTargetLinker provides a simple interface to extend networks with links to relevant data and/or knowledge extracted from so-called linksets. The linksets are provided on the CyTargetLinker website (https://cytargetlinker.github.io/) or can be custom-made for specific use cases. The new automation feature enables users to programmatically execute the app's functionality in Cytoscape (command line tool) and with external tools (e.g. R, Jupyter, Python, etc.). This allows users to share their analysis workflows and therefore increase repeatability and reproducibility. Three use cases demonstrate automated workflows, combinations with other Cytoscape apps and core Cytoscape functionality. We first extend a protein-protein interaction network created with the stringApp with compound-target interactions and disease-gene annotations. In the second use case, we created a workflow to load differentially expressed genes from an experimental dataset and extend it with gene-pathway associations. Lastly, we chose an example outside the biological domain and used CyTargetLinker to create an author-article-journal network for the five authors of this manuscript using a two-step extension mechanism. With 400 downloads per month in the last year and nearly 20,000 downloads in total, CyTargetLinker shows the adoption and relevance of the app in the field of network biology. In August 2019, the original publication was cited in 83 articles, demonstrating the applicability in biomedical research.
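
The extension mechanism described above can be sketched in a few lines of plain Python. This is an illustration of the general idea only, under the assumption that a linkset can be treated as a list of (source, target) pairs; it does not use the CyTargetLinker or Cytoscape automation interfaces, and the function `extend_network` and all node names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def extend_network(nodes: Set[str], edges: Set[Edge],
                   linkset: Iterable[Edge]) -> Tuple[Set[str], Set[Edge]]:
    """Add every linkset edge that touches an existing node, plus its new endpoint."""
    new_nodes = set(nodes)
    new_edges = set(edges)
    for source, target in linkset:
        # Only nodes present before this step can pull in new links.
        if source in nodes or target in nodes:
            new_nodes.update((source, target))
            new_edges.add((source, target))
    return new_nodes, new_edges


# Made-up linksets for a two-step author -> article -> journal extension.
author_article = [("author_1", "article_a"), ("author_2", "article_b")]
article_journal = [("article_a", "journal_x"), ("article_b", "journal_y")]

# Step 1: extend the initial author network with articles.
nodes, edges = extend_network({"author_1"}, set(), author_article)
# Step 2: extend the result with the journals of the articles found in step 1.
nodes, edges = extend_network(nodes, edges, article_journal)
print(sorted(nodes))
print(sorted(edges))
```

Because each step only considers nodes that were present before it ran, the second linkset adds journals only for articles that the first step actually reached.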