
    Standardizing Pathway Entries to Wikidata

    Wikidata is a free, collaborative database that collects structured data from a wide variety of sources. Wikidata items are built from statements. A statement is expressed as a triple, similar to an RDF triple, embedded in a set of qualifiers and references that provide its provenance. The Wikidata statements and their provenance are (almost) continually converted to an RDF model, which is made available through the Wikidata Query Service (a SPARQL endpoint). This approach means that Wikidata is easily queried, and the range of data allows links between items from different fields to be established. For example, a query for any item citing a given scientific article will return not only other scientific articles but also any other data item that references the article. Here we present the collaborative effort between several groups to enter data concerning biological pathways into Wikidata in a standard fashion that allows users to query several databases with a single Wikidata query. Initial data from both Reactome [http://reactome.org/] and WikiPathways [http://wikipathways.org] has been added using a data model designed to standardize the commonality among pathway resources. Work is now progressing to establish and improve links between the data entries and to produce a standard format that will facilitate the addition of further pathway information to Wikidata. With a harmonized data model and sufficient coverage of multiple pathway resources, the platform can be used to query and enrich pathway information with a single query, including information provided by non-pathway resources. See for examples: https://www.wikidata.org/wiki/User:Pathwaybot/query_examples

    When using these queries, note that not all Reactome data has yet been exported to Wikidata. Wikidata supports SPARQL 1.1, which means that federated queries are possible. These are a special type of query that allows multiple SPARQL endpoints to be queried at once. See this blog post for examples: http://sulab.org/2017/07/integrating-wikidata-and-other-linked-data-sources-federated-sparql-queries/.

    Wikidata is not a replacement for either Reactome or WikiPathways; it acts as a proxy between both resources, and to a larger extent to other data resources, by providing a unified interface. To maintain its role as a hub of scientific data, regular updates from the primary sources are essential. Direct links and references to the original sources are also stored, allowing direct access. Harmonized data models create the mechanism to proxy through various scientific data sources. WikiPathways and Reactome are now available on a unified data model, and we invite you to join our efforts.
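    As a concrete illustration of the kind of single-query access described above, the following Python sketch sends a SPARQL query to the Wikidata Query Service asking for items that carry both a WikiPathways and a Reactome identifier. It is a minimal example rather than one of the curated queries on the Pathwaybot page, and the property IDs used (P2410 for the WikiPathways ID, P3937 for the Reactome ID) are assumptions that should be verified against Wikidata.

```python
# Minimal sketch: query the Wikidata Query Service for items that carry both a
# WikiPathways identifier and a Reactome identifier. The property IDs below
# (P2410 = "WikiPathways ID", P3937 = "Reactome ID") are assumptions and should
# be checked on Wikidata before use.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?pathway ?pathwayLabel ?wikipathwaysId ?reactomeId WHERE {
  ?pathway wdt:P2410 ?wikipathwaysId ;   # WikiPathways ID (assumed property)
           wdt:P3937 ?reactomeId .       # Reactome ID (assumed property)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20
"""

def run_query():
    sparql = SPARQLWrapper(ENDPOINT, agent="pathway-demo/0.1")
    sparql.setQuery(QUERY)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["pathwayLabel"]["value"],
              row["wikipathwaysId"]["value"],
              row["reactomeId"]["value"])

if __name__ == "__main__":
    run_query()
```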

    Analyzing Protein–Protein Interaction Networks

    The advent of the “omics” era in biology research has brought new challenges and requires the development of novel strategies to answer previously intractable questions. Molecular interaction networks provide a framework to visualize cellular processes, but their complexity often makes their interpretation an overwhelming task. The inherently artificial nature of interaction detection methods and the incompleteness of currently available interaction maps call for a careful and well-informed use of this valuable data. In this tutorial, we give an overview of the key aspects that any researcher needs to consider when working with molecular interaction data sets, and we outline an example of interactome analysis. Using the molecular interaction database IntAct, the software platform Cytoscape, and its plugins BiNGO and clusterMaker, and taking as a starting point a list of proteins identified in a mass spectrometry-based proteomics experiment, we show how to build, visualize, and analyze a protein–protein interaction network.
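    The tutorial itself works in Cytoscape with the BiNGO and clusterMaker plugins; as a lightweight stand-in for the network-building step, the sketch below uses Python and networkx to assemble a small protein–protein interaction network from a protein list and an edge list and to compute some first-pass topology. The proteins and interactions shown are placeholders, not data drawn from IntAct.

```python
# Lightweight stand-in for the network-building step described above: build a
# protein-protein interaction network from a list of proteins of interest and a
# set of binary interactions (e.g. exported from IntAct as an edge list), then
# inspect basic topology. The interactions below are placeholders, not real
# IntAct data.
import networkx as nx

proteins_of_interest = {"TP53", "MDM2", "EP300", "BRCA1", "ATM"}

# (protein A, protein B) pairs as they might appear in a tab-delimited export
interactions = [
    ("TP53", "MDM2"),
    ("TP53", "EP300"),
    ("BRCA1", "ATM"),
    ("TP53", "ATM"),
]

g = nx.Graph()
g.add_nodes_from(proteins_of_interest)
g.add_edges_from(interactions)

# Basic topological measures often used as a first pass on interactomes
print("nodes:", g.number_of_nodes(), "edges:", g.number_of_edges())
print("degree:", dict(g.degree()))
print("connected components:",
      [sorted(c) for c in nx.connected_components(g)])
```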

    Screenshot of the LipidHome “Browser” view.

    The LipidHome structural hierarchy can be navigated in the tree panel on the far left. Clicking on a lipid record produces two vertically stacked panels on the right. The top panel shows general information about the selected record, including an image. The bottom panel displays a table of the selected record's child lipids; for example, selecting the "Sub Class" "Diacylglycerophosphocholines" provides a list of its "Species". These lists can be exported to a number of file formats.

    A diagram of in silico construction of theoretical diradyl lipid “Sub Species”.

    Steps: (1) all viable potential fatty acids are generated from a set of starting parameters; (2) they are combined all-against-all; (3) the head groups with alpha-carbons and linkages are generated; (4) the head groups are crossed with the fatty acid pairs to produce all viable lipid structures within the predefined chemical space.
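    The following Python sketch mimics the four steps of the caption over a toy chemical space: enumerate fatty acids from starting parameters, pair them all-against-all, define head groups, and cross the two sets. The carbon and double-bond ranges, the viability rule, and the head-group names are illustrative assumptions only.

```python
# Sketch of the combinatorial enumeration described in the caption, over a toy
# chemical space. Names, ranges and the viability rule are illustrative only.
from itertools import combinations_with_replacement, product

# Step 1: enumerate viable fatty acids from starting parameters
carbon_range = range(12, 23)        # e.g. 12-22 carbons (assumption)
max_double_bonds = 6
fatty_acids = [
    (c, db)
    for c in carbon_range
    for db in range(0, max_double_bonds + 1)
    if db <= (c - 2) // 2           # crude viability rule (assumption)
]

# Step 2: combine fatty acids all-against-all (unordered pairs for diradyl lipids)
fa_pairs = list(combinations_with_replacement(fatty_acids, 2))

# Step 3: head groups (alpha-carbon and linkage details omitted in this sketch)
head_groups = ["phosphocholine", "phosphoethanolamine", "phosphoserine"]

# Step 4: cross head groups with fatty-acid pairs to get theoretical "Sub Species"
sub_species = [
    (hg, fa1, fa2) for hg, (fa1, fa2) in product(head_groups, fa_pairs)
]
print(len(fatty_acids), "fatty acids ->", len(sub_species), "theoretical sub species")
```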

    The structural hierarchy of lipid records.

    a) The structural hierarchy of lipid records in the LIPID MAPS Structural Database (LMSD). The LIPID MAPS classification system organises all lipids into "Categories", "Main Classes" and "Sub Classes". The lipid records are stored at the "Geometric Isomer" level, where the total number of carbons, the total number of double bonds, the position of double bonds and the stereochemistry of double bonds are defined for each fatty acid. The transparent lipid identification hierarchy levels are not supported by the LMSD. b) The structural hierarchy of lipid records in the LipidHome database. Similar to the LIPID MAPS classification system, lipids are organised into "Categories", "Main Classes" and "Sub Classes". Lipid records are stored at four levels, each relating to a typical type of identification from a high-throughput mass spectrometry experiment, ranging from structurally undefined "Species", typically identified from a single precursor ion mass, to structurally resolved "Isomer"-level identifications.
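    A minimal sketch of the LipidHome hierarchy in panel b), expressed as nested Python dataclasses. Only the level names that appear in the caption are modelled (Category, Main Class, Sub Class, Species, Isomer); the intermediate identification levels between "Species" and "Isomer" are not named here and are therefore omitted.

```python
# Minimal sketch of the record hierarchy in panel b), using only the level names
# given in the caption; intermediate levels between Species and Isomer are omitted.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Isomer:
    name: str                      # fully structurally resolved record

@dataclass
class Species:
    name: str                      # e.g. defined only by a precursor ion mass
    isomers: List[Isomer] = field(default_factory=list)

@dataclass
class SubClass:
    name: str                      # e.g. "Diacylglycerophosphocholines"
    species: List[Species] = field(default_factory=list)

@dataclass
class MainClass:
    name: str
    sub_classes: List[SubClass] = field(default_factory=list)

@dataclass
class Category:
    name: str
    main_classes: List[MainClass] = field(default_factory=list)
```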

    Representation of the content migration.

    The example shows a Reaction class reduced to its inputs, outputs, catalyst and regulators. A model class instance is converted to a graph database node, where (1) slots with primitive value types become node properties and (2) slots holding instances of another class become relationships.
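    The conversion rule can be sketched in a few lines of Python: primitive-valued slots are copied onto the node as properties, while instance-valued slots are turned into relationships, here emitted as Cypher statements. The example Reaction instance, its slot names and the generated relationship types are illustrative and do not reproduce the actual Reactome schema.

```python
# Hedged sketch of the conversion rule from the caption: (1) primitive slots
# become node properties, (2) instance-valued slots become relationships.
# The Reaction instance and relationship names are illustrative only.
def to_cypher(instance: dict, label: str, node_id: str):
    properties = {}
    relationships = []
    for slot, value in instance.items():
        values = value if isinstance(value, list) else [value]
        if all(isinstance(v, (str, int, float, bool)) for v in values):
            properties[slot] = value                 # (1) primitive slot -> property
        else:
            for v in values:                         # (2) instance slot -> relationship
                relationships.append((slot.upper(), v["id"]))
    stmts = [f"CREATE (n:{label} {{id: '{node_id}', "
             + ", ".join(f"{k}: {v!r}" for k, v in properties.items()) + "})"]
    for rel_type, target in relationships:
        stmts.append(
            f"MATCH (n:{label} {{id: '{node_id}'}}), (m {{id: '{target}'}}) "
            f"CREATE (n)-[:{rel_type}]->(m)")
    return stmts

reaction = {
    "displayName": "Example reaction",           # primitive -> property
    "input": [{"id": "entity-1"}],               # instance-valued -> relationship
    "output": [{"id": "entity-2"}],
    "catalystActivity": [{"id": "catalyst-1"}],
}
for stmt in to_cypher(reaction, "Reaction", "reaction-1"):
    print(stmt)
```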

    A schematic diagram of the new ecosystem.

    The relational database is converted to a graph database via the batch importer, which relies on the Domain Model. Spring Data Neo4j and AspectJ are the two main pillars of the graph-core, which also rests on the Domain Model. Users access services or use tools that rely directly on the graph-core as a library, which eliminates the boilerplate code for data retrieval and offers a data persistency mechanism. Finally, export tools take advantage of Cypher to generate flat mapping files.
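    To make the Cypher-based export step more concrete, the sketch below runs a single Cypher query through the Python Neo4j driver and writes the result as a flat, tab-delimited mapping file. It stands in for the project's Java graph-core export tools, and the node label and property names used (Pathway, stId, displayName) are assumptions about the schema rather than details taken from the figure.

```python
# Sketch of a Cypher-driven flat-mapping export, written against the Python
# Neo4j driver rather than the project's Java graph-core. The label and
# property names (Pathway, stId, displayName) are assumptions about the schema.
import csv
from neo4j import GraphDatabase

QUERY = """
MATCH (p:Pathway)
RETURN p.stId AS stable_id, p.displayName AS name
ORDER BY stable_id
"""

def export_mapping(uri: str, user: str, password: str, out_path: str):
    driver = GraphDatabase.driver(uri, auth=(user, password))
    with driver.session() as session, open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["stable_id", "name"])
        for record in session.run(QUERY):
            writer.writerow([record["stable_id"], record["name"]])
    driver.close()

if __name__ == "__main__":
    export_mapping("bolt://localhost:7687", "neo4j", "password", "pathway_mapping.tsv")
```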