9 research outputs found

Global Unique identification LINCS Digital Research Objects to Enable Citation, Reuse, and Persistence of LINCS Data.

Author: Amar Koleti (3354098)
Caty Chung (3352190)
Dušica Vidović (3354110)
John P Turner (4014359)
Raymond Terryn (3354104)
Stephan Schürer (3354080)

The ability to cite LINCS datasets is critical for users and data producers alike. Requirements for dataset citation records have been set forth by the Joint Declaration of Data Citation Principles (JDDCP) and include attribution, a unique identifier, data persistence, verification, and interoperability. Data citation is also an important facilitator of the FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific datasets. DOIs are well aligned with journal tracking citations, and the cost is justified within the business goal. However, LINCS datasets are complex and require granular identifiers of various LINCS Digital Research Objects (DROs), including a dataset at a specific data level, a dataset group (combining all data levels from one experiment), derived datasets (e.g. computationally reprocessed by a LINCS Center or an outside group), and also various LINCS metadata. Such identifiers are needed to describe provenance. Although DOIs have been used for datasets, we preferred an open and free solution. To accomplish that, we created collections of LINCS DROs in the MIRIAM Registry to generate unique, perennial, and location-independent identifiers. Such collections include data-level specific dataset packages, dataset groups, small molecules, and cells. The identifiers.org service, which is built upon the information stored in MIRIAM, provides directly resolvable identifiers in the form of Uniform Resource Locators (URLs). This system provides a globally unique identification scheme to which any external resource can point and a resolving system that gives the owner or creator of the resource collection flexibility to update the resolving URL without changing the global identifiers. These dataset and dataset group identifiers are the central component of the LINCS dataset citation record, which further includes the authors, title, year, repository, resource type, and version.
These citation records have been incorporated into the LINCS Data Portal and can be downloaded in several formats, making it easy to cite a specific LINCS dataset or a dataset group. The LINCS provenance model provides a record of creation, manipulation, and source of the dataset and metadata that are part of a LINCS dataset package. It will provide mappings of LINCS dataset packages to corresponding records in public repositories and at the data generation centers. Persistent global identifiers of LINCS DROs, formal dataset provenance, and mappings of key LINCS metadata to external qualified references (such as ontologies) are also required for the persistence of LINCS beyond the funded project and independent from the current LINCS Centers.
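
As a rough illustration of the citation record described above, the following Python sketch combines the listed fields (authors, title, year, repository, resource type, version) with a resolvable identifier. It assumes the identifiers.org URL pattern `https://identifiers.org/<prefix>:<accession>`; the class name, field names, prefix, and accession are hypothetical placeholders and not the actual LINCS values or export format.

```python
from dataclasses import dataclass

# Assumption: compact identifiers resolve as <resolver>/<prefix>:<accession>.
RESOLVER = "https://identifiers.org"


def resolvable_url(prefix: str, accession: str) -> str:
    """Build a location-independent URL from a collection prefix and a local ID."""
    return f"{RESOLVER}/{prefix}:{accession}"


@dataclass(frozen=True)
class CitationRecord:
    """Hypothetical container for the citation fields named in the abstract."""
    authors: tuple
    title: str
    year: int
    repository: str
    resource_type: str
    version: str
    prefix: str      # collection prefix registered with the resolver (placeholder)
    accession: str   # local identifier within that collection (placeholder)

    def to_text(self) -> str:
        """Render the record as a single plain-text citation line."""
        names = ", ".join(self.authors)
        url = resolvable_url(self.prefix, self.accession)
        return (f"{names} ({self.year}). {self.title}. {self.repository}. "
                f"{self.resource_type}, version {self.version}. {url}")
```

Because the citation stores only the prefix and accession, the collection owner could change where the identifier resolves without invalidating any citation already rendered this way.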

FigShare

FAIR LINCS Metadata Powered by CEDAR Cloud-Based Templates and Services

Author: Amar Koleti (3354098)
Caty Chung (3352190)
Daniel Cooper (3354113)
Debra Willrett (3352616)
John Graybeal (4011461)
Mark Musen (50329)
Martin O'Connor (4011464)
Stephan Schurer (684993)

The Library of Integrated Network-based Signatures (LINCS) program generates a wide variety of cell-based perturbation-response signatures using diverse assay technologies. For example, LINCS includes large-scale transcriptional profiling of genetic and small molecule perturbations, and various proteomics and imaging datasets. We currently obtain metadata through an online platform, the metadata submission tool (MST), based on spreadsheet data templates. While functional, this approach makes it difficult to maintain FAIR standards, specifically findability and reusability, for metadata without (enforced) controlled vocabulary and internally built linkages to ontologies and metadata standards. To maintain FAIR-centric metadata, we have worked with the Center for Expanded Data Annotation and Retrieval (CEDAR) to develop modular metadata templates linked to ontologies and standards present in the NCBO BioPortal. We have also developed a new LINCS Dataset Submission Tool (DST), which links new LINCS datasets to the form-fillable CEDAR templates. This metadata management framework supports authoring, curation, validation, management, and sharing of LINCS metadata, while building upon the existing LINCS metadata standards and data-release workflows. Additionally, the CEDAR technology facilitates metadata validation and testing, enabling users to ensure their input metadata are LINCS compliant prior to submission for public release. CEDAR templates have been developed for reagent metadata and experimental metadata, to describe assays, and to capture global dataset attributes. By integrating the submission of all these components into one submission tool and workflow, we aim to significantly simplify and streamline LINCS dataset submission, processing, validation, registration, and publication.
As other projects apply the same approach, many more datasets will become cross-searchable and can be linked, optimizing the metadata pathway from submission to discovery.
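
The idea of checking submitted metadata against a template with enforced controlled vocabulary can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the template layout, field names, and allowed terms below are invented for the example and do not reflect the actual CEDAR template model or the LINCS metadata standards.

```python
# Hypothetical template: field name -> set of allowed terms (None = free text).
TEMPLATE = {
    "assay_type": {"transcriptomics", "proteomics", "imaging"},
    "perturbation_type": {"small molecule", "genetic"},
    "description": None,
}


def validate(record: dict, template: dict) -> list:
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    # Every template field is treated as required in this sketch.
    for field, allowed in template.items():
        if field not in record or record[field] == "":
            problems.append(f"missing required field: {field}")
        elif allowed is not None and record[field] not in allowed:
            problems.append(f"{field}: '{record[field]}' is not a controlled term")
    # Fields the template does not know about are reported as well.
    for field in record:
        if field not in template:
            problems.append(f"unexpected field: {field}")
    return problems
```

Running such a check before submission is what lets a submitter correct non-compliant values early, rather than after public release.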

FigShare

LINCS Small Molecules Standardization and Annotation to Improve Data Integration, Analysis, and Modeling

Author: Amar Koleti (3354098)
Caty Chung (3352190)
Dušica Vidović (3354110)
John P Turner (4014359)
Raymond Terryn (3354104)
Stephan Schürer (3354080)
Tanya T. Kelley (2898401)
Vasileios Stathias (680557)

The physical properties of small molecules, in particular “drug-like” molecules, including their ability to interact with and modulate protein function, cell permeability, and metabolic stability, make them powerful tools to study biological systems. Large amounts of small molecule biological activity data are publicly available, and small molecules are systematically studied in the diverse profiling assays of the LINCS Consortium. Integrating LINCS data across the various assays and Centers, and with external bioactivity data, requires uniquely identifying each small molecule sample tested in an assay based on its unique “active” component. Typically, this is done based on the unique chemical structure. Although non-trivial, a unique, single-fragment representation of an organic small molecule can, in most cases, be generated after removing salt counter ions and addends, considering ionization states and tautomeric forms, and canonicalizing the chemical structure representation, e.g. as a canonical SMILES or InChI. We implemented the chemical structure standardization using chemical informatics tools. Exceptions include small numbers of metal-organic and multi-component compounds, which we handled by manual expert curation. However, a significant challenge in standardizing small molecules lies in the considerable variability of reported chemical structures for the same compound, depending on the source. Typical and frequent errors include inverted or removed stereochemical centers, relative vs absolute stereochemical configuration, E/Z geometric isomerism of alkenes or imines, loss of aromaticity, changes in oxidation states, and other problems. Further complexities can be introduced by different representations of compound mixtures. Public resources, such as PubChem, report many different chemical structures for the same compound, for example as identified by a common drug name.
The apparent lack of curation of small molecule chemical structures results in error propagation, for example incorrect chemical structures submitted to PubChem, which are then referenced and potentially added to another resource. Herein we present the chemical structure standardization and registration pipeline implemented for LINCS small molecules, including manual curation, automated steps, mappings to PubChem, naming, validation, and several QC and review steps. The standardization pipeline considers stereochemical representations, mixtures of stereoisomers, geometric isomers of carbon-carbon and carbon-hetero double bonds, regio-isomers, non-isomeric mixtures, ionization states, tautomeric forms, and salt forms or other addends. We illustrate typical errors and their propagation, a problem exacerbated by the lack of user-friendly tools to enable biologists to work with complex chemical information. In LINCS we work to disambiguate compound identity during the registration process using redundant information including chemical structures, drug names, vendor information, and provided cross references. Standardized LINCS small molecules are mapped to PubChem, ChEMBL, ChEBI, and via UniChem to many other resources. These mappings facilitate the curation and integration of diverse external annotations, such as biochemical target information. Compound standardization and mapping make it easy to integrate different LINCS signatures. The LINCS Small Molecule collection has been registered in the MIRIAM Registry and identifiers.org to provide a Persistent URL (PURL). The identifiers.org PURL for each LINCS small molecule redirects to the LINCS Data Portal, and the information is accessible via a RESTful API, in coordination with interoperable smartAPI.
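
One step mentioned above, removing salt counter ions to reach a single-fragment representation, can be illustrated with a deliberately crude Python sketch. It is not the LINCS pipeline: it simply keeps the largest dot-separated fragment of a SMILES string using a regular-expression atom count, and it does not neutralize charges, handle tautomers or stereochemistry, or canonicalize anything. A real implementation would rely on a cheminformatics toolkit; the function names here are hypothetical.

```python
import re

# Crude atom tokenizer: bracket atoms, two-letter halogens, then one-letter
# organic-subset atoms (upper case aliphatic, lower case aromatic).
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def atom_count(fragment: str) -> int:
    """Approximate number of atoms written explicitly in a SMILES fragment."""
    return len(ATOM.findall(fragment))


def largest_fragment(smiles: str) -> str:
    """Keep the '.'-separated fragment with the most atoms (ties: first wins)."""
    fragments = [f for f in smiles.split(".") if f]
    if not fragments:
        raise ValueError("empty SMILES string")
    return max(fragments, key=atom_count)
```

The metal-organic and multi-component exceptions noted in the abstract are exactly the cases where a "largest fragment" rule like this would give a wrong answer, which is why they call for manual curation.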

FigShare

LINCSAnalytics: An integrated platform for the efficient query and computation across diverse LINCS signatures

Author: Amar Koleti (3354098)
Caty Chung (3352190)
Dušica Vidović (3354110)
John Turner (3354095)
Michele Forlin (1300680)
Raymond Terryn (3354104)
Stephan Schürer (3354080)
Vasileios Stathias (680557)
ZongJun Hu (4015568)

The Library of Integrated Network-based Signatures (LINCS) program generates a wide variety of cell-based perturbation-response signatures using diverse assay technologies. A signature, defined as a specific cellular response to a given perturbation, can hence be expressed as a function of a set of parameters: the model system (typically a cell), the perturbation (e.g. small molecule), and the detected analytes (e.g. expressed in a transcriptional profiling assay), plus additional experimental details (such as concentration and time). In order to effectively use LINCS data for a wide variety of scientific use cases, signatures need to be readily queryable, retrievable, and accessible for computation as a function of all of these dimensions. Here we present a computational platform built on top of the open source Cloudera Hadoop platform, allowing the distributed storage and processing of large datasets through a number of dedicated modules. LINCS signature data and standardized entity metadata are stored in the Hadoop Distributed File System. Apache Hive and Impala are responsible for the fast query and retrieval of any data point, while computation and modeling are available through Apache Spark and its sparklyr R interface. Full accessibility to the core of the platform is achieved via a set of APIs, which also allow custom-made applications to be built and deployed. As an initial demonstration, we show a simple Shiny R application to interactively query and retrieve LINCS signatures for any dimension of interest. To enable the computational biology community to use LINCS data in their research via the LINCS Analytics platform, we deployed an R package that retrieves the available data and metadata for any dimension of interest. It also allows on-the-fly aggregation of replicates and filtering by desired output values.
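
The notion of a signature as a function of its dimensions, together with on-the-fly aggregation of replicates and filtering by output value, can be sketched in plain Python. This is an illustration only: the row layout, key names, and the choice of the mean as the aggregate are assumptions, and the platform itself is described as using Hive, Impala, Spark, and an R package rather than code like this.

```python
from collections import defaultdict
from statistics import mean

# Assumed dimensions that together identify one signature value.
DIMENSIONS = ("cell", "perturbation", "concentration", "time", "analyte")


def aggregate_replicates(rows, min_abs_value=0.0):
    """Average replicate measurements per dimension tuple, then filter.

    rows: iterable of dicts holding the DIMENSIONS keys plus a numeric "value".
    min_abs_value: keep only aggregates whose absolute value reaches this cutoff.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in DIMENSIONS)
        groups[key].append(row["value"])
    result = {}
    for key, values in groups.items():
        avg = mean(values)
        if abs(avg) >= min_abs_value:
            result[key] = avg
    return result
```

Keying on the full tuple of dimensions is what makes the data queryable "as a function of all of these dimensions"; dropping a dimension from the key would aggregate across it instead.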

FigShare

The LINCS Data Portal and FAIR LINCS Dataset Landing Pages

Author: Amar Koleti (3354098)
Avi Ma’ayan (581211)
Caroline Monteiro (3354107)
Caty Chung (298750)
Christopher Mader (3354101)
Dušica Vidović (3354110)
Mario Medvedovic (1779)
Michele Forlin (1300680)
Raymond Terryn (3354104)
Stephan Schürer (3354080)
Vasileios Stathias (680557)
Wen Niu (381056)

The LINCS Data Portal (LDP) presents a unified interface to access LINCS datasets and metadata with mappings to several external resources. LDP provides various options to explore, query, and download LINCS dataset packages and reagents that have been described using the LINCS metadata standards. We recently introduced LINCS Dataset Landing Pages to provide integrated access to important content for each LINCS dataset. The landing pages provide deep metadata for each LINCS dataset, including description of the assays, authors, data analysis pipelines, and standardized reagents such as small molecules, cell lines, antibodies, etc., with rich annotations. The landing pages are a key component to make LINCS data persistent and reusable, by integrating LINCS datasets, data processing pipelines, analytes, perturbations, model systems, and related concepts as uniquely identifiable digital research objects. LDP supports ontology-driven concept search, free text search, facet filtering, logical combination of filters (AND, OR), and list, table, and matrix views. LDP enables download of LINCS dataset packages, which consist of released datasets and associated metadata. LDP also provides several specialized apps, including ones for small molecule compounds and cell lines. A landing page facilitates interactive exploration of all LINCS datasets via several classifications. LDP is built on a robust API, is integrated with the MetaData Registry, and interfaces with other components of the Integrated Knowledge Environment (IKE) developed in our Center. All LINCS datasets are also indexed in bioCADDIE DataMed.
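
Facet filtering with AND/OR combination, as described above, can be illustrated with a small Python sketch. The dataset fields and function names are hypothetical, and this says nothing about how LDP or its API actually implements filtering; it only shows the logic of combining per-facet matches.

```python
def matches(dataset: dict, facets: dict, mode: str = "AND") -> bool:
    """Check one dataset against facets given as {field: set_of_allowed_values}."""
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    if not facets:
        return True  # no facet selected: nothing is filtered out
    checks = (dataset.get(field) in allowed for field, allowed in facets.items())
    # AND requires every selected facet to match, OR requires at least one.
    return all(checks) if mode == "AND" else any(checks)


def filter_datasets(datasets, facets, mode="AND"):
    """Return the datasets that satisfy the selected facets under the given mode."""
    return [d for d in datasets if matches(d, facets, mode)]
```

Within a single facet, listing several allowed values already behaves like an OR, so the `mode` argument only controls how different facets are combined.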

FigShare

FAIR Dataset Landing Pages, Digital Research Objects, and Software Tools for LINCS and BD2K

Author: Amar Koleti (3354098)
Avi Ma’ayan (581211)
Caroline Monteiro (3354107)
Caty Chung (3352190)
Christopher Mader (3354101)
Dušica Vidović (3354110)
Edward He (3354122)
Mario Medvedovic (1779)
Michele Forlin (1300680)
Raymond Terryn (3354104)
Sherry Jenkins (3354125)
Stephan Schürer (3354080)
Vasileios Stathias (680557)
Wen Niu (381056)

The Library of Integrated Network-based Signatures (LINCS, http://lincsproject.org/) program generates a wide variety of cell-based perturbation-response signatures using diverse assay technologies. For example, LINCS includes large-scale transcriptional profiling of genetic and small molecule perturbations, and various proteomics and imaging datasets. The BD2K LINCS Data Coordination and Integration Center (DCIC) has been developing a collection of tools, including data standards specifications, data processing pipelines and infrastructure, a metadata registration system, and a diverse suite of end-user software tools. Together they support and implement an end-to-end solution, from submission of LINCS datasets by the Data and Signature Generation Centers (DSGCs), to dataset publication via a Data Portal, followed by integrated data analytics enabled by easy-to-use web-based tools. We will give an overview of LINCS tools with an emphasis on our long-term goal of persistent and FAIR (findable, accessible, interoperable, reusable) LINCS resources by connecting signatures, data processing pipelines, analytes, perturbagens, model systems and related concepts, and analysis software tools via uniquely identifiable digital research objects. All LINCS datasets are already indexed in bioCADDIE DataMed. In another example of BD2K and LINCS collaboration, we are working with the CEDAR Metadata Center to develop a LINCS Community Metadata Framework for end-to-end metadata management supporting authoring, curation, validation, management, and sharing of LINCS metadata. Shared metadata facilitated via reusable, modular, and user-friendly CEDAR templates provide the prospect of cross-searchable, linkable datasets connecting many different data generation programs. In addition to building an advanced integrated knowledge environment, our Center supports several internal and external data science research projects, and we have an active outreach and training program.
Our software and data analytics resources, data science projects, and training programs are available at http://bd2k-lincs.org/

FigShare

LINCS Data Portal: Unified interface to search, query, explore, analyze, and download LINCS data

Author: Afoma C Umeano (4015520)
Amar Koleti (3354098)
Avi Ma’ayan (581211)
Caty Chung (3352190)
Christopher Mader (3354101)
Daniel Cooper (3354113)
Dušica Vidović (3354110)
John P Turner (4014359)
Mario Medvedovic (1779)
Michele Forlin (1300680)
Raymond Terryn (3354104)
Stephan Schürer (3354080)
Tanya Kelley (4015523)
Vasileios Stathias (680557)

The Library of Integrated Network-based Signatures (LINCS, http://lincsproject.org/) program generates a wide variety of cell-based perturbation-response signatures using diverse assay technologies. For example, LINCS includes large-scale transcriptional profiling of genetic and small molecule perturbations, and various proteomics and imaging datasets. The BD2K LINCS Data Coordination and Integration Center (DCIC) has been developing a collection of tools, including data standards specifications, data processing pipelines and infrastructure, a metadata registration system, and a diverse suite of end-user software tools. Together they support and implement an end-to-end solution, from submission of LINCS datasets by the Data and Signature Generation Centers (DSGCs), to dataset publication via a Data Portal, followed by integrated data analytics enabled by easy-to-use web-based tools. The LINCS Data Portal (LDP) presents a unified interface to access LINCS datasets and metadata with mappings to many external resources. LDP provides various options to explore, query, and download LINCS dataset packages and reagents that have been described using the LINCS metadata standards. The LINCS Data Portal dataset landing pages provide deep metadata for each LINCS dataset, including description of the assays, authors, data analysis pipelines, and standardized reagents such as small molecules, cell lines, antibodies, etc., with rich annotations. The landing pages are a key component to make LINCS data persistent and reusable, by integrating LINCS datasets, data processing pipelines, analytes, perturbations, model systems, and related concepts as uniquely identifiable digital research objects.
Dataset citation is now integrated into LDP. We recently added new curated content and search functionality for small molecules and many new features to further improve the utility of LDP. LDP is built on a robust API, is integrated with the MetaData Registry, and interfaces with other components of the Integrated Knowledge Environment (IKE) developed in our Center. All LINCS datasets are also indexed in bioCADDIE DataMed and omicsDI.

FigShare

FAIR LINCS Data and Metadata powered by the CEDAR Framework

Author: Amar Koleti (3354098)
Avi Ma’ayan (581211)
Caroline Monteiro (3354107)
Caty Chung (3352190)
Christopher Mader (3354101)
Csongor I. Nyulas (3352646)
Daniel Cooper (3354113)
Dusica Vidovic (684990)
John Graybeal (3352631)
Mario Medvedovic (1779)
Mark A. Musen (107442)
Martin J. O'Connor (3352619)
Michele Forlin (1300680)
Rafael S. Gonçalves (3354116)
Stephan Schürer (3354080)
Vance Lemmon (515697)
Vasileios Stathias (680557)
Wen Niu (381056)
Publication date: 23/11/2016

The Library of Integrated Network-based Signatures (LINCS) program generates a wide variety of cell-based perturbation-response signatures using diverse assay technologies. For example, LINCS includes large-scale transcriptional profiling of genetic and small molecule perturbations, and various proteomics and imaging datasets. We have developed data processing pipelines and supporting informatics infrastructure to access, standardize and harmonize, register, and publish LINCS datasets and metadata from all Data and Signature Generating Centers (DSGCs). Metadata standards specifications provide a foundation for harmonizing and integrating LINCS data. Here we introduce a CEDAR-based LINCS Community Metadata Environment, an end-to-end metadata management framework that supports authoring, curation, validation, management, and sharing of LINCS metadata, while building upon the existing LINCS metadata standards and data-release workflows. Following this initial validation, our goal is to create reusable metadata modules with user-friendly templates for each of the LINCS metadata categories and to make our suite of tools compatible with the CEDAR metadata technologies. This should further simplify metadata handling in the LINCS consortium and facilitate a global metadata repository at CEDAR. As other projects apply the same approach, many more datasets will become cross-searchable and can be linked, optimizing the metadata pathway from submission to discovery.

University of Miami: Scholarship Miami

FigShare

Additional file 1: Table S1. of Drug target ontology to classify and integrate drug discovery data

SPARQL query results to identify kinase domains in the KINOMEscan assay with gatekeeper annotations. Shown are TDL classification, DTO ID, kinase domain description, and protein name. (XLSX 28 kb)

FigShare