16 research outputs found

    Semantic web data warehousing for caGrid

    The National Cancer Institute (NCI) is developing caGrid as a means for sharing cancer-related data and services. As more data sets become available on caGrid, we need effective ways of accessing and integrating this information. Although the data models exposed on caGrid are semantically well annotated, it is currently up to the caGrid client to infer relationships between the different models and their classes. In this paper, we present a Semantic Web-based data warehouse (Corvus) for creating relationships among caGrid models. This is accomplished through the transformation of semantically annotated caBIG® Unified Modeling Language (UML) information models into Web Ontology Language (OWL) ontologies that preserve those semantics. We demonstrate the validity of the approach by Semantic Extraction, Transformation and Loading (SETL) of data from two caGrid data sources, caTissue and caArray, as well as alignment and query of those sources in Corvus. We argue that semantic integration is necessary for integration of data from distributed web services and that Corvus is a useful way of accomplishing this. Our approach is generalizable and of broad utility to researchers facing similar integration challenges.

    Doctor of Philosophy

    Public health surveillance systems are crucial for the timely detection of and response to public health threats. Since the terrorist attacks of September 11, 2001, and the release of anthrax in the following month, there has been a heightened interest in public health surveillance. The years immediately following these attacks brought increased awareness and funding from the federal government, which has significantly strengthened United States surveillance capabilities; however, despite these improvements, today's public health surveillance systems face substantial challenges. Problems with the current surveillance systems include: a) failure to leverage unstructured public health data for surveillance purposes; and b) lack of information integration and of the ability to leverage resources, applications, or other surveillance efforts, because the systems are built on a centralized model. This research addresses these problems by developing and evaluating new informatics methods to improve public health surveillance. To address the problems above, we first identified a current public surveillance workflow that is affected by the problems described and can be enhanced through current informatics techniques. The 122 Mortality Surveillance for Pneumonia and Influenza was chosen as the primary use case for this dissertation work. The second step involved demonstrating the feasibility of using unstructured public health data, in this case death certificates. For this we created and evaluated a pipeline, composed of a detection rule and a natural language processor, for the coding of death certificates and the identification of pneumonia and influenza cases. The second problem was addressed by presenting the rationale for creating a federated model that leverages grid technology concepts and tools for the sharing and epidemiological analysis of public health data. As a case study of this approach, a secured virtual organization was created in which users can access two grid data services, using death certificates from the Utah Department of Health, and two analytical grid services, MetaMap and R. A scientific workflow was created using the published services to replicate the mortality surveillance workflow. To validate these approaches and provide proofs of concept, a series of real-world scenarios was conducted.

    Developing a semantically rich ontology for the biobank-administration domain


    A semantic web framework to integrate cancer omics data with biological knowledge

    BACKGROUND: The RDF triple provides a simple linguistic means of describing limitless types of information. Triples can be flexibly combined into a unified data source we call a semantic model. Semantic models open new possibilities for the integration of variegated biological data. We use Semantic Web technology to explicate high throughput clinical data in the context of fundamental biological knowledge. We have extended Corvus, a data warehouse which provides a uniform interface to various forms of Omics data, by providing a SPARQL endpoint. With the querying and reasoning tools made possible by the Semantic Web, we were able to explore quantitative semantic models retrieved from Corvus in the light of systematic biological knowledge. RESULTS: For this paper, we merged semantic models containing genomic, transcriptomic and epigenomic data from melanoma samples with two semantic models of functional data - one containing Gene Ontology (GO) data, the other, regulatory networks constructed from transcription factor binding information. These two semantic models were created in an ad hoc manner but support a common interface for integration with the quantitative semantic models. Such combined semantic models allow us to pose significant translational medicine questions. Here, we study the interplay between a cell's molecular state and its response to anti-cancer therapy by exploring the resistance of cancer cells to Decitabine, a demethylating agent. CONCLUSIONS: We were able to generate a testable hypothesis to explain how Decitabine fights cancer - namely, that it targets apoptosis-related gene promoters predominantly in Decitabine-sensitive cell lines, thus conveying its cytotoxic effect by activating the apoptosis pathway. Our research provides a framework whereby similar hypotheses can be developed easily

    IGRhCellID: integrated genomic resources of human cell lines for identification

    Cell line identification is emerging as an essential method for every cell line user in the research community to avoid using misidentified cell lines for experiments and publications. IGRhCellID (http://igrcid.ibms.sinica.edu.tw) is designed to integrate eight cell identification methods: seven methods (STR profile, gender, immunotypes, karyotype, isoenzyme profile, TP53 mutation and mutations of cancer genes) available in various public databases, plus our method of profiling genome alterations of human cell lines. With data validation of 11 small deleted genes in human cancer cell lines, profiles of genomic alterations further allow users to search for human cell lines with a deleted gene to serve as an indigenous knock-out cell model (such as SMAD4 in gene view), with an amplified gene to serve as cell models for testing therapeutic efficacy (such as ERBB2 in gene view), and with overlapping aberrant chromosomal loci for revealing common cancer genes (such as 9p21.3 homozygous deletion with co-deleted CDKN2A, CDKN2B and MTAP in chromosome view). IGRhCellID provides not only the available methods for cell identification, to help eradicate concerns about using misidentified cells, but also designated genetic features of human cell lines for experiments.

    Data integration in biomedical research networks based on service-oriented architectures (Datenintegration in biomedizinischen Forschungsverbünden auf Basis von serviceorientierten Architekturen)

    Biomedical research networks need to share research data within the network and beyond. To this end, a requirements model is first created, then consolidated and abstracted. The result is a reference model for requirements that can serve other research networks as a basis for building their own SOA system more quickly. In addition to the reference model, a concrete instance is described as the requirements model for the Sonderforschungsbereich/Transregio 77 (Collaborative Research Centre/Transregio 77) "Liver cancer: from molecular pathogenesis to targeted therapy", funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation). From the requirements model, an IT architecture model for the network is derived, consisting of a component model, a deployment model, and the security architecture. The architecture is implemented using the Cancer Biomedical Informatics Grid (caBIG). The data produced in the projects are converted into data services and thereby made accessible within an SOA. The data services largely satisfy the projects' requirement to retain control over their own data: the services can be given individual access permissions and operated in a decentralized manner, if necessary within the projects' own area of responsibility. The system is accessed through a web browser, with which network staff log in to a central portal using individual credentials. For simple and secure exchange of documents within the network, a document management system is integrated into the SOA. To integrate research data from different sources at the semantic level as well, metadata systems are developed. For this purpose, a controlled vocabulary is created, derived from the terminologies used by the projects by means of a method developed for this task. The collected terms are matched against standardized vocabularies from the Unified Medical Language System (UMLS), and a software tool is built to support this matching. Furthermore, this work found that no ontology exists that comprehensively represents the cell lines frequently used in biomedical research, including their growth conditions. Therefore, a new ontology for cell lines, the Cell Culture Ontology (CCONT), is developed, with emphasis on integrating established ontologies in this area as far as possible. This work thus describes a complete SOA-based IT architecture for exchanging and integrating research data within research networks. The reference model for requirements, the IT architecture, and the metadata specifications are available to other research networks and beyond as a basis for their own developments. The same applies to the software tools developed for UMLS matching of vocabularies and for automated model generation for caBIG data services.

    Doctor of Philosophy

    Over 40 years ago, the first computer simulation of a protein was reported: the atomic motions of a 58 amino acid protein were simulated for a few picoseconds. With today's supercomputers, simulations of large biomolecular systems with hundreds of thousands of atoms can reach biologically significant timescales. Through dynamics information, biomolecular simulations can provide new insights into molecular structure and function to support the development of new drugs or therapies. While recent advances in high-performance computing hardware and computational methods have enabled scientists to run longer simulations, they have also created new challenges for data management. Investigators need to use local and national resources to run these simulations and store their output, which can reach terabytes of data on disk. Because of the wide variety of computational methods and software packages available to the community, no standard data representation has been established to describe the computational protocol and the output of these simulations, which hinders data sharing and collaboration. Data exchange is also limited by the lack of repositories and tools to summarize, index, and search biomolecular simulation datasets. In this dissertation a common data model for biomolecular simulations is proposed to guide the design of future databases and APIs. The data model was then extended to a controlled vocabulary that can be used in the context of the semantic web. Two different approaches to data management are also proposed. The iBIOMES repository offers a distributed environment where input and output files are indexed via common data elements. The repository includes a dynamic web interface to summarize, visualize, search, and download published data. A simpler tool, iBIOMES Lite, was developed to generate summaries of datasets hosted at remote sites where user privileges and/or IT resources might be limited. These two informatics-based approaches to data management offer new means for the community to keep track of distributed and heterogeneous biomolecular simulation data and create collaborative networks.

    Nanoinformatics: a new area of research in nanomedicine

    Over a decade ago, nanotechnologists began research on applications of nanomaterials for medicine. This research has revealed a wide range of different challenges, as well as many opportunities. Some of these challenges are strongly related to informatics issues, dealing, for instance, with the management and integration of heterogeneous information, defining nomenclatures, taxonomies and classifications for various types of nanomaterials, and research on new modeling and simulation techniques for nanoparticles. Nanoinformatics has recently emerged in the USA and Europe to address these issues. In this paper, we present a review of nanoinformatics, describing its origins, the problems it addresses, areas of interest, and examples of current research initiatives and informatics resources. We suggest that nanoinformatics could accelerate research and development in nanomedicine, as has occurred in the past in other fields. For instance, biomedical informatics served as a fundamental catalyst for the Human Genome Project, and other genomic and -omics projects, as well as the translational efforts that link resulting molecular-level research to clinical problems and findings.