15 research outputs found

    A Linked Data Approach to Sharing Workflows and Workflow Results

    No full text
    A bioinformatics analysis pipeline is often highly elaborate, due to the inherent complexity of biological systems and the variety and size of datasets. A digital equivalent of the ‘Materials and Methods’ section in wet laboratory publications would be highly beneficial to bioinformatics, for evaluating evidence and examining data across related experiments, while introducing the potential to find associated resources and integrate them as data and services. We present initial steps towards preserving bioinformatics ‘materials and methods’ by exploiting the workflow paradigm for capturing the design of a data analysis pipeline, and RDF to link the workflow, its component services, run-time provenance, and a personalized biological interpretation of the results. An example shows the reproduction of the unique graph of an analysis procedure, its results, provenance, and personal interpretation of a text mining experiment. It links data from Taverna, myExperiment.org, BioCatalogue.org, and ConceptWiki.org. The approach is relatively ‘light-weight’ and unobtrusive to bioinformatics users

    Semantic Web for data harmonization in Chinese medicine

    Get PDF
    Scientific studies to investigate Chinese medicine with Western medicine have been generating a large amount of data to be shared preferably under a global data standard. This article provides an overview of Semantic Web and identifies some representative Semantic Web applications in Chinese medicine. Semantic Web is proposed as a standard for representing Chinese medicine data and facilitating their integration with Western medicine data

    FlyTED: the Drosophila Testis Gene Expression Database

    Get PDF
    FlyTED, the Drosophila Testis Gene Expression Database, is a biological research database for gene expression images from the testis of the fruit fly Drosophila melanogaster. It currently contains 2762 mRNA in situ hybridization images and ancillary metadata revealing the patterns of gene expression of 817 Drosophila genes in testes of wild type flies and of seven meiotic arrest mutant strains in which spermatogenesis is defective. This database has been built by adapting a widely used digital library repository software system, EPrints (http://eprints.org/software/), and provides both web-based search and browse interfaces, and programmatic access via an SQL dump, OAI-PMH and SPARQL. FlyTED is available at http://www.fly-ted.org/

    Literature classification for semi-automated updating of biological knowledgebases

    Get PDF
    BACKGROUND: As the output of biological assays increase in resolution and volume, the body of specialized biological data, such as functional annotations of gene and protein sequences, enables extraction of higher-level knowledge needed for practical application in bioinformatics. Whereas common types of biological data, such as sequence data, are extensively stored in biological databases, functional annotations, such as immunological epitopes, are found primarily in semi-structured formats or free text embedded in primary scientific literature. RESULTS: We defined and applied a machine learning approach for literature classification to support updating of TANTIGEN, a knowledgebase of tumor T-cell antigens. Abstracts from PubMed were downloaded and classified as either "relevant" or "irrelevant" for database update. Training and five-fold cross-validation of a k-NN classifier on 310 abstracts yielded classification accuracy of 0.95, thus showing significant value in support of data extraction from the literature. CONCLUSION: We here propose a conceptual framework for semi-automated extraction of epitope data embedded in scientific literature using principles from text mining and machine learning. The addition of such data will aid in the transition of biological databases to knowledgebases

    Open Data, Grey Data, and Stewardship: Universities at the Privacy Frontier

    Full text link
    As universities recognize the inherent value in the data they collect and hold, they encounter unforeseen challenges in stewarding those data in ways that balance accountability, transparency, and protection of privacy, academic freedom, and intellectual property. Two parallel developments in academic data collection are converging: (1) open access requirements, whereby researchers must provide access to their data as a condition of obtaining grant funding or publishing results in journals; and (2) the vast accumulation of 'grey data' about individuals in their daily activities of research, teaching, learning, services, and administration. The boundaries between research and grey data are blurring, making it more difficult to assess the risks and responsibilities associated with any data collection. Many sets of data, both research and grey, fall outside privacy regulations such as HIPAA, FERPA, and PII. Universities are exploiting these data for research, learning analytics, faculty evaluation, strategic decisions, and other sensitive matters. Commercial entities are besieging universities with requests for access to data or for partnerships to mine them. The privacy frontier facing research universities spans open access practices, uses and misuses of data, public records requests, cyber risk, and curating data for privacy protection. This paper explores the competing values inherent in data stewardship and makes recommendations for practice, drawing on the pioneering work of the University of California in privacy and information security, data governance, and cyber risk.Comment: Final published version, Sept 30, 201

    Linking Justifications in the Collaborative Semantic Web Applications

    Get PDF
    International audienceCollaborative Semantic Web applications produce ever changing interlinked Semantic Web data. Applications that utilize these data to obtain their results should provide explanations about how the results are obtained in order to ensure the effectiveness and increase the user acceptance of these applications. Justifications providing meta information about why a conclusion has been reached enable generation of such explanations. We present an encoding approach for justifications in a distributed environment focusing on the collaborative platforms. We discuss the usefulness of linking justifications across the Web. We introduce a vocabulary for encoding justifications in a distributed environment and provide examples of our encoding approach

    Security Aspects in Web of Data Based on Trust Principles. A brief of Literature Review

    Get PDF
    Within scientific community, there is a certain consensus to define "Big Data" as a global set, through a complex integration that embraces several dimensions from using of research data, Open Data, Linked Data, Social Network Data, etc. These data are scattered in different sources, which suppose a mix that respond to diverse philosophies, great diversity of structures, different denominations, etc. Its management faces great technological and methodological challenges: The discovery and selection of data, its extraction and final processing, preservation, visualization, access possibility, greater or lesser structuring, between other aspects, that allow showing a huge domain of study at the level of analysis and implementation in different knowledge domains. However, given the data availability and its possible opening: What problems do the data opening face? This paper shows a literature review about these security aspects

    Security Aspects in Web of Data Based on Trust Principles. A brief of Literature Review

    Get PDF
    Within scientific community, there is a certain consensus to define "Big Data" as a global set, through a complex integration that embraces several dimensions from using of research data, Open Data, Linked Data, Social Network Data, etc. These data are scattered in different sources, which suppose a mix that respond to diverse philosophies, great diversity of structures, different denominations, etc. Its management faces great technological and methodological challenges: The discovery and selection of data, its extraction and final processing, preservation, visualization, access possibility, greater or lesser structuring, between other aspects, which allow showing a huge domain of study at the level of analysis and implementation in different knowledge domains. However, given the data availability and its possible opening: What problems do the data opening face? This paper shows a literature review about these security aspects
    corecore