    Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review

    Since the Simple Knowledge Organization System (SKOS) specification and its SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a significant number of conventional knowledge organization systems (KOS) (including thesauri, classification schemes, name authorities, and lists of codes and terms, produced before the arrival of the ontology-wave) have made their journeys to join the Semantic Web mainstream. This paper uses "LOD KOS" as an umbrella term to refer to all of the value vocabularies and lightweight ontologies within the Semantic Web framework. The paper provides an overview of what the LOD KOS movement has brought to various communities and users. These are not limited to the colonies of the value vocabulary constructors and providers, nor the catalogers and indexers who have a long history of applying the vocabularies to their products. The LOD dataset producers and LOD service providers, the information architects and interface designers, and researchers in sciences and humanities are also direct beneficiaries of LOD KOS. The paper examines a set of collected cases (experimental or in real applications) and aims to find the usages of LOD KOS in order to share the practices and ideas among communities and users. Through the viewpoints of a number of different user groups, the functions of LOD KOS are examined from multiple dimensions. This paper focuses on the LOD dataset producers, vocabulary producers, and researchers (as end-users of KOS). Comment: 31 pages, 12 figures, accepted paper in International Journal on Digital Libraries.

    Historical Grassland Turboveg Database Project. 2067 Relevés recorded by Dr Austin O’Sullivan 1962 – 1982

    User Guide and CD of Database are available. End of project report. The more common grassland types occupy about 70% of the Irish landscape (O’Sullivan, 1982), but information on these vegetation types is rare. Generally, Irish grasslands are distinguished based on the intensity of their management (improved or semi-natural grasslands), and the drainage conditions and acidity of the soil (dry or wet, calcareous or acidic grassland types) (Fossitt, 2000). However, little is known about their floristic composition and the changes in floristic composition over time. The current knowledge on grassland vegetation is mostly based on a survey of Irish grasslands by Dr. Austin O’Sullivan completed in the 1960s and 1970s (O’Sullivan, 1982). In this survey, O’Sullivan identified Irish grassland types in accordance with the classification of continental European grasslands based on the principles of the School of Phytosociology. O’Sullivan distinguished five main grassland types, introducing agricultural criteria as well as floristic criteria into grassland classification (O’Sullivan, 1982). In 1978, O’Sullivan made an attempt at mapping Ireland’s vegetation types, including the five grassland types distinguished in his later publication as well as two types of peatland vegetation (Figures 1 and 2). This map was completed using 1960s soils maps (National Soil Survey, Teagasc, Johnstown Castle) and a subsample of the dataset on the composition of Irish grasslands. Phytosociological classification of vegetation is based on the full floristic composition of the vegetation as determined by assessing the abundance and spatial structure of the plant species in a given area. The actual area of the survey (or relevé) is determined according to strict criteria, which include how representative the sample area is for the wider vegetation (i.e. how many of the species found in the wider area are also present in the survey area). National Parks and Wildlife Service of the Department of the Environment, Heritage and Local Government, Dublin, Ireland.

    Random Forest as a tumour genetic marker extractor

    Identifying tumour genetic markers is an essential task for biomedicine. In this thesis, we analyse a dataset of chromosomal rearrangements of cancer samples and present a methodology for extracting genetic markers from this dataset by using a Random Forest as a feature selection tool.

    Improving the value of public RNA-seq expression data by phenotype prediction.

    Publicly available genomic data are a valuable resource for studying normal human variation and disease, but these data are often not well labeled or annotated. The lack of phenotype information for public genomic data severely limits their utility for addressing targeted biological questions. We develop an in silico phenotyping approach for predicting critical missing annotation directly from genomic measurements using well-annotated genomic and phenotypic data produced by consortia like TCGA and GTEx as training data. We apply in silico phenotyping to a set of 70,000 RNA-seq samples we recently processed on a common pipeline as part of the recount2 project. We use gene expression data to build and evaluate predictors for both biological phenotypes (sex, tissue, sample source) and experimental conditions (sequencing strategy). We demonstrate how these predictions can be used to study cross-sample properties of public genomic data, select genomic projects with specific characteristics, and perform downstream analyses using predicted phenotypes. The methods to perform phenotype prediction are available in the phenopredict R package and the predictions for recount2 are available from the recount R package. With data and phenotype information available for 70,000 human samples, expression data is available for use on a scale that was not previously feasible.

    A Linked Data Approach to Sharing Workflows and Workflow Results

    A bioinformatics analysis pipeline is often highly elaborate, due to the inherent complexity of biological systems and the variety and size of datasets. A digital equivalent of the ‘Materials and Methods’ section in wet laboratory publications would be highly beneficial to bioinformatics, for evaluating evidence and examining data across related experiments, while introducing the potential to find associated resources and integrate them as data and services. We present initial steps towards preserving bioinformatics ‘materials and methods’ by exploiting the workflow paradigm for capturing the design of a data analysis pipeline, and RDF to link the workflow, its component services, run-time provenance, and a personalized biological interpretation of the results. An example shows the reproduction of the unique graph of an analysis procedure, its results, provenance, and personal interpretation of a text mining experiment. It links data from Taverna, myExperiment.org, BioCatalogue.org, and ConceptWiki.org. The approach is relatively ‘light-weight’ and unobtrusive to bioinformatics users.

    A golden age for working with public proteomics data

    Data sharing in mass spectrometry (MS)-based proteomics is becoming a common scientific practice, as is already the case in other, more mature 'omics' disciplines like genomics and transcriptomics. We want to highlight that this situation, unprecedented in the field, opens a plethora of opportunities for data scientists. First, we explain in some detail some of the work already achieved, such as systematic reanalysis efforts. We also explain existing applications of public proteomics data, such as proteogenomics and the creation of spectral libraries and spectral archives. Finally, we discuss the main existing challenges and mention the first attempts to combine public proteomics data with other types of omics data sets.