Extracting discourse elements and annotating scientific documents using the SciAnnotDoc model: a use case in gender documents
When scientists are searching for information, they generally have a precise objective in mind. Instead of looking for documents “about a topic T”, they try to answer specific questions such as finding the definition of a concept, finding results for a particular problem, checking whether an idea has already been tested, or comparing the scientific conclusions of two articles. Answering these precise or complex queries on a corpus of scientific documents requires precise modelling of the full content of the documents. In particular, each document element must be characterised by its discourse type (hypothesis, definition, result, method, etc.). In this paper we present a scientific document model (SciAnnotDoc ontology), developed from an empirical study conducted with scientists, that models the discourse types. We developed an automated process that analyses documents, effectively identifying the discourse types of each element. Using syntactic rules (patterns), we evaluated the process output in terms of precision and recall using a previously annotated corpus in Gender Studies. We chose to annotate documents in Humanities, as these documents are well known to be less formalised than those in “hard science”. The process output has been used to create a SciAnnotDoc representation of the corpus on top of which we built a faceted search interface. Experiments with users show that searches using this interface clearly outperform standard keyword searches for precise or complex queries.
Establishing a distributed system for the simple representation and integration of diverse scientific assertions
Background: Information technology has the potential to increase the pace of scientific progress by helping researchers in formulating, publishing and finding information. There are numerous projects that employ ontologies and Semantic Web technologies towards this goal. However, the number of applications that have found widespread use among biomedical researchers is still surprisingly small. In this paper we present the aTag (“associative tags”) convention, which aims to drastically lower the entry barriers to the biomedical Semantic Web. aTags are short snippets of HTML+RDFa with embedded RDF/OWL based on the Semantically Interlinked Online Communities (SIOC) vocabulary and domain ontologies and taxonomies, such as the Open Biomedical Ontologies and DBpedia. The structure of aTags is very simple: a short piece of human-readable text that is “tagged” with relevant ontological entities. This paper describes our efforts for seeding the creation of a viable ecosystem of datasets, tools and services around aTags.
Results: Numerous biomedical datasets in aTag format and systems for the creation of aTags have been set up and are described in this paper. Prototypes of some of these systems are accessible at http://hcls.deri.org/atag
Conclusions: The aTags convention enables the rapid development of diverse, integrated datasets and semantically interoperable applications. More work needs to be done to study the practicability of this approach in different use-case scenarios, and to encourage uptake of the convention by other groups.
Interoperability and FAIRness through a novel combination of Web technologies
Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.
Economies of space and the school geography curriculum
This paper is about the images of economic space that are found in school curricula. It suggests the importance for educators of evaluating these representations in terms of the messages they contain about how social processes operate. The paper uses school geography texts in Britain since the 1970s to illustrate the different ways in which economic space has been represented to students, before exploring some alternative resources that could be used to provide a wider range of representations of economic space. The paper highlights the continued importance of understanding the politics of school knowledge.
The classification of gene products in the molecular biology domain: Realism, objectivity, and the limitations of the Gene Ontology
Background: Controlled vocabularies in the molecular biology domain exist to facilitate data integration across database resources. One such tool is the Gene Ontology (GO), a classification designed to act as a universal index for gene products from any species. The Gene Ontology is used extensively in annotating gene products and analysing gene expression data, yet very little research exists from a library and information science perspective exploring the design principles, philosophy and social role of ontologies in biology.
Aim: To explore how molecular biologists, in creating the Gene Ontology, devised guidelines and rules for determining which scientific concepts are included in the ontology, and the criteria for how these concepts are represented.
Methods: A domain analysis approach was used to devise a mixed methodology to study the design of the Gene Ontology. Concept analysis of a GO term and a critical discourse analysis of GO developer mailing list texts were used to test whether ontological realism is a tenable basis for constructing objective ontologies. A comparison of the current GO vocabulary construction guidelines and a study of the reasons why GO terms are removed from the ontology further explored the justifications for the design of the Gene Ontology. Finally, a content analysis of published GO papers examined how authors use and cite GO data and terminology.
Results: Gene Ontology terms can be presented according to different epistemologies for concepts, indicating that ontological realism is not the only way objective ontologies can be designed. Social roles and the exercise of power were found to play an important role in determining ontology content, and poor synonym control, a lack of clear warrant for deciding terminology and arbitrary decisions to delete and invent new terms undermine the objectivity and universal applicability of the Gene Ontology. Authors exhibited poor compliance with GO data citation policies, and in re-wording and misquoting GO terminology, risk exacerbating the semantic problems this controlled vocabulary was designed to solve.
Conclusions: The failure of the Gene Ontology to define what is meant by a molecular function, the exercise of power by GO developers in clearing contentious concepts from the ontology, and the strict adherence to ontological realism, which marginalises social and subjective ways of classifying scientific concepts, limit the utility of the ontology as a tool to unify the molecular biology domain. These limitations to the Gene Ontology design could be overcome with the development of lighter, pluralistic, user-controlled “open ontologies” for gene products that can work alongside more traditional, “top-down” developed vocabularies.