36 research outputs found

    The Librarian & the Big Data: Bridging the Gap

    Arcot Rajasekar, PhD, is Professor in the School of Information and Library Science, Chief Scientist for the Renaissance Computing Institute, and Co-Director of the Data Intensive Cyber Environments Center, all at the University of North Carolina at Chapel Hill. He spoke on how the library and information science community can meet the challenges of the scientific data explosion.

    Semantics of Horn and disjunctive logic programs

    Abstract: Van Emden and Kowalski proposed a fixpoint semantics based on model theory and an operational semantics based on proof theory for Horn logic programs. They proved the equivalence of these semantics using fixpoint techniques. The main goal of this paper is to present a unified theory for the semantics of Horn and disjunctive logic programs. For this, we extend the fixpoint semantics and the operational (procedural) semantics to the class of disjunctive logic programs and prove their equivalence using techniques similar to the ones used for Horn programs.
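    The equivalence described above can be illustrated for the simplest case. The following is a minimal sketch, assuming ground (propositional) Horn clauses only; it does not cover the disjunctive extension that the paper develops, and the names `least_fixpoint`, `provable`, and `RULES` are hypothetical, chosen for illustration. The first function iterates the immediate-consequence operator bottom-up until nothing new is derived; the second proves a goal top-down by rule resolution.

```python
# Sketch only: ground Horn clauses as (head, body) pairs; a fact has an empty body.
RULES = [
    ("edge_ab", ()),
    ("edge_bc", ()),
    ("path_ab", ("edge_ab",)),
    ("path_bc", ("edge_bc",)),
    ("path_ac", ("path_ab", "path_bc")),
    ("q", ("q",)),   # self-supporting rule: q is not in the least model
    ("r", ("s",)),   # s has no rule, so r is not derivable
]


def least_fixpoint(rules):
    """Bottom-up (fixpoint) semantics: apply the immediate-consequence
    operator repeatedly until no new atom is derived."""
    derived = set()
    while True:
        new = {head for head, body in rules
               if all(atom in derived for atom in body)}
        if new <= derived:
            return derived
        derived |= new


def provable(goal, rules, ancestors=frozenset()):
    """Top-down (operational) semantics: a goal succeeds if some rule for it
    has a body whose atoms all succeed. Goals repeated on the current
    branch are pruned so that cyclic rules terminate."""
    if goal in ancestors:
        return False
    return any(
        all(provable(atom, rules, ancestors | {goal}) for atom in body)
        for head, body in rules
        if head == goal
    )


model = least_fixpoint(RULES)
atoms = {head for head, _ in RULES} | {a for _, body in RULES for a in body}
agree = all(provable(a, RULES) == (a in model) for a in atoms)
```

    On this small program the two procedures agree on every atom, which is the propositional shadow of the equivalence result; programs with variables would additionally need unification and grounding.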

    Server‐side workflow execution using data grid technology for reproducible analyses of data‐intensive hydrologic systems

    Many geoscience disciplines utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data preprocessing and postprocessing routines can be challenging for a number of reasons including (1) accessing and preprocessing the large volume and variety of data required by the model, (2) postprocessing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object functionality of the Integrated Rule‐Oriented Data System and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community‐driven cyberinfrastructure systems. The approach is demonstrated for a study investigating the impact of drought on populations in the Carolinas region of the United States. The analysis leverages computational modeling along with data from the Terra Populus project and data management and publication services provided by the Sustainable Environment‐Actionable Data project. The work is part of a larger effort under the DataNet Federation Consortium project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities.

    Plain Language Summary: Executing computational workflows in the geosciences can be challenging, especially when dealing with large, distributed, and heterogeneous data sets and computational tools. We present a methodology for addressing this challenge using the Integrated Rule‐Oriented Data System (iRODS) Workflow Structured Object (WSO). We demonstrate the approach through an end‐to‐end application of data access, processing, and publication of digital assets for a scientific study analyzing drought in the Carolinas region of the United States.

    Key Points: Reproducibility of data‐intensive analyses remains a significant challenge. Data grids are useful for reproducibility of workflows requiring large, distributed data sets. Data and computations should be co‐located on servers to create executable Web‐resources.

    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/137520/1/ess271_am.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/137520/2/ess271.pd

    NeuroBridge ontology: computable provenance metadata to give the long tail of neuroimaging data a FAIR chance for secondary use

    Background: Despite the efforts of the neuroscience community, there are many published neuroimaging studies with data that are still not findable or accessible. Users face significant challenges in reusing neuroimaging data due to the lack of provenance metadata, such as experimental protocols, study instruments, and details about the study participants, which is also required for interoperability. To implement the FAIR guidelines for neuroimaging data, we have developed an iterative ontology engineering process and used it to create the NeuroBridge ontology. The NeuroBridge ontology is a computable model of provenance terms to implement FAIR principles and, together with an international effort to annotate full-text articles with ontology terms, the ontology enables users to locate relevant neuroimaging datasets.

    Methods: Building on our previous work in metadata modeling, and in concert with an initial annotation of a representative corpus, we modeled diagnosis terms (e.g., schizophrenia, alcohol usage disorder), magnetic resonance imaging (MRI) scan types (T1-weighted, task-based, etc.), clinical symptom assessments (PANSS, AUDIT), and a variety of other assessments. We used the feedback of the annotation team to identify missing metadata terms, which were added to the NeuroBridge ontology, and we restructured the ontology to support both the final annotation of the corpus of neuroimaging articles by a second, independent set of annotators and the functionalities of the NeuroBridge search portal for neuroimaging datasets.

    Results: The NeuroBridge ontology consists of 660 classes, 49 properties, and 3,200 axioms. The ontology includes mappings to existing ontologies, enabling the NeuroBridge ontology to be interoperable with other domain-specific terminological systems. Using the ontology, we annotated 186 neuroimaging full-text articles describing the participant types, scanning, and clinical and cognitive assessments.

    Conclusion: The NeuroBridge ontology is the first computable metadata model that represents the types of data available in recent neuroimaging studies in schizophrenia and substance use disorders research; it can be extended to include more granular terms as needed. This metadata ontology is expected to form the computational foundation to help investigators make their data FAIR compliant and to support users in conducting reproducible neuroimaging research.

    NeuroBridge: a prototype platform for discovery of the long-tail neuroimaging data

    Introduction Open science initiatives have enabled sharing of large amounts of already collected data. However, significant gaps remain regarding how to find appropriate data, including underutilized data that exist in the long tail of science. We demonstrate the NeuroBridge prototype and its ability to search PubMed Central full-text papers for information relevant to neuroimaging data collected from schizophrenia and addiction studies. Methods The NeuroBridge architecture contained the following components: (1) Extensible ontology for modeling study metadata: subject population, imaging techniques, and relevant behavioral, cognitive, or clinical data. Details are described in the companion paper in this special issue; (2) A natural-language based document processor that leveraged pre-trained deep-learning models on a small-sample document corpus to establish efficient representations for each article as a collection of machine-recognized ontological terms; (3) Integrated search using ontology-driven similarity to query PubMed Central and NeuroQuery, which provides fMRI activation maps along with PubMed source articles. Results The NeuroBridge prototype contains a corpus of 356 papers from 2018 to 2021 describing schizophrenia and addiction neuroimaging studies, of which 186 were annotated with the NeuroBridge ontology. The search portal on the NeuroBridge website https://neurobridges.org/ provides an interactive Query Builder, where the user builds queries by selecting NeuroBridge ontology terms to preserve the ontology tree structure. For each return entry, links to the PubMed abstract as well as to the PMC full-text article, if available, are presented. For each of the returned articles, we provide a list of clinical assessments described in the Section “Methods” of the article. Articles returned from NeuroQuery based on the same search are also presented. 
    Conclusion: The NeuroBridge prototype combines ontology-based search with natural-language text-mining approaches to demonstrate that papers relevant to a user’s research question can be identified. The NeuroBridge prototype takes a first step toward identifying potential neuroimaging data described in full-text papers. Toward the overall goal of discovering “enough data of the right kind,” ongoing work includes validating the document processor with a larger corpus, extending the ontology to include detailed imaging data, extracting information regarding data availability from the returned publications, and incorporating XNAT-based neuroimaging databases to enhance data accessibility.

    String-oriented databases

    1: Introduction. Relational databases and Datalog view each attribute as indivisible. This view, though useful in several applications, does not provide a powerful database system for applications in genetic sequence querying, iconic image processing, textual processing, etc. Data in these applications are unstructured and do not provide a convenient way to partition them into fields. Current applications in such domains use databases to store such indivisible data as a 'blob' and either use pre-extracted thematic attribute-values to answer queries (using inverted indices) or use post-retrieval processing by specialized software programs (e.g., vision processing systems) to compute and match features from the blobs. In essence, the database is used primarily as a repository of raw data with no processing allowed on them during retrieval. Several systems such as VIMSYS, OVID, PROBE, PICQUERY (see
    In this paper, we present an extension to relational algebra (RAS) and Datalog (Datalog(S)) that provides primitive-level string operations in the database framework. It thus provides a general-purpose system design that can be applied in several domains of interest. In the relational extension a new structure, called 'string expression', is defined that combines the power of string constants, variables, regular expressions, interpreted functions and approximate evaluation. In our approach strings are viewed as database objects that can be compared, divided, subsumed, interpreted and approximated. Allowing such operations on strings enriches the syntax and semantics and increases the expressive power of database languages: apart from a rich data structure, it allows approximate string-level reasoning through inexact string matching and storage of incomplete data in the form of 'partial nulls'. Moreover, with a rule-based framework like Datalog(S)
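    To make the idea of string-level selection concrete, here is a rough sketch in Python. It is not the RAS or Datalog(S) syntax from the paper, which is not given in this excerpt; the relation `SEQUENCES` and the helpers `select_regex`, `select_approx`, and `edit_distance` are hypothetical names introduced only for illustration. It shows a selection that matches a regular expression inside an attribute, and one that accepts inexact matches within a bounded edit distance.

```python
import re

# Hypothetical relation: each row stores an unpartitioned string attribute.
SEQUENCES = [
    {"id": 1, "seq": "ACGTTGCA"},
    {"id": 2, "seq": "TTGACCGT"},
    {"id": 3, "seq": "GGGGCCCC"},
]


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def select_regex(relation, attr, pattern):
    """Selection keeping rows whose attribute contains a match for pattern."""
    rx = re.compile(pattern)
    return [row for row in relation if rx.search(row[attr])]


def select_approx(relation, attr, target, max_dist):
    """Selection keeping rows where some window of len(target) characters
    lies within max_dist edits of target (an inexact match)."""
    n = len(target)
    result = []
    for row in relation:
        s = row[attr]
        # Strings shorter than the target are compared as a whole.
        windows = (s[i:i + n] for i in range(max(1, len(s) - n + 1)))
        if any(edit_distance(w, target) <= max_dist for w in windows):
            result.append(row)
    return result


regex_ids = [row["id"] for row in select_regex(SEQUENCES, "seq", r"TTG[AC]")]
approx_ids = [row["id"] for row in select_approx(SEQUENCES, "seq", "ACGA", 1)]
```

    The point of the sketch is that matching happens during retrieval, inside the selection, rather than in a separate post-retrieval program operating on an opaque blob.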

    Data grid management systems


    iRODS Primer: Integrated Rule-Oriented Data System

    Policy-based data management enables the creation of community-specific collections. Every collection is created for a purpose. The purpose defines the set of properties that will be associated with the collection. The properties are enforced by management policies that control the execution of procedures that are applied whenever data are ingested or accessed. The procedures generate state information that defines the outcome of enforcing the management policy. The state information can be queried to validate assessment criteria and verify that the required collection properties have been co