17 research outputs found

Micropublication: incentivizing community curation and placing unpublished data into the public domain

Author: Harris Todd W.
Raciti Daniela
Schedl Tim
Sternberg Paul W.
Yook Karen
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2018

Large volumes of data generated by research laboratories coupled with the required effort and cost of curation present a significant barrier to inclusion of these data in authoritative community databases. Further, many publicly funded experimental observations remain invisible to curation simply because they are never published: results often do not fit within the scope of a standard publication; trainee-generated data are forgotten when the experimenter (e.g. student, post-doc) leaves the lab; results are omitted from science narratives due to publication bias where certain results are considered irrelevant for the publication. While authors are in the best position to curate their own data, they face a steep learning curve to ensure that appropriate referential tags, metadata, and ontologies are applied correctly to their observations, a task sometimes considered beyond the scope of their research and other numerous responsibilities. Getting researchers to adopt a new system of data reporting and curation requires a fundamental change in behavior among all members of the research community. To solve these challenges, we have created a novel scholarly communication platform that captures data from researchers and directly delivers them to information resources via Micropublication. This platform incentivizes authors to publish their unpublished observations along with associated metadata by providing a deliberately fast and lightweight but still peer-reviewed process that results in a citable publication. Our long-term goal is to develop a data ecosystem that improves reproducibility and accountability of publicly funded research and in turn accelerates both basic and translational discovery.

Digital Commons@Becker

Caltech Authors

Recommended from our members

Translational bioinformatics in mental health: open access data sources and computational biomarker discovery

Author: Bhuvaneshwar Krithika
Fultz Hollis Kate
Gagliardi Jane P
Jia Peilin
Ma Liang
Nagarajan Radhakrishnan
Rakesh Gopalkumar
Rozenblit Leon
Subbian Vignesh
Tenenbaum Jessica D
Visweswaran Shyam
Zhao Zhongming
Publication venue: 'Oxford University Press (OUP)'
Publication date: 21/05/2019

Mental illness is increasingly recognized as both a significant cost to society and a significant area of opportunity for biological breakthrough. As -omics and imaging technologies enable researchers to probe molecular and physiological underpinnings of multiple diseases, opportunities arise to explore the biological basis for behavioral health and disease. From individual investigators to large international consortia, researchers have generated rich data sets in the area of mental health, including genomic, transcriptomic, metabolomic, proteomic, clinical and imaging resources. General data repositories such as the Gene Expression Omnibus (GEO) and Database of Genotypes and Phenotypes (dbGaP) and mental health (MH)-specific initiatives, such as the Psychiatric Genomics Consortium, MH Research Network and PsychENCODE represent a wealth of information yet to be gleaned. At the same time, novel approaches to integrate and analyze data sets are enabling important discoveries in the area of mental and behavioral health. This review will discuss and catalog into an organizing framework the increasingly diverse set of MH data resources available, using schizophrenia as a focus area, and will describe novel and integrative approaches to molecular biomarker discovery that make use of mental health data.

Funding: National Institutes of Health [UL1TR001117, R01LM012095, R01LM012806]. Open access article.

The University of Arizona

User-centered semantic dataset retrieval

Author: Löffler Felicitas
Publication date: 01/01/2023

Finding relevant research data is an increasingly important but time-consuming task in daily research practice. Several studies report on difficulties in dataset search, e.g., scholars retrieve only part of the pertinent data, and important information cannot be displayed in the user interface. Overcoming these problems has motivated a number of research efforts in computer science, such as text mining and semantic search. In particular, the emergence of the Semantic Web opens a variety of novel research perspectives. Motivated by these challenges, the overall aim of this work is to analyze the current obstacles in dataset search and to propose and develop a novel semantic dataset search. The studied domain is biodiversity research, a domain that explores the diversity of life, habitats and ecosystems. This thesis has three main contributions: (1) We evaluate the current situation in dataset search in a user study, and we compare a semantic search with a classical keyword search to explore the suitability of semantic web technologies for dataset search. (2) We generate a question corpus and develop an information model to determine which scientific topics scholars in biodiversity research are interested in. Moreover, we also analyze the gap between current metadata and scholarly search interests, and we explore whether metadata and user interests match. (3) We propose and develop an improved dataset search based on three components: (A) a text mining pipeline, enriching metadata and queries with semantic categories and URIs, (B) a retrieval component with a semantic index over categories and URIs and (C) a user interface that enables a search within categories and a search including further hierarchical relations. Following user-centered design principles, we ensure user involvement in various user studies during the development process.

Digitale Bibliothek Thüringen

Dataset Search in Biodiversity Research: Do Metadata in Data Repositories Reflect Scholarly Information Needs?

Author: Klan Friederike
König-Ries Brigitta
Löffler Felicitas
Wesp Valentin
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 24/03/2021

The increasing amount of publicly available research data provides the opportunity to link and integrate data in order to create and prove novel hypotheses, to repeat experiments or to compare recent data to data collected at a different time or place. However, recent studies have shown that retrieving relevant data for data reuse is a time-consuming task in daily research practice. In this study, we explore what hampers dataset retrieval in biodiversity research, a field that produces a large amount of heterogeneous data. In particular, we focus on scholarly search interests and metadata, the primary source of data in a dataset retrieval system. We show that existing metadata currently poorly reflect information needs and therefore are the biggest obstacle in retrieving relevant data. Our findings indicate that, for data seekers in the biodiversity domain, environments, materials and chemicals, species, biological and chemical processes, locations, data parameters and data types are important information categories. These interests are well covered in metadata elements of domain-specific standards. However, instead of utilizing these standards, large data repositories tend to use metadata standards with domain-independent metadata fields that cover search interests only to some extent. A second problem is the arbitrary keywords used in descriptive fields such as title, description or subject. Keywords support scholars in a full text search only if the provided terms syntactically match or their semantic relationship to terms used in a user query is known.

DigitalCommons@University of Nebraska

A recommender system for scientific datasets and analysis pipelines

Author: Mazaheri Mandana
Publication date: 01/10/2021

Scientific datasets and analysis pipelines are increasingly being shared publicly in the interest of open science. However, mechanisms are lacking to reliably identify which pipelines and datasets can appropriately be used together. Given the increasing number of high-quality public datasets and pipelines, this lack of clear compatibility threatens the findability and reusability of these resources. We investigate the feasibility of a collaborative filtering system to recommend pipelines and datasets based on provenance records from previous executions. We evaluate our system using datasets and pipelines extracted from the Canadian Open Neuroscience Platform, a national initiative for open neuroscience. The recommendations provided by our system (AUC = 0.83) are significantly better than chance and outperform recommendations made by domain experts using their previous knowledge as well as pipeline and dataset descriptions (AUC = 0.63). In particular, domain experts often neglect low-level technical aspects of a pipeline-dataset interaction, such as the level of pre-processing, which are captured by a provenance-based system. We conclude that provenance-based pipeline and dataset recommenders are feasible and beneficial to the sharing and usage of open-science resources. Future work will focus on the collection of more comprehensive provenance traces, and on deploying the system in production.

Concordia University Research Repository

Dataset Search: A lightweight, community-built tool to support research data discovery

Author: Clark Jason A.
Espeland James
Hagerman Kyle
Mannheimer Sara
Schultz Jakob
Publication venue: eScholarship@UMassChan
Publication date: 19/01/2021

Objective: Promoting discovery of research data helps archived data realize its potential to advance knowledge. Montana State University (MSU) Dataset Search aims to support discovery and reporting for research datasets created by researchers at institutions. Methods and Results: The Dataset Search application consists of five core features: a streamlined browse and search interface, a data model based on dataset discovery, a harvesting process for finding and vetting datasets stored in external repositories, an administrative interface for managing the creation, ingest, and maintenance of dataset records, and a dataset visualization interface to demonstrate how data is produced and used by MSU researchers. Conclusion: The Dataset Search application is designed to be easily customized and implemented by other institutions. Indexes like Dataset Search can improve search and discovery for content archived in data repositories, therefore amplifying the impact and benefits of archived data.

eScholarship@UMMS

Understanding Data Search as a Socio-technical Practice

Author: Cousijn Helena
Gregory Kathleen
Groth Paul
Scharnhorst Andrea
Wyatt Sally
Publication venue: 'SAGE Publications'
Publication date: 18/02/2019

Open research data are heralded as having the potential to increase effectiveness, productivity, and reproducibility in science, but little is known about the actual practices involved in data search. The socio-technical problem of locating data for reuse is often reduced to the technological dimension of designing data search systems. We combine a bibliometric study of the current academic discourse around data search with interviews with data seekers. In this article, we explore how adopting a contextual, socio-technical perspective can help to understand user practices and behavior and ultimately help to improve the design of data discovery systems.

Comment: 19 pages, 3 figures, 7 tables

arXiv.org e-Print Archive

Maastricht University Research Portal

International Migration, Integration and Social Cohesion online publications

UvA-DARE

The CAMH Neuroinformatics Platform: A Hospital-Focused Brain-CODE Implementation

Author: Andy Wang
Anthony L. Vaccarino
Brendan Behan
Damian Jankowicz
David J. Rotenberg
Fan Dong
Jordan Mikkelsen
Kenneth R. Evans
Marcia Hon
Marcos Sanches
Mojib Javadi
Moyez Dharsee
Natalia Potapova
Nathan Frias
Nikola Bogetic
Qing Chang
Rachad El-Badrawi
Shuai Laing
Stephen C. Strother
Stephen R. Arnott
Susan G. Evans
Tom Gee
Tommy Liu
Publication venue: 'Frontiers Media SA'
Publication date: 01/11/2018

Investigations of mental illness have been enriched by the advent and maturation of neuroimaging technologies and the rapid pace and increased affordability of molecular sequencing techniques. However, the increased volume, variety and velocity of research data present a considerable technical and analytic challenge to curate, federate and interpret. Aggregation of high-dimensional datasets across brain disorders can increase sample sizes and may help identify underlying causes of brain dysfunction; however, additional barriers exist for effective data harmonization and integration for their combined use in research. To help realize the potential of multi-modal data integration for the study of mental illness, the Centre for Addiction and Mental Health (CAMH) constructed a centralized data capture, visualization and analytics environment (the CAMH Neuroinformatics Platform) based on the Ontario Brain Institute (OBI) Brain-CODE architecture, towards the curation of a standardized, consolidated psychiatric hospital-wide research dataset, directly coupled to high performance computing resources.

Directory of Open Access Journals

brainlife.io: A decentralized and open source cloud platform to support neuroscience research

Author: Avesani Paolo
Aydogan D. Baran
Berto Giulia
Bhatia Dheeraj
Bridge Holly
Brown Shaw T.
Bullock Daniel N.
Bussalb Aurore
Caron Bradley
Carson James P.
Chaumon Maximilien
Craddock Cameron
Delogu Franco
Eke Damian
Fabrega Ricardo
Faskowitz Joshua
Fischer Jeremy
Freiwald Winrich
Garyfallidis Eleftherios
George Nathalie
Guaje Javier
Hancock David Y.
Hanekamp Sandra
Hanson Jamie
Hayashi Soichi
Heinsfeld Anibal S.
Henschel Robert
Heyman Stephanie
Hunt David
Iacovella Vittorio
Jolly Jasleen
Kitchell Lindsey
Koudoro Serge
Kurzwaski Jan
Leong Josiah
Levitas Daniel
Marinazzo Daniele
McKee Shawn
McPherson Brent C.
Mejia Amanda
Mikellidou Koulla
Niso J. Guiomar
Olivetti Emanuele
Pestilli Franco
Pisner Derek
Poldrack Russell A.
Port Nicholas
Puce Aina
Rorden Christopher
Sani Ilaria
Schnyer David
Silva Filipi N.
Stanzione Daniel
Stewart Craig A.
Veraart Jelle
Victory Conner
Vinci-Booher Sophia
Willis Hanna
Yeh Frank C.
Zuidema Taylor
Publication date: 03/06/2023

Neuroscience research has expanded dramatically over the past 30 years by advancing standardization and tool development to support rigor and transparency. Consequently, the complexity of the data pipeline has also increased, hindering access to FAIR data analysis for portions of the worldwide research community. brainlife.io was developed to reduce these burdens and democratize modern neuroscience research across institutions and career levels. Using community software and hardware infrastructure, the platform provides open-source data standardization, management, visualization, and processing and simplifies the data pipeline. brainlife.io automatically tracks the provenance history of thousands of data objects, supporting simplicity, efficiency, and transparency in neuroscience research. Here brainlife.io's technology and data services are described and evaluated for validity, reliability, reproducibility, replicability, and scientific utility. Using data from 4 modalities and 3,200 participants, we demonstrate that brainlife.io's services produce outputs that adhere to best practices in modern neuroscience research.

arXiv.org e-Print Archive