
    Open Science ETDs and Institutional Repositories: Making Research Data FAIRer

    Graduate students, as potential future full-time researchers, are a population that should show proficiency in data sharing. Although there are many resources that teach data sharing best practices to students, it is difficult to tell how well students actually do when sharing their data. We compared the FAIRness of non-traditional research output metadata associated with theses and dissertations for records shared in a generalist repository by individual students and for records shared through an institutional repository using the same repository platform. Records shared through an institutional repository were significantly FAIRer, as measured by metadata richness and interoperability, and had higher views per month. The only measure on which records shared by students exceeded institutional records was listing funding sources. We also examine how multiple related research outputs are grouped and offer suggestions to improve interoperability. We conclude that our sample population of graduate students sharing research outputs is not yet proficient in applying the FAIR principles. The review process and oversight that are often part of institutional repositories can offer a measurable benefit to non-traditional ETD outputs.
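    The comparison above rests on scoring metadata richness. As a rough illustration of that kind of measurement, the minimal sketch below tallies how many of a small set of checked fields a record fills in; the field list, equal weighting, and example records are assumptions made for illustration, not the study's actual rubric or data.

```python
# Illustrative sketch only: the checked fields and equal weighting are
# assumptions, not the rubric used in the study described above.
CHECKED_FIELDS = ["title", "creator", "description", "keywords",
                  "license", "funding", "related_identifiers"]

def richness_score(record: dict) -> float:
    """Fraction of checked metadata fields that are present and non-empty."""
    filled = sum(1 for field in CHECKED_FIELDS if record.get(field))
    return filled / len(CHECKED_FIELDS)

# Hypothetical records for illustration.
student_record = {"title": "Thesis dataset", "creator": "A. Student", "funding": "Grant X"}
institutional_record = {"title": "Thesis dataset", "creator": "A. Student",
                        "description": "Survey data for chapter 3.",
                        "keywords": ["ETD"], "license": "CC0"}

print(richness_score(student_record))        # ~0.43
print(richness_score(institutional_record))  # ~0.71
```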

    Navigating in vitro bioactivity data by investigating available resources using model compounds.

    The number of chemical compounds and associated experimental data in public databases is growing, but at present there is no simple way to access these data in a quick and synoptic manner. Instead, the data are fragmented across different resources, and interested parties must invest considerable time and effort to navigate these systems.
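    As an illustration of the per-resource effort described above, the sketch below pulls basic records for one model compound from a single public resource (PubChem PUG REST). The compound name is arbitrary, and the endpoint paths are assumptions based on the API's documented URL pattern; verify them against the current documentation before relying on them.

```python
# Sketch of querying one public resource for a model compound.
# Endpoint paths are assumptions; check the PubChem PUG REST docs before use.
import requests

BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
compound = "imatinib"  # model compound chosen only for illustration

# Resolve the compound name to a PubChem CID.
resp = requests.get(f"{BASE}/compound/name/{compound}/cids/JSON", timeout=30)
resp.raise_for_status()
cid = resp.json()["IdentifierList"]["CID"][0]

# Retrieve a bioassay summary for that CID as CSV.
assays = requests.get(f"{BASE}/compound/cid/{cid}/assaysummary/CSV", timeout=30)
assays.raise_for_status()
print(assays.text.splitlines()[0])  # header row of the assay summary table
```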

    Metadata for the use of research data management tools with I3S researchers

    In recent decades, the production of research data has grown substantially, largely because technological development has transformed the entire workflow of researchers. This creates challenges for research data management, especially in the analysis, storage, preservation and sharing of those data. Research data management is essential to scientific practice, and many stakeholders at the different stages of this process (researchers, funding agencies, universities, curators) care about the value of the data produced. It is therefore important to support researchers with tools that simplify the work involved in managing their research data.

    Electronic research data management tools matter because they let researchers meet RDM requirements and bridge the different stages of the research data management workflow. Adopting such a tool can also help control the data life cycle, since data can be stored and associated with metadata so as to make them FAIR (Findable, Accessible, Interoperable, Reusable). In addition, integration with research data repositories is essential for indexing, preserving and making the data available to the scientific community.

    To support researchers in their research data management tasks, this study collaborates with a group of researchers from the Institute of Research and Innovation in Health (I3S) to test and evaluate the Dendro platform, developed at FEUP and INESC TEC, and to validate a metadata descriptor model developed specifically for the researchers' domains. The results obtained from the researchers' feedback show that the model offers an easy entry point for data description, while still allowing researchers to point out its limitations and identify their specific requirements.
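    As a minimal sketch of what descriptor-based metadata might look like, the example below uses generic Dublin Core-style terms and a simple completeness check; the field names, required-field list, and values are illustrative assumptions and do not reproduce Dendro's actual descriptor model.

```python
# Illustrative sketch only: a descriptor-based metadata record using generic
# Dublin Core-style terms. Field names and values are assumptions, not the
# actual Dendro metadata model validated in the study.
dataset_metadata = {
    "dcterms:title": "Example I3S imaging dataset",
    "dcterms:creator": "Researcher name",
    "dcterms:description": "Raw microscopy images and processing scripts.",
    "dcterms:subject": ["cell biology", "microscopy"],
    "dcterms:license": "CC-BY-4.0",
    "dcterms:identifier": "",  # typically assigned on deposit in a repository
}

REQUIRED = ("dcterms:title", "dcterms:creator",
            "dcterms:description", "dcterms:license")

def missing_descriptors(record: dict) -> list:
    """Return required descriptors that are absent or empty."""
    return [d for d in REQUIRED if not record.get(d)]

print(missing_descriptors(dataset_metadata))  # -> [] when the record is complete
```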

    Data from "The variable quality of metadata about biological samples used in biomedical experiments"

    This fileset provides supporting data and corpora for the empirical study described in:

    Rafael S. Gonçalves and Mark A. Musen. The variable quality of metadata about biological samples used in biomedical experiments. Scientific Data, in press (2019).

    Description of files

    Analysis spreadsheet files:
    - ncbi-biosample-metadata-study.xlsx contains data to support the analysis of the quality of metadata in the NCBI BioSample.
    - ebi-biosamples-metadata-study.xlsx contains data to support the analysis of the quality of metadata in the EBI BioSamples.

    Validation data files:
    - ncbi-biosample-validation-data.tar.gz is an archive containing the validation data for the analysis of the entire NCBI BioSample dataset.
    - ncbi-biosample-packaged-validation-data.tar.gz is an archive containing the validation data for the analysis of the subset of metadata records in the NCBI BioSample that use a BioSample package definition.
    - ebi-ncbi-shared-records-validation-data.tar.gz is an archive containing the validation data for the analysis of the set of metadata records that exist in both EBI BioSamples and NCBI BioSample.

    Corpus files:
    - ebi-biosamples-corpus.xml.gz corresponds to the EBI BioSamples corpus.
    - ncbi-biosample-corpus.xml.gz corresponds to the NCBI BioSample corpus.
    - ncbi-biosample-packaged-records-corpus.tar.gz corresponds to the NCBI BioSample metadata records that declare a package definition.
    - ebi-ncbi-shared-records-corpus.tar.gz corresponds to the corpus of metadata records that exist in both NCBI BioSample and EBI BioSamples.
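    A brief sketch, assuming the files listed above have been downloaded to the working directory, of how one might inspect a validation-data archive and load an analysis spreadsheet; the paths come from the file list, while the sheet handling and member preview are illustrative choices.

```python
# Sketch assuming the archives and spreadsheets from this record are in the
# current working directory; only the file names come from the record itself.
import tarfile
import pandas as pd  # reading .xlsx files requires the openpyxl dependency

# Peek inside one of the validation-data archives without extracting it.
with tarfile.open("ncbi-biosample-validation-data.tar.gz", "r:gz") as archive:
    for member in archive.getmembers()[:10]:
        print(member.name, member.size)

# Load the NCBI BioSample analysis spreadsheet; sheet names are not stated
# in the record, so read every sheet and report its dimensions.
sheets = pd.read_excel("ncbi-biosample-metadata-study.xlsx", sheet_name=None)
for name, frame in sheets.items():
    print(name, frame.shape)
```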

    Big Data For Microorganisms: Computational Approaches Leveraging Large-Scale Microbial Transcriptomic Compendia

    Genome-wide transcriptomics data capture the molecular state of microorganisms – the expression patterns of genes in response to some condition or stimulus. With advancements in high-throughput sequencing technologies, thousands of microbial transcription profiles are publicly available. Consequently, these data have been collected and integrated to form transcriptomic compendia, which are collections of diverse gene expression experiments. These compendia have proven to be a valuable resource for studying systems-level biology and for hypothesis generation. We describe the construction, benefits and challenges of creating microbial transcriptomic compendia in Chapter 1. One challenge for compendia, which integrate across many different experiments, is batch effects: technical sources of variability that can disrupt the detection of the underlying biological signals of interest. In Chapter 2, we use a generative neural network to simulate gene expression compendia with varying amounts of technical variability and assess the ability to detect the underlying biological structure in the data after noise is added and again after batch correction is applied. We define a set of principles for how batch correction should be used in the context of these large-scale compendia. In Chapters 3 and 4 we introduce computational approaches that use compendia to improve the analysis of individual experiments and the analysis of genomic patterns, respectively. In Chapter 3, we develop a portable framework to distinguish between common and context-specific transcriptional signals, using a compendium to autogenerate a null set of expression changes. This approach allows researchers to put gene expression changes from their individual experiment of interest into the context of existing compendia of experiments. In Chapter 4 we develop an approach to examine the effect of different Pseudomonas aeruginosa genomes, using two dominant strain types, on transcriptional profiles in order to understand how traits manifest. This genome-wide approach reveals a more complete picture of how different genomes affect expression, which in turn mediates the different traits present. Overall, these compendia provide a valuable resource that computational tools can leverage to extract patterns and inform research directions.
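    Chapter 3 is described as using a compendium to autogenerate a null set of expression changes against which an individual experiment is compared. The toy sketch below illustrates that general idea with simulated data; the random matrices and the simple percentile cutoff are assumptions made for illustration and do not reproduce the dissertation's actual pipeline.

```python
# Toy sketch of the general idea: build a null distribution of expression
# changes from a compendium and ask which genes in a target experiment fall
# outside it. Simulated data and the percentile cutoff are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes = 500, 1000
compendium = rng.lognormal(mean=2.0, sigma=0.5, size=(n_samples, n_genes))

# Null: log fold changes between randomly paired compendium samples.
idx_a = rng.integers(0, n_samples, size=2000)
idx_b = rng.integers(0, n_samples, size=2000)
null_lfc = np.log2(compendium[idx_a] / compendium[idx_b])
cutoff = np.percentile(np.abs(null_lfc), 97.5, axis=0)  # per-gene threshold

# Target experiment: treated vs. control profiles (simulated here).
control = rng.lognormal(mean=2.0, sigma=0.5, size=n_genes)
treated = control * rng.lognormal(mean=0.0, sigma=0.3, size=n_genes)
observed_lfc = np.log2(treated / control)

context_specific = np.where(np.abs(observed_lfc) > cutoff)[0]
print(f"{context_specific.size} genes exceed the compendium-derived null")
```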