
    Quantifying the consistency of scientific databases

    Science is a social process with far-reaching impact on our modern society. In recent years, for the first time, we have become able to study science itself scientifically. This is enabled by the massive amounts of data on scientific publications that are increasingly becoming available. These data are contained in several databases, such as Web of Science or PubMed, maintained by various public and private entities. Unfortunately, these databases are not always consistent, which considerably hinders such study. Relying on the powerful framework of complex networks, we conduct a systematic analysis of the consistency among six major scientific databases. We find that identifying a single "best" database is far from easy. Nevertheless, our results indicate appreciable differences in the mutual consistency of different databases, which we interpret as recipes for future bibliometric studies. Comment: 20 pages, 5 figures, 4 tables.
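
    The abstract leaves the consistency measure unspecified; as a minimal sketch of one possible network-level comparison, the snippet below computes the Jaccard similarity of two databases' citation edge sets. The database names are reused from the abstract, but the DOIs and the measure itself are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: pairwise consistency of two bibliographic databases,
# measured as the Jaccard similarity of their citation-network edge sets.

def jaccard(edges_a: set, edges_b: set) -> float:
    """Fraction of citation edges reported by both databases."""
    if not edges_a and not edges_b:
        return 1.0  # two empty networks are trivially consistent
    return len(edges_a & edges_b) / len(edges_a | edges_b)

# Toy edge sets: (citing_doi, cited_doi) pairs as each database reports them.
web_of_science = {("10.1/a", "10.1/b"), ("10.1/a", "10.1/c"), ("10.1/b", "10.1/c")}
pubmed = {("10.1/a", "10.1/b"), ("10.1/b", "10.1/c"), ("10.1/b", "10.1/d")}

print(f"pairwise consistency: {jaccard(web_of_science, pubmed):.2f}")  # 0.50
```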

    The Effects of Electronic Access to Scientific Literature in the Consortium of Turkish University Libraries

    Purpose: To provide some insight into the sharp increase in scientific publications originating from Turkish academic and research institutions in the last few years. The underlying reasons, the most important being widespread access to the literature through electronic databases, are also investigated. Design/methodology/approach: Although it is difficult to gauge national scientific productivity, the number of publications in electronic databases that index thousands of scientific journals can give an idea. Web of Science is one such database, and it is provided to the Turkish academic community, along with several other databases, by the national library consortium. Based on Web of Science data, a comparative analysis was performed to investigate publications originating from Turkey and other countries. Findings: The analysis revealed a sharp increase in publications from Turkish institutions in the last few years. Among the 30 highest-publishing countries out of 190, the increase between 2001 and 2003 is 53.48 percent for Turkey, followed by 34.00 percent for China and 26.87 percent for South Korea. Research limitations: Although it is one of the largest, only one of several databases was analyzed. There are also several other indicators of scientific productivity, such as books published and citations received. Originality and value of the paper: The paper provides some insight into the importance of library consortia and the efficient literature access they provide to researchers.
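
    For readers checking the growth figures, the percent-increase arithmetic is sketched below. The publication counts are invented, chosen only so that the result reproduces the 53.48 percent reported for Turkey.

```python
# Minimal sketch of the reported growth figure: percent increase in
# publication counts between 2001 and 2003. Counts are hypothetical;
# only the resulting percentage comes from the abstract.

def percent_increase(old: int, new: int) -> float:
    return (new - old) / old * 100

pubs_2001, pubs_2003 = 10_000, 15_348  # invented counts
print(f"{percent_increase(pubs_2001, pubs_2003):.2f}%")  # 53.48%
```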

    A Data Transformation System for Biological Data Sources

    Scientific data of importance to biologists in the Human Genome Project reside not only in conventional databases, but also in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as in sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data.
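
    As a rough illustration of the data types the abstract mentions (lists, variants, deep nesting), the sketch below models a variant-typed sequence entry in plain Python and runs a small flattening query over it. All field names are invented for illustration and are not taken from ASN.1, ACE, or the paper's system.

```python
# Hedged sketch: nested, variant-typed records modelled with plain Python types.

from dataclasses import dataclass
from typing import Union

@dataclass
class DnaSeq:
    bases: str

@dataclass
class ProteinSeq:
    residues: str

# A "variant" field: an entry holds either DNA or protein data,
# plus a nested list of feature annotations.
@dataclass
class SeqEntry:
    accession: str
    seq: Union[DnaSeq, ProteinSeq]
    features: list  # list of (kind, start, end) tuples

def feature_spans(entries: list, kind: str) -> list:
    """A small transformation query: flatten the nested feature lists,
    keeping only features of the requested kind."""
    return [(e.accession, s, t)
            for e in entries
            for (k, s, t) in e.features if k == kind]

entries = [SeqEntry("U07000", DnaSeq("ACGT"), [("exon", 1, 3), ("intron", 3, 4)])]
print(feature_spans(entries, "exon"))  # [('U07000', 1, 3)]
```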

    Computer-supported analysis of scientific measurements

    In the past decade, large-scale databases and knowledge bases have become available to researchers working in a range of scientific disciplines. In many cases these databases and knowledge bases contain measurements of properties of physical objects which have been obtained in experiments or at observation sites. As examples, one can think of crystallographic databases with molecular structures and property databases in materials science. These large collections of measurements, which will be called measurement bases, form interesting resources for scientific research. By analyzing the contents of a measurement base, one may be able to find patterns that are of practical and theoretical importance. With the use of measurement bases as a resource for scientific inquiry, questions arise about the quality of the data being analyzed. In particular, the occurrence of conflicts and systematic errors raises doubts about the reliability of a measurement base and compromises any patterns found in it. On the other hand, conflicts and systematic errors may be interesting patterns in themselves and warrant further investigation. These considerations motivate the topic addressed in this thesis: the development of systematic methods for detecting and resolving conflicts and identifying systematic errors in measurement bases. These measurement analysis (MA) methods are implemented in a computer system supporting the user of the measurement base.
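
    A minimal instance of conflict detection in a measurement base, assuming a simple spread-beyond-tolerance rule, might look like the sketch below; the thesis's actual MA methods are more elaborate, and the data shown are invented.

```python
# Hedged sketch: flag (object, property) groups whose measured values
# disagree by more than a tolerance -- one crude notion of "conflict".

from collections import defaultdict

def find_conflicts(measurements, tol):
    """measurements: iterable of (object_id, property, value) triples."""
    groups = defaultdict(list)
    for obj, prop, value in measurements:
        groups[(obj, prop)].append(value)
    return {key: vals for key, vals in groups.items()
            if max(vals) - min(vals) > tol}

data = [("NaCl", "lattice_const_pm", 563.9),
        ("NaCl", "lattice_const_pm", 564.1),
        ("NaCl", "lattice_const_pm", 571.0)]  # outlier: possible conflict
print(find_conflicts(data, tol=2.0))
# {('NaCl', 'lattice_const_pm'): [563.9, 564.1, 571.0]}
```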

    Evidence attribution in the UniProt Knowledgebase

    UniProtKB provides the scientific community with a comprehensive collection of protein sequence records containing extensive curated information, including functional and sequence annotation. This information is derived from a variety of sources, such as the scientific literature and sequence analysis programs, as well as data imported from automatic annotation systems and external databases. To allow users to ascertain the origin of each data item in a UniProtKB record, an evidence attribution system is being introduced that links each piece of information to its original source. This system allows users to trace the origin of all information, to differentiate easily between experimental and computational data, and to assess data reliability. The current system and plans for its future development and enhancement will be presented.
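
    An illustrative sketch of the idea, not UniProtKB's actual data model: each annotation carries an evidence record pointing back to its source, so experimental and computational data can be told apart. The field names and the placeholder PMID are invented.

```python
# Hypothetical evidence-attribution records (not UniProt's real schema).

from dataclasses import dataclass

@dataclass(frozen=True)
class Evidence:
    category: str  # e.g. "experimental" or "computational"
    source: str    # e.g. a literature reference or a prediction program

@dataclass(frozen=True)
class Annotation:
    field: str
    value: str
    evidence: Evidence

record = [
    Annotation("function", "ATP binding",
               Evidence("experimental", "PMID:0000000")),  # placeholder PMID
    Annotation("signal_peptide", "1..22",
               Evidence("computational", "signal-peptide prediction")),
]

# Differentiate experimental from computational data by filtering on evidence.
experimental = [a for a in record if a.evidence.category == "experimental"]
print([a.value for a in experimental])  # ['ATP binding']
```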

    Impact of the Information and Communication Technologies on the Education of Students with Down Syndrome: a Bibliometric Study (2008-2018)

    This article analyzes the impact of Information and Communication Technologies (ICT) on students with Down syndrome through a review of scientific articles published during the 2008 to 2018 period in five scientific journal databases used in the academic world. Through a descriptive and quantitative methodology, the most significant bibliometric data according to citation index are shown. Likewise, a methodology based on co-word analysis and clustering techniques is applied through bibliometric maps in order to determine the fields of scientific study. The results show that the published articles have a medium-low impact index. They are linked to the importance of using ICT with these students from an educational inclusion and accessibility perspective.
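
    As a hedged sketch of the co-word step, the snippet below counts keyword co-occurrences across invented keyword lists and groups strongly linked keywords; real bibliometric mapping tools use more sophisticated clustering than this toy grouping.

```python
# Toy co-word analysis: count keyword pair co-occurrences, then group
# keywords whose co-occurrence passes a threshold.

from itertools import combinations
from collections import Counter

articles = [  # invented keyword lists
    ["ICT", "Down syndrome", "inclusion"],
    ["ICT", "accessibility", "inclusion"],
    ["Down syndrome", "ICT", "accessibility"],
]

cooc = Counter()
for kws in articles:
    for a, b in combinations(sorted(set(kws)), 2):
        cooc[(a, b)] += 1

# Link keywords co-occurring at least twice; merge links into groups
# (naive union -- adequate for this toy example, not a general algorithm).
links = [pair for pair, n in cooc.items() if n >= 2]
clusters = []
for a, b in links:
    for c in clusters:
        if a in c or b in c:
            c.update((a, b))
            break
    else:
        clusters.append({a, b})
print(clusters)  # one cluster containing all four linked keywords
```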

    BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

    Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the case studies' execution time by up to 98%. We also show how applying machine learning techniques can enrich the analysis process.
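
    The kind of provenance query such a web application abstracts can be sketched as follows; the schema, activity names, and timings are invented, not BioWorkbench's actual provenance model.

```python
# Hypothetical provenance query: total and average runtime per workflow
# activity, over an invented execution log.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE task_exec (
                 workflow TEXT, activity TEXT, seconds REAL)""")
con.executemany("INSERT INTO task_exec VALUES (?, ?, ?)", [
    ("SwiftPhylo", "align",      120.0),
    ("SwiftPhylo", "align",      110.0),
    ("SwiftPhylo", "build_tree",  45.0),
])

for row in con.execute("""SELECT activity, SUM(seconds), AVG(seconds)
                          FROM task_exec GROUP BY activity"""):
    print(row)  # ('align', 230.0, 115.0), ('build_tree', 45.0, 45.0)
```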

    Exploiting microvariation: How to make the best of your incomplete data

    In this article we discuss the use of large corpora and databases as a first step toward qualitative analysis of linguistic data. We concentrate on ASIt, the Syntactic Atlas of Italy, and consider the different types of dialectal data that can be collected from such corpora and databases. We analyze the methodological problems arising from the necessary compromise between the strict requirements imposed by scientific inquiry and the management of large amounts of data. As a possible solution, we propose that the type of variation is itself a tool for deriving meaningful generalizations. To implement this idea, we examine three different types of variation patterns that can be used in the study of morpho-syntax: the geographical distribution of properties (and their total or partial overlap, or complementary distribution), the so-called leopard-spots variation, and the lexical variation index, which can be used to determine the internal complexity of functional items.
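
    A minimal sketch of a lexical variation index, under the assumption that it is the ratio of distinct attested forms to survey points; ASIt's actual definition may differ, and the dialect data below are invented.

```python
# Assumed definition for illustration: distinct lexical forms divided by
# the number of survey points where the functional item is attested.

def variation_index(forms: list) -> float:
    """forms: the lexical form attested at each survey point."""
    return len(set(forms)) / len(forms)

# Invented data: forms of a negative marker at six survey points.
negation = ["no", "no", "mia", "mia", "pa", "no"]
print(f"{variation_index(negation):.2f}")  # 0.50 -> moderate internal complexity
```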

    Critical issues in ionospheric data quality and implications for scientific studies

    Ionospheric data are valuable records of the behavior of the ionosphere, solar activity, and the entire Sun-Earth system. The data are critical for both societally important services and scientific investigations of upper atmospheric variability. This work investigates some of the difficulties and pitfalls in maintaining long-term records of geophysical measurements. This investigation focuses on the ionospheric parameters contained in the historical data sets within the National Oceanic and Atmospheric Administration National Geophysical Data Center and Space Physics Interactive Data Resource databases. These archives include data from approximately 100 ionosonde stations worldwide, beginning in the early 1940s. Our study focuses on the quality and consistency of ionosonde data accessible via the primary Space Physics Interactive Data Resource node located within the National Oceanic and Atmospheric Administration National Geophysical Data Center and the World Data Center for Solar-Terrestrial Physics located in Boulder, Colorado. We find that, although the Space Physics Interactive Data Resource archives contained an impressive amount of high-quality data, specific problems existed involving missing and noncontiguous data sets, long-term variations or changes in methodologies and analysis procedures used, and incomplete documentation. The important lessons learned from this investigation are that the data incorporated into an archive must have clear traceability back to the primary source, including scientific validation by the contributors, and that the historical records must have associated metadata that describe relevant nuances in the observations. Although this report only focuses on historical ionosonde data in National Oceanic and Atmospheric Administration databases, we feel that these findings have general applicability to environmental scientists interested in using long-term geophysical data sets for climate and global change research.
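
    One of the checks implied above, detecting noncontiguous stretches in a station's record, can be sketched as follows; the timestamps are invented, and real ionosonde archives use station-specific observation cadences.

```python
# Hedged sketch: flag gaps in a time-ordered observation record that
# exceed the expected cadence.

from datetime import datetime, timedelta

def find_gaps(times: list, cadence: timedelta):
    """Return (start, end) pairs where spacing exceeds the cadence."""
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > cadence]

obs = [datetime(1944, 6, 1, h) for h in (0, 1, 2, 6, 7)]  # 3 hours missing
for start, end in find_gaps(obs, timedelta(hours=1)):
    print(f"gap: {start} -> {end}")
# gap: 1944-06-01 02:00:00 -> 1944-06-01 06:00:00
```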

    Ontology (Science)

    Increasingly, in data-intensive areas of the life sciences, experimental results are being described in algorithmically useful ways with the help of ontologies. Such ontologies are authored and maintained by scientists to support the retrieval, integration and analysis of their data. The proposition to be defended here is that ontologies of this type – the Gene Ontology (GO) being the most conspicuous example – are a _part of science_. Initial evidence for the truth of this proposition (which some will find self-evident) is the increasing recognition of the importance of empirically-based methods of evaluation to the ontology development work being undertaken in support of scientific research. Ontologies created by scientists must, of course, be associated with implementations satisfying the requirements of software engineering. But the ontologies are not themselves engineering artifacts, and to conceive them as such brings grievous consequences. Rather, ontologies such as the GO are in different respects comparable to scientific theories, to scientific databases, and to scientific journal publications. Such a view implies a new conception of what is involved in the authoring, maintenance and application of ontologies in scientific contexts, and therewith also a new approach to the evaluation of ontologies and to the training of ontologists.