1,190 research outputs found

    Recommending Datasets for Scientific Problem Descriptions

    The steadily rising number of datasets is making it increasingly difficult for researchers and practitioners to be aware of all datasets, particularly of the most relevant datasets for a given research problem. To this end, dataset search engines have been proposed. However, they are based on users' keywords and thus have difficulty determining precisely fitting datasets for complex research problems. In this paper, we propose a system that recommends suitable datasets based on a given research problem description. The recommendation task is designed as a domain-specific text classification task. In a comprehensive offline evaluation using various state-of-the-art models, as well as 88,000 paper abstracts and 265,000 citation contexts as research problem descriptions, we obtain an F1-score of 0.75. In an additional user study, we show that users in real-world settings report 88% satisfaction across all test cases. We therefore see promising future directions for dataset recommendation.
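
    Framing recommendation as text classification means each dataset acts as a class label and a problem description is the input text. The sketch below illustrates that framing with a deliberately simple bag-of-words nearest-centroid classifier; it is a swapped-in technique, not one of the state-of-the-art models evaluated in the paper, and the class name, training pairs and labels are hypothetical.

```python
# Hypothetical sketch: a bag-of-words nearest-centroid text classifier,
# NOT one of the state-of-the-art models evaluated in the paper.
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class NearestCentroidRecommender:
    """Maps a research problem description to dataset labels."""

    def __init__(self) -> None:
        self.centroids: dict[str, Counter] = {}

    def fit(self, descriptions: list[str], datasets: list[str]) -> None:
        # Each dataset is represented by the summed term counts of all
        # training descriptions labeled with it (an unnormalized centroid;
        # cosine similarity is scale-invariant, so no averaging is needed).
        for text, label in zip(descriptions, datasets):
            self.centroids.setdefault(label, Counter()).update(tokenize(text))

    def recommend(self, description: str, k: int = 1) -> list[str]:
        # Rank dataset labels by cosine similarity to the query description.
        query = Counter(tokenize(description))
        ranked = sorted(
            self.centroids,
            key=lambda label: cosine(query, self.centroids[label]),
            reverse=True,
        )
        return ranked[:k]


# Invented toy training data: (problem description, dataset used).
train = [
    ("image classification of handwritten digits", "MNIST"),
    ("recognizing handwritten digit images", "MNIST"),
    ("question answering over wikipedia paragraphs", "SQuAD"),
    ("reading comprehension questions and answers", "SQuAD"),
]
rec = NearestCentroidRecommender()
rec.fit([t for t, _ in train], [d for _, d in train])
print(rec.recommend("a model that answers questions about paragraphs"))
```

    Returning the top-k labels rather than a single class matches the recommendation setting, where several candidate datasets can be shown to the user.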

    CLiT: Combining Linking Techniques for Everyone

    Which Publications’ Metadata Are in Which Bibliographic Databases? A System for Exploration

    The choice of databases containing publications' metadata (i.e., bibliographic databases) determines the available publication list of any author and, thus, their public appearance and evaluation. Having all publications listed in the various bibliographic databases is therefore important for researchers. However, the average number of publications a researcher publishes per year is steadily rising, making it labor-intensive and time-consuming for authors to check whether all their publications are listed in all bibliographic databases online. In this paper, we present RefBee, an online system that retrieves the metadata of all publications for a given author from the various bibliographic databases and indicates which publications are missing in which database. Our system is available online at http://refbee.org/ and supports Wikidata, ORCID, Google Scholar, VIAF, DBLP, Dimensions, Microsoft Academic, Semantic Scholar, and DNB/GNB. Our system can not only serve as an assistance tool for more than 4.7 million researchers of any discipline and publication language, but also incentivizes the use and population of Wikidata in the scholarly field.
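
    Determining which publications are missing in which database is, at its core, a set difference per database once records from different sources are keyed consistently. The sketch below assumes that matching by a normalized title is good enough and that the union over all databases is the author's complete list; both are simplifying assumptions, and the function names, database contents and titles are hypothetical rather than RefBee's actual matching logic.

```python
# Hypothetical sketch: not RefBee's implementation. Records are matched by a
# normalized title; real systems would also use DOIs or other identifiers.
import re


def normalize_title(title: str) -> str:
    # Case-fold and drop punctuation/whitespace differences between sources.
    return " ".join(re.findall(r"\w+", title.casefold()))


def missing_by_database(records: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each database to the normalized titles it lacks.

    `records` maps a database name to the titles it lists for one author;
    the union over all databases is treated as the complete publication list.
    """
    keyed = {db: {normalize_title(t) for t in titles} for db, titles in records.items()}
    all_pubs = set().union(*keyed.values())
    return {db: all_pubs - pubs for db, pubs in keyed.items()}


# Invented example records for one author.
records = {
    "Wikidata": ["A Study of Foo", "Bar: A Survey"],
    "DBLP": ["A study of foo."],
    "ORCID": ["Bar -- a survey", "A Study of Foo"],
}
for db, missing in missing_by_database(records).items():
    print(db, sorted(missing))
```

    Normalization is the fragile step: differing punctuation, casing or subtitles across sources would otherwise make the same paper look missing.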

    Applied tracers for the observation of subsurface stormflow at the hillslope scale

    Rainfall-runoff response in temperate humid headwater catchments is mainly controlled by hydrological processes at the hillslope scale. Applied tracer experiments with fluorescent dye and salt tracers are well-known tools in groundwater studies at the large scale and in vadose zone studies at the plot scale, where they provide a means to characterise subsurface flow. We extend this approach to the hillslope scale to investigate saturated and unsaturated flow paths jointly at a forested hillslope in the Austrian Alps. Dye staining experiments at the plot scale revealed that cracks and soil pipes function as preferential flow paths in the fine-textured soils of the study area, and these preferential flow structures were active in fast subsurface transport of tracers at the hillslope scale. Breakthrough curves obtained under steady flow conditions could be fitted well with a one-dimensional convection-dispersion model. Under natural rainfall, tracer concentrations correlated positively with the transient flows. The results of this study demonstrate qualitative and quantitative effects of preferential flow features on subsurface stormflow in a temperate humid headwater catchment. At the hillslope scale, the interactions of structures and processes turn out to be intrinsically complex, which implies that modelling such a hillslope satisfactorily requires detailed investigations of effective structures and parameters at the scale of interest.
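
    Fitting a breakthrough curve to a one-dimensional convection-dispersion model means estimating a flow velocity v and a dispersion coefficient D from concentrations observed at some distance x. The abstract does not state the boundary conditions or fitting method, so the sketch below assumes an instantaneous (pulse) injection into an infinite 1-D domain, whose analytical solution is C(x, t) = M / (2 sqrt(pi D t)) exp(-(x - v t)^2 / (4 D t)), and fits it by grid search. The function names, grids and synthetic parameter values are hypothetical.

```python
# Hypothetical sketch: pulse injection, infinite 1-D domain, grid search.
# The paper's actual boundary conditions and fitting procedure are not given.
import math


def pulse_cde(t: float, x: float, v: float, D: float, M: float = 1.0) -> float:
    # Analytical solution of dC/dt = D d2C/dx2 - v dC/dx for a Dirac pulse
    # of (area-normalized) mass M injected at x = 0, t = 0.
    if t <= 0:
        return 0.0
    return M / (2.0 * math.sqrt(math.pi * D * t)) * math.exp(-((x - v * t) ** 2) / (4.0 * D * t))


def fit_cde(times, conc, x, v_grid, D_grid):
    """Return (sse, v, D, M) minimizing the sum of squared errors.

    For fixed v and D the model is linear in M, so M is solved in closed
    form by least squares instead of being searched on a grid.
    """
    best = None
    for v in v_grid:
        for D in D_grid:
            g = [pulse_cde(t, x, v, D) for t in times]  # model shape with M = 1
            gg = sum(gi * gi for gi in g)
            if gg == 0.0:
                continue
            M = sum(ci * gi for ci, gi in zip(conc, g)) / gg
            sse = sum((ci - M * gi) ** 2 for ci, gi in zip(conc, g))
            if best is None or sse < best[0]:
                best = (sse, v, D, M)
    return best


# Synthetic, noise-free breakthrough curve with invented parameters.
x = 10.0                                  # m, injection-to-sampling distance
times = [0.5 * k for k in range(1, 81)]   # h
true_v, true_D, true_M = 1.0, 0.5, 3.0    # m/h, m^2/h, arbitrary units
conc = [pulse_cde(t, x, true_v, true_D, true_M) for t in times]

v_grid = [0.1 * k for k in range(5, 21)]  # 0.5 .. 2.0 m/h
D_grid = [0.1 * k for k in range(1, 11)]  # 0.1 .. 1.0 m^2/h
sse, v, D, M = fit_cde(times, conc, x, v_grid, D_grid)
print(round(v, 2), round(D, 2), round(M, 2))
```

    On field data, a gradient-based optimizer or a finer grid would replace the coarse search, and measurement noise would leave a nonzero residual.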

    Towards Scalable Real-time Analytics: An Architecture for Scale-out of OLxP Workloads

    We present an overview of our work on the SAP HANA Scale-out Extension, a novel distributed database architecture designed to support large-scale analytics over real-time data. This platform permits high-performance OLAP with massive scale-out capabilities, while concurrently allowing OLTP workloads. This dual capability enables analytics over real-time changing data and allows fine-grained user-specified service level agreements (SLAs) on data freshness. We advocate the decoupling of core database components such as query processing, concurrency control, and persistence, a design choice made possible by advances in high-throughput, low-latency networks and storage devices. We provide full ACID guarantees and build on a logical timestamp mechanism to provide MVCC-based snapshot isolation, while not requiring synchronous updates of replicas. Instead, we use asynchronous update propagation, guaranteeing consistency with timestamp validation. We provide a view into the design and development of a large-scale data management platform for real-time analytics, driven by the needs of modern enterprise customers.
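
    MVCC-based snapshot isolation on logical timestamps can be illustrated in a few lines: every commit receives a new timestamp, writes are appended as new versions, and a reader sees the newest version whose commit timestamp does not exceed its snapshot. The sketch below is a single-node illustration of the general technique only; it does not model the distribution, replica handling or asynchronous update propagation described above, and the class and method names are hypothetical.

```python
# Hypothetical single-node sketch of MVCC snapshot isolation; not the
# SAP HANA Scale-out Extension design.
from bisect import bisect_right


class MVCCStore:
    def __init__(self) -> None:
        # key -> list of (commit_timestamp, value), in increasing timestamp order
        self._versions: dict[str, list[tuple[int, object]]] = {}
        self.last_commit = 0  # logical clock: timestamp of the latest commit

    def begin(self) -> int:
        # A transaction's snapshot is the latest committed timestamp.
        return self.last_commit

    def commit(self, snapshot: int, writes: dict[str, object]) -> int:
        # First-committer-wins: abort if another transaction committed a
        # newer version of any written key after this snapshot was taken.
        for key in writes:
            versions = self._versions.get(key)
            if versions and versions[-1][0] > snapshot:
                raise RuntimeError(f"write-write conflict on {key!r}")
        ts = self.last_commit + 1
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((ts, value))
        self.last_commit = ts
        return ts

    def read(self, key: str, snapshot: int):
        # Newest version with commit timestamp <= snapshot, else None.
        versions = self._versions.get(key, [])
        idx = bisect_right([ts for ts, _ in versions], snapshot)
        return versions[idx - 1][1] if idx else None


store = MVCCStore()
store.commit(store.begin(), {"balance": 100})
snap = store.begin()                            # old snapshot
store.commit(store.begin(), {"balance": 150})
print(store.read("balance", snap))              # old snapshot still sees 100
print(store.read("balance", store.begin()))     # new snapshot sees 150

conflict = False
try:
    store.commit(snap, {"balance": 0})          # writer on a stale snapshot
except RuntimeError:
    conflict = True
```

    Readers never block writers here because old versions stay readable; only concurrent writers to the same key conflict.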

    The OpenCitations Data Model

    A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with different nuances, generates inconsistencies in data. Adoption of a single data model would facilitate data integration tasks regardless of the data supplier or application context. In this paper, we present the OpenCitations Data Model (OCDM), a generic data model for describing bibliographic entities and citations, developed using Semantic Web technologies. We also evaluate the effective reusability of OCDM according to ontology evaluation practices, mention existing users of OCDM, and discuss the use and impact of OCDM in the wider open science community. Comment: ISWC 2020 Conference proceedings

    ProofWatch: Watchlist Guidance for Large Theories in E

    The watchlist (also called hint list) is a mechanism that allows related proofs to guide the proof search for a new conjecture. This mechanism has been used with the Otter and Prover9 theorem provers, both for interactive formalizations and for human-assisted proving of open conjectures in small theories. In this work, we explore the use of watchlists in large theories coming from first-order translations of large ITP libraries, aiming to improve hammer-style automation through smarter internal guidance of the ATP systems. In particular, we (i) design watchlist-based clause evaluation heuristics inside the E ATP system, and (ii) develop new proof-guiding algorithms that load many previous proofs into the ATP and focus the proof search using a dynamically updated notion of proof matching. The methods are evaluated on a large set of problems coming from the Mizar library, showing significant improvement over E's standard portfolio of strategies, and also over the previous best set of strategies invented for Mizar by evolutionary methods. Comment: 19 pages, 10 tables, submitted to ITP 2018 at FLO
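
    The idea of a dynamically updated notion of proof matching can be pictured as follows: each loaded proof contributes watchlist clauses, a newly generated clause that subsumes one of them counts as a match, and the fraction of a proof's clauses matched so far raises the priority of further matching clauses. The sketch below is a propositional toy under loudly stated assumptions: clauses are sets of ground literals, subsumption is plain set inclusion (no unification), and the weight formula is invented for illustration rather than taken from E or ProofWatch. The class name and example proofs are hypothetical.

```python
# Hypothetical propositional toy: set-inclusion subsumption, invented weight
# formula. E/ProofWatch operate on first-order clauses with real subsumption.


def subsumes(c: frozenset, d: frozenset) -> bool:
    # Propositional subsumption: every literal of c occurs in d.
    return c <= d


class Watchlist:
    def __init__(self, proofs: dict[str, list[frozenset]]) -> None:
        self.proofs = proofs                         # proof name -> its clauses
        self.matched = {name: set() for name in proofs}

    def progress(self, name: str) -> float:
        # Fraction of a previous proof's clauses matched so far.
        total = len(self.proofs[name])
        return len(self.matched[name]) / total if total else 0.0

    def weight(self, clause: frozenset) -> float:
        """Lower weight = selected earlier. Evaluating a clause also records
        which watchlist clauses it matches, so proof progress updates."""
        base = float(len(clause))
        best = None
        for name, watch in self.proofs.items():
            hits = {w for w in watch if subsumes(clause, w)}
            if hits:
                self.matched[name] |= hits
                p = self.progress(name)
                best = p if best is None else max(best, p)
        if best is None:
            return base
        # Invented bonus: halve the weight of matching clauses, then divide
        # further by (1 + best progress among the proofs they match).
        return base * 0.5 / (1.0 + best)


wl = Watchlist({
    "proof1": [frozenset({"p"}), frozenset({"~p", "q"}), frozenset({"q"})],
    "proof2": [frozenset({"r", "s"})],
})
w_none = wl.weight(frozenset({"a", "b"}))  # matches nothing
w_q = wl.weight(frozenset({"q"}))          # matches 2 of 3 clauses of proof1
w_p = wl.weight(frozenset({"p"}))          # completes proof1
print(w_none, w_q, w_p, wl.progress("proof1"))
```

    The effect to notice is that the same kind of match becomes more valuable as a proof nears completion, which is what focuses the search on proofs that appear to be "coming true".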

    Conservation of core complex subunits shaped the structure and function of photosystem I in the secondary endosymbiont alga Nannochloropsis gaditana

    Photosystem I (PSI) is a pigment-protein complex catalyzing the light-driven electron transport from plastocyanin to ferredoxin in oxygenic photosynthetic organisms. Several PSI subunits are highly conserved in cyanobacteria, algae and plants, whereas others are differentially distributed among organisms. Here we characterized the structural and functional properties of PSI purified from the heterokont alga Nannochloropsis gaditana, showing that it is organized as a supercomplex including a core complex and an outer antenna, as in plants and other eukaryotic algae. Unlike in all other known organisms, the N. gaditana PSI supercomplex contains five peripheral antenna proteins, identified by proteome analysis as type-R light-harvesting complexes (LHCr4-8). Two antenna subunits are bound in a conserved position, as in plant PSI, whereas three additional antennae are associated with the core on the opposite side. This peculiar antenna association correlates with the presence of PsaF/J and the absence of PsaH, G and K in the N. gaditana genome and proteome. Excitation energy transfer in the supercomplex is highly efficient, leading to a very high trapping efficiency, as observed in the PSI of all other eukaryotes. This shows that although the supramolecular organization of PSI changed during evolution, fundamental functional properties such as trapping efficiency were maintained.

    Canonicalizing Knowledge Base Literals

    Ontology-based knowledge bases (KBs) like DBpedia are very valuable resources, but their usefulness and usability are limited by various quality issues. One such issue is the use of string literals instead of semantically typed entities. In this paper, we study the automated canonicalization of such literals, i.e., replacing the literal with an existing entity from the KB or with a new entity that is typed using classes from the KB. We propose a framework that combines reasoning and machine learning to predict the relevant entities and types, and we evaluate this framework against state-of-the-art baselines for both semantic typing and entity matching.
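
    The two outcomes of canonicalization, linking a literal to an existing entity or minting a new entity with predicted types, can be shown with a naive baseline. The sketch below is not the paper's reasoning-plus-machine-learning framework: it assumes entity matching by token Jaccard similarity against entity labels (with an arbitrary threshold of 0.5) and predicts the type of a new entity as the most frequent type among existing objects of the same property. All entity IDs, labels, properties and the `new:` prefix are hypothetical.

```python
# Hypothetical baseline, not the paper's framework: Jaccard label matching
# plus "most common type among objects of the same property" typing.
import re
from collections import Counter


def tokens(s: str) -> set[str]:
    return set(re.findall(r"\w+", s.casefold()))


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def canonicalize(literal, prop, labels, types, triples, threshold=0.5):
    """Return (entity_id, is_new, types_for_new_entity).

    labels:  entity id -> label
    types:   entity id -> set of class names
    triples: (subject, property, object_entity) facts already in the KB
    """
    lit = tokens(literal)
    best_id, best_score = None, 0.0
    for eid, label in labels.items():
        score = jaccard(lit, tokens(label))
        if score > best_score:
            best_id, best_score = eid, score
    if best_id is not None and best_score >= threshold:
        return best_id, False, set()
    # No good match: mint a new entity typed like other objects of `prop`.
    counts = Counter(t for _, p, o in triples if p == prop for t in types.get(o, ()))
    predicted = {counts.most_common(1)[0][0]} if counts else set()
    new_id = "new:" + "_".join(re.findall(r"\w+", literal))
    return new_id, True, predicted


labels = {"ex:Berlin": "Berlin", "ex:Paris": "Paris",
          "ex:TUM": "Technical University of Munich"}
types = {"ex:Berlin": {"City"}, "ex:Paris": {"City"}, "ex:TUM": {"University"}}
triples = [("ex:Alice", "birthPlace", "ex:Berlin"),
           ("ex:Bob", "birthPlace", "ex:Paris")]
print(canonicalize("berlin", "birthPlace", labels, types, triples))
print(canonicalize("Ulm", "birthPlace", labels, types, triples))
```

    Pure string similarity ignores schema constraints such as property ranges, which is the kind of gap a reasoning component is meant to close.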

    Requirements Analysis for an Open Research Knowledge Graph

    Current science communication has a number of drawbacks and bottlenecks which have recently been the subject of discussion: among others, the rising number of published articles makes it nearly impossible to get an overview of the state of the art in a certain field, and reproducibility is hampered by fixed-length, document-based publications, which normally cannot cover all details of a research work. Recently, several initiatives have proposed knowledge graphs (KGs) for organising scientific information as a solution to many of the current issues. The focus of these proposals is, however, usually restricted to very specific use cases. In this paper, we aim to transcend this limited perspective by presenting a comprehensive analysis of requirements for an Open Research Knowledge Graph (ORKG) by (a) collecting daily core tasks of a scientist, (b) establishing their consequential requirements for a KG-based system, and (c) identifying overlaps and specificities, and their coverage in current solutions. As a result, we map necessary and desirable requirements for successful KG-based science communication, derive implications and outline possible solutions. Comment: Accepted for publishing in 24th International Conference on Theory and Practice of Digital Libraries, TPDL 202