1,199 research outputs found
Recommending Datasets for Scientific Problem Descriptions
The steadily rising number of datasets is making it increasingly difficult for researchers and practitioners to be aware of all datasets, particularly of the most relevant datasets for a given research problem. To this end, dataset search engines have been proposed. However, they are based on users' keywords and, thus, have difficulty determining precisely fitting datasets for complex research problems. In this paper, we propose a system that recommends suitable datasets based on a given research problem description. The recommendation task is designed as a domain-specific text classification task. As shown in a comprehensive offline evaluation using various state-of-the-art models, as well as 88,000 paper abstracts and 265,000 citation contexts as research problem descriptions, we obtain an F1-score of 0.75. In an additional user study, we show that users in real-world settings report a satisfaction rate of 88% across all test cases. We therefore see promising future directions for dataset recommendation.
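The abstract frames dataset recommendation as a domain-specific text classification problem. As a rough illustration of that framing only (not the authors' actual models or data), the following sketch trains a simple TF-IDF plus logistic-regression classifier that maps a research problem description to a dataset label; all descriptions and labels are invented examples.

```python
# Minimal sketch of dataset recommendation framed as text classification
# (a plain TF-IDF + linear-model baseline; the paper's actual
# state-of-the-art models and training data are not reproduced here).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: problem descriptions labelled with dataset names.
problem_descriptions = [
    "entity linking of mentions in news articles to a knowledge base",
    "question answering over structured tabular data",
    "citation recommendation for scientific manuscripts",
]
dataset_labels = ["AIDA-CoNLL", "WikiTableQuestions", "unarXive"]

classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
classifier.fit(problem_descriptions, dataset_labels)

# Recommend a dataset for a new research problem description.
query = "linking named entities in web text to Wikipedia"
print(classifier.predict([query])[0])
```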
Which Publications' Metadata Are in Which Bibliographic Databases? A System for Exploration
The choice of databases containing publications' metadata (i.e., bibliographic databases) determines the available publication list of any author and, thus, their public appearance and evaluation. Having all publications listed in the various bibliographic databases is therefore important for researchers. However, the average number of publications a researcher publishes per year is steadily rising, making it labor-intensive and time-consuming for authors to investigate whether all their publications are listed in all bibliographic databases online. In this paper, we present RefBee, an online system that retrieves the metadata of all publications for a given author from the various bibliographic databases and indicates which publications are missing in which database. Our system is available online at http://refbee.org/ and supports Wikidata, ORCID, Google Scholar, VIAF, DBLP, Dimensions, Microsoft Academic, Semantic Scholar, and DNB/GNB. Our system not only serves as an assistance tool for more than 4.7 million researchers of any discipline and publication language, but also incentivizes the usage and population of Wikidata in the scholarly field.
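The abstract does not describe RefBee's retrieval internals, but one of the listed sources, Wikidata, exposes a public SPARQL endpoint. The sketch below shows how publications of an author could be looked up there by ORCID iD; the ORCID value is a placeholder and the query is only an illustration of such a lookup, not RefBee's code.

```python
# Illustrative only: fetch publications of an author from Wikidata, one of the
# bibliographic sources listed in the abstract. RefBee's actual retrieval and
# matching logic is not described there; this merely shows one such lookup.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
ORCID = "0000-0002-1825-0097"  # placeholder ORCID iD

query = """
SELECT ?work ?title WHERE {
  ?author wdt:P496 "%s" .        # P496 = ORCID iD
  ?work   wdt:P50  ?author .     # P50  = author
  ?work   wdt:P1476 ?title .     # P1476 = title
}
""" % ORCID

response = requests.get(
    SPARQL_ENDPOINT,
    params={"query": query, "format": "json"},
    headers={"User-Agent": "refbee-sketch/0.1"},
    timeout=30,
)
for row in response.json()["results"]["bindings"]:
    print(row["work"]["value"], "-", row["title"]["value"])
```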
Applied tracers for the observation of subsurface stormflow at the hillslope scale
Rainfall-runoff response in temperate humid headwater catchments is mainly controlled by hydrological processes at the hillslope scale. Applied tracer experiments with fluorescent dye and salt tracers are well known tools in groundwater studies at the large scale and vadose zone studies at the plot scale, where they provide a means to characterise subsurface flow. We extend this approach to the hillslope scale to investigate saturated and unsaturated flow paths concertedly at a forested hillslope in the Austrian Alps. Dye staining experiments at the plot scale revealed that cracks and soil pipes function as preferential flow paths in the fine-textured soils of the study area, and these preferential flow structures were active in fast subsurface transport of tracers at the hillslope scale. Breakthrough curves obtained under steady flow conditions could be fitted well to a one-dimensional convection-dispersion model. Under natural rainfall a positive correlation of tracer concentrations to the transient flows was observed. The results of this study demonstrate qualitative and quantitative effects of preferential flow features on subsurface stormflow in a temperate humid headwater catchment. It turns out that, at the hillslope scale, the interactions of structures and processes are intrinsically complex, which implies that attempts to model such a hillslope satisfactorily require detailed investigations of effective structures and parameters at the scale of interest.
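The abstract reports fitting breakthrough curves to a one-dimensional convection-dispersion model. As a hedged sketch of what such a fit can look like, the following code fits the standard analytical solution for an instantaneous injection to a synthetic breakthrough curve with SciPy; the distance, times, and parameter values are invented and are not the study's data.

```python
# A minimal sketch of fitting a tracer breakthrough curve to the 1-D
# convection-dispersion equation for an instantaneous injection.
# Numbers are synthetic; the study's data and fitting code are not shown here.
import numpy as np
from scipy.optimize import curve_fit

X = 20.0  # assumed observation distance downslope of the injection [m]

def cde_pulse(t, M, v, D):
    """Concentration at distance X for an instantaneous 1-D injection:
    C(t) = M / sqrt(4*pi*D*t) * exp(-(X - v*t)**2 / (4*D*t)),
    with M the injected mass per unit cross-sectional area."""
    t = np.asarray(t, dtype=float)
    return M / np.sqrt(4 * np.pi * D * t) * np.exp(-(X - v * t) ** 2 / (4 * D * t))

# Synthetic breakthrough curve (time in hours, concentration in mg/l).
t_obs = np.array([1, 2, 3, 4, 5, 6, 8, 10, 12, 16], dtype=float)
c_obs = cde_pulse(t_obs, M=50.0, v=4.0, D=6.0)
c_obs += np.random.default_rng(0).normal(0.0, 0.02, c_obs.size)

params, _ = curve_fit(cde_pulse, t_obs, c_obs,
                      p0=[30.0, 3.0, 3.0], bounds=(0, np.inf))
print("fitted M, v, D:", params)
```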
Towards Scalable Real-time Analytics: An Architecture for Scale-out of OLxP Workloads
We present an overview of our work on the SAP HANA Scale-out Extension, a novel distributed database architecture designed to support large-scale analytics over real-time data. This platform permits high-performance OLAP with massive scale-out capabilities while concurrently allowing OLTP workloads. This dual capability enables analytics over real-time changing data and allows fine-grained, user-specified service level agreements (SLAs) on data freshness. We advocate the decoupling of core database components such as query processing, concurrency control, and persistence, a design choice made possible by advances in high-throughput, low-latency networks and storage devices. We provide full ACID guarantees and build on a logical timestamp mechanism to provide MVCC-based snapshot isolation without requiring synchronous updates of replicas. Instead, we use asynchronous update propagation, guaranteeing consistency through timestamp validation. We provide a view into the design and development of a large-scale data management platform for real-time analytics, driven by the needs of modern enterprise customers.
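Two of the ideas mentioned, timestamp-based MVCC snapshot isolation and freshness guarantees over asynchronously updated replicas, can be illustrated with a toy visibility and lag check. The sketch below is a simplification for intuition only and does not reflect SAP HANA's actual implementation.

```python
# Toy sketch: MVCC snapshot visibility via logical commit timestamps, and a
# data-freshness SLA check on an asynchronously updated replica.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    value: str
    created_ts: int              # logical commit timestamp of the creating txn
    deleted_ts: Optional[int]    # logical commit timestamp of the deleting txn

def visible(version: RowVersion, snapshot_ts: int) -> bool:
    """A version is visible to a snapshot if it was committed at or before the
    snapshot timestamp and not yet deleted at that timestamp."""
    created = version.created_ts <= snapshot_ts
    not_deleted = version.deleted_ts is None or version.deleted_ts > snapshot_ts
    return created and not_deleted

def replica_can_serve(replica_applied_ts: int, primary_commit_ts: int,
                      max_lag: int) -> bool:
    """Freshness SLA: the replica may answer a query only if it has applied
    all updates up to within `max_lag` logical ticks of the primary."""
    return primary_commit_ts - replica_applied_ts <= max_lag

versions = [RowVersion("v1", created_ts=10, deleted_ts=40),
            RowVersion("v2", created_ts=40, deleted_ts=None)]
print([v.value for v in versions if visible(v, snapshot_ts=25)])             # ['v1']
print(replica_can_serve(replica_applied_ts=95, primary_commit_ts=100, max_lag=10))  # True
```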
The OpenCitations Data Model
A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with different nuances, generates inconsistencies in data. Adoption of a single data model would facilitate data integration tasks regardless of the data supplier or context application. In this paper we present the OpenCitations Data Model (OCDM), a generic data model for describing bibliographic entities and citations, developed using Semantic Web technologies. We also evaluate the effective reusability of OCDM according to ontology evaluation practices, mention existing users of OCDM, and discuss the use and impact of OCDM in the wider open science community.
Comment: ISWC 2020 conference proceedings
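As a rough illustration of the kind of Semantic Web description the abstract refers to, the sketch below expresses a citation between two bibliographic entities with rdflib, using the SPAR ontologies (FaBiO and CiTO) on which OCDM builds. The identifiers are placeholders, and the exact classes and properties prescribed by OCDM should be taken from the model's documentation.

```python
# Illustrative RDF description of two bibliographic entities and a citation
# between them, using SPAR namespaces (FaBiO, CiTO). This is an approximation,
# not the normative OCDM shapes.
from rdflib import Graph, Namespace, URIRef, Literal, RDF

FABIO = Namespace("http://purl.org/spar/fabio/")
CITO = Namespace("http://purl.org/spar/cito/")
DCTERMS = Namespace("http://purl.org/dc/terms/")

g = Graph()
g.bind("fabio", FABIO)
g.bind("cito", CITO)
g.bind("dcterms", DCTERMS)

citing = URIRef("https://example.org/br/1")   # hypothetical identifiers
cited = URIRef("https://example.org/br/2")

g.add((citing, RDF.type, FABIO.JournalArticle))
g.add((citing, DCTERMS.title, Literal("The OpenCitations Data Model")))
g.add((cited, RDF.type, FABIO.JournalArticle))
g.add((citing, CITO.cites, cited))

print(g.serialize(format="turtle"))
```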
ProofWatch: Watchlist Guidance for Large Theories in E
A watchlist (also hint list) is a mechanism that allows related proofs to guide a proof search for a new conjecture. This mechanism has been used with the Otter and Prover9 theorem provers, both for interactive formalizations and for human-assisted proving of open conjectures in small theories. In this work we explore the use of watchlists in large theories coming from first-order translations of large ITP libraries, aiming at improving hammer-style automation by smarter internal guidance of the ATP systems. In particular, we (i) design watchlist-based clause evaluation heuristics inside the E ATP system, and (ii) develop new proof-guiding algorithms that load many previous proofs inside the ATP and focus the proof search using a dynamically updated notion of proof matching. The methods are evaluated on a large set of problems coming from the Mizar library, showing significant improvement over E's standard portfolio of strategies, and also over the previous best set of strategies invented for Mizar by evolutionary methods.
Comment: 19 pages, 10 tables, submitted to ITP 2018 at FLoC
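The core idea of watchlist guidance can be illustrated with a toy clause-evaluation function: clauses generated during the search receive a better (lower) weight when they match clauses harvested from previous proofs. The sketch below uses plain string equality instead of the subsumption checks and per-proof completion ratios used in E, so it is only an approximation of the technique.

```python
# Toy illustration of watchlist-based clause guidance: generated clauses are
# ranked more favourably when they occur on a watchlist of clauses taken from
# previously found proofs. Real systems match by clause subsumption; plain
# string equality is used here only to keep the sketch short.
watchlist = {                      # clauses harvested from related proofs
    "subset(X,X)",
    "member(X,singleton(X))",
    "subset(empty_set,X)",
}

def clause_weight(clause: str, base_weight: float) -> float:
    """Prefer clauses that occur on the watchlist by scaling their weight down."""
    return base_weight * 0.1 if clause in watchlist else base_weight

generated = [("subset(X,X)", 12.0), ("member(Y,union(A,B))", 9.0)]
ranked = sorted(generated, key=lambda cw: clause_weight(cw[0], cw[1]))
print(ranked[0][0])   # the watchlist clause is selected first
```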
Conservation of core complex subunits shaped the structure and function of photosystem I in the secondary endosymbiont alga Nannochloropsis gaditana
Photosystem I (PSI) is a pigment-protein complex catalyzing the light-driven electron transport from plastocyanin to ferredoxin in oxygenic photosynthetic organisms. Several PSI subunits are highly conserved in cyanobacteria, algae and plants, whereas others are distributed differentially among the various organisms. Here we characterized the structural and functional properties of PSI purified from the heterokont alga Nannochloropsis gaditana, showing that it is organized as a supercomplex including a core complex and an outer antenna, as in plants and other eukaryotic algae. In contrast to all other known organisms, the N. gaditana PSI supercomplex contains five peripheral antenna proteins, identified by proteome analysis as type-R light-harvesting complexes (LHCr4-8). Two antenna subunits are bound in a conserved position, as in plant PSI, whereas three additional antennae are associated with the core on the other side. This peculiar antenna association correlates with the presence of PsaF/J and the absence of PsaH, G and K in the N. gaditana genome and proteome. Excitation energy transfer in the supercomplex is highly efficient, leading to a very high trapping efficiency, as observed in all other eukaryotic PSI complexes, showing that although the supramolecular organization of PSI changed during evolution, fundamental functional properties such as trapping efficiency were maintained.
Canonicalizing Knowledge Base Literals
Ontology-based knowledge bases (KBs) like DBpedia are very valuable resources, but their usefulness and usability are limited by various quality issues. One such issue is the use of string literals instead of semantically typed entities. In this paper we study the automated canonicalization of such literals, i.e., replacing a literal with an existing entity from the KB or with a new entity that is typed using classes from the KB. We propose a framework that combines both reasoning and machine learning in order to predict the relevant entities and types, and we evaluate this framework against state-of-the-art baselines for both semantic typing and entity matching.
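As a rough, hedged illustration of the canonicalization task itself (not the paper's reasoning-plus-machine-learning framework), the sketch below matches a string literal to an existing KB entity by label similarity and otherwise mints a new, typed entity; the KB contents, prefixes, and threshold are invented.

```python
# Highly simplified sketch of literal canonicalization: match a string literal
# to an existing KB entity by label similarity; if no candidate is good enough,
# mint a new entity with a (here hard-coded) fallback type. The paper combines
# reasoning and machine learning for these steps; this toy uses string
# similarity only.
from difflib import SequenceMatcher

kb_entities = {                 # hypothetical KB: label -> (entity IRI, type)
    "Berlin": ("ex:Berlin", "ex:City"),
    "Barack Obama": ("ex:Barack_Obama", "ex:Person"),
}

def canonicalize(literal: str, threshold: float = 0.8):
    best_label, best_score = None, 0.0
    for label in kb_entities:
        score = SequenceMatcher(None, literal.lower(), label.lower()).ratio()
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return kb_entities[best_label]                        # reuse existing entity
    return (f"ex:{literal.replace(' ', '_')}", "ex:Thing")    # new typed entity

print(canonicalize("berlin"))         # ('ex:Berlin', 'ex:City')
print(canonicalize("Mount Everest"))  # ('ex:Mount_Everest', 'ex:Thing')
```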
Requirements Analysis for an Open Research Knowledge Graph
Current science communication has a number of drawbacks and bottlenecks which have been the subject of discussion lately: among others, the rising number of published articles makes it nearly impossible to get an overview of the state of the art in a certain field, and reproducibility is hampered by fixed-length, document-based publications which normally cannot cover all details of a research work. Recently, several initiatives have proposed knowledge graphs (KGs) for organising scientific information as a solution to many of the current issues. The focus of these proposals is, however, usually restricted to very specific use cases. In this paper, we aim to transcend this limited perspective by presenting a comprehensive analysis of requirements for an Open Research Knowledge Graph (ORKG) by (a) collecting daily core tasks of a scientist, (b) establishing their consequential requirements for a KG-based system, and (c) identifying overlaps and specificities, and their coverage in current solutions. As a result, we map necessary and desirable requirements for successful KG-based science communication, derive implications, and outline possible solutions.
Comment: Accepted for publication in the 24th International Conference on Theory and Practice of Digital Libraries, TPDL 2020
- …