SemTab 2019: Resources to Benchmark Tabular Data to Knowledge Graph Matching Systems
Tabular data to Knowledge Graph matching is the process of assigning semantic tags from knowledge graphs (e.g., Wikidata or DBpedia) to the elements of a table. This task is challenging for various reasons, including the lack of metadata (e.g., table and column names) and the noisiness, heterogeneity, incompleteness and ambiguity of the data. The results of this task provide significant insights about potentially highly valuable tabular data, as recent works have shown, enabling a new family of data analytics and data science applications. Despite a significant amount of work on various flavors of this problem, there is a lack of a common framework to conduct a systematic evaluation of state-of-the-art systems. The Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) was created to fill this gap. In this paper, we report on the datasets, infrastructure and lessons learned from the first edition of the SemTab challenge.
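The cell-to-entity part of this matching task can be illustrated with a minimal sketch. The toy knowledge graph, the exact-match scoring, and the majority-vote column typing below are illustrative assumptions, not the approach of any particular SemTab system.

```python
# Minimal sketch of matching table cells to a toy knowledge graph.
# TOY_KG and the exact-match heuristic are illustrative assumptions.
from collections import Counter

TOY_KG = {
    "Q90": {"label": "Paris", "type": "city"},
    "Q142": {"label": "France", "type": "country"},
    "Q64": {"label": "Berlin", "type": "city"},
}

def normalize(text):
    """Lower-case and strip whitespace to reduce surface mismatches."""
    return text.strip().lower()

def match_cell(cell):
    """Return the KG ids whose label exactly matches the normalized cell."""
    key = normalize(cell)
    return [qid for qid, e in TOY_KG.items() if normalize(e["label"]) == key]

def annotate_column(cells):
    """Assign a column the majority type of its matched entities,
    mimicking the column-type annotation subtask."""
    types = Counter()
    for cell in cells:
        for qid in match_cell(cell):
            types[TOY_KG[qid]["type"]] += 1
    return types.most_common(1)[0][0] if types else None
```

Real systems replace the exact-match lookup with fuzzy label search against Wikidata or DBpedia and disambiguate using column-level context; the sketch only shows where noise and ambiguity in the cells enter the problem.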
On-the-fly Table Generation
Many information needs revolve around entities, which would be better
answered by summarizing results in a tabular format, rather than presenting
them as a ranked list. Unlike previous work, which is limited to retrieving
existing tables, we aim to answer queries by automatically compiling a table in
response to a query. We introduce and address the task of on-the-fly table
generation: given a query, generate a relational table that contains relevant
entities (as rows) along with their key properties (as columns). This problem
is decomposed into three specific subtasks: (i) core column entity ranking,
(ii) schema determination, and (iii) value lookup. We employ a feature-based
approach for entity ranking and schema determination, combining deep semantic
features with task-specific signals. We further show that these two subtasks
are not independent of each other and can assist each other in an iterative
manner. For value lookup, we combine information from existing tables and a
knowledge base. Using two sets of entity-oriented queries, we evaluate our
approach both on the component level and on the end-to-end table generation
task.
Comment: The 41st International ACM SIGIR Conference on Research and
Development in Information Retrieval
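The three-subtask decomposition can be sketched end to end. The toy entity store and the term-overlap ranking below are illustrative assumptions; the paper itself uses feature-based models combining deep semantic features with task-specific signals.

```python
# Hedged sketch of the three subtasks of on-the-fly table generation:
# (i) core column entity ranking, (ii) schema determination, (iii) value
# lookup. The toy data and scoring are illustrative assumptions.

ENTITY_DESCRIPTIONS = {
    "Einstein": "physicist relativity nobel",
    "Curie": "physicist chemist nobel radioactivity",
    "Austen": "novelist english literature",
}
ENTITY_PROPS = {
    "Einstein": {"field": "physics", "born": "1879"},
    "Curie": {"field": "physics", "born": "1867"},
    "Austen": {"field": "literature", "born": "1775"},
}

def rank_entities(query, k=2):
    """Subtask (i): rank core-column entities by term overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(
        ENTITY_DESCRIPTIONS,
        key=lambda e: len(terms & set(ENTITY_DESCRIPTIONS[e].split())),
        reverse=True,
    )
    return scored[:k]

def determine_schema(entities):
    """Subtask (ii): keep the properties shared by all selected entities."""
    common = set.intersection(*(set(ENTITY_PROPS[e]) for e in entities))
    return sorted(common)

def lookup_values(entities, schema):
    """Subtask (iii): fill each cell from the property store."""
    return [[ENTITY_PROPS[e].get(col, "") for col in schema] for e in entities]

def generate_table(query):
    rows = rank_entities(query)
    schema = determine_schema(rows)
    return rows, schema, lookup_values(rows, schema)
```

The paper's key observation, that entity ranking and schema determination can assist each other iteratively, would correspond here to re-ranking entities using the chosen schema and then re-deriving the schema, which the linear sketch omits.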
A line-binned treatment of opacities for the spectra and light curves from neutron star mergers
The electromagnetic observations of GW170817 were able to dramatically
increase our understanding of neutron star mergers beyond what we learned from
gravitational waves alone. These observations provided insight on all aspects
of the merger from the nature of the gamma-ray burst to the characteristics of
the ejected material. The ejecta of neutron star mergers are expected to
produce such electromagnetic transients, called kilonovae or macronovae.
Characteristics of the ejecta include large velocity gradients, relative to
supernovae, and the presence of heavy r-process elements, which pose
significant challenges to the accurate calculation of radiative opacities and
radiation transport. For example, these opacities include a dense forest of
bound-bound features arising from near-neutral lanthanide and actinide
elements. Here we investigate the use of fine-structure, line-binned opacities
that preserve the integral of the opacity over frequency. Advantages of this
area-preserving approach over the traditional expansion-opacity formalism
include the ability to pre-calculate opacity tables that are independent of the
type of hydrodynamic expansion and that eliminate the computational expense of
calculating opacities within radiation-transport simulations. Tabular opacities
are generated for all 14 lanthanides as well as a representative actinide
element, uranium. We demonstrate that spectral simulations produced with the
line-binned opacities agree well with results produced with the more accurate
continuous Monte Carlo Sobolev approach, as well as with the commonly used
expansion-opacity formalism. Additional investigations illustrate the
convergence of opacity with respect to the number of included lines, and
elucidate sensitivities to different atomic physics approximations, such as
fully and semi-relativistic approaches.
Comment: 27 pages, 22 figures. arXiv admin note: text overlap with
arXiv:1702.0299
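The area-preserving idea can be shown in a few lines: each bound-bound line's integrated opacity is deposited into the frequency bin containing its line center, so the integral of opacity over frequency is preserved exactly. The line list and bin grid below are made-up illustrative numbers, not kilonova atomic data.

```python
# Sketch of area-preserving line binning: each line's integrated opacity
# (its "area") goes into the bin holding its center frequency, so
# sum(binned_opacity * bin_width) equals the total integrated opacity.
import numpy as np

def bin_lines(line_centers, line_areas, bin_edges):
    """Return per-bin opacity whose integral over the bins equals the
    summed integrated opacity of all lines."""
    widths = np.diff(bin_edges)
    # histogram weighted by line areas: total deposited area per bin
    area_per_bin, _ = np.histogram(line_centers, bins=bin_edges,
                                   weights=line_areas)
    return area_per_bin / widths  # mean (binned) opacity in each bin

centers = np.array([1.1, 1.2, 2.5, 3.7])   # illustrative line-center frequencies
areas = np.array([4.0, 1.0, 2.0, 3.0])     # illustrative integrated opacities
edges = np.array([1.0, 2.0, 3.0, 4.0])     # frequency-bin edges

kappa = bin_lines(centers, areas, edges)
```

Because the table depends only on the line data and the bin grid, not on the velocity gradient, such opacities can be precomputed once, which is the advantage the abstract cites over the expansion-opacity formalism.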
Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets.
Background: Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information because the spreadsheet programs can easily be used on different platforms including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians and bioinformaticians. As a result, many research groups are shifting toward using cloud-based spreadsheet programs, such as Google Sheets, which support the concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who enter data are not familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis.
Main text: We present Keemei, a Google Sheets Add-on, for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google's Chrome Web Store. Keemei can be installed and run on any web browser supported by Google Sheets. Keemei currently supports the validation of two widely used tabular bioinformatics formats, the Quantitative Insights into Microbial Ecology (QIIME) sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others.
Conclusions: Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.
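The kind of validation described can be sketched with a few generic rules. The required header names, the uniqueness check, and the empty-cell check below are illustrative assumptions in the spirit of a sample metadata mapping file; they are not Keemei's actual QIIME or SRGD rule sets.

```python
# Hedged sketch of tabular-metadata validation. The rules (required
# columns, unique sample ids, no empty cells) are generic illustrations,
# not Keemei's real format definitions.

REQUIRED_COLUMNS = ["#SampleID", "BarcodeSequence", "Description"]

def validate_rows(rows):
    """Return a list of (row_index, message) errors for a table given as
    a list of lists, with rows[0] the header row."""
    errors = []
    header = rows[0]
    for col in REQUIRED_COLUMNS:
        if col not in header:
            errors.append((0, f"missing required column {col!r}"))
    id_idx = header.index("#SampleID") if "#SampleID" in header else None
    seen = set()
    for i, row in enumerate(rows[1:], start=1):
        if any(cell.strip() == "" for cell in row):
            errors.append((i, "empty cell"))
        if id_idx is not None:
            sid = row[id_idx]
            if sid in seen:
                errors.append((i, f"duplicate sample id {sid!r}"))
            seen.add(sid)
    return errors
```

Reporting errors as (row, message) pairs is what lets a spreadsheet add-on highlight offending cells in place rather than failing downstream in the bioinformatics pipeline.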
Consistent thermodynamic derivative estimates for tabular equations of state
Numerical simulations of compressible fluid flows require an equation of
state (EOS) to relate the thermodynamic variables of density, internal energy,
temperature, and pressure. A valid EOS must satisfy the thermodynamic
conditions of consistency (derivation from a free energy) and stability
(positive sound speed squared). When phase transitions are significant, the EOS
is complicated and can only be specified in a table. For tabular EOS's such as
SESAME from Los Alamos National Laboratory, the consistency and stability
conditions take the form of a differential equation relating the derivatives of
pressure and energy as functions of temperature and density, along with
positivity constraints. Typical software interfaces to such tables based on
polynomial or rational interpolants compute derivatives of pressure and energy
and may enforce the stability conditions, but do not enforce the consistency
condition and its derivatives. We describe a new type of table interface based
on a constrained local least squares regression technique. It is applied to
several SESAME EOS's showing how the consistency condition can be satisfied to
round-off while computing first and second derivatives with demonstrated
second-order convergence. An improvement of 14 orders of magnitude over
conventional derivatives is demonstrated, although the new method is apparently
two orders of magnitude slower, due to the fact that every evaluation requires
solving an 11-dimensional nonlinear system.
Comment: 29 pages, 9 figures, 16 references, submitted to Phys Rev
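The consistency condition relating the derivatives of pressure and energy can be checked numerically. One standard form of it, following from the existence of a free energy F(rho, T) with P = rho^2 dF/drho and e = F - T dF/dT, is de/drho|_T = (P - T dP/dT|_rho)/rho^2. The sketch below verifies this identity by finite differences on an analytic ideal-gas EOS, for which it holds exactly; the constants are arbitrary illustrative values and the check is not the paper's constrained least-squares interface.

```python
# Numerical illustration of the thermodynamic consistency condition a
# tabular EOS interface should satisfy:
#     de/drho|_T = (P - T * dP/dT|_rho) / rho^2
# verified here on an analytic ideal-gas EOS with illustrative constants.

R, CV = 8.314, 12.5  # illustrative gas constant and heat capacity

def pressure(rho, T):
    return rho * R * T   # ideal-gas P(rho, T)

def energy(rho, T):
    return CV * T        # ideal-gas e(rho, T), independent of density

def consistency_residual(rho, T, h=1e-3):
    """Central-difference residual of the consistency condition;
    ~0 (to round-off) for a thermodynamically consistent EOS."""
    de_drho = (energy(rho + h, T) - energy(rho - h, T)) / (2 * h)
    dP_dT = (pressure(rho, T + h) - pressure(rho, T - h)) / (2 * h)
    rhs = (pressure(rho, T) - T * dP_dT) / rho**2
    return de_drho - rhs
```

For an interpolated SESAME table the analogous residual is generally nonzero, which is exactly the defect the paper's constrained local least-squares interface drives to round-off.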
PROPHECY—a database for high-resolution phenomics
The rapid recent evolution of the field of phenomics—the genome-wide study of gene dispensability by quantitative analysis of phenotypes—has resulted in an increasing demand for new data analysis and visualization tools. Following the introduction of a novel approach for precise, genome-wide quantification of gene dispensability in Saccharomyces cerevisiae, we here announce a public resource for mining, filtering and visualizing phenotypic data—the PROPHECY database. PROPHECY is designed to allow easy and flexible access to physiologically relevant quantitative data on the growth behaviour of mutant strains in the yeast deletion collection under conditions of environmental challenge. PROPHECY is publicly accessible at http://prophecy.lundberg.gu.se